Azure Performance - Ping Test - Inconsistent values between Availability and Performance - performance

In Azure you can create simple ping test of your app. It's call ping but it is Get request of a url.
By default, the url is your root url.
The thing is the responses times of thes results are in the range of 2 to 10ms. However, I can never reach these response times nor with Fiddler or Postman. My range is more 100 to 400ms. And I'm closer to the datacenter than the computers running the ping tests in Azure.
It is a bit as if the ping tests where not downloading the content page.
Does anyone know?
UPDATE
I have setup my ping test in the Availability section. The responses times I mention above appear in the Performance section. Back in the Availability section, average response time is 1,6 sec. These two sections show inconsistent values.

UPDATED ANSWER:
The Performance section lists how long it took from your server receiving the request to sending something back to the client, it doesn't count network latency at all.
I believe they only check the response status without downloading the content if you don't require a content match.
Below is an example of the configuration for my blog.
If you wish, you can make sure the test downloads the content by ticking the Content match checkbox, and specifying that the content must contain the text that is somewhere near the end of your index page (like in a footer).

Related

Whats the impact of response code 400,503 ? Can we ignore these codes if my primary focus is to measure loading time of web application?

I am testing a web application login page loading time with 300 thread users and ramp up period of 300 secs.Most of my samples return response code 200.But few of them return response code 400,503.
My goal is to just check the performance of the web application if 300 users start using it.
I am new to Jmeter and have basic knowledge of programming.
My Question :-
1.Can i ignore these errors and focus just on timings from the summary report ?
2.If i really need to fix these errors, how to fix it ?
There are 2 different problems indicated by these errors:
HTTP Status 400 stands for Bad Request - it means that you're sending malformed requests which cannot be understood by the server. You should inspect request details and amend JMeter configuration as it is the problem in your script.
HTTP Status 503 stands for Service Unavailable - it indicates the problem on server side, i.e. server is not capable of handling the load you're generating. This is something you can already report as the application issue. You can try to identify the underlying cause by:
looking into your application log files
checking whether your application has enough headroom to operate in terms of CPU, RAM, Network, Disk, etc. It can be done using APM tool or JMeter PerfMon Plugin
re-running your test with profiler tool telemetry to deep dive into what's under the hood of the longest response times
So first of all you should ensure that your test is doing what it is supposed to be doing by running it with 1-2 users/loops and inspecting requests/response details. At this stage you should not be having any errors.
Going forward you should increase the load gradually and correlate the increasing number of virtual users with the increasing response time/number of errors
`
Performance testing is different from load testing. What you are doing is load testing.
Performance testing is more about how quickly an action takes. I typically capture performance on a system not under load for a given action.
This gives a baseline that I can then refer to during load tests.
Hopefully, you’ve been given some performance figures to test. E.g. must be able to handle 300 requests in two minutes.
When moving onto load, I run a series of load tests with increasing number of users/threads and capture the results from each test.
Armed with this, I can see how load degrades performance to the point where errors start to show up. This gives you an idea of how much typical load the system can handle.
I’d also look to run soak tests too. This where I’d run JMeter for a long period with typical (not peak) load to make sure the system can handle sustained load.
In terms of the errors you’re seeing, no I would not ignore them. Assuming your test is calling the same endpoint, it seems safe to say the code is fine, its the infrastructure struggling with the load you’re throwing at it.

What constitutes a true error in JMeter when the server always returns 200?

I am load testing an e-commerce site. Under medium to heavy loads, Jmeter reports an unusually high number of errors, i.e. succes=FALSE in the resulting .csv file which translates to a fairly high % errors in the JMeter dashboard report.
On inspecting the Kibana logs, other than a few WARN, there aren't any errors and the status for all the http requests are 200. In fact, I am able to verify that the test users created in my flow are able to login and view whatever the test was supposed to do.
My question is - how is JMeter determining this error (non-200 ?) when the server side returns nothing but 200? For fewer threads, the errors are 0%.
First I would check if any assertions can fail for any reasons. So disabling any assertions and running most primitive version of the script without any possibility of failure (other than based on sampler response) would be my first step. If it doesn't help...
HTTP Sampler will fail for return code < 200, or return code >= 400, or any Java exception, which will happen when request could not be sent, or response from the server was not received (in full). Most commonly those are timeouts related to various underlying problems:
Lack of available ports on JMeter side. Since you indicated that problem happens only with more JMeter users, it is a possibility. On paper you have 65535 ports, but both Windows and Linux are limiting number of available ports to few thousands by default. Make sure you have enough ports to run the load, and monitor port usage as you ramp threads up.
Even if all ports are available, many ports may be sitting for a few minutes in TIME_WAIT or CLOSE_WAIT state, depending on how your client and server interact and other network issues. So theoretically you may have 65k ports, but practically you don't have enough. In this case it's worth checking why it happens (can point to a bug) and possibly reduce the time ports will be waiting in this *_WAIT state.
You also need to make sure you have enough RAM, JVM heap and CPU: monitor to see if there's a bottleneck that can cause such slowness that breaks the ability of a Sampler to finish its work successfully.
If all of the above is working as expected, then either server or someone between client and server (load balancer, proxy, firewall) is causing this issue.
I would start from the server, and verify that all requests you are sending are received by a server. E.g. if you send 2000 requests, and server receives 1000, which finish successfully, then from server perspective everything is fine, but Jmeter will be at 50% error rate.
Also check your JMeter log definitions and make sure failures and exceptions are written to them properly (they are multi-line messages, and may cause log definitions to break when getting to them, and thus you wont' see them). For example run HTTP Sampler that points to non-available server. See if you can diagnose what returned exception was.
Beyond that - figuring out which part is "lying" to you would be an approach, but hard to offer any more specific advice based on current information

JMeter Load Testing Time Verification

I use JMeter for checking load testing.
I note a time with stopwatch when i check load time personally it was
8.5 seconds
when i run same case with JMeter it gave load time of 2 seconds
There is huge difference between them, How can i verify the actual time?
e.g : if one user taking 9 seconds to load the form while in JMeter it is given load time 2 seconds
Client time is a complex item, as you can see from the clip from the Chrome Developer tools, performance tab, above. There's lots going on at the client which does lead to a difference between the time you see with an HTTP protocol test tool, such as JMETER (and most of the other performance test tools on the planet) and the actual client render.
You can address this Delta in a number of ways:
Run a single GUI Virtual user. Name your timing records such as "Login" and "login_GUI." The delta between the two is your client weight. Make sure to run the GUI virtual user on a dedicated host to avoid resource contention
Run a test with all browsers. This was state of the art in 1995. Because of the resource cost and the skew imposed on trying to figure out the cost of the server response the entire industry shifted to protocol level virtual users. Some are trying to bring back this model as "state of the art." It is not
Ask a performance question earlier, also known as "shift left..." Every developer has these developer tools at their disposal, as does every functional tester. If you find that a client is slow for one user, be curious and use the developer tools to identify, "why?" If you are waiting to multi user performance testing to answer questions related to client weight, then you have waited too long and often will not have the time or resources to change the page architecture in meaningful ways to reduce the client page cost. This is where understanding earlier has tremendous advantages for making changes.
I picked the graphic above deliberately to illustrate the precise challenge you have. Notice, the loading of the components takes less than a tenth of a second. These are the requests that JMETER would be making. But the page takes almost five seconds to "render." Jmeter is not broken, it is working as designed. It is your understanding that needs to change on which tools can be used to pull particular stats for analysis.
You can't compare JMeter load time to browser as is, also because your browser will load JavaScript files and can call JavaScript functions on page load while JMeter doesn't execute JavaScript.
JMeter is not a browser, it works at protocol level. As far as
web-services and remote services are concerned, JMeter looks like a
browser (or rather, multiple browsers); however JMeter does not
perform all the actions supported by browsers. In particular, JMeter
does not execute the Javascript found in HTML pages. Nor does it
render the HTML pages as a browser does (it's possible to view the
response as HTML etc., but the timings are not included in any
samples, and only one sample in one thread is ever displayed at a
time).
Just a side note - you can use plugin to check exact load time in chrome.
Well-behaved JMeter test timing should be equal or similar to real user timing, if there is a 4x times difference - most probably your JMeter configuration is not correct.
Probably the most important. Make sure your HTTP Request samplers are configured to retrieve so called "embedded resources" (images, scripts, styles) which are referenced in the web page
If your application is using AJAX technology make sure you execute AJAX-driven requests as well and add their elapsed time to main sampler using i.e. Transaction Controller.
Make sure you mimic browser's:
Cookies via HTTP Cookie Manager
Headers via HTTP Header Manager
Cache via HTTP Cache Manager
Assuming all above you should be receiving similar to real user experience page load time. See How to make JMeter behave more like a real browser article for more detailed information on the above tips.
In addition to the answers provided by James and user7294900, please find these images to help you understand the reason behind the difference in time given by your stop watch and JMeter.
Below image gives the ideology behind how JMeter provides the time.
Below image gives the ideology behind how you have measured the time with
your stop watch.
Notice that there are additional actions performed by the browser when you are taking the time using your stop watch. This is the reason behind the huge difference in time between JMeter and your stop watch.
In addition to this, ensure that you are using the same test environmental conditions for both the tests (like same network conditions, same LG etc.)
Hope this helps!

Time reported in WILY is very much less compared to Load Runner Time. Why?

I am trying to monitor the time spent on server using WILY Introscope but i observe that the time mentioned in WILY for each of the servers is in the range of 100 to 1000 ms. But the time taken for a page to load in browser is almost 5 seconds.
Why is the tool reporting incorrect value ? how to get the complete time in WILY ?
time mentioned in WILY for each of the servers is in the range of 100
to 1000 ms. But the time taken for a page to load in browser is almost
5 seconds.
Reason is - In Browser, you see all the outgoing traffic from the browser. Ideally, any web page would contain 1 POST request followed by multiple GET requests. POST could be your text/html data while Get could be image, CSS, javascript etc.
Mostly these Get requests would be answered by the Web server and post request would be served by involving app server.
The time reported in WILY is only the time spent on server to serve the POST request. Your GET request calls will not be captured by WILY.
Why is the tool reporting incorrect value ? how to get the complete
time in WILY ?
Tool is not reporting incorrect value. Tool sits on a JVM ideally. So it monitors the activity of the JVM and provides the metrics. That is the expected behavior.
A page is a complex item, requiring parsing of the page contents and then requests to multiple servers/sources. So, your page load time will be made up request time for an individual component, processing time for the page parsing and javascript (depending upon virtual user type), requests for the page components, where they are served from, etc... Compare this to your Wily monitoring, which may only be on one of the tiers involved.
For instance, you may have static components being served from a CDN which has zero visibility in your Wily Model. You might also be looking at your app server when the majority of the time is spent serving static components off of a web server, which is oft ignored from a monitoring perspective. Your page could have third party components which are loading which get counted in the Loadrunner time, but do not get counted in the Wily time.
It all comes down a a question of sampling. It is very common for what you see in your deep diag tool to be a piece of the total page load, or an individual request which makes up a page where there are many more components to be loaded. If you want and even more interesting look then enable the w3c time-taken field in your web HTTP request logs and look to see the cost of every individual request. You can do this in the web layer of your app servers as well. Wily will then provide internal breakdown for those items which are "slow."

IIS 7 request duration monitoring

i am curious if there is a way of monitoring the request duration time on an iis server. Personally I have came up with a solution but it's really resource intensive and that is why i'm asking the question, just to gather more opinions.
My plan is to extract the duration time of each request and send it to graphite so as to have a real time overview of the performance of the webserver. The idea i've came up with is to use poweshell with its webadministration module. And if you run get-item IIS:\AppPools\DefaultAppPool | Get-WebRequest for example you get all the requests on that app pool with a lot of info including the time info.
The thing is that i should have a script which runs every 100 ms to get all requests and that is kinda wasteful. Is there a way to tell iis to put the request duration time(in miliseconds) in the logs? Because then it would be much easier to get the information I need.
I don't know if there is such a feature on IIS, but I've done the same (sending iis page times to graphite) by using a reverse proxy between internet and the iis server, like nginx.
The proxy module from nginx allow you to log on each request the time the backend took to produce the page.
Also, having a proxy like nginx in fron of an IIS could be very helpful if you have to deal with visits with slow connections, nginx will store the reply from backend, drop backend connection and wait until visitor gets all the content. Highly recommended.
In case you go this route, you should use logster (also from etsy guys) or logstash to parse nginx logs each period of time you want (likely every minute).
Seems that there is a feature that logs requests based on a regex, and it's called Advanced Logging Module. You can specify from a number of fields what you want to get loged and it's W3C compliant. In my case i had time take as a filed which can be specified and that was what i was looking for. After that i written a script in powershell which parses the logs and gets the information i need, constructs a metric and sends it to statsd which in term sends it to powershell.
The method i chose for the log parsing was the following: in the script i used get-content comandlet from powershell to gather all the logs in one file(yes iis breaks the logs in multiple files, and i'm guessing the number of logs is dependent on the number of your working processes but i'm not sure). This was the first iteration in a second iteration i gather all the logs in another file and make a diff between the first file and the latter and only the difference gets processed.
I chose this method because it's i thought it wold be better to have the minimum regex processing. The next step is erasing the first file of accumulated logs and moving the second one in pace of the first that was erased and running the script again, so to have always a method of comparison. Also the log rollover is at one hour, after which the logs are erased.

Resources