I am currently doing a load test to simulate concurrency on Apache on an internal network. Below are the response times I am getting for 10/50/100/200/500/1000 users. My first question is: how do I deduce whether this load is too much or too little? And further:
Attached below is the error rate
a) It seems to me that when the error rate hits 100%, the response time fluctuates between 30 and 40 ms, even across different tests.
b) And when the error rate is higher, Apache's response time seems to be faster.
Could someone shed some light on why this is so: a) why does the response time fluctuate around 30-40 ms when the error rate hits 100%, and b) why does the response time decrease when the error rate increases?
Thanks for taking the time to look into this.
I can't help with (a). However, (b) is fairly common in load testing, particularly for apps where the errors happen early in the request service cycle and generating an error message requires much less work than generating a correct response to the request. BUT, I don't see evidence of that in your results: the response time increases as the user load increases, and so does the error rate.
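To illustrate the mechanism behind (b), here is a small Python sketch. The timings are invented, not taken from your results: requests that fail early are cheap, so as the error rate climbs the mean response time falls, even though the successful requests are no faster.

```python
import random

random.seed(42)

def mean_response_time(error_rate, n=10_000,
                       success_ms=(200, 400), error_ms=(5, 15)):
    """Simulate n requests; errors are rejected early and return fast."""
    total = 0.0
    for _ in range(n):
        if random.random() < error_rate:
            total += random.uniform(*error_ms)    # fast failure path
        else:
            total += random.uniform(*success_ms)  # full request service cycle
    return total / n

for rate in (0.0, 0.5, 0.9):
    print(f"error rate {rate:.0%}: mean ~{mean_response_time(rate):.0f} ms")
```

The mean collapses toward the cheap error path as the error rate rises, which is why a rising error rate can masquerade as an improvement in average response time.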
Have you increased the connection limits in Apache? It's pretty well known that, by default, Apache is not configured to handle a large number of simultaneous connections. IIRC, Nginx is. At the load levels your results indicate, this could be affecting your results. Also, is your testing tool using persistent connections?
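If you do need to raise Apache's limits, a prefork MPM tuning block along these lines is the usual starting point. The numbers below are placeholders to adjust for your hardware, and note the directive was called MaxClients before Apache 2.4:

```apache
# Hypothetical prefork MPM tuning; sizes are examples, not recommendations
<IfModule mpm_prefork_module>
    ServerLimit          1000
    MaxRequestWorkers    1000   # named MaxClients before Apache 2.4
    StartServers           10
    MinSpareServers        10
    MaxSpareServers        50
</IfModule>
```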
One rule of thumb for telling whether the load is too much is to check whether there are errors or the response time is too long. In your case you have a non-trivial number of errors at 50 users (VUsers), which means something is wrong. You may want to investigate before increasing the user/VUser count.
@CMerrill's answer for (b) sounds right: when there are errors, the processing load on the server is usually lower. For (a), response time fluctuating between 30 ms and 40 ms sounds normal. The key issue is to investigate the errors.
I am testing a scenario with 400 threads. Although I am getting almost no errors, the average response time is very high. What could cause this problem? It seems the server does not time out but responds very late. I've added the summary report; it is as follows:
This table doesn't tell the full story. If the response time seems "too high" to you, then that is definitely the bottleneck and you can report it already.
What you can do to localize the problem is:
Consider using a longer ramp-up period, i.e. start with 1 user and add 1 more user every 5 seconds (adjust these numbers to your scenario), so you have an arrival phase, the "plateau" and the load-decrease phase. This approach lets you correlate increasing load with increasing response time by looking at the Active Threads Over Time and Response Times Over Time charts. That way you will be able to state that:
response time remains the same up to X concurrent users
after X concurrent users it starts growing, so throughput goes down
after Z concurrent users response time exceeds the acceptable threshold
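To make that "state that" step concrete, here is a toy Python sketch of reading a saturation point out of per-step ramp-up samples. The numbers are invented, not from the question:

```python
# Toy illustration: given per-step (active users, avg response time in ms)
# samples from a gradual ramp-up (step of 10 users), find the last step
# where response time was still flat.
samples = [  # hypothetical numbers
    (10, 105), (20, 102), (30, 108), (40, 110),
    (50, 160), (60, 240), (70, 410), (80, 900),
]

baseline = samples[0][1]
threshold = baseline * 1.2  # allow 20% drift before calling it degradation

saturation_users = None
for users, rt in samples:
    if rt > threshold:
        saturation_users = users
        break

print(f"response time stays flat up to ~{saturation_users - 10} users,"
      f" degrades from {saturation_users} users on")
```

In practice the JMeter charts give you the same picture visually; the point is that the ramp-up makes the knee in the curve identifiable.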
It would also be good to monitor CPU, RAM, etc. on the server side, as increased response time may be due to a lack of resources; you can use the JMeter PerfMon Plugin for this
Inspect your server configuration, as you might need to tune it for high loads (the same applies to JMeter; make sure to follow JMeter Best Practices)
Use a profiler tool on the server side during the next test execution; it will show you the slowest places in your application code
I have been running my JMeter script for almost a week and observed an interesting thing today. Below is the scenario:
Overview: I am gradually increasing the load on the application. In my last test I gave load of 100 users on the app and today I increased the load to 150 users.
Result of 150 users test:
Response time of the requests decreased compared to the previous test (which looks like a good sign).
Throughput dropped drastically, to half of what I got in the previous test with less load.
Received 225 errors while executing the test.
My questions are:
What could be the possible reason for this strange throughput behaviour? Why did throughput decrease instead of increasing with the increased load?
Did I get good response times only because many of my requests failed?
NOTE: Up to the 100-user test, throughput was increasing with the increasing user load.
Can anyone please help me with these questions? I am a newbie in performance testing. Thanks in advance!
Also, I would like to ask if anyone can suggest good articles/sites on finding performance bottlenecks and learning the crucial parts of performance testing.
Most probably the 225 requests which failed returned a failure immediately, and therefore the average response time decreased. That's why you should be looking at e.g. the Response Times Over Time chart and paying more attention to percentiles, as the mean response time can mask the real problem.
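A quick Python illustration of how fast failures mask slow successes; the sample values are made up, keeping only the 225-failures-out-of-1000 shape from the question:

```python
# 775 slow-but-successful requests and 225 fast failures (all times in ms).
ok = [900] * 775     # successful requests are actually slow
failed = [20] * 225  # failed requests return almost immediately

all_samples = sorted(ok + failed)
mean = sum(all_samples) / len(all_samples)
p90 = all_samples[int(0.90 * len(all_samples))]  # 90th percentile

print(f"mean = {mean:.0f} ms (looks better), p90 = {p90} ms (the real story)")
```

The mean is pulled down by the cheap failures, while the 90th percentile still exposes the 900 ms experienced by real users, which is why percentiles belong in the report.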
With regards to bottleneck discovery, make sure to collect as much information from the server side as you can, e.g.:
CPU, RAM, Network and Disk usage from the JMeter PerfMon Plugin
Slow queries log from the database
"heaviest" functions and largest objects from the profiling tool for your application
What could be the possible reason for throughput remaining the same even though the load has increased considerably compared to previous tests? NOTE: I even received the error "Internal Server Error" while running my performance test.
It means you have reached the Saturation Point - the point of maximum performance!
At a certain number of concurrent users you will see maximum CPU utilization together with peak throughput. Adding more concurrent users will degrade response time and throughput while the CPU stays at peak utilization, and errors may start appearing.
After that, if you continue increasing the number of virtual users you may see the following:
Response time increases.
Some of your requests fail.
Throughput either remains the same or decreases - this indicates the performance bottleneck!
An ideal load test in an ideal world looks like this:
response time remains the same, no matter how many virtual users are hitting the server
throughput increases by the same factor as the load increases, i.e:
100 virtual users - 500 requests/second
200 virtual users - 1000 requests/second
etc.
In reality, the application under test might scale up to a certain extent, but eventually you will reach the point where you keep increasing the load yet the response time goes up and throughput remains the same (or goes down).
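That real-world shape follows from Little's Law (throughput = concurrency / response time). A minimal Python sketch with a hypothetical server capacity:

```python
# Little's Law sketch: below the saturation point response time is flat,
# so throughput scales linearly with users; past it the server is at
# capacity and extra users only queue, inflating response time instead.
CAPACITY = 1000  # hypothetical maximum requests/second the server can serve
BASE_RT = 0.2    # seconds per request when the server is not saturated

def throughput(users):
    return min(users / BASE_RT, CAPACITY)

def response_time(users):
    # Little's Law rearranged: time in system = concurrency / throughput
    return users / throughput(users)

for u in (100, 200, 400, 800):
    print(f"{u:4d} users -> {throughput(u):6.0f} req/s,"
          f" {response_time(u) * 1000:5.0f} ms avg")
```

Up to 200 users throughput doubles with the load at a flat 200 ms; beyond that throughput pins at the capacity and response time grows linearly with the user count, which is exactly the "saturation" pattern described above.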
Looking into HTTP Status Code 500
The HyperText Transfer Protocol (HTTP) 500 Internal Server Error server error response code indicates that the server encountered an unexpected condition that prevented it from fulfilling the request.
Most probably it indicates that the application under test is overloaded; the next step would be to find out the reason, which may be one of:
The application is not properly configured for high loads (including all the middleware: application server, database, load balancer, etc.)
The application lacks resources (CPU, RAM, etc.); this can be checked using e.g. the JMeter PerfMon Plugin
The application uses inefficient functions/algorithms; this can be checked using profiling tools
What is the best practice for Max Wait (ms) value in JDBC Connection Configuration?
I am executing 2 types of tests:
20 loops for each number of threads - to get the maximum throughput
30 min runtime for each number of threads - to get the response time
With Max Wait = 10000 ms I can execute the JDBC request with 10, 20, 30, 40, 60 and 80 threads without an error. With Max Wait = 20000 ms I can go higher and execute with 100, 120 and 140 threads without an error. This seems like logical behaviour.
Now the questions.
Can I increase the Max Wait value as much as I like? Is that a correct way to get more test results?
Should I stop testing and not increase the number of threads if any errors occur in a report? I got e.g. 0.06% errors from 10000 samples. Is this a stopping point for my testing?
Thanks.
Everything depends on what your requirements are and how you defined the performance baseline.
Can I increase the Max Wait value as much as I like? Is that a correct way to get more test results?
If you are OK with higher response times as long as the functionality works, then you can set Max Wait as high as you want. In practice, however, there will be a threshold on response times (say, 2 seconds to perform a login transaction) which you define as part of your performance SLA or performance baseline. So even though you are making your requests succeed by increasing Max Wait, they are eventually considered failed requests due to high response time (crossing the threshold values).
Note: higher response times for DB operations eventually result in higher response times for the web application (i.e. for end users).
Should I stop testing and not increase the number of threads if any errors occur in a report?
The same applies to error rates. If the SLA says some % error rate is acceptable, then you can consider the test to be meeting the SLA or performance baseline if the actual error rate is less than that. E.g. if the requirement says 0% error rate, then even 0.1% is considered a failure.
Is this a stopping point for my testing?
You can stop the test at whatever point you want; it is completely based on what metrics you want to capture. In my experience, it is advisable to continue the test until it reaches a point where there is no value in continuing, e.g. the error rate reaches 99%. If you are getting a 0.06% error rate, I suggest continuing the test to find the breaking point of the system: a server crash, response times reaching unacceptable values, memory issues, etc.
Following are some good references:
https://www.nngroup.com/articles/response-times-3-important-limits/
http://calendar.perfplanet.com/2011/how-response-times-impact-business/
difference between baseline and benchmark in performance of an application
https://msdn.microsoft.com/en-us/library/ms190943.aspx
https://msdn.microsoft.com/en-us/library/bb924375.aspx
http://searchitchannel.techtarget.com/definition/service-level-agreement
This setting maps to the DBCP -> BasicDataSource -> maxWaitMillis parameter; according to the documentation:
The maximum number of milliseconds that the pool will wait (when there are no available connections) for a connection to be returned before throwing an exception, or -1 to wait indefinitely
It should match the relevant setting of your application's database configuration. If your goal is to determine the maximum performance, just put -1 there and the timeout will be disabled.
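For reference, if the application runs in Tomcat with a DBCP2 pool, this setting typically lives on the JNDI resource definition. The fragment below is purely hypothetical: the resource name, JDBC URL, credentials and pool sizes are made-up examples, not values from the question:

```xml
<!-- Hypothetical Tomcat context.xml fragment; all names/values are examples -->
<Resource name="jdbc/appDB"
          auth="Container"
          type="javax.sql.DataSource"
          factory="org.apache.commons.dbcp2.BasicDataSourceFactory"
          driverClassName="com.mysql.cj.jdbc.Driver"
          url="jdbc:mysql://dbhost:3306/app"
          username="app"
          password="secret"
          maxTotal="100"
          maxWaitMillis="10000"/>
```

Whatever value the application uses here is the one worth mirroring in the JMeter JDBC Connection Configuration, so the test exercises the same timeout behaviour as production.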
In regards to Is this a stopping point for my testing? - it depends on multiple factors: what the application does, what you are trying to achieve and what type of testing is being conducted. If you are testing a database which orchestrates nuclear plant operation, then a zero-error threshold is the only acceptable one. If it is a picture gallery of cats, this error level can be considered acceptable.
In the majority of cases performance testing is divided into several test executions, like:
Load Testing - putting the system under the anticipated load to see if it is capable of handling the forecast number of users
Soak Testing - basically the same as Load Testing, but keeping the load for a prolonged duration. This allows you to detect e.g. memory leaks
Stress Testing - determining the boundaries of the application: saturation points, bottlenecks, etc. Start from zero load and gradually increase it until the application breaks, noting the maximum number of users, the correlation of other metrics (Response Time, Throughput, Error Rate, etc.) with the increasing number of users, whether the application recovers when the load returns to normal, etc.
See the Why ‘Normal’ Load Testing Isn’t Enough article for the above testing types described in detail.
I use Visual Studio Team System 2008 Team Suite for load testing of my web application (it uses the ASP.NET MVC technology).
Load pattern: Constant (this means I have a constant number of virtual users all the time).
I specify a configuration of 1000 users to analyze the performance of my web application under really stressful conditions. I run the same load test multiple times while making changes to my application.
But while analyzing the load test results I come across a strange dependency: when the average page response time becomes larger, the requests-per-second value increases too! And vice versa: when the average page response time is smaller, the requests-per-second value is smaller. This situation does not reproduce when the number of users is small (5-50 users).
How can you explain such results?
Perhaps there is a misunderstanding of the term Requests/Sec here. As I understand it, Requests/Sec is just a measure of how many requests the test is pushing into the application (not the number of requests completed per second).
If you look at it that way, this might make sense.
A high Requests/Sec will cause a higher Avg. Response Time (due to a bottleneck somewhere, i.e. CPU-bound, memory-bound or IO-bound).
So as your Requests/Sec goes up and you have tons of objects in memory, the memory comes under pressure, triggering garbage collection that slows down your response time.
Or as your Requests/Sec goes up and your CPU gets hammered, you might have to wait for CPU time, thus making your response time higher.
Or as your Requests/Sec goes up, your SQL is not tuned properly, and blocking and deadlocking occur, thus making your response time higher.
These are just examples of why you might see this correlation. You may have to track it down further in terms of CPU, memory usage and IO (network, disk, SQL, etc.).
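Those bottleneck examples all share the same queueing-theory shape: as the offered load approaches capacity, response time grows non-linearly. A minimal single-server (M/M/1) sketch in Python, with a made-up capacity figure:

```python
# M/M/1 queue: mean time in system is 1 / (mu - lambda), so as the
# arrival rate approaches the service rate, response time blows up.
SERVICE_RATE = 100.0  # hypothetical capacity, requests/second

def avg_response_time(arrival_rate):
    """Mean time in system (seconds) for an M/M/1 queue."""
    assert arrival_rate < SERVICE_RATE, "queue is unstable at/over capacity"
    return 1.0 / (SERVICE_RATE - arrival_rate)

for load in (50, 80, 95, 99):
    print(f"{load:3d} req/s offered -> {avg_response_time(load) * 1000:6.1f} ms")
```

Going from 50% to 99% of capacity multiplies the response time roughly fifty-fold here, which is why a modest increase in Requests/Sec near saturation produces a dramatic jump in Avg. Response Time.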
A few more details about the problem: we are load testing our rendering engine [NDjango][1] against standard ASP.NET aspx pages. The web app we are using for the load test is very basic - it consists of 2 static pages - no database, no heavy processing, just rendering. What we see is that, in terms of average response time, aspx, as expected, is considerably faster; but to my surprise the number of requests per second, as well as the total number of requests for the duration of the test, is much lower.
Leaving aside what we are testing against what, I agree with Jimmy that a higher request rate can clog the server in many ways. But my understanding is that this would cause the response time to go up - right?
If the numbers we are getting really reflect what's happening on the server, I do not see how this rule can be broken. So for now the only explanation I have is that the numbers are skewed - something is wrong with how we are configuring the tool.
[1]: http://www.ndjango.org NDjango
This is a normal result: as the number of users increases, you load the server with a higher number of requests per second. Any server will take longer to deal with more requests per second, meaning the average page response time increases.
Requests per second is a measure of the load being applied to the application, while average page response time is a measure of the application's performance (where a high number = slow response).
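For a closed workload (a fixed pool of virtual users), Little's Law ties these metrics together. This Python sketch (with a hypothetical user count and zero think time) shows that *completed* requests per second should fall as response time grows; if your tool reports the opposite, it is probably counting offered rather than completed requests:

```python
# Little's Law for a closed workload with no think time between requests:
# users = completed_req_per_sec * (response_time + think_time)
USERS = 1000       # matches the constant-load scenario in the question
THINK_TIME = 0.0   # hypothetical: virtual users fire requests back to back

def completed_rps(avg_response_time_s):
    """Completed requests/second implied by the average response time."""
    return USERS / (avg_response_time_s + THINK_TIME)

for rt in (0.5, 1.0, 2.0):
    print(f"avg rt {rt:.1f} s -> {completed_rps(rt):6.0f} completed req/s")
```

With the user count held constant, doubling the response time halves the completed throughput, so a counter reporting rising Requests/Sec alongside rising response times is measuring something else (queued or offered load).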
You would be better off using a stepped number of users or a warm-up period where the load is applied to the server gradually.
Also, with 1000 virtual users on a single test machine, the CPU of the test machine will be absolutely maxed out. That is most likely what is skewing your test results. Playing with the number of virtual users, you will find that there is a point where requests per second max out; adding or removing virtual users from that point will result in fewer requests per second from the app.