I am testing a scenario with 400 threads. Although I am getting almost no errors, the average response time is very high. What could cause this problem? It seems the server does not time out but responds very late. I've added the summary report. It is as follows:
This table doesn't tell the full story. If the response time seems "so high" to you, this is definitely a bottleneck and you can report it already.
What you can do to localize the problem is:
Consider using a longer ramp-up period, e.g. start with 1 user and add 1 more user every 5 seconds (adjust these numbers according to your scenario), so you would have an arrival phase, a "plateau" and a load decrease phase. This approach will allow you to correlate increasing load with increasing response time by looking at the Active Threads Over Time and Response Times Over Time charts. This way you will be able to state that:
response time remains the same up to X concurrent users
after X concurrent users it starts growing so throughput is going down
after Z concurrent users response time exceeds acceptable threshold
It would also be good to see CPU, RAM, etc. usage on the server side, as increased response time might be due to a lack of resources. You can use the JMeter PerfMon Plugin for this.
Inspect your server configuration, as you might need to tune it for high loads (the same applies to JMeter; make sure to follow JMeter Best Practices).
Use a profiler tool on the server side during the next test execution; it will show you the slowest places in your application code.
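As a rough illustration of the "stable up to X users, threshold exceeded after Z users" statements above, here is a minimal Python sketch. It assumes you have read (active threads, average response time) pairs off the two charts by hand; the function name, the 20% tolerance, the 2-second threshold and the sample readings are all hypothetical and should be replaced with your own values.

```python
def summarize_ramp(samples, tolerance=0.2, threshold_s=2.0):
    """Summarize a ramp-up run.

    samples: list of (active_threads, avg_response_time_s), sorted by threads.
    Returns (stable_up_to, exceeds_at):
      stable_up_to - highest thread count seen before response time first
                     rises more than `tolerance` above the first sample
      exceeds_at   - first thread count whose response time is above
                     `threshold_s`, or None if that never happens
    """
    baseline = samples[0][1]
    limit = baseline * (1 + tolerance)
    stable_up_to = samples[0][0]
    for users, response_time in samples:
        if response_time > limit:
            break
        stable_up_to = users
    exceeds_at = next((u for u, rt in samples if rt > threshold_s), None)
    return stable_up_to, exceeds_at


# Hypothetical readings taken from the two charts at matching timestamps.
readings = [(1, 0.50), (50, 0.52), (100, 0.55),
            (150, 0.90), (200, 1.80), (250, 3.10)]
stable, exceeded = summarize_ramp(readings)
print(f"response time stable up to {stable} users")
print(f"acceptable threshold exceeded at {exceeded} users")
```

This only automates reading the charts; it does not say why the response time grows, which is what the server-side monitoring and profiling are for.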
I have been running my JMeter script for almost a week and observed an interesting thing today. Below is the scenario:
Overview: I am gradually increasing the load on the application. In my last test I put a load of 100 users on the app, and today I increased the load to 150 users.
Result of 150 users test:
Response time of the requests decreased compared to the last test. (Which is a good sign)
Throughput decreased drastically to half of what I got in the previous test with less load.
Received 225 errors while executing the test.
My questions are:
What could be the possible reason for such strange behavior of throughput? Why did throughput decrease instead of increasing with the increasing load?
Did I get a good response time because many of my requests failed?
NOTE: Up to the 100-user test, throughput was increasing with the increasing load of users.
Can anyone please help me with this question? I am a newbie in performance testing. Thanks in advance!
Also, I would like to ask if anyone can suggest good articles, sites, etc. on finding performance bottlenecks and learning the crucial things in performance testing.
Most probably the 225 requests which failed returned the failure immediately, and therefore the average response time decreased. That's why you should be looking at e.g. the Response Times Over Time chart and paying more attention to percentiles, as the mean response time can mask the real problem.
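To see how fast failures can pull the mean down while the percentiles get worse, here is a small Python sketch. The numbers are invented purely for illustration (they are not taken from the test above), and `percentile` is a simple nearest-rank helper, not the exact method JMeter uses.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile for pct in the range 0..100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical previous run: 100 successful requests, 2.0 s each.
previous_run = [2.0] * 100
# Hypothetical current run: 60 slower successes (3.0 s each)
# plus 40 requests that failed almost immediately (0.05 s each).
current_run = [3.0] * 60 + [0.05] * 40

for name, run in (("previous", previous_run), ("current", current_run)):
    print(name,
          "mean:", round(statistics.mean(run), 2),
          "p90:", percentile(run, 90))
```

In this made-up data the mean of the current run is lower than before even though every successful request got slower, which is exactly the effect the 90th percentile exposes.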
With regards to the bottleneck discovery, make sure to collect as much information from the server side as you can, e.g.
CPU, RAM, Network, Disk usage from JMeter PerfMon Plugin
Slow queries log from the database
"heaviest" functions and largest objects from the profiling tool for your application
I'm using JMeter to generate a performance test. To keep things short: I read the initial data from a JSON file. I have a single thread group in which, after reading the data, I randomize certain values to prevent data duplication where needed, and then I pass the final data to the endpoint using variables. This ends up in a JSON body that is received by the endpoint, which basically generates a new transaction in the database. I also added a constant timer to add a 7-second delay between requests, with a test duration of 10 minutes and no ramp-up. I calculated the expected number of requests like this:
One minute has 60 seconds and I have a delay of 7 seconds per request, so I'm sending approximately 8.5 requests per minute (60/7 = 8.5). If the test lasts for 10 minutes, then 8.5 * 10 = 85, giving a total of 85 transactions in 10 minutes, so I should see that exact number of transactions created in the database after the test completes.
This holds when I'm running 10, 20 or 40 users: after the load test run I query the DB and get the expected number of transactions. However, as I increase the users in the thread group this no longer happens. For example, if I set 1000 users I should be able to generate 8500 transactions in 10 minutes, but this is not the case; the DB only creates around 5.1k transactions.
What is happening, and what is wrong? Why does it initially work as expected but not when I increase the users? I can provide more information if needed. Please help.
There could be 2 possible reasons for this:
You discovered your application bottleneck. When you add more users, the application response time increases and therefore throughput decreases. There is a term called saturation point, which stands for the maximum performance of the system; if you go beyond this point, the system will respond slower and you will get less TPS than initially. On the application under test side you should look into the following areas:
It might be the case that your application simply lacks resources (CPU, RAM, Network, etc.). Make sure that it has enough headroom to operate, using e.g. the JMeter PerfMon Plugin.
Your application middleware (application server, database, load balancer, etc.) is not properly set up for high loads. Identify your application infrastructure stack and make sure to follow the performance tuning guidelines for each component.
It is also possible that your application code needs optimization. You can detect the most time/resource consuming functions, largest objects, slowest DB queries, idle times, etc. using profiling tools.
JMeter is not sending requests fast enough
Just like for the application under test, check that the JMeter machine(s) have enough resources (CPU, RAM, etc.)
Make sure to follow JMeter Best Practices
Consider going for Distributed Testing
Can you please check the CPU and memory utilization (RAM and Java heap) of the JMeter load generator while running JMeter for 1000 users? If it is high or reaching the maximum, it may affect requests/sec. Also, to confirm requests/sec from the JMeter side, can you please add a listener to the JMeter script to track Hits/sec or TPS?
Your calculation (8.5K requests in a 10-minute test) would also hold if your API response time were 1 second and you had provided enough ramp-up time for those 1000 users.
So the possible reasons are:
You did not provide enough ramp-up time for 1000 users.
Your API's average response time is more than 1 second while you are running the test with 1000 users.
Possible workarounds:
First, try to measure the API response time for 1 user.
Then calculate how many users you need to reach 8500 requests in 10 minutes. Use this formula:
Users = TPS * max response time in seconds
Give a proper ramp-up time for 1000 users. Check this thread to understand how you should calculate ramp-up time.
Check that your load generator is able to generate 1000 users without any memory or health (e.g. CPU usage) issues. If required, try a distributed architecture.
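The arithmetic behind both answers can be sketched in Python as below. The assumption here, which is not stated in the thread, is that each virtual user waits for the timer delay and then for the response before starting its next request, so one iteration takes roughly delay + response time. With a zero delay, `users_needed` reduces to the TPS * response time formula above. Both function names are hypothetical.

```python
import math


def iterations_per_user(duration_s, timer_delay_s, response_time_s):
    """Approximate requests one virtual user completes during the test.

    Assumes each iteration costs the timer delay plus the response time.
    """
    return duration_s / (timer_delay_s + response_time_s)


def users_needed(target_tps, response_time_s, timer_delay_s=0.0):
    """Virtual users required to sustain target_tps.

    Little's Law style estimate: users = TPS * time per iteration.
    The round() guards against floating point noise before ceil().
    """
    return math.ceil(round(target_tps * (response_time_s + timer_delay_s), 9))


# 10-minute test with a 7-second timer:
print(iterations_per_user(600, 7, 0))  # upper bound, only if responses were instant
print(iterations_per_user(600, 7, 1))  # 75.0 once each response takes 1 second

# Hypothetical target: 100 TPS with a 2.5-second response time and no timer.
print(users_needed(100, 2.5))
```

So roughly 85 requests per user in 10 minutes is an upper bound that is only approached while responses are very fast; as response time grows under load, each user completes fewer iterations and the total falls short.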
Do we need to adjust the throughput reported by JMeter to find out the actual TPS of the system?
For example: I am getting 100 TPS for 250 concurrent users. This ran for 10 hours. Can I conclude that my software can handle 100 transactions per second, or do I need to make some adjustment to get the value? I am asking because when the load starts, the system takes some time to perform at an adequate level (warm-up time). If so, how do I do this? Please help me understand.
By default JMeter sends requests as fast as it can. The main factors affecting the TPS rate are:
number of threads (virtual users) - this you can define in Thread Group
your application response time - this is not something you can control
Ideally, when you increase the number of threads, the TPS should increase by the same factor, i.e. if you have 250 users and are getting 100 TPS, you should get 200 TPS for 500 users. If this is not the case, these 500 users are beyond the saturation point and your application bottleneck is somewhere between 250 and 500 users (if not earlier).
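The "same factor" check can be written down as a tiny Python helper. This is only a sketch of the proportionality rule described above; the function name and the 130 TPS figure are made up for illustration.

```python
def scaling_efficiency(base_users, base_tps, users, tps):
    """Ratio of measured TPS to the TPS expected from linear scaling.

    1.0 means throughput grew by the same factor as the user count;
    values well below 1.0 suggest the saturation point was passed.
    """
    expected_tps = base_tps * users / base_users
    return tps / expected_tps


# Ideal case from the text: 250 users -> 100 TPS, 500 users -> 200 TPS.
print(scaling_efficiency(250, 100, 500, 200))  # 1.0

# Hypothetical measurement: only 130 TPS at 500 users.
print(scaling_efficiency(250, 100, 500, 130))
```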
With regards to "warm up" time - the recommended approach is to apply the load gradually. This way you allow your application to get prepared for the increasing load, warm up caches, let the JIT compiler/optimizer do their work, etc. Moreover, this way you will be able to correlate the increasing load with increasing/decreasing throughput, response time, number of errors, etc., while having 250 users released at once doesn't tell the full story. See
The system warm-up period varies from one system to another. The warm-up period is when configurations are cached, different libraries are initialized (e.g. Builder.init()) and other initial functions run that usually don't happen for subsequent calls. If you study the results of the load test, there is a slow period at the very beginning. For most systems it could be as short as 5 to 10 minutes. This could even be negligible if the test is as long as 10 hours. But then again, the average can be affected if the results are much worse at the start (it always depends on the jump from the initial warm-up period to normal operation).
As for the JMeter configuration, this thread may explain it: How to exclude warmup time from JMeter summary?
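One way to drop the warm-up period after the fact is to post-process the raw samples. The Python sketch below assumes you have already extracted (timestamp in ms, elapsed in ms) pairs from your results file (for example from the timeStamp and elapsed columns of a CSV results file, if your configuration writes them); the function name and the synthetic data are hypothetical.

```python
def steady_state_stats(samples, warmup_s):
    """Throughput and mean response time with the warm-up period excluded.

    samples: list of (timestamp_ms, elapsed_ms) pairs.
    warmup_s: seconds to discard, counted from the first sample.
    Returns (sample_count, tps, mean_elapsed_ms).
    """
    test_start = min(ts for ts, _ in samples)
    cutoff = test_start + warmup_s * 1000
    kept = [(ts, elapsed) for ts, elapsed in samples if ts >= cutoff]
    if not kept:
        raise ValueError("warm-up period covers the whole test")
    window_start = min(ts for ts, _ in kept)
    window_end = max(ts + elapsed for ts, elapsed in kept)
    window_s = (window_end - window_start) / 1000
    tps = len(kept) / window_s if window_s > 0 else 0.0
    mean_elapsed = sum(elapsed for _, elapsed in kept) / len(kept)
    return len(kept), tps, mean_elapsed


# Synthetic data: 10 slow samples during warm-up, then 40 faster ones.
warmup = [(t * 1000, 900) for t in range(10)]
steady = [(10_000 + i * 500, 200) for i in range(40)]
all_samples = warmup + steady

print(steady_state_stats(all_samples, warmup_s=0))   # whole test
print(steady_state_stats(all_samples, warmup_s=10))  # warm-up removed
```

For a 10-hour run the difference is likely to be small, as noted above, but for short tests it can change the reported average noticeably.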
At the start of the script the sample time is low, and then it starts increasing as the load increases. Is this the correct way to do load testing for a website?
Please help: which is the correct way to do load testing for a website?
Not really. In an ideal world response time should remain the same as the load increases, like:
1 user - response time 1 second - throughput 1 request per second
100 users - response time 1 second - throughput 100 requests per second
200 users - response time 1 second - throughput 200 requests per second
etc.
The point at which throughput stops increasing as you add load is called the saturation point - it is the maximum throughput your application can support.
The situation when response time starts increasing as you start more threads (virtual users) is known as the bottleneck, and the question is whether performance is still acceptable for the number of users defined in the NFR and/or SLA. If yes, you're good to go; if not, you need to report this issue (it would be beneficial if you could try to determine the reason for it).
The correct way of load testing a website is to simulate end-user activity as closely as possible, including the workload model. Remember to increase the load gradually; this way you will be able to correlate increasing load with metrics like response time, throughput and number of errors. It is also good to decrease the load gradually to see whether your website recovers when the load gets back to normal/zero.
What is the best practice for Max Wait (ms) value in JDBC Connection Configuration?
I am executing 2 types of tests:
20 loops for each number of threads - to get max throughput
30 min runtime for each number of threads - to get response time
With Max Wait = 10000ms I can execute JDBC request with 10,20,30,40,60 and 80 Threads without an error. With Max Wait = 20000ms I can go higher and execute with 100, 120, 140 Threads without an error. It seems to be logical behaviour.
Now the questions.
Can I increase the Max Wait value as desired? Is this a correct way to get more test results?
Should I stop testing and not increase the number of threads if any error occurs in a report? I got e.g. 0.06% errors from 10000 samples. Is this a stop for my testing?
Thanks.
Everything depends on what your requirements are and how you defined performance baseline.
Can I increase the Max Wait value as desired? Is this a correct way to get more test results?
If you are OK with higher response times and the functionality keeps working, then you can set the max time as high as you want. In practice, though, there will be a threshold for response times (like 2 seconds to perform a login transaction), which you define as part of your performance SLA or performance baseline. So although you make your requests succeed by increasing the max time, they will eventually be considered failed requests because of the high response time (crossing the threshold values).
Note: Higher response times for DB operations eventually result in higher response times for web applications (or end users).
Should I stop testing and not increase the number of threads if any error occurs in a report?
The same applies to error rates. If the SLA says that some % error rate is agreed, then you can consider the test to meet the SLA or performance baseline if the actual error rate is less than that. E.g. if the requirements say 0% error rate, then even 0.1% is considered a failure.
Is this a stop for my testing?
You can stop the test at whatever point you want. It depends entirely on what metrics you want to capture. To my knowledge, it is suggested to continue the test until there is no point in continuing, e.g. the error rate has reached 99%. If you are getting an error rate of 0.06%, then I suggest continuing the test to find the breaking point of the system, such as a server crash, response times reaching unacceptable values, memory issues, etc.
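The SLA comparison described above amounts to checking measured values against agreed limits. A minimal Python sketch follows; the function name, the use of a 90th percentile and the limits are hypothetical (only the 0.06% from 10000 samples and the 2-second example come from this thread).

```python
def check_sla(total_samples, failed_samples, p90_response_s,
              max_error_pct, max_p90_response_s):
    """Compare measured results against hypothetical SLA limits."""
    error_pct = failed_samples / total_samples * 100
    return {
        "error_pct": error_pct,
        "error_rate_ok": error_pct <= max_error_pct,
        "response_time_ok": p90_response_s <= max_p90_response_s,
    }


# 0.06% errors from 10000 samples means 6 failed samples.
# Against a strict 0% error-rate requirement this run fails:
print(check_sla(10_000, 6, p90_response_s=1.5,
                max_error_pct=0.0, max_p90_response_s=2.0))

# Against a hypothetical agreed 0.1% error rate it passes:
print(check_sla(10_000, 6, p90_response_s=1.5,
                max_error_pct=0.1, max_p90_response_s=2.0))
```

Whether a given run counts as a pass therefore depends on the limits you agreed on, not on the raw numbers alone.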
Following are some good references:
https://www.nngroup.com/articles/response-times-3-important-limits/
http://calendar.perfplanet.com/2011/how-response-times-impact-business/
difference between baseline and benchmark in performance of an application
https://msdn.microsoft.com/en-us/library/ms190943.aspx
https://msdn.microsoft.com/en-us/library/bb924375.aspx
http://searchitchannel.techtarget.com/definition/service-level-agreement
This setting maps to the DBCP -> BasicDataSource -> maxWaitMillis parameter. According to the documentation:
The maximum number of milliseconds that the pool will wait (when there are no available connections) for a connection to be returned before throwing an exception, or -1 to wait indefinitely
It should match the relevant setting of your application database configuration. If your goal is to determine the maximum performance - just put -1 there and the timeout will be disabled.
In regards to "Is this a stop for my testing?" - it depends on multiple factors, like what the application is doing, what you are trying to achieve and what type of testing is being conducted. If you are testing a database which orchestrates nuclear plant operation, then a zero-error threshold is the only acceptable one. And if it is a picture gallery of cats, this error level can be considered acceptable.
In the majority of cases performance testing is divided into several test executions, like:
Load Testing - putting the system under the anticipated load to see if it is capable of handling the forecasted number of users
Soak Testing - basically the same as Load Testing but keeping the load for a prolonged duration. This allows you to detect e.g. memory leaks
Stress Testing - determining the boundaries of the application, saturation points, bottlenecks, etc. You start from zero load and gradually increase it until the application breaks, noting the maximum number of users, the correlation of other metrics (Response Time, Throughput, Error Rate, etc.) with the increasing number of users, whether the application recovers when the load gets back to normal, etc.
See the Why ‘Normal’ Load Testing Isn’t Enough article for a detailed description of the above testing types.