Concurrent users : 2000
Ramp up period:10 sec
loop :1
The summary report for above is:
Average :2000
min :4
max :19964
Error:17.50% // this error are Java.net.SocketTimeoutException
Through put:67.2 sec
From this result can I infer that my server cannot handle load of 2000 users ?
Assuming 20 seconds response time and 20% error rate my expectation is that your application under test is overloaded. Personally I would go for the following steps:
Re-run the test using larger ramp-up time, generate Reporting Dashboard after the run and correlate increasing load with the other KPIs like:
what was the maximum throughput (how many requests per second could system server) and what was amount of active users at this time
what was the number of users when error start occurring
what is relationship between amount of users and response time
etc.
The next step would be determining the bottleneck i.e. why application responds slowly or not responds at all. The reasons could be in:
Application simply lacks resources (CPU, RAM, network or disk IO or starts swapping). A good practices is monitoring the system health i.e. with JMeter PerfMon Plugin
Configuration of software (application/web server, database, load balancer, etc.) is not appropriate for high loads. Each of software components needs to be appropriately tuned for high loads
Your application code is not optimal, you can re-run your test with profiler tool telemetry to see what are the largest objects, the slowest functions, etc.
It can be something absolutely external like faulty router, bad network cable, using WiFi instead of LAN, hitting corporate proxy limits, etc.
Related
Below is the graph which I received after the performance test execution.
I am confused about the fluctuated response time graph.
NOTE: 1) Throughput graph is also fluctuating. 2) I did not receive any error during test.
It normally indicates that either application under test or JMeter engine is overloaded hence it cannot handle/produce stable load pattern.
Your response time is around 1.5 minutes which seems little bit high to me so I would suggest that you need to monitor the application under test and check:
whether it has enough headroom to operate in terms of CPU, RAM, Network IO, etc. as it might be the case the application is short on RAM and goes swapping and disk IO is much slower than RAM, it can be checked using i.e. JMeter PerfMon Plugin
whether it is properly configured for high loads as its middleware (database, application server, load balancer, etc. need to be tuned, spike-like response time pattern may stand for intensive GC activity
in any case ensure that JMeter is also properly configured for high load and isn't short on resources as if JMeter isn't able to send/receive requests fast enough you will get false-negative results
Single chart never tells the full story, you need to correlate information from all the possible sources, collect log files, etc.
-
What could be the possible reason for throughput remaining the same though load has increased much compared to previous tests? NOTE: I even received the error "Internal Server Error" while running my performance test.
It means you have reached the Saturation Point - the point for maximum performance!
A certain amount of concurrent users adjoining with maximum CPU utilization and peak throughput. Adding any more concurrent users will lead to degradation of response time and throughput, and will cause peak CPU utilization. Also, it can throw some errors!
After that, If you continue increasing the number of virtual users you may see these:
Response time is increasing.
Some of your requests got failed.
Throughput is either remains the same or decreases - this indicates the performance bottleneck!
Ideal load test in ideal world looks like:
response time remains the same, not matter how many virtual users are hitting the server
throughput increases by the same factor as the load increases, i.e:
100 virtual users - 500 requests/second
200 virtual users - 1000 requests/second
etc.
In reality application under test might scale up to certain extent, but finally you will reach the point when you will be increasing the load but the response time will be going up and throughput will remain the same (or going down)
Looking into HTTP Status Code 500
The HyperText Transfer Protocol (HTTP) 500 Internal Server Error server error response code indicates that the server encountered an unexpected condition that prevented it from fulfilling the request.
Most probably it indicates the fact that the application under test is overloaded, the next step would be to find out the reason which may be in:
Application is not properly configured for the high loads (including all the middleware: application server, database, load balancer, etc.)
Application lacks resources (CPU, RAM, etc.), can be checked using i.e. JMeter PerfMon Plugin
Application uses inefficient functions/algorithms, can be checked using profiling tools
I'm using jmeter to generate a performance test, to keep things short and straight i read the initial data from a json file, i have a single thread group in which after reading the data i randomize certain values to prevent data duplication when i need it, then i'm passing the final data to the endpoint using variables, this will end up in a json body that is recieved by the endpoint and it will basically generate a new transaction in the database. Also i added a constant timer to add a 7 seconds delay between requests, with a test duration of 10 minutes and no ramp up, i calculated the requests per second like this:
1 minute has 60 seconds and i have a delay of 7 seconds per request then it's logical to say that every minute i'm sending approximately 8.5 requests per minute, this is my calculation (60/7) = 8.5 now if the test lasts for 10 minutes then i multiply (8.5*10) = 85 giving me a total of 85 transactions in 10 minutes, so i should be able to see that exact same amount of transactions created in the database after the test completes.
This is true when i'm running 10-20-40 users, after the load test run i query the db and i get the exact same number of transaction however, as i increase the users in the thread group this doesn't happen anymore, for example if i set 1000 users i should be able to generate 8500 transactions in 10 minutes, but this is not the case, the db only creates around 5.1k transactions.
What is happening, what is wrong? Why it initially works as expected and as i increase the users it doesn't? I can provide more information if needed. Please help.
There could be 2 possible reasons for this:
You discovered your application bottleneck. When you add more users the application response time increases therefore throughput decreases. There is a term called saturation point which stands for the maximum performance of the system, if you go beyond this point - the system will respond slower and you will get less TPS than initially. From the application under test side you should take a look into the following areas:
It might be the case your application simply lacks resources (CPU, RAM, Network, etc.), make sure that it has enough headroom to operate using i.e. JMeter PerfMon Plugin
Your application middleware (application server, database, load balancer, etc.) are not properly set up for the high loads. Identify your application infrastructure stack and make sure to follow performance tuning guidelines for each component
It is also possible that your application code needs optimization, you can detect the most time/resource consuming functions, largest objects, slowest DB queries, idle times, etc. using profiling tools
JMeter is not sending requests fast enough
Just like for the application under test check that JMeter machine(s) have enough resources (CPU, RAM, etc.)
Make sure to follow JMeter Best Practices
Consider going for Distributed Testing
Can you please check once CPU and Memory utilization(RAM and java heap utilization) of jmeter load generator while running jemter for 1000 users? If it is higher or reaching to max then it may affect requests/sec. Also just to confirm requests/sec from Jmeter side, can you please add listener in Jmeter script to track Hit/sec or TPS?
This will also be true(8.5K requests in 10 mins test duration) if your API response time is 1 second and also you have provided enough ramp-up time for those 1000 users.
So possible reason is:
You did not provide enough ramp-up time for 1000 users.
Your API average response time is more than 1 second while you performing tests for 1000 users.
Possible workarounds:
First, try to measure the API response time for 1 user.
Then calculate accordingly that how many users you need to reach 8500 requests in 10 mins. Use this formula:
TPS* max response time in second
Give proper ramp-up time for 1000 users. Check this thread to understand how you should calculate ramp-up time.
Check that your load generator is able to generate 1000 users without any memory or health (i.e CPU usage) issues. If requires, try to use distributed architecture.
Initially i have ran a load test with 100 users for 10 minutes and 1000 records got inserted in the database for the below scenarios.
Employee Creation -- Test script design took 1 minute
Employee Update -- Test script design took 2 minutes
And then I ran the same load test with 200 users for 10 minutes and 1100 records got inserted without any error logs or deadlocks.
My question is when we increase/double the thread group count from 100 to 200, Records insertion also should be double or approximately double. then why is it not happening? Same case with the number requests/samples.
You reached a maximum in your test throughput at about 110 records per min. In other words, you have a bottleneck on client or server, which doesn't allow 200 users to process request concurrently and/or within the same amount of time (either some users wait until they can start processing a request, or each request takes longer, so total number of requests is lower).
Some bottlenecks can be resolved by you (if they are related to script, JMeter configuration or JMeter machine), others have to be resolved on server side (by whoever has access to it), and some cannot be resolved at all (they are true bottlenecks of your app).
Without knowing your application, it's hard to suggest anything beyond general "checklist" items:
Verify JMeter script and check if it has any places where it may wait, take a long time, and so on. For example if your ramp-up period is too high, it may be that "first" user will finish execution, before "last" user even started it. Scriptable samplers, pre- and post-processors may cause delays as well.
Make sure JMeter is configured properly to handle 200 concurrent threads. For example if JMeter heap is set too low, it could be that JMeter is very slow, as it constantly needs to run GC. See this question for how to look at and configure memory (it discusses out of memory error, but even without that error inadequate memory can cause slowness)
Make sure JMeter machine is configured correctly to allow creation of 200+ HTTP connections concurrently. A common issue on both Windows and Linux machine is that people assume that they can have 65535 connections (as maximal number of ports), but in reality, both Windows and Linux limit number of ports they allow by default to be used. Also after the use port may remain in TIME_WAIT or CLOSE_WAIT state for several minutes, which makes it unusable. As a result, running out of ports is quite common. Here's how to monitor and resolve this issue on Windows and Linux.
Check JMeter machine performance as a whole: does it have enough CPU, memory; is it swapping memory, etc.
If none of the above is a problem, you need to look at how requests arrive to the server. If client is capable of sending 200 concurrent requests (which you should have established in previous steps), but server receives them at slower rate, then maybe something in the network slows things down. For example something like slow DNS resolution or slow routing between JMeter and server can cause issues.
Also Item #3 on the client is also applicable to the server.
If requests do arrive to the server at the same speed as they are sent from the client, then probably their processing by the server slows down as number of parallel requests goes up. This is where you are on dev and devOP territory, and probably need to work with them to identify bottlenecks on server side. It could be configuration of the web or application server, application itself, ... anything on app way pretty much.
Performance testing is 10% execution, and 90% analysis and identification of bottlenecks, so here you go.
I'm trying to stress-test my Spring RESTful Web Service.
I run my Tomcat server on a Intel Core 2 Duo notebook, 4 GB of RAM. I know it's not a real server machine, but i've only this and it's only for study purpose.
For the test, I run JMeter on a remote machine and communication is through a private WLAN with a central wireless router. I prefer to test this from wireless connection because it would be accessed from mobile clients. With JMeter i run a group of 50 threads, starting one thread per second, then after 50 seconds all threads are running. Each thread sends repeatedly an HTTP request to the server, containing a small JSON object to be processed, and sleeping on each iteration for an amount of time equals to the sum of a 100 milliseconds constant delay and a random value of gaussian distribution with standard deviation of 100 milliseconds. I use some JMeter plugins for graphs.
Here are the results:
I can't figure out why mi hits per seconds doesn't pass the 100 threshold (in the graph they are multiplied per 10), beacuse with this configuration it should have been higher than this value (50 thread sending at least three times would generate 150 hit/sec). I don't get any error message from server, and all seems to work well. I've tried even more and more configurations, but i can't get more than 100 hit/sec.
Why?
[EDIT] Many time I notice a substantial performance degradation from some point on without any visible cause: no error response messages on client, only ok http response messages, and all seems to work well on the server too, but looking at the reports:
As you can notice, something happens between 01:54 and 02:14: hits per sec decreases, and response time increase, okay it could be a server overload, but what about the cpu decreasing? This is not compatible with the congestion hypothesis.
I want to notice that you've chosen very well which rows to display on Composite Graph. It's enough to make some conclusions:
Make note that Hits Per Second perfectly correlates with CPU usage. This means you have "CPU-bound" system, and the maximum performance is mostly limited by CPU. This is very important to remember: server resources spent by Hits, not active users. You may disable your sleep timers at all and still will receive the same 80-90 Hits/s.
The maximum level of CPU is somewhere at 80%, so I assume you run Windows OS (Win7?) on your machine. I used to see that it's impossible to achieve 100% CPU utilization on Windows machine, it just does not allow to spend the last 20%. And if you achieved the maximum, then you see your installation's capacity limit. It just has not enough CPU resources to serve more requests. To fight this bottleneck you should either give more CPU (use another server with higher level CPU hardware), or configure OS to let you use up to 100% (I don't know if it is applicable), or optimize your system (code, OS settings) to spend less CPU to serve single request.
For the second graph I'd suppose something is downloaded via the router, or something happens on JMeter machine. "Something happens" means some task is running. This may be your friend who just wanted to do some "grep error.log", or some scheduled task is running. To pin this down you should look at the router resources and jmeter machine resources at the degradation situation. There must be a process that swallows CPU/DISK/Network.