What is an acceptable error rate in load testing? - JMeter

I have done a load test for a web site with JMeter. I simulated 500 online active users and got a 0.7% error rate in the Aggregate Report. The only errors I'm getting are caused by exceeded latency.
Is that acceptable for this number of users? And what is an acceptable error rate in the Aggregate Report?

The acceptable error rate depends on the SLAs provided by your customer, developers, and stakeholders (the people who manage the website, in your case).
As a general practice, 0.5-1.0% is acceptable for typical applications/websites, but for a mission-critical or otherwise very important application the error rate becomes a high priority (possibly 0%, and your stakeholders will usually put that at the top of the list of things to achieve).
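If the agreed SLA is a simple percentage of failed samples, it can be checked directly from the JMeter results file rather than read off the Aggregate Report. Below is a minimal sketch, assuming CSV (JTL) output with the default header that includes a success column; results.jtl and the 1% threshold are example values, not figures from the question.

```python
# Minimal sketch: compute the overall error rate from a JMeter CSV results file (JTL)
# and compare it against an SLA threshold. Assumes the default CSV header with a
# "success" column; results.jtl and the 1% threshold are example values.
import csv

SLA_ERROR_RATE = 1.0  # percent, as agreed with the stakeholders (example value)

total = 0
errors = 0
with open("results.jtl", newline="") as f:
    for row in csv.DictReader(f):
        total += 1
        if row["success"].lower() != "true":
            errors += 1

error_rate = 100.0 * errors / total
print(f"{errors}/{total} failed samples = {error_rate:.2f}% error rate")
print("PASS" if error_rate <= SLA_ERROR_RATE else "FAIL against the SLA")
```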

One possible reason for these errors could be insufficient memory on your load-generator machine. Maybe your JMeter script isn't optimized, or it contains listeners or a lot of assertions. If so, remove the listeners and minimize the assertions as much as you can. Follow these performance and tuning tips to decrease memory usage.

Related

How to tell if I'm hitting the limits of Jmeter & my hardware?

I have seen some questions about the limits of jmeter, like "What is the highest number of threads?" and "What are the physical limits of jmeter?". As some answers indicate, there's no specific limit to jmeter, but rather to jmeter configurations used on specific hardware setups. However, folks do indicate there's a limit & give tips on how to optimize.
My question is more basic - "how can I tell if I'm hitting the limits of my client (Jmeter + hardware)?"
I'm not talking about OOM errors (like described in this blog post), which are pretty obvious, but rather if jmeter is lagging. In the aggregate report, I can see throughput, and I could also count number of responses received in csv output & divide by time. Should I just check if that's equal to my desired QPS? Achieving a desired QPS in jmeter generally seems trickier than just blasting the server with users though, and the math from number of users -> QPS seems a bit tricky.
Finally, how can I tell if it's my server lagging or jmeter lagging? I'm wondering if I can test with some simple static webpage first to confirm jmeter's behavior, and then test my actual server. Any recommendations for a simple static page that can take a high amount of QPS?
Apologies if that's too many questions, but feel free to ask for more details or only answer the primary "how to tell if I'm hitting limits" question.
JMeter doesn't have many "limits", or at least they are too high to worry about: you can kick off as many as 2,147,483,647 threads, provided the underlying hardware/OS allows it and JMeter is properly configured.
The easiest check is switching to JMeter's Distributed Mode of execution: if you're "hitting the limits of JMeter", then adding another JMeter instance as an extra load generator should make the throughput go up, given that the server is capable of handling the load.
Another option is, first of all, making sure that you're following JMeter Best Practices and setting up monitoring of baseline resource usage (CPU, RAM, network, disk, etc.) on the machine where JMeter is running. If any monitored metric exceeds, say, 90% of the maximum available capacity, you're "hitting the limits" of the machine where JMeter is running.
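As a rough illustration of that kind of monitoring, here is a minimal sketch that polls the load generator's own resources while a test is running. It assumes the psutil package is installed and reuses the 90% rule of thumb from above; the JMeter PerfMon plugin or any OS-level monitor would give you the same information.

```python
# Minimal sketch: poll the load generator's CPU, RAM and disk usage during a test run
# and warn when any of them crosses the 90% rule-of-thumb threshold mentioned above.
# Assumes the psutil package is installed (pip install psutil).
import time
import psutil

THRESHOLD = 90.0  # percent of maximum available capacity

def check_saturation():
    readings = {
        "CPU": psutil.cpu_percent(interval=1),   # CPU usage over the last second
        "RAM": psutil.virtual_memory().percent,  # memory in use
        "Disk": psutil.disk_usage("/").percent,  # space used on the root volume
    }
    for name, value in readings.items():
        if value > THRESHOLD:
            print(f"WARNING: {name} at {value:.1f}% - the JMeter machine may be the bottleneck")

if __name__ == "__main__":
    while True:          # run alongside the test; stop with Ctrl+C
        check_saturation()
        time.sleep(5)
```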
I was able to confirm my setup worked by checking that the Aggregate Report's throughput metric reached my desired QPS. Initially, when I did not reach the desired throughput while testing against my server, I could not tell whether the problem was my server or my load-testing setup.
To confirm the load-testing setup itself worked, I pointed the load test at a very simple 'hello world' service with an excess of resources. There, the desired throughput was met.
For reference on the actual setup, I ran JMeter on a 5.2xlarge EC2 instance with 8 vCPUs, up to 10 Gbps network bandwidth, and 16 GiB of RAM, and reached 1K QPS. I have yet to see how much further I can push this particular setup.
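For reference, the achieved throughput can also be computed straight from the results file, which is essentially the "count the responses in the CSV output and divide by time" approach from the question. A minimal sketch, assuming the default CSV-format JTL with a timeStamp column in epoch milliseconds; results.jtl and the 1000 QPS target are example values.

```python
# Minimal sketch: check whether the achieved throughput in a JTL file reached the
# desired QPS. Assumes the default CSV header with timeStamp in epoch milliseconds;
# results.jtl and TARGET_QPS are example values.
import csv

TARGET_QPS = 1000.0

timestamps = []
with open("results.jtl", newline="") as f:
    for row in csv.DictReader(f):
        timestamps.append(int(row["timeStamp"]))

duration_s = (max(timestamps) - min(timestamps)) / 1000.0
achieved = len(timestamps) / duration_s
print(f"achieved {achieved:.1f} req/s against a target of {TARGET_QPS:.0f} req/s")
print("load generator kept up" if achieved >= TARGET_QPS else "investigate client or server limits")
```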

JMeter: Response time decreased AND throughput also decreased

I am running my jmeter script for almost a week and observed an interesting thing today. Below is the scenario:
Overview: I am gradually increasing the load on the application. In my last test I put a load of 100 users on the app, and today I increased the load to 150 users.
Result of 150 users test:
Response time of the requests decreased compared to the last test. (Which is a good sign)
Throughput decreased drastically to half of what I got in the previous test with less load.
Received 225 errors while executing the test.
My questions are:
What could be the possible reason for such strange behavior of throughput? Why did throughput decrease instead of increasing with the increasing load?
Did I get a good response time because many of my requests failed?
NOTE: Up to the 100-user test, throughput was increasing with the increasing user load.
Can anyone please help me with these questions? I am a newbie in performance testing. Thanks in advance!
Also, I would like to ask if anyone can suggest good articles/sites etc. on finding performance bottlenecks and learning the crucial aspects of performance.
Most probably those 225 failed requests returned a failure immediately, therefore the average response time decreased. That's why you should be looking at e.g. the Response Times Over Time chart and paying more attention to percentiles, as the mean response time can mask the real problem.
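To see how a handful of fast failures can drag the mean down while the higher percentiles still expose slow successful requests, here is a minimal sketch with made-up numbers (they are not taken from the test described above):

```python
# Minimal sketch: a few immediate failures lower the mean response time even though
# the successful requests are slow; the 95th percentile still shows the real picture.
# All numbers are invented for illustration.
import statistics

ok_times = [1200, 1350, 1500, 1800, 2100]   # ms, successful but slow requests
failed_times = [40, 45, 50]                 # ms, requests that failed immediately

all_times = ok_times + failed_times
mean_ms = statistics.mean(all_times)
p95_ms = statistics.quantiles(all_times, n=20)[-1]  # approximate 95th percentile

print(f"mean: {mean_ms:.0f} ms")            # pulled down by the fast failures
print(f"95th percentile: {p95_ms:.0f} ms")  # still reflects the slow requests
```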
With regards to bottleneck discovery, make sure to collect as much information from the server side as you can, e.g.:
CPU, RAM, Network, Disk usage from JMeter PerfMon Plugin
Slow queries log from the database
"heaviest" functions and largest objects from the profiling tool for your application

How can we determine how much web requests per second a machine can handle without load testing?

What are some of the things that determine how many web requests a single machine can handle? In general what's an average number (requests per second) that most machines should be able to handle? For example, I see some answers that say 2k requests/s can easily be handled. How about 5k? 10k? etc.
I'm basically trying to do my best at estimating how many machines I'd need to scale to some high throughput, before I dive into load testing.
Yes, that is possible through performance modelling, but the answers will have a 5-10% error margin.
If you know the exact size of a web request, you can work out your network limit, which gives you the maximum possible request acceptance rate. With some exploratory tests on a sample workload you can roughly measure the CPU time required by each request (in terms of response time or throughput). You can then extrapolate the results to higher request counts using theorems such as Little's Law, which lets you find the maximum number of users (requests, here) that can be supported on given hardware for a given acceptable response time (see the sketch below).
But all of this should be done after tuning your application to the expected level, otherwise you will end up buying a lot of hardware to compensate for the lack of tuning.
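As a small illustration of the Little's Law extrapolation mentioned above (N = X * R, i.e. concurrent users = throughput * response time), here is a sketch with invented numbers; the measured throughput and acceptable response time are assumptions for the example, not real measurements.

```python
# Minimal sketch: Little's Law, N = X * R
#   N = concurrent users (requests in the system)
#   X = throughput in requests per second
#   R = response time in seconds
# The numbers below are example values from a hypothetical exploratory test.

measured_throughput = 200.0     # req/s the hardware sustained in the sample test
acceptable_response_time = 0.5  # s, the response time the stakeholders accept

# Maximum concurrent users the hardware can support while staying within the
# acceptable response time, assuming throughput holds at the measured level.
max_users = measured_throughput * acceptable_response_time
print(f"estimated supported concurrent users: {max_users:.0f}")

# The same law in reverse: throughput needed to serve a target number of users
# at the same acceptable response time.
target_users = 500
required_throughput = target_users / acceptable_response_time
print(f"throughput needed for {target_users} users: {required_throughput:.0f} req/s")
```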

Load test error rate failure

I am currently doing a load test to simulate concurrency on Apache on an internal network. Below is the response time that I am getting for 10/50/100/200/500/1000 people. My first question is: how do I deduce whether this load is too much or too little? And further:
Attached below is the error rate
a) It seems to me that when the error rate hits 100%, the response time fluctuates between 30-40 ms, even across other tests.
b) And when the error rate is higher for Apache, the response time seems to be faster.
Could someone shed some light on why this is so: for a), why the response time fluctuates at 30-40 ms when the error rate hits 100%, and for b), why the response time decreases when the error rate increases.
Thanks for taking your time in this matter.
I can't help with (a). However, (b) is fairly common in load testing - particularly for apps where the errors happen early in the request service cycle and generating an error message requires much less work than generating a correct response to the request. BUT, I don't see evidence of that in your results: the response time is increasing as the user load increases, as does the error rate.
Have you increased the connection limits in Apache? It's pretty well known that, by default, Apache is not configured to handle a large number of simultaneous connections. IIRC, Nginx is. At the load levels your results indicate, this could be impacting your results. Also, is your testing tool using persistent connections?
One rule of thumb to tell whether the load is too much is to check whether there are errors or the response time is too long. In your case, you have a non-trivial amount of errors at 50 people (VUsers), which means something is wrong. You may want to investigate before increasing the number of people/VUsers.
@CMerrill's answer for b) sounds right. When there is an error, the processing load on the server may be lower (in most cases). For a), the fluctuation of response time between 30 ms and 40 ms sounds normal. The key issue is to investigate the errors.

Recommendations for Web application performance benchmarks

I'm about to start testing an intranet web application. Specifically, I have to determine the application's performance.
Please could someone suggest formal/informal standards for how I can judge the application's performance.
Use some tool for stress and load testing. If you're using Java, take a look at JMeter. It provides different methods to test your application's performance. You should focus on:
Response time: how fast your application responds to normal requests. Test some read/write use cases.
Load test: how your application behaves during high-traffic periods. The tool will submit many requests (you can configure this properly) over a period of time.
Stress test: can your application operate over a long period of time? This test will push your application to its limits.
Start with these; if you're interested, there are other kinds of tests.
"Specifically, I have to determine the application's performance...."
This comes full circle to the issue of requirements: the captured expectations of your user community for what is considered reasonable and effective. Requirements have a number of components:
General response time: "Under a load of ..., the site shall have a general response time of less than x, y% of the time..."
Specific response times: "Under a load of ..., credit card processing shall take less than z seconds, a% of the time..."
System capacity items: "Under a load of ..., CPU|network|RAM|disk shall not exceed n% of capacity..."
The load profile: the mix of users and transactions under which the specific, objective measures are collected to determine system performance.
You will notice that the response times and other measures are not absolutes. Taking a page from six sigma manufacturing principles, the cost to move from 1 exception in a million to 1 exception in a billion is extraordinary, and the cost to move to zero exceptions is usually not bearable by the average organization. What is considered acceptable response time for an application unique to your organization will likely be entirely different from a highly commoditized, public internet-facing offering. For highly competitive solutions, response time expectations on the internet are trending towards the 2-3 second range, where user abandonment picks up severely. This has dropped over the past decade from 8 seconds, to 4 seconds, and now into the 2-3 second range. Some applications, like Facebook, shoot for almost imperceptible response times in the sub-one-second range for competitive reasons. If you are looking for a hard standard, it just doesn't exist.
Something that will help your understanding is to read through a couple of industry benchmarks for style, form, function.
TPC-C Database Benchmark Document
SpecWeb2009 Benchmark Design Document
Setting up a solid set of performance tests which represents your needs is a non-trivial matter. You may want to bring in a specialist to handle this phase of your QA efforts.
On your tool selection, make sure you get one that:
Can exercise your interface
Can report against your requirements
You or your team have the skills to use
You can get training on, and will attend that training with management's blessing
Misfire on any of the four elements above and you might as well have purchased the most expensive tool on the market and hired the most expensive firm to deploy it.
Good luck!
To test the front end, YSlow is great for getting statistics on how long your pages take to load from a user perspective. It breaks the load down into stats for each specific HTTP request, the time it took, etc. Get it at http://developer.yahoo.com/yslow/
Firebug, of course, is also essential. You can profile your JS explicitly or in real time by hitting the profile button, making optimisations where necessary and seeing how long all your functions take to run. This changed the way I measure the performance of my JS code. http://getfirebug.com/js.html
Really the big thing I would think is response time, but other indicators I would look at are processor and memory usage vs. the number of concurrent users/processes. I would also check to see that everything is performing as expected under normal and then peak load. You might encounter scenarios where higher load causes application errors due to various requests stepping on each other.
If you really want to get detailed information you'll want to run different types of load/stress tests. You'll probably want to look at a step load test (a gradual increase of users on system over time) and a spike test (a significant number of users all accessing at the same time where almost no one was accessing it before). I would also run tests against the server right after it's been rebooted to see how that affects the system.
You'll also probably want to look at a concept called HEAT (Hostile Environment Application Testing). Really this shows what happens when some part of the system goes offline. Does the system degrade gracefully? This should be a key standard.
My one really big piece of advice is to establish what the system is supposed to do before doing the testing. The main reason is accountability. Get people to agree that the system is supposed to do something, and then test to see if that holds true. This is key because people will immediately see the results, and that will be the base benchmark for what is acceptable.
