Unable to load an IIS-based web service beyond 30% CPU - JMeter

I am load testing an IIS-based web service.
I need to find out the maximum throughput it can support.
Both the server and the load generators are set up in AWS.
The problem is that the throughput of the web service does not go beyond 1500 req/sec even when I increase the users from 500 to 3000; only the response time increases. (PS: I am using AWS machines with 15 GB RAM and 8 cores for load generation.)
The even weirder part is that CPU usage is not at 100%; it is merely 30-40%.
Even memory utilization is not high; it is 20%.
I tried many counters in PerfMon and did not see anything that pointed to a possible bottleneck.
When I use a single machine to generate load it shows ~1500 req/sec throughput. If I add one more load generator, the throughput visibly drops to half on the original machine, still giving me a combined total of ~1500 req/sec.
What am I missing here?
Thanks in advance for your help.

Check the IIS configuration and the thread pool settings in IIS. This is a
well-known issue: if too few threads are available, throughput won't grow
even when CPU and memory are free, because requests queue up waiting.
Also check the Processor Queue Length counter in PerfMon. It could be
an I/O issue if the queue is long throughout the test.
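A rough way to see why response time climbs while throughput stays flat is Little's Law (concurrent requests = throughput × response time). The sketch below assumes a closed system with no think time and a fixed ceiling of 1500 req/sec, as reported in the question. The function name is illustrative only and the numbers are not measurements.

```python
def response_time_at_saturation(users: int, max_throughput_rps: float) -> float:
    """Little's Law for a closed system with no think time.

    users = throughput * response_time, so once throughput is capped,
    response_time = users / max_throughput.
    """
    return users / max_throughput_rps


# Ceiling taken from the question: throughput plateaus at ~1500 req/sec.
for users in (500, 1500, 3000):
    rt = response_time_at_saturation(users, 1500.0)
    print(f"{users} users -> ~{rt:.2f} s average response time")
```

Under these assumptions, adding users past the ceiling only lengthens the queue, which matches the behaviour described in the question.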

Related

Spring Boot High Heap Usage

We have a spring boot application that runs on 20 servers and we have a balancer that redirects the requests to our servers.
Since last week we have been having huge problems with CPU usage (100% on all VMs), almost simultaneously, without any noticeable increase in incoming requests.
Before that, we had 8 VMs without any issue.
In peak hours we have 3-4 thousand users with 15-20k requests per 15 minutes.
I am sure it has something to do with heap usage, since all the CPU usage comes from the GC trying to free up memory.
At the moment, we have isolated some requests that we thought might cause the problem to specific VMs, and confirmed that those VMs are stable even though they receive traffic. (In the picture below you can see that the old-gen memory is stable on those VMs.)
The Heap memory looked something like this
The memory continues to increase and there are two scenarios: either it reaches a point and after 10 minutes drops on its own to 500 MB, or it stays there, causes 100% CPU usage, and never recovers.
From the heap dumps that we have taken, it seems that most of the memory has been allocated in char[] instances.
We have used a variety of tools (VisualVM, JProfiler, etc.) to try to debug the issue, without any luck.
I don't know if I am missing something obvious, or something else.
I also tried changing the GC algorithm from the default to G1 and disabling the Hibernate query plan cache, since a lot of our queries use the IN parameter for filtering.
UPDATE
We managed to reduce the number of requests in our most heavily used API call, and the old gen looks like this now. Is that normal?

What methodology would you use to measure the load capacity of a software server application?

I have a high-performance software server application that is expected to get increased traffic in the next few months.
I was wondering what approach or methodology is good to use in order to gauge if the server still has the capacity to handle this increased load?
I think you're looking for Stress Testing, and the scenario would be something like:
Create a load test simulating current real application usage.
Start with the current number of users and gradually increase the load until
you reach the "increased traffic" amount,
or errors start occurring,
or you start observing performance degradation,
whichever comes first.
Depending on the outcome, you can either state that your server can handle the increased load without any issues, or you will find the saturation point and the first bottleneck.
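The stop conditions above can be sketched as a small check over the results of each load step. This is only an illustration: the `Step` type, the thresholds (1% errors, less than 5% throughput gain) and the sample figures are all assumptions, not values from any real test or tool.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    users: int
    throughput_rps: float
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def find_saturation_point(steps: List[Step],
                          max_error_rate: float = 0.01,
                          min_gain: float = 0.05) -> Optional[Step]:
    """Return the first step where errors appear or throughput stops growing.

    Returns None if every step stayed healthy, i.e. the target load was
    reached without hitting a limit.
    """
    if steps and steps[0].error_rate > max_error_rate:
        return steps[0]
    for prev, cur in zip(steps, steps[1:]):
        if cur.error_rate > max_error_rate:
            return cur
        if prev.throughput_rps > 0:
            gain = (cur.throughput_rps - prev.throughput_rps) / prev.throughput_rps
            if gain < min_gain:
                return cur
    return None


# Hypothetical results of a step-up test (made-up numbers).
samples = [
    Step(100, 200.0, 0.0),
    Step(200, 395.0, 0.0),
    Step(400, 760.0, 0.0),
    Step(800, 770.0, 0.002),
    Step(1600, 765.0, 0.08),
]
print(find_saturation_point(samples))
```

With these made-up samples, throughput barely moves between 400 and 800 users, so the 800-user step is reported as the saturation point.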
You might also want to execute a Soak Test: leave the system under high, prolonged load for several hours or days. This way you can detect memory leaks or other capacity problems.
More information: Why ‘Normal’ Load Testing Isn’t Enough
Test the product with one-tenth the data and traffic. Be sure the activity is 'realistic'.
Then consider what will happen as traffic grows: will the RAM, disk, CPU, network, etc. grow linearly or not?
While you are doing that, look for "hot spots". Optimize them.
Will you be using web pages? Databases? Etc. Each of these things scales differently. (In other words, you have not provided enough details in your question.)
Most canned benchmarks focus on one small aspect of computing; applying the results to a specific application is iffy.
I would start by collecting baseline data on critical resources (typically CPU, memory usage, disk usage, network usage) and track them over time. If any of those resources shows regular spikes where it remains at 100% capacity for more than a fraction of a second under current usage, you have a bottleneck somewhere. In this case, you cannot accept additional load without likely outages.
Next, I'd start figuring out what the bottleneck resource for your application is - it varies between applications, but in most cases it's the bottleneck resource that stops you from scaling further. Your CPU might be almost idle, but you're thrashing the disk I/O, for instance. That's a tricky process - load and stress testing are the way to go.
If you can resolve the bottleneck by buying better hardware, do so - it's much cheaper than rewriting the software. If you can't buy better hardware, look at load balancing. If you can't load balance, you've got to look at application architecture and implementation and see if there are ways to move the bottleneck.
It's quite typical for the bottleneck to move from one resource to the next: you've got the CPU to behave, but now when you increase traffic, you're spiking disk I/O; once you resolve that, you may get another CPU challenge.

Uncaught Exception java.lang.OutOfMemoryError: "unable to create new native thread" error occurring while running JMeter in non-GUI mode

My scenario:
Step 1: I have set my thread group to 1000 threads and 500 seconds.
Step 2: Configured the heap space: HEAP=-Xms1024m -Xmx1024m
Step 3: Ran JMeter in non-GUI mode.
In this scenario, the error "Uncaught Exception java.lang.OutOfMemoryError: unable to create new native thread" occurs on my system.
My system configuration:
Processor: Intel® Pentium(R) CPU G2010 @ 2.80GHz × 2
OS Type: 32 bit
Disk: 252.6 GB
Memory: 3.4 GiB
Kindly give me a solution for this scenario.
Thanks,
Vairamuthu.
You don't have enough memory on your machine to run 1000 threads. The error clearly shows that your machine cannot create 1000 threads. You should tune your machine to resolve this.
You have to consider these points:
JMeter is a Java tool, so it runs on a JVM. To get the most out of it, we need to give JMeter as many resources as possible during execution. First, we need to increase the heap size (in the JMeter bin directory, in jmeter.bat/sh):
HEAP=-Xms512m -Xmx512m
This means the default allocated heap size is 512 MB minimum and 512 MB maximum. Configure it according to your own PC configuration. Keep in mind that the OS also needs some memory, so don't allocate all of your physical RAM.
Then, set the young generation size:
NEW=-XX:NewSize=128m -XX:MaxNewSize=512m
This sets the initial and maximum size of the young generation, the part of the heap where new objects are allocated. Be careful: if your load generation is very high at the beginning, these values might need to be increased. Keep in mind that a range that is too broad can fragment the heap space inside the JVM, in which case the garbage collector has to work harder to clean up.
JMeter is a Java GUI application, and the GUI is very resource-intensive (CPU/RAM). It also has a non-GUI mode: if we run JMeter in non-GUI mode, it consumes fewer resources and we can run more threads.
Disable all Listeners during the test run. They are only for debugging; use them while designing your script.
Enabling Listeners during load tests causes additional overhead and consumes valuable resources (more memory) that are needed by more important elements of your test.
Always try to use up-to-date software. Keep your Java and JMeter updated.
Don’t forget that storing request and response headers, assertion results and response data can consume a lot of memory! So try not to store these values in JMeter unless it’s absolutely necessary.
Also, you need to monitor whether your machine's memory consumption and CPU usage stay below 80%. If they exceed 80%, treat the results of those tests as unreliable.
After all of this, if you still can't generate 1000 threads from your machine, then you should try Distributed Load Testing.
Here is a document for JMeter Distributed Testing Step-by-step.
For a better and more detailed understanding, these two blogs, How many users JMeter can support? and 9 Easy Solutions for a JMeter Load Test “Out of Memory” Failure, should help.
I have also found this article very helpful for understanding these errors and how to handle them.
The error is due to lack of free RAM.
Looking into your hardware, it doesn't seem you will be able to produce the load of 1k users so I would recommend reconsidering your approach.
For example, you anticipate 1000 simultaneous users working with your application. However, that doesn't necessarily mean 1000 concurrent users, because:
real users don't hammer application non-stop, they need some time to "think" between operations, this "think time" differs depending on application nature, but you should keep it as close to reality as possible
application response time should be added to think time
So given you have 1000 users, each of whom "thinks" for 10 seconds between operations, and an application response time of 2 seconds, each user will be able to send 5 requests per minute (60 / (10 + 2)).
Assuming the above scenario, 1000 users will send 5000 requests per minute, which gives us ~83 requests per second, and that seems to be achievable with your current hardware.
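The arithmetic above can be checked with a few lines of Python. This is a minimal sketch using the figures from the answer (1000 users, 10 s think time, 2 s response time); the function name is illustrative only.

```python
def requests_per_second(users: int, think_time_s: float, response_time_s: float) -> float:
    """Aggregate request rate when every user repeats a think-then-request cycle."""
    # One full cycle per user takes think time plus response time.
    per_user_per_minute = 60.0 / (think_time_s + response_time_s)
    return users * per_user_per_minute / 60.0


rps = requests_per_second(users=1000, think_time_s=10.0, response_time_s=2.0)
print(f"~{rps:.1f} requests per second")
```

Plugging in a shorter think time shows how quickly the required request rate grows for the same number of users.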
So if you are not in a position to get more powerful hardware, or more similar machines to use JMeter in distributed mode, your options are:
Add "think times" between operations using e.g. Constant Timer or Uniform Random Timer.
Change your test scenario logic to simulate "requests per second" rather than "concurrent users". You can do this using Constant Throughput Timer or Throughput Shaping Timer.
Your issue is due to using a 32-bit OS. In this mode you are limited both in what you can allocate as heap (depending on the OS, you will not be able to exceed 1.6 to 2.1 GB) and in native thread creation.
I'd suggest switching to a 64-bit OS + 64-bit JDK.
But if you don't have any other option, try setting this in JVM_ARGS in jmeter.sh:
-Xss128k
Or if too low:
-Xss256k
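Since -Xss sets the stack size of each Java thread, the memory reserved for thread stacks grows with the thread count. The sketch below is a rough estimate only: it ignores JVM-internal threads and other native memory, and the 512k row is included purely for comparison, not as a claim about any platform's default.

```python
def total_stack_mib(threads: int, xss_kib: int) -> float:
    """Memory reserved for thread stacks, in MiB, for a given -Xss value in KiB."""
    return threads * xss_kib / 1024.0


HEAP_MIB = 1024  # from the question: -Xms1024m -Xmx1024m
THREADS = 1000   # from the question: 1000 JMeter threads

for xss_kib in (128, 256, 512):
    stacks = total_stack_mib(THREADS, xss_kib)
    print(f"-Xss{xss_kib}k: ~{stacks:.0f} MiB of stacks, "
          f"~{HEAP_MIB + stacks:.0f} MiB including the heap")
```

On a 32-bit process, where the usable address space is limited, the heap and the thread stacks compete for the same room, which is why a smaller -Xss can allow more threads.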

Java 8 Concurrent Hash Map get Performance/Alternative

I have a high-throughput, low-latency application (3000 requests/sec, 100 ms per request), and we heavily use Java 8 ConcurrentHashMap for lookups. Usually these maps are updated by a single background thread, and multiple threads read from them.
I am seeing a performance bottleneck, and on profiling I find that ConcurrentHashMap.get is the hotspot, taking the majority of the time.
In another case, I see ConcurrentHashMap.computeIfAbsent being the hotspot, although the mapping function has very low latency; the profile shows computeIfAbsent spending 90% of the time executing itself and very little time executing the mapping function.
My question: is there any way I could improve the performance? I have around 80 threads concurrently reading from the CHM.
"I have around 80 threads concurrently reading from CHM."
The simplest things to do are:
If you have a CPU-bound process, don't have more active threads than you have CPUs. Otherwise this will only add overhead, and if threads hold a lock while not running because you have too many threads, it will really not help.
Increase the number of partitions. You will want at least 4x as many segments/partitions as you have threads accessing a single map. However, you will get strange behaviour in CHM if you access it with more than 40 threads, due to the way cache coherency works. I suggest using a more efficient data structure for higher degrees of concurrency. In Java 8 the concurrencyLevel is only a hint, but it is better than leaving the default initial size of 16.
Don't spend so much time in CHM. Find a way to do useful work without hitting a shared resource and your threads will run much more efficiently.
If you can see any latencies in a low-latency system, you have a problem, IMHO.

Reaching limits of Apache Storm

We are trying to implement a web application with Apache Storm.
The application receives a huge load of ad requests (100 TPS - a hundred transactions / second), makes some simple calculations on them and then stores the result in a NoSQL database with a maximum latency of 10 ms.
We are using Cassandra as a sink for its write performance.
However, we have already exceeded the 8 ms requirement; we are at 100 ms.
We tried to minimize the size of the buffers (Disruptor buffers) and to balance the topology well, using the parallelism of bolts.
But we are still at 20 ms.
With 4 workers (8 cores / 16 GB) we are at 20k TPS, which is still very low.
Are there any suggestions for optimization, or are we just reaching the limits of Apache Storm (limits of Java)?
I don't know the platform you're using, but in C++ 10ms is eternity. I would think you are using the wrong tools for the job.
Using C++, serving some local query should take under a microsecond.
Non-local queries that touch multiple memory locations and/or have to wait for disk or network I/O have no choice but to take more time. In this case parallelism is your best friend.
You have to find the bottleneck.
Is it I/O?
Is it CPU?
Is it memory bandwidth?
Is it memory access time?
After you've found the bottleneck, you can either improve it, async it and/or multiply (=parallelize) it.
There's a trade-off between low latency and high throughput.
If you really need high throughput, you should rely on batching, either by making the buffers bigger or by using Trident.
Avoiding transmitting tuples to other workers helps with low latency (localOrShuffleGrouping).
Please don't forget to monitor GC, which causes stop-the-world pauses. If you need low latency, these should be minimized.

Resources