Web Server Performance Degradation - spring-boot

The web application is built on Spring Boot and deployed on WebLogic.
We have configured 400 max threads and a JDBC pool of 100 connections.
When we load test the application, performance is good at low load (response times are under 200 ms for most of the HTTP requests we call).
When we increase the load, the thread count and JDBC connection count rise gradually but stay nowhere near their maximums. However, response times get much longer and can exceed 5 seconds.
CPU usage, thread count, memory and JDBC connections all look normal during this period.
Another observation: while the test was running and performance was degrading, we used another machine to make an HTTP call to an endpoint that only returns text, with no DB access or business logic, and even this simple call took 10 s to respond. (And the server resources were still not maxed out!)
So we are wondering: what is keeping the requests waiting?
Are there any other possible bottlenecks?

If the server isn't short of resources like CPU/RAM/etc., only a profiler can tell you where your application spends most of its time, which might be:
Waiting in a queue for the next thread/DB connection from the pool to become available
Slow database queries
Inefficient functions/algorithms which are subject to optimization
WebLogic configuration not suitable for high loads
JVM configuration not suitable for high loads (e.g. the system is doing garbage collection too often or for too long)
So I would recommend re-running your test with profiler telemetry enabled and, at the same time, monitoring essential JVM metrics using e.g. the JMXMon Sample Collector, which can be used for monitoring your application-specific metrics as well. It's a plugin which can be installed via the JMeter Plugins Manager.
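As a quick sanity check on the GC point before wiring up any plugin, something like the hedged sketch below can be run inside the application (or adapted to a remote JMX connection): it polls the same data the JMX samplers read, using only the standard java.lang.management API. The 5-second interval and MB formatting are arbitrary choices, not anything taken from the question.

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    public class GcMonitor {
        public static void main(String[] args) throws InterruptedException {
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            while (true) {
                long totalCount = 0, totalTimeMs = 0;
                // Sum counts/times across all collectors (young + old generation)
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    totalCount += gc.getCollectionCount();
                    totalTimeMs += gc.getCollectionTime();
                }
                MemoryUsage heap = memory.getHeapMemoryUsage();
                System.out.printf("GC count=%d, GC time=%d ms, heap used=%d/%d MB%n",
                        totalCount, totalTimeMs,
                        heap.getUsed() / (1024 * 1024), heap.getMax() / (1024 * 1024));
                Thread.sleep(5_000); // arbitrary 5-second sampling interval
            }
        }
    }

If GC time grows sharply as the load increases while heap usage stays near the maximum, the JVM configuration item above is the first thing to tune.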

For a detailed approach on how to go about identifying poor thread performance, I suggest you take a look at the TSA Method by Brendan Gregg.
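As a rough illustration of that idea (my own minimal sketch, not Brendan Gregg's tooling), the snippet below buckets the threads of a single JVM by state using the standard ThreadMXBean. In a real investigation you would take repeated jstack dumps of the server or connect over remote JMX, but the state distribution is what matters: many BLOCKED or WAITING threads under load point at lock contention or pool waits rather than CPU work.

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import java.util.EnumMap;
    import java.util.Map;

    public class ThreadStateSnapshot {
        public static void main(String[] args) {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
            // dumpAllThreads(false, false) skips monitor/synchronizer details - states are enough here
            for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
            counts.forEach((state, n) -> System.out.printf("%-13s %d%n", state, n));
        }
    }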

Related

Detect scaling problem from performance test result

I conducted performance testing on an e-commerce website and I have the test results with some metrics. I already found problems with some components, for example high response times and errors on checkout and post-login. But I would also like to find the issues that are limiting the application's ability to scale. I only tested the application server, and I observed that CPU and I/O rates were very stable as well, yet the application still gives high response times. Is there any other way I can determine from the test results why it is not scaling well? Thanks!
From the JMeter test result alone - unlikely. JMeter just sends requests, waits for the responses and measures the time in between, plus collects some extra metrics like connect time and latency; see the JMeter Glossary for the full list with explanations.
The integrated system runs at the speed of its slowest component; possible reasons include:
Network issues (e.g. lack of bandwidth, a faulty router, long DNS resolution times, etc.)
Your application is not properly configured for high loads. Inspect the current setup of the application in terms of thread pools, maximum number of open connections, any limitations on resource usage, etc., and look for documentation on performance tuning of the individual middleware components as well (a small pool-saturation sketch follows this list)
Repeat your test run with profiler telemetry enabled, or look at the APM tool output for the test time frame if such a tool is in place; it will let you take a deep dive into what's going on under the hood of a given function call, as it might be an inefficient algorithm or a slow database query
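On the configuration point, the kind of evidence to look for is a pool whose workers are all busy while a queue builds up behind them. The sketch below is only an illustration against a hypothetical java.util.concurrent pool (the sizes are made up); a real application server exposes the equivalent numbers for its own pools, typically via JMX or its admin console.

    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class PoolSaturationCheck {

        // Hypothetical pool sized like a small worker pool; real code would obtain
        // the executor from the server/framework instead of creating its own.
        private static final ThreadPoolExecutor POOL = new ThreadPoolExecutor(
                10, 50, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(200));

        public static void logPoolState() {
            // active == maximumPoolSize together with a growing queue means requests
            // are waiting for a worker, i.e. the pool (not the CPU) is the bottleneck.
            System.out.printf("active=%d/%d, queued=%d, completed=%d%n",
                    POOL.getActiveCount(), POOL.getMaximumPoolSize(),
                    POOL.getQueue().size(), POOL.getCompletedTaskCount());
        }
    }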

503 Error while Running JMeter with 400 Threads - Is it Because of Server Issues?

I am getting a 503 error while running JMeter with 400 thread users - is it because of server issues? When I run the thread group with 100 users and a ramp-up period of 25 seconds it works fine, but with 400 users it gives a 503 error.
Given that you don't experience any issues with 100 users but do with 400 users, it is most probably a server issue connected with overload, so congratulations on finding the bottleneck.
You can either report it as is or dig a little deeper to find the cause. Suggested steps:
Instead of kicking off 400 users at once, try increasing the load gradually while looking at the Response Times vs Threads and Transaction Throughput vs Threads charts. Ideally response time should remain the same and throughput should grow as the number of threads increases. When response time starts increasing and throughput starts decreasing, that indicates the saturation point, and at this stage you can state that this is the maximum number of users your application can support (a minimal stepped-load sketch in plain Java follows this list)
Check your application logs and configuration as it might not be properly tuned for high loads; you can use 15 Simple ASP.NET Performance Tuning Tips as a reference or look for a similar guide for your application technology stack
Ensure that your application has enough headroom to operate in terms of CPU, RAM, network, etc., as it might simply be a lack of resources; this can be checked using e.g. the JMeter PerfMon Plugin
Repeat your test with profiler telemetry in place; this way you will be able to localize the problem and state where the problematic piece of code or inefficient algorithm lives
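For the first step the natural tool is JMeter itself (a longer ramp-up or a stepping/concurrency thread group plugin), but the plain-Java sketch below shows the idea in a self-contained way: it raises the number of concurrent workers in stages and prints the average response time and error count per stage so you can see where the times start to climb. The URL, step sizes and request counts are placeholders, not values from the question.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.atomic.LongAdder;

    public class SteppedLoad {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(
                            URI.create("http://localhost:8080/your-endpoint")) // placeholder URL
                    .GET().build();

            for (int users : new int[]{50, 100, 200, 400}) { // arbitrary load steps
                ExecutorService pool = Executors.newFixedThreadPool(users);
                LongAdder totalMs = new LongAdder();
                LongAdder errors = new LongAdder();
                int requests = users * 10; // 10 requests per simulated user
                List<Future<?>> futures = new ArrayList<>();
                for (int i = 0; i < requests; i++) {
                    futures.add(pool.submit(() -> {
                        long start = System.nanoTime();
                        try {
                            HttpResponse<Void> r =
                                    client.send(request, HttpResponse.BodyHandlers.discarding());
                            if (r.statusCode() >= 500) errors.increment(); // 503s land here
                        } catch (Exception e) {
                            errors.increment();
                        }
                        totalMs.add((System.nanoTime() - start) / 1_000_000);
                    }));
                }
                for (Future<?> f : futures) f.get();
                pool.shutdown();
                System.out.printf("users=%d  avg=%d ms  errors=%d%n",
                        users, totalMs.sum() / requests, errors.sum());
            }
        }
    }

If the average stays flat at 50 and 100 workers and jumps (or the error count explodes) at 200 or 400, you have bracketed the saturation point described above.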
If the server isn't down or being restarted, then yes, a 503 indicates overload.
Common causes are a server that is down for maintenance or one that is overloaded.
You need to find out what stops the server from serving 400 concurrent requests/users.
Note that if you are testing in a test environment which isn't equal or similar to the production environment, it may not reflect the load that the production server can endure.

Performance testing bottleneck in a microservice

I am using JMeter to load test my API server (running on Tomcat), which in turn calls a microservice using Thrift (20k requests/min).
I am using New Relic for monitoring. I have observed that an abnormally high amount of time is spent when the API calls the microservice (ranging from 10-15 seconds). So I observed the microservice over the same duration; its response time was almost negligible (10-12 milliseconds).
So I suspect the API is queueing up the responses because it cannot keep up with the rate at which it receives responses from the microservice. To address this I doubled the Xmx and Xms values of my API's Java application.
I am still observing the same behaviour; what could be the bottleneck I am missing?
Make sure that your API running on Tomcat has enough headroom in terms of CPU, RAM, network, disk, etc., as a shortage might be slowing things down. You can use the JMeter PerfMon Plugin for this
Make sure that Tomcat itself is configured for high loads, as the threads might be queuing up on the Tomcat HTTP Connector, i.e. if the executor has fewer threads than the number of connections you establish, requests will queue up before they even reach your API (see the connector sketch after this list)
Re-run your test with profiler telemetry, e.g. set up JProfiler or YourKit monitoring; this way you will learn where your API spends most of its time and what the underlying reason is
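To make the connector point concrete (this is not the asker's actual configuration, which isn't shown): in a standalone Tomcat these knobs are attributes of the Connector element in server.xml; the embedded-Tomcat sketch below just shows the same settings programmatically and how they relate to each other. The numbers are illustrative only.

    import org.apache.catalina.connector.Connector;
    import org.apache.catalina.startup.Tomcat;
    import org.apache.coyote.http11.Http11NioProtocol;

    public class TunedTomcat {
        public static void main(String[] args) throws Exception {
            Tomcat tomcat = new Tomcat();
            tomcat.setPort(8080);

            // getConnector() creates the default HTTP/1.1 NIO connector on the port above
            Connector connector = tomcat.getConnector();
            Http11NioProtocol protocol = (Http11NioProtocol) connector.getProtocolHandler();

            // Illustrative values - size them from your own measurements.
            protocol.setMaxThreads(400);       // workers actually executing requests
            protocol.setMaxConnections(8192);  // sockets Tomcat keeps open concurrently
            protocol.setAcceptCount(100);      // OS backlog used once maxConnections is reached
            protocol.setConnectionTimeout(20_000);

            tomcat.start();
            tomcat.getServer().await();
        }
    }

If maxThreads is lower than the number of requests in flight, the surplus requests wait in the connection/accept queue before any application code runs, which can make a call look slow from the caller's side even though the service's own processing time is only milliseconds.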

The average response time differs greatly while load testing the same API endpoint with the same load using JMeter

I am using JMeter to test the performance of an API endpoint.
The number of threads (users) applied is 100.
When I execute the test for the very first time, the average response time is 35345 ms.
For all following tests with the same number of threads on the same API endpoint, the average response time is around 4705 ms.
What is the reason for such a big difference in average response time?
Does JMeter cache any files on the first test and use the same cached files on all following tests?
If yes, how do I avoid this?
I am new to JMeter; any help in this regard will be much appreciated.
JMeter doesn't cache anything when it comes to API testing. It has an HTTP Cache Manager which can represent the browser cache when handling embedded resources like images, scripts and styles in web testing (mimicking the HTTP requests sent by real browsers), but that is not your case.
So my expectation is that it is something on the side of your application under test, i.e. it needs to "warm up" its own caches, and the first few requests take longer due to lazy initialization of components on first access (see the warm-up sketch after the list below).
So my recommendations are:
Make sure you use a reasonable ramp-up period so the load increases gradually; this way you will be able to correlate the main metrics like response time, hits per second, etc. with the increasing load
Monitor the health of your application under test from the OS perspective, i.e. CPU, RAM, swap consumption, etc. You can use the JMeter PerfMon Plugin for that
Use profiling tools which are relevant for your application to detect "heavy" methods, long-running DB queries, etc.
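If warm-up does turn out to be the cause, a common workaround is to fire a small batch of throwaway requests before the measured part of the test, either from a JMeter setUp Thread Group or from a tiny standalone client like the hedged sketch below. The endpoint URL and the request count of 50 are placeholders, not values taken from the question.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class WarmUp {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(
                            URI.create("http://localhost:8080/api/endpoint")) // placeholder endpoint
                    .GET().build();

            // Fire throwaway requests so lazily initialised components, JIT-compiled
            // hot paths and application caches are populated before the measured run.
            for (int i = 0; i < 50; i++) {
                HttpResponse<Void> response =
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                System.out.printf("warm-up %d -> HTTP %d%n", i + 1, response.statusCode());
            }
        }
    }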

MaxConcurrentRequests in a self-hosted application

I have a self-hosted SignalR application. Everything is OK, but when there are more than 5000 users, users start reconnecting rapidly. I know that the default value of appConcurrentRequestLimit is 5000, so I ran this:
cd %windir%\system32\inetsrv
appcmd.exe set config /section:system.webserver/serverRuntime /appConcurrentRequestLimit:100000
but nothing changed. I also increased maxConcurrentRequestsPerCPU and requestQueueLimit according to this,
but I still have the problem.
I'm using Windows Server 2012 and IIS 8.
You are shooting in the dark here; you have no data about the actual performance and what's happening. The users could be reconnecting for different reasons (server timeouts, regular interval reconnects, server errors). There are countless possibilities.
The correct way to know what's happening and measure performance is to run a baseline performance load test using the default configuration and collect the relevant performance counters, like current requests, queued requests, current connections, maximum connections, etc.
You should also collect any relevant Error logs on the server that could help you figure out what's happening.
You can find the full list of performance counters you need below:
Memory
.NET CLR Memory\# Bytes in all Heaps (for w3wp)
ASP.NET
ASP.NET\Requests Current
ASP.NET\Requests Queued
ASP.NET\Requests Rejected
CPU
Processor Information\% Processor Time
TCP/IP
TCPv6\Connections Established
TCPv4\Connections Established
Web Service
Web Service\Current Connections
Web Service\Maximum Connections
Threading
.NET CLR LocksAndThreads\# of current logical Threads
.NET CLR LocksAndThreads\# of current physical Threads
Once you have your baseline performance results on a graph, you can modify the configuration (e.g. change the number of concurrent requests like you tried above), re-run your test, and collect the same performance counters again.
The performance counter results will speak for themselves, and they will lead you to a solution.
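As a practical aside (not part of the original answer): on Windows the counters above can be captured with the built-in typeperf utility, no extra tooling required. The sketch below simply launches it from Java so the capture can be started alongside a test run; the sample interval, sample count and instance names are assumptions you would adjust, and typeperf -q lists the exact counter paths available on your machine.

    import java.util.List;

    public class CollectCounters {
        public static void main(String[] args) throws Exception {
            // Counter paths mirror the list above; instance names such as (_Total)
            // may need adjusting on your machine (check with "typeperf -q").
            List<String> command = List.of(
                    "typeperf",
                    "\\Processor Information(_Total)\\% Processor Time",
                    "\\ASP.NET\\Requests Current",
                    "\\Web Service(_Total)\\Current Connections",
                    "\\TCPv4\\Connections Established",
                    "-si", "5",           // sample every 5 seconds
                    "-sc", "360",         // 360 samples = 30 minutes
                    "-o", "baseline.csv"  // CSV you can graph and compare between runs
            );
            new ProcessBuilder(command).inheritIO().start().waitFor();
        }
    }

Run it once for the baseline and once after each configuration change, then compare the two CSVs.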
You can generate the load with a tool like Crank:
https://github.com/SignalR/SignalR/tree/dev/src/Microsoft.AspNet.SignalR.Crank
You can also check the SignalR troubleshooting guide:
http://www.asp.net/signalr/overview/testing-and-debugging/troubleshooting
