Spring Boot service has higher response times under heavy load - spring-boot

The response time of my Spring Boot REST service running on embedded Tomcat sometimes goes really high. I have isolated the external dependencies, and all of them are pretty quick.
I am at the point where I think it has something to do with Tomcat's default thread pool of 200 threads, which it reserves for incoming requests to the service.
What I believe is that under heavy load (100 requests per second) all 200 threads are held up, so other requests are queued, which leads to higher response times.
I was wondering if there is a definitive way to find out whether incoming requests are really getting queued. I have done extensive research in the Tomcat documentation and the Spring Boot embedded container documentation. Unfortunately, I don't see anything relevant.
Does anyone have any ideas on how to check this?
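To make the suspected mechanism concrete, here is a minimal sketch using only the JDK. It is an analogy, not Tomcat's actual executor: the class name WorkerPoolQueueDemo and the method queuedRequests are made up for illustration, and a plain ThreadPoolExecutor stands in for the request thread pool. It shows that once every worker is busy, further tasks sit in the queue until a worker frees up.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: a fixed pool standing in for a request thread pool.
class WorkerPoolQueueDemo {

    /**
     * Submits `requests` blocking tasks to a pool of `workers` threads and
     * returns how many of them are waiting in the queue while all workers are busy.
     */
    public static int queuedRequests(int workers, int requests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < requests; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // stand-in for a slow request handler
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The first `workers` tasks each got their own thread; the rest wait here.
            return pool.getQueue().size();
        } finally {
            release.countDown(); // let every task finish
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("queued = " + queuedRequests(200, 250));
    }
}
```

With 200 workers and 250 simultaneous slow tasks, 50 tasks wait in the queue. Their observed response time is their own processing time plus the time spent waiting, which is the effect described in the question.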

Related

Usage of micrometer-registry-prometheus slow down my Spring Boot application

I have a Spring Boot 2.5.7 application where I set up Micrometer to expose metrics for scraping:
runtimeOnly("io.micrometer:micrometer-registry-prometheus")
When I make a request locally http://localhost:8081/actuator/prometheus
There are no performance problems with my application
But when I make a request to the actuator on the server with a high load
https://myserver:8081/actuator/prometheus
it returns a lot more data in the response, and it also slows down all requests that are currently running on my server.
The problem appears even after a single request to /actuator/prometheus.
Is there any way to optimize Micrometer's work (while returning the same amount of metrics) so that it does not slow down my application?
Without sufficient data it is hard to give a recommendation. If the slowness is due to insufficient memory/garbage collection, try increasing the memory of your application.
Reviewing the metrics being returned may also give you some ideas. For example, if you have a high thread count, I think there is a pause when Micrometer iterates over the thread states. You could look into disabling that metric.

Request handling capacity of a Spring Boot application with 1 instance

The number of requests that a deployed Spring Boot application can handle depends on the server.tomcat.threads.max setting, which defaults to 200.
However, I believe the request handling capacity of an application also depends on various other capacities of the server, such as CPU, RAM, disk capacity, etc.
So a Spring Boot instance deployed on a higher-capacity server should be able to handle more requests than one on a lower-capacity server. However, I am not clear on how server.tomcat.threads.max accounts for different server sizes. Can somebody please clarify that?
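One way to reason about this is Little's law: concurrent requests in flight ≈ arrival rate × mean response time. Under that view the thread limit only caps concurrency, while the hardware shows up through the response time. The sketch below is a back-of-the-envelope calculation, assuming a blocking thread-per-request model in which each in-flight request holds one thread; the class and method names are hypothetical.

```java
// Hypothetical helper: rough capacity arithmetic for a thread-per-request server.
class ThreadCapacityEstimate {

    /** Upper bound on sustained requests per second for a given thread limit. */
    public static double maxThroughput(int maxThreads, double meanResponseSeconds) {
        if (maxThreads <= 0 || meanResponseSeconds <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        return maxThreads / meanResponseSeconds;
    }

    /** Threads kept busy at a given load: arrival rate x mean response time. */
    public static int threadsNeeded(double requestsPerSecond, double meanResponseSeconds) {
        if (requestsPerSecond < 0 || meanResponseSeconds < 0) {
            throw new IllegalArgumentException("arguments must not be negative");
        }
        return (int) Math.ceil(requestsPerSecond * meanResponseSeconds);
    }

    public static void main(String[] args) {
        // 200 threads and a 0.5 s mean response time cap throughput at 400 req/s.
        System.out.println(maxThroughput(200, 0.5));
        // 100 req/s only occupies all 200 threads if requests take 2 s on average.
        System.out.println(threadsNeeded(100, 2.0));
    }
}
```

On a faster server the mean response time drops, so the same 200 threads sustain more requests per second. On a slower one the same limit is reached at a lower request rate.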

Webflux: CancelledServerWebExchangeException appears in metrics for seemingly no reason

After upgrading to spring-boot 2.5, CancelledServerWebExchangeException started to appear in the Prometheus http_server_requests_seconds metrics quite frequently (up to 10% of server responses end up with it, according to the graphs). It appears in my own API metrics as well as in the actuator endpoint metrics (health, info, prometheus).
Example:
http_server_requests_seconds_count{exception="CancelledServerWebExchangeException",method="GET",outcome="UNKNOWN",status="200",uri="/actuator/health"} 137.0
It is a rather strange combination of outcome="UNKNOWN" and status="200".
The problem is: all these requests have successful responses.
Questions: what is this exception for and why may it occur so often?
How to reproduce: start application locally and put some load on it (I used 50 threads in jmeter accessing actuator endpoints)

Spring Boot Actuator to run in separate thread pool

Is it possible to handle actuator requests like health within a separate thread pool from the "main" application?
Why am I asking?
I've got an application that might sometimes use up all available threads, and the Kubernetes health check is failing due to the unavailability of a thread to compute the health endpoint request.
I want to make sure that every health request is processed no matter how much load the application is under.
I was thinking about maybe defining a separate thread pool for the actuators to operate with, but I am not sure how to do this.
We had a similar problem with some of our apps when running in Kubernetes. We looked at different ways of creating multiple Tomcat connectors and changing the Spring management port to get the desired effect, but never quite got it.
In the end, we attacked the root of the problem, which was resource starvation within the pod. We found that the apps experiencing the health check timeouts had lots of extra threads for various third-party thread pools. In some cases we had apps with close to 500 threads, so even under what we considered moderate load, the Tomcat pools would get starved and couldn't handle new requests.
FWIW, the biggest culprit we found was the effect of the CPU request on a pod and the JDK. When we didn't set any request, the JDK would see every CPU on the node when it queried for the number of processors. We found there are lots of places in the Java ecosystem where the number of processors is used to initialize different thread pools.
In our case, each node had 36 processors, and we found around 10-12 thread pools using this number to determine their size... not hard to see how an app could quickly grow to 500 threads.
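The arithmetic behind that can be sketched as follows. This is a simplification that assumes every such pool creates exactly one thread per visible processor; the class and method names are hypothetical. Runtime.getRuntime().availableProcessors() is the standard call many libraries use to size their pools, and on recent JDKs the -XX:ActiveProcessorCount JVM flag can override the value it reports.

```java
// Hypothetical helper: how processor-sized pools add up inside one JVM.
class ProcessorSizedPools {

    /** Threads created if each pool starts one thread per visible processor. */
    public static int estimatedThreads(int visibleProcessors, int poolsSizedByCpuCount) {
        return visibleProcessors * poolsSizedByCpuCount;
    }

    public static void main(String[] args) {
        // What this JVM believes it has; in a pod without CPU settings this may be the whole node.
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("JVM sees " + cpus + " processors");
        System.out.println("12 pools sized by that count -> "
                + estimatedThreads(cpus, 12) + " threads");
    }
}
```

With 36 visible processors, 10 to 12 such pools come to 360 to 432 threads before Tomcat's own request threads are counted, which is consistent with the "close to 500 threads" observation.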
I believe that switching to the non-blocking stack (WebFlux) could solve your issue, should this be an option for you. If you rely on some blocking API (e.g. JDBC), you can publish it on a separate thread pool (e.g. Schedulers.elastic()). That way, the HTTP request threads should always be available for processing incoming traffic (including health checks), and the long-running, blocking operations are processed in a dedicated thread pool. I believe a similar effect should be possible using the asynchronous servlet API or anything that builds on top of it.
If you are using Spring Boot >= 2.2, you can use the separate library spring-boot-async-health-indicator to run your healthchecks on a separate thread pool.
Simply annotate your HealthIndicator with @AsyncHealth:
@AsyncHealth
@Component
public class AsynchronousHealthCheck implements HealthIndicator {
    @Override
    public Health health() { // will be executed on a separate thread pool
        actualCheck();
        return Health.up().build();
    }
}
Disclaimer: I created this library for this exact purpose

How to ensure my Reactive application is running in event loop style

I am using Spring Boot 2.0.4.RELEASE. My question is whether my application is running in event-loop style or not. I am using Tomcat as my server.
I am running some performance tests on my application, and after a certain time I see strange behaviour. Once the load reaches 500 req/second, my application is not able to serve more than 500 req/second. Via Prometheus I was able to figure out that the max threads for Tomcat were 200 by default. It looks like all the threads were consumed, and that's why it was not able to serve more than 500 req/second. Please correct me if I am wrong.
Can the Tomcat server run in event-loop style?
How can I change the event-loop size for the Tomcat server, if possible?
I tried changing it to Jetty, but still have the same issue. I am wondering whether my application is running in event-loop style.
Hey, I think you are doing something wrong in your project; maybe one of your dependencies does not support reactive programming. If you want to benefit from async (reactive) programming, your code must be 100% reactive; even for security you must use reactive Spring Security.
Normally a reactive Spring application runs on Netty rather than Tomcat, so check your dependencies, because Tomcat is not reactive.
This is more of an analysis. After running some performance tests on my local machine, I was able to figure out what was actually happening inside my application.
What I did was run performance tests on my local machine and analyse the application through JConsole.
As I said, I scheduled all my blocking DB calls on Schedulers.elastic. What I realised is that this was causing the bottleneck. Since my DB connections are limited and I am using Hikari for connection pooling, it doesn't matter how many threads I create from the elastic pool.
Reactive programming is about using resources to the fullest with fewer threads; because the threads were being created in an unbounded way, it was no different from a normal application.
So as part of the resolution, I limited the number of threads used for DB calls to 100. And bang, throughput jumped from 500 tps to 2300 tps.
I know this is not the number one should expect from a reactive application; it is capable of much more. For now I have no choice but to bear with non-reactive drivers, while waiting for production-grade reactive drivers for MSSQL Server.
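The idea of capping the threads used for blocking calls can be sketched with the plain JDK. Note that this swaps in CompletableFuture and a fixed thread pool instead of the Reactor schedulers used in the answer, and that the class name, method name and the 10 ms sleep are stand-ins for illustration only. The point is that a bounded pool never runs more blocking calls at once than it has threads, so the pool size can be matched to the connection pool size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: blocking calls offloaded to a bounded, dedicated pool.
class BoundedBlockingCalls {

    /** Runs `calls` fake blocking calls on a pool of `poolSize` threads; returns peak concurrency. */
    public static int peakConcurrency(int poolSize, int calls) {
        ExecutorService dbPool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                futures.add(CompletableFuture.runAsync(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max); // remember the highest count seen
                    try {
                        Thread.sleep(10); // stand-in for a blocking JDBC query
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }, dbPool));
            }
            // Wait for all calls; excess calls queue up instead of spawning new threads.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
            return peak.get();
        } finally {
            dbPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent calls: " + peakConcurrency(4, 20));
    }
}
```

The reported peak never exceeds the pool size, whereas an unbounded pool would create a thread per pending call and leave most of them waiting for a database connection.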
