I have a Tomcat application running on an 8-core system. I observed that when I changed the maxThreads count from 16 to 2, there was a dramatic improvement in performance at a throughput of 13 req/sec.
So I started printing the active thread count. It seems that when Tomcat's maxThreads was set to 2, the active threads averaged around 8, so basically 8 threads operating on 8 cores, the best possible outcome.
However, when I increased the throughput to 30-40 req/sec I saw requests queueing up. So what happened here is that, with maxThreads set to only 2, requests started piling up.
And when I then set maxThreads to a very high value like 10k, I saw the JVM slowing down again due to context switching.
My question is: is there any property in Tomcat where I can specify how many requests are picked up and processed in parallel by the JVM?
The acceptCount property won't help, because it only defines the limit on how many requests can queue up.
There is another property called acceptorThreadCount, which is defined as the number of threads used to accept connections. Is this the property I need to tune, is there another property, or is there anything I am missing here?
According to the Connector documentation for maxThreads (I'm assuming that this is where you changed your maxThreads configuration):
The maximum number of request processing threads to be created by this
Connector, which therefore determines the maximum number of
simultaneous requests that can be handled. If not specified, this
attribute is set to 200. If an executor is associated with this
connector, this attribute is ignored as the connector will execute
tasks using the executor rather than an internal thread pool. Note
that if an executor is configured any value set for this attribute
will be recorded correctly but it will be reported (e.g. via JMX) as
-1 to make clear that it is not used.
There's no problem (quite the opposite) with setting the thread count higher than the number of available cores, as not every core is always busy (quite often threads are waiting for external input, e.g. data from a database).
In case I've missed the point and you changed a different maxThreads configuration, please clarify. Either way, your question is about the setting that specifies how many requests are handled in parallel: if you were referring to a different maxThreads, the Connector's default is 200, and it can be changed in the Connector's configuration (or, as the documentation says, with an Executor).
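For reference, a minimal sketch of what that looks like in conf/server.xml (the values are illustrative, not a recommendation, and assume an HTTP/1.1 connector on port 8080):

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxThreads="64"
           acceptCount="200"
           redirectPort="8443" />

With a mixed CPU/IO workload, a maxThreads value somewhere between the core count and a few hundred is usually the range worth experimenting in; measure rather than jumping straight to extremes like 2 or 10k.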
Related
There are 1000+ jobs running through our service in a day, and around 70-80 jobs start at the same time and run in parallel.
To handle this, we figured that setting the server.tomcat.max-threads property of our Spring application to a large number should work, but I am not fully confident about the side effects of giving this property a huge value like 800.
Can you please help here?
The default installation of Tomcat sets the maximum number of HTTP servicing threads at 200. Effectively, this means that the system can handle a maximum of 200 simultaneous HTTP requests. When the number of simultaneous HTTP requests exceeds this count, the unhandled requests are placed in a queue, and the requests in this queue are serviced as processing threads become available. This default queue length is 100. At these default settings, a large web load that can generate over 300 simultaneous requests will surpass the thread availability, resulting in service unavailable (HTTP 503).
More reference: https://docs.bmc.com/docs/brid91/en/tomcat-container-workload-configuration-825210082.html
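For the Spring question above, a minimal sketch in application.properties (the values are illustrative; in newer Spring Boot versions the property is named server.tomcat.threads.max instead):

server.tomcat.max-threads=800
server.tomcat.accept-count=200

Raising max-threads mainly costs memory (each thread typically reserves around 1 MB of stack) and adds context-switching overhead; 800 is workable on a reasonably sized box, but it is worth checking whether your jobs are CPU-bound before going that high.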
How can I run multiple servlet executions in parallel in Tomcat?
If this is a batch-job-like setup, you can use Spring Batch.
I am stuck with a strange problem and am not able to find its root cause. This is my RestTemplate thread pool executor configuration:
connectionRequestTimeout: 60000
connectTimeout: 60000
socketTimeout: 60000
responseTimeout: 60000
connectionpoolmax: 900
defaultMaxPerRoute: 20
corePoolSize: 10
maxPoolSize: 300
queueCapacity: 0
keepAliveSeconds: 1
allowCoreThreadTimeOut: true
1) I know that because queueCapacity is 0, the thread pool executor is going to create a SynchronousQueue. The first issue is that if I give it a positive integer value such as 50, application performance decreases. As per my understanding, we should only be using a SynchronousQueue in rare cases, not in a Spring Boot REST API based application like mine.
2) Second, I want to understand how a SynchronousQueue works in a Spring Boot REST API application deployed on a server (Tomcat). I know a SynchronousQueue has zero capacity, so a producer blocks until a consumer is available or a new thread is created. But who are the consumer and the producer in this case, given that all the requests are served by a web or application server? How does the SynchronousQueue actually work in this case?
I am checking the performance by running a JMeter script on my machine. The application handles more load with queueCapacity set to 0 than with a value greater than 0.
I really appreciate any insight.
1) Don't set the queueCapacity explicitly; it is bound to degrade performance here. With a bounded queue, the executor prefers parking new tasks in the queue over growing beyond the core pool size, so incoming requests sit in the queue and are only taken up once one of the core threads becomes available (extra threads beyond corePoolSize are created only when the queue is full).
ThreadPoolTaskExecutor has a default configuration of the core pool
size of 1, with unlimited max pool size and unlimited queue capacity.
https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/scheduling/concurrent/ThreadPoolTaskExecutor.html
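For comparison, the settings from the question roughly correspond to this ThreadPoolTaskExecutor setup (a sketch only; the class name, bean name and thread-name prefix are made up, and a queueCapacity of 0 or less makes Spring fall back to a SynchronousQueue internally):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

@Configuration
public class RestClientExecutorConfig {

    @Bean
    public ThreadPoolTaskExecutor restTemplateExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(10);               // corePoolSize from the question
        executor.setMaxPoolSize(300);               // maxPoolSize from the question
        executor.setQueueCapacity(0);               // <= 0 means SynchronousQueue, not LinkedBlockingQueue
        executor.setKeepAliveSeconds(1);            // keepAliveSeconds from the question
        executor.setAllowCoreThreadTimeOut(true);   // allowCoreThreadTimeOut from the question
        executor.setThreadNamePrefix("rest-call-"); // illustrative prefix
        return executor;                            // Spring initializes the bean on startup
    }
}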
2) In a SynchronousQueue, insert and remove operations always occur in pairs, simultaneously, so the queue never actually contains anything. It passes data synchronously to another thread: the producer waits for the other party to take the data instead of just dropping the data off and returning. In your case the producer is the thread that submits the task to the executor (ultimately a Tomcat request-processing thread making the RestTemplate call), and the consumer is the pool worker thread that picks the task up.
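To make that hand-off concrete, here is a minimal standalone sketch (class name and values are illustrative) of what a ThreadPoolExecutor backed by a SynchronousQueue does; in your application the submitting thread below corresponds to the Tomcat request thread:

import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class HandoffDemo {
    public static void main(String[] args) {
        // core 10, max 300, keep-alive 1s, zero-capacity queue: every task must be
        // handed directly to an idle worker or trigger creation of a new one.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                10, 300, 1, TimeUnit.SECONDS, new SynchronousQueue<>());
        executor.allowCoreThreadTimeOut(true);

        for (int i = 0; i < 20; i++) {
            final int taskId = i;
            // The submitting thread is the "producer"; the worker thread that runs
            // the task is the "consumer". Nothing ever sits in the queue itself.
            executor.execute(() -> System.out.println(
                    "task " + taskId + " ran on " + Thread.currentThread().getName()));
        }
        executor.shutdown();
    }
}

If all 300 workers are busy, the hand-off fails and the task is rejected (RejectedExecutionException with the default policy), which is why a SynchronousQueue trades queueing for fast thread growth.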
Read more:
https://javarevisited.blogspot.com/2014/06/synchronousqueue-example-in-java.html#ixzz6PFz4Akom
https://www.baeldung.com/thread-pool-java-and-guava
I hope my answer helps you in some way.
Context:
My Spring-Boot app runs as expected on Cloud Run when I deploy it with max-instances set to 1: It receives a constant stream of pubsub messages via push, and makes anywhere from 0 to 5 writes to an associated CloudSQL instance, depending on the message payload. Typically it handles between 20 and 40 messages per second. Latency/response-time varies between 50ms and 60sec, probably due to some resource contention.
In order to increase throughput/ decrease resource contention, I'm looking to experiment with the connection pool size per app-instance, as well as the concurrency and max-instances parameters for my cloud run app.
I understand that due to Spring-Boot, my app has a relatively high cold-start time of about 30-40 seconds. This is acceptable for how this service is used.
Problem:
I'm experiencing problems when deploying a spring-boot app to cloud run with max-instances set to a value greater than 1:
Instances start, handle a single request successfully, and then produce no more logs.
This happens a few times per minute, leading me to believe that instances get started (cold start), handle a single request, die, and then get started again. They are not being reused as described in the official docs on concurrency, and as is happening when I set max-instances to 1.
Instead, I expect 3 container instances to be started, each of which then handles requests according to the concurrency setting.
Billable container time at max-instances=3:
As shown in the graph, the number of instances fluctuates wildly once the new revision with max-instances=3 is deployed.
The graphs for CPU- and memory-usage also look like this.
There are no error logs. As before at max-instances=1, there are warnings indicating that there are not enough instances available to handle requests (HTTP 429).
Connection Limit of CloudSQL instance has not been exceeded
Requests are handled at less than 10/s
Finally, this is the command used to deploy:
gcloud beta run deploy my-service --project=[...] --image=[...] --add-cloudsql-instances=[...] --region=[...] --platform=managed --memory=1Gi --max-instances=3 --concurrency=3 --no-allow-unauthenticated
What could cause this behavior?
Some months ago, in the private alpha, I performed tests and observed the same behavior. After discussion with the Google team, I understood that instances are over-provisioned "just in case": an instance crashes, an instance is preempted, the traffic suddenly increases, and so on.
The trade-off of this is that you will have more cold starts than your max-instances value. Worse, you will be charged for these over-provisioned cold starts; in practice this is not an issue, because Cloud Run has a huge free tier that covers this kind of glitch.
Going deeper into the logs (you can do this by creating a sink of Cloud Run logs into BigQuery and then querying them), even if there are more instances up than your max-instances value, only up to max-instances are active at the same time. To make this concrete with your parameters: if you have 5 instances up at the same time, only 3 serve traffic at any given point in time.
This part is not documented, because it evolves constantly to find the best balance between over-provisioning and lack of resources (and 429 errors).
#Steren #AhmetB can you confirm or correct me?
When Cloud Run receives and processes requests rapidly, it predicts how many instances it needs and will try to scale to that amount. If a sudden burst of requests occurs, Cloud Run will instantiate a larger number of instances in response, in order to adapt to a possible higher number of requests beyond what it is currently serving, taking into consideration the time the existing instances need to finish their in-flight requests. Per the documentation, it is possible for the number of container instances to go above the max-instances value during a spike.
You mentioned with max-instances set to 1 it was running fine, but later you mentioned it was in fact producing 429s with it set to 1 as well. Seeing behavior of 429s as well as the instances spiking could indicate that the amount of traffic is not being handled fluidly.
It is also worth noting that, because of the cold-start time you mention, while an instance is serving its first request(s) the number of concurrent requests is by design hard-set to 1. Only once things are fully ready is the concurrency setting you have chosen applied.
Was there some specific reason you chose 3 and 3 for the max-instances and concurrency settings? Also, how was the concurrency set when you had max-instances set to 1? Perhaps you could try increasing the concurrency (max 80) and/or max-instances (upper limit of 1000) further and see if that removes the 429s.
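As a purely illustrative variation of your original deploy command (same flags, only the two scaling values changed), that experiment could look like:

gcloud beta run deploy my-service --project=[...] --image=[...] --add-cloudsql-instances=[...] --region=[...] --platform=managed --memory=1Gi --max-instances=10 --concurrency=20 --no-allow-unauthenticated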
We are creating a new hosted server for one of our APIs on managed containers (Kubernetes), and we're trying to validate that it can handle at least the same amount of request traffic.
We've started with one of the APIs, where we would need to handle at least 140k requests per minute, all endpoints combined.
To verify this, I created a simple JMeter test as follows:
-Test Plan
---Thread Group Endpoint1
-----HTTP Request -> a GET request with query params for /path1
---Thread Group Endpoint2
-----HTTP Request -> a GET request with query params for /path2
For a local test, I used the following setup:
Thread Groups Endpoint1 and Endpoint2 are set to 200 threads (users), ramp-up period of 1s, loop count = forever and duration 60s.
Using a Summary Report listener when running the test gets me a total of ~9300 # Samples.
Using this approach, is it safe to just increase the number of threads (users) for the Thread Groups until I reach the desired 140k requests per minute?
Note: I only used JMeter a little before, so I'm aware that the entire approach may be wrong, therefore any suggestions and steering to the right path are more than welcomed.
Your approach is viable as long as it represents real-life application usage. If the application has 2 endpoints with equally/evenly distributed load, your setup is just fine. If there are more endpoints, and some of them are used more than the others, consider defining the workload accordingly, either using different Thread Groups or another distribution mechanism such as the Throughput Controller.
Increasing the number of threads is also fine; however, consider increasing the load gradually, i.e. increase the ramp-up time so your test has:
Arrivals phase
Time to hold the load
Ramp-down phase
This way you will be able to correlate various metrics like increasing response time, throughput, number of errors, etc. with the increasing load. You will also be able to state the number of threads/requests per second at which the system reached its saturation/breaking point, and whether it recovers when the load goes back down.
Also make sure you're following JMeter Best Practices, as 2,300-2,500 requests per second is not something JMeter can support out of the box; you will need to do some tuning, at the very least increasing the JVM heap size allocated to JMeter.
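As a rough sizing sanity check (the 300 ms figure below is an assumed example, not a measurement): 140,000 requests per minute is about 2,333 requests per second, and by Little's Law the concurrency you need is roughly throughput × average response time, i.e. 2,333 × 0.3 s ≈ 700 concurrently active threads. Your local run of 2 × 200 threads producing ~9,300 samples in 60 s (~155 requests per second) is therefore nowhere near the target yet, so expect to scale threads and/or JMeter machines up considerably.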
You may not be able to achieve the desired 140k requests per minute using a single JMeter machine; in that case you'll need a distributed load testing approach.
refer: http://jmeter.apache.org/usermanual/jmeter_distributed_testing_step_by_step.html
Also, keeping the ramp-up period at 1 second will create a spike and an unrealistic load on the system, which will not give proper results unless you've pre-warmed your server; you should gradually increase the load according to the real/estimated traffic pattern.
I am testing a server built with Spring Boot.
However, I ran into some problems while doing the test.
My test is:
how much memory the server uses as the number of WebSocket sessions (the number of clients) increases.
1,000 clients (fewer than 9,000 sessions) caused no issues in the test.
But when I tried to test 10k connections, the server created connections almost up to 10,000 (sometimes creating sessions up to 9,990, sometimes 9,988 or 9,996; there was no specific limit on the number of sockets).
After that, it just stopped creating sessions: no errors, just no response.
If some clients time out and release their connections, other clients that were waiting to connect are able to get connections.
Environment:
tomcat : 8.0.36
spring-boot : 1.3.3
java : 1.8
As for solutions, I tried:
Increasing the heap size.
I increased the JVM heap memory by 5 GB, but the heap memory used for the connections was only 2 GB. So I think it is not related to JVM memory.
I set server.tomcat.max-thread = 20000 in application.properties.
But that failed; there was no difference from before.
I am really curious about this issue. If you know this problem and have ideas, please let me know the reason.
Thanks.
Tomcat - maxThreads vs maxConnections
Try setting the maxConnections property to more than 10,000.
From the doc:
The maximum number of connections that the server will accept and process at any given time. When this number has been reached, the server will accept, but not process, one further connection. This additional connection will be blocked until the number of connections being processed falls below maxConnections, at which point the server will start accepting and processing new connections again. Note that once the limit has been reached, the operating system may still accept connections based on the acceptCount setting. The default value varies by connector type. For BIO the default is the value of maxThreads unless an Executor is used, in which case the default will be the value of maxThreads from the executor. For NIO the default is 10000. For APR/native, the default is 8192.
Note that for APR/native on Windows, the configured value will be reduced to the highest multiple of 1024 that is less than or equal to maxConnections. This is done for performance reasons.
If set to a value of -1, the maxConnections feature is disabled and connections are not counted.
There is a Spring Boot property, server.tomcat.max-connections, which can be set in the application.properties file:
server.tomcat.max-connections= # Maximum number of connections that the server will accept and process at any given time.
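For example (the value is illustrative, and exact property names vary between Spring Boot versions):

server.tomcat.max-connections=20000

With NIO this raises the 10,000-connection default quoted above, which matches the symptom of sessions stalling just below 10k in the question.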