In my Spring Boot app I have the following Tomcat configuration (Tomcat version 9):
maxConnections : 200
maxThreads : 200
acceptCount : 100
It works fine and is stable at the desired TPS (hosted on GCP, 1 pod, 4 CPU cores, 4 GB memory).
After increasing maxThreads and maxConnections together to 250, 300 or 500, CPU utilization grows to 100% (at the same TPS) and the server gets restarted.
What can be the reason for this behaviour? How can it be explained? Too few CPU cores? Memory consumption didn't change significantly.
I am not sure about BIO/NIO mode; we use the default one, so I guess BIO.
I am running a Kotlin Spring Boot based service in a Kubernetes cluster that connects to a PostgreSQL database. Each request makes around 3-5 database calls, which partially run in parallel via Kotlin coroutines (with a thread-pool-backed coroutine context present).
No matter the configuration, this service gets throttled heavily when it is hit by real traffic right after starting up. This slowness sometimes persists for 2-3 minutes and often affects only some fresh pods, not all.
I am looking for new avenues to analyze the problem - here's a succinct list of circumstances / stuff I am already doing:
The usual response time of my service is around 7-20ms while serving 300-400 requests / second per pod
New / autoscaled instances warm themselves up by doing 15000 HTTP requests against themselves. The readiness probe is not "up" before this process finishes
We are currently setting a CPU request and limit of 2000m; changing this to 3000m does reduce the issue, but the latency still spikes to around 300-400ms, which is not acceptable (at most 100ms would be great, 50ms ideal)
The memory is set to 2 GB; changing this to 3 GB has no significant impact
The pods are allocating 200-300 MB/s during peak load; the GC activity does not seem abnormal to me
Switching between GCs (G1 and ZGC) has no impact
We are experiencing pod throttling of around 25-50% (calculated via Kubernetes metrics) while the pod CPU usage is around 40-50%
New pods struggle to take 200-300 requests / sec even though we warm up, curiously enough some pods suffer for long periods. All external factors have been analyzed and disabling most baggage has no impact (this includes testing with disabled tracing, metric collection, disabling Kafka integration and verifying our database load is not maxing out - it's sitting at around 20-30% CPU usage while network and memory usage are way lower)
The throttling is also observed in custom load tests which replicate the warmup requests described above
Connecting with visualvm during the load tests and checking the CPU time spent yields no striking issues
All of this runs on a managed Kubernetes cluster on AWS
All the nodes in our cluster are of the same type (c5.2xlarge of AWS)
Any tools / avenues to investigate are appreciated - thank you! I am still puzzled why my service is getting throttled although its CPU usage is way below 100%. Our nodes are also not affected by the old kernel cfs bug from before kernel 5.6 (not entirely sure in which version it got fixed, we are very recent on our nodes kernel version though).
In the end this all boiled down to missing one part of the equation: I/O bounds.
Imagine one request takes 10 DB calls, each taking 3 milliseconds to fulfill (including network latency etc.). A single request then spends 10 * 3 = 30 milliseconds on I/O. A single thread can therefore serve at most 1000 ms / 30 ms = 33.33 requests / second. Now if one service instance uses 10 threads to handle requests, we get 333.3 requests / second as our upper bound on throughput. We can't get any faster than this because we are I/O bottlenecked with regard to our thread count.
And this leaves out multiple factors like:
thread pool size vs. db connection pool size
our service doing non-DB related tasks (actual logic, JSON serialization when the response gets fulfilled)
database capacity (was not an issue for us)
TL;DR: You can't get faster when you are I/O bottlenecked, no matter how much CPU you provide. I/O has to be improved if you want a single service instance to have more throughput; this is mostly done by sizing the DB connection pool in relation to the thread pool size and the number of DB calls per request. We missed this basic (and well-known) relation between resources!
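The arithmetic above can be sketched in a few lines of Python. This is only a rough model, assuming the DB calls of one request run sequentially and each worker thread blocks while waiting; the function and parameter names are made up for illustration and do not come from any framework.

```python
import math


def max_throughput_rps(db_calls_per_request: int,
                       latency_ms_per_call: float,
                       worker_threads: int) -> float:
    """Upper bound on requests/second when every thread blocks on I/O."""
    io_ms_per_request = db_calls_per_request * latency_ms_per_call
    per_thread_rps = 1000.0 / io_ms_per_request
    return per_thread_rps * worker_threads


def threads_needed(target_rps: float,
                   db_calls_per_request: int,
                   latency_ms_per_call: float) -> int:
    """Minimum number of blocking threads needed to reach target_rps."""
    io_ms_per_request = db_calls_per_request * latency_ms_per_call
    return math.ceil(target_rps * io_ms_per_request / 1000.0)


if __name__ == "__main__":
    # 10 DB calls of 3 ms each, 10 threads: the example from the text.
    print(round(max_throughput_rps(10, 3.0, 10), 1))
    # Threads needed for 400 requests/second under the same assumptions.
    print(threads_needed(400, 10, 3.0))
```

The same reasoning applies to the DB connection pool: if it is smaller than the thread pool, the pool size (not the thread count) becomes the `worker_threads` value that limits throughput.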
I want to support 7k requests per minute for my system. Considering there are network calls and database calls which might take around 4-5 seconds to complete, how should I configure task max threads and max connections to achieve that?
This is just math.
7k requests/minute is roughly 120 requests/second.
If each request is taking 5s then you will have roughly 5 x 120 = 600 inflight requests.
That's 600 HTTP connections, 600 threads and possibly 600 database connections.
These numbers are a little simplistic but I think you get the picture.
Note that the standard Linux stack size for each thread is 8 MB, so 600 threads are going to want nearly 5 GB of memory just for the stacks. This is configurable at the OS level, but how do you size it?
Therefore you're going to be up for some serious OS tuning if you're planning to run this on a single server instance.
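The estimate above is an application of Little's law (in-flight requests = arrival rate x time per request). Here is a minimal sketch of that calculation, assuming a steady arrival rate and the 8 MB per-thread stack figure quoted above; the helper names are hypothetical.

```python
def in_flight_requests(requests_per_minute: float,
                       seconds_per_request: float) -> float:
    """Little's law: average concurrency = arrival rate * time in system."""
    requests_per_second = requests_per_minute / 60.0
    return requests_per_second * seconds_per_request


def stack_reservation_gib(threads: int, stack_mib_per_thread: float = 8.0) -> float:
    """Memory reserved for thread stacks, in GiB (8 MiB default as in the text)."""
    return threads * stack_mib_per_thread / 1024.0


if __name__ == "__main__":
    # 7k requests/minute at 5 s each: a bit under the rounded 600 in the text.
    print(round(in_flight_requests(7000, 5.0), 1))
    # Stack reservation for 600 threads at 8 MiB each.
    print(stack_reservation_gib(600))
```

Without rounding 7k requests/minute up to 120 requests/second, the concurrency comes out slightly below 600, which does not change the conclusion.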
I checked my key cache hit rate via nodetool and OpsCenter; the first shows a hit rate of 0.907 (about 90.7%).
Key Cache : entries 1152104, size 96.73 MB, capacity 100 MB, 52543777 hits, 57954469 requests, 0.907 recent hit rate, 14400 save period in seconds
But in OpsCenter the graph shows 100%.
Does anyone understand why they differ?
Cassandra arguably has a bug (or at least a typo) here: it labels the value as the recent hit rate, but it is the all-time rate:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tools/nodetool/Info.java#L95
It's grabbing the value of the "total" hit rate:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/metrics/CacheMetrics.java#L66
So although you may be getting a 100% hit rate for the last 19 minutes according to OpsCenter, it wasn't always 100%. The total number of hits / total number of requests of all time is ~90.7%.
This is shown from:
52543777 hits, 57954469 requests
52543777 / 57954469 = 0.907
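To make the difference between an all-time and a recent hit rate concrete, here is a small sketch. The all-time numbers are the ones from the nodetool output above; the earlier snapshot used for the "recent" rate is invented purely for illustration, and the windowed calculation is an assumption about how a recent rate is typically derived, not a description of OpsCenter's internals.

```python
from typing import Tuple

Snapshot = Tuple[int, int]  # (hits, requests) counters at one point in time


def hit_rate(hits: int, requests: int) -> float:
    """Cumulative hit rate: total hits divided by total requests."""
    return hits / requests if requests else 0.0


def recent_hit_rate(previous: Snapshot, current: Snapshot) -> float:
    """Hit rate over the interval between two counter snapshots."""
    delta_hits = current[0] - previous[0]
    delta_requests = current[1] - previous[1]
    return hit_rate(delta_hits, delta_requests)


if __name__ == "__main__":
    current = (52_543_777, 57_954_469)   # counters from the nodetool output
    previous = (52_533_777, 57_944_469)  # hypothetical earlier snapshot
    print(round(hit_rate(*current), 3))         # all-time rate
    print(recent_hit_rate(previous, current))   # rate within the window only
```

A cache that misses heavily while warming up and then hits on every request shows exactly this pattern: a windowed rate of 100% next to a lower all-time rate.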
On a single-node Elasticsearch setup running alongside Logstash, we tested parsing 20 MB and 200 MB log files into Elasticsearch on different AWS instance types, i.e. Medium, Large and Xlarge.
Environment details: Medium instance, 3.75 GB RAM, 1 core, storage: 4 GB SSD, 64-bit, network performance: Moderate
Instance running with : Logstash, Elastic search
Scenario 1
**With default settings**
Result :
20 MB logfile: 23 mins, 175 events per second
200 MB logfile: 3 hrs 3 mins, 175 events per second
Added the following to settings:
Java heap size : 2GB
bootstrap.mlockall: true
indices.fielddata.cache.size: "30%"
indices.cache.filter.size: "30%"
index.translog.flush_threshold_ops: 50000
indices.memory.index_buffer_size: 50%
# Search thread pool
threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 100
**With added settings**
Result:
20 MB logfile: 22 mins, 180 events per second
200 MB logfile: 3 hrs 7 mins, 180 events per second
Scenario 2
Environment details: R3 Large, 15.25 GB RAM, 2 cores, storage: 32 GB SSD, 64-bit, network performance: Moderate
Instance running with : Logstash, Elastic search
**With default settings**
Result :
20 MB logfile: 7 mins, 750 events per second
200 MB logfile: 65 mins, 800 events per second
Added the following to settings:
Java heap size: 7gb
other parameters same as above
**With added settings**
Result:
20 MB logfile: 7 mins, 800 events per second
200 MB logfile: 55 mins, 800 events per second
Scenario 3
Environment details: R3 High-Memory Extra Large (r3.xlarge), 30.5 GB RAM, 4 cores, storage: 32 GB SSD, 64-bit, network performance: Moderate
Instance running with : Logstash, Elastic search
**With default settings**
Result:
20 MB logfile: 7 mins, 1200 events per second
200 MB logfile: 34 mins, 1200 events per second
Added the following to settings:
Java heap size: 15gb
other parameters same as above
**With added settings**
Result:
20 MB logfile: 7 mins, 1200 events per second
200 MB logfile: 34 mins, 1200 events per second
I would like to know:
What is the benchmark for this kind of performance?
Does the performance meet the benchmark, or is it below it?
Why do I see no difference even after increasing the Elasticsearch JVM heap?
How do I monitor Logstash and improve its performance?
I appreciate any help on this, as I am new to Logstash and Elasticsearch.
I think this situation is related to the fact that Logstash uses fixed size queues (The Logstash event processing pipeline)
Logstash sets the size of each queue to 20. This means a maximum of 20 events can be pending for the next stage. The small queue sizes mean that Logstash simply blocks and stalls safely when there’s a heavy load or temporary pipeline problems. The alternatives would be to either have an unlimited queue or drop messages when there’s a problem. An unlimited queue can grow unbounded and eventually exceed memory, causing a crash that loses all of the queued messages.
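The blocking behaviour of a small fixed-size queue can be illustrated with Python's standard `queue.Queue`. This is only an analogy for the pipeline described above, not Logstash code; the queue size of 20 mirrors the figure quoted, and the `workers` parameter loosely stands in for the worker count.

```python
import queue
import threading


def fill_until_full(queue_size: int = 20) -> int:
    """Count how many events a bounded queue accepts with no consumer."""
    q = queue.Queue(maxsize=queue_size)
    accepted = 0
    try:
        while True:
            q.put_nowait(accepted)  # raises queue.Full once the queue is full
            accepted += 1
    except queue.Full:
        return accepted


def run_pipeline(events, queue_size: int = 20, workers: int = 1) -> list:
    """Push events through a bounded queue drained by worker threads."""
    q = queue.Queue(maxsize=queue_size)
    processed = []
    lock = threading.Lock()
    sentinel = object()

    def worker() -> None:
        while True:
            item = q.get()
            if item is sentinel:
                break
            with lock:
                processed.append(item)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for event in events:
        q.put(event)  # blocks (stalls the producer) while the queue is full
    for _ in threads:
        q.put(sentinel)  # one stop signal per worker
    for t in threads:
        t.join()
    return processed


if __name__ == "__main__":
    print(fill_until_full(20))
    print(len(run_pipeline(range(100), queue_size=20, workers=4)))
```

No event is dropped and memory use stays bounded; the cost is that the producer waits whenever the consumers fall behind, which is why adding workers can raise throughput.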
I think what you should try is to increase the worker count with the '-w' flag.
On the other hand, many people say that Logstash should be scaled horizontally, rather than by adding more cores and GB of RAM (How to improve Logstash performance)
You have set the Java heap size correctly with respect to your total memory, but I think you are not utilizing it properly. I hope you have an idea of what the fielddata size is: by default the fielddata cache is unbounded (only the fielddata circuit breaker limits it, at 60% of the heap), and you are capping it at 30%.
I don't know why you are doing this. My perception might be wrong for your use case, but it's a good habit to allocate indices.fielddata.cache.size: "70%" or even 75%; with this setting you must also set something like indices.breaker.total.limit: "80%" to avoid an Out Of Memory (OOM) exception. You can check this for further details on Limiting Memory Usage.
We have an application that uses Windows Server AppFabric Caching. The cache is on the local machine, local cache is not enabled. Here is the configuration in code, none in .config.
DataCacheFactoryConfiguration configuration = new DataCacheFactoryConfiguration();
configuration.Servers = servers;
configuration.MaxConnectionsToServer = 100; // 100 is maximum
configuration.RequestTimeout = TimeSpan.FromMilliseconds(1000);
Object expiration on PutAndUnlock is two minutes.
Here are some typical performance monitor values:
Total Data Size Bytes 700MB
Total GetAndLock Requests /sec Average 4
Total Eviction Runs: 0
Total Evicted Objects: 0
Total Object Count: either 0 or 1.8447e+019 (suspicious, eh?) I think the active object count should be about 500.
This is running on a virtual machine, I don't think we are hardware constrained at all.
The problem: every few minutes (the interval varies from 1 to 20), for a period of one second or so, all requests (Get, GetAndLock, Put, PutAndUnlock) time out.
The only remedy I've seen online is to increase RequestTimeout. If we increase to 2 seconds the problem seems to happen somewhat less frequently, but still occurs. We can't increase the timeout more because we need the time to create the object from scratch after the cache times out.