Kubernetes Pod CPU Throttling after a few (3-4) runs - spring-boot

I have a Spring Boot application using spring-kafka version 2.5.
Our application is containerized with Docker and deployed in a Kubernetes cluster.
The consumer fetches a record and performs database aggregation; I am using EclipseLink as the persistence layer and an Oracle database.
I am publishing batches of 100k messages to the consumer topic. What we observe is that for the first few batches the application's CPU usage stays below 500 millicores. But after a few runs, CPU usage spikes up to the 2000 millicore limit and the container is throttled for wanting even more than that. Nothing changes between runs that would increase CPU usage. The only workaround we have found is to redeploy the application, which brings CPU usage back down until the same pattern repeats.
We have performed JVM profiling and could not find any issue with memory or garbage collection.
My CPU request is 1000 millicores and the CPU limit is set to 2000 millicores.
Please help me find the root cause of this strange CPU throttling issue after a few runs.
Appreciate your time.
Attached is a screenshot of the CPU usage dashboard, which shows the sudden spike in CPU usage after some time.
My suspicion is that some configuration is needed at the Kubernetes level to resolve this.
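For reference, the CPU settings described above would look roughly like this in the pod spec (a sketch showing only the resources stanza, with the values quoted above):

resources:
  requests:
    cpu: "1000m"   # what the scheduler reserves for the container
  limits:
    cpu: "2000m"   # CFS quota; usage above this is throttled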

Related

Tomcat maxthreads not increasing beyond 300, CPU under-utilised for high loads

I have a microservice using Spring Boot 2.7.0 with embedded NIO Tomcat. The application is responsible for receiving requests, and for each request it makes 6 parallel remote calls, waiting at most 2 seconds for the responses to those 6 calls.
While performance testing this microservice with JMeter, I observed that the CPU stays under-utilised at around 14-15%, but the microservice's response time climbs to more than a minute. Typically it should be no more than 2-3 seconds.
There are 3 thread configurations in my microservice:
Tomcat threads: I tried various combinations of maxthreads, maxconnections and accept-count, like (5000, 30000, 2000), (500, 10000, 2000) and (200, 5000, 2000), but the CPU is always under-utilised. Here are the properties I am changing:
server.tomcat.max-threads=200
server.tomcat.max-connections=5000
server.tomcat.accept-count=2000
server.connection-timeout=3000
Parallel-call pool: for each request received we create a ForkJoinPool with parallelism 6 to make the 6 remote calls (roughly as in the sketch after this list). We also tried an ExecutorService with different configurations (newSingleThreadExecutor, newCachedThreadPool, newWorkStealingPool) and increased the pool size to about the same as Tomcat's maxThreads and beyond, but the result was the same: CPU still under-utilised while the microservice takes more than a minute to respond.
Logging the active thread count, we saw that no matter how far we increased the pool size or Tomcat's maxThreads, the active thread count went up to about 300 and then started declining. We tried a 4-core 8 GB system and an 8-core 16 GB system; the results were exactly the same.
Rest client: for the remote calls we use Spring RestTemplate with maxConnTotal and maxConnTotalPerRoute set to the same value as Tomcat's maxThreads. They are the same because all 6 remote calls go to the same server.
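To make the per-request fan-out concrete, it looks roughly like this (a sketch; handleRequest and callRemote are illustrative names, not the actual code):

import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Sketch of the fan-out described above: a ForkJoinPool per request, 6 parallel
// remote calls, and a 2-second wait. callRemote(int) stands in for the RestTemplate call.
List<String> handleRequest() {
    ForkJoinPool pool = new ForkJoinPool(6);               // created per request
    try {
        List<CompletableFuture<String>> calls = IntStream.range(0, 6)
                .mapToObj(i -> CompletableFuture.supplyAsync(() -> callRemote(i), pool))
                .collect(Collectors.toList());
        // The Tomcat worker thread blocks here for up to 2 seconds.
        CompletableFuture.allOf(calls.toArray(new CompletableFuture[0]))
                         .get(2, TimeUnit.SECONDS);
        return calls.stream().map(CompletableFuture::join).collect(Collectors.toList());
    } catch (InterruptedException | ExecutionException | TimeoutException e) {
        return Collections.emptyList();                    // fall back if calls time out or fail
    } finally {
        pool.shutdown();
    }
}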
Here are the JMeter parameters used: -GTHREADS=1000 -GRAMP_UP=180 -GDURATION=300
There are 3 instances of this microservice running. Roughly 2-2.5 minutes after JMeter starts, all 3 instances' response times go beyond a minute for all requests while CPU stays at only 14-15%. Could someone please help figure out why the CPU is not spiking? If CPU rose to 35%, autoscaling would kick in, but since the CPU is under-utilised no scaling is happening.
Use a profiler tool like VisualVM, YourKit or JProfiler to see where your application spends the most time.
CPU is not the only possible bottleneck: check Tomcat's connection pool utilisation (requests may be queuing up), memory usage, network usage, database pool usage, the DB slow-query log and so on. If you don't have better monitoring software or an APM tool in place, you can consider using the JMeter PerfMon Plugin.
We replaced RestTemplate with WebClient for the remote calls and introduced a WebFlux Mono to make the complete request non-blocking. The request itself now returns our response wrapped in a Mono. This solved our issue: there is no more idle time, as threads are no longer blocked on IO and are instead busy serving other requests.
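A minimal sketch of that non-blocking approach (the base URL, endpoint and class name are placeholders, not the poster's actual code):

import java.time.Duration;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

// Sketch: the same 6-call fan-out done non-blocking with WebClient and Mono.
// "http://remote-service" and "/remote" are placeholders for the real endpoints.
class FanOutClient {
    private final WebClient client = WebClient.create("http://remote-service");

    Mono<List<String>> fanOut() {
        List<Mono<String>> calls = IntStream.range(0, 6)
                .mapToObj(i -> client.get()
                        .uri("/remote?part={p}", i)        // placeholder endpoint
                        .retrieve()
                        .bodyToMono(String.class)
                        .timeout(Duration.ofSeconds(2))    // 2-second budget per call
                        .onErrorReturn(""))                // a slow or failed call doesn't sink the rest
                .collect(Collectors.toList());
        // No thread blocks here; the controller can return this Mono directly.
        return Mono.zip(calls, parts -> Arrays.stream(parts)
                .map(Object::toString)
                .collect(Collectors.toList()));
    }
}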

Why does the heap memory usage of a Spring Boot application keep increasing?

I created and ran a simple Spring Boot application that accepts some requests. I used JConsole to check the heap memory usage and saw a periodic increase followed by GC; I don't know the reason for the increases. Are objects continuously being created (I thought the instances were loaded into the container when the application starts)?
Spring Boot has background processes (scheduled jobs, etc.) which may consume resources even when there are no requests to your app.
Those spikes are expected and normal for any Java app built on any more or less complex framework. The GC algorithm depends on your JVM version but can be overridden. The graph shows a normal situation: from time to time memory is consumed by some activity, and after a while the GC wakes up and does the cleaning.
If you want to check exactly which objects caused a memory spike, you can try Java Flight Recorder or regular heap dump analysis with Eclipse Memory Analyzer.
For the current case, Java Flight Recorder would be more convenient and suitable.
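For example, a recording or a heap dump can be captured from the running process with jcmd and then opened in JDK Mission Control or Eclipse Memory Analyzer (the pid and file paths below are placeholders):

jcmd <pid> JFR.start name=alloc-spike settings=profile duration=120s filename=/tmp/alloc-spike.jfr
jcmd <pid> GC.heap_dump /tmp/heap.hprof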

GKE: How to handle deployments with CPU intensive initialization?

I have a GKE cluster (n1-standard-1, master version 1.13.6-gke.13) with 3 nodes on which I have 7 deployments, each running a Spring Boot application. A default Horizontal Pod Autoscaler was created for each deployment, with target CPU 80% and min 1 / max 5 replicas.
During normal operation, there is typically 1 pod per deployment with CPU usage at 1-5%. But when the application starts, e.g. after performing a rolling update, CPU usage spikes and the HPA scales up to the maximum number of replicas, reporting CPU usage of 500% or more.
When multiple deployments start at the same time, e.g. after a cluster upgrade, this often causes various pods to be unschedulable because the cluster is out of CPU, and some pods end up in a "Preempting" state.
I have changed the HPAs to max 2 replicas since currently that's enough. But I will be adding more deployments in the future and it would be nice to know how to handle this correctly. I'm quite new to Kubernetes and GCP so I'm not sure how to approach this.
Here is the CPU chart for one of the containers after a cluster upgrade earlier today:
Everything runs in the default namespace and I haven't touched the default LimitRange with 100m default CPU request. Should I modify this and set limits? Given that the initialization is resource demanding, what would the proper limits be? Or do I need to upgrade the machine type with more CPU?
HPA only takes ready pods into account. Since your pods only experience a spike in CPU usage during the early stages, your best bet is to configure a readiness probe that only reports ready once the CPU usage comes down, or that has an initialDelaySeconds set longer than the startup period, so the spike in CPU usage is not taken into account by the HPA.
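As a sketch of the second option (the health endpoint, port and timings are placeholders to adapt to the actual startup behaviour):

readinessProbe:
  httpGet:
    path: /actuator/health   # placeholder; use the app's real health endpoint
    port: 8080
  initialDelaySeconds: 120   # longer than the CPU-heavy startup phase
  periodSeconds: 10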

Collecting GC metrics in real time for a .NET Core 2.1 app running on Linux

I want to collect GC metrics in real time from applications running on Linux in production. My assumption is that it is not possible to obtain reliable data about the application's memory state (all memory types) when it runs in Docker or Kubernetes. Accordingly, the data received from the GC may also be partially or completely unreliable.
Perhaps someone has faced similar issues and can share their experience.
Very grateful for any answer.

High CPU utilisation in websphere liberty

We are migrating from WebSphere Application Server to WebSphere Liberty.
When our application is deployed on WAS, CPU utilisation is 8%. When the same application is deployed on WLP, CPU utilisation is more than 50% and fluctuates.
Can anyone advise how to debug this issue and which parameters to check to minimise the CPU utilisation?
My advice would be to use your favorite monitoring / profiling tool:
Check that your application isn't spending a lot of time garbage collecting. That could be a sign of the heap being too small, or another GC tuning problem.
Check which non-GC threads are using a lot of CPU time (see the command example below). Does that tell you something unexpected?
Profile the code to look for performance hotspots.
Without knowing the cause, we can't suggest JVM parameter changes.
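For the second point, one quick way to see which threads are burning CPU and what they are doing (the pid and thread id are placeholders):

top -H -p <pid>                                # per-thread CPU usage; note the hottest thread id
printf '%x\n' <thread-id>                      # convert that id to hex
jstack <pid> | grep -A 20 'nid=0x<hex-id>'     # locate that thread's stack in the dump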
I hope you have verified that it is the Liberty process hogging the CPU.
Can you turn on verbose GC in the Liberty profile and check the GC logs?
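Verbose GC can be enabled in the server's jvm.options file; a minimal sketch, assuming Liberty runs on an IBM/OpenJ9 JVM (the server name and log settings are illustrative; on a HotSpot JVM -Xlog:gc* would be the equivalent):

# wlp/usr/servers/<serverName>/jvm.options
-verbose:gc                                      # basic verbose GC output
-Xverbosegclog:logs/verbosegc.%seq.log,5,10000   # OpenJ9: rolling GC log, 5 files x 10000 GC cycles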
