Apache Ignite: Possible too long JVM pause: 714 milliseconds - spring-boot

I have an Apache Ignite server and a Spring Boot application acting as a client, both running in a Kubernetes cluster.
During a performance test, I started to notice the log line below showing up frequently in the Spring Boot application:
org.apache.ignite.internal.IgniteKernal: Possible too long JVM pause: 714 milliseconds
According to this post, this happens when the "JVM is experiencing long garbage collection pauses", but the infrastructure team has confirmed to me that we have included -XX:+UseG1GC and -XX:+DisableExplicitGC in the server JVM options, and this log line only shows up in the Spring Boot application.
Please help with the following questions:
Is the GC happening in the client (Spring Boot application) or the server node?
What will be the impact of a long GC pause?
What should I do to prevent the impact?
Do I have to configure the JVM options in the Spring Boot application as well?

Is the GC happening in the client (Spring Boot application) or the server node?
GC pause warnings are written to the log of the node that is suffering the problem.
What will be the impact of a long GC pause?
Such pauses decrease overall performance. Also, if a pause lasts longer than failureDetectionTimeout, the node will be disconnected from the cluster.
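For reference, that timeout is part of the node's IgniteConfiguration. A minimal sketch of a client node with an explicitly raised failure detection timeout (the 30-second value is only an example, and note that newer Ignite versions also have a separate clientFailureDetectionTimeout property):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.configuration.IgniteConfiguration;

    public class ClientNodeExample {
        public static void main(String[] args) {
            IgniteConfiguration cfg = new IgniteConfiguration();
            cfg.setClientMode(true);                // join the cluster as a client node
            cfg.setFailureDetectionTimeout(30_000); // example value, in milliseconds
            try (Ignite ignite = Ignition.start(cfg)) {
                // use caches, compute, etc.
            }
        }
    }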
What should I do to prevent the impact?
General advice is collected here: https://apacheignite.readme.io/docs/jvm-and-system-tuning. You can also enable GC logs to get a full picture of what is happening.
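Besides GC log files, a rough picture of GC activity can also be obtained from inside the application via the standard GarbageCollectorMXBeans. A minimal sketch (the 5-second polling interval is arbitrary):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    public class GcActivityLogger {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    // Cumulative collection count and total time spent collecting.
                    System.out.printf("%s: count=%d, time=%d ms%n",
                            gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                }
                Thread.sleep(5_000);
            }
        }
    }

If the reported collection time jumps by hundreds of milliseconds between polls, that lines up with the "Possible too long JVM pause" warnings.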
Do I have to configure the JVM options in the Spring Boot application as well?
It looks like you should, because the problems are on the client node.

Related

503 error on server load tests on Wildfly server on Jelastic

I have an app deployed on a WildFly server on the Jelastic PaaS. This app functions normally with a few users. I'm trying to do some load tests using JMeter, in this case calling a REST API 300 times in 1 second.
This leads to around a 60% error rate on the requests, all of them being 503 (service temporarily unavailable). I don't know what things I have to tweak in the environment to get rid of those errors. I'm pretty sure it's not my app's fault, since it is not heavy and I get the same results even when testing the load on the index page.
The topology of the environment is simply 1 WildFly node (with 20 cloudlets) and a Postgres database with 20 cloudlets. I had fancier topologies, but trying to narrow the problem down I cut the load balancer (NGINX) and the multiple WildFly nodes.
Requests via the shared load balancer (i.e. when your internet-facing node does not have a public IP) face strict QoS limits to protect platform stability. The whole point of the shared load balancer is that it's shared by many users, so you can't take 100% of its resources for yourself.
With a public IP, your traffic goes straight from the internet to your node and therefore those QoS limits are not needed or applicable.
As stated in the documentation, you need a public IP for production workloads (a load test should be considered 'production' in this context).
I don't know what things I have to tweak in the environment to get rid of those errors
We don't know either, and as your question doesn't provide a sufficient level of detail we can only come up with generic suggestions like:
Check the WildFly log for any suspicious entries. HTTP 503 is a server-side error, so it should be logged along with the stack trace, which will lead you to the root cause
Check whether the WildFly instance(s) have enough headroom to operate in terms of CPU, RAM, etc.; this can be done using e.g. the JMeter PerfMon Plugin
Check JVM and WildFly-specific JMX metrics using JVisualVM or the aforementioned JMeter PerfMon Plugin (see the sketch after this list)
Double-check the Undertow subsystem configuration for any connection/request/rate limiting entries
Use a profiler tool like JProfiler or YourKit to see what the slowest functions, largest objects, etc. are
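For the JMX point, here is a minimal sketch of pulling a couple of platform metrics over a remote JMX connection; this assumes remote JMX is enabled on the target JVM, and the URL, host and port are placeholders:

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.OperatingSystemMXBean;
    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class JmxPeek {
        public static void main(String[] args) throws Exception {
            // Placeholder endpoint: point it at wherever remote JMX is exposed.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection conn = connector.getMBeanServerConnection();
                OperatingSystemMXBean os = ManagementFactory.newPlatformMXBeanProxy(
                        conn, ManagementFactory.OPERATING_SYSTEM_MXBEAN_NAME,
                        OperatingSystemMXBean.class);
                MemoryMXBean mem = ManagementFactory.newPlatformMXBeanProxy(
                        conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
                System.out.println("system load average: " + os.getSystemLoadAverage());
                System.out.println("heap used (MB): " + (mem.getHeapMemoryUsage().getUsed() >> 20));
            }
        }
    }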

Kubernetes Pod CPU Throttling after a few (3-4) runs

I have a Spring Boot application using spring-kafka version 2.5.
Our application is containerized in Docker and deployed in Kubernetes cluster.
The consumer fetches a record and performs database aggregation, using EclipseLink as the persistence service against an Oracle database.
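Simplified, the consumer looks roughly like this (the topic name, group id and aggregation call below are placeholders):

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Component;

    @Component
    public class AggregationConsumer {

        // Placeholder topic and consumer group.
        @KafkaListener(topics = "records-topic", groupId = "aggregation-group")
        public void onMessage(ConsumerRecord<String, String> record) {
            aggregateIntoOracle(record.value());
        }

        private void aggregateIntoOracle(String payload) {
            // EclipseLink EntityManager work against the Oracle database goes here.
        }
    }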
I am publishing batches of 100k messages to this consumer's topic. What we are observing is that for the first few batches the application's CPU usage stays below 500 millicores. But after a few runs, CPU usage spikes up to its 2000-millicore limit and the pod is even throttled as it tries to go beyond 2000 millicores. Nothing changes between runs that would increase CPU usage. The only solution we have figured out is to redeploy the application to bring CPU usage down, and then it behaves the same way again.
We have performed JVM profiling and couldn't find any issues with memory or garbage collection.
My CPU request is 1000 millicores and the CPU limit is set to 2000 millicores.
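As I understand it (assuming the default 100 ms CFS period), a 2000-millicore limit gives the container 2 x 100 ms = 200 ms of CPU time per period, shared across all JVM threads; once that budget is used up, the kernel throttles the container until the next period, which matches the throttling we see in the dashboard. The 1000-millicore request only affects scheduling, not throttling.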
Please help me find the root cause of this weird CPU throttling issue after a few runs.
Appreciate your time.
Please find attached a screenshot of the CPU usage dashboard, which shows the sudden spike in CPU usage after some time.
My suspicion is that some configuration is required at the Kubernetes level to resolve this issue.

Performance testing bottleneck microservice

I am using JMeter to load test my API server (running on Tomcat), which in turn calls a microservice using Thrift (20k requests/min).
I am using New Relic for monitoring. I have observed that an abnormally high amount of time is spent when the API calls the microservice (ranging from 10-15 seconds). So I observed the microservice over the same duration: its response time was almost negligible (10-12 milliseconds).
So I suspect the API is queueing up the responses because it is unable to keep up with the rate at which it is receiving responses from the microservice. To address this, I doubled the Xmx and Xms values of my API's Java application.
I am still observing the same behaviour. What could be the bottleneck that I am missing?
Make sure that your API running on Tomcat has enough headroom in terms of CPU, RAM, network, disk, etc., as a shortage might be slowing things down. You can use the JMeter PerfMon Plugin for this
Make sure that Tomcat itself is configured for high loads, as threads might be queuing up on the Tomcat HTTP Connector: if the executor has fewer threads than the number of connections you establish, requests will queue up even before reaching your API (see the sketch after this list)
Re-run your test with profiler telemetry in place, e.g. set up JProfiler or YourKit monitoring; this way you will learn where your API spends most of its time and what the underlying reason is
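To illustrate the queueing point, here is a small self-contained Java sketch (not Tomcat-specific; the pool size and timings are arbitrary). When a bounded pool has fewer worker threads than concurrent tasks, the extra tasks wait in the queue, so their end-to-end time grows even though each task itself is fast:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class QueueingDemo {
        public static void main(String[] args) throws Exception {
            // 2 worker threads, like an undersized connector/executor.
            ExecutorService pool = Executors.newFixedThreadPool(2);
            for (int i = 0; i < 10; i++) {
                final int id = i;
                final long submitted = System.nanoTime();
                pool.submit(() -> {
                    try {
                        Thread.sleep(100); // the "fast" downstream call (~100 ms)
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    long elapsedMs = (System.nanoTime() - submitted) / 1_000_000;
                    System.out.printf("request %d finished after %d ms%n", id, elapsedMs);
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

With 2 threads and 10 tasks of ~100 ms each, the last tasks report roughly 500 ms; the difference is pure queueing time, which is what an undersized connector or executor adds on top of an otherwise fast API.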

Hazelcast Management Centre frequently goes down

I have a three-node Hazelcast 3.7.5 cluster deployed on Windows, with mancenter-3.7.5 deployed in a Tomcat server (7.0.70) running as a Windows service.
The mancenter service frequently goes down. Monitoring the Tomcat server shows that heap consumption increases gradually and never falls. GC happens intermittently, but heap consumption grows so fast that the application is eventually suspended completely.
This happens when data is flowing into the map under heavy traffic.
Does anyone have an idea of the suggested Tomcat version for mancenter 3.7.5, and also the JVM options? Any quick answer is very helpful.

Log file writing extremely delayed in WebSphere App Server

I am experiencing an issue with delayed writes to the application logs for a Java EE web application running in IBM WebSphere v. 7.x. Logging statements take up to an hour to appear in the application logs.
The problem doesn't appear to be related to heavy loads; WAS is responding to page requests almost instantly, and I am testing against a box that isn't used for performance testing, and on a holiday no less -- there is very little activity on the server.
My guess would be that the thread associated with logging has been configured with very low priority, but I can't figure out where that would be configured via the admin console or the configuration files.
Has anyone else experienced this sort of issue with WebSphere?
It's possible you don't even have enough available threads in the thread pool. That is consistent with the page requests being fast, as those are handled by the WebContainer threads.
Try increasing it:
Servers > Application Servers > Thread pools > ...
I'm not sure exactly which one to increase the max value for. In the worst case, increase them all. Increase them generously, to be sure.
Other options:
Make sure you have enough disk space, or try to connect with JConsole to investigate.
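If attaching JConsole is not convenient, a hedged alternative is to dump thread states from inside the server with the standard ThreadMXBean; if the relevant pool really is exhausted, most of its threads will show up as RUNNABLE or BLOCKED. A minimal sketch (it has to run inside the WebSphere JVM, e.g. from a temporary debug servlet, to see its threads):

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    public class ThreadStateDump {
        // Call this from code running inside the server JVM.
        public static void dump() {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                System.out.printf("%-60s %s%n", info.getThreadName(), info.getThreadState());
            }
            System.out.println("live threads: " + threads.getThreadCount());
        }
    }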
