Hazelcast Management Center frequently goes down - Tomcat 7

I have a three-node Hazelcast cluster (version 3.7.5) deployed on Windows, with mancenter-3.7.5 deployed in a Tomcat server (7.0.70) running as a Windows service.
The mancenter service frequently goes down. Monitoring the Tomcat server shows that heap consumption grows gradually and never falls back. GC happens intermittently, but the heap grows so fast that the application is eventually suspended completely.
This happens when data is flowing into a map under heavy traffic.
Does anyone know of a suggested Tomcat version for Management Center 3.7.5, and suitable JVM options? Any quick answer would be very helpful.
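As a hedged starting point while waiting for an authoritative answer, JVM options along these lines could be set on the Tomcat Windows service (via the tomcat7w.exe service configurator or a setenv.bat); the heap sizes and GC choice are my own assumptions to tune, not official Management Center recommendations:

```
-Xms1024m
-Xmx2048m
-XX:+UseG1GC
-Xloggc:C:\tomcat\logs\gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
```

At minimum, the GC log makes it possible to tell whether the growing heap is live data (a leak in mancenter) or collectable garbage the GC cannot keep up with under heavy traffic.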

Related

Apache Ignite: Possible too long JVM pause: 714 milliseconds

I have a setup with an Apache Ignite server and a SpringBoot application as a client, in a Kubernetes cluster.
During a performance test, I started to notice the log line below showing up frequently in the SpringBoot application:
org.apache.ignite.internal.IgniteKernal: Possible too long JVM pause: 714 milliseconds
According to this post, this means the "JVM is experiencing long garbage collection pauses", but the infrastructure team has confirmed to me that we have included +UseG1GC and +DisableExplicitGC in the server JVM options, and this log line only shows up in the SpringBoot application.
Please help with the following questions:
Is the GC happening in the Client(SpringBoot application) or Server node?
What will be that impact of long GC pause?
What should I do to prevent the impact?
Do I have to configure the JVM option in SpringBoot application as well?
Is the GC happening in the Client(SpringBoot application) or Server node?
The GC warning is logged to the log of the node that is suffering the problem.
What will be that impact of long GC pause?
Such pauses decrease overall performance. Also, if a pause is longer than failureDetectionTimeout, the node will be disconnected from the cluster.
What should I do to prevent the impact?
General advice is collected here: https://apacheignite.readme.io/docs/jvm-and-system-tuning. You can also enable GC logs to get a full picture of what is happening.
Do I have to configure the JVM option in SpringBoot application as well?
It looks like you should, because the problems are on the client's node.
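To confirm where the pauses happen, GC logging can be enabled on the SpringBoot client as well. A minimal sketch for a Java 8 JVM (the log path is a placeholder, and these flag spellings changed in Java 9+, where unified logging with -Xlog:gc replaces them):

```
java -XX:+UseG1GC -XX:+DisableExplicitGC \
     -Xloggc:/var/log/app/client-gc.log \
     -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime \
     -jar app.jar
```

If the client's GC log shows stop-the-world pauses matching the timestamps of the "Possible too long JVM pause" messages, the client JVM is the one that needs tuning.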

Session Persistence Hazelcast client initialization when server is offline

We are trying to replicate the WebSphere Traditional (5/6/7/8/9) behaviour regarding session persistence for servlets and HTTP, but with Hazelcast and Tomcat. Let me explain...
WebSphere, even when configured as a client to a replication domain, keeps a local register of session data. And this local register works fine even if the server processes that should keep the replicated data are shut down from the very first moment. That is, you start the client, and session persistence works within the servlet container. Obviously, you cannot expect to recover your session in another servlet container if the first one crashes, but your applications work anyway.
On the other hand, a Hazelcast client on a Tomcat container expects the Hazelcast server (at least one member of the cluster) to be up and running in order to initialize. If no cluster member is available, initialization fails, and... web applications in the Tomcat servlet container do not start right. They won't answer any request.
Furthermore, once initialization fails, the only way to recover is to shut down and restart the Tomcat web containers (once a Hazelcast cluster member is online).
This behaviour is a bit harsh on system administrators: no one can guarantee that a backup service such as distributed session persistence is online all the time. That means that launching a Tomcat client becomes a risky task, with a single point of failure by design, which is undesirable.
Now, maybe I overlooked something, maybe I got something wrong. So, did anyone ever manage to start a Hazelcast client without servers, and how? For us, the difference is decisive: if we cannot make the web container start with the Hazelcast server offline, then we must keep going with WebSphere.
We have been trying it on CentOS 7.5 on VirtualBox 5.2.22, and our Tomcat version is 8.5. Hazelcast client and server are 3.11.1/2.
<group>
<name>Integracion</name>
<password></password>
</group>
<network>
<cluster-members>
<address>hazelcastsrv1</address>
<address>hazelcastsrv2</address>
</cluster-members>
</network>
Sadly, we expect exactly what we get: our reading of the Hazelcast manual suggests that offline servers won't allow Tomcat to serve applications. But we cannot believe what we read, because it would make the library unsafe in a distributed context. We expect to be wrong, and that there is good news around the corner.
Hazelcast is not "a single point of failure by design". The design is to avoid a single point of failure. Data is mirrored across the nodes by default.
It's a data grid, you run as many nodes as capacity and resilience requires, and they cluster together.
If you need 3 nodes to be up for successful operations, and also anticipate that 1 might go down, then you need to run 4 in total. Should that 1 failure happen, you have a cluster surviving that is big enough.
Power-on/power-off order is not relevant in Hazelcast, as long as you give the remaining nodes, during power-off, enough time to let repartitioning complete. For example, in a 4-node cluster, if you take out 1 node and give the other 3 room to complete repartitioning, then you don't lose the data. If you take out 2 nodes together, then the cluster will be left without the data whose backup was stored on one of the 2 nodes you took out.
For startup, the sequence is not relevant either, as each node owns a certain set of partitions determined by consistent hashing. This ownership continues to change as nodes leave or join a running cluster.
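On the original question of starting the client while all members are offline: if I remember correctly, Hazelcast 3.9+ clients support an asynchronous start mode that lets the client (and therefore Tomcat) come up before any member is reachable. This is a sketch against the 3.x hazelcast-client.xml schema and is worth verifying against the 3.11 documentation:

```xml
<hazelcast-client xmlns="http://www.hazelcast.com/schema/client-config">
    <!-- async-start: the client instance is created without waiting for a cluster connection;
         reconnect-mode ASYNC: keep retrying in the background after a disconnect -->
    <connection-strategy async-start="true" reconnect-mode="ASYNC"/>
</hazelcast-client>
```

Note the trade-off: until a connection is established, operations fail with HazelcastClientOfflineException, so sessions would be unavailable rather than locally cached. It does not replicate WebSphere's local register, but it should remove the hard startup dependency.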

web application runs much faster in embedded tomcat than in standalone tomcat

I have a spring-boot web application (mostly used through REST calls) that I can run using mvn exec, which starts an embedded Tomcat (8.5.11), or build as a WAR and deploy into a standalone Tomcat (Debian stock 8.5.14-1~bpo8+1). Both are configured the same way, using
To our utmost surprise, the embedded Tomcat seems to be much faster under high load (a small test sequence with 200+ threads using JMeter). At 600 threads, for example:
The standalone Tomcat has very large response times while having a relatively low load of 50-70 (the server has 64 cores and can run 128 threads), and low I/O usage.
The embedded Tomcat has a load of 150-200, faster response times, and high I/O usage (the database seems to be the limiting factor here, but it degrades gracefully: 600 threads is about twice as slow as 300 threads).
Supposedly the configuration is the same for both Tomcats, so I am currently quite troubled by this. I really would not like to run the embedded Tomcat in production if I can help it.
Does anyone have an idea:
what the cause for this performance disparity may be, and
how we can reliably compare the configuration for two tomcats?
Update
I ran some more tests and discovered a significant difference after looking through the garbage collector logs: with 600 JMeter threads, the embedded Tomcat spent about 5% of its time GCing, while the standalone Tomcat spent about 50%. I calculated these numbers with an awk script, so they may be slightly mis-parsed, but manually checking the GC logs seems to corroborate them. It still does not explain why one of them is GCing all the time and the other is not...
One more update
I managed to speed up the standalone Tomcat by switching the garbage collector to G1. Now it uses about 20% of elapsed time for garbage collection and never exceeds 1 s for any single GC run. The standalone Tomcat is now only 20-30% slower than the embedded one. Interestingly, using G1 in the embedded Tomcat had no real effect on its performance; GC overhead is still around 15% there.
This is by no means a solution, but it helped to close the gap between the two tomcats and thus now the problem is not so critical.
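For completeness, the G1 switch described above amounts to something like the following in the standalone Tomcat's setenv.sh (the pause-time target and log path are my own placeholders, not values from the original setup):

```shell
# setenv.sh - enable G1 and GC logging for the standalone Tomcat (Java 8 flag spellings)
export CATALINA_OPTS="$CATALINA_OPTS -XX:+UseG1GC -XX:MaxGCPauseMillis=200 \
  -Xloggc:$CATALINA_BASE/logs/gc.log -XX:+PrintGCDetails"
```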
Check the memory parameters for your standalone Tomcat and your spring boot application, especially the java heap size.
My guess is that your standalone Tomcat has a value for Xmx set in the startup script (catalina.sh and/or setenv.sh), say for example 1 Gb, which is much lower than what your Spring Boot app is using.
If you haven't specified a value for Xmx on the command line for your spring boot app, it will default to 25% of your physical memory. If your server has 16 Gb of RAM, that'll be 4Gb...
I'd recommend running your tests again after making sure the same JVM parameters are in use (Xms, Xmx, various GC options, ...). If unsure, inspect the running JVMs with jvisualvm or a similar tool.
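One way to compare the configuration the two JVMs are actually running with (rather than what the startup scripts intend) is jcmd; the pid placeholders below are hypothetical:

```shell
# Flags and command line actually in effect in each running Tomcat JVM
jcmd <pid> VM.flags
jcmd <pid> VM.command_line
# Platform defaults on this machine (e.g. the ~25% of RAM default for MaxHeapSize)
java -XX:+PrintFlagsFinal -version | grep -i maxheapsize
```

Diffing the VM.flags output of the two processes should immediately show whether the heap sizes or GC settings diverge.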

Elasticsearch load is not distributed evenly

I am facing a strange issue with Elasticsearch. I have 8 nodes with the same configuration (16 GB RAM and an 8-core CPU).
One node, "es53node6", always has a high load, as shown in the screenshot below. Also, 5-6 nodes were getting stopped automatically yesterday, every 3-4 hours.
What could be the reason?
ES version : 5.3
There can be a fair share of reasons. Maybe all the data is stored on that node (which should not happen by default), or maybe you are sending all requests to that single node.
Also, there is no automatic stopping built into Elasticsearch. You can configure Elasticsearch to stop the JVM process when an out-of-memory error occurs, but this is not enabled by default, as it relies on a more recent JVM.
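If I recall correctly, that opt-in behaviour corresponds to a HotSpot flag that can be added to Elasticsearch's config/jvm.options file; it requires JDK 8u92 or newer, which is the "more recent JVM" caveat:

```
# config/jvm.options - terminate the JVM on the first OutOfMemoryError
# (requires JDK 8u92+; verify against your Elasticsearch version's docs)
-XX:+ExitOnOutOfMemoryError
```

A process supervisor (systemd, etc.) restarting the JVM after such an exit could also explain nodes appearing to "stop automatically" every few hours if they are under memory pressure.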
You can use the hot threads API to check where the CPU time is spent in Elasticsearch.
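The hot threads API is a plain HTTP GET; the host, port, and node name below are placeholders for your cluster:

```shell
# Hot threads for the suspect node only (drop the node name for all nodes)
curl -s 'http://localhost:9200/_nodes/es53node6/hot_threads?threads=5'
```

The response is human-readable text listing the busiest threads and their stack traces, which usually points at the cause (heavy searches, merges, GC, etc.).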

Jruby Rails app on Tomcat CPU Usage spikes

This might also belong on Server Fault. It's kind of a combo of server config and code (I think).
Here's my setup:
Rails 2.3.5 app running on jruby 1.3.1
Service Oriented backend over JMS with activeMQ 5.3 and mule 2.2.1
Tomcat 5.5 with opts: "-Xmx1536m -Xms256m -XX:MaxPermSize=256m -XX:+CMSClassUnloadingEnabled"
Java jdk 1.5.0_19
Debian Etch 4.0
Running top, every time I click a link on my site, I see my Java process's CPU usage spike. If it's a small page, it's sometimes just 10% usage, but on a more complicated page my CPU goes up to 44% (never above; I'm not sure why). In that case, a request can take upwards of minutes while my server's load average steadily climbs to 8 or greater. This is just from clicking one link that loads a few requests from some services, nothing too complicated. The Java process's memory usage hovers around 20% most of the time.
If I leave it for a bit, load average goes back down to nothing. Clicking a few more links, climbs back up.
I'm running a small amazon instance for the rails frontend and a large instance for all the services.
Now, this is obviously unacceptable. A single user can spike the load average to 8, and with two people using it, it maintains that load average for the duration of our use of the site. I'm wondering what I can do to inspect what's going on. I'm at a complete loss as to how to debug this. (It doesn't happen locally when I run the Rails app through JRuby directly, outside the Tomcat container.)
Can someone enlighten me as to how I might inspect my JRuby app to find out how it could possibly be using such huge resources?
Note: I noticed this a little before, seemingly at random, but now, after upgrading from Rails 2.2.2 to 2.3.5, I'm seeing it ALL THE TIME, and it makes the site completely unusable.
Any tips on where to look are greatly appreciated. I don't even know where to start.
Make sure that there is no unexpected communication between Tomcat and something else. I would check first whether:
the ActiveMQ broker is communicating with other brokers on your network. By default, an AMQ broker starts in OpenWire auto-discovery mode.
JGroups/multicast in general is communicating with something on your network.
This unnecessary load may result from processing messages coming from another application.
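Whatever the cause, the hot code path can usually be pinned down by correlating per-thread CPU usage with a JVM thread dump. A sketch (the pid/tid are placeholders; jstack ships with the JDK, including 1.5 on Linux):

```shell
top -H -p <pid>              # show per-thread CPU; note the TID of the hottest thread
printf '%x\n' <tid>          # convert that TID to hex
jstack <pid> > threads.txt   # dump all stacks; find the entry with nid=0x<hex tid>
```

The matching stack trace shows whether the time is going into Rails/JRuby code, JMS messaging, or something else entirely.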

Resources