GemFire cluster suddenly goes down because of ClusterConfigurationNotAvailableException: Unable to retrieve cluster configuration from the locator
We have a GemFire cluster with 2 locators and 2 servers. We bootstrap the GemFire cache servers using cache.xml and Spring Data GemFire XML via the Spring Boot initializer.
We have a client Spring Boot service which connects to the cluster.
The GemFire cluster randomly goes down due to ClusterConfigurationNotAvailableException: Unable to retrieve cluster configuration from the locator. What could be the reason for it? After a restart it works fine for a day or two without issues, and then the problem comes back. It impacts our high availability. Please help us fix this.
org.apache.geode.GemFireConfigException: cluster configuration service not available
at org.apache.geode.internal.cache.GemFireCacheImpl.requestSharedConfiguration(GemFireCacheImpl.java:1025)
at org.apache.geode.internal.cache.GemFireCacheImpl.initialize(GemFireCacheImpl.java:1149)
at org.apache.geode.internal.cache.GemFireCacheImpl.basicCreate(GemFireCacheImpl.java:758)
at org.apache.geode.internal.cache.GemFireCacheImpl.create(GemFireCacheImpl.java:735)
at org.apache.geode.distributed.internal.InternalDistributedSystem.reconnect(InternalDistributedSystem.java:2748)
at org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect(InternalDistributedSystem.java:2518)
at org.apache.geode.distributed.internal.InternalDistributedSystem.disconnect(InternalDistributedSystem.java:993)
at org.apache.geode.distributed.internal.DistributionManager$MyListener.membershipFailure(DistributionManager.java:4354)
at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.uncleanShutdown(GMSMembershipManager.java:1556)
at org.apache.geode.distributed.internal.membership.gms.mgr.GMSMembershipManager.lambda$forceDisconnect$0(GMSMembershipManager.java:2593)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.geode.internal.config.ClusterConfigurationNotAvailableException: Unable to retrieve cluster configuration from the locator.
at org.apache.geode.internal.cache.ClusterConfigurationLoader.requestConfigurationFromLocators(ClusterConfigurationLoader.java:259)
at org.apache.geode.internal.cache.GemFireCacheImpl.requestSharedConfiguration(GemFireCacheImpl.java:988)
... 10 more
Expected behavior is high availability of the GemFire cluster.
By default, whenever a GemFire server starts up (or automatically reconnects to the cluster after an unexpected shutdown), it tries to recover the cluster configuration from any locator; if it fails to do so, the member just shuts itself down, which is what's happening according to the attached stack trace (see the occurrence of org.apache.geode.distributed.internal.InternalDistributedSystem.tryReconnect in the stack). I'd focus the analysis on why the member was disconnected in the first place; the subsequent failure to reconnect is just a consequence, not the root cause of the issue.
Either way, if you're just using individual xml files to configure your members and don't want to use the Cluster Configuration Service at all, then you can start your locators with the property --enable-cluster-configuration=false (the default is true) and your servers with --use-cluster-configuration=false (the default is also true); this will prevent the servers from trying to start up using the cluster configuration from the locators.
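For reference, with gfsh that looks something like this (the member names are hypothetical):
gfsh> start locator --name=locator1 --enable-cluster-configuration=false
gfsh> start server --name=server1 --use-cluster-configuration=false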
Hope this helps. Cheers.
Related
I have an application using Redis. The system is implemented with Java Spring and uses the Jedis package to connect to Redis, with the following configuration:
jedis.pool.host=redisServer-IP
So the application connects to the Redis server at redisServer-IP and works fine. But because of the lack of memory on a single server, and for HA capability, I need to use a Redis cluster; I created one with Docker Compose, following the guide here.
The Redis cluster is also working fine, with three masters and three replicas.
I just need to understand: can the Redis Cluster work with a single endpoint, since I can only set a single endpoint in the jedis.pool.host configuration above, or do I need a proxy in front of the Redis cluster?
NOTE: I cannot make any changes to my application.
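For context, a cluster-aware client such as Jedis's JedisCluster discovers the full cluster topology from a single seed endpoint and follows MOVED/ASK redirections itself, which a plain JedisPool pointed at one host does not do. A minimal sketch, reusing the host name from the question:
import java.util.Collections;
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

public class RedisClusterExample {
    public static void main(String[] args) {
        // One seed node is enough: JedisCluster fetches the slot-to-node
        // mapping from it and routes each command to the right master.
        JedisCluster cluster = new JedisCluster(
                Collections.singleton(new HostAndPort("redisServer-IP", 6379)));
        cluster.set("greeting", "hello");
        System.out.println(cluster.get("greeting"));
    }
}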
I have two Docker containers that I launch with docker-compose:
One holds a Cassandra instance
One holds a Spring Boot application that tries to connect to that Cassandra instance
However, the Spring Boot application will always fail, because it's trying to connect to a Cassandra instance that is not yet ready to accept connections.
I have tried:
Using restart:always in Docker-compose
This still doesn't always work, because Cassandra might be up 'enough' to no longer crash the Spring Boot application, but not up 'enough' to have successfully created the table/column family. On top of that, this is a very hacky solution.
Using healthcheck
It seems like healthcheck in compose doesn't have restart capabilities
Using a bash script as entrypoint
In the hope that I could use netstat, ping, ... whatever to determine the readiness state of Cassandra
Right now the only thing that really works is using that same bash script to sleep the process for x seconds and then start the jar. This is even more hacky...
Does anyone have an idea on how to solve this?
Thanks!
Does the Spring Boot service defined in the docker-compose.yml use depends_on for the Cassandra service? Note that depends_on on its own only controls startup order; to have the service start only once Cassandra is actually ready, it has to be combined with a healthcheck using the condition: service_healthy form of the compose file format 2.1 (a sketch follows the links below).
https://docs.docker.com/compose/compose-file/#depends_on
Take a look at this GitHub repository to find a healthcheck for the Cassandra service.
https://github.com/docker-library/healthcheck
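A hypothetical sketch of that combination, assuming the compose file format 2.1 and the cassandra:3 image (note that this condition form was dropped in the version 3 format, which is part of what the conclusion below runs into):
version: "2.1"
services:
  cassandra:
    image: cassandra:3
    healthcheck:
      # cqlsh ships in the cassandra image; the node is healthy once it answers CQL
      test: ["CMD", "cqlsh", "-e", "describe keyspaces"]
      interval: 15s
      timeout: 10s
      retries: 10
  app:
    build: .
    depends_on:
      cassandra:
        condition: service_healthy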
CONCLUSION
After some discussion we found out that docker-compose does not seem to provide functionality for waiting until services are up and healthy, the way Kubernetes and OpenShift do (see comments below). The recommended workaround is a wrapper script (docker-entrypoint.sh) that waits for the dependent service to come up, but that requires binaries (such as the cassandra client) which the actual service image shouldn't need to carry. Additionally, the service depending on Cassandra could never come up if Cassandra doesn't, which shouldn't happen.
A main point of microservices is that they have to be resilient to failures and are not supposed to die, or fail to come up, when a dependent service is currently unavailable or unexpectedly disappears. 'Unexpected' is actually the wrong word in this context, because you should always expect such issues in a distributed environment; even with docker-compose you will face issues like those discussed in this topic. Therefore the microservice should be implemented so that it retries the connection after startup and after an unexpected disconnect.
The following link points to a tutorial which helped to integrate Cassandra properly into a Spring Boot application. It shows how to implement obtaining a Cassandra connection with retry behavior, so the service is resilient to a missing Cassandra database and no longer fails to start; a minimal sketch of that retry idea follows the link. Hope this helps others as well.
https://dzone.com/articles/containerising-a-spring-data-cassandra-application
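A minimal sketch of such a retry loop, assuming the DataStax Java driver 3.x (the class and parameter names below are illustrative, not taken from the tutorial):
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.exceptions.NoHostAvailableException;

public class CassandraRetryConnector {

    // Keep retrying until Cassandra answers, instead of dying on startup.
    public static Session connectWithRetry(String contactPoint, int maxAttempts)
            throws InterruptedException {
        for (int attempt = 1; ; attempt++) {
            try {
                Cluster cluster = Cluster.builder()
                        .addContactPoint(contactPoint)
                        .build();
                return cluster.connect(); // throws if no node is reachable yet
            } catch (NoHostAvailableException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(5_000L); // back off before the next attempt
            }
        }
    }
}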
I am setting up an application connecting to MongoDB with high availability.
I have studied the documentation and set up the replica set successfully through
spring.data.mongodb.uri=mongodb://user:secret@mongo1.example.com:12345,mongo2.example.com:23456/test
As the application property file is fixed, the application has to be restarted if I change spring.data.mongodb.uri.
If I add a new replica member in Mongo, do I need to restart my application with an updated property?
Or is the old configuration good enough, and the Mongo driver will automatically connect to the new replica member for me?
If you are loading properties from the file, you need to restart the application once the property is updated.
Otherwise, you need to use a global property-management tool like Consul, which reloads the property values in the application when they change (@RefreshScope).
In your case, once the property is changed you need to disconnect and reconnect to MongoDB in code.
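Worth noting: the seed list in the connection string does not have to enumerate every member. The driver monitors the replica set topology and discovers newly added members through any reachable seed, so adding a member does not by itself require a restart. Declaring the replica set name makes this explicit; rs0 below is a hypothetical name:
spring.data.mongodb.uri=mongodb://user:secret@mongo1.example.com:12345,mongo2.example.com:23456/test?replicaSet=rs0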
I want to use ZooKeeper to synchronize my distributed services via ZooKeeper ephemeral nodes.
The idea is the following: every node in the topology will, on startup, create a ZooKeeper session and ephemeral nodes. On node restart or failure, these nodes will disappear.
I'm going to implement it using Spring Boot. Right now I'm in doubt about which project and Maven dependency to use in order to have ZooKeeper client autoconfiguration, be able to create a ZooKeeper session on application startup, be able to create ephemeral nodes from this client, and use ZooKeeper transactions.
Right now I'm looking at Spring Cloud Zookeeper, but I'm not sure it's the right one for this purpose. Could you please point me to the right Spring Boot ZooKeeper project and show a small example of how to achieve what I have described above?
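For illustration, the ephemeral-node registration described above is straightforward with Apache Curator, the ZooKeeper client library that Spring Cloud Zookeeper itself builds on; a minimal sketch with a hypothetical connect string and path:
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class EphemeralRegistration {
    public static void main(String[] args) throws Exception {
        // Session-scoped client; ephemeral nodes vanish when this session ends.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Create an ephemeral node for this service instance; ZooKeeper
        // removes it automatically if the process dies or restarts.
        client.create()
              .creatingParentsIfNeeded()
              .withMode(CreateMode.EPHEMERAL)
              .forPath("/services/my-service/node-1", "payload".getBytes());

        Thread.sleep(Long.MAX_VALUE); // keep the session (and the node) alive
    }
}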
I want to create a distributed cluster in Spring XD.
I am able to create a cluster with a single admin, one ZooKeeper, one instance of Redis, and HSQLDB.
But when I try to do that with multiple instances of ZooKeeper, HSQLDB, and Redis, I'm not able to configure it correctly.
You should only have a single instance of ZooKeeper, HSQLDB, and Redis. All xd-admins should be configured to connect to the same instance of each of these services, and so should the xd-containers.
Like Thomas mentioned, the idea is that you have your (multiple) instances of admin and containers deployed, all connecting to the same ZooKeeper, Redis, HSQLDB, and RabbitMQ.
Why do you want to start multiple instances of these applications?
ZooKeeper provides the topology of the cluster and manages deployments. It also keeps track of nodes going up and down, avoiding single points of failure when you have many xd-admin instances (one is the leader and the others replicate; they will take over leadership if the current one fails).
Or are you talking about running those instances in parallel to avoid a SPOF? In that case, you should try to dedicate an entire VM to each of those applications.