I am running 6 Redis nodes: 3 masters and 3 slaves, and every master has 1 slave.
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 172.17.0.5:6382 to 172.17.0.2:6379
Adding replica 172.17.0.6:6383 to 172.17.0.3:6380
Adding replica 172.17.0.7:6384 to 172.17.0.4:6381
The cluster is running and I can SET and GET keys.
I shut down master1 (172.17.0.2:6379); slave1 (172.17.0.5:6382) became master and the cluster kept running.
Then I shut down slave1 (172.17.0.5:6382) as well. When I tried to SET keys I got this error:
(error) CLUSTERDOWN The cluster is down
What I expected was that after shutting down master1 and slave1 the cluster would still be running and accept Redis operations, but the opposite happened.
What is the reason behind this?
Is it possible to solve this problem without starting master1 or slave1 again?
Because some slots are stored on master1 and slave1, if both of them are down, those slots will no longer be covered by any node in the cluster. When this happens, by default, the cluster goes down. You can modify this behavior by changing the cluster-require-full-coverage option.
Quote from redis.conf:
By default Redis Cluster nodes stop accepting queries if they detect
there is at least an hash slot uncovered (no available node is serving
it). This way if the cluster is partially down (for example a range of
hash slots are no longer covered) all the cluster becomes, eventually,
unavailable. It automatically returns available as soon as all the
slots are covered again.
However sometimes you want the subset of the cluster which is working,
to continue to accept queries for the part of the key space that is
still covered. In order to do so, just set the
cluster-require-full-coverage option to no.
cluster-require-full-coverage yes
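For example, to let the surviving masters keep serving the slots they still cover, you can change that setting to no, either in redis.conf or, if your Redis version allows it, at runtime (the host/port below is one of your remaining masters and is only illustrative; apply the change on every node that is still up):

cluster-require-full-coverage no

redis-cli -h 172.17.0.3 -p 6380 CONFIG SET cluster-require-full-coverage no

Keep in mind that keys hashing to the uncovered slots (0 - 5460 in your case) will still fail until master1 or slave1 comes back or those slots are reassigned.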
UPDATE:
In order to ensure all slots stay covered, you can normally set up a cluster with N masters and N + 1 slaves, assigning one slave to each master and letting the extra slave replicate from any master. When one of your masters goes down, its slave will become the new master; you can then make the extra slave replicate from that new master, as shown in the sketch below.
In short, you must ensure each master has at least one slave at any time.
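Reassigning the spare slave is done with CLUSTER REPLICATE; a rough sketch (host, port and node ID are placeholders for your spare slave and the newly promoted master):

redis-cli -h <spare-slave-ip> -p <spare-slave-port> CLUSTER REPLICATE <new-master-node-id>

You can look up the new master's node ID in the output of CLUSTER NODES.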
We are currently setting up an environment with two elasticsearch instances (clustered servers).
Since it's clustered, we need to make sure that data (indexes) are synched between the two instances.
We do not have the possibility to setup an additional (3rd) server/instance to act as the 'master'.
Therefore we have configured both instances as master and data nodes, so instance 1 is master & data node and instance 2 is also master & data node.
The synchronization works fine when both instances are up and running. But when one instance is down, the other keeps trying to connect to the instance that is down, which obviously fails because that instance is down. As a result, the node that is up stops functioning as well, because it cannot connect to its 'master' node (the node that is down), even though the surviving instance is itself also a 'master'.
The following errors are logged in this case:
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/2/no master];
org.elasticsearch.transport.ConnectTransportException: [xxxxx-xxxxx-2][xx.xx.xx.xx:9300] connect_exception
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: xx.xx.xx.xx/xx.xx.xx.xx:9300
In short: two Elasticsearch master-eligible instances in a clustered setup. When one is down, the other stops functioning because it cannot connect to the 'master' instance.
Desired result: If one of the master instances is down, the other should continue functioning (without throwing errors).
Any recommendations on how to solve this, without having to set up an additional server that is the 'master' and the other two the 'slaves'?
Thanks
To be able to hold a master election, at least 2 master-eligible nodes must be available.
That's why you must have a minimum of 3 master-eligible nodes if you want your cluster to survive the loss of one node.
You can just add a small, specialized master node by setting all other roles to false.
This node can run with very few resources.
As described in this post:
https://discuss.elastic.co/t/master-node-resource-requirement/84609
Dedicated master nodes need persistent storage, but not a lot of it. 1-2 CPU cores and 2-4GB RAM is often sufficient for smaller deployments. As dedicated master nodes do not store data you can also set the heap to a higher percentage (75%-80%) of total RAM that is recommended for data nodes.
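If you go that route on a pre-7.x cluster like the ones discussed here, the dedicated master node is just a matter of role flags in elasticsearch.yml; a minimal sketch (node.ingest exists from 5.x onward, and on 7.9+ the equivalent is node.roles: [ master ]):

node.master: true
node.data: false
node.ingest: false

With three master-eligible nodes, on pre-7.x versions you would then also set discovery.zen.minimum_master_nodes: 2 on all of them.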
If there is no option to add 1 more node, then you can set minimum_master_nodes=1. This will keep your ES cluster up even if only 1 node is up, but it may lead to a split-brain issue, because a single visible node is then enough to form a cluster.
In that scenario you have to restart the cluster to resolve the split brain.
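(If you do go this route, the setting goes in elasticsearch.yml on both nodes, for example:

discovery.zen.minimum_master_nodes: 1

but, as noted above, this trades split-brain protection for availability.)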
I would suggest you upgrade to Elasticsearch 7.0 or above. There you can live with two master-eligible nodes and the split-brain issue will not occur.
You should not have 2 master-eligible nodes in the cluster, as it's very risky and can lead to a split-brain issue.
Master nodes don't require many resources, but as you have just two data nodes, you can still live without a dedicated master node (but please be aware that this has downsides) just to save the cost.
So simply remove the master role from one of the two nodes and you should be good to go.
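Removing the master role is again a one-line change in elasticsearch.yml on that node (pre-7.x setting name; adjust for your version):

node.master: false

After a restart of that node, only the other instance is master-eligible, so two competing masters can never be elected.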
I'm testing Azure Service Fabric on-premises functionality and I have some trouble with a cluster installed using the default configuration files provided.
As soon as some of the nodes go offline (I shut down the hosts), the whole cluster becomes unresponsive (for example: the Service Fabric Explorer becomes unavailable on all node IPs).
For example:
If I create a 3-node cluster (Bronze), the whole cluster becomes unavailable when I shut down one node.
If I create a 5-node cluster (same behavior with the Bronze and Silver models), the whole cluster becomes unavailable when I shut down three nodes.
If I create a 6-node cluster, the whole cluster becomes unavailable when I shut down three nodes.
I also tried disabling the nodes with PowerShell after shutting them down, but the result is the same.
I was thinking that as long as one node was still running, the cluster would continue to work. But it seems that the cluster becomes unavailable as soon as 50% of the nodes are off, and that the cluster needs a minimum of 3 nodes to operate.
Is this the normal behavior, or can I change the configuration? How can I change it for an on-premises installation?
Regards
The minimum size of VMs for the primary node type is determined by the durability tier you choose.
The number of nodes you can lose is determined by quorum.
Three nodes: with three nodes (N=3), the requirement to create a
quorum is still two nodes (3/2 + 1 = 2). This means that you can lose
an individual node and still maintain quorum
(So your remark about the 3-node cluster doesn't match the documentation. Are you sure it really became unavailable, and not just unhealthy?)
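Applying the same N/2 + 1 rule from the quoted documentation to your other cases (a quick sketch of the arithmetic, assuming quorum is computed over all the nodes you listed):

5 nodes: quorum = 5/2 + 1 = 3; shutting down 3 leaves 2 nodes, which is below quorum.
6 nodes: quorum = 6/2 + 1 = 4; shutting down 3 leaves 3 nodes, which is below quorum.

That matches the behavior you observed for the 5- and 6-node clusters.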
I have two dedicated Windows Servers (Windows Server 2012 R2, 128 GB memory on each server) for ES (2.2.0). I have one node on each server and the two nodes form a cluster. What is the proper value for
discovery.zen.minimum_master_nodes
I read this general rule in elasticsearch.yml:
Prevent the "split brain" by configuring the majority of nodes (total
number of nodes / 2 + 1):
I saw this SO thread:
Proper value of ES_HEAP_SIZE for a dedicated machine with two nodes in a cluster
There is an answer saying:
As described in Elasticsearch Pre-Flight Checklist, you can set
discovery.zen.minimum_master_nodes to at least (N/2)+1 on clusters
with N > 2 nodes.
Please note "N > 2". What is the proper value in my case?
N is the number of ES nodes (not physical machines but ES processes) that can be part of the cluster.
In your case, with one node on each of two machines, N = 2 (note that it was 4 in the linked thread), so the formula N/2 + 1 yields 2, which means that both of your nodes MUST be eligible as master nodes if you want to prevent split-brain situations.
If you set that value to 1 (which is the default value!) and you experience networking issues and both of your nodes can't see each other for a brief moment, each node will think it is alone in the cluster and both will elect themselves as master. You end up in a situation where you have two masters and that's not a good thing. Whereas if you set that value to 2 and you experience networking issues, the current master node will stay elected and the second node will never decide to elect itself as new master. Whenever network is back up, the second node will rejoin the cluster and continue serving requests.
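Concretely, for this two-node cluster that means having the following in elasticsearch.yml on both nodes (on pre-7.x versions the value can also be changed at runtime through the cluster settings API):

discovery.zen.minimum_master_nodes: 2

The trade-off is that with the value at 2, the cluster cannot elect a master (and therefore stops accepting writes) whenever either node is down.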
The ideal topology is to have 3 dedicated master nodes (i.e. with master: true and data: false) and have discovery.zen.minimum_master_nodes set to 2. That way you'll never have to change the setting regardless of how many data nodes are part of your cluster.
So the N > 2 constraint should indeed be N >= 2, but I guess it was somehow implied, because otherwise you're creating a fertile ground for split brain situations.
Interestingly, in ES 7 discovery.zen.minimum_master_nodes no longer needs to be defined.
https://www.elastic.co/blog/a-new-era-for-cluster-coordination-in-elasticsearch
I have a 3 node cluster with minimum_master_nodes set to 2. If I shut down all nodes except the master, leaving one node online, the cluster is no longer operational.
Is this by design? It seems like the node that was the master should remain operational; instead I get errors like this:
{"error":"MasterNotDiscoveredException[waited for [30s]]","status":503}
All the other settings are stock and I am using the aws cloud plugin.
Yes, this is intentional.
Split brain
Imagine a situation where the other 2 nodes were still running but couldn't communicate with the third node - you'd end up with two clusters, otherwise known as a "split brain".
As the two clusters could be updating and deleting data independently of each other, recovery would be very difficult - you wouldn't have a single source of truth for the data.
By setting minimum_master_nodes to (n/2)+1 (where n is the number of nodes) you can prevent a split brain.
Single Node
If you know that the first two nodes have definitely died and are not coming back, you can set minimum_master_nodes to 1 on the remaining node (and also set it to 1 on the other nodes before you restart them).
There is also a "no master block" option that lets you control what happens when you don't have a valid cluster - e.g. you could make the remaining node read-only until the cluster is re-established.
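A sketch of what that could look like in elasticsearch.yml on the remaining node (discovery.zen.no_master_block accepts all or write on pre-7.x versions; write keeps reads working while there is no master):

discovery.zen.minimum_master_nodes: 1
discovery.zen.no_master_block: write

Just remember to set minimum_master_nodes back to 2 once the other nodes return, to restore the split-brain protection.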
I recently created a cluster with five servers:
master
node01
node02
node03
node04
To have more "workers" I added the NameNode host to the list of slaves in /etc/hadoop/slaves.
This works: the master performs some MapReduce jobs.
Today I want to remove this node from the workers list (it is too CPU-intensive for it). I want to set dfs.exclude in my hdfs-site.xml, but I am worried because this is also the master server.
Could someone confirm that there is no risk in performing this operation?
Thanks,
Romain.
If there is data stored in the master node (as there probably is because it's a DataNode), you will essentially lose that data. But if your replication factor is more than 1 (3 is the default), then it doesn't matter as Hadoop will notice that some data is missing (under-replicated) and will start replicating it again on other DataNodes to reach the replication factor.
So, if your replication factor is more than 1 (and the cluster is otherwise healthy), you can just remove the master's data (and make it again just a NameNode) and Hadoop will take care of the rest.
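For completeness, the usual decommissioning flow looks roughly like this (the exclude-file path is only an example; use whatever path your setup expects). In hdfs-site.xml on the NameNode:

<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>

Then list the master's hostname in that file and run:

hdfs dfsadmin -refreshNodes

The NameNode will re-replicate the blocks held by that DataNode onto the remaining nodes before marking it decommissioned; afterwards you can also remove the host from /etc/hadoop/slaves so it no longer runs worker tasks.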