Geo cluster with Pacemaker - quorum vs booth

I configured a geo cluster using pacemaker and DRBD.
The cluster has 3 nodes, each in a different geographic location.
The locations are fairly close to one another, and the link between them is fast enough for our requirements (around 80 MB/s).
I have one master node, one slave node, and a third node acting as arbitrator.
I use an AWS Route 53 failover DNS record to fail over between the nodes in the different sites.
A failover will happen from the master to the slave only if the slave has a quorum, thus ensuring it has communication to the outside world.
I have read that booth is recommended for failover between clusters/nodes in different locations, but having a quorum between different geographic locations seems to work very well.
I want to emphasize that I don't have a cluster of clusters - it is a single cluster, with each node in a different geo-location.
My question is - do I need booth in my case? If so - why? Am I missing something?

Booth helps with an overlay cluster consisting of multiple clusters running at different sites.
You have a single cluster, so you should be fine with just quorum.
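As a sanity check, you can confirm that your single cluster sees all three votes and that Pacemaker will stop resources on quorum loss. A minimal sketch, assuming the corosync/pcs tooling (pcs 0.9/0.10 syntax):

# Show vote and quorum state as corosync sees it; for this setup you
# would expect 3 total votes and quorum = 2, so any one node can fail
corosync-quorumtool -s

# Verify Pacemaker's reaction to losing quorum (the default is to stop resources)
pcs property show no-quorum-policy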

Related

Clustered elasticsearch setup (two master nodes)

We are currently setting up an environment with two Elasticsearch instances (clustered servers).
Since it's clustered, we need to make sure that data (indices) are synced between the two instances.
We do not have the possibility to set up an additional (3rd) server/instance to act as the 'master'.
Therefore we have configured both instances as master and data nodes: instance 1 is master & data node, and instance 2 is also master & data node.
The synchronization works fine when both instances are up and running. But when one instance is down, the other keeps trying to connect to it, which obviously fails because that instance is down. The node that is up then stops functioning as well, because it cannot connect to its 'master' node (which is the node that is down), even though the instance itself is also a 'master'.
The following errors are logged in this case:
org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/2/no master];
org.elasticsearch.transport.ConnectTransportException: [xxxxx-xxxxx-2][xx.xx.xx.xx:9300] connect_exception
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: xx.xx.xx.xx/xx.xx.xx.xx:9300
In short: two elasticsearch master instances in a clustered setup. When one is down, the other one does not function because it can not connect to the 'master' instance.
Desired result: If one of the master instances is down, the other should continue functioning (without throwing errors).
Any recommendations on how to solve this, without having to set up an additional server that is the 'master' and the other 2 the 'slaves'?
Thanks
To be able to elect a master, a minimum of 2 master-eligible nodes must be able to vote.
That's why you need a minimum of 3 master-eligible nodes if you want your cluster to survive the loss of one node.
You can simply add a small, specialized master node by setting all other roles to false.
This node can have very few resources.
As described in this post:
https://discuss.elastic.co/t/master-node-resource-requirement/84609
Dedicated master nodes need persistent storage, but not a lot of it. 1-2 CPU cores and 2-4GB RAM is often sufficient for smaller deployments. As dedicated master nodes do not store data you can also set the heap to a higher percentage (75%-80%) of total RAM than is recommended for data nodes.
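For illustration, a dedicated master-only node could be configured in elasticsearch.yml roughly like this (a sketch, assuming 5.x/6.x settings; in 7.9+ you would use node.roles: [ master ] instead):

# elasticsearch.yml on the small, dedicated master-only node
node.master: true    # eligible to be elected master
node.data: false     # stores no data
node.ingest: false   # runs no ingest pipelines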
If adding one more node is not an option, you can set
minimum_master_nodes=1. This will keep your ES cluster up even if only 1 node is up, but it may lead to a split-brain issue, because we have allowed a single node to form a cluster on its own.
In that scenario you have to restart the cluster to resolve the split brain.
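In elasticsearch.yml that risky setting would look like this (a sketch, pre-7.x naming):

# DANGER: lets a single node form a cluster on its own,
# which is exactly what makes split brain possible
discovery.zen.minimum_master_nodes: 1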
I would suggest you upgrade to Elasticsearch 7.0 or above. There you can live with two nodes, each master-eligible, and the split-brain issue will not occur.
You should not have 2 master-eligible nodes in the cluster, as it's very risky and can lead to a split-brain issue.
Master nodes don't require many resources, but as you have just two data nodes, you can still live without dedicated master nodes (though be aware that this has downsides) just to save cost.
So simply remove the master role from one of the nodes, as sketched below, and you should be good to go.
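Concretely, that means demoting one node in its elasticsearch.yml (a sketch, assuming pre-7.x settings):

# elasticsearch.yml on the node that should NOT be master-eligible
node.master: false   # data-only node, never joins master elections
node.data: true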

How to deal with split brain in a cluster with only two nodes?

I am learning some basic concepts of cluster computing and I have some questions to ask.
According to this article:
If a cluster splits into two (or more) groups of nodes that can no longer communicate with each other (a.k.a. partitions), quorum is used to prevent resources from starting on more nodes than desired, which would risk data corruption.
A cluster has quorum when more than half of all known nodes are online in the same partition, or for the mathematically inclined, whenever the following equation is true:
total_nodes < 2 * active_nodes
For example, if a 5-node cluster split into 3- and 2-node partitions, the 3-node partition would have quorum and could continue serving resources. If a 6-node cluster split into two 3-node partitions, neither partition would have quorum; pacemaker’s default behavior in such cases is to stop all resources, in order to prevent data corruption.
Two-node clusters are a special case.
By the above definition, a two-node cluster would only have quorum when both nodes are running. This would make the creation of a two-node cluster pointless.
Questions:
From the above I am confused: why can't we just stop all cluster resources, as in the 6-node cluster case? What is special about the two-node cluster?
You are correct that a two-node cluster can only have quorum when both nodes are in communication. Thus, if the cluster were to split, using the default behavior, the resources would stop.
The solution is to not use the default behavior. Simply set Pacemaker to no-quorum-policy=ignore. This will instruct Pacemaker to continue to run resources even when quorum is lost.
...But wait, what happens now if the cluster communication is broken but both nodes are still operational? Will they not consider their peers dead and both become active nodes? Then I would have two primaries, and potentially diverging data, or conflicts on my network, right? This issue is addressed via STONITH. Properly configured STONITH will ensure that only one node is ever active at a given time, essentially preventing split-brain from occurring at all.
An excellent article further explaining STONITH and its importance was written by LMB back in 2010: http://advogato.org/person/lmb/diary/105.html
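A minimal sketch of that two-node setup with pcs (node names, addresses, and credentials are placeholders, fence_ipmilan is just one example agent, and parameter names vary between fence-agent versions):

# Keep resources running even when quorum is lost
pcs property set no-quorum-policy=ignore

# STONITH is what actually prevents split brain: a node that loses contact
# with its peer fences it before taking over resources
pcs stonith create fence-node1 fence_ipmilan pcmk_host_list=node1 ipaddr=10.0.0.101 login=admin passwd=secret
pcs stonith create fence-node2 fence_ipmilan pcmk_host_list=node2 ipaddr=10.0.0.102 login=admin passwd=secret
pcs property set stonith-enabled=true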

About elasticsearch cluster

I need to provide many Elasticsearch instances for different clients, but hosted in my infrastructure.
For the moment it is only some small instances.
I am wondering whether it would be better to build one big Elasticsearch cluster with 3-5 servers to handle all instances, where each client gets a different index in this cluster and each index is distributed over the servers.
Or maybe another idea?
And another question about quorum: what is the quorum for ES, please?
thanks,
You don’t have to assign each client to a different index; the Elasticsearch cluster will automatically share the load among all nodes that hold shards.
If you are not sure how many nodes are needed, start with a small cluster and keep monitoring the cluster's health status. Add more nodes if server load is high; remove nodes if server load is low.
As the cluster grows, you may need to assign a dedicated role to each node. That way you will have more control over the cluster, and it is easier to diagnose problems and plan resources: for example, adding more master nodes to stabilize the cluster, more data nodes to increase search and indexing performance, and more coordinating nodes to handle client requests.
A quorum is defined as a majority of the master-eligible nodes in the cluster, as follows:
(master_eligible_nodes / 2) + 1
For example, with 3 master-eligible nodes the quorum is (3 / 2) + 1 = 2 (using integer division).

Elasticsearch: one big cluster vs. tribe node?

Problem descriptions:
- Multiple machines producing logs.
- On each machine we have logstash which filters the log files and sends them to a local elasticsearch
- We would like to keep the machines as separate as possible and avoid intercommunication
- But we would also like to be able to visualize all of these logs with a single Kibana instance
Approaches:
1. Make each machine a single-node ES cluster, and have one of the machines act as a tribe node with Kibana installed on it (while of course avoiding index conflicts).
2. Make all machines (nodes) part of a single cluster, with each node writing to a unique index of one shard, statically map each shard to its node, and of course run one Kibana instance for the cluster.
Question:
Which approach is more appropriate for the described scenario in terms of limiting inter-machine communication, cluster management, and maybe other aspects that I haven't thought about?
The tribe node exists precisely for requirements like these, so my advice is to use the tribe node setup.
With the second option:
There will be a cluster, but you will not use its benefits (replica shards, shard relocation, query performance, etc.).
Those same features will instead become pain points that generate configuration complexity and troubleshooting hell.
Besides shard allocation and node communication, there will be other cluster-related things to configure that nodes only have when they are in a cluster.
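For reference, a tribe node is configured by pointing it at each backing cluster in elasticsearch.yml, roughly like this (a sketch; cluster names are placeholders, and note that tribe nodes were deprecated in 5.4 in favor of cross-cluster search):

# elasticsearch.yml on the tribe/Kibana machine
tribe:
  logs_a:
    cluster.name: cluster_machine_a
  logs_b:
    cluster.name: cluster_machine_b
# each tribe entry also needs discovery settings pointing at that cluster's nodes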

How many masters in a three-node cluster?

I stumbled over the question of how many masters there can be in a three-node cluster. I came across a point in an article on the internet saying that search and index requests should not be sent to the elected master. Is that correct? So, if I have three nodes acting as masters (out of which one is the elected master), should I point incoming logs to be indexed and searched on the other master nodes, apart from the elected master? Please clarify. Thanks in advance.
In a three node cluster, all nodes most likely hold data and are master-eligible. That is the most simple situation in which you don't have to worry about anything else.
If you have a larger cluster, you can have a couple of nodes which are configured as dedicated master nodes. That is, they are master-eligible and they don't hold any data. For example you would have 3 dedicated master nodes and 7 data nodes (not master-eligible). Exactly one of the dedicated master nodes will always be the elected master.
The point is that since the dedicated master nodes don't hold data, they will not directly service index and search requests. If you send an index or search request to them, they have no choice but to delegate it to one of the 7 data nodes.
From the Elasticsearch Reference for Modules - Node:
dedicated master nodes are nodes with the settings node.data: false
and node.master: true. We actively promote the use of dedicated master
nodes in critical clusters to make sure that there are 3 dedicated
nodes whose only role is to be master, a lightweight operational
(cluster management) responsibility. By reducing the amount of
resource intensive work that these nodes do (in other words, do not
send index or search requests to these dedicated master nodes), we
greatly reduce the chance of cluster instability.
A related question is how many master-eligible nodes there should be in a cluster. The answer essentially is at least 3, in order to prevent split-brain (a situation where, due to a network error, two masters are elected simultaneously).
The Elasticsearch Guide has a section on Minimum Master Nodes, an excerpt:
When you have a split brain, your cluster is at danger of losing data.
Because the master is considered the supreme ruler of the cluster, it
decides when new indices can be created, how shards are moved, and so
forth. If you have two masters, data integrity becomes perilous, since
you have two nodes that think they are in charge.
This setting tells Elasticsearch to not elect a master unless there
are enough master-eligible nodes available. Only then will an election
take place.
This setting should always be configured to a quorum (majority) of
your master-eligible nodes. A quorum is (number of master-eligible
nodes / 2) + 1. Here are some examples:
If you have ten regular nodes (can hold data, can become master), a
quorum is 6.
If you have three dedicated master nodes and a hundred data nodes, the quorum is 2, since you need to count only nodes that are master eligible.
If you have two regular nodes, you are in a conundrum. A quorum would be 2, but this means a loss of one node will
make your cluster inoperable. A setting of 1 will allow your cluster
to function, but doesn’t protect against split brain. It is best to
have a minimum of three nodes in situations like this.
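Applied to the three-node cluster from the question, the setting would be (a sketch, pre-7.x; Elasticsearch 7+ removed this setting and manages the voting quorum itself):

# elasticsearch.yml on every master-eligible node:
# quorum for 3 master-eligible nodes is (3 / 2) + 1 = 2
discovery.zen.minimum_master_nodes: 2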
