What is the role of the elected master in Elasticsearch (ELK Stack)?

What is the main purpose of the elected master in ELK? Should the elected master have only node.master enabled and node.data disabled, and not take any search or indexing requests?
I have a 3-node cluster in which 1 node is the elected master. I have Kibana as a front-end UI for querying data and Logstash sending data to the cluster for indexing (for real-time log analysis). Is it a good idea to send search/indexing requests to the other 2 master-eligible nodes, leaving the elected master untouched? Or should I select 1 node for searching and another for indexing, again leaving the elected master untouched? Please advise.
Please suggest which would be the best plan: Plan A, Plan B, or Plan C.

The elected master is a node elected from among the master-eligible nodes, and its function is to maintain the cluster state. The cluster state is the data that holds information about the entire cluster: which nodes are present, which indices exist, which shards live on which nodes, and so on. Although only the master is allowed to make changes to the cluster state, every node keeps a copy of it. This removes the single point of dependency on the master and makes it possible for any master-eligible node to become master.
Since the master's function is lightweight, it doesn't make sense to run dedicated master nodes unless you have a very large number of nodes.
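For illustration, a dedicated master's settings in elasticsearch.yml would look roughly like this (a minimal sketch using the node.master/node.data settings discussed in this thread; the node name is made up):
node.name: master-1
node.master: true # eligible to be elected master
node.data: false # holds no shards, so no search/index load
node.ingest: false # optionally keep ingest pipeline work off it as well
In a small 3-node cluster like the one in the question, though, leaving all nodes with the defaults (master-eligible and data) is usually the better choice.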

Related

Elasticsearch 7.x cluster with a specified master node

I have 3 Elasticsearch nodes. How can I cluster these three nodes so that the same node is always the master? I didn't find any good docs about the new Elasticsearch 7 way of specifying discovery and master nodes:
discovery.seed_hosts: [ ]
cluster.initial_master_nodes: []
For example, I have nodes a, b, and c, and I want node a to be the master. What should discovery.seed_hosts and cluster.initial_master_nodes be for the master node and for the child nodes?
UPDATE
Using Daniel's answer, and after checking that the ports are open and the nodes have the same cluster name, the other nodes still didn't join the cluster. Is there any additional config needed?
UPDATE 2
It looks like the nodes found each other, but for some reason they can't elect a master:
master not discovered or elected yet, an election requires 2 nodes with ids [wOZEfOs9TvqGWIHHcKXtkQ, Cs0xaF-BSBGMGB8a-swznA]
Solution
Delete the data folder of all nodes, start one node, and then add the other nodes with the first node (as master) as the seed host.
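For reference, a minimal sketch of what that re-bootstrap could look like in elasticsearch.yml (cluster, node, and host names here are made up):
# on the first node, started alone after its data folder was wiped
cluster.name: my-cluster
node.name: node-a
cluster.initial_master_nodes: ["node-a"]
# on each node joining afterwards
cluster.name: my-cluster
node.name: node-b
discovery.seed_hosts: ["host-a:9300"] # the first node's transport address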
Elasticsearch allows you to specify the role of a node. A node (an instance of Elasticsearch) can serve as a coordinating node, master node, voting_only node, data node, ingest node or machine learning node.
With respect to master nodes you can only configure which nodes potentially can become the (active) master, but you cannot specify which one of the so-called master-eligible nodes will be the active master node.
The only exception is when you configure just one master-eligible node; then obviously only that one can become the active master. But be aware that in order to get true high availability you need at least 3 master-eligible nodes (this ensures that your cluster stays fully operational even when losing one of the master-eligible nodes).
Therefore Elastic recommends configuring 3 or 5 nodes in your cluster as master-eligible nodes. You configure that role via the node.master property in the elasticsearch.yml file. Setting it to true (the default) allows the node to become master, while false ensures that the node will never become master and will not participate in master elections.
Over the lifetime of your cluster, (master-eligible) nodes may be added and removed. Elasticsearch automatically manages your cluster and the master election process, with the ultimate goal of preventing a split-brain scenario, meaning a situation in which you end up with 2 clusters that go by the same name but have independent master nodes. To prevent that from happening when starting up your cluster for the very first time (bootstrapping your cluster), Elasticsearch requires you to configure the cluster.initial_master_nodes property with the names of the nodes that will initially serve as master-eligible nodes. This property only needs to be configured on master-eligible nodes, and the setting is only considered for the very first startup of your cluster. As values you put in the names as configured with the node.name property of your master-eligible nodes.
The discovery.seed_hosts property supports the discovery process, which is all about enabling a new node to establish communication with an already existing cluster and eventually join it when the cluster.name matches. You configure it with an array of host names (not node names!) on which you expect other instances of Elasticsearch belonging to the same cluster to be running. You don't need to list all 100 host names of the 100 nodes you may have in your cluster; it's sufficient to list the host names of the most stable nodes. As master-eligible nodes are supposed to be very stable, Elastic recommends putting the hosts of all master-eligible nodes (typically 3) in there. Whenever you start or restart a node, it goes through this discovery process.
Conclusion
With a cluster made up of 3 nodes, you would configure all of them as master-eligible, list the 3 node names in the cluster.initial_master_nodes setting, and also put all 3 host names in the discovery.seed_hosts setting to support the discovery process.
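As a sketch, the relevant part of elasticsearch.yml on each of the 3 nodes could then look like this (host and node names are examples):
cluster.name: my-cluster
node.name: node-a # unique per node: node-a, node-b, node-c
discovery.seed_hosts: ["host-a:9300", "host-b:9300", "host-c:9300"]
cluster.initial_master_nodes: ["node-a", "node-b", "node-c"] # node names; only read on the very first startup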
Useful information from the Elasticsearch reference:
Important discovery and cluster formation settings
Discovery and cluster formation settings
Bootstrapping a cluster

Rejoin separated data node to cluster

I have 3 Elasticsearch nodes, all of which act as master-eligible data nodes.
Due to a connectivity issue, one node left the cluster and promoted itself to master. Now I have two clusters: the first with two nodes and the other with one node. As all the nodes were behind a load balancer, all of them were receiving requests from Logstash. What will happen if I restart the single-node cluster and try to add it back to the original cluster?
The problem you are encountering is called the split-brain problem.
Here is a description of it:
The problem comes in when a node falls down or there's simply a lapse
in communication between nodes for some reason. If one of the slave
nodes cannot communicate with the master node, it initiates the
election of a new master node from those it's still connected with.
That new master node then will take over the duties of the previous
master node. If the older master node rejoins the cluster or
communication is restored, the new master node will demote it to a
slave so there's no conflict. For the most part, this process is
seamless and "just works."
However, consider a scenario where you have just two nodes: one master
and one slave. If communication between the two is disrupted, the
slave will be promoted to a master, but once communication is
restored, you end up with two master nodes. The original master node
thinks the slave dropped and should rejoin as a slave, while the new
master thinks the original master dropped and should rejoin as a
slave. Your cluster, therefore, is said to have a split brain.
Reference link to it : https://qbox.io/blog/split-brain-problem-elasticsearch
To avoid this problem, add this to the yml file on your master-eligible nodes: discovery.zen.minimum_master_nodes: 2
The formula for this is: prevent split brain by requiring a majority of the master-eligible nodes: (total number of master-eligible nodes / 2) + 1.
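For the 3-node cluster described here, the quorum works out to (3 / 2, rounded down) + 1 = 2, so a minimal sketch of the relevant yml line would be (note that this discovery.zen setting applies to Elasticsearch 6.x and earlier; 7.x removed it and manages the voting quorum automatically):
discovery.zen.minimum_master_nodes: 2 # majority of 3 master-eligible nodes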

How many masters in a three-node cluster

I was stumped by the question of how many masters there can be in a three-node cluster. I came across a point in an article on the internet saying that search and index requests should not be sent to the elected master. Is that correct? So, if I have three nodes acting as master-eligible nodes (out of which one is the elected master), should I point incoming logs to be indexed and searched on the other master-eligible nodes, apart from the elected master? Please clarify. Thanks in advance.
In a three-node cluster, all nodes most likely hold data and are master-eligible. That is the simplest situation, in which you don't have to worry about anything else.
If you have a larger cluster, you can have a couple of nodes configured as dedicated master nodes. That is, they are master-eligible but don't hold any data. For example, you could have 3 dedicated master nodes and 7 data nodes (not master-eligible). Exactly one of the dedicated master nodes will always be the elected master.
The point is that since the dedicated master nodes don't hold data, they will not directly serve index and search requests. If you send an index or search request to them, they have no choice but to delegate it to one of the 7 data nodes.
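If you want a dedicated entry point for search and index traffic instead, one option is a coordinating-only node. A minimal sketch, using the same node.master/node.data style of settings as the quote below:
node.master: false # never becomes master
node.data: false # holds no shards
node.ingest: false # runs no ingest pipelines
# With all roles disabled, this node only routes requests to the data nodes and merges their results.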
From the Elasticsearch Reference for Modules - Node:
dedicated master nodes are nodes with the settings node.data: false
and node.master: true. We actively promote the use of dedicated master
nodes in critical clusters to make sure that there are 3 dedicated
nodes whose only role is to be master, a lightweight operational
(cluster management) responsibility. By reducing the amount of
resource intensive work that these nodes do (in other words, do not
send index or search requests to these dedicated master nodes), we
greatly reduce the chance of cluster instability.
A related question is how many master-eligible nodes there should be in a cluster. The answer is essentially at least 3, in order to prevent split brain (a situation where, due to a network error, two masters are elected simultaneously).
The Elasticsearch Guide has a section on Minimum Master Nodes, an excerpt:
When you have a split brain, your cluster is at danger of losing data.
Because the master is considered the supreme ruler of the cluster, it
decides when new indices can be created, how shards are moved, and so
forth. If you have two masters, data integrity becomes perilous, since
you have two nodes that think they are in charge.
This setting tells Elasticsearch to not elect a master unless there
are enough master-eligible nodes available. Only then will an election
take place.
This setting should always be configured to a quorum (majority) of
your master-eligible nodes. A quorum is (number of master-eligible
nodes / 2) + 1. Here are some examples:
If you have ten regular nodes (can hold data, can become master), a
quorum is 6.
If you have three dedicated master nodes and a hundred data nodes, the quorum is 2, since you need to count only nodes that are master eligible.
If you have two regular nodes, you are in a conundrum. A quorum would be 2, but this means a loss of one node will
make your cluster inoperable. A setting of 1 will allow your cluster
to function, but doesn’t protect against split brain. It is best to
have a minimum of three nodes in situations like this.

Elasticsearch shard relocation query - is the master node involved during shard relocation (data transfer)

For example, we have one master node running on master1
and two data nodes running on server2 and server3.
Let us say a shard relocation is happening from server2 to server3.
Now, to copy the shard data, will the Elasticsearch cluster make use of master1 (the master node)? That is, is the data transferred directly from server2 to server3, or does it go via master1?
We would like to know because master1 is running on a low-configuration machine.
No, the master node is not directly involved in the transfer of shards from one node to another. The data is copied from the source node directly to the destination node.
The master node is involved in managing the global cluster state, but if it is a master-only node it will not have any data files on it, nor will data be transferred to or from it:
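So for a low-spec machine like master1, the safe setup is the dedicated master configuration described in the quotes below, which keeps shard data, and therefore relocation traffic, off that machine entirely. A minimal sketch:
node.master: true # can be elected master
node.data: false # stores no shards, so no shard data is transferred to or from it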
Note, Elasticsearch is a peer to peer based system, nodes communicate
with one another directly if operations are delegated / broadcast. All
the main APIs (index, delete, search) do not communicate with the
master node. The responsibility of the master node is to maintain the
global cluster state, and act if nodes join or leave the cluster by
reassigning shards.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery.html
dedicated master nodes are nodes with the settings node.data: false
and node.master: true. We actively promote the use of dedicated master
nodes in critical clusters to make sure that there are 3 dedicated
nodes whose only role is to be master, a lightweight operational
(cluster management) responsibility. By reducing the amount of
resource intensive work that these nodes do (in other words, do not
send index or search requests to these dedicated master nodes), we
greatly reduce the chance of cluster instability.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-node.html

What happens if an ElasticSearch node/index/shard gets corrupted

I'm new to ES. We've recently set up a 3-node Elasticsearch cluster for our prod app. I just want to understand what would happen if an Elasticsearch node, index, or shard gets corrupted.
Thanks!
What would happen actually depends on how you have set up your ES cluster.
With respect to DATA
If you have a single-node cluster, a corruption would render your ES setup useless. You would, pretty much, need to set up everything from scratch.
If you have multiple nodes in your cluster, there can be the following scenarios:
If you configure a single node as the data node and that node goes down, the cluster will keep running but queries will not return any results. You will then need to reconfigure another node to act as a data node and restart the cluster.
If you have multiple nodes designated as data nodes, then corruption/failure of one node will only affect that node. The rest of the nodes, and Elasticsearch as a whole, will in essence perform as usual. The only effect is that the data stored on the corrupted node will obviously be unavailable. The shards on the corrupted node will become unassigned and have to be reassigned to another data node.
If you have replicas enabled, then there will be no data loss; the unassigned shards simply need to be reassigned to a new data node (if and when one is added).
It's best to have a multi-node cluster with at least 2 data nodes and replicas enabled to mitigate shard/data-node corruption.
This Stackoverflow post explains shards and replicas in an excellent way.
Edit 1:
This is in response to your comment.
By default, each node is master-eligible and also stores data; hence each of your nodes can become master and will also store data.
Let's call the nodes A, B, and C.
Initially, one of them will be elected as the master node, e.g. node A.
Now if node A goes down, one of the remaining nodes (B or C) will become the master. Queries will then only return results from the data stored on nodes B and C.
Check out this page for more insight into how the cluster works.
One way is to take incremental snapshots of your indices and restore from a snapshot when needed.
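As a sketch, filesystem snapshots first require whitelisting a repository path in elasticsearch.yml on every node (the mount point below is an example); the repository itself is then registered and snapshots are taken through the snapshot API:
path.repo: ["/mnt/es-backups"] # shared filesystem path visible to all nodes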
