How to add a node for failover in Elasticsearch - elasticsearch

I currently have a single Elasticsearch node on a Windows server. Can you please explain how to add one extra node on a different machine for failover? I also wonder how the two nodes can be kept identical when using NEST.

Usually, you don't run a failover node, but run a cluster of nodes to provide High Availability.
The minimum viable topology I'd consider running in production is 3 master-eligible nodes with minimum_master_nodes set to 2, plus a sharding strategy that distributes primary and replica shards across the nodes for data redundancy.
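As a rough sketch of such a topology, each of the three nodes could carry settings along these lines in elasticsearch.yml (setting names assume Elasticsearch 6.x; in 7.x minimum_master_nodes is removed and quorum handling is automatic, and the cluster and node names below are made up):

cluster.name: my-cluster
node.name: node-1                         # unique on each machine
node.master: true                         # all three nodes are master-eligible
node.data: true                           # all three nodes hold data
discovery.zen.ping.unicast.hosts: ["node-1", "node-2", "node-3"]
discovery.zen.minimum_master_nodes: 2     # quorum of 3 master-eligible nodes

With one replica per primary shard (index.number_of_replicas: 1, the default), each shard exists on two different nodes, so losing any single node does not lose data.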

Related

How many instances do I need for Amazon Elasticsearch Service?

If I need a 3-node cluster do I need to have 3 instances? Or are they created in same instance?
(screenshot from the question: https://i.stack.imgur.com/4oRAI.png)
If you need a 3-node cluster then you must have 3 instances. A node is a separate instance, and in a 3-machine Elasticsearch cluster each node can be assigned a different job.
What is an Elasticsearch cluster?
As the name implies, an Elasticsearch cluster is a group of one or more Elasticsearch node instances that are connected together. The power of an Elasticsearch cluster lies in the distribution of tasks, such as searching and indexing, across all the nodes in the cluster.
The nodes in the Elasticsearch cluster can be assigned different jobs or responsibilities:
Data Nodes - store data and execute data-related operations such as search and aggregation
Master Nodes - in charge of cluster-wide management and configuration actions such as adding and removing nodes
Client Nodes - forward cluster requests to the master node and data-related requests to data nodes
Ingest Nodes - pre-process documents before indexing
By default, each node is automatically assigned a unique identifier, or name, that is used for management purposes and becomes even more important in a multi-node, or clustered, environment.
When installed, a single Elasticsearch node will form a new single-node cluster named elasticsearch, but it can also be configured to join an existing cluster using the cluster name.
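For illustration, these roles are assigned per node in elasticsearch.yml. The flags below are a sketch using the pre-7.9 settings (from 7.9 onwards a single node.roles list replaces them); the cluster and node names are made up:

cluster.name: my-cluster        # must match on every node that should join this cluster
node.name: data-node-1          # the per-node name used for management
node.master: false              # not master-eligible
node.data: true                 # holds data and serves search/aggregation
node.ingest: false              # does not run ingest pipelines
# a node with master, data and ingest all set to false acts as a coordinating-only ("client") node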

Elasticsearch 7.2.0: master not discovered or elected yet, an election requires at least X nodes

I'm trying to automate the process of horizontally scaling Elasticsearch nodes up and down in a Kubernetes cluster.
Initially, I deployed an Elasticsearch cluster (3 master, 3 data & 3 ingest nodes) on a Kubernetes cluster, where cluster.initial_master_nodes was:
cluster.initial_master_nodes:
- master-a
- master-b
- master-c
Then, I performed a scale-down operation and reduced the number of master nodes from 3 to 1 (unexpected, but for testing purposes). While doing this, I deleted the master-c and master-b nodes and restarted the master-a node with the following setting:
cluster.initial_master_nodes:
- master-a
Since the Elasticsearch nodes (i.e. pods) use a persistent volume, after restarting the node, master-a shows the following logs:
"message": "master not discovered or elected yet, an election requires at least 2 nodes with ids from [TxdOAdryQ8GAeirXQHQL-g, VmtilfRIT6KDVv1R6MHGlw, KAJclUD2SM6rt9PxCGACSA], have discovered [] which is not a quorum; discovery will continue using [] from hosts providers and [{master-a}{VmtilfRIT6KDVv1R6MHGlw}{g29haPBLRha89dZJmclkrg}{10.244.0.95}{10.244.0.95:9300}{ml.machine_memory=12447109120, xpack.installed=true, ml.max_open_jobs=20}] from last-known cluster state; node term 5, last-accepted version 40 in term 5" }
Seems like it's trying to find master-b and master-c.
Questions:
How to overwrite cluster settings so that master-a won't search for these deleted nodes?
The cluster.initial_master_nodes setting only has an effect the first time the cluster starts up, but to avoid some very rare corner cases you should never change its value once you've set it and generally you should remove it from the config file as soon as possible. From the reference manual regarding cluster.initial_master_nodes:
You should not use this setting when restarting a cluster or adding a new node to an existing cluster.
Aside from that, Elasticsearch uses a quorum-based election protocol and says the following:
To be sure that the cluster remains available you must not stop half or more of the nodes in the voting configuration at the same time.
You have stopped two of your three master-eligible nodes at the same time, which is more than half of them, so it's expected that the cluster no longer works.
The reference manual also contains instructions for removing master-eligible nodes which you have not followed:
As long as there are at least three master-eligible nodes in the cluster, as a general rule it is best to remove nodes one-at-a-time, allowing enough time for the cluster to automatically adjust the voting configuration and adapt the fault tolerance level to the new set of nodes.
If there are only two master-eligible nodes remaining then neither node can be safely removed since both are required to reliably make progress. To remove one of these nodes you must first inform Elasticsearch that it should not be part of the voting configuration, and that the voting power should instead be given to the other node.
It goes on to describe how to safely remove the unwanted nodes from the voting configuration using POST /_cluster/voting_config_exclusions/node_name when scaling down to a single node.
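Applied to the node names from the question, that would look roughly like the following (a sketch; the exact form of the voting config exclusions API differs slightly between 7.x releases):

POST /_cluster/voting_config_exclusions/master-b
POST /_cluster/voting_config_exclusions/master-c

Once the excluded nodes have been shut down and the scale-down is complete, the exclusion list can be cleared again with DELETE /_cluster/voting_config_exclusions.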
The cluster state, which also stores the master configuration, is kept in the data folder of the Elasticsearch node. In your case, it seems to be reading the old cluster state (which has 3 master nodes, with their ids).
You could delete the data folder of your master-a so that it starts from a clean cluster state; that should resolve your issue.
Also make sure the other data and ingest nodes have node.master: false set, as it defaults to true.
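As a sketch, the relevant elasticsearch.yml settings on those pods would look something like this (pre-7.9 role flags assumed):

# on the data nodes
node.master: false
node.data: true
node.ingest: false

# on the ingest nodes
node.master: false
node.data: false
node.ingest: true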

About elasticsearch cluster

I need to provide many Elasticsearch instances for different clients, hosted in my own infrastructure.
For the moment these are only small instances.
I am wondering whether it would be better to build a big Elasticsearch cluster with 3-5 servers to handle all instances, so that each client gets a different index in this cluster and each index is distributed over the servers.
Or maybe another idea?
Another question is about the quorum: what is the quorum for Elasticsearch, please?
thanks,
You don't have to assign each client to a different index; the Elasticsearch cluster will automatically share the load among all the nodes that hold its shards.
If you are not sure how many nodes are needed, start with a small cluster and keep monitoring its health status. Add more nodes to the cluster if server load is high; remove nodes if server load is low.
As the cluster grows, you may need to assign a dedicated role to each node. That way you have more control over the cluster, it is easier to diagnose problems, and you can plan resources: for example, adding more master nodes to stabilize the cluster, more data nodes to increase searching and indexing performance, and more coordinating nodes to handle client requests.
A quorum is defined as a majority of the master-eligible nodes in the cluster:
(master_eligible_nodes / 2) + 1
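For example, with 3 master-eligible nodes the quorum is (3 / 2) + 1 = 2. On versions before 7.0 this had to be configured explicitly in elasticsearch.yml; from 7.0 onwards Elasticsearch manages the quorum itself:

discovery.zen.minimum_master_nodes: 2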

Which Elasticsearch node should be configured in the Logstash Elasticsearch output plugin and in Kibana

I have an ELK stack with Elasticsearch, Logstash and Kibana installed on 3 different instances.
Now I want to make a 3-node cluster of Elasticsearch.
I will make one node a master and 2 data nodes.
I want to know, in the Logstash config
elasticsearch {
  hosts => "http://es01:9200"
}
which address I need to enter there: the master node or a data node? And if I have 3 master nodes, which address do I write there?
Similarly, in Kibana I use
elasticsearch.url: es01:9200
In a cluster environment, which URL do I need to use?
In general, the answer depends on your cluster data size and load.
Nevertheless, I'll try to answer your questions assuming the master node is not a data-eligible node as well. This means it only takes care of cluster-wide actions such as creating or deleting an index, tracking which nodes are part of the cluster, and deciding which shards to allocate to which nodes. For this purpose, it is highly recommended to keep your master node as stable and lightly loaded as possible.
So, in your Logstash config I would put the addresses of your two data nodes as follows:
elasticsearch {
  hosts => ["http://es01:9200", "http://es02:9200"]
}
This configuration maximizes performance and fault tolerance: your master does not hold data, and if one data node fails Logstash will continue to work with the other.
Please note that it is highly recommended to have at least 3 master-eligible nodes in an Elasticsearch cluster, since if you lose the (only) master node you can lose data; three are needed to avoid split-brain.
Regarding Kibana: since all nodes in the cluster know each other, you can basically point it at any node in the cluster, but for the same reasons as above it is recommended to use one of your data node addresses.
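As a sketch, and assuming a Kibana version of 6.6 or later (where elasticsearch.hosts replaced elasticsearch.url and accepts multiple addresses), kibana.yml could list both data nodes:

elasticsearch.hosts: ["http://es01:9200", "http://es02:9200"]

On older Kibana versions only a single elasticsearch.url can be set, so you would pick one of the data node addresses there.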
For further reading, please refer to this documentation.
Hope I have managed to help!

How many master in three node cluster

I was stumped by the question of how many masters there can be in a three-node cluster. I came across a point in an article on the internet saying that search and index requests should not be sent to the elected master. Is that correct? So, if I have three nodes acting as masters (out of which one is the elected master), should I point incoming logs to be indexed and searched on the other master nodes, apart from the elected master? Please clarify. Thanks in advance.
In a three node cluster, all nodes most likely hold data and are master-eligible. That is the most simple situation in which you don't have to worry about anything else.
If you have a larger cluster, you can have a couple of nodes which are configured as dedicated master nodes. That is, they are master-eligible and they don't hold any data. For example you would have 3 dedicated master nodes and 7 data nodes (not master-eligible). Exactly one of the dedicated master nodes will always be the elected master.
The point is that since the dedicated master nodes don't hold data, they will not directly service index and search requests. If you send an index or search request to them, there is no other way for them than to delegate it to one of the 7 data nodes.
From the Elasticsearch Reference for Modules - Node:
Dedicated master nodes are nodes with the settings node.data: false and node.master: true. We actively promote the use of dedicated master nodes in critical clusters to make sure that there are 3 dedicated nodes whose only role is to be master, a lightweight operational (cluster management) responsibility. By reducing the amount of resource-intensive work that these nodes do (in other words, do not send index or search requests to these dedicated master nodes), we greatly reduce the chance of cluster instability.
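In elasticsearch.yml terms, a dedicated master node as described in that excerpt would be configured roughly as follows (a sketch using the pre-7.9 role flags):

node.master: true    # master-eligible
node.data: false     # holds no data
node.ingest: false   # runs no ingest pipelines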
A related question is how many master nodes there should be in a cluster. The answer is essentially at least 3, in order to prevent split-brain (a situation where, due to a network error, two masters are elected simultaneously).
The Elasticsearch Guide has a section on Minimum Master Nodes, an excerpt:
When you have a split brain, your cluster is in danger of losing data. Because the master is considered the supreme ruler of the cluster, it decides when new indices can be created, how shards are moved, and so forth. If you have two masters, data integrity becomes perilous, since you have two nodes that think they are in charge.
This setting tells Elasticsearch to not elect a master unless there are enough master-eligible nodes available. Only then will an election take place.
This setting should always be configured to a quorum (majority) of your master-eligible nodes. A quorum is (number of master-eligible nodes / 2) + 1. Here are some examples:
If you have ten regular nodes (can hold data, can become master), a quorum is 6.
If you have three dedicated master nodes and a hundred data nodes, the quorum is 2, since you need to count only nodes that are master-eligible.
If you have two regular nodes, you are in a conundrum. A quorum would be 2, but this means a loss of one node will make your cluster inoperable. A setting of 1 will allow your cluster to function, but doesn't protect against split brain. It is best to have a minimum of three nodes in situations like this.
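For the first example above, the corresponding setting would be (a sketch; this applies to versions before 7.0, where discovery.zen.minimum_master_nodes still existed):

# ten master-eligible nodes: quorum = (10 / 2) + 1 = 6
discovery.zen.minimum_master_nodes: 6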
