Should I include data nodes in discovery.zen.ping.unicast.hosts? - elasticsearch

I've just created three dedicated master nodes and one data node for my cluster.
Now, I need to configure an initial list of nodes that will be contacted to discover and form a cluster.
I included IP addresses of the three dedicated master nodes as the values of discovery.zen.ping.unicast.hosts, but should I also include the IP address of the data node?

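Adding the IP addresses of the 3 master nodes is sufficient. The data node's IP address is not necessary: a starting node only needs to reach the master-eligible nodes to discover and join the cluster.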
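As a minimal sketch (Python, with a hypothetical inventory of node names and 10.0.0.x addresses), the seed list is derived from the master-eligible nodes only. The same line can then go into elasticsearch.yml on every node, including the data node, so that it can find the masters:

```python
# Hypothetical inventory: three dedicated master nodes and one data node.
NODES = [
    {"name": "master-1", "ip": "10.0.0.1", "master": True},
    {"name": "master-2", "ip": "10.0.0.2", "master": True},
    {"name": "master-3", "ip": "10.0.0.3", "master": True},
    {"name": "data-1", "ip": "10.0.0.4", "master": False},
]


def unicast_hosts_line(nodes):
    """Build the zen discovery seed line from master-eligible nodes only."""
    seeds = [n["ip"] for n in nodes if n["master"]]
    quoted = ", ".join(f'"{ip}"' for ip in seeds)
    return f"discovery.zen.ping.unicast.hosts: [{quoted}]"


if __name__ == "__main__":
    # The same line is used on all four nodes, including data-1.
    print(unicast_hosts_line(NODES))
    # discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
```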

Related

elasticsearch 7.X cluster with specified master node

I have 3 Elasticsearch nodes. How can I cluster these three nodes so that the same node is always the master? I didn't find any good docs about the new Elasticsearch 7 way of specifying discovery and the master node:
discovery.seed_hosts: [ ]
cluster.initial_master_nodes: []
For example, I have nodes a, b and c, and I want node a to be the master. What should discovery.seed_hosts and cluster.initial_master_nodes be on the master node and on the other nodes?
UPDATE
Using Daniel's answer, and after checking that the ports are open and the nodes have the same cluster name, the other nodes still didn't join the cluster. Is any additional config needed?
UPDATE 2
It looks like the nodes found each other, but for some reason they can't elect a master node:
master not discovered or elected yet, an election requires 2 nodes
with ids [wOZEfOs9TvqGWIHHcKXtkQ, Cs0xaF-BSBGMGB8a-swznA]
Solution
Delete the data folder on all nodes, start one node, and then start the other nodes with the first node (as master) listed as a seed host.
Elasticsearch allows you to specify the role of a node. A node (an instance of Elasticsearch) can serve as a coordinating node, master node, voting_only node, data node, ingest node or machine learning node.
With respect to master nodes you can only configure which nodes potentially can become the (active) master, but you cannot specify which one of the so-called master-eligible nodes will be the active master node.
The only exception is when you configure only one master-eligible node; then obviously only that node can become the active master. But be aware that for true high availability you need at least 3 master-eligible nodes (this ensures that your cluster will still be 100% operational even after losing one of the master-eligible nodes).
Therefore Elastic recommends configuring 3 or 5 nodes in your cluster as master-eligible nodes. You can configure that role via the node.master property in the elasticsearch.yml file. Setting it to true (the default) allows that node to become master, while false ensures that the node will never become master and will not participate in master elections.
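The arithmetic behind the 3-or-5 recommendation can be sketched with a simple majority rule. This is a simplification: Elasticsearch 7 maintains its own voting configuration, but electing a master still requires a majority of it.

```python
def quorum(master_eligible: int) -> int:
    """Votes needed to elect a master: a strict majority of master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerable_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can be lost while a majority can still be formed."""
    return master_eligible - quorum(master_eligible)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n} master-eligible: quorum={quorum(n)}, can lose {tolerable_failures(n)}")
```

An even count adds nothing: under this rule, 4 master-eligible nodes tolerate the same single failure as 3, and 2 tolerate none, which is why 3 or 5 are the usual choices.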
Over the lifetime of your cluster, (master-eligible) nodes might be added and removed. Elasticsearch automatically manages your cluster and the master election process, with the ultimate goal of preventing a split-brain scenario, in which you would end up with 2 clusters that go by the same name but have independent master nodes. To prevent this when starting up your cluster for the very first time (bootstrapping your cluster), Elastic requires you to configure the cluster.initial_master_nodes property with the names of the nodes that will initially serve as master-eligible nodes. This property only needs to be configured on master-eligible nodes, and it is only considered on the very first startup of your cluster. As values, use the names configured with the node.name property of your master-eligible nodes.
The discovery.seed_hosts property supports the discovery process, which is all about enabling a new node to establish communication with an already existing cluster and eventually join it when the cluster.name matches. You are supposed to configure it with an array of host names (not node names!) on which you expect other Elasticsearch instances belonging to the same cluster to be running. You don't need to add all 100 host names if you have 100 nodes in your cluster; it's sufficient to list the host names of the most stable nodes. As master-eligible nodes are supposed to be very stable, Elastic recommends listing the hosts of all master-eligible nodes (typically 3). Whenever you start or restart a node, it goes through this discovery process.
Conclusion
With a cluster made up of 3 nodes, you would configure all of them as master-eligible nodes and list the 3 node names in the cluster.initial_master_nodes setting. You would also put all 3 host names in the discovery.seed_hosts setting to support the discovery process.
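A minimal sketch of that conclusion for nodes a, b and c, written as a small Python script that renders each node's elasticsearch.yml. The cluster name, node names (node-a, ...) and host names (es-a.example.com, ...) are hypothetical; node.master and node.data are the 7.x role settings discussed above (7.9 and later use node.roles instead).

```python
# Hypothetical (node.name, host name) pairs for three master-eligible nodes.
NODES = [
    ("node-a", "es-a.example.com"),
    ("node-b", "es-b.example.com"),
    ("node-c", "es-c.example.com"),
]


def render_config(cluster_name, node_name, nodes):
    """Return elasticsearch.yml content for one of the master-eligible nodes."""
    names = ", ".join(name for name, _ in nodes)  # node names, used for bootstrapping
    hosts = ", ".join(host for _, host in nodes)  # host names, used for discovery
    lines = [
        f"cluster.name: {cluster_name}",
        f"node.name: {node_name}",
        "node.master: true",
        "node.data: true",
        f"discovery.seed_hosts: [{hosts}]",
        f"cluster.initial_master_nodes: [{names}]",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for name, _ in NODES:
        print(f"# elasticsearch.yml for {name}")
        print(render_config("my-cluster", name, NODES))
```

Only node.name differs between the three files; both lists are identical everywhere. Which node ends up as the active master is still decided by election.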
Useful information from the Elasticsearch reference:
Important discovery and cluster formation settings
Discovery and cluster formation settings
Bootstrapping a cluster

Which Elasticsearch node is better configured in Logstash Elasticsearch output plugin and Kibana

I have an ELK stack with Elasticsearch, Logstash and Kibana installed on 3 different instances.
Now I want to make a 3-node Elasticsearch cluster.
I will make one node the master and the other 2 data nodes.
In the Logstash config, I want to know what to put here:
output {
  elasticsearch {
    hosts => "http://es01:9200"
  }
}
Which address do I need to enter there, the master node or a data node? And if I have 3 master nodes, which address do I need to write there?
Similarly, in Kibana I use
elasticsearch.url: "http://es01:9200"
In a cluster environment, which URL do I need to use?
In general, the answer depends on your cluster data size and load.
Nevertheless, I'll try to answer your questions assuming the master node is not also a data node. This means it only takes care of cluster-wide actions such as creating or deleting an index, tracking which nodes are part of the cluster, and deciding which shards to allocate to which nodes. For these purposes, it is highly recommended to keep your master node as stable and lightly loaded as possible.
So, in your Logstash config I would put the addresses of your two data nodes as follows:
output {
  elasticsearch {
    hosts => ["http://es01:9200", "http://es02:9200"]
  }
}
This configuration maximizes performance and fault tolerance: your master does not hold data, and if one data node fails, Logstash will continue to work with the other.
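To illustrate why listing both data nodes helps, here is a rough Python model of client-side round robin with failover. It is only a sketch of the idea, not the Logstash elasticsearch output's actual implementation; is_up is a hypothetical stand-in for a health check.

```python
import itertools


def send_round_robin(hosts, requests, is_up):
    """Assign each request to the next live host in round-robin order.

    hosts: list of URLs; is_up: callable(host) -> bool (hypothetical health check).
    Returns a list of (request, host) pairs; raises if no host is reachable.
    """
    cycle = itertools.cycle(hosts)
    assigned = []
    for req in requests:
        # Try each host at most once per request before giving up.
        for _ in range(len(hosts)):
            host = next(cycle)
            if is_up(host):
                assigned.append((req, host))
                break
        else:
            raise ConnectionError("no Elasticsearch host reachable")
    return assigned


if __name__ == "__main__":
    hosts = ["http://es01:9200", "http://es02:9200"]
    print(send_round_robin(hosts, [1, 2, 3], lambda h: True))
    # With es01 down, every request falls through to es02.
    print(send_round_robin(hosts, [1, 2], lambda h: h != hosts[0]))
```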
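Please note that it is highly recommended to have at least 3 master-eligible nodes in an Elasticsearch cluster: if you lose the (only) master node, the cluster cannot operate and you risk losing data. Three is the smallest number that avoids split brain while still tolerating the loss of one master-eligible node.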
Regarding Kibana: since all nodes in the cluster "know" each other, you can basically put the address of any node in the cluster. But for the same reasons as above, it is recommended to use the address of one of your data nodes.
For further reading, please refer to this documentation.
Hope I have managed to help!

Do all the nodes in a Cassandra cluster know each other's "partition key ranges"?

Let's say I have a Cassandra cluster with the following layout:
(76-100) Node1 - Node2 (0-25)
           |       |
(51-75)  Node4 - Node3 (26-50)
Each node is primarily responsible for a range of partition keys. For example, for a total range of 0-100, I have indicated above which range each node is responsible for.
Now, let's say Node 1 is the coordinator handling requests. A read request for partition key 28 reaches Node 1.
How does Node 1 know that Node 3 is the primary node for partition key 28? Does each node have a mapping of node IDs to the partition keys they are responsible for?
For instance,
{Node1:76-100, Node2: 0-25, Node3: 26-50, Node4: 51-75}
Is this mapping present as a global configuration on all the nodes, since any node can act as coordinator when requests are forwarded in round-robin fashion?
Thanks
The mapping is not present as a global configuration. Rather, each node maintains its own copy of the state of the other nodes in the cluster. The cluster uses a gossip protocol to frequently exchange information about the nodes in the cluster with a few other nodes. In this way the mapping information rapidly propagates to all the nodes in the cluster, even if there are thousands of nodes.
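A toy simulation of that propagation, assuming push-pull exchanges with one randomly chosen peer per node per round. Real Cassandra gossip also carries versioned state (generation and version numbers) and node status, which this sketch leaves out; it only shows how every node's view converges on the full range map from the question.

```python
import random

# Token ranges from the question (toy 0-100 scale).
RANGES = {"Node1": (76, 100), "Node2": (0, 25), "Node3": (26, 50), "Node4": (51, 75)}


def gossip_until_converged(ranges, seed=0):
    """Push-pull gossip until every node knows every other node's range.

    Returns (rounds, views), where views[node] maps node name -> range.
    """
    rng = random.Random(seed)
    nodes = list(ranges)
    # At start-up each node only knows its own range.
    views = {n: {n: ranges[n]} for n in nodes}
    rounds = 0
    while any(len(view) < len(nodes) for view in views.values()):
        rounds += 1
        for node in nodes:
            peer = rng.choice([p for p in nodes if p != node])
            # Push-pull: both sides end up with the union of what they knew.
            merged = {**views[node], **views[peer]}
            views[node] = dict(merged)
            views[peer] = dict(merged)
    return rounds, views


if __name__ == "__main__":
    rounds, views = gossip_until_converged(RANGES)
    print(f"converged after {rounds} round(s)")
    print(views["Node1"])
```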
It is necessary for every node to know how to map partition keys to token values, and to know which node is responsible for that token. This is so that every node can act as a coordinator to handle any request by sending it to the exact nodes that are handling that key.
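A minimal sketch of the lookup each coordinator (and a token-aware driver) performs, using the question's toy 0-100 ring. Each node owns the range ending at its token, so key 28 lands on Node3. The toy_token function is a hypothetical stand-in for the partitioner: real clusters use Murmur3Partitioner with 64-bit tokens, not MD5 modulo 101.

```python
import bisect
import hashlib

# Toy ring from the question: each node owns (previous token, own token].
# Node2's range wraps around from the top of the ring down to 0-25.
RING = {25: "Node2", 50: "Node3", 75: "Node4", 100: "Node1"}


def owner_of_token(token, ring=RING):
    """Return the node whose range contains the token."""
    ends = sorted(ring)
    i = bisect.bisect_left(ends, token)
    if i == len(ends):  # beyond the highest token: wrap around to the first node
        i = 0
    return ring[ends[i]]


def toy_token(partition_key):
    """Hypothetical stand-in partitioner: a stable hash onto 0-100, NOT Murmur3."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 101


if __name__ == "__main__":
    print(owner_of_token(28))  # Node3
    key = "user:42"
    print(key, toy_token(key), owner_of_token(toy_token(key)))
```

With replication (for example SimpleStrategy), the next nodes clockwise on the ring also hold copies; that step is omitted here.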
Taking this a step further, if you use, for example, the current Java driver, you can have the client use a token-aware routing policy. This works by the client driver also getting a copy of the information about how to map keys to nodes. Then, when you issue a request, it is sent directly to a node that is handling that key. This gives a nice performance boost.
Generally you do not need to worry about how the keys are mapped, since if you use vnodes and the Murmur3Partitioner, the cluster will take care of creating the mapping of keys to balance the load across the cluster as nodes are added and removed.

Elasticsearch shard relocation query - is the master node involved during shard relocation (data transfer)

For example, we have one master node running on master1 and two data nodes running on server2 and server3.
Let's say a shard relocation is happening from server2 to server3.
To copy the data, will the Elasticsearch cluster make use of master1 (the master node)? In other words, is the data transferred directly from server2 to server3, or does it go via master1?
We would like to know this because master1 runs on a low-spec machine.
No, the master node is not directly involved in the transfer of shards from one node to another. The data is copied from the source node directly to the destination node.
The master node is involved in managing global cluster state, but if it's master only it will not have any data files on it nor have data transferred to or from it:
Note, Elasticsearch is a peer to peer based system, nodes communicate
with one another directly if operations are delegated / broadcast. All
the main APIs (index, delete, search) do not communicate with the
master node. The responsibility of the master node is to maintain the
global cluster state, and act if nodes join or leave the cluster by
reassigning shards.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery.html
dedicated master nodes are nodes with the settings node.data: false
and node.master: true. We actively promote the use of dedicated master
nodes in critical clusters to make sure that there are 3 dedicated
nodes whose only role is to be master, a lightweight operational
(cluster management) responsibility. By reducing the amount of
resource intensive work that these nodes do (in other words, do not
send index or search requests to these dedicated master nodes), we
greatly reduce the chance of cluster instability.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-node.html

How can ElasticSearch node join cluster at runtime?

Assume there are 3 running nodes launched with multicast=false and unicast=true, but no unicast nodes are given at startup. After they have all started, they are not aware of each other.
Is there a way to tell each one IP address of the other two so they can do discovery at runtime and join to same cluster?
Yes: add the IP addresses of all the other nodes in the cluster to the discovery.zen.ping.unicast.hosts property in the elasticsearch.yml file in the config folder. This is a static setting, so restart each node afterwards for it to take effect.
Say you have three nodes: on each node, add the addresses of the other two nodes as below:
discovery.zen.ping.unicast.hosts: ["xx.xx.xxx.xx","yy.yy.yy.yy"]
