Elasticsearch Cluster - No known master node, scheduling a retry - elasticsearch

I have a server running elasticsearch and kibana. I have added a second node to form a cluster but only want that second node to replicate data from the master node.
Based on limited documentation on how to do this, I am running into issue on second with following error
[DEBUG][action.admin.indices.get ] [Match] no known master node, scheduling a retry
I am unable to determine the best configuration for both servers to achieve this but this is what I have done so far:
Master Node Config:
cluster.name: elasticsearch
node.master: true
path.data: /local00/elasticsearch/
path.work: /local00/el_temp/
network.host: 0.0.0.0
http.port: 9200
script.disable_dynamic: true
Node 2
cluster.name: elasticsearch
node.master: false
node.data: true
index.number_of_shards: 5
index.number_of_replicas: 1
path.data: /local00/elasticsearch/
path.work: /local00/el_temp/
network.host: 0.0.0.0
http.port: 9200
script.disable_dynamic: true
I am assuming I am missing additional config somewhere. Any help will be much appreciated.

Got it working with following changes answered here How to set up ES cluster?:
Node 1:
cluster.name: mycluster
node.name: "node1"
node.master: true
node.data: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1.example.com"]
Node 2:
cluster.name: mycluster
node.name: "node2"
node.master: false
node.data: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["node1.example.com"]

If you are trying to connect additional node to already existed ES cluster, make sure that this node also have all the same ES plugins as another nodes. If not - node cant be fully connected (other nodes can show it as connected but http-commands cant be launched on it) to ES cluster with error like:
[2016-07-21 11:56:59,564][DEBUG][action.admin.cluster.health] [dev-marvel1] no known master node, scheduling a retry
[2016-07-21 11:57:05,313][INFO ][rest.suppressed ] /_cluster/health Params: {pretty=true}
MasterNotDiscoveredException[waited for [30s]]
at org.elasticsearch.action.support.master.TransportMasterNodeAction$4.onTimeout(TransportMasterNodeAction.java:154)
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:239)
at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:574)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
In my case problem was with license plugin. After removing it everything became fine.

If you are searching for this because you are working on your local machine then the quickest solution is to free up nodes that are hung by individually killing all the processes. I searched for these processes using.
ps -alx | grep elastic
kill -9 {pid}

Related

Failed to send join request to master in Elasticsearch, Unknown NamedWriteable [org.elasticsearch.cluster.metadata.MetaData$Custom][licenses]]

We have a long running single node ELK cluster running (master/data). I have decided to add additional data node. However Im getting the below error on the data node
30.X.XXX}{172.30.X.XXX:9300}{ml.enabled=true}], reason [RemoteTransportException[[master][172.30.X.XXX:9300][internal:discovery/zen/join]];
nested: IllegalStateException[failure when sending a validation request to node];
nested: RemoteTransportException[[data1][172.30.X.XXX:9300][internal:discovery/zen/join/validate]];
nested: IllegalArgumentException[Unknown NamedWriteable [org.elasticsearch.cluster.metadata.MetaData$Custom][licenses]]; ]
Below are the config files on master and new data node
Master Node:
cluster.name: my-application
node.name: master
node.master: true
node.data: true
path.data: /opt/elasticsearch
network.host: ["172.30.X.XX1","localhost"]
http.port: 9200
transport.tcp.port: 9300
discovery.zen.ping.unicast.hosts: ["172.30.X.XX1"]
discovery.zen.minimum_master_nodes: 1
Data1 Node:
cluster.name: my-application
node.name: data1
node.master: false
node.data: true
path.data: /opt/elasticsearch
network.host: ["172.30.X.XX2","localhost"]
http.port: 9200
transport.tcp.port: 9300
discovery.zen.ping.unicast.hosts: ["172.30.X.XX1"]
discovery.zen.minimum_master_nodes: 1
Tried pinging and checked telnet on 9200 and 9300 from master to data node and vice versa and it is working fine
I have tried deleting the data from /var/lib/elasticsearch/nodes/0 and restarted the data1, it didnt work
This happens if you try with a mix of xpack/commercial/non-open-source binaries of Elasticsearch and some nodes with the open-source binaries.
Unfortunately Elasticsearch tries to "trick" you into using their non-open-source version nowadays and this causes many unintended non-open-source installations.
A simple solution is to install the non-oss version everywhere, however you may not want to run the commercial version as you then need to adhere to the commercial license!
In order to convert to the open-source license on all nodes you can do the following:
You can set the following in /etc/elasticsearch/elasticsearch.yml and restart all nodes to disable some commercial features:
xpack.security.enabled: false
xpack.ml.enabled: false
Then you can change all nodes to the open-source binaries one by one in rolling fashion.
See also the following similar discussions:
https://discuss.elastic.co/t/elasticsearch-cluster-cant-join-new-node/126964
https://discuss.elastic.co/t/adding-a-new-node-on-a-different-subnet/125377/2- https://github.com/codelibs/elasticsearch-module/issues/3
https://discuss.elastic.co/t/transport-client-error-after-installing-x-pack-on-es-5-5-1/97021/4
https://discuss.elastic.co/t/bulk-indexing-with-x-pack-exception/92086/5

Failed to send join request to master ElasticSearch on AWS EC2 owned cluster

I am trying to build a cluster of 3 EC2 instances (I do not want to use the ElasticSearch service of amazon) and after installing the software and configuring it in all three instances I encounter the problem that they do not communicate with each other.
I’m working with ES 5.5.1 on instances with Ubuntu 16.04
All nodes are up and running
All nodes has a Security Groupof AWS with permissions for all traffic between nodes (all ports)
Internal firewall on very machine white list for every node
Master
cluster.name: excelle
node.name: ${HOSTNAME}
node.master: true
path.data: /srv/data
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host: 172.31.MAS.TER
discovery.zen.ping.unicast.hosts: ["172.31.MAS.TER", "172.31.NODE.TWO", "172.31.NODE.THREE"]
Node two
cluster.name: excelle
node.name: ${HOSTNAME}
node.master: false
path.data: /srv/data
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host: 172.31.NODE.TWO
discovery.zen.ping.unicast.hosts: ["172.31.MAS.TER", "172.31.NODE.TWO", "172.31.NODE.THREE"]
Node 3
cluster.name: excelle
node.name: ${HOSTNAME}
node.master: false
path.data: /srv/data
path.logs: /var/log/elasticsearch
bootstrap.memory_lock: true
network.host: 172.31.NODE.THREE
discovery.zen.ping.unicast.hosts: ["172.31.MAS.TER", "172.31.NODE.TWO", "172.31.NODE.THREE"]
But on logs, on node 3 for exmple...
[2017-08-15T11:01:41,241][INFO ][o.e.d.z.ZenDiscovery ] [es03] failed to send join request to master [{esmaster}{scquEEaETDKMKLHzZvEHZQ}{NdLtMUXtT7WXnv1a4uHWqQ}{172.31.44.107}{172.31.44.107:9300}], reason [RemoteTransportException[[esmaster][172.31.44.107:9300][internal:discovery/zen/join]]; nested: ConnectTransportException[[es03][172.31.18.76:9300] connect_timeout[30s]]; nested: IOException[connection timed out: 172.31.18.76/172.31.18.76:9300]; ]
I testing connection from node 3 to master not problem (for network question)
telnet 172.31.MAS.TER 9300
Trying 172.31.MAS.TER...
Connected to 172.31.MAS.TER.
Escape character is '^]'.
What it's wrong? Any idea?
I found an answer to this posted on ElasticSearch
The gem was from manst:
"solution for this error (you must deleted contents of data folder(/var/lib/elasticsearch/nodes/0) and restarted both the servers ):"
I deleted the nodes folder from each of my SpotInst instances and rebooted. My 3 ES distributed master-only nodes all came online. My 8 data-only nodes have connected automatically without any issue.

Elastic data node with shield

but it can't working after I setup shield
I added user to elastic by command
shield/esusers useradd es_admin -r admin
This is my master node config
cluster.name: vision
node.name: "node_master"
node.master: true
node.data: false
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.1.5"]
path.logs: /var/elastic/log
path.data: /var/elastic/data
This is my data node config
cluster.name: vision
node.name: "node_data"
node.master: false
node.data: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.1.5"]
path.logs: /var/elastic/log
path.data: /var/elastic/data
How can I connect data node to master node?
There is no extra work you need to do to join data and master node to form a cluster.It treats both type of nodes same.
Your hosts setting is mentioning only one host.
discovery.zen.ping.unicast.hosts: ["host1:port","host2:port"]
Each node will keep pinging the hosts listed above until both are initialized.Adding the local host is of no harm to array as ping wont fail but help in automated deployement of elasticsearch on multinode ecosystem.
since you are using shield make sure if you enabled ssl for node communicatioon then also specify the path to SSL keystore files.

Elasticsearch cluster initialization

I just setup a 3 node Elasticsearch cluster, with each node having common settings (pasted at the end of the post)
However, when I start my master node, and try to get the cluster status or even check if any one of the nodes is up, I get a 503 as the status code. Also, shutdowns (on any of the nodes) do not work.
Could someone please tell me what I'm doing wrong here? The log file on Node 1 says:
[ESNode1] observer: timeout notification from cluster service. timeout setting [30s], time since start [30s]
Here's snippets from the elasticsearch.yml config files:
Node 1
cluster.name: myCluster
node.name: ESNode1
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 2
discover.zen.ping.timeout: 20s #just for good measure
discovery.zen.ping.multicast.enabled: false
Node 2
cluster.name: myCluster
node.name: ESNode2
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 2
discover.zen.ping.timeout: 20s
discovery.zen.ping.multicast.enabled: false
Node 3
cluster.name: myCluster
node.name: ESNode3
node.master: false
node.data: true
discovery.zen.minimum_master_nodes: 2
discover.zen.ping.timeout: 20s
discovery.zen.ping.multicast.enabled: false
Thank you!
You configure that the minimum master nodes is 2. This means your cluster needs at least two master nodes. This is fine, however, together with the setting discovery.zen.ping.multicast.enabled: false this is hard to get working. This setting means you are not going to look for other nodes. So you should configure the nodes manually using the setting hosts.
You can find more information here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html#unicast
An example for three nodes running on one machine:
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300","127.0.0.1:9301","127.0.0.1:9302"]
Disabling multicast discovery means discovery pings will only be sent to specific addresses. The addresses/hosts are those specified in discovery.zen.ping.unicast.hosts.
Note that a single address can be specified. When a node joins it becomes aware of all nodes in the cluster, and can start communicating with them directly.
To clarify using Jettros' example:
discovery.zen.unicast.hosts:["127.0.0.1:9300"]
will cause the nodes bound to 9301 and 9302 to ping only 9300.
If 9301 joins first it 'already knows' all other nodes in the cluster (just 9300).
If 9302 subsequently joins it will become aware of 9301 and vice-versa.
If 9301 and 9302 cant join with 9300 the cluster will not be formed.

How could I set up a ES cluster?

I have a master node which ip is 192.168.1.101 and a non-master node which ip is 192.168.1.106. The two use the same version of ElasticSearch-1.2.0.
But after I started the master node and the non-master node, then I got the following info:
[2014-06-04 02:38:49,350][INFO ][discovery.zen ] [node2] failed to send join request to master [[node1][TxZ5wuhnT1awPC1gEjYPdw][flyers-MacBook-Air.local][inet[/192.168.1.101:9300]]{master=true}], reason [org.elasticsearch.ElasticsearchTimeoutException: Timeout waiting for task.]
Config of the master node:
cluster.name: mycluster
node.name: "node1"
node.master: true
node.data: true
index.number_of_shards: 5
index.number_of_replicas: 1
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.1.101"]
Config of the non-master node:
cluster.name: mycluster
node.name: "node2"
node.master: false
node.data: true
index.number_of_shards: 5
index.number_of_replicas: 1
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.1.101"]
I don't know why this exception happens. Please give me some tips. Thanks in advance.
After I set network.bind_host、network.publish_host、network.host to the IP that the node held,it worked. Very strange.
I had the same issue until I found out that my ES node did not bind to eth0 as expected but to eth2 instead. Of course this could not work because the registration response from the master node could not be sent to the IP address of my other network.
I was able to fix this behaviour by setting the following parameter in my elasticsearch.yml (on the server which was not able to join the cluster)
network.publish_host: "_eth0:ipv4_"
I'd better change ["192.168.1.101"] to ["192.168.1.101", "192.168.1.106"] in both configurations.

Resources