Elasticsearch: Node registered to cluster but MasterNotDiscoveredException

I'm having a variation of the usual connection problem between Elasticsearch nodes. Here, however, it does not seem to be network-related, because the client (apparently) registers with the master without any problem. My setup is the following:
One Master node (node.master=true, node.data=true, cluster.name=stokker)
One Client node (Spring Boot 1.3.0.M5) with these settings:
spring.data.elasticsearch.properties.http.enabled=true
spring.data.elasticsearch.cluster-name=stokker
spring.data.elasticsearch.properties.node.local=false
spring.data.elasticsearch.properties.node.data=false
spring.data.elasticsearch.properties.node.client=true
First I start the master node, then the client, and I can see that the client registers OK:
[Kilmer] recovered [0] indices into cluster_state
[Kilmer] watch service has started
[Kilmer] bound_address {inet[/0:0:0:0:0:0:0:0:9201]}, publish_address {inet[/159.107.28.230:9201]}
[Kilmer] started
[Kilmer] added {[Thunderclap][VVF_5QnLREac-Du-dZK1IQ][ES00052260][inet[/159.107.28.230:9301]]{client=true, data=false, local=false},}, reason: zen-disco-receive(join from node[[Thunderclap][VVF_5QnLREac-Du-dZK1IQ] [ES00052260][inet[/159.107.28.230:9301]]{client
Client's console output:
org.elasticsearch.node : [Thunderclap] version[1.7.0], pid[12084], build[929b973/2015-07-16T14:31:07Z]
org.elasticsearch.node : [Thunderclap] initializing ...
org.elasticsearch.plugins : [Thunderclap] loaded [], sites []
org.elasticsearch.bootstrap : JNA not found. native methods will be disabled.
org.elasticsearch.node : [Thunderclap] initialized
org.elasticsearch.node : [Thunderclap] starting ...
org.elasticsearch.transport : [Thunderclap] bound_address {inet[/0:0:0:0:0:0:0:0:9301]}, publish_address {inet[/159.107.28.230:9301]}
org.elasticsearch.discovery : [Thunderclap] stokker/VVF_5QnLREac-Du-dZK1IQ
org.elasticsearch.discovery : [Thunderclap] waited for 30s and no initial state was set by the discovery
org.elasticsearch.http : [Thunderclap] bound_address {inet[/0:0:0:0:0:0:0:0:9202]}, publish_address {inet[/159.107.28.230:9202]}
org.elasticsearch.node : [Thunderclap] started
However, when I try to perform some indexing, I get the following exception:
org.elasticsearch.discovery.MasterNotDiscoveredException: waited for [30s]
Any ideas on what I am missing here?
Thanks

I solved the issue by manually adding this property, which tells the client where the master node is:
spring.data.elasticsearch.cluster-nodes=192.168.1.18:9300
If somebody finds a better solution, please let me know; I'm not fully confident in this one.
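The cluster-nodes value is a comma-separated list of host:port transport addresses (the example uses the 9300 transport port, not the 9200 HTTP port). A plausible reason the fix works, to be verified against the Spring Boot docs for your version, is that with this property set Spring Boot 1.3 connects a transport client to the listed addresses instead of starting an embedded client node that must discover the master on its own. Because a typo in that value is easy to miss, here is a minimal sketch that parses and sanity-checks it, assuming only that format; parse_cluster_nodes is a hypothetical helper, not part of Spring Data.

```python
from __future__ import annotations


def parse_cluster_nodes(value: str) -> list[tuple[str, int]]:
    """Split 'host:port,host:port' into (host, port) pairs.

    Assumes host names or IPv4 addresses (no bracketed IPv6 literals).
    """
    nodes = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray or trailing commas
        host, sep, port = entry.rpartition(":")
        if not sep or not host:
            raise ValueError(f"expected host:port, got {entry!r}")
        port_num = int(port)  # raises ValueError for non-numeric ports
        if not 0 < port_num < 65536:
            raise ValueError(f"port out of range in {entry!r}")
        nodes.append((host, port_num))
    return nodes


if __name__ == "__main__":
    # The value from the answer above; 9300 is the transport port, not HTTP 9200.
    print(parse_cluster_nodes("192.168.1.18:9300"))  # [('192.168.1.18', 9300)]
```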

Related

What is causing elasticsearch to shutdown shortly after starting up?

I'm having an issue with Elasticsearch on EC2: I'm starting up several new instances from the same AMI, and very occasionally (less than 1% of the time) the Elasticsearch service stops shortly after starting. I've looked at the log file, but it's not clear to me why the service is stopping. Are there any clues in it that I'm missing, or is there anywhere else I should look for logs when this happens?
[2020-07-28T18:17:44,251][INFO ][o.e.c.c.ClusterBootstrapService] [ip-10-0-0-68] no discovery configuration found, will perform best-effort cluster bootstrapping after [3s] unless existing master is discovered
[2020-07-28T18:17:44,375][INFO ][o.e.c.s.MasterService ] [ip-10-0-0-68] elected-as-master ([1] nodes joined)[{ip-10-0-0-68}{C1lEYCg6RUWry4avn4isxw}{IjXE3KNOQO2UeZyrX2o3FA}{127.0.0.1}{127.0.0.1:9300}{dilm}{ml.machine_memory=32601837568, xpack.installed=true, ml.max_open_jobs=20} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 4, version: 26, delta: master node changed {previous [], current [{ip-10-0-0-68}{C1lEYCg6RUWry4avn4isxw}{IjXE3KNOQO2UeZyrX2o3FA}{127.0.0.1}{127.0.0.1:9300}{dilm}{ml.machine_memory=32601837568, xpack.installed=true, ml.max_open_jobs=20}]}
[2020-07-28T18:17:44,416][INFO ][o.e.c.s.ClusterApplierService] [ip-10-0-0-68] master node changed {previous [], current [{ip-10-0-0-68}{C1lEYCg6RUWry4avn4isxw}{IjXE3KNOQO2UeZyrX2o3FA}{127.0.0.1}{127.0.0.1:9300}{dilm}{ml.machine_memory=32601837568, xpack.installed=true, ml.max_open_jobs=20}]}, term: 4, version: 26, reason: Publication{term=4, version=26}
[2020-07-28T18:17:44,446][INFO ][o.e.h.AbstractHttpServerTransport] [ip-10-0-0-68] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2020-07-28T18:17:44,447][INFO ][o.e.n.Node ] [ip-10-0-0-68] started
[2020-07-28T18:17:44,595][INFO ][o.e.l.LicenseService ] [ip-10-0-0-68] license [a9a29e21-5167-497e-9e49-ccc785ea2d47] mode [basic] - valid
[2020-07-28T18:17:44,596][INFO ][o.e.x.s.s.SecurityStatusChangeListener] [ip-10-0-0-68] Active license is now [BASIC]; Security is disabled
[2020-07-28T18:17:44,602][INFO ][o.e.g.GatewayService ] [ip-10-0-0-68] recovered [0] indices into cluster_state
[2020-07-28T18:18:29,947][INFO ][o.e.n.Node ] [ip-10-0-0-68] stopping ...
[2020-07-28T18:18:29,962][INFO ][o.e.x.w.WatcherService ] [ip-10-0-0-68] stopping watch service, reason [shutdown initiated]
[2020-07-28T18:18:29,963][INFO ][o.e.x.w.WatcherLifeCycleService] [ip-10-0-0-68] watcher has stopped and shutdown
[2020-07-28T18:18:30,014][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [ip-10-0-0-68] [controller/2184] [Main.cc#150] Ml controller exiting
[2020-07-28T18:18:30,015][INFO ][o.e.x.m.p.NativeController] [ip-10-0-0-68] Native controller process has stopped - no new native processes can be started
[2020-07-28T18:18:30,024][INFO ][o.e.n.Node ] [ip-10-0-0-68] stopped
[2020-07-28T18:18:30,024][INFO ][o.e.n.Node ] [ip-10-0-0-68] closing ...
[2020-07-28T18:18:30,032][INFO ][o.e.n.Node ] [ip-10-0-0-68] closed
[2020-07-28T18:18:29,947][INFO ][o.e.n.Node ] [ip-10-0-0-68] stopping ...
This log line means Elasticsearch shut down gracefully after receiving a shutdown signal (typically SIGTERM) from an external source. The log alone can't tell you what that source is; it depends on your system. It could, for instance, be systemd if that's how you start Elasticsearch. If so, hopefully its logs tell you why it sent the shutdown signal.
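To correlate the shutdown with other events on the instance (for example in the systemd logs), it helps to know exactly how long the node ran before the signal arrived. A minimal sketch, assuming the [YYYY-MM-DDTHH:MM:SS,mmm] line prefix used in the log above; the function names are hypothetical:

```python
from __future__ import annotations

import re
from datetime import datetime

# Matches the leading "[2020-07-28T18:17:44,447]" timestamp of a log line.
TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3})\]")


def _timestamp(line: str) -> datetime | None:
    m = TS.match(line)
    if m is None:
        return None
    # Swap the comma for a dot so %f can parse the milliseconds.
    return datetime.strptime(m.group(1).replace(",", "."), "%Y-%m-%dT%H:%M:%S.%f")


def seconds_until_stop(lines: list[str]) -> float | None:
    """Seconds between the node's '] started' line and its 'stopping ...' line."""
    started = None
    for line in lines:
        ts = _timestamp(line)
        if ts is None:
            continue
        if started is None and line.rstrip().endswith("] started"):
            started = ts
        elif started is not None and "stopping ..." in line:
            return (ts - started).total_seconds()
    return None


if __name__ == "__main__":
    log = [
        "[2020-07-28T18:17:44,447][INFO ][o.e.n.Node ] [ip-10-0-0-68] started",
        "[2020-07-28T18:18:29,947][INFO ][o.e.n.Node ] [ip-10-0-0-68] stopping ...",
    ]
    print(seconds_until_stop(log))  # 45.5
```

For the log in the question this gives about 45 seconds of uptime, which is the window to look for in whatever else was running on the instance.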

URL to access a cluster environment for Elasticsearch 2.4.3

We have a two-node Elasticsearch 2.4.3 cluster. What URL should I provide to access the environment so that it is highly available?
We have two master nodes, Node1 and Node2. The host name for Node1 is node1.elastic.com and for Node2 it is node2.elastic.com. Both nodes are masters according to the formula (n/2 + 1).
We enabled clustering by adding discovery.zen.ping.unicast.hosts for the two nodes to the elasticsearch.yml file.
From our Java application, we connect to node1.elastic.com. It works fine as long as both nodes are up: data is populated on both ES servers and everything is good. But as soon as Node1 goes down, the entire Elasticsearch cluster gets disconnected, and requests are not automatically switched to Node2.
I suspect the URL I am using is wrong and that something else is needed to get an automatic switchover.
Logs from Node1
[2020-02-10 12:15:45,639][INFO ][node ] [Wildpride] initialized
[2020-02-10 12:15:45,639][INFO ][node ] [Wildpride] starting ...
[2020-02-10 12:15:45,769][WARN ][common.network ] [Wildpride] _non_loopback_ is deprecated as it picks an arbitrary interface. specify explicit scope(s), interface(s), address(es), or hostname(s) instead
[2020-02-10 12:15:45,783][WARN ][common.network ] [Wildpride] _non_loopback_ is deprecated as it picks an arbitrary interface. specify explicit scope(s), interface(s), address(es), or hostname(s) instead
[2020-02-10 12:15:45,784][INFO ][transport ] [Wildpride] publish_address {000.00.00.204:9300}, bound_addresses {[fe80::9af2:b3ff:fee9:90ca]:9300}, {000.00.00.204:9300}
[2020-02-10 12:15:45,788][INFO ][discovery ] [Wildpride] XXXX/Hg_5eGZIS0e249KUTQqPPg
[2020-02-10 12:16:15,790][WARN ][discovery ] [Wildpride] waited for 30s and no initial state was set by the discovery
[2020-02-10 12:16:15,799][WARN ][common.network ] [Wildpride] _non_loopback_ is deprecated as it picks an arbitrary interface. specify explicit scope(s), interface(s), address(es), or hostname(s) instead
[2020-02-10 12:16:15,802][WARN ][common.network ] [Wildpride] _non_loopback_ is deprecated as it picks an arbitrary interface. specify explicit scope(s), interface(s), address(es), or hostname(s) instead
[2020-02-10 12:16:15,803][INFO ][http ] [Wildpride] publish_address {000.00.00.204:9200}, bound_addresses {[fe80::9af2:b3ff:fee9:90ca]:9200}, {000.00.00.204:9200}
[2020-02-10 12:16:15,803][INFO ][node ] [Wildpride] started
[2020-02-10 12:16:35,552][INFO ][node ] [Wildpride] stopping ...
[2020-02-10 12:16:35,619][WARN ][discovery.zen.ping.unicast] [Wildpride] [17] failed send ping to {#zen_unicast_1#}{000.00.00.206}{000.00.00.206:9300}
java.lang.IllegalStateException: can't add nodes to a stopped transport
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:916)
at org.elasticsearch.transport.netty.NettyTransport.connectToNodeLight(NettyTransport.java:906)
at org.elasticsearch.transport.TransportService.connectToNodeLight(TransportService.java:267)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$3.run(UnicastZenPing.java:395)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2020-02-10 12:16:35,620][WARN ][discovery.zen.ping.unicast] [Wildpride] failed to send ping to [{Wildpride}{Hg_5eGZIS0e249KUTQqPPg}{000.00.00.204}{000.00.00.204:9300}]
SendRequestTransportException[[Wildpride][000.00.00.204:9300][internal:discovery/zen/unicast]]; nested: TransportException[TransportService is closed stopped can't send request];
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:340)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPingRequestToNode(UnicastZenPing.java:440)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:426)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$2.doRun(UnicastZenPing.java:249)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: TransportException[TransportService is closed stopped can't send request]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:320)
... 7 more
[2020-02-10 12:16:35,623][INFO ][node ] [Wildpride] stopped
[2020-02-10 12:16:35,623][INFO ][node ] [Wildpride] closing ...
[2020-02-10 12:16:35,642][INFO ][node ] [Wildpride] closed
TL;DR: There is no automatic switchover in Elasticsearch, so you'll need some kind of load balancer in front of the Elasticsearch cluster.
For an HA setup, you need at least 3 master-eligible nodes. In front of the cluster there has to be a load balancer (itself highly available) to distribute requests across the cluster. Alternatively, the client must be aware of the cluster's nodes and, in a failure scenario, fail over to any node that is left.
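The client-side alternative can be sketched as: keep a list of node addresses and try them in turn until one answers. This is a minimal illustration under stated assumptions, not the behaviour of any particular Elasticsearch client; send_with_failover and fake_send are hypothetical, and send stands in for whatever your application uses to issue a single request against one host.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def send_with_failover(hosts: Sequence[str], send: Callable[[str], T]) -> T:
    """Try each host in order and return the first successful response.

    `send` is a hypothetical callable that performs one request against a
    single host and raises OSError (e.g. ConnectionError) if it is down.
    """
    errors: list[str] = []
    for host in hosts:
        try:
            return send(host)
        except OSError as exc:  # ConnectionError is a subclass of OSError
            errors.append(f"{host}: {exc}")
    raise ConnectionError("all hosts failed: " + "; ".join(errors))


def fake_send(host: str) -> str:
    """Stand-in for a real request: pretend node1 is down."""
    if host == "node1.elastic.com":
        raise ConnectionError("connection refused")
    return f"ok from {host}"


if __name__ == "__main__":
    hosts = ["node1.elastic.com", "node2.elastic.com"]
    print(send_with_failover(hosts, fake_send))  # ok from node2.elastic.com
```

This only retries at request time; it does not spread load or notice when node1 comes back, which is what the load balancer in the answer is for.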
If you go with 2 master nodes, the cluster can get into a "split brain" state. If your network is partitioned and the nodes can no longer see each other, each of them thinks it is the last one working and keeps serving read/write requests independently. The two sides then drift apart, and once the partition is gone, reconciling them is very troublesome, if it is possible at all. With 3 nodes, during a partition the cluster only continues to serve requests if at least 2 nodes can see each other.
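The (n/2 + 1) formula from the question makes the 2-node problem concrete. The sketch below computes the quorum (the value Zen discovery's discovery.zen.minimum_master_nodes is meant to be set to for n master-eligible nodes) and how many master-eligible nodes can be lost while a majority remains:

```python
def quorum(master_eligible: int) -> int:
    """Majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can fail while a majority survives."""
    return master_eligible - quorum(master_eligible)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} nodes: quorum={quorum(n)}, can lose {tolerated_failures(n)}")
```

With 2 nodes the quorum is 2, so losing either node leaves no majority; with 3 nodes the quorum is still 2, so one node can fail.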

Elasticsearch fails to start

I'm trying to set up a 2-node ES cluster on Amazon EC2 instances. After everything is set up, ES fails to start when I try to start it. Below are the config files:
/etc/elasticsearch/elasticsearch.yml - http://pastebin.com/3Q1qNqmZ
/etc/init.d/elasticsearch - http://pastebin.com/f3aJyurR
Below is the content of /var/log/elasticsearch/es-cluster.log:
[2014-06-08 07:06:01,761][WARN ][common.jna ] Unknown mlockall error 0
[2014-06-08 07:06:02,095][INFO ][node ] [logstash] version[0.90.13], pid[29666], build[249c9c5/2014-03-25T15:27:12Z]
[2014-06-08 07:06:02,095][INFO ][node ] [logstash] initializing ...
[2014-06-08 07:06:02,108][INFO ][plugins ] [logstash] loaded [], sites []
[2014-06-08 07:06:07,504][INFO ][node ] [logstash] initialized
[2014-06-08 07:06:07,510][INFO ][node ] [logstash] starting ...
[2014-06-08 07:06:07,646][INFO ][transport ] [logstash] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.164.27.207:9300]}
[2014-06-08 07:06:12,177][INFO ][cluster.service ] [logstash] new_master [logstash][vCS_3LzESEKSN-thhGWeGA][inet[/<an_ip_is_here>:9300]], reason: zen-disco-join (elected_as_master)
[2014-06-08 07:06:12,208][INFO ][discovery ] [logstash] es-cluster/vCS_3LzESEKSN-thhGWeGA
[2014-06-08 07:06:12,334][INFO ][http ] [logstash] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/<an_ip_is_here>:9200]}
[2014-06-08 07:06:12,335][INFO ][node ] [logstash] started
[2014-06-08 07:06:12,379][INFO ][gateway ] [logstash] recovered [0] indices into cluster_state
I see several things that you should correct in your configuration files.
1) Use different node names. You are using the same config file for both nodes, which you do not want when you set the node name explicitly as you do (node.name: "logstash"). Either create separate configuration files with different node.name entries, or comment the setting out and let ES auto-assign the node name.
2) The mlockall setting is throwing an error. I would not set bootstrap.mlockall: True until you have first gotten ES to run without it and then spent some time configuring Linux to support it. It can cause problems at startup:
Warning: mlockall might cause the JVM or shell session to exit if it tries to allocate more memory than is available!
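Configuring Linux to support mlockall typically involves raising the memlock resource limit (ulimit -l) for the user running Elasticsearch, usually to unlimited. As a sketch under that assumption, the Python below reads the current limit for the process it runs in via the standard resource module (Unix only); run it as the same user and from the same environment (init script or shell) that starts ES, because limits differ between them. The helper names are hypothetical.

```python
from __future__ import annotations

import resource  # Unix-only standard library module


def memlock_limits() -> tuple[int, int]:
    """(soft, hard) RLIMIT_MEMLOCK for this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def memlock_unlimited() -> bool:
    """True if the soft limit is unlimited, as after `ulimit -l unlimited`."""
    soft, _hard = memlock_limits()
    return soft == resource.RLIM_INFINITY


def describe(limit: int) -> str:
    return "unlimited" if limit == resource.RLIM_INFINITY else f"{limit} bytes"


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print(f"memlock soft={describe(soft)} hard={describe(hard)}")
    if not memlock_unlimited():
        print("bootstrap.mlockall may fail if the heap exceeds this limit")
```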
I'd check out the documentation on the configuration variables and be careful about making too many adjustments right out of the gate.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-service.html
If you do want to make memory adjustments to ES, this previous Stack Overflow question should be helpful:
How to change Elasticsearch max memory size

logstash - Exception in thread ">output" org.elasticsearch.discovery.MasterNotDiscoveredException: waited for [30s]

Logstash is 100% a disaster for me. I am using LS 1.4.1 and ES 1.0.2 on the same machine.
Here is how I start the Logstash indexer, followed by its config file:
/usr/local/share/logstash-1.4.1/bin/logstash -f /usr/local/share/logstash.indexer.config
input {
  redis {
    host => "redis.queue.do.development.sf.test.com"
    data_type => "list"
    key => "logstash"
    codec => json
  }
}
output {
  stdout { }
  elasticsearch {
    bind_host => "127.0.0.1"
    port => "9300"
  }
}
In ES I set:
network.bind_host: 127.0.0.1
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300"]
And wow... this is what I get:
/usr/local/share/logstash-1.4.1/bin/logstash -f /usr/local/share/logstash.indexer.config
Using milestone 2 input plugin 'redis'. This plugin should be stable, but if you see strange behavior, please let us know! For more information on plugin milestones, see http://logstash.net/docs/1.4.1/plugin-milestones {:level=>:warn}
log4j, [2014-05-29T12:02:29.545] WARN: org.elasticsearch.discovery: [logstash-do-logstash-sf-development-20140527082230-866-2010] waited for 30s and no initial state was set by the discovery
Exception in thread ">output" org.elasticsearch.discovery.MasterNotDiscoveredException: waited for [30s]
at org.elasticsearch.action.support.master.TransportMasterNodeOperationAction$3.onTimeout(org/elasticsearch/action/support/master/TransportMasterNodeOperationAction.java:180)
at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(org/elasticsearch/cluster/service/InternalClusterService.java:492)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java/util/concurrent/ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java/util/concurrent/ThreadPoolExecutor.java:615)
at java.lang.Thread.run(java/lang/Thread.java:744)
See http://logstash.net/docs/1.4.1/outputs/elasticsearch
VERSION NOTE: Your Elasticsearch cluster must be running Elasticsearch 1.1.1. If you use any other version of Elasticsearch, you should set protocol => http in this plugin.
So your problem is that Logstash doesn't support the older ES version you are using unless you use the HTTP transport.
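Following the version note quoted above, the decision can be automated: ask the cluster which version it runs and use protocol => "http" unless it is exactly the version the note names. A sketch, assuming the HTTP endpoint is reachable at the default 127.0.0.1:9200 and that its root URL returns JSON with a version.number field (as Elasticsearch 1.x does); the URL and function names are assumptions.

```python
from __future__ import annotations

import json
from urllib.request import urlopen

# Version named in the Logstash 1.4.1 elasticsearch output "VERSION NOTE" above.
NATIVE_PROTOCOL_ES_VERSION = "1.1.1"


def es_version(base_url: str = "http://127.0.0.1:9200/") -> str:
    """Read version.number from the ES root endpoint (URL is an assumption)."""
    with urlopen(base_url, timeout=5) as resp:
        return json.load(resp)["version"]["number"]


def needs_http_protocol(es_version_number: str) -> bool:
    """True unless the cluster runs exactly the version the note requires."""
    return es_version_number != NATIVE_PROTOCOL_ES_VERSION


if __name__ == "__main__":
    version = es_version()
    if needs_http_protocol(version):
        print(f'ES {version}: set protocol => "http" in the elasticsearch output')
    else:
        print(f"ES {version}: the default node/transport protocol should work")
```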
Setting protocol => "http" worked for me. I expected the EPEL repo to have matching versions of Logstash and Elasticsearch, but ES is used for lots of things, so it is not tightly coupled with the Logstash RPMs.
For me, the problem wasn't with the versions of elasticsearch or logstash. I had just installed them and I was using the latest version of each (1.5.0 & 1.4.2 respectively).
Running the following worked for me as well:
logstash -e 'input { stdin { } } output { elasticsearch { protocol => "http" } }'
But I wanted to get to the bottom of why I wasn't able to connect over the other protocols. Although the documentation doesn't say what the default protocol is, I was pretty sure I was using either transport or node on port 9300 by default, because of the following output I got when I started Elasticsearch:
[2015-04-14 22:21:56,355][INFO ][node ] [Super-Nova] version[1.5.0], pid[10796], build[5448160/2015-03-23T14:30:58Z]
[2015-04-14 22:21:56,355][INFO ][node ] [Super-Nova] initializing ...
[2015-04-14 22:21:56,358][INFO ][plugins ] [Super-Nova] loaded [], sites []
[2015-04-14 22:21:58,186][INFO ][node ] [Super-Nova] initialized
[2015-04-14 22:21:58,187][INFO ][node ] [Super-Nova] starting ...
[2015-04-14 22:21:58,257][INFO ][transport ] [Super-Nova] bound_address {inet[/127.0.0.1:9300]}, publish_address {inet[/127.0.0.1:9300]}
[2015-04-14 22:21:58,273][INFO ][discovery ] [Super-Nova] elasticsearch/KPaTxb9vRnaNXBncN5KN7g
[2015-04-14 22:22:02,053][INFO ][cluster.service ] [Super-Nova] new_master [Super-Nova][KPaTxb9vRnaNXBncN5KN7g][Azads-MBP-2][inet[/127.0.0.1:9300]], reason: zen-disco-join (elected_as_master)
[2015-04-14 22:22:02,069][INFO ][http ] [Super-Nova] bound_address {inet[/127.0.0.1:9200]}, publish_address {inet[/127.0.0.1:9200]}
[2015-04-14 22:22:02,069][INFO ][node ] [Super-Nova] started
At first, I tried opening up port 9300 by following these instructions. That didn't change a thing, so most likely that port wasn't blocked.
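Whether port 9300 is reachable can be checked directly with a plain TCP connect, before touching firewall rules. A minimal sketch; port_reachable is a hypothetical helper, and 127.0.0.1:9300 matches the transport address in the log above:

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure...
        return False


if __name__ == "__main__":
    print("127.0.0.1:9300 reachable:", port_reachable("127.0.0.1", 9300))
```

A successful connect only proves the port is open; a node can still ignore the client for other reasons, such as a cluster name mismatch.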
Then I stumbled upon this GitHub issue. There wasn't really a solution there that helped, but I did double-check that my Elasticsearch cluster name was right by looking at elasticsearch.yml (this file is usually stored where Elasticsearch is installed; run "which elasticsearch" to get an idea of where to look). Lo and behold, my Elasticsearch cluster.name had my name appended to it. Removing it, so that the cluster name was just "elasticsearch", let Logstash discover my Elasticsearch instance.
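Since the client only talks to a cluster whose name matches, comparing the configured cluster.name with the name the client expects is worth a quick check. A rough sketch, assuming a simple one-setting-per-line elasticsearch.yml (this is not a full YAML parser) and that the client expects the default name "elasticsearch"; the helper name and the example value "elasticsearch_myname" are hypothetical.

```python
from __future__ import annotations

import re

# Matches an uncommented "cluster.name: value" line, optionally quoted.
CLUSTER_NAME = re.compile(r"""^cluster\.name\s*:\s*["']?([^"'#\s]+)["']?""")


def configured_cluster_name(yml_text: str) -> str:
    """Return cluster.name from elasticsearch.yml text, or the ES default."""
    for line in yml_text.splitlines():
        m = CLUSTER_NAME.match(line.strip())
        if m:
            return m.group(1)
    return "elasticsearch"  # Elasticsearch's default cluster name


if __name__ == "__main__":
    expected = "elasticsearch"  # assumed: what the Logstash client looks for
    name = configured_cluster_name('cluster.name: "elasticsearch_myname"\n')
    if name != expected:
        print(f"cluster name mismatch: ES uses {name!r}, client expects {expected!r}")
```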

Install elasticsearch on OpenShift

I installed a pre-built Elasticsearch 1.0.0 by following this tutorial. When I start Elasticsearch I get the following error message. Should I try an older version of ES, or how can I fix this issue?
[elastic-dataportal.rhcloud.com elasticsearch-1.0.0]> ./bin/elasticsearch
[2014-02-25 10:02:18,757][INFO ][node ] [Desmond Pitt] version[1.0.0], pid[203443], build[a46900e/2014-02-12T16:18:34Z]
[2014-02-25 10:02:18,764][INFO ][node ] [Desmond Pitt] initializing ...
[2014-02-25 10:02:18,780][INFO ][plugins ] [Desmond Pitt] loaded [], sites []
OpenJDK Server VM warning: You have loaded library /var/lib/openshift/430c93b1500446b03a00005c/app-root/data/elasticsearch-1.0.0/lib/sigar/libsigar-x86-linux.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
[2014-02-25 10:02:32,198][INFO ][node ] [Desmond Pitt] initialized
[2014-02-25 10:02:32,205][INFO ][node ] [Desmond Pitt] starting ...
[2014-02-25 10:02:32,813][INFO ][transport ] [Desmond Pitt] bound_address {inet[/127.8.212.129:3306]}, publish_address {inet[/127.8.212.129:3306]}
[2014-02-25 10:02:35,949][INFO ][cluster.service ] [Desmond Pitt] new_master [Desmond Pitt][_bWO_h9ETTWrMNr7x_yALg][ex-std-node134.prod.rhcloud.com][inet[/127.8.212.129:3306]], reason: zen-disco-join (elected_as_master)
[2014-02-25 10:02:36,167][INFO ][discovery ] [Desmond Pitt] elasticsearch/_bWO_h9ETTWrMNr7x_yALg
{1.0.0}: Startup Failed ...
- BindHttpException[Failed to bind to [8080]]
ChannelException[Failed to bind to: /127.8.212.129:8080]
BindException[Address already in use]
You first have to stop the running demo application, which is already bound to 8080. This can be done with this command:
ctl_app stop
After running this command you will be able to start Elasticsearch on port 8080. However, this is not recommended for production environments.
I would recommend installing ElasticSearch with this cartridge: https://github.com/ncdc/openshift-elasticsearch-cartridge
It will save you the headaches of manual custom configurations.
You are trying to assign ES to port 8080, which is already taken. The culprit in the config from that tutorial is http.port: ${OPENSHIFT_DIY_PORT}. Just leave both port settings out of the config, or assign the environment variable some other port. The default ports for ES are 9200 for HTTP and 9300 for the transport protocol.
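Before starting ES, you can check whether the address it is about to bind is already taken by simply trying to bind it yourself. A minimal sketch; port_in_use is hypothetical, and reading the gear's IP from an OPENSHIFT_DIY_IP variable is an assumption (the log above shows 127.8.212.129). Binding only detects a conflict on the same address (or a wildcard bind), so check the IP ES will actually use rather than 127.0.0.1.

```python
import errno
import os
import socket


def port_in_use(host: str, port: int) -> bool:
    """True if binding host:port fails because the address is already in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # e.g. permission denied or an address this host doesn't have
    return False


if __name__ == "__main__":
    # Assumed variable names; OPENSHIFT_DIY_PORT resolved to 8080 in the question.
    host = os.environ.get("OPENSHIFT_DIY_IP", "127.0.0.1")
    port = int(os.environ.get("OPENSHIFT_DIY_PORT", "8080"))
    print(f"{host}:{port} in use:", port_in_use(host, port))
```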
