Elasticsearch fails to start - amazon-ec2

I'm trying to implement a 2 node ES cluster using Amazon EC2 instances. After everything is setup and I try to start the ES, it fails to start. Below are the config files:
/etc/elasticsearch/elasticsearch.yml - http://pastebin.com/3Q1qNqmZ
/etc/init.d/elasticsearch - http://pastebin.com/f3aJyurR
Below are the /var/log/elasticsearch/es-cluster.log content -
[2014-06-08 07:06:01,761][WARN ][common.jna ] Unknown mlockall error 0
[2014-06-08 07:06:02,095][INFO ][node ] [logstash] version[0.90.13], pid[29666], build[249c9c5/2014-03-25T15:27:12Z]
[2014-06-08 07:06:02,095][INFO ][node ] [logstash] initializing ...
[2014-06-08 07:06:02,108][INFO ][plugins ] [logstash] loaded [], sites []
[2014-06-08 07:06:07,504][INFO ][node ] [logstash] initialized
[2014-06-08 07:06:07,510][INFO ][node ] [logstash] starting ...
[2014-06-08 07:06:07,646][INFO ][transport ] [logstash] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.164.27.207:9300]}
[2014-06-08 07:06:12,177][INFO ][cluster.service ] [logstash] new_master [logstash][vCS_3LzESEKSN-thhGWeGA][inet[/<an_ip_is_here>:9300]], reason: zen-disco-join (elected_as_master)
[2014-06-08 07:06:12,208][INFO ][discovery ] [logstash] es-cluster/vCS_3LzESEKSN-thhGWeGA
[2014-06-08 07:06:12,334][INFO ][http ] [logstash] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/<an_ip_is_here>:9200]}
[2014-06-08 07:06:12,335][INFO ][node ] [logstash] started
[2014-06-08 07:06:12,379][INFO ][gateway ] [logstash] recovered [0] indices into cluster_state

I see several things that you should correct in your configuration files.
1) Need different node names. You are using the same config file for both nodes. You do not want to do this if you are setting node name like you are: node.name: "logstash". Either create separate configuration files with different node.name entries or comment it out and let ES auto assign the node.name.
2) Mlockall setting is throwing an error. I would not start out setting bootstrap.mlockall: True until you've first gotten ES to run without it and then have spent a little time configuring linux to support it. It can cause problems with booting up:
Warning
mlockall might cause the JVM or shell session to exit if it tries to
allocate more memory than is available!
I'd check out the documentation on the configuration variables and be careful about making too many adjustments right out of the gate.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-service.html
If you do want to make memory adjustments to ES this previous stackoverflow article should be helpful:
How to change Elasticsearch max memory size

Related

URL to access cluster environment for ElasticSearch2.4.3

We have an ElasticSearch2.4.3 cluster environment of two nodes. I want to ask what URL should I provide to access the environment so that it works in High Availability?
We have two master Node1 and Node2. The host name for Node1 is node1.elastic.com and Node2 is node2.elastic.com. Both the nodes are master according to formula (n/2 +1).
We have enabled cluster setting by modifying the elastic.yml file by adding
discovery.zen.ping.unicast.hosts for the two nodes.
From our java application, we are connecting to node1.elastic.com. It works fine till both the nodes are up. Data is getting populated in both the ES servers and everything is good. But as soon as Node1 goes down entire elastic search cluster gets disconnected. And it automatically doesn't switch to Node2 for processing requests.
I feel like the URL which I am giving is not right, and it has to be something else to provide an automatic switch.
Logs from Node1
[2020-02-10 12:15:45,639][INFO ][node ] [Wildpride] initialized
[2020-02-10 12:15:45,639][INFO ][node ] [Wildpride] starting ...
[2020-02-10 12:15:45,769][WARN ][common.network ] [Wildpride] _non_loopback_ is deprecated as it picks an arbitrary interface. specify explicit scope(s), interface(s), address(es), or hostname(s) instead
[2020-02-10 12:15:45,783][WARN ][common.network ] [Wildpride] _non_loopback_ is deprecated as it picks an arbitrary interface. specify explicit scope(s), interface(s), address(es), or hostname(s) instead
[2020-02-10 12:15:45,784][INFO ][transport ] [Wildpride] publish_address {000.00.00.204:9300}, bound_addresses {[fe80::9af2:b3ff:fee9:90ca]:9300}, {000.00.00.204:9300}
[2020-02-10 12:15:45,788][INFO ][discovery ] [Wildpride] XXXX/Hg_5eGZIS0e249KUTQqPPg
[2020-02-10 12:16:15,790][WARN ][discovery ] [Wildpride] waited for 30s and no initial state was set by the discovery
[2020-02-10 12:16:15,799][WARN ][common.network ] [Wildpride] _non_loopback_ is deprecated as it picks an arbitrary interface. specify explicit scope(s), interface(s), address(es), or hostname(s) instead
[2020-02-10 12:16:15,802][WARN ][common.network ] [Wildpride] _non_loopback_ is deprecated as it picks an arbitrary interface. specify explicit scope(s), interface(s), address(es), or hostname(s) instead
[2020-02-10 12:16:15,803][INFO ][http ] [Wildpride] publish_address {000.00.00.204:9200}, bound_addresses {[fe80::9af2:b3ff:fee9:90ca]:9200}, {000.00.00.204:9200}
[2020-02-10 12:16:15,803][INFO ][node ] [Wildpride] started
[2020-02-10 12:16:35,552][INFO ][node ] [Wildpride] stopping ...
[2020-02-10 12:16:35,619][WARN ][discovery.zen.ping.unicast] [Wildpride] [17] failed send ping to {#zen_unicast_1#}{000.00.00.206}{000.00.00.206:9300}
java.lang.IllegalStateException: can't add nodes to a stopped transport
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:916)
at org.elasticsearch.transport.netty.NettyTransport.connectToNodeLight(NettyTransport.java:906)
at org.elasticsearch.transport.TransportService.connectToNodeLight(TransportService.java:267)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$3.run(UnicastZenPing.java:395)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2020-02-10 12:16:35,620][WARN ][discovery.zen.ping.unicast] [Wildpride] failed to send ping to [{Wildpride}{Hg_5eGZIS0e249KUTQqPPg}{000.00.00.204}{000.00.00.204:9300}]
SendRequestTransportException[[Wildpride][000.00.00.204:9300][internal:discovery/zen/unicast]]; nested: TransportException[TransportService is closed stopped can't send request];
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:340)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPingRequestToNode(UnicastZenPing.java:440)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:426)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$2.doRun(UnicastZenPing.java:249)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: TransportException[TransportService is closed stopped can't send request]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:320)
... 7 more
[2020-02-10 12:16:35,623][INFO ][node ] [Wildpride] stopped
[2020-02-10 12:16:35,623][INFO ][node ] [Wildpride] closing ...
[2020-02-10 12:16:35,642][INFO ][node ] [Wildpride] closed
TL;DR: There is no automatic switch in elasticsearch and you'll need some kind of loadbalancer in front of the elasticsearch cluster.
For an HA setup, you need at least 3 master eligible nodes. In front of the cluster there have to be a loadbalancer (also HA) to distribute the requests across the cluster. Or the client needs to be somehow aware of cluster and in a failure scenario failover to any node left.
If you go with 2 master nodes, the cluster can get into the "split brain" state. If your network gets somehow fragmented and the nodes become invisible to each other, both of them will think it is the last one working and keep serving the read/write requests independently. In that way they drift away from each other and it will become nearly impossible to fix it - at least, when the fragmentation is gone, there is a lot of trouble to fix. With 3 nodes, in a fragmentation scenario the clusternwill only continue to serve requests if there are at least 2 nodes visible to each other.

elasticsearch: How to reinitialize a node?

elasticsearch 1.7.2 on CentOS
We have a 3 node cluster that has been running fine. A networking problem caused the "B" node to lose network access. (It then turns out that the C node had the "minimum_master_nodes" as 1, not 2.)
So we are now poking along with just the A node.
We fixed the issues on the B and C nodes, but they refuse to come up and join the cluster. On B and C:
# curl -XGET http://localhost:9200/_cluster/health?pretty=true
{
"error" : "MasterNotDiscoveredException[waited for [30s]]",
"status" : 503
}
The elasticsearch.yml is as follows (the name on "b" and "c" nodes are reflected in the node names on those systems, ALSO, the IP addys on each node reflect the other 2 nodes, HOWEVER, on the "c" node, the index.number_of_replicas was mistakenly set to 1.)
cluster.name: elasticsearch-prod
node.name: "PROD-node-3a"
node.master: true
index.number_of_replicas: 2
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.3.100", "192.168.3.101"]
We have no idea why they won't join. They have network visibility to A, and A can see them. Each node correctly has the other two defined in "discovery.zen.ping.unicast.hosts:"
On B and C, the log is very sparse, and tells us nothing:
# cat elasticsearch.log
[2015-09-24 20:07:46,686][INFO ][node ] [The Profile] version[1.7.2], pid[866], build[e43676b/2015-09-14T09:49:53Z]
[2015-09-24 20:07:46,688][INFO ][node ] [The Profile] initializing ...
[2015-09-24 20:07:46,931][INFO ][plugins ] [The Profile] loaded [], sites []
[2015-09-24 20:07:47,054][INFO ][env ] [The Profile] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [148.7gb], net total_space [157.3gb], types [rootfs]
[2015-09-24 20:07:50,696][INFO ][node ] [The Profile] initialized
[2015-09-24 20:07:50,697][INFO ][node ] [The Profile] starting ...
[2015-09-24 20:07:50,942][INFO ][transport ] [The Profile] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.181.3.138:9300]}
[2015-09-24 20:07:50,983][INFO ][discovery ] [The Profile] elasticsearch/PojoIp-ZTXufX_Lxlwvdew
[2015-09-24 20:07:54,772][INFO ][cluster.service ] [The Profile] new_master [The Profile][PojoIp-ZTXufX_Lxlwvdew][elastic-search-3c-prod-centos-case-48307][inet[/10.181.3.138:9300]], reason: zen-disco-join (elected_as_master)
[2015-09-24 20:07:54,801][INFO ][http ] [The Profile] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.181.3.138:9200]}
[2015-09-24 20:07:54,802][INFO ][node ] [The Profile] started
[2015-09-24 20:07:54,880][INFO ][gateway ] [The Profile] recovered [0] indices into cluster_state
[2015-09-24 20:42:45,691][INFO ][node ] [The Profile] stopping ...
[2015-09-24 20:42:45,727][INFO ][node ] [The Profile] stopped
[2015-09-24 20:42:45,727][INFO ][node ] [The Profile] closing ...
[2015-09-24 20:42:45,735][INFO ][node ] [The Profile] closed
How do we bring the whole beast to life?
Rebooting B and C makes no difference at all
I am hesitant to cycle A, as that is what our app is hitting...
Well, we do not know what brought it to life, but it kind of magically came back up.
I believe that the shard reroute, (shown here: elasticsearch: Did I lose data when two of my three nodes went down? ) caused the nodes to rejoin the cluster. Our theory is that node A, the only surviving node, was not a "healthy" master, because it knew that one shard (the "p" cut of shard 1, as spelled out here: elasticsearch: Did I lose data when two of my three nodes went down? ) was not allocated.
Since the master knew it was not intact, the other nodes declined to join the cluster, throwing the "MasterNotDiscoveredException"
Once we got all the "p" shards assigned to the surviving A node, the other nodes joined up, and did the whole replicating dance.
HOWEVER Data was lost by allocating the shard like that. We ultimately set up a new cluster, and are rebuilding the index (which takes several days).

Elastic search (1.6) not starting on windows7 with jdk1.7.0_71

I am new to elastic search and when i start it ..keep getting below warning. can anybody help here..what need to to be changed.
[2015-06-25 13:04:15,143][WARN ][bootstrap ] jvm uses the client vm, make sure to run `java` with the server vm for best performance by adding `-server` to the command line
[2015-06-25 13:04:15,244][INFO ][node ] [Kubik] version[1.6.0], pid[20068], build[cdd3ac4/2015-06-09T13:36:34Z]
[2015-06-25 13:04:15,244][INFO ][node ] [Kubik] initializing ...
[2015-06-25 13:04:15,247][INFO ][plugins ] [Kubik] loaded [], sites []
[2015-06-25 13:04:15,277][INFO ][env ] [Kubik] using [1] data paths, mounts [[System (C:)]], net usable_space [109.3gb], net total_space [238.1gb], types [NTFS]
[2015-06-25 13:04:18,034][INFO ][node ] [Kubik] initialized
[2015-06-25 13:04:18,034][INFO ][node ] [Kubik] starting ...
[2015-06-25 13:04:18,317][INFO ][transport ] [Kubik] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.154.249.29:9300]}
[2015-06-25 13:04:18,669][INFO ][discovery ] [Kubik] elasticsearch/ZWZR28dARWqqEf8FOn0Hgw
[2015-06-25 13:04:18,688][WARN ][transport.netty ] [Kubik] exception caught on transport layer [[id: 0xdfa8e460]], closing connection
java.net.SocketException: Permission denied: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Resolved my issue.... putting this as Answer so that it will be helpful for others.
go to installed ElasticSearch setup and go to folder "config" and then file "elasticsearch.yml" and add/modify 'network.host' property with "127.0.0.1" .

Install elasticsearch on OpenShift

I installed a pre build Elasticsearch 1.0.0 version by reading this tutorial. If I start elasticsearch I got the following error message, Should I try an older version of ES or how to fix this issue?
[elastic-dataportal.rhcloud.com elasticsearch-1.0.0]\> ./bin/elasticsearch
[2014-02-25 10:02:18,757][INFO ][node ] [Desmond Pitt] version[1.0.0], pid[203443], build[a46900e/2014-02-12T16:18:34Z]
[2014-02-25 10:02:18,764][INFO ][node ] [Desmond Pitt] initializing ...
[2014-02-25 10:02:18,780][INFO ][plugins ] [Desmond Pitt] loaded [], sites []
OpenJDK Server VM warning: You have loaded library /var/lib/openshift/430c93b1500446b03a00005c/app-root/data/elasticsearch-1.0.0/lib/sigar/libsigar-x86-linux.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
[2014-02-25 10:02:32,198][INFO ][node ] [Desmond Pitt] initialized
[2014-02-25 10:02:32,205][INFO ][node ] [Desmond Pitt] starting ...
[2014-02-25 10:02:32,813][INFO ][transport ] [Desmond Pitt] bound_address {inet[/127.8.212.129:3306]}, publish_address {inet[/127.8.212.129:3306]}
[2014-02-25 10:02:35,949][INFO ][cluster.service ] [Desmond Pitt] new_master [Desmond Pitt][_bWO_h9ETTWrMNr7x_yALg][ex-std-node134.prod.rhcloud.com][inet[/127.8.212.129:3306]], reason: zen-disco-join (elected_as_master)
[2014-02-25 10:02:36,167][INFO ][discovery ] [Desmond Pitt] elasticsearch/_bWO_h9ETTWrMNr7x_yALg
{1.0.0}: Startup Failed ...
- BindHttpException[Failed to bind to [8080]]
ChannelException[Failed to bind to: /127.8.212.129:8080]
BindException[Address already in use]
You first have to stop the running demo application, which is already bound to 8080. This can be done with this command:
ctl_app stop
After running this command you will be able to start elasticsearch on the port 8080. However this is not recommended for production environments.
I would recommend installing ElasticSearch with this cartridge: https://github.com/ncdc/openshift-elasticsearch-cartridge
It will save you the headaches of manual custom configurations.
you try to assign ES to port 8080, which already is taken. the culprit in the config from there is http.port: ${OPENSHIFT_DIY_PORT}. just leave both port configs out of the config or assign the envvar some other port. the default ports for ES are 9200 for http and 9300.

elasticsearch auto-discovery rackspace not working

I'm trying to use ElasticSearch for an application I'm building, and I am hosting it on Rackspace servers. However, the auto-discovery feature is not working. I thought that it was because auto-discovery uses broadcast and multicast to find the other nodes with the matching cluster name. I found this article saying that Rackspace does now support multicast and broadcast with their new Cloud Networks feature. Then following the article's instructions I created a network, and added that network to both of the servers the nodes were running on. I then tried restarting ElasticSearch on both nodes, but they didn't find each other, and each declared themselves as the "master" (here's the output from the logs):
[2013-04-03 22:14:03,516][INFO ][node ] [Nemesis] {0.20.6}[2752]: initializing ...
[2013-04-03 22:14:03,530][INFO ][plugins ] [Nemesis] loaded [], sites []
[2013-04-03 22:14:07,873][INFO ][node ] [Nemesis] {0.20.6}[2752]: initialized
[2013-04-03 22:14:07,873][INFO ][node ] [Nemesis] {0.20.6}[2752]: starting ...
[2013-04-03 22:14:08,052][INFO ][transport ] [Nemesis] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/166.78.177.149:9300]}
[2013-04-03 22:14:11,117][INFO ][cluster.service ] [Nemesis] new_master [Nemesis][3ih_VZsNQem5W4csDk-Ntg][inet[/166.78.177.149:9300]], reason: zen-disco-join (elected_as_master)
[2013-04-03 22:14:11,168][INFO ][discovery ] [Nemesis] elasticsearch/3ih_VZsNQem5W4csDk-Ntg
[2013-04-03 22:14:11,202][INFO ][http ] [Nemesis] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/166.78.177.149:9200]}
[2013-04-03 22:14:11,202][INFO ][node ] [Nemesis] {0.20.6}[2752]: started
[2013-04-03 22:14:11,275][INFO ][gateway ] [Nemesis] recovered [0] indices into cluster_state
The other node's log:
[2013-04-03 22:13:54,538][INFO ][node ] [Jaguar] {0.20.6}[3364]: initializing ...
[2013-04-03 22:13:54,546][INFO ][plugins ] [Jaguar] loaded [], sites []
[2013-04-03 22:13:58,825][INFO ][node ] [Jaguar] {0.20.6}[3364]: initialized
[2013-04-03 22:13:58,826][INFO ][node ] [Jaguar] {0.20.6}[3364]: starting ...
[2013-04-03 22:13:58,977][INFO ][transport ] [Jaguar] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/166.78.63.101:9300]}
[2013-04-03 22:14:02,041][INFO ][cluster.service ] [Jaguar] new_master [Jaguar][WXAO9WOoQDuYQo7Z2GeAOw][inet[/166.78.63.101:9300]], reason: zen-disco-join (elected_as_master)
[2013-04-03 22:14:02,094][INFO ][discovery ] [Jaguar] elasticsearch/WXAO9WOoQDuYQo7Z2GeAOw
[2013-04-03 22:14:02,129][INFO ][http ] [Jaguar] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/166.78.63.101:9200]}
[2013-04-03 22:14:02,129][INFO ][node ] [Jaguar] {0.20.6}[3364]: started
[2013-04-03 22:14:02,211][INFO ][gateway ] [Jaguar] recovered [0] indices into cluster_state
Is adding the network not enough (Rackspace also gave me an IP for this network)? Do I need to somehow specify in the conf file to check that network when using multicast to find other nodes?
I also found this article which offered a different approach. Per the article's instructions I put this into /config/elasticsearch.yml:
cloud:
account: account #
key: account key
compute:
type: rackspace
discovery:
type: cloud
However, then when I tried to restart ElasticSearch I got this:
Stopping ElasticSearch...
Stopped ElasticSearch.
Starting ElasticSearch...
Waiting for ElasticSearch.......
WARNING: ElasticSearch may have failed to start.
And it did fail to start. I checked into the log file for any errors, but this was all that was there:
[2013-04-03 22:31:00,788][INFO ][node ] [Chamber] {0.20.6}[4354]: initializing ...
[2013-04-03 22:31:00,797][INFO ][plugins ] [Chamber] loaded [], sites []
And it stopped there without any errors and without continuing.
Has anyone successfully gotten ElasticSearch to work in the Rackspace cloud before? I know that the unicast option is also available, but I'd prefer to not have to specify each IP address individually, as I would like it to be easy to add other nodes later. Thanks!
UPDATE
I haven't solved the issue yet, but after some searching I found this post that says the "old" cloud plugin was discontinued and replaced with just an Ec2 plugin for Amazon's cloud, which explains why the changes I made to the config file do not work.
Multicast is disabled on public cloud for security reasons(can be verified with ifconfig). Here is an article that should get you what you need:
https://developer.rackspace.com/blog/elasticsearch-autodiscovery-on-the-rackspace-cloud/

Resources