corosync immediately shuts down after start

I am running a Pacemaker cluster with corosync on two nodes.
I had to restart node2, and after the reboot, running
service corosync start
brings corosync up, but it shuts itself down again immediately.
After the log entry "Completed service synchronization, ready to provide service." there is an entry "Node was shut down by a signal" and the shutdown starts.
This is the complete log output:
notice [MAIN ] Corosync Cluster Engine ('2.3.4'): started and ready to provide service.
info [MAIN ] Corosync built-in features: debug testagents augeas systemd pie relro bindnow
warning [MAIN ] member section is deprecated.
warning [MAIN ] Please migrate config file to nodelist.
notice [TOTEM ] Initializing transport (UDP/IP Unicast).
notice [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
notice [TOTEM ] Initializing transport (UDP/IP Unicast).
notice [TOTEM ] Initializing transmit/receive security (NSS) crypto: none hash: none
notice [TOTEM ] The network interface [192.168.1.102] is now up.
notice [SERV ] Service engine loaded: corosync configuration map access [0]
info [QB ] server name: cmap
notice [SERV ] Service engine loaded: corosync configuration service [1]
info [QB ] server name: cfg
notice [SERV ] Service engine loaded: corosync cluster closed process group service v1.01 [2]
info [QB ] server name: cpg
notice [SERV ] Service engine loaded: corosync profile loading service [4]
notice [QUORUM] Using quorum provider corosync_votequorum
notice [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
notice [SERV ] Service engine loaded: corosync vote quorum service v1.0 [5]
info [QB ] server name: votequorum
notice [SERV ] Service engine loaded: corosync cluster quorum service v0.1 [3]
info [QB ] server name: quorum
notice [TOTEM ] The network interface [x.x.x.3] is now up.
notice [TOTEM ] adding new UDPU member {x.x.x.3}
notice [TOTEM ] adding new UDPU member {x.x.x.2}
warning [TOTEM ] Incrementing problem counter for seqid 1 iface x.x.x.3 to [1 of 10]
notice [TOTEM ] A new membership (192.168.1.102:7420) was formed. Members joined: -1062731418
notice [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
notice [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
notice [VOTEQ ] Waiting for all cluster members. Current votes: 1 expected_votes: 2
notice [QUORUM] Members[1]: -1062731418
notice [MAIN ] Completed service synchronization, ready to provide service.
notice [MAIN ] Node was shut down by a signal
notice [SERV ] Unloading all Corosync service engines.
info [QB ] withdrawing server sockets
notice [SERV ] Service engine unloaded: corosync vote quorum service v1.0
info [QB ] withdrawing server sockets
notice [SERV ] Service engine unloaded: corosync configuration map access
info [QB ] withdrawing server sockets
notice [SERV ] Service engine unloaded: corosync configuration service
info [QB ] withdrawing server sockets
notice [SERV ] Service engine unloaded: corosync cluster closed process group service v1.01
info [QB ] withdrawing server sockets
notice [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
notice [SERV ] Service engine unloaded: corosync profile loading service
notice [MAIN ] Corosync Cluster Engine exiting normally

This seems to be an issue with openSUSE 13.2.
Since this version you can find the line
StopWhenUnneeded=yes
in the file
/usr/lib/systemd/system/corosync.service
which is the unit behind "service corosync start/stop/etc.".
If the service is not enabled, it is automatically stopped again right after a manual start. The solution is to enable the service.
I had not enabled it until now because I always started the service manually, but since the upgrade to 13.2 this is necessary.
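As a minimal sketch (assuming a systemd-based setup such as openSUSE 13.2 with the stock corosync.service unit), enabling the unit so that systemd no longer treats it as "unneeded" looks like this:
systemctl enable corosync
systemctl start corosync
systemctl status corosync   # should now stay "active (running)" instead of exiting right away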

Related

What is causing Elasticsearch to shut down shortly after starting up?

I'm having an issue with Elasticsearch on EC2 where I'm starting up several new instances from the same AMI, and very occasionally (like < 1% of the time), the Elasticsearch service will stop shortly after starting. I've looked at the log file, but it's not really clear to me why the service is stopping. Are there any clues in this that I'm missing, or is there anywhere else I should look for logs when this happens?
[2020-07-28T18:17:44,251][INFO ][o.e.c.c.ClusterBootstrapService] [ip-10-0-0-68] no discovery configuration found, will perform best-effort cluster bootstrapping after [3s] unless existing master is discovered
[2020-07-28T18:17:44,375][INFO ][o.e.c.s.MasterService ] [ip-10-0-0-68] elected-as-master ([1] nodes joined)[{ip-10-0-0-68}{C1lEYCg6RUWry4avn4isxw}{IjXE3KNOQO2UeZyrX2o3FA}{127.0.0.1}{127.0.0.1:9300}{dilm}{ml.machine_memory=32601837568, xpack.installed=true, ml.max_open_jobs=20} elect leader, _BECOME_MASTER_TASK_, _FINISH_ELECTION_], term: 4, version: 26, delta: master node changed {previous [], current [{ip-10-0-0-68}{C1lEYCg6RUWry4avn4isxw}{IjXE3KNOQO2UeZyrX2o3FA}{127.0.0.1}{127.0.0.1:9300}{dilm}{ml.machine_memory=32601837568, xpack.installed=true, ml.max_open_jobs=20}]}
[2020-07-28T18:17:44,416][INFO ][o.e.c.s.ClusterApplierService] [ip-10-0-0-68] master node changed {previous [], current [{ip-10-0-0-68}{C1lEYCg6RUWry4avn4isxw}{IjXE3KNOQO2UeZyrX2o3FA}{127.0.0.1}{127.0.0.1:9300}{dilm}{ml.machine_memory=32601837568, xpack.installed=true, ml.max_open_jobs=20}]}, term: 4, version: 26, reason: Publication{term=4, version=26}
[2020-07-28T18:17:44,446][INFO ][o.e.h.AbstractHttpServerTransport] [ip-10-0-0-68] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2020-07-28T18:17:44,447][INFO ][o.e.n.Node ] [ip-10-0-0-68] started
[2020-07-28T18:17:44,595][INFO ][o.e.l.LicenseService ] [ip-10-0-0-68] license [a9a29e21-5167-497e-9e49-ccc785ea2d47] mode [basic] - valid
[2020-07-28T18:17:44,596][INFO ][o.e.x.s.s.SecurityStatusChangeListener] [ip-10-0-0-68] Active license is now [BASIC]; Security is disabled
[2020-07-28T18:17:44,602][INFO ][o.e.g.GatewayService ] [ip-10-0-0-68] recovered [0] indices into cluster_state
[2020-07-28T18:18:29,947][INFO ][o.e.n.Node ] [ip-10-0-0-68] stopping ...
[2020-07-28T18:18:29,962][INFO ][o.e.x.w.WatcherService ] [ip-10-0-0-68] stopping watch service, reason [shutdown initiated]
[2020-07-28T18:18:29,963][INFO ][o.e.x.w.WatcherLifeCycleService] [ip-10-0-0-68] watcher has stopped and shutdown
[2020-07-28T18:18:30,014][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [ip-10-0-0-68] [controller/2184] [Main.cc#150] Ml controller exiting
[2020-07-28T18:18:30,015][INFO ][o.e.x.m.p.NativeController] [ip-10-0-0-68] Native controller process has stopped - no new native processes can be started
[2020-07-28T18:18:30,024][INFO ][o.e.n.Node ] [ip-10-0-0-68] stopped
[2020-07-28T18:18:30,024][INFO ][o.e.n.Node ] [ip-10-0-0-68] closing ...
[2020-07-28T18:18:30,032][INFO ][o.e.n.Node ] [ip-10-0-0-68] closed
[2020-07-28T18:18:29,947][INFO ][o.e.n.Node ] [ip-10-0-0-68] stopping ...
This log line means Elasticsearch shut down gracefully after receiving a shutdown signal (typically SIGTERM) from an external source. It's not possible to say from this log what the external source is; that depends on your system. It could, for instance, be systemd, if that's how you're starting Elasticsearch. If so, hopefully its logs tell you why it's sending that shutdown signal.
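If Elasticsearch is managed by systemd (the unit name elasticsearch is an assumption here; adjust it to your setup), the journal usually records who requested the stop and why:
systemctl status elasticsearch    # shows the last stop reason and exit status
journalctl -u elasticsearch -e    # jump to the most recent log entries for the unit
journalctl -xe                    # system-wide view, useful for spotting OOM kills or instance shutdown events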

adminCenter-1.0 and jaxrs-2.1

Running WebSphere Liberty 20.0.0.1
When I enable adminCenter on a Liberty server running jaxrs, I get this message:
The configured features jaxrs-2.1 and adminCenter-1.0 include one or more features that cause the conflict.
So how can I run adminCenter together with other applications that require some newer features?
The complete feature set is :
<featureManager>
<feature>jaxrs-2.1</feature>
<feature>adminCenter-1.0</feature>
</featureManager>
Try updating to 20.0.0.2; that combination works fine on my installation. Or try a clean restart of yours.
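For reference, a clean start from the Liberty installation directory looks roughly like this (defaultServer is an assumption; substitute your server name):
bin/server stop defaultServer
bin/server start defaultServer --clean   # --clean discards cached feature/OSGi data in the workarea before starting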
Launching defaultServer (WebSphere Application Server 20.0.0.2/wlp-1.0.37.cl200220200204-1746) on Eclipse OpenJ9 VM, version 11.0.6+10 (en_PL)
[AUDIT ] CWWKE0001I: The server defaultServer has been launched.
[AUDIT ] CWWKE0100I: This product is licensed for development, and limited production use. The full license terms can be viewed here: https://public.dhe.ibm.com/ibmdl/export/pub/software/websphere/wasdev/license/base_ilan/ilan/20.0.0.2/lafiles/en.html
[INFO ] CWWKE0002I: The kernel started after 0.858 seconds
[INFO ] CWWKF0007I: Feature update started.
[WARNING ] CWWKS3103W: There are no users defined for the BasicRegistry configuration of ID com.ibm.ws.security.registry.basic.config[basic].
[INFO ] CWWKS0007I: The security service is starting...
[INFO ] OSGi Work Path [ D:\Liberty\wlp-javaee8-20.0.0.2\wlp\usr\servers\defaultServer\workarea\org.eclipse.osgi\41\data ]
[AUDIT ] CWWKZ0058I: Monitoring dropins for applications.
[INFO ] Aries Blueprint packages not available. So namespaces will not be registered
[INFO ] DYNA1001I: WebSphere Dynamic Cache instance named baseCache initialized successfully.
[INFO ] DYNA1071I: The cache provider default is being used.
[INFO ] DYNA1056I: Dynamic Cache (object cache) initialized successfully.
[AUDIT ] CWPKI0820A: The default keystore has been created using the 'keystore_password' environment variable.
[INFO ] CWWKS4105I: LTPA configuration is ready after 0.006 seconds.
[INFO ] CWWKS0008I: The security service is ready.
[INFO ] CWWKS1123I: The collective authentication plugin with class name NullCollectiveAuthenticationPlugin has been activated.
[INFO ] Successfully loaded default keystore: D:/Liberty/wlp-javaee8-20.0.0.2/wlp/usr/servers/defaultServer/resources/security/key.p12 of type: PKCS12
[INFO ] CWWKX1015I: FILE persistence layer initialized for the Admin Center.
[INFO ] CWWKX1063I: FILE persistence layer initialized for the Admin Center tool data loader.
[INFO ] CWWKO0219I: TCP Channel defaultHttpEndpoint has been started and is now listening for requests on host kubernetes.docker.internal (IPv4: 127.0.0.1) port 9080.
[INFO ] CWWKO0219I: TCP Channel defaultHttpEndpoint-ssl has been started and is now listening for requests on host kubernetes.docker.internal (IPv4: 127.0.0.1) port 9443.
[AUDIT ] CWWKF0012I: The server installed the following features: [adminCenter-1.0, distributedMap-1.0, el-3.0, jaxrs-2.1, jaxrsClient-2.1, jndi-1.0, json-1.0, jsonp-1.1, jsp-2.3, localConnector-1.0, restConnector-2.0, servlet-4.0, ssl-1.0].
[INFO ] CWWKF0008I: Feature update completed in 1.038 seconds.
[AUDIT ] CWWKF0011I: The defaultServer server is ready to run a smarter planet. The defaultServer server started in 1.897 seconds.
[INFO ] SRVE0169I: Loading Web Module: IBMJMXConnectorREST.
[INFO ] SRVE0169I: Loading Web Module: The Liberty Server Config Tool.
[INFO ] SRVE0250I: Web Module The Liberty Server Config Tool has been bound to default_host.
[INFO ] SRVE0250I: Web Module IBMJMXConnectorREST has been bound to default_host.
[AUDIT ] CWWKT0016I: Web application available (default_host): http://localhost:9080/ibm/adminCenter/serverConfig-1.0/
[AUDIT ] CWWKT0016I: Web application available (default_host): http://localhost:9080/IBMJMXConnectorREST/
[INFO ] CWWKX0103I: The JMX REST connector is running and is available at the following service URL: service:jmx:rest://localhost:9443/IBMJMXConnectorREST
[INFO ] CWWKX0103I: The JMX REST connector is running and is available at the following service URL: service:jmx:rest://localhost:9443/IBMJMXConnectorREST
[INFO ] SRVE0169I: Loading Web Module: ibm/api.
[INFO ] SRVE0250I: Web Module ibm/api has been bound to default_host.
[AUDIT ] CWWKT0016I: Web application available (default_host): http://localhost:9080/ibm/api/
[INFO ] SRVE0169I: Loading Web Module: The Liberty Explore Tool.
[INFO ] SRVE0250I: Web Module The Liberty Explore Tool has been bound to default_host.
[AUDIT ] CWWKT0016I: Web application available (default_host): http://localhost:9080/ibm/adminCenter/explore-1.0/
[INFO ] SESN8501I: The session manager did not find a persistent storage location; HttpSession objects will be stored in the local application server's memory.
[INFO ] SRVE0169I: Loading Web Module: The Liberty Admin Center.
[INFO ] SRVE0250I: Web Module The Liberty Admin Center has been bound to default_host.
[AUDIT ] CWWKT0016I: Web application available (default_host): http://localhost:9080/adminCenter/
[INFO ] SESN0176I: A new session context will be created for application key default_host/ibm/api
[INFO ] SESN0176I: A new session context will be created for application key default_host/adminCenter
[INFO ] SESN0176I: A new session context will be created for application key default_host/ibm/adminCenter/serverConfig-1.0
[INFO ] SESN0176I: A new session context will be created for application key default_host/ibm/adminCenter/explore-1.0
[INFO ] SESN0176I: A new session context will be created for application key default_host/IBMJMXConnectorREST
[INFO ] SESN0172I: The session manager is using the Java default SecureRandom implementation for session ID generation.
[INFO ] SESN0172I: The session manager is using the Java default SecureRandom implementation for session ID generation.
[INFO ] SESN0172I: The session manager is using the Java default SecureRandom implementation for session ID generation.
[INFO ] SESN0172I: The session manager is using the Java default SecureRandom implementation for session ID generation.
[INFO ] SESN0172I: The session manager is using the Java default SecureRandom implementation for session ID generation.
[INFO ] DYNA1056I: Dynamic Cache (object cache) initialized successfully.
[INFO ] SRVE0242I: [com.ibm.ws.jmx.connector.server.rest] [/IBMJMXConnectorREST] [JMXRESTProxyServlet]: Initialization successful.
[INFO ] SRVE9103I: A configuration file for a web server plugin was automatically generated for this server at D:\Liberty\wlp-javaee8-20.0.0.2\wlp\usr\servers\defaultServer\logs\state\plugin-cfg.xml.
[INFO ] SRVE0242I: [com.ibm.ws.ui] [/adminCenter] [/login.jsp]: Initialization successful.
[AUDIT ] CWWKS1100A: Authentication did not succeed for user ID admin. An invalid user ID or password was specified.
[AUDIT ] CWWKG0016I: Starting server configuration update.
[AUDIT ] CWWKG0017I: The server configuration was successfully updated in 0.030 seconds.
[INFO ] SRVE0242I: [com.ibm.ws.ui] [/adminCenter] [/toolbox.jsp]: Initialization successful.
[INFO ] SRVE0242I: [com.ibm.ws.rest.handler] [/ibm/api] [RESTProxyServlet]: Initialization successful.
[INFO ] CWWKX1029I: The Admin Center default toolbox for user admin loaded.
[INFO ] CWWKX1000I: The Admin Center default catalog loaded.
It turned out to be some strange issue with my installation. After unpacking a fresh binary and re-running installUtility, it works fine.
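For example (a sketch; the exact features to pull in depend on your server.xml), the missing features can be installed into a fresh runtime with installUtility:
bin/installUtility install defaultServer                 # resolves and installs the features listed in that server's server.xml
bin/installUtility install adminCenter-1.0 jaxrs-2.1     # or name the features explicitly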

URL to access cluster environment for Elasticsearch 2.4.3

We have an Elasticsearch 2.4.3 cluster environment of two nodes. What URL should I provide to access the environment so that it works in high availability?
We have two master nodes, Node1 and Node2. The hostname of Node1 is node1.elastic.com and of Node2 is node2.elastic.com. Both nodes are master-eligible, following the (n/2 + 1) formula.
We have enabled clustering by modifying the elasticsearch.yml file and adding discovery.zen.ping.unicast.hosts with the two nodes.
From our Java application we connect to node1.elastic.com. It works fine as long as both nodes are up: data is populated on both ES servers and everything is good. But as soon as Node1 goes down, the entire Elasticsearch cluster becomes unreachable, and it does not automatically switch to Node2 for processing requests.
I feel like the URL I am giving is not right, and it has to be something else to provide an automatic switchover.
Logs from Node1
[2020-02-10 12:15:45,639][INFO ][node ] [Wildpride] initialized
[2020-02-10 12:15:45,639][INFO ][node ] [Wildpride] starting ...
[2020-02-10 12:15:45,769][WARN ][common.network ] [Wildpride] _non_loopback_ is deprecated as it picks an arbitrary interface. specify explicit scope(s), interface(s), address(es), or hostname(s) instead
[2020-02-10 12:15:45,783][WARN ][common.network ] [Wildpride] _non_loopback_ is deprecated as it picks an arbitrary interface. specify explicit scope(s), interface(s), address(es), or hostname(s) instead
[2020-02-10 12:15:45,784][INFO ][transport ] [Wildpride] publish_address {000.00.00.204:9300}, bound_addresses {[fe80::9af2:b3ff:fee9:90ca]:9300}, {000.00.00.204:9300}
[2020-02-10 12:15:45,788][INFO ][discovery ] [Wildpride] XXXX/Hg_5eGZIS0e249KUTQqPPg
[2020-02-10 12:16:15,790][WARN ][discovery ] [Wildpride] waited for 30s and no initial state was set by the discovery
[2020-02-10 12:16:15,799][WARN ][common.network ] [Wildpride] _non_loopback_ is deprecated as it picks an arbitrary interface. specify explicit scope(s), interface(s), address(es), or hostname(s) instead
[2020-02-10 12:16:15,802][WARN ][common.network ] [Wildpride] _non_loopback_ is deprecated as it picks an arbitrary interface. specify explicit scope(s), interface(s), address(es), or hostname(s) instead
[2020-02-10 12:16:15,803][INFO ][http ] [Wildpride] publish_address {000.00.00.204:9200}, bound_addresses {[fe80::9af2:b3ff:fee9:90ca]:9200}, {000.00.00.204:9200}
[2020-02-10 12:16:15,803][INFO ][node ] [Wildpride] started
[2020-02-10 12:16:35,552][INFO ][node ] [Wildpride] stopping ...
[2020-02-10 12:16:35,619][WARN ][discovery.zen.ping.unicast] [Wildpride] [17] failed send ping to {#zen_unicast_1#}{000.00.00.206}{000.00.00.206:9300}
java.lang.IllegalStateException: can't add nodes to a stopped transport
at org.elasticsearch.transport.netty.NettyTransport.connectToNode(NettyTransport.java:916)
at org.elasticsearch.transport.netty.NettyTransport.connectToNodeLight(NettyTransport.java:906)
at org.elasticsearch.transport.TransportService.connectToNodeLight(TransportService.java:267)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$3.run(UnicastZenPing.java:395)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
[2020-02-10 12:16:35,620][WARN ][discovery.zen.ping.unicast] [Wildpride] failed to send ping to [{Wildpride}{Hg_5eGZIS0e249KUTQqPPg}{000.00.00.204}{000.00.00.204:9300}]
SendRequestTransportException[[Wildpride][000.00.00.204:9300][internal:discovery/zen/unicast]]; nested: TransportException[TransportService is closed stopped can't send request];
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:340)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPingRequestToNode(UnicastZenPing.java:440)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing.sendPings(UnicastZenPing.java:426)
at org.elasticsearch.discovery.zen.ping.unicast.UnicastZenPing$2.doRun(UnicastZenPing.java:249)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: TransportException[TransportService is closed stopped can't send request]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:320)
... 7 more
[2020-02-10 12:16:35,623][INFO ][node ] [Wildpride] stopped
[2020-02-10 12:16:35,623][INFO ][node ] [Wildpride] closing ...
[2020-02-10 12:16:35,642][INFO ][node ] [Wildpride] closed
TL;DR: There is no automatic switchover in Elasticsearch; you'll need some kind of load balancer in front of the Elasticsearch cluster.
For an HA setup you need at least 3 master-eligible nodes. In front of the cluster there has to be a load balancer (also HA) to distribute the requests across the cluster, or the client needs to be aware of the cluster and, in a failure scenario, fail over to any remaining node.
If you go with 2 master nodes, the cluster can get into the "split brain" state: if the network gets fragmented and the nodes become invisible to each other, both of them will think they are the last one working and keep serving read/write requests independently. That way they drift apart, and once the fragmentation is gone it is a lot of trouble to reconcile them. With 3 nodes, in a fragmentation scenario the cluster will only continue to serve requests if at least 2 nodes can see each other.
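For ES 2.x, the corresponding quorum settings in elasticsearch.yml would look roughly like this (a sketch assuming a third master-eligible node, node3.elastic.com, is added):
discovery.zen.ping.unicast.hosts: ["node1.elastic.com", "node2.elastic.com", "node3.elastic.com"]
discovery.zen.minimum_master_nodes: 2   # (3 / 2) + 1 = 2 with integer division
The application should then either point at the load balancer's address or list all nodes so it can fail over.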

Elasticsearch: Node registered to cluster but MasterNotDiscoveredException

I'm having a variation of the usual connection problem between Elasticsearch nodes; however, here it does not seem to be related to the network, as the client registers with the master without any problem (apparently). My set-up is the following:
One Master node (node.master=true, node.data=true, cluster.name=stokker)
One Client node (Spring Boot 1.3.0.M5) with these settings:
spring.data.elasticsearch.properties.http.enabled=true
spring.data.elasticsearch.cluster-name=stokker
spring.data.elasticsearch.properties.node.local=false
spring.data.elasticsearch.properties.node.data=false
spring.data.elasticsearch.properties.node.client=true
First I start the master node, then the client, and I can see that the client registers OK:
[Kilmer] recovered [0] indices into cluster_state
[Kilmer] watch service has started
[Kilmer] bound_address {inet[/0:0:0:0:0:0:0:0:9201]}, publish_address {inet[/159.107.28.230:9201]}
[Kilmer] started
[Kilmer] added {[Thunderclap][VVF_5QnLREac-Du-dZK1IQ][ES00052260][inet[/159.107.28.230:9301]]{client=true, data=false, local=false},}, reason: zen-disco-receive(join from node[[Thunderclap][VVF_5QnLREac-Du-dZK1IQ] [ES00052260][inet[/159.107.28.230:9301]]{client
Client's console output
org.elasticsearch.node : [Thunderclap] version[1.7.0], pid[12084], build[929b973/2015-07-16T14:31:07Z]
org.elasticsearch.node : [Thunderclap] initializing ...
org.elasticsearch.plugins : [Thunderclap] loaded [], sites []
org.elasticsearch.bootstrap : JNA not found. native methods will be disabled.
org.elasticsearch.node : [Thunderclap] initialized
org.elasticsearch.node : [Thunderclap] starting ...
org.elasticsearch.transport : [Thunderclap] bound_address {inet[/0:0:0:0:0:0:0:0:9301]}, publish_address {inet[/159.107.28.230:9301]}
org.elasticsearch.discovery : [Thunderclap] stokker/VVF_5QnLREac-Du-dZK1IQ
org.elasticsearch.discovery : [Thunderclap] waited for 30s and no initial state was set by the discovery
org.elasticsearch.http : [Thunderclap] bound_address {inet[/0:0:0:0:0:0:0:0:9202]}, publish_address {inet[/159.107.28.230:9202]}
org.elasticsearch.node : [Thunderclap] started
However, when I try to perform some indexing, I get the following exception:
org.elasticsearch.discovery.MasterNotDiscoveredException: waited for [30s]
Any ideas on what I am missing here?
Thanks
I solved the issue by manually adding this property, indicating where the master node is:
spring.data.elasticsearch.cluster-nodes=192.168.1.18:9300
If somebody finds a better solution, please let me know; I'm not fully confident in this one.
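One variation worth trying (a sketch; the hostnames and ports are placeholders for your actual nodes): the property accepts a comma-separated list, so listing all transport addresses lets the client fall back to another node if the first one is down:
spring.data.elasticsearch.cluster-nodes=node1.example.com:9300,node2.example.com:9300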

Elasticsearch fails to start

I'm trying to set up a 2-node ES cluster using Amazon EC2 instances. After everything is set up and I try to start ES, it fails to start. Below are the config files:
/etc/elasticsearch/elasticsearch.yml - http://pastebin.com/3Q1qNqmZ
/etc/init.d/elasticsearch - http://pastebin.com/f3aJyurR
Below are the /var/log/elasticsearch/es-cluster.log content -
[2014-06-08 07:06:01,761][WARN ][common.jna ] Unknown mlockall error 0
[2014-06-08 07:06:02,095][INFO ][node ] [logstash] version[0.90.13], pid[29666], build[249c9c5/2014-03-25T15:27:12Z]
[2014-06-08 07:06:02,095][INFO ][node ] [logstash] initializing ...
[2014-06-08 07:06:02,108][INFO ][plugins ] [logstash] loaded [], sites []
[2014-06-08 07:06:07,504][INFO ][node ] [logstash] initialized
[2014-06-08 07:06:07,510][INFO ][node ] [logstash] starting ...
[2014-06-08 07:06:07,646][INFO ][transport ] [logstash] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.164.27.207:9300]}
[2014-06-08 07:06:12,177][INFO ][cluster.service ] [logstash] new_master [logstash][vCS_3LzESEKSN-thhGWeGA][inet[/<an_ip_is_here>:9300]], reason: zen-disco-join (elected_as_master)
[2014-06-08 07:06:12,208][INFO ][discovery ] [logstash] es-cluster/vCS_3LzESEKSN-thhGWeGA
[2014-06-08 07:06:12,334][INFO ][http ] [logstash] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/<an_ip_is_here>:9200]}
[2014-06-08 07:06:12,335][INFO ][node ] [logstash] started
[2014-06-08 07:06:12,379][INFO ][gateway ] [logstash] recovered [0] indices into cluster_state
I see several things that you should correct in your configuration files.
1) You need different node names. You are using the same config file for both nodes. You do not want to do this if you are setting the node name explicitly, as you are with node.name: "logstash". Either create separate configuration files with different node.name entries, or comment it out and let ES auto-assign the node.name (see the sketch after this list).
2) The mlockall setting is throwing an error. I would not start out with bootstrap.mlockall: true until you've first gotten ES to run without it and then spent a little time configuring Linux to support it. It can cause problems with booting up:
Warning
mlockall might cause the JVM or shell session to exit if it tries to
allocate more memory than is available!
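As a minimal per-node sketch (the node names below are placeholders; cluster.name is taken from your log), the two elasticsearch.yml files would differ only in node.name:
# elasticsearch.yml on the first node
cluster.name: es-cluster
node.name: "logstash-1"
# elasticsearch.yml on the second node
cluster.name: es-cluster
node.name: "logstash-2"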
I'd check out the documentation on the configuration variables and be careful about making too many adjustments right out of the gate.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-configuration.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/setup-service.html
If you do want to make memory adjustments to ES this previous stackoverflow article should be helpful:
How to change Elasticsearch max memory size
