Elasticsearch MasterNotDiscoveredException between client and master

In a nutshell, I have a standalone ES master instance and a client node that is created within my Java application. The client node discovers the standalone ES instance correctly if the standalone ES instance is started before the client node, which is expected.
The problem I'm facing is this: if, for some reason, the client node starts before the standalone ES instance, I see a "MasterNotDiscoveredException", which is again expected. However, I keep seeing the same exception even after I start the standalone ES instance. Is there some configuration I should change to fix this?
I'm using ES 1.7.1 with Unicast discovery.
EDIT
Cluster information: The standalone ES instance and the client node together make up a cluster.
Client node stack trace:
11:29:35,634 INFO http [496648366, id=7BCBFQLCTWOO2, ide=tcp://172.17.78.80:61616] [Squidboy] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/172.17.78.80:9200]}
11:29:35,635 INFO node [496648366, id=7BCBFQLCTWOO2, ide=tcp://172.17.78.80:61616] [Squidboy] started
11:30:10,279 ERROR ApplicationLifeCycle [299961584] System startup not complete after 120 seconds ...
11:30:14,706 WARN ElasticSearchStatus [278792216] An Exception occurred during cluster health status update - java.util.concurrent.ExecutionException: org.elasticsearch.discovery.MasterNotDiscoveredException: waited for [30s]
at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:292)
at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:279)
at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:117)
at com.harry.elastic.node.ElasticSearchStatus.updateClusterHealth(ElasticSearchStatus.java:90)
at com.harry.elastic.node.ElasticSearchStatus.access$000(ElasticSearchStatus.java:37)
at com.harry.elastic.node.ElasticSearchStatus$1.run(ElasticSearchStatus.java:62)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.discovery.MasterNotDiscoveredException: waited for [30s]
at org.elasticsearch.action.support.master.TransportMasterNodeOperationAction$4.onTimeout(TransportMasterNodeOperationAction.java:164)
at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:231)
at org.elasticsearch.cluster.service.InternalClusterService$NotifyTimeout.run(InternalClusterService.java:560)
... 3 more
Client creation code:
private Node createEmbeddedClientNode() {
    ImmutableSettings.Builder settingsBuilder = ImmutableSettings.settingsBuilder()
            .put("discovery.zen.ping.multicast.enabled", false)
            .put("discovery.zen.ping.unicast.hosts", "localhost[9300-9400]");
    return nodeBuilder().settings(settingsBuilder).clusterName("harryService")
            .client(true).data(false).node();
}
Master discovery configuration:
"discovery": {
    "zen": {
        "ping": {
            "multicast": {
                "enabled": false
            }
        }
    }
}

By default, your client node will retry pinging your master node every 30s, 3 times, and then give up. So if you start your master node after that window has passed, your client node will not discover it.
Try increasing the retries and/or the timeout; that should help:
.put("discovery.zen.fd.ping_timeout", "1m")
.put("discovery.zen.fd.ping_retries", 5)
With those settings, your client node will keep trying for 5 minutes instead of only 1.5 minutes. However, your master node should really already be up when you start your application.
Another setting that might help is the following one: it is true by default, meaning your master will ignore client pings during master election. Since there's a single master node it might not make any difference, but it's still worth a try:
.put("discovery.zen.master_election.filter_client", false)

I solved the problem by explicitly adding the unicast configuration on the master node:
"discovery": {
"zen": {
"ping": {
"multicast": {
"enabled": false
},
"unicast": {
"hosts": "localhost[9300-9400]"
}
}
}
}

Related

storm-twitter hdfsBolt hiveBolt failure on Kerberized cluster

I'm trying to set up a storm-twitter stream on a cluster, using HdfsBolt and HiveBolt to flush data to disk and a Hive table, with https://github.com/pvillard31/storm-twitter as a reference. I followed all the instructions to pass keytabs/principals both inside the topology and in storm.yaml as per https://github.com/apache/storm/blob/master/external/storm-hive/README.md, but I'm still getting errors on both bolts.
For HdfsBolt I get:
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
For HiveBolt I get:
com.google.common.util.concurrent.UncheckedExecutionException: java.lang.RuntimeException: Unable to instantiate org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient
I have tried many different approaches from other posts to make Storm aware of the secure cluster, but it seems it is still expecting SIMPLE authentication.
My storm.yaml is as follows, where 'xyz' is the user running ZooKeeper/Nimbus/Supervisor/topology in a Docker container:
storm.zookeeper.servers:
- "localhost"
nimbus.host: "127.0.0.1"
nimbus.seeds: ["localhost"]
ui.port: 5555
logviewer.port: 5566
hive.keytab.file : "/home/user/.kt/xyz.keytab"
hive.kerberos.principal : "hive/_HOST#HADOOP.DOMAIN.ORG"
hive.metastore.uris : "thrift://<f.q.d.n>:9083"
hdfs.keytab.file : "/home/user/.kt/xyz.keytab"
hdfs.kerberos.principal : "xyz#HADOOP.DOMAIN.ORG"
topology.auto-credentials : ["org.apache.storm.hive.security.AutoHive", "org.apache.storm.hdfs.security.AutoHDFS"]
hiveCredentialsConfigKeys : ["hivecluster"]
"hivecluster": {"hive.keytab.file": "/home/user/storm/hive.keytab", "hive.kerberos.principal": "hive/_HOST#HADOOP.DOMAIN.ORG", "hive.metastore.uris": "thrift://<f.q.d.n>:9083"}
hdfsCredentialsConfigKeys : ["hdfscluster"]
"hdfscluster": {"hdfs.keytab.file": "/home/user/.kt/xyz.keytab", "hdfs.kerberos.principal": "xyz#HADOOP.DOMAIN.ORG"}
I also included the keytab info inside the topology config:
Config config = new Config();
config.put(HdfsSecurityUtil.STORM_KEYTAB_FILE_KEY, "/home/user/.kt/xyz.keytab");
config.put(HdfsSecurityUtil.STORM_USER_NAME_KEY, "xyz#HADOOP.DOMAIN.ORG");
I also added the cluster XMLs in the HdfsBolt:
public void doPrepare(Map conf, TopologyContext topologyContext, OutputCollector collector) throws IOException {
    this.hdfsConfig.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
    this.hdfsConfig.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
    this.hdfsConfig.addResource(new Path("/etc/hadoop/conf/yarn-site.xml"));
    this.hdfsConfig.addResource(new Path("/etc/hive/conf/hive-site.xml"));
    this.fs = FileSystem.get(URI.create(this.fsUrl), this.hdfsConfig);
}
I built a shaded jar that includes everything except storm-core.
Any help would be appreciated.
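For what it's worth, the SIMPLE-vs-KERBEROS error from HdfsBolt usually means the Hadoop Configuration used by the bolt was never switched to Kerberos authentication. Below is a minimal sketch (an assumption to verify, not a confirmed fix) of forcing a keytab login in the doPrepare method above before the FileSystem is opened, using the standard Hadoop UserGroupInformation API; the keytab path is taken from the question, and the '@' form of the principal is assumed:
// Sketch only: make the bolt's Configuration use Kerberos and log in from the keytab
this.hdfsConfig.set("hadoop.security.authentication", "kerberos");
org.apache.hadoop.security.UserGroupInformation.setConfiguration(this.hdfsConfig);
org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(
        "xyz@HADOOP.DOMAIN.ORG",        // principal (assumed '@' form)
        "/home/user/.kt/xyz.keytab");   // keytab path from the question
this.fs = FileSystem.get(URI.create(this.fsUrl), this.hdfsConfig);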

How to check how many total Redis connections a Redis server can give to clients?

We are using a Redis cache with the Spring-Redis module, and we set maxActiveConnections to 10 in the application configuration, but sometimes I see the error below in my applications:
Exception occurred while querying cache : org.springframework.data.redis.RedisConnectionFailureException: Cannot get Jedis connection; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
Is it because the Redis server has no more connections to give to my applications, or is there some other reason? Can anyone please advise?
Note: there are 15 applications using the same Redis server to store data, i.e. all 15 applications need connections from this single Redis server, and for now we set maxActiveConnections to 10 for each of the 15 applications.
To check how many clients are connected to Redis, you can use redis-cli and type the INFO command, or more specifically the info Clients command:
192.168.8.176:8023> info Clients
# Clients
connected_clients:1
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
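If you would rather check this from Java than from redis-cli, here is a small sketch using the Jedis client directly (host and port are illustrative; it assumes a Jedis dependency on the classpath):
import redis.clients.jedis.Jedis;

public class RedisClientInfo {
    public static void main(String[] args) {
        // Connect to the Redis server (illustrative host/port)
        try (Jedis jedis = new Jedis("192.168.8.176", 8023)) {
            // Prints the same "# Clients" section shown above
            System.out.println(jedis.info("Clients"));
        }
    }
}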
From the Jedis source code, it seems the exception can happen for two reasons: either the pool is exhausted, or the pool's activateObject()/validateObject() implementation failed.
Here is the code snippet of Jedis getResource method:
public T getResource() {
    try {
        return internalPool.borrowObject();
    } catch (NoSuchElementException nse) {
        if (null == nse.getCause()) { // The exception was caused by an exhausted pool
            throw new JedisExhaustedPoolException(
                    "Could not get a resource since the pool is exhausted", nse);
        }
        // Otherwise, the exception was caused by the implemented activateObject() or ValidateObject()
        throw new JedisException("Could not get a resource from the pool", nse);
    } catch (Exception e) {
        throw new JedisConnectionException("Could not get a resource from the pool", e);
    }
}
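If the pool really is being exhausted, explicitly sizing it and bounding how long a borrow may block is the first knob to try. Here is a minimal sketch with plain Jedis (the numbers are illustrative, not recommendations; with Spring Data Redis, the equivalent pool settings typically go on the JedisConnectionFactory):
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class RedisPoolSetup {
    public static void main(String[] args) {
        JedisPoolConfig poolConfig = new JedisPoolConfig();
        poolConfig.setMaxTotal(50);        // maximum connections handed out at once (illustrative)
        poolConfig.setMaxIdle(10);         // idle connections kept ready in the pool
        poolConfig.setMaxWaitMillis(2000); // fail fast instead of blocking forever on an empty pool

        // Illustrative host/port
        try (JedisPool pool = new JedisPool(poolConfig, "localhost", 6379);
             Jedis jedis = pool.getResource()) {
            System.out.println(jedis.ping());
        }
    }
}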

Health-check of a redis job flagged as critical in Nomad

When deploying a Redis job in Nomad (0.6), I cannot get it to show as healthy in Consul.
I start Consul in a container and make port 8500 available on localhost:
$ docker container run --name consul -d -p 8500:8500 consul
When I run Nomad, it connects to Consul correctly, as we can see in the logs:
$ nomad agent -dev
No configuration files loaded
==> Starting Nomad agent...
==> Nomad agent configuration:
Client: true
Log Level: DEBUG
Region: global (DC: dc1)
Server: true
Version: 0.6.0
==> Nomad agent started! Log data will stream in below:
...
2017/08/18 15:45:28.373766 [DEBUG] client.consul: bootstrap contacting following Consul DCs: ["dc1"]
2017/08/18 15:45:28.377703 [INFO] client.consul: discovered following Servers: 127.0.0.1:4647
2017/08/18 15:45:28.378851 [INFO] client: node registration complete
2017/08/18 15:45:28.378895 [DEBUG] client: periodically checking for node changes at duration 5s
2017/08/18 15:45:28.379232 [DEBUG] consul.sync: registered 1 services, 1 checks; deregistered 0 services, 0 checks
...
I then run a Redis job with the following configuration file:
job "nomad-redis" {
datacenters = ["dc1"]
type = "service"
group "cache" {
task "redis" {
driver = "docker"
config {
image = "redis:3.2"
port_map {
db = 6379
}
}
resources {
cpu = 500 # 500 MHz
memory = 256 # 256MB
network {
mbits = 10
port "db" {}
}
}
service {
name = "redis"
port = "db"
check {
name = "alive"
type = "tcp"
interval = "10s"
timeout = "2s"
}
}
}
}
}
The Redis service is added into Consul, but it appears as critical. It seems the health check cannot be performed. From what I understand, checks are done within the task. Is there something I'm missing?
Running Consul on localhost, or in a container attached to the host network (--net=host), fixed the issue.

path.home is not configured in elasticsearch

Exception in thread "main" java.lang.IllegalStateException: path.home is not configured
at org.elasticsearch.env.Environment.<init>(Environment.java:101)
at org.elasticsearch.node.internal.InternalSettingsPreparer.prepareEnvironment(InternalSettingsPreparer.java:81)
at org.elasticsearch.node.Node.<init>(Node.java:128)
at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)
at org.elasticsearch.node.NodeBuilder.node(NodeBuilder.java:152)
at JavaAPIMain.main(JavaAPIMain.java:43)
// adding a document to elasticsearch using java
Node node = nodeBuilder().clusterName("myapplication").node();
Client client = node.client();
client.prepareIndex("kodcucom", "article", "1")
        .setSource(putJsonDocument("ElasticSearch: Java",
                "ElasticSeach provides Java API, thus it executes all operations " +
                "asynchronously by using client object..",
                new Date(),
                new String[]{"elasticsearch"},
                "Hüseyin Akdoğan"))
        .execute().actionGet();
How about trying this one:
NodeBuilder.nodeBuilder()
        .settings(Settings.builder()
                .put("path.home", "/path/to/elasticsearch/home/dir"))
        .node();
Credits: https://github.com/elastic/elasticsearch/issues/15325
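Combined with the code from the question, a minimal sketch might look like this (assuming the 2.x Java API implied by the stack trace; the home directory is a placeholder you need to adjust):
Node node = nodeBuilder()
        .clusterName("myapplication")
        .settings(Settings.builder()
                .put("path.home", "/path/to/elasticsearch/home/dir")) // placeholder path
        .node();
Client client = node.client();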
Always ask Google about your error message first. There are more than 5k results for your problem.
If you are using IntelliJ or Eclipse, edit the run configuration and add the line below to your VM options:
-Des.path.home={dropwizard installation directory}
For example, on my Mac:
-Des.path.home=/Users/supreeth.vp/elasticsearch-2.3.4/bin

UnresolvedAddressException in Logstash+elasticsearch

Logstash is not working on my system (Windows 7). I have Logstash 1.4.0, Kibana 3.0.0 and Elasticsearch 1.3.0 installed.
I created a logstash.conf file in the logstash-1.4.0 directory (Logstash-1.4.0/logstash.conf).
input {
    file {
        path => “C:/apache-tomcat-7.0.62/logs/*access*”
    }
}
filter {
    date {
        match => [ “timestamp” , “dd/MMM/yyyy:HH:mm:ss Z” ]
    }
}
output {
    elasticsearch { host => “localhost:9205″}
}
Then I run Logstash:
c:\logstash-1.4.0\bin>logstash agent -f ../logstash.conf
I get the exception below:
log4j, [2015-06-09T15:24:45.342] WARN: org.elasticsearch.transport.netty: [logstash-IT-BHARADWAJ-512441] exception caught on transport layer [[id: 0x0ee1f960]], closing connection
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:123)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:621)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:70)
etc……..
How can I solve this problem?
You can't connect to the socket: by default, Elasticsearch listens on port 9200 for HTTP and 9300 for the transport protocol (TCP). Try changing it to 9200 first, since that's the default.
