Getting ClusterBlockException while running queries using node client - elasticsearch

My Elasticsearch cluster (version 2.0) is started and the node client builds successfully, but for some reason I'm getting the following error while running queries through the node client.
20:15:15.479 [Pool:entitytaskscheduler: Thread#1] DEBUG c.b.o.e.t.c.DataCollectorStatusUpdateTask - collectors updated due to agent reconnected:{}
ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:154)
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:144)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.<init>(TransportSearchTypeAction.java:116)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.<init>(TransportSearchQueryThenFetchAction.java:73)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.<init>(TransportSearchQueryThenFetchAction.java:67)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:64)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:53)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:99)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:44)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:347)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:85)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:59)
at com.hidden.ppp.management.dc.DataCollectorPollStatusDAOESImpl.findDCIdsUpdatedInTime(DataCollectorPollStatusDAOESImpl.java:151)
at com.hidden.ppp.engine.taskexecutor.cptaskexecs.DataCollectorStatusUpdateTask.execute(DataCollectorStatusUpdateTask.java:199)
at com.hidden.ppp.engine.taskexecutor.cptaskexecs.DataCollectorStatusUpdateTaskRunner.run(DataCollectorStatusUpdateTaskRunner.java:27)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
20:15:15.558 [Pool:entitytaskscheduler: Thread#1] WARN c.b.o.m.d.DataCollectorPollStatusDAOESImpl - blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
20:15:15.558 [Pool:entitytaskscheduler: Thread#1] DEBUG c.b.o.e.t.c.DataCollectorStatusUpdateTask - collectors for which polls updated after epoc time:1453128243336 - dcids: []
ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:154)
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:144)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.<init>(TransportSearchTypeAction.java:116)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.<init>(TransportSearchQueryThenFetchAction.java:73)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.<init>(TransportSearchQueryThenFetchAction.java:67)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:64)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:53)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:99)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:44)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:347)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:85)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:59)
at com.hidden.ppp.management.dc.DataCollectorPollStatusDAOESImpl.findDCIdsNotUpdatedInTime(DataCollectorPollStatusDAOESImpl.java:182)
at com.hidden.ppp.engine.taskexecutor.cptaskexecs.DataCollectorStatusUpdateTask.execute(DataCollectorStatusUpdateTask.java:204)
at com.hidden.ppp.engine.taskexecutor.cptaskexecs.DataCollectorStatusUpdateTaskRunner.run(DataCollectorStatusUpdateTaskRunner.java:27)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
I've even disabled multicast as per this post - still no luck. Surprisingly, I can access Elasticsearch from Sense. Any clues on what is going wrong?

I faced the same error message and was not able to understand the problem at first. I was developing a node client Java application on my laptop, using an Elasticsearch data node on a remote server. For production use, I needed to deploy the Java application on this remote server.
I configured the Java application to talk to the local host only (being on the same host now):
elasticsearch.discovery.zen.ping.unicast.hosts=127.0.0.1
And got the same exception
ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
Looking at the logs I also found this entry:
[WARN] [TP-Processor2] DiscoveryService.waitForInitialState -> [cerbera] waited for 30s and no initial state was set by the discovery
So basically, the question was: Why doesn't it find the Elasticsearch data node? I changed port ranges and also played with the multicast setting - without success.
Finally, I checked elasticsearch.yml and found that the data node was not listening on localhost (127.0.0.1), but on the Ethernet interface 192.168.1.2:
network.host: 192.168.1.2
http.port: 9200
The final change was simple: I just needed to point the node client at the correct interface
elasticsearch.discovery.zen.ping.unicast.hosts=192.168.1.2
Now my node client is talking to Elasticsearch via the correct interface. Job done.
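For reference, here is a minimal sketch of how that setting reaches a node client in Elasticsearch 2.x. I'm assuming the application-level key elasticsearch.discovery.zen.ping.unicast.hosts simply forwards to Elasticsearch's discovery.zen.ping.unicast.hosts setting; the address is the one from my case:

import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;

public class NodeClientConfig {
    public static void main(String[] args) {
        // Unicast discovery pointed at the interface the data node listens on
        // (must match network.host in the server's elasticsearch.yml).
        Settings settings = Settings.settingsBuilder()
                .put("discovery.zen.ping.unicast.hosts", "192.168.1.2")
                .build();
        Node node = NodeBuilder.nodeBuilder()
                .settings(settings)
                .client(true) // node client: joins the cluster but stores no data
                .node();
        Client client = node.client(); // ready for search/index requests
    }
}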

I had the same problem (using k8s). I finally replaced my Elasticsearch image and the issue was solved...
I moved from 6.5.4-debian-9-r41 to 6.8.16-debian-10-r5 (using Bitnami images).
I know it is not the best answer - but I really tried the suggested answers and nothing worked for me, so my recommendation is to update to a newer, better version (Docker makes that easy :) ).

Related

Kafka Streams - Volumes to state stores causes "Failed to delete the state directory" errors

Short overview:
I have a service written in Scala using kafka-streams, running inside a dedicated Docker container.
To store the state-store directories where I want, I created a volume mapping the state-store directory inside the container to a directory outside. Once I did that, I started seeing exceptions like this in the container logs:
Failed to delete the state directory.
java.nio.file.DirectoryNotEmptyException: /usr/src/app/stateStores/MyServiceStreams___20/0_0
at sun.nio.fs.UnixFileSystemProvider.implDelete(UnixFileSystemProvider.java:242)
at sun.nio.fs.AbstractFileSystemProvider.delete(AbstractFileSystemProvider.java:103)
at java.nio.file.Files.delete(Files.java:1126)
at org.apache.kafka.common.utils.Utils$2.postVisitDirectory(Utils.java:763)
at org.apache.kafka.common.utils.Utils$2.postVisitDirectory(Utils.java:746)
at java.nio.file.Files.walkFileTree(Files.java:2688)
at java.nio.file.Files.walkFileTree(Files.java:2742)
at org.apache.kafka.common.utils.Utils.delete(Utils.java:746)
at org.apache.kafka.streams.processor.internals.StateDirectory.cleanRemovedTasks(StateDirectory.java:290)
at org.apache.kafka.streams.processor.internals.StateDirectory.cleanRemovedTasks(StateDirectory.java:253)
at org.apache.kafka.streams.KafkaStreams$2.run(KafkaStreams.java:795)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
It seems it doesn't affect any functionality, it just spams the logs.
Analysing the logs shows the following line before the error message:
Deleting obsolete state directory 0_0 for task 0_0 as 79998936ms has elapsed (cleanup delay is 6000000ms
Increasing the state.cleanup.delay.ms param didn't help.
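For context, this is how the parameter is set when building the streams config (a minimal Java sketch; the application id and bootstrap servers are placeholders, the state dir is the one from the logs above):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsProps {
    static Properties streamsProps() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "MyServiceStreams");  // placeholder id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/usr/src/app/stateStores");
        // state.cleanup.delay.ms: how long the cleanup thread waits before
        // deleting obsolete task directories (the deletes failing in the logs).
        props.put(StreamsConfig.STATE_CLEANUP_DELAY_MS_CONFIG, 6000000L);
        return props;
    }
}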
[Edit]
Some technical details:
/usr/src/app/stateStores/ is owned by root.
From inside the container I can remove the dirs (running as root); from outside I can't (not root..)
kafka-streams version: 2.1.0
Please assist

Apache MiNiFi - PutElasticsearch

I built a flow that processes real-time data from a local server and sends the relevant data to Elasticsearch. I use MiNiFi, but when I run MiNiFi it returns the following error.
Does anyone know where the issue is?
Thanks
ERROR [Timer-Driven Process Thread-10] o.a.n.p.elasticsearch.PutElasticsearch5 PutElasticsearch5[id=4ed70cbe-9838-35cd-0000-000000000000] PutElasticsearch5[id=4ed70cbe-9838-35cd-0000-000000000000] failed to process due to java.lang.NoClassDefFoundError: Could not initialize class org.elasticsearch.Version; rolling back session: {}
java.lang.NoClassDefFoundError: Could not initialize class org.elasticsearch.Version
at org.elasticsearch.common.io.stream.StreamOutput.<init>(StreamOutput.java:73)
at org.elasticsearch.common.io.stream.BytesStreamOutput.<init>(BytesStreamOutput.java:60)
at org.elasticsearch.common.io.stream.BytesStreamOutput.<init>(BytesStreamOutput.java:57)
at org.elasticsearch.common.io.stream.BytesStreamOutput.<init>(BytesStreamOutput.java:47)
at org.elasticsearch.common.xcontent.XContentBuilder.builder(XContentBuilder.java:67)
at org.elasticsearch.common.settings.Setting.arrayToParsableString(Setting.java:698)
at org.elasticsearch.common.settings.Setting.lambda$listSetting$26(Setting.java:656)
at org.elasticsearch.common.settings.Setting$2.getRaw(Setting.java:660)
at org.elasticsearch.common.settings.Setting.get(Setting.java:300)
at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:164)
at org.elasticsearch.client.transport.TransportClient.newPluginService(TransportClient.java:81)
at org.elasticsearch.client.transport.TransportClient.buildTemplate(TransportClient.java:106)
at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:228)
at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:69)
at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:65)
at org.apache.nifi.processors.elasticsearch.AbstractElasticsearch5TransportClientProcessor.getTransportClient(AbstractElasticsearch5TransportClientProcessor.java:230)
at org.apache.nifi.processors.elasticsearch.AbstractElasticsearch5TransportClientProcessor.createElasticsearchClient(AbstractElasticsearch5TransportClientProcessor.java:170)
at org.apache.nifi.processors.elasticsearch.AbstractElasticsearch5Processor.setup(AbstractElasticsearch5Processor.java:94)
at org.apache.nifi.processors.elasticsearch.PutElasticsearch5.onTrigger(PutElasticsearch5.java:177)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1122)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:128)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
To reduce its footprint, MiNiFi Java ships with only the standard bundle of processors. To use the other processors that are present in a standard NiFi deployment, you need to put the appropriate "nar" file into the "lib" directory of the MiNiFi deployment.
For "PutElasticsearch" you need "nifi-elasticsearch-nar-<version>.nar", where "<version>" is the version of NiFi that your version of MiNiFi is built off of. Version 0.4.0 of MiNiFi Java uses NiFi 1.5.0.
For more information, including a list of the processors that do come bundled with MiNiFi out of the box, see the "MiNiFi Java Agent Quick Start" documentation, section "Using Processors Not Packaged with MiNiFi"[1]. For more information on how MiNiFi versions correspond to NiFi framework versions, see here[2].
[1] https://nifi.apache.org/minifi/minifi-java-agent-quick-start.html
[2] https://cwiki.apache.org/confluence/display/MINIFI/MiNiFi+Versioning+and+Toolkit+Compatibility

sonarqube exception caught on transport layer

Good afternoon everyone. The problem is this: I have a server with SonarQube, and when I try to start the Windows service it comes up but then stops.
The following error appears in the sonarqube log:
2017.11.14 11:04:52 WARN sea[o.e.transport.netty] [sonar-1510653879773] exception caught on transport layer [[id: 0x346b46fb, /127.0.0.1:59330 => /127.0.0.1:9001]], closing connection
java.io.IOException: An existing connection was forcibly closed by the remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method) ~[na:1.8.0_152]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43) ~[na:1.8.0_152]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_152]
at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.8.0_152]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[na:1.8.0_152]
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64) [elasticsearch-1.1.2.jar:na]
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [elasticsearch-1.1.2.jar:na]
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) [elasticsearch-1.1.2.jar:na]
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [elasticsearch-1.1.2.jar:na]
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [elasticsearch-1.1.2.jar:na]
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [elasticsearch-1.1.2.jar:na]
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [elasticsearch-1.1.2.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_152]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_152]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152]
2017.11.14 11:04:52 INFO app[o.s.p.m.TerminatorThread] Process[search] is stopping
2017.11.14 11:04:52 INFO sea[o.s.p.StopWatcher] Stopping process
Do you know why this error occurs?
I have set up sonar.properties correctly, including setting the value of the sonar.search.port property to 0 as this link suggests: Sonar launch error. But the problem persists.
I hope you can give me a hand...
Regards!!!
Uncomment the line below in the sonar properties file and change port 9001 to 0:
#sonar.search.port=9001
sonar.search.port=0
I had the same problem and I could fix it like this:
Go to this folder: sonarqube-x.x\conf
Open this file: sonar.properties
Find the word: #sonar.web.port
Un-comment the line and change the value from 9000 to another port, like 9002
Save your changes
Start your SonarQube again
Access the server on the new port: http://localhost:9002
The reason could be the SonarQube port OR the port of the Elasticsearch instance used by SonarQube (I had a similar problem before), so the steps to change one or both of those ports are:
Go to this folder: sonarqube-x.x\conf
Open this file: sonar.properties
For the SonarQube port:
Find: #sonar.web.port
Un-comment the line (remove the # at the beginning) and change the value from 9000 to another port, like 9123: sonar.web.port=9123
For the port of SonarQube's Elasticsearch instance:
Find: #sonar.search.port
Change the line to sonar.search.port=0 (this means it will pick any available port and bind to it)
Save your changes
Start your SonarQube again
Access the server on the new SonarQube port: http://localhost:9123
I experienced this error when upgrading SonarQube from version 5.6.7 to 6.7.1.
Originally I thought this was due to the port number but upon checking the web.log I noticed that there was an error relating to the LDAP plugin (2.2.0.608).
ERROR web[][o.s.s.p.Platform] Background initialization failed. Stopping SonarQube org.sonar.plugins.ldap.LdapException: The property 'ldap.url' is empty and no realm configured to try auto-discovery.
Updating the sonar.properties file with the correct configuration allowed SonarQube to start.
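For reference, the properties involved look something like this (an illustrative sketch; the realm name is fixed by the LDAP plugin, but the URL is a placeholder for your directory server):

sonar.security.realm=LDAP
ldap.url=ldap://ldap.example.com:389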
I ran into exactly the same issue as you did.
I started SonarQube with MariaDB 5.5, but I found some error messages in sonarqube-x.x/logs/web.log:
2021.01.21 14:36:17 INFO web[][o.s.p.ProcessEntryPoint] Starting web
......
2021.01.21 14:36:19 ERROR web[][o.s.s.p.Platform] Web server startup failed: Unsupported mysql version: 5.5. Minimal supported version is 5.6.
So I changed my database to MySQL 5.7 and it started successfully.
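For reference, the database connection lives in sonar.properties; a minimal sketch (host, schema, and credentials are placeholders):

sonar.jdbc.url=jdbc:mysql://localhost:3306/sonar?useUnicode=true&characterEncoding=utf8
sonar.jdbc.username=sonar
sonar.jdbc.password=sonar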
Not quite sure you had the same problem, but check these log files and see what actually happened during startup.

AWS workers can't communicate due to Netty-Client hostname resolution

I'm working on a topology that takes data from Kafka and persists it into Elasticsearch. First, I used the basic KafkaSpout from the Storm dependency to listen for data coming from a specific Kafka topic, and I re-implemented the Elasticsearch bolt from the elasticsearch-hadoop project: https://github.com/elastic/elasticsearch-hadoop/blob/master/storm/src/main/java/org/elasticsearch/storm/EsBolt.java. The goal was to write to several indices in Elasticsearch.
When I process the messages coming from Kafka, I get some exceptions once the amount of data in the Kafka queue grows. This is one part of the stack trace in the worker logs:
2016-04-13T22:24:44.641+0000 b.s.m.n.Client [ERROR] failed to send 580 messages to Netty-Client-ip-[internal-ip].ec2.internal/[internal-ip]:6700:
java.nio.channels.ClosedChannelException
2016-04-13T22:24:44.641+0000 b.s.m.n.Client [ERROR] failed to send 575 messages to Netty-Client-ip-[internal-ip].ec2.internal/[internal-ip]:6700:
java.nio.channels.ClosedChannelException
2016-04-13T22:25:05.970+0000 b.s.m.n.Client [WARN] Re-connection to ip-[internal-ip].ec2.internal/[internal-ip]:6701 was successful but 52890 messages
has been lost so far
2016-04-13T22:36:33.571+0000 b.s.m.n.StormClientHandler [INFO] Connection failed Netty-Client-ip-ip-[internal-ip].ec2.internal/[internal-ip]:6701
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_77]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.8.0_77]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_77]
at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.8.0_77]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[na:1.8.0_77]
at org.apache.storm.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64) [storm-core-0.9.6.jar:0.9.6]
at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [storm-core-0.9.6.jar:0.9.6]
at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) [storm-core-0.9.6.jar:0.9.6]
at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [storm-core-0.9.6.jar:0.9.6]
at org.apache.storm.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [storm-core-0.9.6.jar:0.9.6]
at org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [storm-core-0.9.6.jar:0.9.6]
at org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [storm-core-0.9.6.jar:0.9.6]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
I'm using a Storm cluster of 3 nodes (1 nimbus + UI + Zookeeper and 2 supervisors), Storm version 0.9.6. Each of these machines has 4 GB RAM, and this is the content of my storm.yaml config file:
storm.zookeeper.servers:
  - "nimbus-ip"
storm.local.dir: "/mnt/storm"
nimbus.seeds: ["nimbus-ip"]
storm.zookeeper.port: 2181
ui.port: 8080
nimbus.host: "nimbus-ip"
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
storm.messaging.netty.max_wait_ms: 10000
Can anyone help me understand why the workers can't communicate due to Netty-Client hostname resolution? I already saw one report of this issue in Storm 0.9.4: https://issues.apache.org/jira/browse/STORM-908. Is it possible that version 0.9.6 does not fix this issue?
Many thanks!!
I got here from google looking for answers to a similar problem. In my case, the error was:
o.a.s.m.n.Client [ERROR] connection attempt 104 to Netty-Client-ip-XXX-XXX-XXX-XXX.ec2.internal/XXX.XXX.XXX.XXX:6703 failed: java.net.ConnectException: Connection refused: ip-XXX-XXX-XXX-XXX.ec2.internal/XXX.XXX.XXX.XXX:6703
This was appearing on a 2-node storm cluster (v1.0.1).
At first, I thought this was a networking issue with AWS (which is where I was deploying the nodes). I started to look at security group rules, /etc/hosts files etc etc, none of which helped.
After some searching I discovered this: https://issues.apache.org/jira/browse/STORM-1382 and figured that maybe the issue wasn't the network at all, but something on the other end wasn't running.
So, I SSH-ed into a worker node and took a look at the supervisor log, which showed me lines like this, over and over:
o.a.s.d.supervisor [INFO] 30236e62-d2e1-4d5c-b75c-f54ef07653a4 still hasn't started
When I looked at the worker.log itself, I discovered there was a problem with the default Java version. That was my problem, but other people's problems may stem from other reasons a worker can fail to start.
Anyway, once I set the correct default java version it all kicked into life.
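For what it's worth, on Debian/Ubuntu-style hosts the default java can be switched interactively like this (an illustrative command; how you manage Java versions depends on your distribution):

sudo update-alternatives --config java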

Zabbix JMX Tomcat8 monitoring fails

I'm trying to monitor Tomcat 8 with JDK 8 using JMX.
I have set up my agents and modified startup.sh.
On my zabbix_java_gateway.log I get the following exception:
WARN com.zabbix.gateway.SocketProcessor - error processing request
com.zabbix.gateway.ZabbixException: java.net.SocketTimeoutException:
connection timed out:
service:jmx:rmi:///jndi/rmi://server1.example.com:10052/jmxrmi
at com.zabbix.gateway.JMXItemChecker.getValues(JMXItemChecker.java:97)
~[zabbix-java-gateway-2.4.7.jar:na]
at com.zabbix.gateway.SocketProcessor.run(SocketProcessor.java:63)
~[zabbix-java-gateway-2.4.7.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_71]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_71]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_71]
Caused by: java.net.SocketTimeoutException: connection timed out:
service:jmx:rmi:///jndi/rmi://server1.example.com:10052/jmxrmi
at com.zabbix.gateway.ZabbixJMXConnectorFactory.connect(ZabbixJMXConnectorFactory.java:123)
~[zabbix-java-gateway-2.4.7.jar:na]
at com.zabbix.gateway.JMXItemChecker.getValues(JMXItemChecker.java:89)
~[zabbix-java-gateway-2.4.7.jar:na]
... 4 common frames omitted
On my startup.sh I added the following to the CATALINA_OPTS
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=10052 -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.password.file=/opt/tomcat-latest/conf/jmxremote.password
-Dcom.sun.management.jmxremote.access.file=/opt/tomcat-latest/conf/jmxremote.access
-Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=server1.example.com
My zabbix_agentd.conf contains the following:
PidFile=/tmp/zabbix_agentd.pid
LogFile=/var/log/zabbix_agentd.log
LogFileSize=1
DebugLevel=3
Server=monitor.example.com
Hostname=server1.example.com
ListenPort=10050
StartAgents=5
Timeout=30
I have already done the following:
successfully connected to the server using jconsole
removed authentication
telnetted to the server over ports 10050 / 10052
The weird part is that the same setup works well for Tomcat 6 with JDK 7.
EDIT 1
I've updated the JDK version on the Zabbix server to be newer than the JDK installed on my Java nodes - still the same result - it ends with
ZBX_TCP_READ() failed: [4] Interrupted system call
UPDATE
So I figured it out eventually.
I had -Djava.rmi.server.hostname=server1.example.com in my Tomcat configuration file.
I had misunderstood whether the hostname should be set to the monitoring server or to the monitored server.
Apparently there's a bug in Tomcat 6 and this directive does not work there.
Removing it solved the problem completely.
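For comparison, the working options are simply my original ones without that directive (a sketch of the CATALINA_OPTS after the fix):

-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=10052
-Dcom.sun.management.jmxremote.authenticate=true
-Dcom.sun.management.jmxremote.password.file=/opt/tomcat-latest/conf/jmxremote.password
-Dcom.sun.management.jmxremote.access.file=/opt/tomcat-latest/conf/jmxremote.access
-Dcom.sun.management.jmxremote.ssl=false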
Thanks,
Liron
