I'm working on a topology that takes data from Kafka and persists it into Elasticsearch. First, I used the basic KafkaSpout from the Storm dependency to listen for data coming from a specific Kafka topic, and I re-implemented the Elasticsearch bolt from the elasticsearch-hadoop project: https://github.com/elastic/elasticsearch-hadoop/blob/master/storm/src/main/java/org/elasticsearch/storm/EsBolt.java. The goal was to write to several indices in Elasticsearch.
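For context, the topology wiring looks roughly like this (a minimal sketch against the Storm 0.9.x storm-kafka API; MultiIndexEsBolt is a hypothetical name for my re-implemented bolt, and the host/topic names are placeholders):

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;

public class KafkaToEsTopology {
    public static void main(String[] args) throws Exception {
        // Point the spout at the ZooKeeper ensemble that Kafka registers with.
        ZkHosts zkHosts = new ZkHosts("zookeeper-host:2181");
        // Listen to one specific Kafka topic.
        SpoutConfig spoutConfig = new SpoutConfig(zkHosts, "my-topic", "/kafka-spout", "kafka-to-es");
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 2);
        // MultiIndexEsBolt stands in for the EsBolt re-implementation that
        // routes each tuple to the appropriate Elasticsearch index.
        builder.setBolt("es-bolt", new MultiIndexEsBolt(), 2).shuffleGrouping("kafka-spout");
        Config conf = new Config();
        conf.setNumWorkers(2);
        StormSubmitter.submitTopology("kafka-to-es", conf, builder.createTopology());
    }
}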
When I process the messages coming from Kafka, I get exceptions once the amount of data in the Kafka queue grows. This is part of the stack trace from the worker logs:
2016-04-13T22:24:44.641+0000 b.s.m.n.Client [ERROR] failed to send 580 messages to Netty-Client-ip-[internal-ip].ec2.internal/[internal-ip]:6700:
java.nio.channels.ClosedChannelException
2016-04-13T22:24:44.641+0000 b.s.m.n.Client [ERROR] failed to send 575 messages to Netty-Client-ip-[internal-ip].ec2.internal/[internal-ip]:6700:
java.nio.channels.ClosedChannelException
2016-04-13T22:25:05.970+0000 b.s.m.n.Client [WARN] Re-connection to ip-[internal-ip].ec2.internal/[internal-ip]:6701 was successful but 52890 messages
has been lost so far
2016-04-13T22:36:33.571+0000 b.s.m.n.StormClientHandler [INFO] Connection failed Netty-Client-ip-ip-[internal-ip].ec2.internal/[internal-ip]:6701
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_77]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.8.0_77]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_77]
at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.8.0_77]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[na:1.8.0_77]
at org.apache.storm.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64) [storm-core-0.9.6.jar:0.9.6]
at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [storm-core-0.9.6.jar:0.9.6]
at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) [storm-core-0.9.6.jar:0.9.6]
at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [storm-core-0.9.6.jar:0.9.6]
at org.apache.storm.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [storm-core-0.9.6.jar:0.9.6]
at org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [storm-core-0.9.6.jar:0.9.6]
at org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [storm-core-0.9.6.jar:0.9.6]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_77]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
I'm using a Storm cluster of 3 nodes (1 nimbus + UI + ZooKeeper and 2 supervisors), Storm version 0.9.6. Each of these machines has 4 GB of RAM, and this is the content of my storm.yml config file:
storm.zookeeper.servers:
- "nimbus-ip"
storm.local.dir: "/mnt/storm"
nimbus.seeds: ["nimbus-ip"]
storm.zookeeper.port: 2181
ui.port: 8080
nimbus.host: "nimbus-ip"
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
storm.messaging.netty.max_wait_ms: 10000
Can anyone help me understand why the workers can't communicate, apparently due to Netty client hostname resolution? I already saw a report of this issue in Storm 0.9.4: https://issues.apache.org/jira/browse/STORM-908. Is it possible that version 0.9.6 does not fix this issue?
Many thanks!!
I got here from Google looking for answers to a similar problem. In my case, the error was:
o.a.s.m.n.Client [ERROR] connection attempt 104 to Netty-Client-ip-XXX-XXX-XXX-XXX.ec2.internal/XXX.XXX.XXX.XXX:6703 failed: java.net.ConnectException: Connection refused: ip-XXX-XXX-XXX-XXX.ec2.internal/XXX.XXX.XXX.XXX:6703
This was appearing on a 2-node Storm cluster (v1.0.1).
At first, I thought this was a networking issue with AWS (which is where I was deploying the nodes). I started looking at security group rules, /etc/hosts files, and so on, none of which helped.
After some searching I discovered this: https://issues.apache.org/jira/browse/STORM-1382 and figured that maybe the issue wasn't the network at all, but that something on the other end wasn't running.
So I SSH'd into a worker node and took a look at the supervisor log, which showed me something like this many, many times:
o.a.s.d.supervisor [INFO] 30236e62-d2e1-4d5c-b75c-f54ef07653a4 still hasn't started
When I looked at the worker.log itself, I discovered there was a problem with the default Java version. That was my problem, but other people's workers may fail to start for other reasons.
Anyway, once I set the correct default Java version, it all kicked into life.
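For anyone else stuck in the "still hasn't started" loop, checking and switching the default Java on the worker node looks roughly like this (Debian/Ubuntu commands; adjust for your distro):
java -version
sudo update-alternatives --config java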
I'm evaluating the Camunda Enterprise version and am now stuck at the Optimize server: I'm not able to start it following the instructions. Here is the log:
16:59:37.322 [main] DEBUG o.e.j.u.component.AbstractLifeCycle - starting ServerConnector@48974e45{HTTP/1.1,[http/1.1]}{0.0.0.0:8095}
16:59:37.325 [main] WARN o.e.j.u.component.AbstractLifeCycle - FAILED ServerConnector@48974e45{HTTP/1.1,[http/1.1]}{0.0.0.0:8095}: java.io.IOException: Failed to bind to /0.0.0.0:8095
java.io.IOException: Failed to bind to /0.0.0.0:8095
at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:346)
at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:307)
at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80)
at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:231)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
at org.eclipse.jetty.server.Server.doStart(Server.java:385)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:72)
at org.camunda.optimize.jetty.EmbeddedCamundaOptimize.startOptimize(EmbeddedCamundaOptimize.java:169)
at org.camunda.optimize.Main.main(Main.java:17)
Caused by: java.net.BindException: Address already in use: bind
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:342)
... 8 common frames omitted
I know it looks obvious, but I have tried changing to different ports. It still shows the same error, no matter which port I choose. Please take a look.
The port is already in use; the message is explicit. All you can do is kill the process that is listening on that port.
I use Windows, but you can look up the Linux way to do it; the principle is the same.
Windows:
Find the process that is using port 8095 with netstat -aon | find "8095". The result looks like:
TCP 0.0.0.0:8095 0.0.0.0:0 LISTENING 23332
TCP [::]:8095 [::]:0 LISTENING 23332
In the Task Manager, on the Details tab, find what runs under the PID (process ID) 23332.
End that task. It will probably be java.exe, since there is a lingering process listening on the port. Restart the IDE and everything should work. If it is, for example, a database or anything else, you have to use a different port than 8095, either on your side or for the application that already uses it.
Linux: https://unix.stackexchange.com/questions/140482/kill-any-service-running-at-a-specific-port
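For reference, a minimal Linux equivalent would be (assuming lsof is available):
sudo lsof -i :8095    # shows the PID of the process listening on port 8095
sudo kill <PID>       # add -9 only if the process ignores a plain kill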
Good afternoon everyone. The problem is this: I have a server with SonarQube, and when I try to start the Windows service, it comes up but then stops.
The following error appears in the SonarQube log:
2017.11.14 11:04:52 WARN sea[o.e.transport.netty] [sonar-1510653879773] exception caught on transport layer [[id: 0x346b46fb, /127.0.0.1:59330 => /127.0.0.1:9001]], closing connection
java.io.IOException: An existing connection was forcibly closed by the remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method) ~[na:1.8.0_152]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43) ~[na:1.8.0_152]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_152]
at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.8.0_152]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[na:1.8.0_152]
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64) [elasticsearch-1.1.2.jar:na]
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [elasticsearch-1.1.2.jar:na]
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) [elasticsearch-1.1.2.jar:na]
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [elasticsearch-1.1.2.jar:na]
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [elasticsearch-1.1.2.jar:na]
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [elasticsearch-1.1.2.jar:na]
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [elasticsearch-1.1.2.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_152]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_152]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_152]
2017.11.14 11:04:52 INFO app[o.s.p.m.TerminatorThread] Process[search] is stopping
2017.11.14 11:04:52 INFO sea[o.s.p.StopWatcher] Stopping process
Do you know why this error occurs?
I have set up sonar.properties correctly, including setting the value of the sonar.search.port property to 0 as this link suggests: Sonar launch error, but the problem persists.
I hope you can give me a hand...
Regards!!!
Uncomment the line below in the sonar properties file and change port 9001 to 0:
#sonar.search.port=9001
sonar.search.port=0
I had the same problem and I could fix it like this:
Go to this folder: sonarqube-x.x\conf
Open this file: sonar.properties
Find the word: #sonar.web.port
Change the value from 9000 to another port, like 9002, and uncomment the line
Save your changes
Start your SonarQube again
Access the server with the new port: http://localhost:9002
The reason could be the port number of sonarQube OR that of the elasticSearch instance used by sonarQube (I had a similar problem before), so the steps to change both/one of those ports are:
Go to this folder: sonarqube-x.x\conf
Open this file: sonar.properties
For sonarQube port:
Find: #sonar.web.port
Change the value from 9000 to another port, like 9123, and un-comment the line (remove the # at the beginning): sonar.web.port=9123
For sonarQube's elasticSearch instance port:
Find: #sonar.search.port
Change this line to sonar.search.port=0 (this means it will look for any available port and bind to it)
Save your changes
Start your SonarQube again
Access the server with the newly specified SonarQube port: http://localhost:9123
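Put together, the relevant lines in conf/sonar.properties end up looking like this (9123 is just the example port from above):
sonar.web.port=9123
sonar.search.port=0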
I experienced this error when upgrading SonarQube from version 5.6.7 to 6.7.1.
Originally I thought this was due to the port number but upon checking the web.log I noticed that there was an error relating to the LDAP plugin (2.2.0.608).
ERROR web[][o.s.s.p.Platform] Background initialization failed. Stopping SonarQube org.sonar.plugins.ldap.LdapException: The property 'ldap.url' is empty and no realm configured to try auto-discovery.
Updating the sonar.properties file with the correct configuration allowed SonarQube to start.
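In my case that meant filling in the LDAP section of sonar.properties, along these lines (the URL here is a placeholder for your directory server):
sonar.security.realm=LDAP
ldap.url=ldap://ldap.example.com:389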
I ran into exactly the same problem as you did.
I started SonarQube with MariaDB 5.5, but I found some error messages in sonarqube-x.x/logs/web.log:
2021.01.21 14:36:17 INFO web[][o.s.p.ProcessEntryPoint] Starting web
......
2021.01.21 14:36:19 ERROR web[][o.s.s.p.Platform] Web server startup failed: Unsupported mysql version: 5.5. Minimal supported version is 5.6.
So I changed my database to MySQL 5.7 and it started successfully.
I'm not quite sure you had the same problem, but check these log files and see what actually happened during startup.
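If you do switch databases, remember that the JDBC settings in conf/sonar.properties must point at the new server, e.g. (placeholder host and schema):
sonar.jdbc.url=jdbc:mysql://localhost:3306/sonar?useUnicode=true&characterEncoding=utf8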
I'm trying to monitor Tomcat8 with JDK8 using JMX.
I have set up my agents and modified startup.sh.
On my zabbix_java_gateway.log I get the following exception:
WARN com.zabbix.gateway.SocketProcessor - error processing request
com.zabbix.gateway.ZabbixException: java.net.SocketTimeoutException:
connection timed out:
service:jmx:rmi:///jndi/rmi://server1.example.com:10052/jmxrmi
at com.zabbix.gateway.JMXItemChecker.getValues(JMXItemChecker.java:97)
~[zabbix-java-gateway-2.4.7.jar:na]
at com.zabbix.gateway.SocketProcessor.run(SocketProcessor.java:63)
~[zabbix-java-gateway-2.4.7.jar:na]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_71]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_71]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_71] Caused by: java.net.SocketTimeoutException: connection timed out:
service:jmx:rmi:///jndi/rmi://server1.example.com:10052/jmxrmi
at com.zabbix.gateway.ZabbixJMXConnectorFactory.connect(ZabbixJMXConnectorFactory.java:123)
~[zabbix-java-gateway-2.4.7.jar:na]
at com.zabbix.gateway.JMXItemChecker.getValues(JMXItemChecker.java:89)
~[zabbix-java-gateway-2.4.7.jar:na]
... 4 common frames omitted
In my startup.sh I added the following to CATALINA_OPTS:
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=10052 -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.password.file=/opt/tomcat-latest/conf/jmxremote.password
-Dcom.sun.management.jmxremote.access.file=/opt/tomcat-latest/conf/jmxremote.access
-Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=server1.example.com
My zabbix_agentd.conf contains the following:
PidFile=/tmp/zabbix_agentd.pid
LogFile=/var/log/zabbix_agentd.log
LogFileSize=1
DebugLevel=3
Server=monitor.example.com
Hostname=server1.example.com
ListenPort=10050
StartAgents=5
Timeout=30
I have already done the following:
successfully connected to the server using jconsole
removed authentication
telnetted to the server on ports 10050 / 10052
The weird part is that the same setup works well for Tomcat6 with JDK7.
EDIT 1
I've updated the JDK on the Zabbix server to be newer than the JDK installed on my Java nodes; still the same result. It ends with:
ZBX_TCP_READ() failed: [4] Interrupted system call
UPDATE
So I figured it out eventually.
I had -Djava.rmi.server.hostname=server1.example.com in my Tomcat configuration file.
I had misunderstood this setting: the hostname should be the monitored server's hostname, not the monitoring server's.
Apparently, there's a bug in Tomcat 6 and this directive does not take effect there, which would explain why the same setup appeared to work on Tomcat 6.
Removing it solved the problem completely.
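For clarity, the working CATALINA_OPTS ended up being the same set of flags as above, minus the hostname directive:
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=10052 -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.password.file=/opt/tomcat-latest/conf/jmxremote.password -Dcom.sun.management.jmxremote.access.file=/opt/tomcat-latest/conf/jmxremote.access -Dcom.sun.management.jmxremote.ssl=false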
Thanks,
Liron
My Elasticsearch cluster (version 2.0) is started and the node client is built successfully, but for some reason I'm getting the following error while running queries using the node client.
20:15:15.479 [Pool:entitytaskscheduler: Thread#1] DEBUG c.b.o.e.t.c.DataCollectorStatusUpdateTask - collectors updated due to agent reconnected:{}
ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:154)
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:144)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.<init>(TransportSearchTypeAction.java:116)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.<init>(TransportSearchQueryThenFetchAction.java:73)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.<init>(TransportSearchQueryThenFetchAction.java:67)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:64)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:53)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:99)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:44)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:347)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:85)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:59)
at com.hidden.ppp.management.dc.DataCollectorPollStatusDAOESImpl.findDCIdsUpdatedInTime(DataCollectorPollStatusDAOESImpl.java:151)
at com.hidden.ppp.engine.taskexecutor.cptaskexecs.DataCollectorStatusUpdateTask.execute(DataCollectorStatusUpdateTask.java:199)
at com.hidden.ppp.engine.taskexecutor.cptaskexecs.DataCollectorStatusUpdateTaskRunner.run(DataCollectorStatusUpdateTaskRunner.java:27)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
20:15:15.558 [Pool:entitytaskscheduler: Thread#1] WARN c.b.o.m.d.DataCollectorPollStatusDAOESImpl - blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];
20:15:15.558 [Pool:entitytaskscheduler: Thread#1] DEBUG c.b.o.e.t.c.DataCollectorStatusUpdateTask - collectors for which polls updated after epoc time:1453128243336 - dcids: []
ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:154)
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:144)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.<init>(TransportSearchTypeAction.java:116)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.<init>(TransportSearchQueryThenFetchAction.java:73)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.<init>(TransportSearchQueryThenFetchAction.java:67)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:64)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:53)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:99)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:44)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:70)
at org.elasticsearch.client.node.NodeClient.doExecute(NodeClient.java:58)
at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:347)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:85)
at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:59)
at com.hidden.ppp.management.dc.DataCollectorPollStatusDAOESImpl.findDCIdsNotUpdatedInTime(DataCollectorPollStatusDAOESImpl.java:182)
at com.hidden.ppp.engine.taskexecutor.cptaskexecs.DataCollectorStatusUpdateTask.execute(DataCollectorStatusUpdateTask.java:204)
at com.hidden.ppp.engine.taskexecutor.cptaskexecs.DataCollectorStatusUpdateTaskRunner.run(DataCollectorStatusUpdateTaskRunner.java:27)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
I've even disabled "multicast" as per this post - still no luck. Surprisingly, I can access Elasticsearch from Sense. Any clues on what is going wrong?
I faced the same error message and was not able to understand the problem at first. I was developing a node client Java application on my laptop, using an Elasticsearch data node on a remote server. For production use, I needed to deploy the Java application on this remote server.
I configured the Java application to talk to the local host only (being on the same host now):
elasticsearch.discovery.zen.ping.unicast.hosts=127.0.0.1
And got the same exception
ClusterBlockException[blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];]
Looking at the logs I also found this entry:
[WARN] [TP-Processor2] DiscoveryService.waitForInitialState -> [cerbera] waited for 30s and no initial state was set by the discovery
So basically, the question was: Why doesn't it find the Elasticsearch data node? I changed port ranges and also played with the multicast setting - without success.
Finally, I checked elasticsearch.yml and found that the data node was not listening on localhost (127.0.0.1), but instead on the Ethernet interface 192.168.1.2:
network.host: 192.168.1.2
http.port: 9200
The final change was simple: I just needed to reconfigure the node client to talk to the correct interface:
elasticsearch.discovery.zen.ping.unicast.hosts=192.168.1.2
Now my node client is talking to Elasticsearch via the correct interface. Job done.
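If you build the node client programmatically rather than via a properties file, the same fix looks roughly like this with the ES 2.x API (the host value is the example interface from above):

import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;

public class NodeClientExample {
    public static void main(String[] args) {
        // Client-only node that joins the cluster via the data node's actual interface.
        Node node = NodeBuilder.nodeBuilder()
                .settings(Settings.settingsBuilder()
                        .put("discovery.zen.ping.unicast.hosts", "192.168.1.2"))
                .client(true)
                .node();
        Client client = node.client();
        // ... run queries with the client, then shut the node down cleanly.
        node.close();
    }
}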
I had the same problem (using k8s). I finally replaced my Elastic image and the issue was solved...
I moved from 6.5.4-debian-9-r41 to 6.8.16-debian-10-r5 (using Bitnami images).
I know it is not the best answer, but I really tried the suggested answers and nothing worked for me, so my recommendation is to update to a newer version (Docker makes that easy :) ).
I am trying to run the SimpleShortestPathsVertex (aka SimpleShortestPathComputation) example described in the Giraph Quick Start. I am running this on a Hortonworks Sandbox instance (HDP 2.1) using VirtualBox, and I packaged giraph.jar using profile hadoop_2.0.0.
When I try to run the example using
hadoop jar giraph.jar org.apache.giraph.GiraphRunner
org.apache.giraph.examples.SimpleShortestPathsVertex -vif
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip
/user/hue/tinygraph.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat
-op /user/hue/output/shortestpaths -w 1
I get the following exception:
2014-04-30 07:22:15,390 INFO [main] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to sandbox.hortonworks.com:22181 with poll msecs = 3000
2014-04-30 07:22:15,396 WARN [main] org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:579)
at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:701)
at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:357)
at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:188)
at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:60)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
I have found a workaround: it seems that Giraph expects ZooKeeper to be running on port 22181, while it is actually running on 2181. I simply used the Ambari interface to set ZooKeeper to run on 22181 (go to http://127.0.0.1:8080/, log in with admin/admin, open the Services tab, select ZooKeeper, change the port to 22181, save, and choose Service Actions -> Restart All).
Does anyone have a better solution for this problem? Is there a config via which the port should be specified, or is this port in the Giraph source code a typo?
Yes, you can specify it each time you run a Giraph job by using the option -Dgiraph.zkList=localhost:2181
Also, you can set it in the Hadoop configs so you don't have to pass this option each time you submit a Giraph job. For that, add the following line to the conf/core-site.xml file:
<property><name>giraph.zkList</name><value>localhost:2181</value></property>
[Please check the syntax, I don't recall it on top my head and currently I don't have access to a cluster to check it]
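For reference, combining this with the command from the question looks roughly like the following (the -D option must come right after GiraphRunner, before the computation class):
hadoop jar giraph.jar org.apache.giraph.GiraphRunner -Dgiraph.zkList=localhost:2181 org.apache.giraph.examples.SimpleShortestPathsVertex -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/hue/tinygraph.txt -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/hue/output/shortestpaths -w 1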