Error when submitting a topology to Nimbus - apache-storm

The error I get when submitting a topology:
java.net.ConnectException: Connection refused
at backtype.storm.utils.NimbusClient.<init>(NimbusClient.java:36)
at backtype.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:17)
This is what I got in the Nimbus log file:
2015-09-22 04:19:58 ClientCnxn [INFO] Socket connection established to
localhost/127.0.0.1:2181, initiating session
2015-09-22 04:20:13 ConnectionState [ERROR] Connection timed out
org.apache.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss
at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:72)
at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:74)
and here is my storm.yaml file:
storm.zookeeper.servers:
- "127.0.0.1"
nimbus.host: "127.0.0.1"
storm.local.dir: /tmp/storm
drpc.servers:
- "127.0.0.1"
- "server2"
Is there anything else I should check? What's wrong here?

The problem was the Nimbus Thrift buffer size; it should be set as large as needed, e.g.:
nimbus.thrift.max_buffer_size: 20480000
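As a quick sanity check after the change, it can help to confirm that ZooKeeper and the Nimbus Thrift port actually accept connections; a minimal sketch, assuming the default ports 2181 and 6627 and that nc is available:
# ZooKeeper health check on the default client port; a healthy server replies "imok".
echo ruok | nc 127.0.0.1 2181
# Confirm Nimbus is listening on its Thrift port (default 6627).
nc -zv 127.0.0.1 6627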

Related

Ambari on HDP cluster + ambari-metrics-collector service not starting

We have an issue with the ambari-metrics-collector service (HDP cluster version 2.6.4, 8 nodes).
The Ambari Metrics Collector service can't start, or starts for a few seconds and then fails.
Details about the installed metrics packages:
rpm -qa | grep metrics
ambari-metrics-grafana-2.6.1.0-143.x86_64
ambari-metrics-monitor-2.6.1.0-143.x86_64
ambari-metrics-collector-2.5.0.3-7.x86_64
ambari-metrics-hadoop-sink-2.6.1.0-143.x86_64
All machines are RHEL 7.2.
We performed the following steps to try to resolve the problem:
1. Restart the metrics-collector service:
su - ams -c '/usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf/ stop'
su - ams -c '/usr/sbin/ambari-metrics-collector --config /etc/ambari-metrics-collector/conf/ start'
or
ambari-metrics-collector stop
ambari-metrics-collector start
2. Restart ambari-metrics-monitor on all nodes:
ambari-metrics-monitor stop
ambari-metrics-monitor start
3. Clean the folder /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/:
mv /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/zookeeper_0 /tmp/bck/zookeeper/
then restart the metrics-collector service.
4. Tune the metrics-collector parameters according to https://docs.cloudera.com/HDPDocuments/Ambari-2.2.1.0/bk_ambari_reference_guide/content/_ams_general_guidelines.html
We updated the following parameters in Ambari:
metrics_collector_heap_size=1024
hbase_regionserver_heapsize=1024
hbase_master_heapsize=512
hbase_master_xmn_size=128
Status for now: steps 1-4 didn't help.
From the logs we can see the following:
Log file: ambari-metrics-collector.log
2020-06-25 09:06:14,474 WARN org.apache.zookeeper.ClientCnxn: Session 0x172eab71f310002 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2020-06-25 09:06:14,575 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=master02.sys671.com:61181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ams-hbase-unsecure/meta-region-server
Log file: hbase-ams-master-master02.sys671.com.log
2020-06-25 09:38:18,799 WARN [RS:0;master02:51842-SendThread(master02.sys671.com:61181)] zookeeper.ClientCnxn: Session 0x172ead5d73a0004 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2020-06-25 09:38:20,437 INFO [main-SendThread(master02.sys671.com:61181)] zookeeper.ClientCnxn: Opening socket connection to server master02.sys671.com/23.2.35.171:61181. Will not attempt to authenticate using SASL (unknown error)
2020-06-25 09:38:20,438 WARN [main-SendThread(master02.sys671.com:61181)] zookeeper.ClientCnxn: Session 0x172ead5d73a0002 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
We also see that the port (timeline.metrics.service.webapp.address) is not listening:
netstat -tulpn | grep 6188
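(Since the log lines above point at the collector's embedded ZooKeeper on master02.sys671.com:61181, the same check can be run for that port; 61181 is taken from the logs, not from our config.)
# Check whether the embedded ZooKeeper for AMS HBase is listening.
netstat -tulpn | grep 61181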
Any advice on how to continue from this point? We'd appreciate any help with this problem.

Nifi - "Connection refused" in gui after installation

I'm installing a three-node non-secure NiFi cluster (1.5) with embedded ZooKeeper.
The installation has completed, and the cluster startup and election have finished without any obvious issues.
However, when I hit the GUI I see the following:
javax.ws.rs.ProcessingException: java.net.ConnectException: Connection refused (Connection refused)
In nifi-app.log there is:
2018-03-07 14:41:37,857 WARN [Replicate Request Thread-7] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Failed to replicate request GET /nifi-api/flow/current-user to test-nifi01:8080 due to javax.ws.rs.ProcessingException: java.net.ConnectException: Connection refused (Connection refused)
2018-03-07 14:41:37,858 WARN [Replicate Request Thread-7] o.a.n.c.c.h.r.ThreadPoolRequestReplicator
javax.ws.rs.ProcessingException: java.net.ConnectException: Connection refused (Connection refused)
at org.glassfish.jersey.client.internal.HttpUrlConnector.apply(HttpUrlConnector.java:284)
at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:278)
at org.glassfish.jersey.client.JerseyInvocation.lambda$invoke$0(JerseyInvocation.java:753)
....
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
....
In nifi-user.log:
2018-03-07 15:18:49,475 INFO [NiFi Web Server-21] org.apache.nifi.web.filter.RequestLogger Attempting request for (<no user found>) POST http://test-nifi01:8080/nifi-api/access/kerberos (source ip: 172.100.1.11)
2018-03-07 15:18:49,476 INFO [NiFi Web Server-21] o.a.n.w.a.c.IllegalStateExceptionMapper java.lang.IllegalStateException: Access tokens are only issued over HTTPS.. Returning Conflict response.
2018-03-07 15:18:49,572 INFO [NiFi Web Server-17] org.apache.nifi.web.filter.RequestLogger Attempting request for (<no user found>) POST http://test-nifi01:8080/nifi-api/access/oidc/exchange (source ip: 172.100.1.11)
2018-03-07 15:18:49,573 INFO [NiFi Web Server-17] o.a.n.w.a.c.IllegalStateExceptionMapper java.lang.IllegalStateException: User authentication/authorization is only supported when running over HTTPS.. Returning Conflict response.
2018-03-07 15:18:49,672 INFO [NiFi Web Server-23] org.apache.nifi.web.filter.RequestLogger Attempting request for (anonymous) GET http://test-nifi01:8080/nifi-api/flow/current-user (source ip: 172.100.1.11)
The results are the same regardless of which host I connect to.
I'm not sure what I need to be checking here.
EDIT: Updated with more detail.
Node 1: https://pastebin.com/5nMzWfXw
# web properties #
nifi.web.war.directory=./lib
nifi.web.http.host=test-nifi01
nifi.web.http.port=8080
nifi.web.http.network.interface.default=eth0
nifi.web.https.host=
nifi.web.https.port=
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
nifi.web.max.header.size=16 KB
nifi.web.proxy.context.path=
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=test-nifi01
nifi.cluster.node.protocol.port=11003
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=
Node 2: https://pastebin.com/n5GMy1nA
# web properties #
nifi.web.war.directory=./lib
nifi.web.http.host=test-nifi02
nifi.web.http.port=8080
nifi.web.http.network.interface.default=eth0
nifi.web.https.host=
nifi.web.https.port=
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
nifi.web.max.header.size=16 KB
nifi.web.proxy.context.path=
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=test-nifi02
nifi.cluster.node.protocol.port=11003
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=
Node 3: https://pastebin.com/EqztLfnD
# web properties #
nifi.web.war.directory=./lib
nifi.web.http.host=test-nifi03
nifi.web.http.port=8080
nifi.web.http.network.interface.default=eth0
nifi.web.https.host=
nifi.web.https.port=
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
nifi.web.max.header.size=16 KB
nifi.web.proxy.context.path=
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=test-nifi03
nifi.cluster.node.protocol.port=11003
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=
Anyone have any thoughts on what I should be looking at?
EDIT 2:
It looks like each node is refusing to connect to itself. E.g. on node 2 we have:
2018-03-07 20:31:08,099 WARN [Replicate Request Thread-4] o.a.n.c.c.h.r.ThreadPoolRequestReplicator Failed to replicate request GET /nifi-api/flow/current-user to test-nifi02:8080 due to javax.ws.rs.ProcessingException: java.net.ConnectException: Connection refused (Connection refused)
and a curl also fails:
[taas@test-nifi02 logs]$ curl http://test-nifi02:8080
curl: (7) Failed connect to test-nifi02:8080; Connection refused
Curls to the other nodes are fine. The same behavior is occurring on each node, e.g. node1 can't curl to node1 and node3 can't curl to node3.
Make sure you have DNS enabled for your network/subnet. If you don't have the luxury of DNS, try updating the /etc/hosts files with the exact IP addresses and hostnames, e.g.:
192.168.0.1 test-nifi01
192.168.0.2 test-nifi02
on all of the NiFi hosts, then try connecting again. Also make sure you can connect from one node to another via SSH.
In a NiFi cluster, a request comes in to one node, and then that node replicates the request to all other nodes.
The issue shown in nifi-app.log indicates that the connection was refused between the current node and test-nifi01:8080.
I assume you have hosts named test-nifi01, test-nifi02, and test-nifi03, so each of these nodes needs to be able to resolve the other hostnames. You can start by issuing a ping from one node to the other two, and do that on each node.
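For example, a small loop to run on each node; a sketch assuming the hostnames and port from the question and that getent and curl are available:
# On each NiFi node: check that every cluster member resolves and that
# port 8080 accepts TCP/HTTP connections.
for h in test-nifi01 test-nifi02 test-nifi03; do
  getent hosts "$h" || echo "$h does not resolve"
  curl -sS -o /dev/null "http://$h:8080/nifi/" && echo "$h:8080 reachable"
done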

Storm - Supervisors launched but not connecting to Nimbus

I have a Storm cluster with 1 Nimbus, 4 Supervisors, and 2 ZooKeeper nodes. My storm.yaml is as follows:
storm.zookeeper.servers:
- "storage14"
- "storage15"
nimbus.seeds: ["storage01"]
#storm.local.hostname: "storage05"
supervisor.supervisors:
- "storage02"
- "storage03"
- "storage04"
- "storage05"
storm.local.dir: "/tmp/storm"
worker.childopts: "-Xmx%HEAP-MEM%m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump"
This storm.yaml file is used by both Nimbus and the Supervisors. When Nimbus is started, I have storm.local.hostname commented out, as shown above.
However, when starting Supervisors on the respective nodes, I uncomment storm.local.hostname and set it to the hostname of the node on which the supervisor is being launched. For instance, if I were launching the supervisor on storage05, the storm.yaml file would have the following additional config param:
storm.local.hostname: "storage05"
The problem is that even though Nimbus launches successfully and I can see it on the Storm UI, some supervisors do not seem to be able to connect to Nimbus. For instance, of the 4 nodes I start supervisors on, the Storm UI often shows only 2 of them connected. However, if I ssh into these nodes and run jps, I can see that the supervisor process is running on ALL of them.
The supervisors that do end up connecting are not always the same, so it is definitely not a problem with those specific nodes.
Another thing to note: if I try to submit a topology to whichever nodes did connect, it does not get registered by the cluster and I cannot see it on the UI either.
What do you think might be causing this erratic behavior?
UPDATE:
Tail end of nimbus.log has the following lines
2017-01-25 00:04:25.216 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.317 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage15/192.168.140.195:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.317 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.686 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage15/192.168.140.195:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.686 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.787 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage14/192.168.140.194:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.787 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
Your UPDATE (nimbus log) indicates that your Nimbus cannot connect to the ZooKeeper cluster. Please check that the ZooKeeper cluster (storage14/storage15) is accessible from storage01: not just that the nodes are reachable, but also that you can telnet to the ZooKeeper servers ("telnet storage14 2181" and/or "telnet storage15 2181").
Once the ZK connectivity issue is gone, please try starting the supervisors again.
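For example (hostnames and the client port 2181 are from the question's config; nc is optional):
# Run from storage01: confirm both ZooKeeper nodes accept connections on the client port.
telnet storage14 2181
telnet storage15 2181
# Or non-interactively, via ZooKeeper's four-letter-word check; a healthy server replies "imok".
echo ruok | nc storage14 2181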

Unable to use remote slave node in Apache Storm cluster

I'm following http://jayatiatblogs.blogspot.com/2011/11/storm-installation.html to configure an Apache Storm remote cluster using a few virtual machines (EC2) running Ubuntu 14.04 LTS on Amazon Web Services.
My master node is 10.0.0.230 and my slave node is 10.0.0.79. My ZooKeeper resides on my master node. When I run storm jar storm-starter-0.9.4-jar-with-dependencies.jar storm.starter.RollingTopWords production-topology remote on the master node, the messages below appear, indicating it was submitted successfully:
339 [main] INFO storm.starter.RollingTopWords - Topology name: production-topology
377 [main] INFO storm.starter.RollingTopWords - Running in remote (cluster) mode
651 [main] INFO backtype.storm.StormSubmitter - Jar not uploaded to master yet. Submitting jar...
655 [main] INFO backtype.storm.StormSubmitter - Uploading topology jar storm-starter-0.9.4-jar-with-dependencies.jar to assigned location: /home/ubuntu/storm/data/nimbus/inbox/stormjar-380bb1a2-1699-4ad1-8341-3d4b92c14764.jar
672 [main] INFO backtype.storm.StormSubmitter - Successfully uploaded topology jar to assigned location: /home/ubuntu/storm/data/nimbus/inbox/stormjar-380bb1a2-1699-4ad1-8341-3d4b92c14764.jar
672 [main] INFO backtype.storm.StormSubmitter - Submitting topology production-topology in distributed mode with conf {"topology.debug":true}
714 [main] INFO backtype.storm.StormSubmitter - Finished submitting topology: production-topology
The Storm UI and the storm list command show that the topology is active:
Topology_name Status Num_tasks Num_workers Uptime_secs
-------------------------------------------------------------------
production-topology ACTIVE 0 0 59
However, in the Cluster Summary of the Storm UI, there are 0 supervisors, 0 used slots, 0 free slots, 0 executors & 0 tasks. In the Topology Configuration, supervisor.slots.ports shows the default supervisor slot ports of the master node instead of the supervisor slot ports of the slave node.
Below is the zoo.cfg of my master node:
tickTime=2000
dataDir=/home/ubuntu/zookeeper-data
clientPort=2181
The storm.yaml of my master node:
storm.zookeeper.servers:
- "10.0.0.230"
storm.zookeeper.port: 2181
nimbus.host: "localhost"
nimbus.thrift.port: 6627
nimbus.task.launch.secs: 240
supervisor.worker.start.timeout.secs: 240
supervisor.worker.timeout.secs: 240
storm.local.dir: "/home/ubuntu/storm/data"
java.library.path: "/usr/lib/jvm/java-7-oracle"
The storm.yaml of my slave node:
storm.zookeeper.server:
- "10.0.0.230"
storm.zookeeper.port: 2181
nimbus.host: "10.0.0.230"
nimbus.thrift.port: 6627
storm.local.dir: "/home/ubuntu/storm/data"
java.library.path: "/usr/lib/jvm/java-7-oracle"
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
- 6704
I used zkCli.sh -server 10.0.0.230:2181 to connect to the ZooKeeper on the master node, and it works fine:
2015-05-04 03:40:20,866 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=10.0.0.230:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@63f78dde
2015-05-04 03:40:20,888 [myid:] - INFO [main-SendThread(10.0.0.230:2181):ClientCnxn$SendThread@975] - Opening socket connection to server 10.0.0.230/10.0.0.230:2181. Will not attempt to authenticate using SASL (unknown error)
Welcome to ZooKeeper!
2015-05-04 03:40:20,900 [myid:] - INFO [main-SendThread(10.0.0.230:2181):ClientCnxn$SendThread@852] - Socket connection established to 10.0.0.230/10.0.0.230:2181, initiating session
JLine support is enabled
2015-05-04 03:40:20,918 [myid:] - INFO [main-SendThread(10.0.0.230:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server 10.0.0.230/10.0.0.230:2181, sessionid = 0x14d1ca1ab73001c, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: 10.0.0.230:2181(CONNECTED) 0]
Below are the supervisor logs from my slave node:
2015-05-06T06:16:28.487+0000 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-06T06:16:28.487+0000 o.a.s.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.7.0_80]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744) ~[na:1.7.0_80]
at org.apache.storm.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) ~[storm-core-0.9.4.jar:0.9.4]
2015-05-06T06:16:28.589+0000 b.s.d.supervisor [ERROR] Error on initialization of server mk-supervisor
java.lang.RuntimeException: org.apache.storm.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /storm
at backtype.storm.util$wrap_in_runtime.invoke(util.clj:44) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.zookeeper$exists_node_QMARK_$fn__807.invoke(zookeeper.clj:102) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:98) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.zookeeper$mkdirs.invoke(zookeeper.clj:114) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.cluster$mk_distributed_cluster_state.invoke(cluster.clj:43) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.cluster$mk_storm_cluster_state.invoke(cluster.clj:238) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.daemon.supervisor$supervisor_data.invoke(supervisor.clj:214) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.daemon.supervisor$fn__5518$exec_fn__1754__auto____5519.invoke(supervisor.clj:409) ~[storm-core-0.9.4.jar:0.9.4]
at clojure.lang.AFn.applyToHelper(AFn.java:167) [clojure-1.5.1.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
at backtype.storm.daemon.supervisor$fn__5518$mk_supervisor__5544.doInvoke(supervisor.clj:405) [storm-core-0.9.4.jar:0.9.4]
at clojure.lang.RestFn.invoke(RestFn.java:436) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.supervisor$_launch.invoke(supervisor.clj:629) [storm-core-0.9.4.jar:0.9.4]
at backtype.storm.daemon.supervisor$_main.invoke(supervisor.clj:659) [storm-core-0.9.4.jar:0.9.4]
at clojure.lang.AFn.applyToHelper(AFn.java:159) [clojure-1.5.1.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.supervisor.main(Unknown Source) [storm-core-0.9.4.jar:0.9.4]
Caused by: org.apache.storm.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /storm
at org.apache.storm.zookeeper.KeeperException.create(KeeperException.java:99) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:157) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.zookeeper$exists_node_QMARK_$fn__807.invoke(zookeeper.clj:101) ~[storm-core-0.9.4.jar:0.9.4]
... 16 common frames omitted
2015-05-06T06:16:28.607+0000 b.s.util [ERROR] Halting process: ("Error on initialization")
java.lang.RuntimeException: ("Error on initialization")
at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.4.jar:0.9.4]
at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.supervisor$fn__5518$mk_supervisor__5544.doInvoke(supervisor.clj:405) [storm-core-0.9.4.jar:0.9.4]
at clojure.lang.RestFn.invoke(RestFn.java:436) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.supervisor$_launch.invoke(supervisor.clj:629) [storm-core-0.9.4.jar:0.9.4]
at backtype.storm.daemon.supervisor$_main.invoke(supervisor.clj:659) [storm-core-0.9.4.jar:0.9.4]
at clojure.lang.AFn.applyToHelper(AFn.java:159) [clojure-1.5.1.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.supervisor.main(Unknown Source) [storm-core-0.9.4.jar:0.9.4]
Below are my nimbus logs from my master node:
2015-05-06T06:14:19.291+0000 b.s.d.nimbus [INFO] Using default scheduler
2015-05-06T06:14:19.304+0000 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries [5]
2015-05-06T06:14:19.415+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO] Starting
2015-05-06T06:14:19.417+0000 o.a.s.z.ZooKeeper [INFO] Initiating client connection, connectString=10.0.0.230:2181 sessionTimeout=20000 watcher=org.apache.storm.curator.ConnectionState@795bca46
2015-05-06T06:14:19.436+0000 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server 10.0.0.230/10.0.0.230:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-06T06:14:19.448+0000 o.a.s.z.ClientCnxn [INFO] Socket connection established to 10.0.0.230/10.0.0.230:2181, initiating session
2015-05-06T06:14:19.457+0000 o.a.s.z.ClientCnxn [INFO] Session establishment complete on server 10.0.0.230/10.0.0.230:2181, sessionid = 0x14d27dbda310000, negotiated timeout = 20000
2015-05-06T06:14:19.459+0000 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2015-05-06T06:14:19.460+0000 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
2015-05-06T06:14:20.485+0000 o.a.s.z.ClientCnxn [INFO] EventThread shut down
2015-05-06T06:14:20.485+0000 o.a.s.z.ZooKeeper [INFO] Session: 0x14d27dbda310000 closed
2015-05-06T06:14:20.486+0000 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries [5]
2015-05-06T06:14:20.487+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO] Starting
2015-05-06T06:14:20.487+0000 o.a.s.z.ZooKeeper [INFO] Initiating client connection, connectString=10.0.0.230:2181/storm sessionTimeout=20000 watcher=org.apache.storm.curator.ConnectionState@510d246b
2015-05-06T06:14:20.504+0000 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server 10.0.0.230/10.0.0.230:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-06T06:14:20.505+0000 o.a.s.z.ClientCnxn [INFO] Socket connection established to 10.0.0.230/10.0.0.230:2181, initiating session
2015-05-06T06:14:20.507+0000 o.a.s.z.ClientCnxn [INFO] Session establishment complete on server 10.0.0.230/10.0.0.230:2181, sessionid = 0x14d27dbda310001, negotiated timeout = 20000
2015-05-06T06:14:20.507+0000 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2015-05-06T06:14:20.547+0000 b.s.d.nimbus [INFO] Starting Nimbus server...
I ran storm nimbus & storm ui on my master node, and storm supervisor on my slave node.
The supervisor logs from my slave node show that the slave tends to connect to ZooKeeper on localhost, although I specified in the slave's storm.yaml that my ZooKeeper is on my master node.
Why does this happen, and how can I solve it?
Also, why does the Cluster Summary of the Storm UI show 0 supervisors, 0 used slots, 0 free slots, 0 executors & 0 tasks?
Why does it use the supervisor slot ports of the master node instead of those of the slave node?
Why, when I click production-topology in the Topology Summary of the Storm UI, are there 0 workers, 0 executors, and 0 tasks?
And why is there no info displayed for the Spouts & Bolts?
I discovered the problem. I should set up my ZooKeeper on my slave nodes, not on my master node. Now the problem is solved & the Storm cluster is up.
(Note also that the slave's storm.yaml above uses the key storm.zookeeper.server, singular, instead of storm.zookeeper.servers; Storm silently ignores unrecognized keys and falls back to its default of localhost, which matches the supervisor log trying to reach ZooKeeper on 127.0.0.1.)
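For reference, a minimal slave-side storm.yaml fragment with the plural key (values taken from the question's configs):
# storm.zookeeper.servers (plural) is the key Storm actually reads.
storm.zookeeper.servers:
- "10.0.0.230"
storm.zookeeper.port: 2181
nimbus.host: "10.0.0.230"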

HBase RegionServer: error telling master we are up

I am getting the following errors in the logs of the slave RegionServer.
The problem seems to be at:
regionserver.HRegionServer: reportForDuty to
master=localhost,60000,1397430611631 with port=60020
The master is set to localhost but should actually point to the master host.
I am unable to figure out how the slave determines who the master is, even after going through the docs.
The complete log is:
2014-04-14 04:49:35,939 INFO [regionserver60020] regionserver.HRegionServer: CompactionChecker runs every 10sec
2014-04-14 04:49:35,950 INFO [regionserver60020] regionserver.HRegionServer: reportForDuty to master=localhost,60000,1397430611631 with port=60020, startcode=1397431174733
2014-04-14 04:49:36,083 WARN [regionserver60020] regionserver.HRegionServer: error telling master we are up
com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1671)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1712)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:5402)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2013)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:846)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupConnection(RpcClient.java:578)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:866)
at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1536)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1435)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1654)
... 5 more
Check your /etc/hosts file. If there is something like:
127.0.0.1 localhost yourhost
change it to:
127.0.0.1 localhost
192.168.1.1 yourhost
(Region servers look the active master's address up in ZooKeeper, where the master registers the hostname it resolves for itself, so the master's /etc/hosts matters most; if that hostname maps to 127.0.0.1, region servers are told to connect to localhost.)
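A quick way to verify the change took effect (yourhost is the placeholder from the example above):
getent hosts yourhost   # should now print 192.168.1.1, not 127.0.0.1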
