I'm following http://jayatiatblogs.blogspot.com/2011/11/storm-installation.html to try to configure an Apache Storm remote cluster using a few virtual machines (EC2) with Ubuntu 14.04 LTS on Amazon Web Services.
My master node is 10.0.0.230 and my slave node is 10.0.0.79. My ZooKeeper resides on my master node. When I run storm jar storm-starter-0.9.4-jar-with-dependencies.jar storm.starter.RollingTopWords production-topology remote on the master node, the messages below appear, indicating that the topology was submitted successfully:
339 [main] INFO storm.starter.RollingTopWords - Topology name: production-topology
377 [main] INFO storm.starter.RollingTopWords - Running in remote (cluster) mode
651 [main] INFO backtype.storm.StormSubmitter - Jar not uploaded to master yet. Submitting jar...
655 [main] INFO backtype.storm.StormSubmitter - Uploading topology jar storm-starter-0.9.4-jar-with-dependencies.jar to assigned location: /home/ubuntu/storm/data/nimbus/inbox/stormjar-380bb1a2-1699-4ad1-8341-3d4b92c14764.jar
672 [main] INFO backtype.storm.StormSubmitter - Successfully uploaded topology jar to assigned location: /home/ubuntu/storm/data/nimbus/inbox/stormjar-380bb1a2-1699-4ad1-8341-3d4b92c14764.jar
672 [main] INFO backtype.storm.StormSubmitter - Submitting topology production-topology in distributed mode with conf {"topology.debug":true}
714 [main] INFO backtype.storm.StormSubmitter - Finished submitting topology: production-topology
The Storm UI & the storm list command show that the topology is active:
Topology_name Status Num_tasks Num_workers Uptime_secs
-------------------------------------------------------------------
production-topology ACTIVE 0 0 59
However, in the Cluster Summary of the Storm UI, there are 0 supervisors, 0 used slots, 0 free slots, 0 executors & 0 tasks. In the Topology Configuration, supervisor.slots.ports shows the default supervisor slot ports of the master node instead of the supervisor slot ports of the slave node.
Below is the zoo.cfg of my master node:
tickTime=2000
dataDir=/home/ubuntu/zookeeper-data
clientPort=2181
The storm.yaml of my master node:
storm.zookeeper.servers:
- "10.0.0.230"
storm.zookeeper.port: 2181
nimbus.host: "localhost"
nimbus.thrift.port: 6627
nimbus.task.launch.secs: 240
supervisor.worker.start.timeout.secs: 240
supervisor.worker.timeout.secs: 240
storm.local.dir: "/home/ubuntu/storm/data"
java.library.path: "/usr/lib/jvm/java-7-oracle"
The storm.yaml of my slave node:
storm.zookeeper.server:
- "10.0.0.230"
storm.zookeeper.port: 2181
nimbus.host: "10.0.0.230"
nimbus.thrift.port: 6627
storm.local.dir: "/home/ubuntu/storm/data"
java.library.path: "/usr/lib/jvm/java-7-oracle"
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
- 6704
I used zkCli.sh -server 10.0.0.230:2181 to connect to the ZooKeeper on the master node, and it works fine:
2015-05-04 03:40:20,866 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=10.0.0.230:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@63f78dde
2015-05-04 03:40:20,888 [myid:] - INFO [main-SendThread(10.0.0.230:2181):ClientCnxn$SendThread@975] - Opening socket connection to server 10.0.0.230/10.0.0.230:2181. Will not attempt to authenticate using SASL (unknown error)
Welcome to ZooKeeper!
2015-05-04 03:40:20,900 [myid:] - INFO [main-SendThread(10.0.0.230:2181):ClientCnxn$SendThread@852] - Socket connection established to 10.0.0.230/10.0.0.230:2181, initiating session
JLine support is enabled
2015-05-04 03:40:20,918 [myid:] - INFO [main-SendThread(10.0.0.230:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server 10.0.0.230/10.0.0.230:2181, sessionid = 0x14d1ca1ab73001c, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: 10.0.0.230:2181(CONNECTED) 0]
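For reference, the same reachability check can be scripted; below is a quick sketch that could be run from the slave node, assuming the nc (netcat) utility is available there:
# Run on the slave node (10.0.0.79) to confirm ZooKeeper on the master is reachable.
# "ruok" is a standard ZooKeeper four-letter command; a healthy server replies "imok".
echo ruok | nc 10.0.0.230 2181
# "stat" prints version, mode and connection stats when the server is serving requests.
echo stat | nc 10.0.0.230 2181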
Below are the supervisor logs from my slave node:
2015-05-06T06:16:28.487+0000 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-06T06:16:28.487+0000 o.a.s.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.7.0_80]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744) ~[na:1.7.0_80]
at org.apache.storm.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) ~[storm-core-0.9.4.jar:0.9.4]
2015-05-06T06:16:28.589+0000 b.s.d.supervisor [ERROR] Error on initialization of server mk-supervisor
java.lang.RuntimeException: org.apache.storm.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /storm
at backtype.storm.util$wrap_in_runtime.invoke(util.clj:44) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.zookeeper$exists_node_QMARK_$fn__807.invoke(zookeeper.clj:102) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:98) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.zookeeper$mkdirs.invoke(zookeeper.clj:114) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.cluster$mk_distributed_cluster_state.invoke(cluster.clj:43) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.cluster$mk_storm_cluster_state.invoke(cluster.clj:238) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.daemon.supervisor$supervisor_data.invoke(supervisor.clj:214) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.daemon.supervisor$fn__5518$exec_fn__1754__auto____5519.invoke(supervisor.clj:409) ~[storm-core-0.9.4.jar:0.9.4]
at clojure.lang.AFn.applyToHelper(AFn.java:167) [clojure-1.5.1.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
at backtype.storm.daemon.supervisor$fn__5518$mk_supervisor__5544.doInvoke(supervisor.clj:405) [storm-core-0.9.4.jar:0.9.4]
at clojure.lang.RestFn.invoke(RestFn.java:436) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.supervisor$_launch.invoke(supervisor.clj:629) [storm-core-0.9.4.jar:0.9.4]
at backtype.storm.daemon.supervisor$_main.invoke(supervisor.clj:659) [storm-core-0.9.4.jar:0.9.4]
at clojure.lang.AFn.applyToHelper(AFn.java:159) [clojure-1.5.1.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.supervisor.main(Unknown Source) [storm-core-0.9.4.jar:0.9.4]
Caused by: org.apache.storm.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /storm
at org.apache.storm.zookeeper.KeeperException.create(KeeperException.java:99) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:157) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148) ~[storm-core-0.9.4.jar:0.9.4]
at org.apache.storm.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.zookeeper$exists_node_QMARK_$fn__807.invoke(zookeeper.clj:101) ~[storm-core-0.9.4.jar:0.9.4]
... 16 common frames omitted
2015-05-06T06:16:28.607+0000 b.s.util [ERROR] Halting process: ("Error on initialization")
java.lang.RuntimeException: ("Error on initialization")
at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.4.jar:0.9.4]
at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.supervisor$fn__5518$mk_supervisor__5544.doInvoke(supervisor.clj:405) [storm-core-0.9.4.jar:0.9.4]
at clojure.lang.RestFn.invoke(RestFn.java:436) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.supervisor$_launch.invoke(supervisor.clj:629) [storm-core-0.9.4.jar:0.9.4]
at backtype.storm.daemon.supervisor$_main.invoke(supervisor.clj:659) [storm-core-0.9.4.jar:0.9.4]
at clojure.lang.AFn.applyToHelper(AFn.java:159) [clojure-1.5.1.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
at backtype.storm.daemon.supervisor.main(Unknown Source) [storm-core-0.9.4.jar:0.9.4]
Below are my nimbus logs from my master node:
2015-05-06T06:14:19.291+0000 b.s.d.nimbus [INFO] Using default scheduler
2015-05-06T06:14:19.304+0000 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries [5]
2015-05-06T06:14:19.415+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO] Starting
2015-05-06T06:14:19.417+0000 o.a.s.z.ZooKeeper [INFO] Initiating client connection, connectString=10.0.0.230:2181 sessionTimeout=20000 watcher=org.apache.storm.curator.ConnectionState@795bca46
2015-05-06T06:14:19.436+0000 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server 10.0.0.230/10.0.0.230:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-06T06:14:19.448+0000 o.a.s.z.ClientCnxn [INFO] Socket connection established to 10.0.0.230/10.0.0.230:2181, initiating session
2015-05-06T06:14:19.457+0000 o.a.s.z.ClientCnxn [INFO] Session establishment complete on server 10.0.0.230/10.0.0.230:2181, sessionid = 0x14d27dbda310000, negotiated timeout = 20000
2015-05-06T06:14:19.459+0000 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2015-05-06T06:14:19.460+0000 b.s.zookeeper [INFO] Zookeeper state update: :connected:none
2015-05-06T06:14:20.485+0000 o.a.s.z.ClientCnxn [INFO] EventThread shut down
2015-05-06T06:14:20.485+0000 o.a.s.z.ZooKeeper [INFO] Session: 0x14d27dbda310000 closed
2015-05-06T06:14:20.486+0000 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [1000] the maxSleepTimeMs [30000] the maxRetries [5]
2015-05-06T06:14:20.487+0000 o.a.s.c.f.i.CuratorFrameworkImpl [INFO] Starting
2015-05-06T06:14:20.487+0000 o.a.s.z.ZooKeeper [INFO] Initiating client connection, connectString=10.0.0.230:2181/storm sessionTimeout=20000 watcher=org.apache.storm.curator.ConnectionState@510d246b
2015-05-06T06:14:20.504+0000 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server 10.0.0.230/10.0.0.230:2181. Will not attempt to authenticate using SASL (unknown error)
2015-05-06T06:14:20.505+0000 o.a.s.z.ClientCnxn [INFO] Socket connection established to 10.0.0.230/10.0.0.230:2181, initiating session
2015-05-06T06:14:20.507+0000 o.a.s.z.ClientCnxn [INFO] Session establishment complete on server 10.0.0.230/10.0.0.230:2181, sessionid = 0x14d27dbda310001, negotiated timeout = 20000
2015-05-06T06:14:20.507+0000 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED
2015-05-06T06:14:20.547+0000 b.s.d.nimbus [INFO] Starting Nimbus server...
I ran storm nimbus & storm ui on my master node, and storm supervisor on my slave node.
The supervisor logs from my slave node show that the supervisor tries to connect to a ZooKeeper on localhost, even though the storm.yaml of my slave node specifies that ZooKeeper runs on my master node.
Why does this happen, and how can I solve it?
And why does the Cluster Summary of the Storm UI show 0 supervisors, 0 used slots, 0 free slots, 0 executors & 0 tasks?
Why does it use the supervisor slot ports of the master node instead of those of the slave node?
Why, when I click production-topology in the Topology Summary of the Storm UI, are there 0 Num workers, 0 Num executors and 0 Num tasks?
And why is there no info displayed for Spouts & Bolts?
I discovered the problem: I should run ZooKeeper on my slave nodes, not on my master node. The problem is now solved & the Storm cluster is up.
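For anyone reproducing this, a minimal zoo.cfg for running ZooKeeper on the slave node instead would look much like the master's; this is a sketch reusing the paths from above, not the exact file from the final setup:
tickTime=2000
dataDir=/home/ubuntu/zookeeper-data
clientPort=2181
With ZooKeeper relocated, storm.zookeeper.servers in the storm.yaml of both nodes must point at the slave's address (10.0.0.79 here), and nimbus and the supervisor need to be restarted.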
Related
Please help me complete my NiFi cluster setup. I can see NiFi running on the server, but the GUI is not coming up.
Java version:
openjdk version "1.8.0_302"
OpenJDK Runtime Environment (build 1.8.0_302-b08)
OpenJDK 64-Bit Server VM (build 25.302-b08, mixed mode)
NiFi version: nifi-1.17.0
nifi.properties:
nifi.state.management.embedded.zookeeper.start=true
nifi.remote.input.host=Svxxx.xyz.com
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10443
nifi.remote.input.http.enabled=true
nifi.web.https.host=Svxxx.xyz.com
nifi.web.https.port=9443
nifi.web.proxy.host=localhost:9443,Svxxx.xyz.com:9443
nifi.sensitive.props.key=propkeywith12chars
nifi.cluster.is.node=true
nifi.cluster.node.address=Svxxx.xyz.com
nifi.cluster.node.protocol.port=11443
nifi.cluster.load.balance.host=Svxxx.xyz.com
nifi.cluster.load.balance.port=6342
nifi.zookeeper.connect.string=Svxxx.xyz.com:2181,Svxxx.xyz.com:2181,Svxxx.xyz.com:2181
zookeeper.properties:
server.1=Svxxx.xyz.com:2888:3888;2181
server.2=Svxxx.xyz.com:2888:3888;2181
server.3=Svxxx.xyz.com:2888:3888;2181
Changes made in state-management.xml:
<cluster-provider>
<id>zk-provider</id>
<class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
<property name="Connect String">Svxxx.xyz.com:2181,Svxxx.xyz.com:2181,Svxxx.xyz.com:2181</property>
<property name="Root Node">/nifi</property>
<property name="Session Timeout">10 seconds</property>
<property name="Access Control">Open</property>
</cluster-provider>
Firewall status: disabled
I also created SSL certificates using the toolkit and put them on the respective servers, and replaced truststore.jks and keystore.jks accordingly.
nifi-app.log:
2022-10-17 22:00:05,669 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
2022-10-17 22:00:05,670 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
2022-10-17 22:00:11,403 WARN [Heartbeat Monitor Thread-1] o.a.n.c.l.e.CuratorLeaderElectionManager Unable to determine leader for role 'Cluster Coordinator'; returning null
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /nifi/leaders/Cluster Coordinator
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2480)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:242)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:231)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:228)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:219)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:41)
at org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:154)
at org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:134)
at org.apache.curator.framework.recipes.locks.InterProcessMutex.getParticipantNodes(InterProcessMutex.java:170)
at org.apache.curator.framework.recipes.leader.LeaderSelector.getLeader(LeaderSelector.java:337)
at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.getLeader(CuratorLeaderElectionManager.java:281)
at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener.verifyLeader(CuratorLeaderElectionManager.java:571)
at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener.isLeader(CuratorLeaderElectionManager.java:525)
at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$LeaderRole.isLeader(CuratorLeaderElectionManager.java:466)
at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.isLeader(CuratorLeaderElectionManager.java:262)
at org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.isActiveClusterCoordinator(NodeClusterCoordinator.java:824)
at org.apache.nifi.cluster.coordination.heartbeat.AbstractHeartbeatMonitor.monitorHeartbeats(AbstractHeartbeatMonitor.java:132)
at org.apache.nifi.cluster.coordination.heartbeat.AbstractHeartbeatMonitor$1.run(AbstractHeartbeatMonitor.java:84)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2022-10-17 22:00:12,371 INFO [Cleanup Archive for default] o.a.n.c.repository.FileSystemRepository Successfully deleted 0 files (0 bytes) from archive
2022-10-17 22:00:12,371 INFO [Cleanup Archive for default] o.a.n.c.repository.FileSystemRepository Archive cleanup completed for container default; will now allow writing to this container. Bytes used = 10.53 GB, bytes free = 26.46 GB, capacity = 36.99 GB
2022-10-17 22:00:14,203 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@22e813fc checkpointed with 1 Records and 0 Swap Files in 5 milliseconds (Stop-the-world time = 2 milliseconds, Clear Edit Logs time = 2 millis), max Transaction ID 3
2022-10-17 22:00:18,823 WARN [main] o.a.n.c.l.e.CuratorLeaderElectionManager Unable to determine leader for role 'Cluster Coordinator'; returning null
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /nifi/leaders/Cluster Coordinator
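A couple of sanity checks for the embedded ZooKeeper ensemble may help here; the sketch below assumes NiFi's default state directory for the embedded ZooKeeper (not confirmed in this setup) and requires nc:
# On each of the three nodes: the embedded ZooKeeper needs a unique server id
# in a myid file under NiFi's state directory (default shown, relative to NiFi home).
cat ./state/zookeeper/myid
# Check the local ZooKeeper is listening and serving; "srvr" prints the server's
# mode (leader/follower/standalone) and is whitelisted by default in recent ZooKeeper.
echo srvr | nc localhost 2181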
I'm trying to set up a NiFi cluster using an external ZooKeeper (version 3.4.10) running as a container in a Kubernetes pod.
I have changed the following things in nifi.properties:
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=XXXXXXXXXXXXXXXX
nifi.cluster.node.protocol.port=8082
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=1 min
nifi.cluster.flow.election.max.candidates=
# cluster load balancing properties #
nifi.cluster.load.balance.host=
nifi.cluster.load.balance.port=6342
nifi.cluster.load.balance.connections.per.node=1
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=xxxx:2181,xxxx:2181,xxxx:2181
nifi.zookeeper.connect.timeout=10 secs
nifi.zookeeper.session.timeout=10 secs
nifi.zookeeper.root.node=/nifi
I am not able to connect to ZooKeeper and am getting a connection error:
2022-09-23 10:13:23,665 ERROR [main-EventThread] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:885)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:677)
at org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152)
at org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult(GetConfigBuilderImpl.java:222)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:601)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2022-09-23 10:13:24,331 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
2022-09-23 10:13:24,331 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@30dbc287 Connection State changed to RECONNECTED
2022-09-23 10:13:24,431 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2022-09-23 10:13:24,431 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener@30dbc287 Connection State changed to SUSPENDED
The connection state keeps alternating between RECONNECTED and SUSPENDED.
How can I set up a NiFi cluster using an external ZooKeeper connection?
Is anyone else facing similar issues with this combination?
Can you please assist?
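One way to narrow this down is to test the external ZooKeeper from inside the NiFi pod itself; a sketch, where nifi-0 is a placeholder pod name and xxxx stands for the masked ZooKeeper hosts from the connect string:
# Does the ZooKeeper hostname resolve from inside the pod?
kubectl exec -it nifi-0 -- sh -c 'getent hosts xxxx'
# Is the client port reachable and the server healthy? (requires nc in the image)
kubectl exec -it nifi-0 -- sh -c 'echo srvr | nc xxxx 2181'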
I have a Storm cluster with 1 Nimbus, 4 Supervisors and 2 Zookeeper nodes. My storm.yaml is as follows:
storm.zookeeper.servers:
- "storage14"
- "storage15"
nimbus.seeds: ["storage01"]
#storm.local.hostname: "storage05"
supervisor.supervisors:
- "storage02"
- "storage03"
- "storage04"
- "storage05"
storm.local.dir: "/tmp/storm"
worker.childopts: "-Xmx%HEAP-MEM%m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump"
This storm.yaml file is used by both Nimbus and the Supervisors. When Nimbus is started, I have storm.local.hostname commented out, as shown above.
However, when starting Supervisors on the respective nodes, I uncomment storm.local.hostname and set it to the hostname of the node on which the supervisor is being launched. For instance, if I were launching the supervisor on storage05, the storm.yaml file would have the following additional config param:
storm.local.hostname: "storage05"
The problem is that even though Nimbus launches successfully and I can see it on the Storm UI, some supervisors do not seem to be able to connect to Nimbus. For instance, of the 4 nodes I start supervisors on, the Storm UI often shows only 2 of them connected. However, if I ssh into these nodes and run jps, I can see that the supervisor process is running on ALL of them.
The supervisors that do end up connecting are not always the same ones, so it is definitely not a problem with those specific nodes.
Another thing to note is that if I try to execute a topology on whichever nodes did get connected, it does not get registered by the cluster and I cannot see the topology on the UI either.
What do you think might be causing this erratic behavior?
UPDATE:
The tail end of nimbus.log has the following lines:
2017-01-25 00:04:25.216 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.317 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage15/192.168.140.195:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.317 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.686 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage15/192.168.140.195:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.686 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-01-25 00:04:25.787 o.a.s.s.o.a.z.ClientCnxn [INFO] Opening socket connection to server storage14/192.168.140.194:2181. Will not attempt to authenticate using SASL (unknown error)
2017-01-25 00:04:25.787 o.a.s.s.o.a.z.ClientCnxn [WARN] Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.storm.shade.org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
Your UPDATE (the nimbus log) indicates that Nimbus cannot connect to the Zookeeper cluster. Please check that the Zookeeper cluster (storage14/storage15) is accessible from storage01: not just that the nodes are reachable, but also that you can telnet to the Zookeeper port via "telnet storage14 2181" (and/or storage15).
When the ZK connectivity issue is gone, please try starting the supervisors again.
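Concretely, a sketch of those checks, run from storage01 and from each supervisor node:
# Basic reachability:
ping -c 1 storage14
ping -c 1 storage15
# TCP connectivity to the ZooKeeper client port:
telnet storage14 2181
telnet storage15 2181
# Or, if telnet is not installed, netcat works too; "imok" means the server is healthy:
echo ruok | nc storage14 2181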
This is the error I got when submitting a topology:
java.net.ConnectException: Connection refused
at backtype.storm.utils.NimbusClient.<init>(NimbusClient.java:36)
at backtype.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:17)
This is what I got in the nimbus log file:
2015-09-22 04:19:58 ClientCnxn [INFO] Socket connection established to localhost/127.0.0.1:2181, initiating session
2015-09-22 04:20:13 ConnectionState [ERROR] Connection timed out
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:72)
at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:74)
And here is my storm.yaml file:
storm.zookeeper.servers:
- "127.0.0.1"
nimbus.host: "127.0.0.1"
storm.local.dir: /tmp/storm
drpc.servers:
- "127.0.0.1"
- "server2"
Is there anything else? What is wrong here?
The problem was the Nimbus thrift buffer size; it should be as large as possible, for example:
nimbus.thrift.max_buffer_size: 20480000
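For context, this setting belongs in storm.yaml on the Nimbus host, and Nimbus must be restarted for it to take effect; a sketch (the value is in bytes, so 20480000 is roughly 20 MB, against a stock default of about 1 MB):
# storm.yaml on the Nimbus machine
nimbus.thrift.max_buffer_size: 20480000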
While running a topology in Storm we are getting an error like this:
8983 [Thread-6] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
9144 [main] INFO backtype.storm.daemon.nimbus - Shutting down master
9199 [Thread-6-EventThread] INFO backtype.storm.zookeeper - Zookeeper state update: :connected:none
9241 [main] INFO backtype.storm.daemon.nimbus - Shut down master
9273 [Thread-6] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
9306 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN org.apache.zookeeper.server.NIOServerCnxn - EndOfStreamException: Unable to read additional data from client sessionid 0x143af55728d0003, likely client has closed socket
9354 [main] INFO backtype.storm.daemon.supervisor - Shutting down c094c3b1-a378-4c4f-af35-9278647c217a:4beddc09-4675-4fb9-8bdc-9cf5013ce9ca
9358 [main] INFO backtype.storm.daemon.supervisor - Shut down c094c3b1-a378-4c4f-af35-9278647c217a:4beddc09-4675-4fb9-8bdc-9cf5013ce9ca
9361 [main] INFO backtype.storm.daemon.supervisor - Shutting down supervisor c094c3b1-a378-4c4f-af35-9278647c217a
9364 [Thread-5] INFO backtype.storm.event - Event manager interrupted
9369 [Thread-6] INFO backtype.storm.event - Event manager interrupted
9425 [main] INFO backtype.storm.daemon.supervisor - Shutting down supervisor 386d8d71-c9b5-4b51-bd6e-f9f605034ea0
9428 [Thread-8] INFO backtype.storm.event - Event manager interrupted
9429 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN org.apache.zookeeper.server.NIOServerCnxn - EndOfStreamException: Unable to read additional data from client sessionid 0x143af55728d0007, likely client has closed socket
9429 [Thread-9] INFO backtype.storm.event - Event manager interrupted
9473 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN org.apache.zookeeper.server.NIOServerCnxn - EndOfStreamException: Unable to read additional data from client sessionid 0x143af55728d0009, likely client has closed socket
9476 [main] INFO backtype.storm.testing - Shutting down in process zookeeper
9503 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN org.apache.zookeeper.server.NIOServerCnxn - Ignoring exception
java.nio.channels.ClosedChannelException: null
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:211) ~[na:1.7.0_03]
at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:242) ~[zookeeper-3.3.3.jar:3.3.3-1073969]
9510 [main] INFO backtype.storm.testing - Done shutting down in process zookeeper
9513 [main] INFO backtype.storm.testing - Deleting temporary path C:\Users\sowmiya\AppData\Local\Temp\c9b1bc1a-a950-4098-af77-f81a4d2b112f
9520 [main] INFO backtype.storm.testing - Deleting temporary path C:\Users\sowmiya\AppData\Local\Temp\7e75c468-18ea-4787-a4ac-496fb108db71
9527 [main] INFO backtype.storm.testing - Unable to delete file: C:\Users\sowmiya\AppData\Local\Temp\7e75c468-18ea-4787-a4ac-496fb108db71\version-2\log.1
9529 [main] INFO backtype.storm.testing - Deleting temporary path C:\Users\sowmiya\AppData\Local\Temp\fa7b3c9b-ac93-4090-b9e2-63f10019e61f
9543 [main] INFO backtype.storm.testing - Deleting temporary path C:\Users\sowmiya\AppData\Local\Temp\55f1fd11-508e-43bb-b340-0d9b79f3af33
9579 [Thread-6-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
9580 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
9583 [Thread-6-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
11232 [Thread-6-SendThread(localhost:2000)] WARN org.apache.zookeeper.ClientCnxn - Session 0x143af55728d000b for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.7.0_03]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701) ~[na:1.7.0_03]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119) ~[zookeeper-3.3.3.jar:3.3.3-1073969]
13992 [Thread-6-SendThread(localhost:2000)] WARN org.apache.zookeeper.ClientCnxn - Session 0x143af55728d000b for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.7.0_03]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701) ~[na:1.7.0_03]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
When we try to run the topology jar file, all of the processes (nimbus, zookeeper and supervisor) die. Please help us understand why this happens.
Please help us rectify this error so we can proceed further.
Thank you,
Sowmiya Priya
This looks like a ZooKeeper issue: your processes are not able to connect to ZooKeeper. It's hard to say more without more information.
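As a first step, it is worth checking whether anything is actually listening on the ZooKeeper port the logs mention (localhost:2000); since the paths in the logs suggest Windows, a sketch:
REM Shows listening sockets on the in-process ZooKeeper port from the logs
netstat -an | findstr :2000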