Connection Refused when creating a two nodes cluster in Apache NiFi - cluster-computing
I'm facing a problem regarding a refused connection on the cluster node protocol port.
I'm using the following configs to create the two nodes cluster:
For the First node manager :
####################
# State Management #
####################
nifi.state.management.configuration.file=./conf/state-management.xml
# The ID of the local state provider
nifi.state.management.provider.local=local-provider
# The ID of the cluster-wide state provider. This will be ignored if NiFi is not clustered but must be populated if running in a cluster.
nifi.state.management.provider.cluster=zk-provider
# Specifies whether or not this instance of NiFi should run an embedded ZooKeeper server
nifi.state.management.embedded.zookeeper.start=true
# Properties file that provides the ZooKeeper properties to use if <nifi.state.management.embedded.zookeeper.start> is set to true
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties
# web properties #
nifi.web.war.directory=./lib
nifi.web.http.host=10.129.140.22
nifi.web.http.port=3000
nifi.web.http.network.interface.default=
nifi.web.https.host=
nifi.web.https.port=
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
nifi.web.max.header.size=16 KB
nifi.web.proxy.context.path=
nifi.web.proxy.host=
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=
nifi.cluster.node.protocol.port=10000
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=
# cluster load balancing properties #
nifi.cluster.load.balance.host=
nifi.cluster.load.balance.port=6342
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=localhost:2181
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi
For the second node slave:
####################
# State Management #
####################
nifi.state.management.configuration.file=./conf/state-management.xml
# The ID of the local state provider
nifi.state.management.provider.local=local-provider
# The ID of the cluster-wide state provider. This will be ignored if NiFi is not clustered but must be populated if running in a cluster.
nifi.state.management.provider.cluster=zk-provider
# Specifies whether or not this instance of NiFi should run an embedded ZooKeeper server
nifi.state.management.embedded.zookeeper.start=false
# Properties file that provides the ZooKeeper properties to use if <nifi.state.management.embedded.zookeeper.start> is set to true
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties
# web properties #
nifi.web.war.directory=./lib
nifi.web.http.host=
nifi.web.http.port=9021
nifi.web.http.network.interface.default=
nifi.web.https.host=
nifi.web.https.port=
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
nifi.web.max.header.size=16 KB
nifi.web.proxy.context.path=
nifi.web.proxy.host=
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=
nifi.cluster.node.protocol.port=10001
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=
# cluster load balancing properties #
nifi.cluster.load.balance.host=10.129.140.22
nifi.cluster.load.balance.port=6343
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=10.129.140.22:2181
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi
The logs fils shows the following :
For the slave
2019-05-23 10:37:07,384 INFO [main] o.a.n.c.repository.FileSystemRepository Initializing FileSystemRepository with 'Always Sync' set to false
2019-05-23 10:37:07,541 INFO [main] o.apache.nifi.controller.FlowController Not enabling RAW Socket Site-to-Site functionality because nifi.remote.input.socket.port is not set
2019-05-23 10:37:07,546 INFO [main] o.apache.nifi.controller.FlowController Checking if there is already a Cluster Coordinator Elected...
2019-05-23 10:37:07,591 INFO [main] o.a.c.f.imps.CuratorFrameworkImpl Starting
2019-05-23 10:37:07,658 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2019-05-23 10:37:07,693 INFO [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl backgroundOperationsLoop exiting
2019-05-23 10:37:07,697 INFO [main] o.apache.nifi.controller.FlowController The Election for Cluster Coordinator has already begun (Leader is localhost:10000). Will not register to be elected for this role until after connecting to the cluster and inheriting the cluster's flow.
2019-05-23 10:37:07,699 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=true] Registered new Leader Selector for role Cluster Coordinator; this node is a silent observer in the election.
2019-05-23 10:37:07,699 INFO [main] o.a.c.f.imps.CuratorFrameworkImpl Starting
2019-05-23 10:37:07,703 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Registered new Leader Selector for role Cluster Coordinator; this node is a silent observer in the election.
2019-05-23 10:37:07,703 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] started
2019-05-23 10:37:07,703 INFO [main] o.a.n.c.c.h.AbstractHeartbeatMonitor Heartbeat Monitor started
2019-05-23 10:37:07,706 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2019-05-23 10:37:09,587 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#1a6a4595{nifi-api,/nifi-api,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-api-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-api-1.9.2.war}
2019-05-23 10:37:09,850 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=77ms
2019-05-23 10:37:09,852 INFO [main] o.e.j.s.h.C._nifi_content_viewer No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:09,873 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#4b1b2255{nifi-content-viewer,/nifi-content-viewer,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-content-viewer-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-content-viewer-1.9.2.war}
2019-05-23 10:37:09,895 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=6ms
2019-05-23 10:37:09,896 WARN [main] o.e.j.webapp.StandardDescriptorProcessor Duplicate mapping from / to default
2019-05-23 10:37:09,915 INFO [main] o.e.j.s.h.ContextHandler._nifi_docs No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:09,917 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#4965454c{nifi-docs,/nifi-docs,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-docs-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-docs-1.9.2.war}
2019-05-23 10:37:09,936 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=8ms
2019-05-23 10:37:09,955 INFO [main] o.e.j.server.handler.ContextHandler._ No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:09,957 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#1e4a4ed5{nifi-error,/,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-error-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-error-1.9.2.war}
2019-05-23 10:37:09,967 INFO [main] o.eclipse.jetty.server.AbstractConnector Started ServerConnector#4518bffd{HTTP/1.1,[http/1.1]}{0.0.0.0:9021}
2019-05-23 10:37:09,967 INFO [main] org.eclipse.jetty.server.Server Started #28769ms
2019-05-23 10:37:09,978 INFO [main] org.apache.nifi.web.server.JettyServer Loading Flow...
2019-05-23 10:37:09,982 INFO [main] org.apache.nifi.io.socket.SocketListener Now listening for connections from nodes on port 10001
2019-05-23 10:37:10,026 INFO [main] o.apache.nifi.controller.FlowController Successfully synchronized controller with proposed flow
2019-05-23 10:37:10,071 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: localhost:9021
2019-05-23 10:37:10,073 INFO [main] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10000; will use this address for sending heartbeat messages
2019-05-23 10:37:10,074 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket to localhost:10000 due to: java.net.ConnectException: Connection refused (Connection refused)
2019-05-23 10:37:12,715 WARN [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Failed to determine which node is elected active Cluster Coordinator: ZooKeeper reports the address as localhost:10000, but there is no node with this address. Attempted to determine the node's information but failed to retrieve its information due to org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket due to: java.net.ConnectException: Connection refused (Connection refused)
2019-05-23 10:37:12,720 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for localhost:9021 -- Received heartbeat from node previously disconnected due to Has Not Yet Connected to Cluster. Issuing reconnection request.
2019-05-23 10:37:12,721 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for localhost:9021 -- Requesting that node connect to cluster
2019-05-23 10:37:12,721 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Status of localhost:9021 changed from NodeConnectionStatus[nodeId=localhost:9021, state=DISCONNECTED, Disconnect Code=Has Not Yet Connected to Cluster, Disconnect Reason=Has Not Yet Connected to Cluster, updateId=1] to NodeConnectionStatus[nodeId=localhost:9021, state=CONNECTING, updateId=3]
2019-05-23 10:37:15,075 INFO [main] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10000; will use this address for sending heartbeat messages
2019-05-23 10:37:15,076 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket to localhost:10000 due to: java.net.ConnectException: Connection refused (Connection refused)
For the manager
2019-05-23 10:36:59,752 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-05-23 10:36:59,752 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:java.io.tmpdir=/tmp
2019-05-23 10:36:59,752 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:java.compiler=<NA>
2019-05-23 10:36:59,752 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:os.name=Linux
2019-05-23 10:36:59,752 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:os.arch=amd64
2019-05-23 10:36:59,753 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:os.version=4.15.0-20-generic
2019-05-23 10:36:59,753 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:user.name=root
2019-05-23 10:36:59,753 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:user.home=/root
2019-05-23 10:36:59,753 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:user.dir=/home/superman/nifi-1.9.2
2019-05-23 10:36:59,753 INFO [main] o.a.zookeeper.server.ZooKeeperServer tickTime set to 2000
2019-05-23 10:36:59,754 INFO [main] o.a.zookeeper.server.ZooKeeperServer minSessionTimeout set to -1
2019-05-23 10:36:59,754 INFO [main] o.a.zookeeper.server.ZooKeeperServer maxSessionTimeout set to -1
2019-05-23 10:36:59,855 INFO [main] o.apache.nifi.controller.FlowController Checking if there is already a Cluster Coordinator Elected...
2019-05-23 10:36:59,903 INFO [main] o.a.c.f.imps.CuratorFrameworkImpl Starting
2019-05-23 10:36:59,950 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Client attempting to establish new session at /127.0.0.1:40388
2019-05-23 10:36:59,950 INFO [SyncThread:0] o.a.z.server.persistence.FileTxnLog Creating new log file: log.3c
2019-05-23 10:36:59,963 INFO [SyncThread:0] o.a.zookeeper.server.ZooKeeperServer Established session 0x16ae443f4130000 with negotiated timeout 4000 for client /127.0.0.1:40388
2019-05-23 10:36:59,975 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2019-05-23 10:36:59,998 INFO [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl backgroundOperationsLoop exiting
2019-05-23 10:37:00,003 INFO [main] o.apache.nifi.controller.FlowController The Election for Cluster Coordinator has already begun (Leader is localhost:10001). Will not register to be elected for this role until after connecting to the cluster and inheriting the cluster's flow.
2019-05-23 10:37:00,005 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=true] Registered new Leader Selector for role Cluster Coordinator; this node is a silent observer in the election.
2019-05-23 10:37:00,005 INFO [main] o.a.c.f.imps.CuratorFrameworkImpl Starting
2019-05-23 10:37:00,017 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Registered new Leader Selector for role Cluster Coordinator; this node is a silent observer in the election.
2019-05-23 10:37:00,017 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] started
2019-05-23 10:37:00,017 INFO [main] o.a.n.c.c.h.AbstractHeartbeatMonitor Heartbeat Monitor started
2019-05-23 10:37:00,019 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Client attempting to establish new session at /127.0.0.1:40390
2019-05-23 10:37:00,020 INFO [SyncThread:0] o.a.zookeeper.server.ZooKeeperServer Established session 0x16ae443f4130001 with negotiated timeout 4000 for client /127.0.0.1:40390
2019-05-23 10:37:00,020 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2019-05-23 10:37:02,022 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#1a05ff8e{nifi-api,/nifi-api,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-api-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-api-1.9.2.war}
2019-05-23 10:37:02,373 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=165ms
2019-05-23 10:37:02,375 INFO [main] o.e.j.s.h.C._nifi_content_viewer No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:02,401 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#251e2f4a{nifi-content-viewer,/nifi-content-viewer,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-content-viewer-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-content-viewer-1.9.2.war}
2019-05-23 10:37:02,419 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=6ms
2019-05-23 10:37:02,420 WARN [main] o.e.j.webapp.StandardDescriptorProcessor Duplicate mapping from / to default
2019-05-23 10:37:02,421 INFO [main] o.e.j.s.h.ContextHandler._nifi_docs No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:02,441 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#1abea1ed{nifi-docs,/nifi-docs,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-docs-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-docs-1.9.2.war}
2019-05-23 10:37:02,457 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=6ms
2019-05-23 10:37:02,475 INFO [main] o.e.j.server.handler.ContextHandler._ No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:02,478 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#6f5288c5{nifi-error,/,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-error-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-error-1.9.2.war}
2019-05-23 10:37:02,488 INFO [main] o.eclipse.jetty.server.AbstractConnector Started ServerConnector#167ed1cf{HTTP/1.1,[http/1.1]}{10.129.140.22:3000}
2019-05-23 10:37:02,488 INFO [main] org.eclipse.jetty.server.Server Started #26145ms
2019-05-23 10:37:02,500 INFO [main] org.apache.nifi.web.server.JettyServer Loading Flow...
2019-05-23 10:37:02,503 INFO [main] org.apache.nifi.io.socket.SocketListener Now listening for connections from nodes on port 10000
2019-05-23 10:37:02,545 INFO [main] o.apache.nifi.controller.FlowController Successfully synchronized controller with proposed flow
2019-05-23 10:37:02,587 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: 10.129.140.22:3000
2019-05-23 10:37:02,589 INFO [main] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10001; will use this address for sending heartbeat messages
2019-05-23 10:37:02,590 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket to localhost:10001 due to: java.net.ConnectException: Connection refused (Connection refused)
2019-05-23 10:37:04,001 INFO [SessionTracker] o.a.zookeeper.server.ZooKeeperServer Expiring session 0x16ae42f180d0003, timeout of 4000ms exceeded
2019-05-23 10:37:04,001 INFO [SessionTracker] o.a.zookeeper.server.ZooKeeperServer Expiring session 0x16ae42f180d0002, timeout of 4000ms exceeded
2019-05-23 10:37:05,026 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Received heartbeat from node previously disconnected due to Has Not Yet Connected to Cluster. Issuing reconnection request.
2019-05-23 10:37:05,028 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Requesting that node connect to cluster
2019-05-23 10:37:05,028 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=DISCONNECTED, Disconnect Code=Has Not Yet Connected to Cluster, Disconnect Reason=Has Not Yet Connected to Cluster, updateId=0] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=5]
2019-05-23 10:37:07,591 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
2019-05-23 10:37:07,594 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Registered new Leader Selector for role Cluster Coordinator; this node is an active participant in the election.
2019-05-23 10:37:07,612 INFO [Leader Election Notification Thread-1] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener#1d6dcdcb This node has been elected Leader for Role 'Cluster Coordinator'
2019-05-23 10:37:07,612 INFO [Leader Election Notification Thread-1] o.apache.nifi.controller.FlowController This node elected Active Cluster Coordinator
2019-05-23 10:37:07,668 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Received heartbeat from node previously disconnected due to Has Not Yet Connected to Cluster. Issuing reconnection request.
2019-05-23 10:37:07,668 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Requesting that node connect to cluster
2019-05-23 10:37:07,669 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=DISCONNECTED, Disconnect Code=Has Not Yet Connected to Cluster, Disconnect Reason=Has Not Yet Connected to Cluster, updateId=1] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=6]
2019-05-23 10:37:07,675 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Received heartbeat from node previously disconnected due to Has Not Yet Connected to Cluster. Issuing reconnection request.
2019-05-23 10:37:07,675 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Requesting that node connect to cluster
2019-05-23 10:37:07,675 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=DISCONNECTED, Disconnect Code=Has Not Yet Connected to Cluster, Disconnect Reason=Has Not Yet Connected to Cluster, updateId=2] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=7]
2019-05-23 10:37:07,694 INFO [Process Cluster Protocol Request-1] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=5] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=5]
2019-05-23 10:37:07,695 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Received heartbeat from node previously disconnected due to Has Not Yet Connected to Cluster. Issuing reconnection request.
2019-05-23 10:37:07,699 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Requesting that node connect to cluster
2019-05-23 10:37:07,700 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=DISCONNECTED, Disconnect Code=Has Not Yet Connected to Cluster, Disconnect Reason=Has Not Yet Connected to Cluster, updateId=3] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=8]
2019-05-23 10:37:07,701 INFO [Process Cluster Protocol Request-5] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=7] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=7]
2019-05-23 10:37:07,702 INFO [Process Cluster Protocol Request-1] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 19834836-9bda-41b3-8fef-4a288d90c7bf (type=NODE_STATUS_CHANGE, length=1103 bytes) from localhost.localdomain in 33 millis
2019-05-23 10:37:07,702 INFO [Process Cluster Protocol Request-5] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 85b0bb3f-c2a6-4dfd-abd6-e9df14710c4d (type=NODE_STATUS_CHANGE, length=1103 bytes) from localhost.localdomain in 10 millis
2019-05-23 10:37:07,703 INFO [Process Cluster Protocol Request-3] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=6] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=6]
2019-05-23 10:37:07,705 INFO [Process Cluster Protocol Request-3] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 80447901-4ad3-44e3-91ad-d9f075624eae (type=NODE_STATUS_CHANGE, length=1103 bytes) from localhost.localdomain in 31 millis
2019-05-23 10:37:07,706 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Processing reconnection request from cluster coordinator.
2019-05-23 10:37:07,706 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Received a Reconnection Request that contained no DataFlow. Will attempt to connect to cluster using local flow.
2019-05-23 10:37:07,707 INFO [Process Cluster Protocol Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 22cdceee-c01f-445f-a091-38812e878d10 (type=RECONNECTION_REQUEST, length=3095 bytes) from 10.129.140.22:3000 in 34 millis
2019-05-23 10:37:07,708 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Processing reconnection request from cluster coordinator.
2019-05-23 10:37:07,708 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Received a Reconnection Request that contained no DataFlow. Will attempt to connect to cluster using local flow.
2019-05-23 10:37:07,709 INFO [Process Cluster Protocol Request-4] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 8605cf39-2034-4ee2-92c4-0fbe54e97fb2 (type=RECONNECTION_REQUEST, length=3013 bytes) from 10.129.140.22:3000 in 27 millis
2019-05-23 10:37:07,712 INFO [Reconnect 10.129.140.22:3000] o.a.n.c.c.node.NodeClusterCoordinator Successfully requested that 10.129.140.22:3000 join the cluster
2019-05-23 10:37:07,712 INFO [Reconnect 10.129.140.22:3000] o.a.n.c.c.node.NodeClusterCoordinator Successfully requested that 10.129.140.22:3000 join the cluster
2019-05-23 10:37:07,725 INFO [Process Cluster Protocol Request-6] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 0ca55348-44eb-416b-91dd-3d80da4c5ebe (type=RECONNECTION_REQUEST, length=3013 bytes) from 10.129.140.22:3000 in 29 millis
2019-05-23 10:37:07,725 INFO [Reconnect 10.129.140.22:3000] o.a.n.c.c.node.NodeClusterCoordinator Successfully requested that 10.129.140.22:3000 join the cluster
2019-05-23 10:37:07,728 INFO [Process Cluster Protocol Request-7] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=8] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=8]
2019-05-23 10:37:07,728 INFO [Process Cluster Protocol Request-7] o.a.n.c.p.impl.SocketProtocolListener Finished processing request c9b647d7-67ac-4d0a-833b-8a0a8cc0ba6d (type=NODE_STATUS_CHANGE, length=1103 bytes) from localhost.localdomain in 3 millis
2019-05-23 10:37:07,728 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Connecting Node: 10.129.140.22:3000
2019-05-23 10:37:07,725 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Processing reconnection request from cluster coordinator.
2019-05-23 10:37:07,732 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.h.AbstractHeartbeatMonitor Finished processing 4 heartbeats in 2 seconds, 708 millis
2019-05-23 10:37:07,732 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Received a Reconnection Request that contained no DataFlow. Will attempt to connect to cluster using local flow.
2019-05-23 10:37:07,733 INFO [Reconnect to Cluster] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10000; will use this address for sending heartbeat messages
2019-05-23 10:37:07,734 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Connecting Node: 10.129.140.22:3000
2019-05-23 10:37:07,735 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Connecting Node: 10.129.140.22:3000
2019-05-23 10:37:07,736 INFO [Reconnect to Cluster] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10000; will use this address for sending heartbeat messages
2019-05-23 10:37:07,736 INFO [Reconnect to Cluster] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10000; will use this address for sending heartbeat messages
2019-05-23 10:37:07,748 INFO [Process Cluster Protocol Request-8] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 434daf63-1beb-4b82-9290-bb0da4e89b7f (type=RECONNECTION_REQUEST, length=2972 bytes) from 10.129.140.22:3000 in 16 millis
2019-05-23 10:37:07,749 INFO [Reconnect 10.129.140.22:3000] o.a.n.c.c.node.NodeClusterCoordinator Successfully requested that 10.129.140.22:3000 join the cluster**
Set these properties on both:
nifi.web.http.host=<host>
nifi.cluster.node.address=<host>
Beware of this value visibility in network scopes:
nifi.zookeeper.connect.string=localhost:2181
e.g 'localhost', in the other side you're using real IP addr
They share it during replication, primary/coordinator node election and flow election.
Related
Apache Nifi - refused to connect to localhost error
When I tried to connect to Nifi UI using http://localhost:8080/nifi, i am getting below error org.apache.nifi.web.server.JettyServer Failed to start web server... shutting down. java.net.BindException: Address already in use: bind at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:433) at sun.nio.ch.Net.bind(Net.java:425) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:331) at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:299) at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80) at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:235) at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) at org.eclipse.jetty.server.Server.doStart(Server.java:398) at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) at org.apache.nifi.web.server.JettyServer.start(JettyServer.java:935) at org.apache.nifi.NiFi.<init>(NiFi.java:158) at org.apache.nifi.NiFi.<init>(NiFi.java:72) at org.apache.nifi.NiFi.main(NiFi.java:297) 2020-02-27 11:51:11,834 INFO [Thread-1] org.apache.nifi.NiFi Initiating shutdown of Jetty web server... 2020-02-27 11:51:11,836 INFO [Thread-1] o.eclipse.jetty.server.AbstractConnector Stopped ServerConnector#355ee205{HTTP/1.1,[http/1.1]}{0.0.0.0:8080} 2020-02-27 11:51:11,837 INFO [Thread-1] org.eclipse.jetty.server.session node 0 Stopped scavenging Can anyone suggest what is the cause of this issue? Nifi version- 1.9.2,installed on windows machine Here is the nifi status logs, 12:33:16.886 [main] DEBUG org.apache.nifi.bootstrap.NotificationServiceManager - Found 0 service elements 12:33:16.896 [main] INFO org.apache.nifi.bootstrap.NotificationServiceManager - Successfully loaded the following 0 services: [] 12:33:16.897 [main] INFO org.apache.nifi.bootstrap.RunNiFi - Registered no Notification Services for Notification Type NIFI_STARTED 12:33:16.897 [main] INFO org.apache.nifi.bootstrap.RunNiFi - Registered no Notification Services for Notification Type NIFI_STOPPED 12:33:16.898 [main] INFO org.apache.nifi.bootstrap.RunNiFi - Registered no Notification Services for Notification Type NIFI_DIED 12:33:16.899 [main] DEBUG org.apache.nifi.bootstrap.Command - Status File: 12:33:16.900 [main] DEBUG org.apache.nifi.bootstrap.Command - Properties: {pid=9724} Failed to determine if Process 9724 is running; assuming that it is not 12:33:16.902 [main] INFO org.apache.nifi.bootstrap.Command - Apache NiFi is not running
The port use by nifi is already used by another process. you can change web server port in conf/nifi.properties
Unable to close file because the last block does not have enough number of replicas.
I got an error like this. The health test result for HDFS_CANARY_HEALTH has become bad: Canary test failed to write file in directory /tmp/.cloudera_health_monitoring_canary_files. 2019-07-14 00:02:13,312 INFO hive.metastore: Trying to connect to metastore with URI xxxx:abc 2019-07-14 00:02:13,322 INFO hive.metastore: Opened a connection to metastore, current connections: 1 2019-07-14 00:02:13,322 INFO hive.metastore: Connected to metastore. 2019-07-14 00:02:14,255 INFO hive.metastore: Closed a connection to metastore, current connections: 0 2019-07-14 00:02:21,444 INFO com.cloudera.cmon.tstore.leveldb.LDBTimeSeriesRollupManager: Finished rollup: duration=PT21.280S, numStreamsChecked=607071, numStreamsRolle dUp=199173 2019-07-14 00:03:04,172 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x2659d409 connecting to ZooKeeper ensemble=___ 2019-07-14 00:03:04,182 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=ReplicationAdmin connecting to ZooKeeper ensemble=____ 2019-07-14 00:03:04,198 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x16b95f7c324bcf3 2019-07-14 00:03:08,346 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x2fc2fce5 connecting to ZooKeeper ensemble=___ 2019-07-14 00:03:08,354 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=ReplicationAdmin connecting to ZooKeeper ensemble=___ 2019-07-14 00:03:08,365 INFO org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x26b8ea7e7c6cfde 2019-07-14 00:03:11,897 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/.cloudera_health_monitoring_canary_files/.canary_file_2019_07_14-00_03_03 retrying... 2019-07-14 00:03:18,388 INFO org.apache.hadoop.hdfs.DFSClient: Could not complete /tmp/.cloudera_health_monitoring_canary_files/.canary_file_2019_07_14-00_03_03 retrying... 2019-07-14 00:03:18,562 ERROR com.cloudera.cmon.firehose.polling.hdfs.HdfsCanary: com.cloudera.cmon.firehose.polling.hdfs.HdfsCanary#5b4e2d83 for hdfs://hdfs-cluster_name Failed to write to /tmp/.cloudera_health_monitoring_canary_files/.canary_file_2019_07_14-00_03_03. Error: java.io.IOException: Unable to close file because the last block BP-972893351-10.14.208.100-1438206436439:blk_4349006744_3275530539 does not have enough number of replicas. My question is which "file" it referring to in ERROR: Unable to close file because the last block does not have enough number of replicas.
INFO [NiFi Bootstrap Command Listener] org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for Boo tstrap requests on port 25249
I am trying to restart my apachi nifi , but it is showing below message not getting server up. Below is the log from nifi-bootstrap.log INFO [NiFi Bootstrap Command Listener] org.apache.nifi.bootstrap.RunNiFi Apache NiFi now running and listening for Boo tstrap requests on port 25249 2018-06-13 11:09:20,346 INFO [main] o.a.n.b.NotificationServiceManager Successfully loaded the following 0 services: [] 2018-06-13 11:09:20,354 INFO [main] org.apache.nifi.bootstrap.RunNiFi Registered no Notification Services for Notification Type NIFI_STARTED 2018-06-13 11:09:20,355 INFO [main] org.apache.nifi.bootstrap.RunNiFi Registered no Notification Services for Notification Type NIFI_STOPPED 2018-06-13 11:09:20,355 INFO [main] org.apache.nifi.bootstrap.RunNiFi Registered no Notification Services for Notification Type NIFI_DIED 2018-06-13 11:09:20,390 INFO [main] org.apache.nifi.bootstrap.Command Apache NiFi is currently running, listening to Bootstrap on port 25249, PID=24445 2018-06-13 11:09:53,964 ERROR [NiFi logging handler] org.apache.nifi.StdErr Failed to start web server: null 2018-06-13 11:09:53,971 ERROR [NiFi logging handler] org.apache.nifi.StdErr Shutting down... 2018-06-13 11:09:54,738 INFO [main] org.apache.nifi.bootstrap.RunNiFi NiFi never started. Will not restart NiFi Can someone suggest where we can setup bootstrap port . Regards, SP
Hadoop--Compile successfully but after submit job broke down
I write my own Scheduler in Hadoop2.6.0 inherit AbstractYarnScheduler. I compile successfully but when submit job in hadoop, RM broke down. Here is the log in Master Node 2015-07-19 13:31:59,931 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: application_1437327062733_0001 State change from NEW to NEW_SAVING **2015-07-19 13:31:59,932 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread** java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/proto/YarnServerResourceManagerServiceProtos$ApplicationStateDataProtoOrBuilder at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2013) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1978) at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:56) at org.apache.hadoop.yarn.util.Records.newRecord(Records.java:36) at org.apache.hadoop.yarn.server.resourcemanager.recovery.records.ApplicationStateData.newInstance(ApplicationStateData.java:43) at org.apache.hadoop.yarn.server.resourcemanager.recovery.records.ApplicationStateData.newInstance(ApplicationStateData.java:56) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:131) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreAppTransition.transition(RMStateStore.java:1) at org.apache.hadoop.yarn.state.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:362) at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302) at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46) at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:787) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:839) at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler.handle(RMStateStore.java:1) at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173) at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.proto.YarnServerResourceManagerServiceProtos$ApplicationStateDataProtoOrBuilder at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 20 more 2015-07-19 13:31:59,934 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye.. 2015-07-19 13:31:59,937 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted 2015-07-19 13:31:59,938 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup#0.0.0.0:8088 2015-07-19 13:31:59,938 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted 2015-07-19 13:31:59,938 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted 2015-07-19 13:31:59,944 WARN org.apache.hadoop.ipc.Server: IPC Server handler 2 on 8032, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from 129.10.58.155:50992 Call#42 Retry#0 java.lang.NoSuchMethodError: org.apache.hadoop.yarn.server.utils.BuilderUtils.newApplicationResourceUsageReport(IILorg/apache/hadoop/yarn/api/records/Resource;Lorg/apache/hadoop/yarn/api/records/Resource;Lorg/apache/hadoop/yarn/api/records/Resource;)Lorg/apache/hadoop/yarn/api/records/ApplicationResourceUsageReport; at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.<clinit>(RMServerUtils.java:237) at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.createAndGetApplicationReport(RMAppImpl.java:520) at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:296) at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:170) at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:401) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033) 2015-07-19 13:32:00,039 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032 2015-07-19 13:32:00,040 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8032 2015-07-19 13:32:00,040 INFO org.apache.hadoop.ipc.Server: Stopping server on 8033 2015-07-19 13:32:00,041 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8033 2015-07-19 13:32:00,041 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder 2015-07-19 13:32:00,041 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state 2015-07-19 13:32:00,041 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager metrics system... 2015-07-19 13:32:00,042 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder 2015-07-19 13:32:00,042 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system stopped. 2015-07-19 13:32:00,042 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system shutdown complete. 2015-07-19 13:32:00,042 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher is draining to stop, igonring any new events. 2015-07-19 13:32:01,042 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. 2015-07-19 13:32:02,043 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. 2015-07-19 13:32:03,043 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. 2015-07-19 13:32:04,043 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. 2015-07-19 13:32:05,044 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. 2015-07-19 13:32:06,044 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. 2015-07-19 13:32:07,044 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain. 2015-07-19 13:32:08,044 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: Waiting for AsyncDispatcher to drain.
According to the error information, the system can't find the class org.apache.hadoop.yarn.proto.YarnServerResourceManagerServiceProtos$ApplicationStateDataProtoOrBuilder, do you add the class to the classpath?
I think it could be some user session/permission issue, I got same error in my standalone instance (Ubuntu Desktop 16 LTE, jdk1.8.92 & Hadoop 2.7.2). It works normal again if I restart my machine & start over again. it keeps popping up same error if I just re-login, restart daemons, resubmit the job on the same terminal session. Were you able to fix this issue? Steps to reproduce on my machine are: (1)Start terminal, login (on terminal) to hadoop dedicated user hduser (a sudo user) using command: su hduser (2)start hadoop daemons using commands: start-dfs.sh & start-yarn.sh (3)I can see all processes with jps command. (4)Few MR jobs completed successfully. I submit same job after about 10-15 min. (5)hduser user session is thrown out & land in regular desktop user session.
While running a topology in storm we are getting error like this
While running a topology in storm we are getting error like this, 8983 [Thread-6] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting 9144 [main] INFO **backtype.storm.daemon.nimbus** - Shutting down master 9199 [Thread-6-EventThread] INFO backtype.storm.zookeeper - Zookeeper state upd ate: :connected:none 9241 [main] INFO backtype.storm.daemon.nimbus - Shut down master 9273 [Thread-6] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting 9306 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN org.apache.zookeeper.serv er.NIOServerCnxn - EndOfStreamException: Unable to read additional data from cli ent sessionid 0x143af55728d0003, likely client has closed socket 9354 [main] INFO backtype.storm.daemon.supervisor - Shutting down c094c3b1-a378 -4c4f-af35-9278647c217a:4beddc09-4675-4fb9-8bdc-9cf5013ce9ca 9358 [main] INFO backtype.storm.daemon.supervisor - Shut down c094c3b1-a378-4c4 f-af35-9278647c217a:4beddc09-4675-4fb9-8bdc-9cf5013ce9ca 9361 [main] INFO **backtype.storm.daemon.superviso**r - Shutting down supervisor c0 94c3b1-a378-4c4f-af35-9278647c217a 9364 [Thread-5] INFO **backtype.storm.event** - Event manager interrupted 9369 [Thread-6] INFO backtype.storm.event - Event manager interrupted 9425 [main] INFO **backtype.storm.daemon.supervisor** - Shutting down supervisor 38 6d8d71-c9b5-4b51-bd6e-f9f605034ea0 9428 [Thread-8] INFO backtype.storm.event - Event manager interrupted 9429 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN org.apache.zookeeper.serv er.NIOServerCnxn - EndOfStreamException: Unable to read additional data from cli ent sessionid 0x143af55728d0007, likely client has closed socket 9429 [Thread-9] INFO backtype.storm.event - Event manager interrupted 9473 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN org.apache.zookeeper.serv er.NIOServerCnxn - EndOfStreamException: Unable to read additional data from cli ent sessionid 0x143af55728d0009, likely client has closed socket 9476 [main] INFO backtype.storm.testing - Shutting down in process zookeeper 9503 [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2000] WARN org.apache.zookeeper.serv er.NIOServerCnxn - Ignoring exception **java.nio.channels.ClosedChannelException**: null at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.jav a:211) ~[na:1.7.0_03] at org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.j ava:242) ~[zookeeper-3.3.3.jar:3.3.3-1073969] 9510 [main] INFO **backtype.storm.testing** - Done shutting down in process zookeep er 9513 [main] INFO backtype.storm.testing - Deleting temporary path C:\Users\sowm iya\AppData\Local\Temp\c9b1bc1a-a950-4098-af77-f81a4d2b112f 9520 [main] INFO backtype.storm.testing - Deleting temporary path C:\Users\sowm iya\AppData\Local\Temp\7e75c468-18ea-4787-a4ac-496fb108db71 9527 [main] INFO backtype.storm.testing - Unable to delete file: C:\Users\sowmi ya\AppData\Local\Temp\7e75c468-18ea-4787-a4ac-496fb108db71\version-2\log.1 9529 [main] INFO backtype.storm.testing - Deleting temporary path C:\Users\sowm iya\AppData\Local\Temp\fa7b3c9b-ac93-4090-b9e2-63f10019e61f 9543 [main] INFO backtype.storm.testing - Deleting temporary path C:\Users\sowm iya\AppData\Local\Temp\55f1fd11-508e-43bb-b340-0d9b79f3af33 9579 [Thread-6-EventThread] INFO com.netflix.curator.framework.state.Connection StateManager - State change: SUSPENDED 9580 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.Connec tionStateManager - There are no ConnectionStateListeners registered. 9583 [Thread-6-EventThread] WARN backtype.storm.cluster - Received event :disco nnected::none: with disconnected Zookeeper. 11232 [Thread-6-SendThread(localhost:2000)] WARN org.apache.zookeeper.ClientCnx n - Session 0x143af55728d000b for server null, unexpected error, closing socket connection and attempting reconnect **java.net.ConnectException: Connection refused: no further information** at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.7.0_0 3] at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701 ) ~[na:1.7.0_03] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119) ~[zookeeper-3.3.3.jar:3.3.3-1073969] 13992 [Thread-6-SendThread(localhost:2000)] WARN org.apache.zookeeper.ClientCnx n - Session 0x143af55728d000b for server null, unexpected error, closing socket connection and attempting reconnect java.net.ConnectException: Connection refused: no further information at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.7.0_0 3] at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701 ) ~[na:1.7.0_03] at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119) Whwn we are trying to run the topology jar file all the operation like nimbus,zookeeper and supervisor process going to dead.please help us to know why this is happened. Please help us to rectify this error and help to proceed further. Thank you, Sowmiya Priya
This looks like a zookeeper issue. It looks like your processes are not being able to connect to zookeeper. Can't say more without more information.