NiFi Cluster setup - apache-nifi

Please help me to complete Nifi cluster setup. I can see Nifi is running on server, but GUI is not coming.
Java version:
openjdk version "1.8.0_302"
OpenJDK Runtime Environment (build 1.8.0_302-b08)
OpenJDK 64-Bit Server VM (build 25.302-b08, mixed mode)
NiFi version: nifi-1.17.0
Nifi.properties:
nifi.state.management.embedded.zookeeper.start=true
nifi.remote.input.host=Svxxx.xyz.com
nifi.remote.input.secure=true
nifi.remote.input.socket.port=10443
nifi.remote.input.http.enabled=true
nifi.web.https.host=Svxxx.xyz.com
nifi.web.https.port=9443
nifi.web.proxy.host=localhost:9443,Svxxx.xyz.com:9443
nifi.sensitive.props.key=propkeywith12chars
nifi.cluster.is.node=true
nifi.cluster.node.address=Svxxx.xyz.com
nifi.cluster.node.protocol.port=11443
nifi.cluster.load.balance.host=Svxxx.xyz.com
nifi.cluster.load.balance.port=6342
nifi.zookeeper.connect.string=Svxxx.xyz.com:2181,Svxxx.xyz.com:2181,Svxxx.xyz.com:2181
zookeeper. properties:
server.1=Svxxx.xyz.com:2888:3888;2181
server.2=Svxxx.xyz.com:2888:3888;2181
server.3=Svxxx.xyz.com:2888:3888;2181
Changes made in state-management.xml:
<cluster-provider>
<id>zk-provider</id>
<class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
<property name="Connect String">Svxxx.xyz.com:2181,Svxxx.xyz.com:2181,Svxxx.xyz.com:2181</property>
<property name="Root Node">/nifi</property>
<property name="Session Timeout">10 seconds</property>
<property name="Access Control">Open</property>
</cluster-provider>
Firewall status: disabled
Also created SSL certificate using toolkit, and put those on respective servers. Replaced truststore.jks and keystore.jks also accordingly.
nifi-app.log:
2022-10-17 22:00:05,669 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
2022-10-17 22:00:05,670 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Attempted to register Leader Election for role 'Cluster Coordinator' but this role is already registered
2022-10-17 22:00:11,403 WARN [Heartbeat Monitor Thread-1] o.a.n.c.l.e.CuratorLeaderElectionManager Unable to determine leader for role 'Cluster Coordinator'; returning null
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /nifi/leaders/Cluster Coordinator
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:2480)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:242)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:231)
at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:93)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:228)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:219)
at org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:41)
at org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:154)
at org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:134)
at org.apache.curator.framework.recipes.locks.InterProcessMutex.getParticipantNodes(InterProcessMutex.java:170)
at org.apache.curator.framework.recipes.leader.LeaderSelector.getLeader(LeaderSelector.java:337)
at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.getLeader(CuratorLeaderElectionManager.java:281)
at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener.verifyLeader(CuratorLeaderElectionManager.java:571)
at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener.isLeader(CuratorLeaderElectionManager.java:525)
at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$LeaderRole.isLeader(CuratorLeaderElectionManager.java:466)
at org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager.isLeader(CuratorLeaderElectionManager.java:262)
at org.apache.nifi.cluster.coordination.node.NodeClusterCoordinator.isActiveClusterCoordinator(NodeClusterCoordinator.java:824)
at org.apache.nifi.cluster.coordination.heartbeat.AbstractHeartbeatMonitor.monitorHeartbeats(AbstractHeartbeatMonitor.java:132)
at org.apache.nifi.cluster.coordination.heartbeat.AbstractHeartbeatMonitor$1.run(AbstractHeartbeatMonitor.java:84)
at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2022-10-17 22:00:12,371 INFO [Cleanup Archive for default] o.a.n.c.repository.FileSystemRepository Successfully deleted 0 files (0 bytes) from archive
2022-10-17 22:00:12,371 INFO [Cleanup Archive for default] o.a.n.c.repository.FileSystemRepository Archive cleanup completed for container default; will now allow writing to this container. Bytes used = 10.53 GB, bytes free = 26.46 GB, capacity = 36.99 GB
2022-10-17 22:00:14,203 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog#22e813fc checkpointed with 1 Records and 0 Swap Files in 5 milliseconds (Stop-the-world time = 2 milliseconds, Clear Edit Logs time = 2 millis), max Transaction ID 3
2022-10-17 22:00:18,823 WARN [main] o.a.n.c.l.e.CuratorLeaderElectionManager Unable to determine leader for role 'Cluster Coordinator'; returning null
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /nifi/leaders/Cluster Coordinator

Related

Using Fiware Draco 2.1.0 How to setup NiFi cluster using external zookeeper

I'm trying to set up a NiFi cluster using an external zookeeper version is 3.4.10 container is run kubectl pod.
I have changed the following things in nifi.properties
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=XXXXXXXXXXXXXXXX
nifi.cluster.node.protocol.port=8082
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=1 min
nifi.cluster.flow.election.max.candidates=
# cluster load balancing properties #
nifi.cluster.load.balance.host=
nifi.cluster.load.balance.port=6342
nifi.cluster.load.balance.connections.per.node=1
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=xxxx:2181,xxxx:2181,xxxx:2181
nifi.zookeeper.connect.timeout=10 secs
nifi.zookeeper.session.timeout=10 secs
nifi.zookeeper.root.node=/nifi
I am not able to connect zookeeper getting a connection error
2022-09-23 10:13:23,665 ERROR [main-EventThread] o.a.c.f.imps.CuratorFrameworkImpl Background operation retry gave up
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:885)
at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:677)
at org.apache.curator.framework.imps.WatcherRemovalFacade.processBackgroundOperation(WatcherRemovalFacade.java:152)
at org.apache.curator.framework.imps.GetConfigBuilderImpl$2.processResult(GetConfigBuilderImpl.java:222)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:601)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
2022-09-23 10:13:24,331 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: RECONNECTED
2022-09-23 10:13:24,331 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener#30dbc287 Connection State changed to RECONNECTED
2022-09-23 10:13:24,431 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: SUSPENDED
2022-09-23 10:13:24,431 INFO [Curator-ConnectionStateManager-0] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener#30dbc287 Connection State changed to SUSPENDED
Connection State continues changing reconnecting and suspending
How can I set up a NiFi cluster for using an external zookeeper connection?
Did any one is facing simiar issues with this combination?
Can you please assist

Kafka connector to hdfs: java.io.FileNotFoundException: File does not exist

Everything was installed via ambari, HDP. I've ingested a sample file to kafka. The topic is testjson. Data ingested from csv file in filebeat.
topics successfully ingested into kafka.
/bin/kafka-topics.sh --list --zookeeper localhost:2181
result:
test
test060920
test1
test12
testjson
From kafka i would like to ingest testjson to hdfs.
quickstart-hdfs.properties
name=hdfs-sink
connector.class=io.confluent.connect.hdfs3.Hdfs3SinkConnector
tasks.max=1
topics=testjson
hdfs.url=hdfs://x.x.x.x:8020
flush.size=3
confluent.license=
confluent.topic.bootstrap.servers=x.x.x.x:6667
connect-standalone.properties
bootstrap.servers=x.x.x.x:6667
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
plugin.path=/usr/share/java,/usr/share/confluent-hub-components
run it by
/usr/hdp/3.1.4.0-315/kafka/bin/connect-standalone.sh /etc/kafka/connect-standalone-json.properties /etc/kafka-connect-hdfs/quickstart-hdfs.properties
It returns errors as
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/root/topics/+tmp/testjson/partition=0/26c82453-4980-40de-a9d4-276aa0f3899e_tmp.avro (inode 22757) [Lease. Holder: DFSClient_NONMAPREDUCE_496631619_33, pending creates: 1]
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2815)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:591)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2694)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1511)
at org.apache.hadoop.ipc.Client.call(Client.java:1457)
at org.apache.hadoop.ipc.Client.call(Client.java:1367)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy47.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:510)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy48.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1081)
... 3 more
[2020-06-23 10:35:30,240] ERROR WorkerSinkTask{id=hdfs-sink-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:177)
org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:586)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:322)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:225)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.avro.AvroRuntimeException: already open
at org.apache.avro.file.DataFileWriter.assertNotOpen(DataFileWriter.java:85)
at org.apache.avro.file.DataFileWriter.setCodec(DataFileWriter.java:93)
at io.confluent.connect.hdfs3.avro.AvroRecordWriterProvider$1.write(AvroRecordWriterProvider.java:59)
at io.confluent.connect.hdfs3.TopicPartitionWriter.writeRecord(TopicPartitionWriter.java:675)
at io.confluent.connect.hdfs3.TopicPartitionWriter.write(TopicPartitionWriter.java:374)
at io.confluent.connect.hdfs3.DataWriter.write(DataWriter.java:359)
at io.confluent.connect.hdfs3.Hdfs3SinkTask.put(Hdfs3SinkTask.java:108)
at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:564)
... 10 more
[2020-06-23 10:35:30,243] ERROR WorkerSinkTask{id=hdfs-sink-0} Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:178)
questions:
What happened? I noticed Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /user/root/topics/+tmp/testjson/partition=0/26c82453-4980-40de-a9d4-276aa0f3899e_tmp.avro (inode 22757) . But i dont understand what it got to do? I thought it was the destination in hdfs, I Have created root user. sudo -u hdfs hadoop fs -mkdir /user/root and add the permission for root sudo -u hdfs hadoop fs -chown root /user/root
Edit
I checked out /user/root/topics/+tmp/testjson/partition=0 and 26c82453-4980-40de-a9d4-276aa0f3899e_tmp.avrowas there. I dont know what should i do. Each time i rerun this, t always show missing different file, and after i stopped it then check it, The file is generated and it was there.
Edit 2
I checked out
root#ambari:/home/hadoop# hdfs dfs -ls /user/root/topics/+tmp/testjson
Found 1 items
drwxr-xr-x - root hdfs 0 2020-06-24 06:18 /user/root/topics/+tmp/testjson/partition=0
root#ambari:/home/hadoop# hdfs dfs -ls /user/root/topics/+tmp/testjson/partition=0
Found 2 items
-rw-r--r-- 3 root hdfs 0 2020-06-24 06:18 /user/root/topics/+tmp/testjson/partition=0/05cc9305-c370-44f8-8e9d-b311fb284e26_tmp.avro
-rw-r--r-- 3 root hdfs 0 2020-06-24 03:26 /user/root/topics/+tmp/testjson/partition=0/26c82453-4980-40de-a9d4-276aa0f3899e_tmp.avro
_tmp.avro files were generated everytime i run. /usr/hdp/3.1.4.0-315/kafka/bin/connect-standalone.sh connect-standalone-json.properties quickstart-hdfs.properties. Incase of permission, i decided to change dfs.permissions.enabled into false (default is true) and rerun. Still no luck.
EDIT3
new error log
[2020-06-26 09:57:28,703] WARN The configuration 'config.storage.topic' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,704] WARN The configuration 'status.storage.topic' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,705] WARN The configuration 'config.storage.replication.factor' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,706] WARN The configuration 'offset.flush.interval.ms' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,706] WARN The configuration 'key.converter.schemas.enable' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,706] WARN The configuration 'offset.storage.file.filename' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,706] WARN The configuration 'status.storage.replication.factor' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,706] WARN The configuration 'value.converter.schemas.enable' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,706] WARN The configuration 'offset.storage.replication.factor' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,706] WARN The configuration 'offset.storage.topic' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,706] WARN The configuration 'value.converter' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,707] WARN The configuration 'key.converter' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2020-06-26 09:57:28,707] INFO Kafka version : 2.0.0.3.1.4.0-315 (org.apache.kafka.common.utils.AppInfoParser:109)
[2020-06-26 09:57:28,707] INFO Kafka commitId : 4243d589e2b33433 (org.apache.kafka.common.utils.AppInfoParser:110)
[2020-06-26 09:57:28,712] INFO Cluster ID: CQkRgktxRZmGv_C-Q87ViQ (org.apache.kafka.clients.Metadata:273)
[2020-06-26 09:57:28,733] INFO [Consumer clientId=consumer-3, groupId=connect-cluster] Discovered group coordinator 10.64.2.236:6667 (id: 2147482646 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:677)
[2020-06-26 09:57:28,742] INFO [Consumer clientId=consumer-3, groupId=connect-cluster] Resetting offset for partition connect-configs-0 to offset 0. (org.apache.kafka.clients.consumer.internals.Fetcher:583)
[2020-06-26 09:57:28,745] INFO Finished reading KafkaBasedLog for topic connect-configs (org.apache.kafka.connect.util.KafkaBasedLog:153)
[2020-06-26 09:57:28,745] INFO Started KafkaBasedLog for topic connect-configs (org.apache.kafka.connect.util.KafkaBasedLog:155)
[2020-06-26 09:57:28,745] INFO Started KafkaConfigBackingStore (org.apache.kafka.connect.storage.KafkaConfigBackingStore:253)
[2020-06-26 09:57:28,745] INFO Herder started (org.apache.kafka.connect.runtime.distributed.DistributedHerder:217)
[2020-06-26 09:57:28,753] INFO Cluster ID: CQkRgktxRZmGv_C-Q87ViQ (org.apache.kafka.clients.Metadata:273)
[2020-06-26 09:57:28,753] INFO [Worker clientId=connect-1, groupId=connect-cluster] Discovered group coordinator 10.64.2.236:6667 (id: 2147482646 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:677)
[2020-06-26 09:57:28,762] INFO [Worker clientId=connect-1, groupId=connect-cluster] (Re-)joining group (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:509)
Jun 26, 2020 9:57:29 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.RootResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.RootResource will be ignored.
Jun 26, 2020 9:57:29 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource will be ignored.
Jun 26, 2020 9:57:29 AM org.glassfish.jersey.internal.inject.Providers checkProviderRuntime
WARNING: A provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource registered in SERVER runtime does not implement any provider interfaces applicable in the SERVER runtime. Due to constraint configuration problems the provider org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource will be ignored.
Jun 26, 2020 9:57:29 AM org.glassfish.jersey.internal.Errors logErrors
WARNING: The following warnings have been detected: WARNING: The (sub)resource method createConnector in org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource contains empty path annotation.
WARNING: The (sub)resource method listConnectors in org.apache.kafka.connect.runtime.rest.resources.ConnectorsResource contains empty path annotation.
WARNING: The (sub)resource method listConnectorPlugins in org.apache.kafka.connect.runtime.rest.resources.ConnectorPluginsResource contains empty path annotation.
WARNING: The (sub)resource method serverInfo in org.apache.kafka.connect.runtime.rest.resources.RootResource contains empty path annotation.
[2020-06-26 09:57:29,315] INFO Started o.e.j.s.ServletContextHandler#3d6a6bee{/,null,AVAILABLE} (org.eclipse.jetty.server.handler.ContextHandler:855)
[2020-06-26 09:57:29,328] INFO Started http_8083#376a312c{HTTP/1.1,[http/1.1]}{0.0.0.0:8083} (org.eclipse.jetty.server.AbstractConnector:292)
[2020-06-26 09:57:29,329] INFO Started #5042ms (org.eclipse.jetty.server.Server:410)
[2020-06-26 09:57:29,329] INFO Advertised URI: http://10.64.2.236:8083/ (org.apache.kafka.connect.runtime.rest.RestServer:267)
[2020-06-26 09:57:29,329] INFO REST server listening at http://10.64.2.236:8083/, advertising URL http://10.64.2.236:8083/ (org.apache.kafka.connect.runtime.rest.RestServer:217)
[2020-06-26 09:57:29,329] INFO Kafka Connect started (org.apache.kafka.connect.runtime.Connect:55)
[2020-06-26 09:57:31,782] INFO [Worker clientId=connect-1, groupId=connect-cluster] Successfully joined group with generation 1 (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:473)
[2020-06-26 09:57:31,783] INFO Joined group and got assignment: Assignment{error=0, leader='connect-1-c3e95584-0853-44d1-b3a9-4c97593f4f2d', leaderUrl='http://10.64.2.236:8083/', offset=-1, connectorIds=[], taskIds=[]} (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1216)
[2020-06-26 09:57:31,784] INFO Starting connectors and tasks using config offset -1 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:858)
[2020-06-26 09:57:31,784] INFO Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:868)
as suggested i run it by /usr/hdp/3.1.4.0-315/kafka/bin/connect-distributed.sh connect-distributed.properties quickstart-hdfs.properties
cconnect-distributed.properties
bootstrap.servers=10.64.2.236:6667
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.topic=connect-offsets
offset.storage.replication.factor=1
offset.storage.file.filename=/tmp/connect.offsets
config.storage.topic=connect-configs
config.storage.replication.factor=1
status.storage.topic=connect-status
status.storage.replication.factor=1
offset.flush.interval.ms=10000
and i didnt generate anything in the tmp file root#ambari:/test_stream# hdfs dfs -ls /user/root/topics/+tmp
Any suggestion and respond will be appreciated so much. Thank you.
There is currently an open issue on this problem - https://github.com/confluentinc/kafka-connect-hdfs/issues/298. As a workaround you can change flush.size to 1. Or switch to another output file format because the problem stems rather from Avro library (org.apache.avro.AvroRuntimeException: already open).

Connection Refused when creating a two nodes cluster in Apache NiFi

I'm facing a problem regarding a refused connection on the cluster node protocol port.
I'm using the following configs to create the two nodes cluster:
For the First node manager :
####################
# State Management #
####################
nifi.state.management.configuration.file=./conf/state-management.xml
# The ID of the local state provider
nifi.state.management.provider.local=local-provider
# The ID of the cluster-wide state provider. This will be ignored if NiFi is not clustered but must be populated if running in a cluster.
nifi.state.management.provider.cluster=zk-provider
# Specifies whether or not this instance of NiFi should run an embedded ZooKeeper server
nifi.state.management.embedded.zookeeper.start=true
# Properties file that provides the ZooKeeper properties to use if <nifi.state.management.embedded.zookeeper.start> is set to true
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties
# web properties #
nifi.web.war.directory=./lib
nifi.web.http.host=10.129.140.22
nifi.web.http.port=3000
nifi.web.http.network.interface.default=
nifi.web.https.host=
nifi.web.https.port=
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
nifi.web.max.header.size=16 KB
nifi.web.proxy.context.path=
nifi.web.proxy.host=
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=
nifi.cluster.node.protocol.port=10000
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=
# cluster load balancing properties #
nifi.cluster.load.balance.host=
nifi.cluster.load.balance.port=6342
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=localhost:2181
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi
For the second node slave:
####################
# State Management #
####################
nifi.state.management.configuration.file=./conf/state-management.xml
# The ID of the local state provider
nifi.state.management.provider.local=local-provider
# The ID of the cluster-wide state provider. This will be ignored if NiFi is not clustered but must be populated if running in a cluster.
nifi.state.management.provider.cluster=zk-provider
# Specifies whether or not this instance of NiFi should run an embedded ZooKeeper server
nifi.state.management.embedded.zookeeper.start=false
# Properties file that provides the ZooKeeper properties to use if <nifi.state.management.embedded.zookeeper.start> is set to true
nifi.state.management.embedded.zookeeper.properties=./conf/zookeeper.properties
# web properties #
nifi.web.war.directory=./lib
nifi.web.http.host=
nifi.web.http.port=9021
nifi.web.http.network.interface.default=
nifi.web.https.host=
nifi.web.https.port=
nifi.web.https.network.interface.default=
nifi.web.jetty.working.directory=./work/jetty
nifi.web.jetty.threads=200
nifi.web.max.header.size=16 KB
nifi.web.proxy.context.path=
nifi.web.proxy.host=
# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=
nifi.cluster.node.protocol.port=10001
nifi.cluster.node.protocol.threads=10
nifi.cluster.node.protocol.max.threads=50
nifi.cluster.node.event.history.size=25
nifi.cluster.node.connection.timeout=5 sec
nifi.cluster.node.read.timeout=5 sec
nifi.cluster.node.max.concurrent.requests=100
nifi.cluster.firewall.file=
nifi.cluster.flow.election.max.wait.time=5 mins
nifi.cluster.flow.election.max.candidates=
# cluster load balancing properties #
nifi.cluster.load.balance.host=10.129.140.22
nifi.cluster.load.balance.port=6343
nifi.cluster.load.balance.connections.per.node=4
nifi.cluster.load.balance.max.thread.count=8
nifi.cluster.load.balance.comms.timeout=30 sec
# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=10.129.140.22:2181
nifi.zookeeper.connect.timeout=3 secs
nifi.zookeeper.session.timeout=3 secs
nifi.zookeeper.root.node=/nifi
The logs fils shows the following :
For the slave
2019-05-23 10:37:07,384 INFO [main] o.a.n.c.repository.FileSystemRepository Initializing FileSystemRepository with 'Always Sync' set to false
2019-05-23 10:37:07,541 INFO [main] o.apache.nifi.controller.FlowController Not enabling RAW Socket Site-to-Site functionality because nifi.remote.input.socket.port is not set
2019-05-23 10:37:07,546 INFO [main] o.apache.nifi.controller.FlowController Checking if there is already a Cluster Coordinator Elected...
2019-05-23 10:37:07,591 INFO [main] o.a.c.f.imps.CuratorFrameworkImpl Starting
2019-05-23 10:37:07,658 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2019-05-23 10:37:07,693 INFO [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl backgroundOperationsLoop exiting
2019-05-23 10:37:07,697 INFO [main] o.apache.nifi.controller.FlowController The Election for Cluster Coordinator has already begun (Leader is localhost:10000). Will not register to be elected for this role until after connecting to the cluster and inheriting the cluster's flow.
2019-05-23 10:37:07,699 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=true] Registered new Leader Selector for role Cluster Coordinator; this node is a silent observer in the election.
2019-05-23 10:37:07,699 INFO [main] o.a.c.f.imps.CuratorFrameworkImpl Starting
2019-05-23 10:37:07,703 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Registered new Leader Selector for role Cluster Coordinator; this node is a silent observer in the election.
2019-05-23 10:37:07,703 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] started
2019-05-23 10:37:07,703 INFO [main] o.a.n.c.c.h.AbstractHeartbeatMonitor Heartbeat Monitor started
2019-05-23 10:37:07,706 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2019-05-23 10:37:09,587 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#1a6a4595{nifi-api,/nifi-api,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-api-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-api-1.9.2.war}
2019-05-23 10:37:09,850 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=77ms
2019-05-23 10:37:09,852 INFO [main] o.e.j.s.h.C._nifi_content_viewer No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:09,873 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#4b1b2255{nifi-content-viewer,/nifi-content-viewer,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-content-viewer-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-content-viewer-1.9.2.war}
2019-05-23 10:37:09,895 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=6ms
2019-05-23 10:37:09,896 WARN [main] o.e.j.webapp.StandardDescriptorProcessor Duplicate mapping from / to default
2019-05-23 10:37:09,915 INFO [main] o.e.j.s.h.ContextHandler._nifi_docs No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:09,917 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#4965454c{nifi-docs,/nifi-docs,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-docs-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-docs-1.9.2.war}
2019-05-23 10:37:09,936 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=8ms
2019-05-23 10:37:09,955 INFO [main] o.e.j.server.handler.ContextHandler._ No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:09,957 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#1e4a4ed5{nifi-error,/,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-error-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-error-1.9.2.war}
2019-05-23 10:37:09,967 INFO [main] o.eclipse.jetty.server.AbstractConnector Started ServerConnector#4518bffd{HTTP/1.1,[http/1.1]}{0.0.0.0:9021}
2019-05-23 10:37:09,967 INFO [main] org.eclipse.jetty.server.Server Started #28769ms
2019-05-23 10:37:09,978 INFO [main] org.apache.nifi.web.server.JettyServer Loading Flow...
2019-05-23 10:37:09,982 INFO [main] org.apache.nifi.io.socket.SocketListener Now listening for connections from nodes on port 10001
2019-05-23 10:37:10,026 INFO [main] o.apache.nifi.controller.FlowController Successfully synchronized controller with proposed flow
2019-05-23 10:37:10,071 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: localhost:9021
2019-05-23 10:37:10,073 INFO [main] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10000; will use this address for sending heartbeat messages
2019-05-23 10:37:10,074 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket to localhost:10000 due to: java.net.ConnectException: Connection refused (Connection refused)
2019-05-23 10:37:12,715 WARN [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Failed to determine which node is elected active Cluster Coordinator: ZooKeeper reports the address as localhost:10000, but there is no node with this address. Attempted to determine the node's information but failed to retrieve its information due to org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket due to: java.net.ConnectException: Connection refused (Connection refused)
2019-05-23 10:37:12,720 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for localhost:9021 -- Received heartbeat from node previously disconnected due to Has Not Yet Connected to Cluster. Issuing reconnection request.
2019-05-23 10:37:12,721 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for localhost:9021 -- Requesting that node connect to cluster
2019-05-23 10:37:12,721 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Status of localhost:9021 changed from NodeConnectionStatus[nodeId=localhost:9021, state=DISCONNECTED, Disconnect Code=Has Not Yet Connected to Cluster, Disconnect Reason=Has Not Yet Connected to Cluster, updateId=1] to NodeConnectionStatus[nodeId=localhost:9021, state=CONNECTING, updateId=3]
2019-05-23 10:37:15,075 INFO [main] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10000; will use this address for sending heartbeat messages
2019-05-23 10:37:15,076 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket to localhost:10000 due to: java.net.ConnectException: Connection refused (Connection refused)
For the manager
2019-05-23 10:36:59,752 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2019-05-23 10:36:59,752 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:java.io.tmpdir=/tmp
2019-05-23 10:36:59,752 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:java.compiler=<NA>
2019-05-23 10:36:59,752 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:os.name=Linux
2019-05-23 10:36:59,752 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:os.arch=amd64
2019-05-23 10:36:59,753 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:os.version=4.15.0-20-generic
2019-05-23 10:36:59,753 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:user.name=root
2019-05-23 10:36:59,753 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:user.home=/root
2019-05-23 10:36:59,753 INFO [main] o.a.zookeeper.server.ZooKeeperServer Server environment:user.dir=/home/superman/nifi-1.9.2
2019-05-23 10:36:59,753 INFO [main] o.a.zookeeper.server.ZooKeeperServer tickTime set to 2000
2019-05-23 10:36:59,754 INFO [main] o.a.zookeeper.server.ZooKeeperServer minSessionTimeout set to -1
2019-05-23 10:36:59,754 INFO [main] o.a.zookeeper.server.ZooKeeperServer maxSessionTimeout set to -1
2019-05-23 10:36:59,855 INFO [main] o.apache.nifi.controller.FlowController Checking if there is already a Cluster Coordinator Elected...
2019-05-23 10:36:59,903 INFO [main] o.a.c.f.imps.CuratorFrameworkImpl Starting
2019-05-23 10:36:59,950 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Client attempting to establish new session at /127.0.0.1:40388
2019-05-23 10:36:59,950 INFO [SyncThread:0] o.a.z.server.persistence.FileTxnLog Creating new log file: log.3c
2019-05-23 10:36:59,963 INFO [SyncThread:0] o.a.zookeeper.server.ZooKeeperServer Established session 0x16ae443f4130000 with negotiated timeout 4000 for client /127.0.0.1:40388
2019-05-23 10:36:59,975 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2019-05-23 10:36:59,998 INFO [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl backgroundOperationsLoop exiting
2019-05-23 10:37:00,003 INFO [main] o.apache.nifi.controller.FlowController The Election for Cluster Coordinator has already begun (Leader is localhost:10001). Will not register to be elected for this role until after connecting to the cluster and inheriting the cluster's flow.
2019-05-23 10:37:00,005 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=true] Registered new Leader Selector for role Cluster Coordinator; this node is a silent observer in the election.
2019-05-23 10:37:00,005 INFO [main] o.a.c.f.imps.CuratorFrameworkImpl Starting
2019-05-23 10:37:00,017 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Registered new Leader Selector for role Cluster Coordinator; this node is a silent observer in the election.
2019-05-23 10:37:00,017 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] started
2019-05-23 10:37:00,017 INFO [main] o.a.n.c.c.h.AbstractHeartbeatMonitor Heartbeat Monitor started
2019-05-23 10:37:00,019 INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181] o.a.zookeeper.server.ZooKeeperServer Client attempting to establish new session at /127.0.0.1:40390
2019-05-23 10:37:00,020 INFO [SyncThread:0] o.a.zookeeper.server.ZooKeeperServer Established session 0x16ae443f4130001 with negotiated timeout 4000 for client /127.0.0.1:40390
2019-05-23 10:37:00,020 INFO [main-EventThread] o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2019-05-23 10:37:02,022 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#1a05ff8e{nifi-api,/nifi-api,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-api-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-api-1.9.2.war}
2019-05-23 10:37:02,373 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=165ms
2019-05-23 10:37:02,375 INFO [main] o.e.j.s.h.C._nifi_content_viewer No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:02,401 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#251e2f4a{nifi-content-viewer,/nifi-content-viewer,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-content-viewer-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-content-viewer-1.9.2.war}
2019-05-23 10:37:02,419 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=6ms
2019-05-23 10:37:02,420 WARN [main] o.e.j.webapp.StandardDescriptorProcessor Duplicate mapping from / to default
2019-05-23 10:37:02,421 INFO [main] o.e.j.s.h.ContextHandler._nifi_docs No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:02,441 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#1abea1ed{nifi-docs,/nifi-docs,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-docs-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-docs-1.9.2.war}
2019-05-23 10:37:02,457 INFO [main] o.e.j.a.AnnotationConfiguration Scanning elapsed time=6ms
2019-05-23 10:37:02,475 INFO [main] o.e.j.server.handler.ContextHandler._ No Spring WebApplicationInitializer types detected on classpath
2019-05-23 10:37:02,478 INFO [main] o.e.jetty.server.handler.ContextHandler Started o.e.j.w.WebAppContext#6f5288c5{nifi-error,/,file:///home/superman/nifi-1.9.2/work/jetty/nifi-web-error-1.9.2.war/webapp/,AVAILABLE}{./work/nar/framework/nifi-framework-nar-1.9.2.nar-unpacked/NAR-INF/bundled-dependencies/nifi-web-error-1.9.2.war}
2019-05-23 10:37:02,488 INFO [main] o.eclipse.jetty.server.AbstractConnector Started ServerConnector#167ed1cf{HTTP/1.1,[http/1.1]}{10.129.140.22:3000}
2019-05-23 10:37:02,488 INFO [main] org.eclipse.jetty.server.Server Started #26145ms
2019-05-23 10:37:02,500 INFO [main] org.apache.nifi.web.server.JettyServer Loading Flow...
2019-05-23 10:37:02,503 INFO [main] org.apache.nifi.io.socket.SocketListener Now listening for connections from nodes on port 10000
2019-05-23 10:37:02,545 INFO [main] o.apache.nifi.controller.FlowController Successfully synchronized controller with proposed flow
2019-05-23 10:37:02,587 INFO [main] o.a.nifi.controller.StandardFlowService Connecting Node: 10.129.140.22:3000
2019-05-23 10:37:02,589 INFO [main] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10001; will use this address for sending heartbeat messages
2019-05-23 10:37:02,590 WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket to localhost:10001 due to: java.net.ConnectException: Connection refused (Connection refused)
2019-05-23 10:37:04,001 INFO [SessionTracker] o.a.zookeeper.server.ZooKeeperServer Expiring session 0x16ae42f180d0003, timeout of 4000ms exceeded
2019-05-23 10:37:04,001 INFO [SessionTracker] o.a.zookeeper.server.ZooKeeperServer Expiring session 0x16ae42f180d0002, timeout of 4000ms exceeded
2019-05-23 10:37:05,026 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Received heartbeat from node previously disconnected due to Has Not Yet Connected to Cluster. Issuing reconnection request.
2019-05-23 10:37:05,028 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Requesting that node connect to cluster
2019-05-23 10:37:05,028 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=DISCONNECTED, Disconnect Code=Has Not Yet Connected to Cluster, Disconnect Reason=Has Not Yet Connected to Cluster, updateId=0] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=5]
2019-05-23 10:37:07,591 WARN [main] o.a.nifi.controller.StandardFlowService There is currently no Cluster Coordinator. This often happens upon restart of NiFi when running an embedded ZooKeeper. Will register this node to become the active Cluster Coordinator and will attempt to connect to cluster again
2019-05-23 10:37:07,594 INFO [main] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Registered new Leader Selector for role Cluster Coordinator; this node is an active participant in the election.
2019-05-23 10:37:07,612 INFO [Leader Election Notification Thread-1] o.a.n.c.l.e.CuratorLeaderElectionManager org.apache.nifi.controller.leader.election.CuratorLeaderElectionManager$ElectionListener#1d6dcdcb This node has been elected Leader for Role 'Cluster Coordinator'
2019-05-23 10:37:07,612 INFO [Leader Election Notification Thread-1] o.apache.nifi.controller.FlowController This node elected Active Cluster Coordinator
2019-05-23 10:37:07,668 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Received heartbeat from node previously disconnected due to Has Not Yet Connected to Cluster. Issuing reconnection request.
2019-05-23 10:37:07,668 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Requesting that node connect to cluster
2019-05-23 10:37:07,669 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=DISCONNECTED, Disconnect Code=Has Not Yet Connected to Cluster, Disconnect Reason=Has Not Yet Connected to Cluster, updateId=1] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=6]
2019-05-23 10:37:07,675 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Received heartbeat from node previously disconnected due to Has Not Yet Connected to Cluster. Issuing reconnection request.
2019-05-23 10:37:07,675 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Requesting that node connect to cluster
2019-05-23 10:37:07,675 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=DISCONNECTED, Disconnect Code=Has Not Yet Connected to Cluster, Disconnect Reason=Has Not Yet Connected to Cluster, updateId=2] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=7]
2019-05-23 10:37:07,694 INFO [Process Cluster Protocol Request-1] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=5] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=5]
2019-05-23 10:37:07,695 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Received heartbeat from node previously disconnected due to Has Not Yet Connected to Cluster. Issuing reconnection request.
2019-05-23 10:37:07,699 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Event Reported for 10.129.140.22:3000 -- Requesting that node connect to cluster
2019-05-23 10:37:07,700 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=DISCONNECTED, Disconnect Code=Has Not Yet Connected to Cluster, Disconnect Reason=Has Not Yet Connected to Cluster, updateId=3] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=8]
2019-05-23 10:37:07,701 INFO [Process Cluster Protocol Request-5] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=7] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=7]
2019-05-23 10:37:07,702 INFO [Process Cluster Protocol Request-1] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 19834836-9bda-41b3-8fef-4a288d90c7bf (type=NODE_STATUS_CHANGE, length=1103 bytes) from localhost.localdomain in 33 millis
2019-05-23 10:37:07,702 INFO [Process Cluster Protocol Request-5] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 85b0bb3f-c2a6-4dfd-abd6-e9df14710c4d (type=NODE_STATUS_CHANGE, length=1103 bytes) from localhost.localdomain in 10 millis
2019-05-23 10:37:07,703 INFO [Process Cluster Protocol Request-3] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=6] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=6]
2019-05-23 10:37:07,705 INFO [Process Cluster Protocol Request-3] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 80447901-4ad3-44e3-91ad-d9f075624eae (type=NODE_STATUS_CHANGE, length=1103 bytes) from localhost.localdomain in 31 millis
2019-05-23 10:37:07,706 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Processing reconnection request from cluster coordinator.
2019-05-23 10:37:07,706 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Received a Reconnection Request that contained no DataFlow. Will attempt to connect to cluster using local flow.
2019-05-23 10:37:07,707 INFO [Process Cluster Protocol Request-2] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 22cdceee-c01f-445f-a091-38812e878d10 (type=RECONNECTION_REQUEST, length=3095 bytes) from 10.129.140.22:3000 in 34 millis
2019-05-23 10:37:07,708 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Processing reconnection request from cluster coordinator.
2019-05-23 10:37:07,708 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Received a Reconnection Request that contained no DataFlow. Will attempt to connect to cluster using local flow.
2019-05-23 10:37:07,709 INFO [Process Cluster Protocol Request-4] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 8605cf39-2034-4ee2-92c4-0fbe54e97fb2 (type=RECONNECTION_REQUEST, length=3013 bytes) from 10.129.140.22:3000 in 27 millis
2019-05-23 10:37:07,712 INFO [Reconnect 10.129.140.22:3000] o.a.n.c.c.node.NodeClusterCoordinator Successfully requested that 10.129.140.22:3000 join the cluster
2019-05-23 10:37:07,712 INFO [Reconnect 10.129.140.22:3000] o.a.n.c.c.node.NodeClusterCoordinator Successfully requested that 10.129.140.22:3000 join the cluster
2019-05-23 10:37:07,725 INFO [Process Cluster Protocol Request-6] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 0ca55348-44eb-416b-91dd-3d80da4c5ebe (type=RECONNECTION_REQUEST, length=3013 bytes) from 10.129.140.22:3000 in 29 millis
2019-05-23 10:37:07,725 INFO [Reconnect 10.129.140.22:3000] o.a.n.c.c.node.NodeClusterCoordinator Successfully requested that 10.129.140.22:3000 join the cluster
2019-05-23 10:37:07,728 INFO [Process Cluster Protocol Request-7] o.a.n.c.c.node.NodeClusterCoordinator Status of 10.129.140.22:3000 changed from NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=8] to NodeConnectionStatus[nodeId=10.129.140.22:3000, state=CONNECTING, updateId=8]
2019-05-23 10:37:07,728 INFO [Process Cluster Protocol Request-7] o.a.n.c.p.impl.SocketProtocolListener Finished processing request c9b647d7-67ac-4d0a-833b-8a0a8cc0ba6d (type=NODE_STATUS_CHANGE, length=1103 bytes) from localhost.localdomain in 3 millis
2019-05-23 10:37:07,728 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Connecting Node: 10.129.140.22:3000
2019-05-23 10:37:07,725 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Processing reconnection request from cluster coordinator.
2019-05-23 10:37:07,732 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.h.AbstractHeartbeatMonitor Finished processing 4 heartbeats in 2 seconds, 708 millis
2019-05-23 10:37:07,732 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Received a Reconnection Request that contained no DataFlow. Will attempt to connect to cluster using local flow.
2019-05-23 10:37:07,733 INFO [Reconnect to Cluster] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10000; will use this address for sending heartbeat messages
2019-05-23 10:37:07,734 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Connecting Node: 10.129.140.22:3000
2019-05-23 10:37:07,735 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Connecting Node: 10.129.140.22:3000
2019-05-23 10:37:07,736 INFO [Reconnect to Cluster] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10000; will use this address for sending heartbeat messages
2019-05-23 10:37:07,736 INFO [Reconnect to Cluster] o.a.n.c.c.n.LeaderElectionNodeProtocolSender Determined that Cluster Coordinator is located at localhost:10000; will use this address for sending heartbeat messages
2019-05-23 10:37:07,748 INFO [Process Cluster Protocol Request-8] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 434daf63-1beb-4b82-9290-bb0da4e89b7f (type=RECONNECTION_REQUEST, length=2972 bytes) from 10.129.140.22:3000 in 16 millis
2019-05-23 10:37:07,749 INFO [Reconnect 10.129.140.22:3000] o.a.n.c.c.node.NodeClusterCoordinator Successfully requested that 10.129.140.22:3000 join the cluster**
Set these properties on both:
nifi.web.http.host=<host>
nifi.cluster.node.address=<host>
Beware of this value visibility in network scopes:
nifi.zookeeper.connect.string=localhost:2181
e.g 'localhost', in the other side you're using real IP addr
They share it during replication, primary/coordinator node election and flow election.

Unable to get NiFi to work in cluster

Based on all the examples I've read I thought setting up NiFi as a cluster would be easy. Apparently I can't get it to work. I'm using NiFi 1.5. I only have 2 host and pretended that there was a third, but NiFi is not starting as a cluster. These are the changes I've made to the config files.
state-management.xml file
<cluster-provider>
<id>zk-provider</id>
<class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
<property name="Connect String">etl-1:2181,etl-2:2181,etl-3:2181</property>
<property name="Root Node">/ssd/nifi-1.5.0</property>
<property name="Session Timeout">10 seconds</property>
<property name="Access Control">Open</property>
</cluster-provider>
zookeeper.properities file
dataDir=./state/zookeeper
server.1=etl-1:2888:3888
server.2=etl-2:2888:3888
server.3=etl-3:2888:3888
nifi.properties file
nifi.zookeeper.connect.string=etl-1:2181,etl-2:2181,etl-3:2181
nifi.zookeeper.root.node=/ssd/nifi-1.5.0
nifi.cluster.is.node=yes
nifi.cluster.node.address=etl-2 (this is set to etl-1 on the other node)
nifi.state.management.embedded.zookeeper.start=true
[nifi#etl-2 zookeeper]$ cat /ssd/nifi-1.5.0/state/zookeeper/myid
2
[nifi#etl-1 logs]$ cat /ssd/nifi-1.5.0/state/zookeeper/myid
1
I've updated logback.xml to DEBUG, but there are so many message I can't seem to find out's what wrong. My best guess is that zookeeper in starting in local instead of cluster.
ls -l /ssd/nifi-1.5.0/state/
total 4
drwxrwxr-x. 18 nifi nifi 4096 Feb 23 18:49 local
drwxrwxr-x. 2 nifi nifi 18 Feb 23 15:26 zookeeper
I found the problem. I had nifi.cluster.is.node=yes instead of nifi.cluster.is.node=true.
Am trying to run Nifi as a 3 node cluster on Windows 10, with same properties set in all 3 conf files as mentioned above, but still its failing to start nifi as a 3 node cluster.
Below are the settings done on my files
state-management.xml file
<cluster-provider>
<id>zk-provider</id>
<class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
<property name="Connect String">node1:2181,node2:2181,node3:2181</property>
<property name="Root Node">/nifi</property>
<property name="Session Timeout">10 seconds</property>
<property name="Access Control">Open</property>
</cluster-provider>
zookeper properties
dataDir=./state/zookeeper
autopurge.snapRetainCount=30
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
nifi properties:
nifi.web.http.host=node1
nifi.web.http.port=8080
nifi.cluster.is.node=true
nifi.cluster.node.address=node1
nifi.cluster.node.protocol.port=11122
nifi.zookeeper.connect.string=node1:2181,node2:2181,node3:2181
nifi.zookeeper.root.node=/nifi
In other 2 noes, the host name are placed respectively
But when i start the nodes, it displays the below error
C:\nifi-1.11.0-bin\Node1\bin>run-nifi.bat
2020-02-11 20:29:20,237 INFO [main] org.apache.nifi.bootstrap.Command Starting Apache NiFi...
2020-02-11 20:29:20,238 INFO [main] org.apache.nifi.bootstrap.Command Working Directory: C:\NIFI-1~1.0-B\Node1
2020-02-11 20:29:20,239 INFO [main] org.apache.nifi.bootstrap.Command Command: C:\Program Files\Java\jdk1.8.0_241\bin\java.exe -classpath C:\NIFI-1~1.0-B\Node1.\conf;C:\NIFI-1~1.0-B\Node1.\lib\javax.servlet-api-3.1.0.jar;C:\NIFI-1~1.0-B\Node1.\lib\jcl-over-slf4j-1.7.26.jar;C:\NIFI-1~1.0-B\Node1.\lib\jetty-schemas-3.1.jar;C:\NIFI-1~1.0-B\Node1.\lib\jul-to-slf4j-1.7.26.jar;C:\NIFI-1~1.0-B\Node1.\lib\log4j-over-slf4j-1.7.26.jar;C:\NIFI-1~1.0-B\Node1.\lib\logback-classic-1.2.3.jar;C:\NIFI-1~1.0-B\Node1.\lib\logback-core-1.2.3.jar;C:\NIFI-1~1.0-B\Node1.\lib\nifi-api-1.11.0.jar;C:\NIFI-1~1.0-B\Node1.\lib\nifi-framework-api-1.11.0.jar;C:\NIFI-1~1.0-B\Node1.\lib\nifi-nar-utils-1.11.0.jar;C:\NIFI-1~1.0-B\Node1.\lib\nifi-properties-1.11.0.jar;C:\NIFI-1~1.0-B\Node1.\lib\nifi-runtime-1.11.0.jar;C:\NIFI-1~1.0-B\Node1.\lib\slf4j-api-1.7.26.jar -Dorg.apache.jasper.compiler.disablejsr199=true -Xmx512m -Xms512m -Djavax.security.auth.useSubjectCredsOnly=true -Djava.security.egd=file:/dev/urandom -Dsun.net.http.allowRestrictedHeaders=true -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -Djava.protocol.handler.pkgs=sun.net.www.protocol -Dzookeeper.admin.enableServer=false -Dnifi.properties.file.path=C:\NIFI-1~1.0-B\Node1.\conf\nifi.properties -Dnifi.bootstrap.listen.port=53535 -Dapp=NiFi -Dorg.apache.nifi.bootstrap.config.log.dir=C:\NIFI-1~1.0-B\Node1\bin..\logs org.apache.nifi.NiFi
2020-02-11 20:29:20,554 WARN [main] org.apache.nifi.bootstrap.Command Failed to set permissions so that only the owner can read pid file C:\NIFI-1~1.0-B\Node1\bin..\run\nifi.pid; this may allows others to have access to the key needed to communicate with NiFi. Permissions should be changed so that only the owner can read this file
2020-02-11 20:29:20,561 WARN [main] org.apache.nifi.bootstrap.Command Failed to set permissions so that only the owner can read status file C:\NIFI-1~1.0-B\Node1\bin..\run\nifi.status; this may allows others to have access to the key needed to communicate with NiFi. Permissions should be changed so that only the owner can read this file
2020-02-11 20:29:20,573 INFO [main] org.apache.nifi.bootstrap.Command Launched Apache NiFi with Process ID 4284
C:\nifi-1.11.0-bin\Node1\bin>status-nifi.bat
20:31:57.265 [main] DEBUG org.apache.nifi.bootstrap.NotificationServiceManager - Found 0 service elements
20:31:57.271 [main] INFO org.apache.nifi.bootstrap.NotificationServiceManager - Successfully loaded the following 0 services: []
20:31:57.272 [main] INFO org.apache.nifi.bootstrap.RunNiFi - Registered no Notification Services for Notification Type NIFI_STARTED
20:31:57.273 [main] INFO org.apache.nifi.bootstrap.RunNiFi - Registered no Notification Services for Notification Type NIFI_STOPPED
20:31:57.274 [main] INFO org.apache.nifi.bootstrap.RunNiFi - Registered no Notification Services for Notification Type NIFI_DIED
20:31:57.277 [main] DEBUG org.apache.nifi.bootstrap.Command - Status File: C:\NIFI-1~1.0-B\Node1\bin..\run\nifi.status
20:31:57.278 [main] DEBUG org.apache.nifi.bootstrap.Command - Status File: C:\NIFI-1~1.0-B\Node1\bin..\run\nifi.status
20:31:57.279 [main] DEBUG org.apache.nifi.bootstrap.Command - Properties: {pid=4284, port=53536}
20:31:57.280 [main] DEBUG org.apache.nifi.bootstrap.Command - Pinging 53536
20:31:58.343 [main] DEBUG org.apache.nifi.bootstrap.Command - Process with PID 4284 is not running
20:31:58.345 [main] INFO org.apache.nifi.bootstrap.Command - Apache NiFi is not running

Hbase managed zookeeper suddenly trying to connect to localhost instead of zookeeper quorum

I was running some tests with table mappers and reducers on large scale problems. After a certain point my reducers started failing when the job was 80% done. From what I can tell when looking at the syslogs the problem is that one of my zookeepers is attempting to connect to the localhost as opposed to the other zookeepers in the quorum
Oddly it seems to do just fine connecting to the other nodes when mapping is going on, its reducing that it has a problem with. Here are selected portions of the syslog which might be relevant to figuring out whats going on
2014-06-27 09:44:01,599 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=hdev02:5181,hdev01:5181,hdev03:5181 sessionTimeout=10000 watcher=hconnection-0x4aee260b, quorum=hdev02:5181,hdev01:5181,hdev03:5181, baseZNode=/hbase
2014-06-27 09:44:01,612 INFO [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x4aee260b connecting to ZooKeeper ensemble=hdev02:5181,hdev01:5181,hdev03:5181
2014-06-27 09:44:01,614 INFO [main-SendThread(hdev02:5181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server hdev02/172.17.43.36:5181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-06-27 09:44:01,615 INFO [main-SendThread(hdev02:5181)] org.apache.zookeeper.ClientCnxn: Socket connection established to hdev02/172.17.43.36:5181, initiating session
2014-06-27 09:44:01,617 INFO [main-SendThread(hdev02:5181)] org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect
2014-06-27 09:44:01,723 WARN [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=hdev02:5181,hdev01:5181,hdev03:5181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-06-27 09:44:01,723 INFO [main] org.apache.hadoop.hbase.util.RetryCounter: Sleeping
***
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 1 on-disk map-outputs
2014-06-27 09:55:12,012 INFO [main] org.apache.hadoop.mapred.Merger: Merging 1 sorted segments
2014-06-27 09:55:12,013 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 33206049 bytes
2014-06-27 09:55:12,208 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merged 1 segments, 33206079 bytes to disk to satisfy reduce memory limit
2014-06-27 09:55:12,209 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 2 files, 265119413 bytes from disk
2014-06-27 09:55:12,209 INFO [main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2014-06-27 09:55:12,210 INFO [main] org.apache.hadoop.mapred.Merger: Merging 2 sorted segments
2014-06-27 09:55:12,212 INFO [main] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 265119345 bytes
2014-06-27 09:55:12,279 INFO [main] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=hconnection-0x65afdbbb, quorum=localhost:2181, baseZNode=/hbase
2014-06-27 09:55:12,281 INFO [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x65afdbbb connecting to ZooKeeper ensemble=localhost:2181
2014-06-27 09:55:12,282 INFO [main-SendThread(localhost.localdomain:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-06-27 09:55:12,283 WARN [main-SendThread(localhost.localdomain:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
2014-06-27 09:55:12,384 WARN [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-06-27 09:55:12,384 INFO [main] org.apache.hadoop.hbase.util.RetryCounter: Sleeping 1000ms before retry #0...
2014-06-27 09:55:13,385 INFO [main-SendThread(localhost.localdomain:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost.localdomain/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2014-06-27 09:55:13,385 WARN [main-SendThread(localhost.localdomain:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing
***
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper, quorum=localhost:2181, exception=org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
2014-06-27 09:55:13,486 ERROR [main] org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 1 attempts
2014-06-27 09:55:13,486 WARN [main] org.apache.hadoop.hbase.zookeeper.ZKUtil: hconnection-0x65afdbbb, quorum=localhost:2181, baseZNode=/hbase Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
I'm pretty sure its configured correctly, here is the relevant portion of my hbase-site.xml.
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>5181</value>
<description>Property from ZooKeeper's config zoo.cfg.
The port at which the clients will connect.
</description>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>10000</value>
<description></description>
</property>
<property>
<name>hbase.client.retries.number</name>
<value>10</value>
<description></description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hdev01,hdev02,hdev03</value>
<description></description>
</property>
So far as I can tell hdev03 is the only server that has any problem with this. Netstating all relevant ports doesn't show me anything strange.
I've had same problem when running HBase through Spark on Yarn. Everything was fine until suddenly it started to trying to connect to localhost instead of quorum. Setting port and quorum programmatically before HBase call fixed the issue
conf.set("hbase.zookeeper.quorum","my.server")
conf.set("hbase.zookeeper.property.clientPort","5181")
I'm using MapR, and it has "unusual" (5181) zookeeper port
Hard to say what is happening with the information, given. I have found the Hadoop Stack (HBase especially) to be quite hostile to even the slightest bit of misconfiguration in DNS or the hosts file.
As the quorum in your hbase-site.xml looks good, I'd start checking with network/hostname resolution related configurations:
Has the nodename slipped into the localhost entry in /etc/hosts on hdev03?
Is there an entry for the host itself in hdev03s /etc/hosts (there should)?
Has Reverse DNS been correctly configured in case you are using DNS for name resolution instead of the hosts file?
These are just a few pointers in the direction I'd look with this kind of issue. Hope it helps!
Add '--driver-class-path ~/hbase-1.1.2/conf' into spark-submit command, so that the task can find the configured zookeeper servers instead of 127.0.0.1.

Resources