Problem with the Flink UI on a Mesos cluster with two slave nodes

I have four physical nodes, each running Docker. I configured Mesos, Flink, Zookeeper, Hadoop, and Marathon in Docker on each of them. Previously I had three nodes, one slave and two masters, ran Flink through Marathon, and its UI came up without any problems. After that, I changed the cluster to two masters and two slaves. I deployed the following JSON in Marathon; the application started, but the Flink UI did not come up on both slave nodes. The error is below.
{
  "id": "flink",
  "cmd": "/home/flink-1.7.2/bin/mesos-appmaster.sh -Djobmanager.heap.mb=1024 -Djobmanager.rpc.port=6123 -Drest.port=8081 -Dmesos.resourcemanager.tasks.mem=1024 -Dtaskmanager.heap.mb=1024 -Dtaskmanager.numberOfTaskSlots=2 -Dparallelism.default=2 -Dmesos.resourcemanager.tasks.cpus=1",
  "cpus": 1.0,
  "mem": 1024,
  "instances": 2
}
Error:
Service temporarily unavailable due to an ongoing leader election. Please refresh
I cleared the Zookeeper contents with these commands:
/home/zookeeper-3.4.14/bin/zkCleanup.sh /var/lib/zookeeper/data/ -n 10
rm -rf /var/lib/zookeeper/data/version-2
rm /var/lib/zookeeper/data/zookeeper_server.pid
Also, I ran this command and deleted the Flink contents in Zookeeper:
/home/zookeeper-3.4.14/bin/zkCli.sh
delete /flink/default/leader/....
But the Flink UI on one of the nodes still has the problem.
I have configured Flink high availability like this:
high-availability: zookeeper
high-availability.storageDir: hdfs:///flink/ha/
high-availability.zookeeper.quorum: 0.0.0.0:2181,10.32.0.3:2181,10.32.0.4:2181,10.32.0.5:2181
fs.hdfs.hadoopconf: /opt/hadoop/etc/hadoop
fs.hdfs.hdfssite: /opt/hadoop/etc/hadoop/hdfs-site.xml
recovery.zookeeper.path.mesos-workers: /mesos-workers
env.java.home: /opt/java
mesos.master: 10.32.0.2:5050,10.32.0.3:5050
Because I am using a Mesos cluster, I did not change anything else in flink-conf.yaml.
This is the part of the slave log that contains the error:
- Remote connection to [null] failed with java.net.ConnectException: Connection refused: localhost/127.0.0.1:37797
2019-07-03 07:22:42,922 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@localhost:37797] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@localhost:37797]] Caused by: [Connection refused: localhost/127.0.0.1:37797]
2019-07-03 07:22:43,003 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: localhost/127.0.0.1:37797
2019-07-03 07:22:43,004 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@localhost:37797] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@localhost:37797]] Caused by: [Connection refused: localhost/127.0.0.1:37797]
2019-07-03 07:22:43,072 WARN akka.remote.transport.netty.NettyTransport - Remote connection to [null] failed with java.net.ConnectException: Connection refused: localhost/127.0.0.1:37797
2019-07-03 07:22:43,073 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@localhost:37797] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@localhost:37797]] Caused by: [Connection refused: localhost/127.0.0.1:37797]
2019-07-03 07:23:45,891 WARN org.apache.flink.runtime.webmonitor.retriever.impl.RpcGatewayRetriever - Error while retrieving the leader gateway. Retrying to connect to akka.tcp://flink@localhost:37797/user/dispatcher.
This is the Zookeeper log for the node whose Flink UI has the problem:
2019-07-03 09:43:33,425 [myid:] - INFO [main:QuorumPeerConfig#136] - Reading configuration from: /home/zookeeper-3.4.14/bin/../conf/zoo.cfg
2019-07-03 09:43:33,434 [myid:] - INFO [main:QuorumPeer$QuorumServer#185] - Resolved hostname: 0.0.0.0 to address: /0.0.0.0
2019-07-03 09:43:33,435 [myid:] - INFO [main:QuorumPeer$QuorumServer#185] - Resolved hostname: 10.32.0.3 to address: /10.32.0.3
2019-07-03 09:43:33,435 [myid:] - INFO [main:QuorumPeer$QuorumServer#185] - Resolved hostname: 10.32.0.2 to address: /10.32.0.2
2019-07-03 09:43:33,435 [myid:] - INFO [main:QuorumPeer$QuorumServer#185] - Resolved hostname: 10.32.0.5 to address: /10.32.0.5
2019-07-03 09:43:33,435 [myid:] - WARN [main:QuorumPeerConfig#354] - Non-optimial configuration, consider an odd number of servers.
2019-07-03 09:43:33,436 [myid:] - INFO [main:QuorumPeerConfig#398] - Defaulting to majority quorums
2019-07-03 09:43:33,438 [myid:3] - INFO [main:DatadirCleanupManager#78] - autopurge.snapRetainCount set to 3
2019-07-03 09:43:33,438 [myid:3] - INFO [main:DatadirCleanupManager#79] - autopurge.purgeInterval set to 0
2019-07-03 09:43:33,438 [myid:3] - INFO [main:DatadirCleanupManager#101] - Purge task is not scheduled.
2019-07-03 09:43:33,445 [myid:3] - INFO [main:QuorumPeerMain#130] - Starting quorum peer
2019-07-03 09:43:33,450 [myid:3] - INFO [main:ServerCnxnFactory#117] - Using org.apache.zookeeper.server.NIOServerCnxnFactory as server connection factory
2019-07-03 09:43:33,452 [myid:3] - INFO [main:NIOServerCnxnFactory#89] - binding to port 0.0.0.0/0.0.0.0:2181
2019-07-03 09:43:33,458 [myid:3] - INFO [main:QuorumPeer#1159] - tickTime set to 2000
2019-07-03 09:43:33,458 [myid:3] - INFO [main:QuorumPeer#1205] - initLimit set to 10
2019-07-03 09:43:33,458 [myid:3] - INFO [main:QuorumPeer#1179] - minSessionTimeout set to -1
2019-07-03 09:43:33,459 [myid:3] - INFO [main:QuorumPeer#1190] - maxSessionTimeout set to -1
2019-07-03 09:43:33,464 [myid:3] - INFO [main:QuorumPeer#1470] - QuorumPeer communication is not secured!
2019-07-03 09:43:33,464 [myid:3] - INFO [main:QuorumPeer#1499] - quorum.cnxn.threads.size set to 20
2019-07-03 09:43:33,465 [myid:3] - INFO [main:QuorumPeer#669] - currentEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2019-07-03 09:43:33,519 [myid:3] - INFO [main:QuorumPeer#684] - acceptedEpoch not found! Creating with a reasonable default of 0. This should only happen when you are upgrading your installation
2019-07-03 09:43:33,566 [myid:3] - INFO [ListenerThread:QuorumCnxManager$Listener#736] - My election bind port: /0.0.0.0:3888
2019-07-03 09:43:33,574 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer#910] - LOOKING
2019-07-03 09:43:33,575 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection#813] - New election. My id = 3, proposed zxid=0x0
2019-07-03 09:43:33,581 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LOOKING (n.state), 1 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,581 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LEADING (n.state), 1 (n.sid), 0x3 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,581 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,582 [myid:3] - INFO [WorkerSender[myid=3]:QuorumCnxManager#347] - Have smaller server identifier, so dropping the connection: (4, 3)
2019-07-03 09:43:33,583 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LOOKING (n.state), 3 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,583 [myid:3] - INFO [WorkerSender[myid=3]:QuorumCnxManager#347] - Have smaller server identifier, so dropping the connection: (4, 3)
2019-07-03 09:43:33,583 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LEADING (n.state), 1 (n.sid), 0x3 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,584 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LOOKING (n.state), 2 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,585 [myid:3] - INFO [/0.0.0.0:3888:QuorumCnxManager$Listener#743] - Received connection request /10.32.0.5:42182
2019-07-03 09:43:33,585 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), FOLLOWING (n.state), 2 (n.sid), 0x3 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,585 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), FOLLOWING (n.state), 2 (n.sid), 0x3 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,587 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), LOOKING (n.state), 4 (n.sid), 0x2 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,587 [myid:3] - WARN [RecvWorker:4:QuorumCnxManager$RecvWorker#1025] - Connection broken for id 4, my id = 3, error =
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1010)
2019-07-03 09:43:33,589 [myid:3] - WARN [RecvWorker:4:QuorumCnxManager$RecvWorker#1028] - Interrupting SendWorker
2019-07-03 09:43:33,588 [myid:3] - INFO [/0.0.0.0:3888:QuorumCnxManager$Listener#743] - Received connection request /10.32.0.5:42184
2019-07-03 09:43:33,589 [myid:3] - WARN [SendWorker:4:QuorumCnxManager$SendWorker#941] - Interrupted while waiting for message on queue
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:418)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1094)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$700(QuorumCnxManager.java:74)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:929)
2019-07-03 09:43:33,589 [myid:3] - WARN [SendWorker:4:QuorumCnxManager$SendWorker#951] - Send worker leaving thread
2019-07-03 09:43:33,590 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), FOLLOWING (n.state), 4 (n.sid), 0x3 (n.peerEpoch) LOOKING (my state)
2019-07-03 09:43:33,590 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer#980] - FOLLOWING
2019-07-03 09:43:33,591 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection#595] - Notification: 1 (message format version), 1 (n.leader), 0x200000004 (n.zxid), 0x5 (n.round), FOLLOWING (n.state), 4 (n.sid), 0x3 (n.peerEpoch) FOLLOWING (my state)
2019-07-03 09:43:33,593 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Learner#86] - TCP NoDelay set to: true
2019-07-03 09:43:33,597 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:zookeeper.version=3.4.14-4c25d480e66aadd371de8bd2fd8da255ac140bcf, built on 03/06/2019 16:18 GMT
2019-07-03 09:43:33,597 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:host.name=629a802d822d
2019-07-03 09:43:33,597 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:java.version=1.8.0_191
2019-07-03 09:43:33,597 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:java.vendor=Oracle Corporation
2019-07-03 09:43:33,597 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:java.class.path=/home/zookeeper-3.4.14/bin/../zookeeper-server/target/classes:/home/zookeeper-3.4.14/bin/../build/classes:/home/zookeeper-3.4.14/bin/../zookeeper-server/target/lib/*.jar:/home/zookeeper-3.4.14/bin/../build/lib/*.jar:/home/zookeeper-3.4.14/bin/../lib/slf4j-log4j12-1.7.25.jar:/home/zookeeper-3.4.14/bin/../lib/slf4j-api-1.7.25.jar:/home/zookeeper-3.4.14/bin/../lib/netty-3.10.6.Final.jar:/home/zookeeper-3.4.14/bin/../lib/log4j-1.2.17.jar:/home/zookeeper-3.4.14/bin/../lib/jline-0.9.94.jar:/home/zookeeper-3.4.14/bin/../lib/audience-annotations-0.5.0.jar:/home/zookeeper-3.4.14/bin/../zookeeper-3.4.14.jar:/home/zookeeper-3.4.14/bin/../zookeeper-server/src/main/resources/lib/*.jar:/home/zookeeper-3.4.14/bin/../conf:
2019-07-03 09:43:33,598 [myid:3] - INFO
[QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:java.io.tmpdir=/tmp
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:java.compiler=<NA>
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:os.name=Linux
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:os.arch=amd64
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:os.version=4.18.0-21-generic
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:user.name=root
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:user.home=/root
2019-07-03 09:43:33,598 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Environment#100] - Server environment:user.dir=/
2019-07-03 09:43:33,599 [myid:3] - INFO
[QuorumPeer[myid=3]/0.0.0.0:2181:ZooKeeperServer#174] - Created server with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir /var/lib/zookeeper/data/version-2 snapdir /var/lib/zookeeper/data/version-2
2019-07-03 09:43:33,600 [myid:3] - INFO
[QuorumPeer[myid=3]/0.0.0.0:2181:Follower#65] - FOLLOWING - LEADER ELECTION TOOK - 25
2019-07-03 09:43:33,601 [myid:3] - INFO
[QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer$QuorumServer#185] - Resolved hostname: 10.32.0.2 to address: /10.32.0.2
2019-07-03 09:43:33,637 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:Learner#336] - Getting a snapshot from leader 0x300000000
2019-07-03 09:43:33,644 [myid:3] - INFO [QuorumPeer[myid=3]/0.0.0.0:2181:FileTxnSnapLog#301] - Snapshotting: 0x300000000 to /var/lib/zookeeper/data/version-2/snapshot.300000000
2019-07-03 09:44:24,320 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /150.20.11.157:55744
2019-07-03 09:44:24,324 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#949] - Client attempting to establish new session at /150.20.11.157:55744
2019-07-03 09:44:24,327 [myid:3] - WARN
[QuorumPeer[myid=3]/0.0.0.0:2181:Follower#119] - Got zxid 0x300000001 expected 0x1
2019-07-03 09:44:24,327 [myid:3] - INFO [SyncThread:3:FileTxnLog#216] - Creating new log file: log.300000001
2019-07-03 09:44:24,384 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300393be5860000 with negotiated timeout 10000 for client /150.20.11.157:55744
2019-07-03 09:44:24,892 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /150.20.11.157:55746
2019-07-03 09:44:24,892 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#949] - Client attempting to establish new session at /150.20.11.157:55746
2019-07-03 09:44:24,908 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300393be5860001 with negotiated timeout 10000 for client /150.20.11.157:55746
2019-07-03 09:44:26,410 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /150.20.11.157:55748
2019-07-03 09:44:26,411 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#903] - Connection request from old client /150.20.11.157:55748; will be dropped if server is in r-o mode
2019-07-03 09:44:26,411 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#949] - Client attempting to establish new session at /150.20.11.157:55748
2019-07-03 09:44:26,422 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300393be5860002 with negotiated timeout 10000 for client /150.20.11.157:55748
2019-07-03 09:45:41,553 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1056] - Closed socket connection for client /150.20.11.157:55746 which had sessionid 0x300393be5860001
2019-07-03 09:45:41,567 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1056] - Closed socket connection for client /150.20.11.157:55744 which had sessionid 0x300393be5860000
2019-07-03 09:45:41,597 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#376] - Unable to read additional data from client sessionid 0x300393be5860002, likely client has closed socket
2019-07-03 09:45:41,597 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1056] - Closed socket connection for client /150.20.11.157:55748 which had sessionid 0x300393be5860002
2019-07-03 09:46:20,896 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /10.32.0.5:45998
2019-07-03 09:46:20,901 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#949] - Client attempting to establish new session at /10.32.0.5:45998
2019-07-03 09:46:20,916 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300393be5860003 with negotiated timeout 40000 for client /10.32.0.5:45998
2019-07-03 09:46:43,827 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] - Accepted socket connection from /150.20.11.157:55864
2019-07-03 09:46:43,830 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#949] - Client attempting to establish new session at /150.20.11.157:55864
2019-07-03 09:46:43,856 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694] - Established session 0x300393be5860004 with negotiated timeout 10000 for client /150.20.11.157:55864
2019-07-03 09:46:44,336 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#222] -
Accepted socket connection from /150.20.11.157:55866
2019-07-03 09:46:44,336 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer#949] - Client attempting to establish new session at /150.20.11.157:55866
2019-07-03 09:46:44,348 [myid:3] - INFO [CommitProcessor:3:ZooKeeperServer#694]
- Established session 0x300393be5860005 with negotiated timeout 10000 for client /150.20.11.157:55866
Would you please guide me on how to use both Mesos slaves to run the Flink platform?
Any help would be really appreciated.

Related

Kafka Consumer stuck in rebalancing state

We have a Kafka consumer which, all of a sudden (without any activity), went into a rebalancing state and got stuck. This caused the CPU of the k8s pod to spike, and GC time was also nearly 70-80%. The node did not recover from that state. Upon deleting all the topics, it recovered after almost 4-5 hours.
Kafka version - 2.1.1
No. of topics - 520 (with 10 partitions each)
Consumer groups - 1
Partition assignment strategy - sticky
Attaching some of the info logs from that time.
2021-10-12 10:41:21
2021-10-12 05:11:21.160 INFO 6 CID: UID: RID: --- [cation_consumer] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-10, groupId=staging_notification_consumer] Member consumer-staging_notification_consumer-10-8a7eb399-c4e0-4443-b330-bfac42ca89ae sending LeaveGroup request to coordinator kafka5:9092 (id: 2147483642 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2021-10-12 05:10:27.340 INFO 6 CID:4967 UID: RID:17c72e2d517-4d5e --- [ntainer#9-0-C-1] s.consumer.internals.ConsumerCoordinator : [Consumer clientId=consumer-staging_notification_consumer-9, groupId=staging_notification_consumer] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
2021-10-12 10:40:27
2021-10-12 05:10:27.340 INFO 6 CID:4967 UID: RID:17c72e2d517-4d5e --- [ntainer#9-0-C-1] s.consumer.internals.ConsumerCoordinator : [Consumer clientId=consumer-staging_notification_consumer-9, groupId=staging_notification_consumer] Lost previously assigned partitions staging_notification_topic_app_low_mobi_3119-4, staging_notification_topic_app_low_mobi_2023-4, staging_notification_topic_app_low_mobi_5170-9, staging_notification_topic_app_low_mobi_3540-9, staging_notification_topic_app_low_mobi_5722-9,
2021-10-12 10:40:37
2021-10-12 05:10:37.247 INFO 6 CID: UID: RID: --- [cation_consumer] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-12, groupId=staging_notification_consumer] Attempt to heartbeat failed since group is rebalancing
2021-10-12 10:40:37
2021-10-12 05:10:37.183 INFO 6 CID: UID: RID: --- [cation_consumer] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-10, groupId=staging_notification_consumer] Attempt to heartbeat failed since group is rebalancing
2021-10-12 10:40:57
2021-10-12 05:10:57.435 INFO 6 CID: UID: RID: --- [cation_consumer] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-10, groupId=staging_notification_consumer] Attempt to heartbeat failed since group is rebalancing
2021-10-12 10:40:57
2021-10-12 05:10:57.435 INFO 6 CID: UID: RID: --- [cation_consumer] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-12, groupId=staging_notification_consumer] Attempt to heartbeat failed since group is rebalancing
2021-10-12 10:41:56
2021-10-12 05:11:56.922 INFO 6 CID:4967 UID: RID:17c72e2579c-22e6 --- [ntainer#9-3-C-1] s.consumer.internals.ConsumerCoordinator : [Consumer clientId=consumer-staging_notification_consumer-12, groupId=staging_notification_consumer] Failing OffsetCommit request since the consumer is not part of an active group
2021-10-12 10:41:56
2021-10-12 05:11:56.923 ERROR 6 CID:4967 UID: RID:17c72e2579c-22e6 --- [ntainer#9-3-C-1] essageListenerContainer$ListenerConsumer : Consumer exception
2021-10-12 10:41:56
java.lang.IllegalStateException: This error handler cannot process 'org.apache.kafka.clients.consumer.CommitFailedException's; no record information is available
2021-10-12 10:41:56
at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:151)
2021-10-12 10:41:56
at org.springframework.kafka.listener.SeekToCurrentErrorHandler.handle(SeekToCurrentErrorHandler.java:113)
2021-10-12 10:41:56
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.handleConsumerException(KafkaMessageListenerContainer.java:1427)
2021-10-12 10:41:56
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1124)
2021-10-12 10:41:56
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
2021-10-12 10:41:56
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2021-10-12 10:41:56
at java.lang.Thread.run(Thread.java:748)
2021-10-12 10:41:56
Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
2021-10-12 10:41:56
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1134)
2021-10-12 10:41:56
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:999)
2021-10-12 10:41:56
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1504)
2021-10-12 10:41:56
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doCommitSync(KafkaMessageListenerContainer.java:2396)
2021-10-12 10:41:56
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitSync(KafkaMessageListenerContainer.java:2391)
2021-10-12 10:41:56
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:2377)
2021-10-12 10:41:56
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:2191)
2021-10-12 10:41:56
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1149)
2021-10-12 10:41:56
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1075)
2021-10-12 10:41:56
... 3 common frames omitted
2021-10-12 10:41:58
2021-10-12 05:11:58.442 INFO 6 CID:4967 UID: RID:17c72e2579c-22e6 --- [ntainer#9-3-C-1] s.consumer.internals.ConsumerCoordinator : [Consumer clientId=consumer-staging_notification_consumer-12, groupId=staging_notification_consumer] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
2021-10-12 10:58:36
2021-10-12 05:28:36.039 INFO 6 CID:4412 UID: RID:17c72ebadab-64d7 --- [ntainer#3-1-C-1] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-17, groupId=staging_notification_consumer] Join group failed with org.apache.kafka.common.errors.RebalanceInProgressException: The group is rebalancing, so a rejoin is needed.
2021-10-12 10:58:36
2021-10-12 05:28:36.044 INFO 6 CID:4412 UID: RID:17c72ebadab-64d7 --- [ntainer#3-1-C-1] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-17, groupId=staging_notification_consumer] (Re-)joining group
2021-10-12 11:05:36
2021-10-12 05:35:36.955 INFO 6 CID:3435 UID: RID:17c72f29913-886c --- [ntainer#7-3-C-1] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-5, groupId=staging_notification_consumer] Join group failed with org.apache.kafka.common.errors.DisconnectException
This was resolved when I deleted the topics.
Similar behaviour is observed when I increase the number of threads in the concurrent listener factory: the consumer fails to come up, with similar logs of constant rebalancing.
The first log indicates that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms (five minutes by default).
When this happens, the consumer client actively sends a LeaveGroup request to the coordinator to trigger a rebalance.
You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of the batches returned in poll() with max.poll.records.
Of course, a better approach is to find out why the program is processing messages slowly and optimize that.
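For illustration, here is a minimal sketch of tuning those two settings on a plain Java consumer; the bootstrap server and group id are taken from the logs above, while the two tuned values are purely hypothetical and would need to match the actual per-batch processing time (with Spring Kafka the same keys go into the consumer configuration instead).
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TunedConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka5:9092");          // broker seen in the logs
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "staging_notification_consumer"); // group id seen in the logs
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Give the poll loop more time between poll() calls (default is 300000 ms = 5 minutes).
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "600000"); // hypothetical value
        // And/or return smaller batches so each poll() iteration finishes sooner (default is 500 records).
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100");        // hypothetical value
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // subscribe and poll as usual ...
        }
    }
}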

How can I turn off netty client DNS connect retry?

I'm using the Netty HttpClient for Spring WebClient.
I connect to www.naver.com and set the hosts file like this:
127.0.0.1 www.naver.com
125.209.222.142 www.naver.com
Running httpClient.get() gives a result like this:
DEBUG r.n.r.PooledConnectionProvider - [id:83c8069f] Created a new pooled channel, now: 0 active connections, 0 inactive connections and 0 pending acquire requests.
DEBUG r.n.t.SslProvider - [id:83c8069f] SSL enabled using engine sun.security.ssl.SSLEngineImpl#5412b204 and SNI www.naver.com:443
DEBUG i.n.b.AbstractByteBuf - -Dio.netty.buffer.checkAccessible: true
DEBUG i.n.b.AbstractByteBuf - -Dio.netty.buffer.checkBounds: true
DEBUG i.n.u.ResourceLeakDetectorFactory - Loaded default ResourceLeakDetector: io.netty.util.ResourceLeakDetector#7ec1e7a5
DEBUG r.n.t.TransportConfig - [id:83c8069f] Initialized pipeline DefaultChannelPipeline{(reactor.left.sslHandler = io.netty.handler.ssl.SslHandler), (reactor.left.loggingHandler = reactor.netty.transport.logging.ReactorNettyLoggingHandler), (reactor.left.sslReader = reactor.netty.tcp.SslProvider$SslReadHandler), (reactor.left.httpCodec = io.netty.handler.codec.http.HttpClientCodec), (reactor.right.reactiveBridge = reactor.netty.channel.ChannelOperationsHandler)}
INFO reactor - [id:83c8069f] REGISTERED
DEBUG r.n.t.TransportConnector - [id:83c8069f] Connecting to [www.naver.com/127.0.0.1:443].
INFO reactor - [id:83c8069f] CONNECT: www.naver.com/127.0.0.1:443
INFO reactor - [id:83c8069f] CLOSE
DEBUG r.n.t.TransportConnector - [id:83c8069f] Connect attempt to [www.naver.com/127.0.0.1:443] failed.
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: www.naver.com/127.0.0.1:443
Caused by: java.net.ConnectException: Connection refused: no further information
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:330)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:707)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:655)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:581)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
DEBUG r.n.r.PooledConnectionProvider - [id:0e3c272c] Created a new pooled channel, now: 0 active connections, 0 inactive connections and 0 pending acquire requests.
DEBUG r.n.t.SslProvider - [id:0e3c272c] SSL enabled using engine sun.security.ssl.SSLEngineImpl#47b241b and SNI www.naver.com:443
DEBUG r.n.t.TransportConfig - [id:0e3c272c] Initialized pipeline DefaultChannelPipeline{(reactor.left.sslHandler = io.netty.handler.ssl.SslHandler), (reactor.left.loggingHandler = reactor.netty.transport.logging.ReactorNettyLoggingHandler), (reactor.left.sslReader = reactor.netty.tcp.SslProvider$SslReadHandler), (reactor.left.httpCodec = io.netty.handler.codec.http.HttpClientCodec), (reactor.right.reactiveBridge = reactor.netty.channel.ChannelOperationsHandler)}
INFO reactor - [id:0e3c272c] REGISTERED
INFO reactor - [id:83c8069f] UNREGISTERED
DEBUG r.n.t.TransportConnector - [id:0e3c272c] Connecting to [www.naver.com/125.209.222.142:443].
INFO reactor - [id:0e3c272c] CONNECT: www.naver.com/125.209.222.142:443
DEBUG r.n.r.DefaultPooledConnectionProvider - [id:0e3c272c, L:/192.168.55.77:52910 - R:www.naver.com/125.209.222.142:443] Registering pool release on close event for channel
DEBUG r.n.r.PooledConnectionProvider - [id:0e3c272c, L:/192.168.55.77:52910 - R:www.naver.com/125.209.222.142:443] Channel connected, now: 1 active connections, 0 inactive connections and 0 pending acquire requests.
DEBUG i.n.u.Recycler - -Dio.netty.recycler.maxCapacityPerThread: 4096
DEBUG i.n.u.Recycler - -Dio.netty.recycler.maxSharedCapacityFactor: 2
DEBUG i.n.u.Recycler - -Dio.netty.recycler.linkCapacity: 16
DEBUG i.n.u.Recycler - -Dio.netty.recycler.ratio: 8
DEBUG i.n.u.Recycler - -Dio.netty.recycler.delayedQueue.ratio: 8
INFO reactor - [id:0e3c272c, L:/192.168.55.77:52910 - R:www.naver.com/125.209.222.142:443] ACTIVE
As you can see, it connects twice. I don't want it to connect twice or spend extra time connecting; it should just fail if it cannot connect. Can I avoid this situation?
This is being tracked under this issue:
https://github.com/reactor/reactor-netty/issues/1822
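Not an answer to the retry behaviour itself (that is what the linked issue tracks), but for the "don't spend more time to connect" part, a minimal sketch of bounding each connect attempt with a timeout, assuming Reactor Netty 1.x; the 3-second value is only an example, and the client will still move on to the next resolved address after the first attempt fails.
import io.netty.channel.ChannelOption;
import reactor.netty.http.client.HttpClient;

public class ConnectTimeoutSketch {
    public static void main(String[] args) {
        HttpClient client = HttpClient.create()
                // Fail a single connect attempt after 3 seconds instead of waiting for the OS default.
                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 3000); // hypothetical value
        client.get()
              .uri("https://www.naver.com")
              .response()
              .block();
    }
}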

Why are so many connections used by Spring reactive with Mongo?

I got the exception 'MongoWaitQueueFullException' and I realized how many connections my application is using. I use the default configuration of Spring Boot (2.2.7.RELEASE) with reactive MongoDB (4.2.8). Transactions are used.
Even when running an integration test that basically creates a bit more than 200 elements and then groups them (200 groups), 10 connections are used. When this algorithm is executed over a real data-set, the exception is thrown because the default limit of the waiting queue (500) is reached. This does not make the application scalable.
My question is: is there a way to design a reactive application that helps to reduce the number of connections?
Below are the relevant code and the output of my test. Basically, it scans all translations of bundle files and then groups them per translation key; an element is persisted per translation key.
return Flux
    .fromIterable(bundleFile.getFiles())
    .map(ScannedBundleFileEntry::getLocale)
    .flatMap(locale ->
        handler
            .scanTranslations(bundleFileEntity.toLocation(), locale, context)
            .index()
            .map(indexedTranslation ->
                createTranslation(
                    workspaceEntity,
                    bundleFileEntity,
                    locale.getId(),
                    indexedTranslation.getT1(),           // index
                    indexedTranslation.getT2().getKey(),  // bundle key
                    indexedTranslation.getT2().getValue() // translation
                )
            )
            .flatMap(bundleKeyTemporaryRepository::save)
    )
    .thenMany(groupIntoBundleKeys(bundleFileEntity))
    .then(bundleKeyTemporaryRepository.deleteByBundleFile(bundleFileEntity.getId()))
    .then(Mono.just(bundleFileEntity));
The grouping function:
private Flux<BundleKeyEntity> groupIntoBundleKeys(BundleFileEntity bundleFile) {
    return this
        .findBundleKeys(bundleFile)
        .groupBy(BundleKeyGroupKey::new)
        .flatMap(bundleKeyGroup ->
            bundleKeyGroup
                .collectList()
                .map(bundleKeys -> {
                    final BundleKeyGroupKey key = bundleKeyGroup.key();
                    final BundleKeyEntity entity = new BundleKeyEntity(key.getWorkspace(), key.getBundleFile(), key.getKey());
                    bundleKeys.forEach(entity::mergeInto);
                    return entity;
                })
        )
        .flatMap(bundleKeyEntityRepository::save);
}
The test output:
560 [main] INFO o.s.b.t.c.SpringBootTestContextBootstrapper - Neither #ContextConfiguration nor #ContextHierarchy found for test class [be.sgerard.i18n.controller.TranslationControllerTest], using SpringBootContextLoader
569 [main] INFO o.s.t.c.s.AbstractContextLoader - Could not detect default resource locations for test class [be.sgerard.i18n.controller.TranslationControllerTest]: no resource found for suffixes {-context.xml, Context.groovy}.
870 [main] INFO o.s.b.t.c.SpringBootTestContextBootstrapper - Loaded default TestExecutionListener class names from location [META-INF/spring.factories]: [org.springframework.boot.test.mock.mockito.MockitoTestExecutionListener, org.springframework.boot.test.mock.mockito.ResetMocksTestExecutionListener, org.springframework.boot.test.autoconfigure.restdocs.RestDocsTestExecutionListener, org.springframework.boot.test.autoconfigure.web.client.MockRestServiceServerResetTestExecutionListener, org.springframework.boot.test.autoconfigure.web.servlet.MockMvcPrintOnlyOnFailureTestExecutionListener, org.springframework.boot.test.autoconfigure.web.servlet.WebDriverTestExecutionListener, org.springframework.test.context.web.ServletTestExecutionListener, org.springframework.test.context.support.DirtiesContextBeforeModesTestExecutionListener, org.springframework.test.context.support.DependencyInjectionTestExecutionListener, org.springframework.test.context.support.DirtiesContextTestExecutionListener, org.springframework.test.context.transaction.TransactionalTestExecutionListener, org.springframework.test.context.jdbc.SqlScriptsTestExecutionListener, org.springframework.test.context.event.EventPublishingTestExecutionListener, org.springframework.security.test.context.support.WithSecurityContextTestExecutionListener, org.springframework.security.test.context.support.ReactorContextTestExecutionListener]
897 [main] INFO o.s.b.t.c.SpringBootTestContextBootstrapper - Using TestExecutionListeners: [org.springframework.test.context.support.DirtiesContextBeforeModesTestExecutionListener#4372b9b6, org.springframework.boot.test.mock.mockito.MockitoTestExecutionListener#232a7d73, org.springframework.boot.test.autoconfigure.SpringBootDependencyInjectionTestExecutionListener#4b41e4dd, org.springframework.test.context.support.DirtiesContextTestExecutionListener#22ffa91a, org.springframework.test.context.transaction.TransactionalTestExecutionListener#74960bfa, org.springframework.test.context.jdbc.SqlScriptsTestExecutionListener#42721fe, org.springframework.test.context.event.EventPublishingTestExecutionListener#40844aab, org.springframework.security.test.context.support.WithSecurityContextTestExecutionListener#1f6c9cd8, org.springframework.security.test.context.support.ReactorContextTestExecutionListener#5b619d14, org.springframework.boot.test.mock.mockito.ResetMocksTestExecutionListener#66746f57, org.springframework.boot.test.autoconfigure.restdocs.RestDocsTestExecutionListener#447a020, org.springframework.boot.test.autoconfigure.web.client.MockRestServiceServerResetTestExecutionListener#7f36662c, org.springframework.boot.test.autoconfigure.web.servlet.MockMvcPrintOnlyOnFailureTestExecutionListener#28e8dde3, org.springframework.boot.test.autoconfigure.web.servlet.WebDriverTestExecutionListener#6d23017e]
1551 [background-preinit] INFO o.h.v.i.x.c.ValidationBootstrapParameters - HV000006: Using org.hibernate.validator.HibernateValidator as validation provider.
1677 [main] INFO b.s.i.c.TranslationControllerTest - Starting TranslationControllerTest on sgerard with PID 538 (started by sgerard in /home/sgerard/sandboxes/github-oauth/server)
1678 [main] INFO b.s.i.c.TranslationControllerTest - The following profiles are active: test
3250 [main] INFO o.s.d.r.c.RepositoryConfigurationDelegate - Bootstrapping Spring Data Reactive MongoDB repositories in DEFAULT mode.
3747 [main] INFO o.s.d.r.c.RepositoryConfigurationDelegate - Finished Spring Data repository scanning in 493ms. Found 9 Reactive MongoDB repository interfaces.
5143 [main] INFO o.s.c.s.PostProcessorRegistrationDelegate$BeanPostProcessorChecker - Bean 'org.springframework.security.config.annotation.method.configuration.ReactiveMethodSecurityConfiguration' of type [org.springframework.security.config.annotation.method.configuration.ReactiveMethodSecurityConfiguration] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
5719 [main] INFO org.mongodb.driver.cluster - Cluster created with settings {hosts=[localhost:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
5996 [cluster-ClusterId{value='5f42490f1c60f43aff9d7d46', description='null'}-localhost:27017] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:1, serverValue:4337}] to localhost:27017
6010 [cluster-ClusterId{value='5f42490f1c60f43aff9d7d46', description='null'}-localhost:27017] INFO org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=localhost:27017, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 2, 8]}, minWireVersion=0, maxWireVersion=8, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=12207332, setName='rs0', canonicalAddress=4802c4aff450:27017, hosts=[4802c4aff450:27017], passives=[], arbiters=[], primary='4802c4aff450:27017', tagSet=TagSet{[]}, electionId=7fffffff0000000000000013, setVersion=1, lastWriteDate=Sun Aug 23 12:46:30 CEST 2020, lastUpdateTimeNanos=384505436362981}
6019 [main] INFO org.mongodb.driver.cluster - Cluster created with settings {hosts=[localhost:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
6040 [cluster-ClusterId{value='5f42490f1c60f43aff9d7d47', description='null'}-localhost:27017] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:2, serverValue:4338}] to localhost:27017
6042 [cluster-ClusterId{value='5f42490f1c60f43aff9d7d47', description='null'}-localhost:27017] INFO org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=localhost:27017, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 2, 8]}, minWireVersion=0, maxWireVersion=8, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=1727974, setName='rs0', canonicalAddress=4802c4aff450:27017, hosts=[4802c4aff450:27017], passives=[], arbiters=[], primary='4802c4aff450:27017', tagSet=TagSet{[]}, electionId=7fffffff0000000000000013, setVersion=1, lastWriteDate=Sun Aug 23 12:46:30 CEST 2020, lastUpdateTimeNanos=384505468960066}
7102 [nioEventLoopGroup-2-2] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:3, serverValue:4339}] to localhost:27017
11078 [main] INFO o.s.b.a.e.web.EndpointLinksResolver - Exposing 1 endpoint(s) beneath base path ''
11158 [main] INFO o.h.v.i.x.c.ValidationBootstrapParameters - HV000006: Using org.hibernate.validator.HibernateValidator as validation provider.
11720 [main] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:4, serverValue:4340}] to localhost:27017
12084 [main] INFO o.s.s.c.ThreadPoolTaskScheduler - Initializing ExecutorService 'taskScheduler'
12161 [main] INFO b.s.i.c.TranslationControllerTest - Started TranslationControllerTest in 11.157 seconds (JVM running for 13.532)
20381 [nioEventLoopGroup-2-3] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:5, serverValue:4341}] to localhost:27017
20408 [nioEventLoopGroup-2-2] INFO b.s.i.s.w.WorkspaceManagerImpl - Synchronize, there is no workspace for the branch [master], let's create it.
20416 [nioEventLoopGroup-2-3] INFO b.s.i.s.w.WorkspaceManagerImpl - The workspace [master] alias [e3cea374-0d37-4c57-bdbf-8bd14d279c12] has been created.
20421 [nioEventLoopGroup-2-3] INFO b.s.i.s.w.WorkspaceManagerImpl - Initializing workspace [master] alias [e3cea374-0d37-4c57-bdbf-8bd14d279c12].
20525 [nioEventLoopGroup-2-2] INFO b.s.i.s.i18n.TranslationManagerImpl - A bundle file has been found located in [server/src/main/resources/i18n] named [exception] with 2 file(s).
20812 [nioEventLoopGroup-2-4] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:6, serverValue:4342}] to localhost:27017
21167 [nioEventLoopGroup-2-8] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:10, serverValue:4345}] to localhost:27017
21167 [nioEventLoopGroup-2-6] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:8, serverValue:4344}] to localhost:27017
21393 [nioEventLoopGroup-2-5] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:7, serverValue:4343}] to localhost:27017
21398 [nioEventLoopGroup-2-7] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:9, serverValue:4346}] to localhost:27017
21442 [nioEventLoopGroup-2-2] INFO b.s.i.s.i18n.TranslationManagerImpl - A bundle file has been found located in [server/src/main/resources/i18n] named [validation] with 2 file(s).
21503 [nioEventLoopGroup-2-2] INFO b.s.i.s.i18n.TranslationManagerImpl - A bundle file has been found located in [server/src/test/resources/be/sgerard/i18n/service/i18n/file] named [file] with 2 file(s).
21621 [nioEventLoopGroup-2-2] INFO b.s.i.s.i18n.TranslationManagerImpl - A bundle file has been found located in [front/src/main/web/src/assets/i18n] named [i18n] with 2 file(s).
22745 [SpringContextShutdownHook] INFO o.s.s.c.ThreadPoolTaskScheduler - Shutting down ExecutorService 'taskScheduler'
22763 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:4, serverValue:4340}] to localhost:27017 because the pool has been closed.
22766 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:9, serverValue:4346}] to localhost:27017 because the pool has been closed.
22767 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:6, serverValue:4342}] to localhost:27017 because the pool has been closed.
22768 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:8, serverValue:4344}] to localhost:27017 because the pool has been closed.
22768 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:5, serverValue:4341}] to localhost:27017 because the pool has been closed.
22769 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:10, serverValue:4345}] to localhost:27017 because the pool has been closed.
22770 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:7, serverValue:4343}] to localhost:27017 because the pool has been closed.
22776 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:3, serverValue:4339}] to localhost:27017 because the pool has been closed.
Process finished with exit code 0
Spring Reactive is asynchronous. Imagine you have 3 items in your dataset: it opens a connection to save the first item, but it does not wait for that save to finish before starting the second one; instead, it opens a second connection as soon as possible. Thus you end up exhausting all the available connections in the pool.
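One common way to keep this in check is to bound flatMap's concurrency explicitly. A minimal, generic sketch using Reactor's flatMap(mapper, concurrency) overload; the value 4 is arbitrary and only illustrates that at most that many inner save publishers (and hence connections) are in flight for this step.
import java.util.function.Function;
import org.reactivestreams.Publisher;
import reactor.core.publisher.Flux;

class BoundedSaveSketch {
    // 'save' stands for something like bundleKeyEntityRepository::save from the question.
    static <T> Flux<T> saveAllBounded(Flux<T> entities, Function<T, Publisher<T>> save) {
        // At most 4 inner publishers are subscribed at once, instead of one per element.
        return entities.flatMap(save, 4);
    }
}
Applied to the question's code, this would mean e.g. entities.flatMap(bundleKeyEntityRepository::save, 4) instead of the unbounded flatMap; concatMap would make the saves fully sequential, at the cost of throughput.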

Zuul Gateway not forwarding call to Eureka registered Instance

I have spent days on this simple issue; I am giving up and finally posting the problem I am facing locally. I am trying to set up a microservices flow locally, purely for learning purposes. This should be a no-brainer. I have Eureka, a Zuul gateway, and a simple microservice. When I try to reach the underlying service with the URL route, it works, but when I try to do a serviceId lookup, it does not. Please help me fix it.
GitHub link: GitHub source code link
I have also raised an issue: GitHub issue link
Eureka Screenshot
Zuul Gateway logs
2019-10-06 11:11:24.611 INFO 26980 --- [nio-2020-exec-4] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring DispatcherServlet 'dispatcherServlet'
2019-10-06 11:11:24.611 INFO 26980 --- [nio-2020-exec-4] o.s.web.servlet.DispatcherServlet : Initializing Servlet 'dispatcherServlet'
2019-10-06 11:11:24.633 INFO 26980 --- [nio-2020-exec-4] o.s.web.servlet.DispatcherServlet : Completed initialization in 22 ms
2019-10-06 11:11:25.103 INFO 26980 --- [nio-2020-exec-4] c.netflix.config.ChainedDynamicProperty : Flipping property: CHECKOUT-SERVICE.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2019-10-06 11:11:25.157 INFO 26980 --- [nio-2020-exec-4] c.n.u.concurrent.ShutdownEnabledTimer : Shutdown hook installed for: NFLoadBalancer-PingTimer-CHECKOUT-SERVICE
2019-10-06 11:11:25.157 INFO 26980 --- [nio-2020-exec-4] c.netflix.loadbalancer.BaseLoadBalancer : Client: CHECKOUT-SERVICE instantiated a LoadBalancer: DynamicServerListLoadBalancer:{NFLoadBalancer:name=CHECKOUT-SERVICE,current list of Servers=[],Load balancer stats=Zone stats: {},Server stats: []}ServerList:null
2019-10-06 11:11:25.167 INFO 26980 --- [nio-2020-exec-4] c.n.l.DynamicServerListLoadBalancer : Using serverListUpdater PollingServerListUpdater
2019-10-06 11:11:25.215 INFO 26980 --- [nio-2020-exec-4] c.netflix.config.ChainedDynamicProperty : Flipping property: CHECKOUT-SERVICE.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2019-10-06 11:11:25.218 INFO 26980 --- [nio-2020-exec-4] c.n.l.DynamicServerListLoadBalancer : DynamicServerListLoadBalancer for client CHECKOUT-SERVICE initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=CHECKOUT-SERVICE,current list of Servers=[192.168.0.6:8098],Load balancer stats=Zone stats: {defaultzone=[Zone:defaultzone; Instance count:1; Active connections count: 0; Circuit breaker tripped count: 0; Active connections per server: 0.0;]
},Server stats: [[Server:192.168.0.6:8098; Zone:defaultZone; Total Requests:0; Successive connection failure:0; Total blackout seconds:0; Last connection made:Wed Dec 31 19:00:00 EST 1969; First connection made: Wed Dec 31 19:00:00 EST 1969; Active Connections:0; total failure count in last (1000) msecs:0; average resp time:0.0; 90 percentile resp time:0.0; 95 percentile resp time:0.0; min resp time:0.0; max resp time:0.0; stddev resp time:0.0]
]}ServerList:org.springframework.cloud.netflix.ribbon.eureka.DomainExtractingServerList#6f7f7ca0
2019-10-06 11:11:26.177 INFO 26980 --- [erListUpdater-0] c.netflix.config.ChainedDynamicProperty : Flipping property: CHECKOUT-SERVICE.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
Never mind, guys, it was a mistake on my side in resolving the API path.

Errors reading data from 1G file on localCluster mode with apache storm

Hi, I'm using Storm in local cluster mode for development.
I ran a simple piece of code that contains a spout and two bolts; the example counts words from a log file.
Code example URL:
http://kaviddiss.com/2013/05/17/how-to-get-started-with-storm-framework-in-5-minutes/
The code works perfectly with small log files (7.3 MB), but when I try to run a big log file (100 MB-1000 MB) I get exceptions.
I set a long delay before the cluster goes down.
Am I missing some configuration options here?
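For reference, this is roughly what the local-cluster submission from that tutorial looks like with the long delay mentioned above. The spout/bolt class names follow the linked example, the component ids and the inputFile / max-spout-pending settings mirror the worker config visible in the log below, and the ten-minute sleep is just an illustrative value, not the exact code used.
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.utils.Utils;

public class HelloStorm {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("line-reader-spout", new LineReaderSpout());
        builder.setBolt("word-spitter", new WordSpitterBolt()).shuffleGrouping("line-reader-spout");
        builder.setBolt("word-counter", new WordCounterBolt()).shuffleGrouping("word-spitter");

        Config config = new Config();
        config.put("inputFile", "test_log.log"); // log file to read, as seen in the worker config
        config.setMaxSpoutPending(1);            // matches topology.max.spout.pending in the log

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("HelloStorm", config, builder.createTopology());
        Utils.sleep(10 * 60 * 1000);             // the "long delay" before tearing the local cluster down
        cluster.shutdown();
    }
}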
exceptions:
11326 [Thread-6] INFO backtype.storm.daemon.supervisor - Launching worker with assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id "HelloStorm-1-1403522378", :executors ([3 3] [ 4 4] [2 2] [1 1])} for this supervisor 868aff95-7b63-44d1-ad55-2dd07d9c7ba2 on port 1024 with id df052251-45ec-4bc3-a486-c2bf11a8a0fa
11336 [Thread-6] INFO backtype.storm.daemon.worker - Launching worker for HelloStorm-1-1403522378 on 868aff95-7b63-44d1-ad55-2dd07d9c7ba2:1024 with id df052251-45ec-4bc3-a486-c2bf11a8a0fa and conf {"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, "topology.builtin.metrics.bucket.size.secs" 60, "topology.fall.back.on.java.serialization" true, "topology.ma x.error.report.per.interval" 5, "zmq.linger.millis" 0, "topology.skip.missing.kryo.registrations" true, "storm.messaging.netty.client_worker_threads" 1, "ui.childopts" "-Xmx768m", "storm.zookeeper. session.timeout" 20000, "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 50, "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m", "java.library.path" "/usr/local/li b:/opt/local/lib:/usr/lib", "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/var/tmp//77d5cd63-9539-44a4-892a-9e91553987df", "storm.messaging.netty.buffer_size" 5242880, "supervisor.w orker.start.timeout.secs" 120, "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs" 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64, "topology.worker.sha red.thread.pool.size" 4, "nimbus.host" "localhost", "storm.messaging.netty.min_wait_ms" 100, "storm.zookeeper.port" 2000, "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size" 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000, "supervisor.enable" true, "storm.messaging.netty.server_worker_t hreads" 1, "storm.zookeeper.servers" ["localhost"], "transactional.zookeeper.root" "/transactional", "topology.acker.executors" nil, "topology.transfer.buffer.size" 1024, "topology.worker.childopts " nil, "drpc.queue.size" 128, "worker.childopts" "-Xmx768m", "supervisor.heartbeat.frequency.secs" 5, "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772, "supervisor.monitor. frequency.secs" 3, "drpc.childopts" "-Xmx768m", "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3, "topology.tasks" nil, "storm.messaging.netty.max_retries" 30, "topology.spout.w ait.strategy" "backtype.storm.spout.SleepSpoutWaitStrategy", "nimbus.thrift.max_buffer_size" 1048576, "topology.max.spout.pending" nil, "storm.zookeeper.retry.interval" 1000, "topology.sleep.spout. 
wait.strategy.time.ms" 1, "nimbus.topology.validator" "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports" (1024 1025 1026), "topology.debug" false, "nimbus.task.launch.secs" 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10, "topology.workers" 1, "supervisor.childopts" "-Xmx256m", "nimbus.thrift.port" 6627, "topol ogy.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer" "backtype.storm.serialization.types.ListDelegateSerializer", "topology.disruptor.wait.strategy" "com.lm ax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30, "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" "backtype.storm.serialization.DefaultKryoFactory", "drpc.invoc ations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times" 5, "storm.thrift.transport" "backtype.storm.security.auth.SimpleTransportPlugin", "topology.state.synchroniz ation.timeout.secs" 60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "storm.messaging.transport" "backtype.storm.messaging.netty.Context", "logviewer.appender.name" "A1", "storm.messaging.netty.max_wait_ms" 1000, "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port" 8080, "nimbus.childopts" "-Xmx1024m", "storm.cluster.mode" "local", "topolog y.optimize" true, "topology.max.task.parallelism" nil}
11337 [Thread-6] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
11344 [Thread-6-EventThread] INFO backtype.storm.zookeeper - Zookeeper state update: :connected:none
11358 [Thread-6] INFO com.netflix.curator.framework.imps.CuratorFrameworkImpl - Starting
11611 [Thread-6] INFO backtype.storm.daemon.executor - Loading executor line-reader-spout:[2 2]
11618 [Thread-6] INFO backtype.storm.daemon.executor - Loaded executor tasks line-reader-spout:[2 2]
11632 [Thread-16-line-reader-spout] INFO backtype.storm.daemon.executor - Opening spout line-reader-spout:(2)
Start Time: 18512885554479686
11634 [Thread-16-line-reader-spout] INFO backtype.storm.daemon.executor - Opened spout line-reader-spout:(2)
11636 [Thread-16-line-reader-spout] INFO backtype.storm.daemon.executor - Activating spout line-reader-spout:(2)
11638 [Thread-6] INFO backtype.storm.daemon.executor - Finished loading executor line-reader-spout:[2 2]
11677 [Thread-6] INFO backtype.storm.daemon.executor - Loading executor word-counter:[3 3]
11721 [Thread-6] INFO backtype.storm.daemon.executor - Loaded executor tasks word-counter:[3 3]
11725 [Thread-6] INFO backtype.storm.daemon.executor - Finished loading executor word-counter:[3 3]
11733 [Thread-6] INFO backtype.storm.daemon.executor - Loading executor word-spitter:[4 4]
11735 [Thread-6] INFO backtype.storm.daemon.executor - Loaded executor tasks word-spitter:[4 4]
11737 [Thread-6] INFO backtype.storm.daemon.executor - Finished loading executor word-spitter:[4 4]
11746 [Thread-6] INFO backtype.storm.daemon.executor - Loading executor __system:[-1 -1]
11747 [Thread-6] INFO backtype.storm.daemon.executor - Loaded executor tasks __system:[-1 -1]
11748 [Thread-6] INFO backtype.storm.daemon.executor - Finished loading executor __system:[-1 -1]
11761 [Thread-6] INFO backtype.storm.daemon.executor - Loading executor __acker:[1 1]
11765 [Thread-6] INFO backtype.storm.daemon.executor - Loaded executor tasks __acker:[1 1]
11767 [Thread-6] INFO backtype.storm.daemon.executor - Timeouts disabled for executor __acker:[1 1]
11768 [Thread-6] INFO backtype.storm.daemon.executor - Finished loading executor __acker:[1 1]
11768 [Thread-6] INFO backtype.storm.daemon.worker - Launching receive-thread for 868aff95-7b63-44d1-ad55-2dd07d9c7ba2:1024
11786 [Thread-6] INFO backtype.storm.daemon.worker - Worker has topology config {"storm.id" "HelloStorm-1-1403522378", "dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, "topology.builtin.metrics.bucket.size.secs" 60, "topology.fall.back.on.java.serialization" true, "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 0, "topology.skip.missing.kryo.registrations" true, "storm.messaging.netty.client_worker_threads" 1, "ui.childopts" "-Xmx768m", "storm.zookeeper.session.timeout" 20000, "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 50, "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m", "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/var/tmp//77d5cd63-9539-44a4-892a-9e91553987df", "storm.messaging.netty.buffer_size" 5242880, "supervisor.worker.start.timeout.secs" 120, "topology.enable.message.timeouts" true, "inputFile" "test_log.log", "nimbus.cleanup.inbox.freq.secs" 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64, "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "localhost", "storm.messaging.netty.min_wait_ms" 100, "storm.zookeeper.port" 2000, "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size" 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000, "supervisor.enable" true, "storm.messaging.netty.server_worker_threads" 1, "storm.zookeeper.servers" ["localhost"], "transactional.zookeeper.root" "/transactional", "topology.acker.executors" nil, "topology.kryo.decorators" (), "topology.name" "HelloStorm", "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil, "drpc.queue.size" 128, "worker.childopts" "-Xmx768m", "supervisor.heartbeat.frequency.secs" 5, "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772, "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m", "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3, "topology.tasks" nil, "storm.messaging.netty.max_retries" 30, "topology.spout.wait.strategy" "backtype.storm.spout.SleepSpoutWaitStrategy", "nimbus.thrift.max_buffer_size" 1048576, "topology.max.spout.pending" 1, "storm.zookeeper.retry.interval" 1000, "topology.sleep.spout.wait.strategy.time.ms" 1, "nimbus.topology.validator" "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports" (1024 1025 1026), "topology.debug" false, "nimbus.task.launch.secs" 120, "nimbus.supervisor.timeout.secs" 60, "topology.kryo.register" nil, "topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10, "topology.workers" 1, "supervisor.childopts" "-Xmx256m", "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer" "backtype.storm.serialization.types.ListDelegateSerializer", "topology.disruptor.wait.strategy" "com.lmax.disruptor.BlockingWaitStrategy", "nimbus.task.timeout.secs" 30, "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times" 5, "storm.thrift.transport" "backtype.storm.security.auth.SimpleTransportPlugin", "topology.state.synchronization.timeout.secs" 60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "storm.messaging.transport" "backtype.storm.messaging.netty.Context", "logviewer.appender.name" "A1", "storm.messaging.netty.max_wait_ms" 1000, "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port" 8080, "nimbus.childopts" "-Xmx1024m", "storm.cluster.mode" "local", "topology.optimize" true, "topology.max.task.parallelism" nil}
11786 [Thread-6] INFO backtype.storm.daemon.worker - Worker df052251-45ec-4bc3-a486-c2bf11a8a0fa for storm HelloStorm-1-1403522378 on 868aff95-7b63-44d1-ad55-2dd07d9c7ba2:1024 has finished loading
11801 [Thread-18-word-counter] INFO backtype.storm.daemon.executor - Preparing bolt word-counter:(3)
11821 [Thread-18-word-counter] INFO backtype.storm.daemon.executor - Prepared bolt word-counter:(3)
11823 [Thread-20-word-spitter] INFO backtype.storm.daemon.executor - Preparing bolt word-spitter:(4)
11825 [Thread-20-word-spitter] INFO backtype.storm.daemon.executor - Prepared bolt word-spitter:(4)
11838 [Thread-24-__acker] INFO backtype.storm.daemon.executor - Preparing bolt __acker:(1)
11840 [Thread-22-__system] INFO backtype.storm.daemon.executor - Preparing bolt __system:(-1)
11854 [Thread-24-__acker] INFO backtype.storm.daemon.executor - Prepared bolt __acker:(1)
12173 [Thread-22-__system] INFO backtype.storm.daemon.executor - Prepared bolt __system:(-1)
112055 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
112058 [main-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
112058 [Thread-6-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
112058 [Thread-6-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
121441 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
121442 [main-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
121442 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
121442 [main-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
121443 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
121443 [main-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
121443 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
121444 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
134654 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
134655 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
134655 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
134656 [main-EventThread] WARN com.netflix.curator.ConnectionState - Session expired event received
134656 [main-EventThread] WARN backtype.storm.cluster - Received event :disconnected::none: with disconnected Zookeeper.
134656 [main-EventThread] WARN com.netflix.curator.ConnectionState - Session expired event received
134657 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: LOST
134657 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
134657 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: LOST
139931 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
149745 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
149745 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
149746 [main-EventThread] WARN com.netflix.curator.ConnectionState - Session expired event received
149746 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: LOST
149747 [main-EventThread] WARN backtype.storm.cluster - Received event :expired::none: with disconnected Zookeeper.
149747 [main-EventThread] WARN com.netflix.curator.ConnectionState - Session expired event received
149747 [main-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: LOST
149747 [main-EventThread] WARN backtype.storm.cluster - Received event :expired::none: with disconnected Zookeeper.
158929 [main-EventThread] WARN backtype.storm.cluster - Received event :expired::none: with disconnected Zookeeper.
158931 [main-EventThread] WARN backtype.storm.cluster - Received event :expired::none: with disconnected Zookeeper.
158931 [Thread-6-EventThread] WARN com.netflix.curator.ConnectionState - Session expired event received
158931 [Thread-6-EventThread] INFO com.netflix.curator.framework.state.ConnectionStateManager - State change: LOST
158931 [Thread-6-EventThread] WARN backtype.storm.cluster - Received event :expired::none: with disconnected Zookeeper.
158932 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
158933 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
176934 [ConnectionStateManager-0] WARN com.netflix.curator.framework.state.ConnectionStateManager - There are no ConnectionStateListeners registered.
357333 [CuratorFramework-5] ERROR com.netflix.curator.ConnectionState - Connection timed out
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:72) ~[curator-client-1.0.1.jar:na]
at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:74) [curator-client-1.0.1.jar:na]
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:353) [curator-framework-1.0.1.jar:na]
at com.netflix.curator.framework.imps.BackgroundSyncImpl.performBackgroundOperation(BackgroundSyncImpl.java:39) [curator-framework-1.0.1.jar:na]
at com.netflix.curator.framework.imps.OperationAndData.callPerformBackgroundOperation(OperationAndData.java:40) [curator-framework-1.0.1.jar:na]
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.backgroundOperationsLoop(CuratorFrameworkImpl.java:547) [curator-framework-1.0.1.jar:na]
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.access$200(CuratorFrameworkImpl.java:50) [curator-framework-1.0.1.jar:na]
at com.netflix.curator.framework.imps.CuratorFrameworkImpl$2.call(CuratorFrameworkImpl.java:177) [curator-framework-1.0.1.jar:na]
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) [na:1.6.0_65]
at java.util.concurrent.FutureTask.run(FutureTask.java:138) [na:1.6.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [na:1.6.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [na:1.6.0_65]
at java.lang.Thread.run(Thread.java:680) [na:1.6.0_65]
Update: I got a new exception when running a 70M file:
622366 [CuratorFramework-9] ERROR com.netflix.curator.framework.imps.CuratorFrameworkImpl - Background exception was not retry-able or retry gave up
java.lang.OutOfMemoryError: GC overhead limit exceeded
The problem seems to be exactly as described: you've loaded more data into memory than your JVM can support. I assume this is happening in the spout. For very large files you'll need to break up the processing, either by splitting the file in advance or by streaming it line by line instead of loading the whole file into memory at once.
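As a rough illustration only (not your original code), the line-reader spout could keep just a BufferedReader open and emit one line per nextTuple() call, so the file never has to fit in the worker heap. The class name StreamingLineReaderSpout is hypothetical; the only details taken from the log above are the "inputFile" config key and the 0.9-era backtype.storm API that appears in the stack traces.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

// Hypothetical streaming variant of the line-reader spout: reads one line
// per nextTuple() call instead of loading the whole file up front.
public class StreamingLineReaderSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private BufferedReader reader;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        try {
            // "inputFile" matches the key visible in the topology config in the log.
            this.reader = new BufferedReader(new FileReader((String) conf.get("inputFile")));
        } catch (IOException e) {
            throw new RuntimeException("Could not open input file", e);
        }
    }

    @Override
    public void nextTuple() {
        try {
            String line = reader.readLine();
            if (line != null) {
                collector.emit(new Values(line));
            }
            // When line == null the file is exhausted; Storm keeps calling
            // nextTuple(), which then simply emits nothing.
        } catch (IOException e) {
            throw new RuntimeException("Error reading input file", e);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }
}

Splitting the 70M file into smaller chunks before submitting the topology would achieve the same effect without any code changes.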
