How does Kafka identify an idle connection?

I'm running some tests on my producer (Spring Boot), sending requests every 30 seconds for 1 hour.
I noticed that after a few minutes (~10 minutes), with TRACE logging enabled, I get the error below:
2022-07-18 19:15:25.025 DEBUG 12991 --- [streams-kafka-1] o.apache.kafka.common.network.Selector : [Producer clientId=event-streams-kafka-1] Connection with localhost/127.0.0.1 disconnected
java.io.EOFException: null
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:97) ~[kafka-clients-3.0.0.jar:na]
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:452) ~[kafka-clients-3.0.0.jar:na]
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:402) ~[kafka-clients-3.0.0.jar:na]
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:674) [kafka-clients-3.0.0.jar:na]
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:576) [kafka-clients-3.0.0.jar:na]
at org.apache.kafka.common.network.Selector.poll(Selector.java:481) [kafka-clients-3.0.0.jar:na]
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:551) [kafka-clients-3.0.0.jar:na]
at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328) [kafka-clients-3.0.0.jar:na]
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243) [kafka-clients-3.0.0.jar:na]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_311]
2022-07-18 19:15:25.027 TRACE 12991 --- [streams-kafka-1] o.apache.kafka.common.network.Selector : [Producer clientId=event-streams-kafka-1] Read from closing channel failed, ignoring exception
java.io.EOFException: null
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:97) ~[kafka-clients-3.0.0.jar:na]
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:452) ~[kafka-clients-3.0.0.jar:na]
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:402) ~[kafka-clients-3.0.0.jar:na]
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:674) [kafka-clients-3.0.0.jar:na]
at org.apache.kafka.common.network.Selector.maybeReadFromClosingChannel(Selector.java:700) [kafka-clients-3.0.0.jar:na]
at org.apache.kafka.common.network.Selector.close(Selector.java:935) [kafka-clients-3.0.0.jar:na]
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:625) [kafka-clients-3.0.0.jar:na]
at org.apache.kafka.common.network.Selector.poll(Selector.java:481) [kafka-clients-3.0.0.jar:na]
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:551) [kafka-clients-3.0.0.jar:na]
at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:328) [kafka-clients-3.0.0.jar:na]
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243) [kafka-clients-3.0.0.jar:na]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_311]
2022-07-18 19:15:25.027 DEBUG 12991 --- [streams-kafka-1] org.apache.kafka.clients.NetworkClient : [Producer clientId=event-streams-kafka-1] Node -2 disconnected.
I checked the Kafka documentation and found that the connections.max.idle.ms parameter could address my idle-disconnect problem.
I set connections.max.idle.ms=-1 on the application side and got no improvement.
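For reference, this is roughly how the property is set on a plain KafkaProducer (a minimal sketch; the bootstrap address and serializers are illustrative, and in Spring Boot the same key can be supplied via spring.kafka.producer.properties):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class IdleConfigSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // -1 disables only the client-side idle reaper; the broker still closes
        // connections it considers idle after its own connections.max.idle.ms.
        props.put(ProducerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG, "-1");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send records every 30 seconds, as in the test
        }
    }
}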
When I made the change on the broker side, the error stopped occurring.
But my big question is: what makes Kafka decide that my connection is idle? Since my application has been sending requests to Kafka every 30 seconds, is there any way to send a signal to Kafka to keep it from closing my connection?
This is a problem for me, as changing this parameter on the broker side will affect all applications that use Kafka in our production environment, and I don't feel comfortable with that.

This issue usually occurs when an older Kafka client connects to a broker that closes connections it considers idle. As you mentioned, the broker-side connections.max.idle.ms setting makes the error stop, but the community recommendation is to upgrade your Kafka client to a recent version, which handles broker-initiated idle disconnects gracefully behind the scenes.
As for what makes a connection idle: the broker tracks idleness per connection, not per client. The "Node -2 disconnected" line in your log refers to one of the producer's bootstrap connections (they get negative node ids), which are used only for metadata. Such a connection can sit untouched for the broker's connections.max.idle.ms (10 minutes by default, matching the ~10 minutes you observed) even while your records flow over a separate connection to the partition leader.
More details here: https://issues.apache.org/jira/browse/KAFKA-3205

Related

How to have a load balancer for the NiFi HandleHttpRequest processor

I have a 3-node NiFi cluster running (version 1.5).
I have built a pipeline with HandleHttpRequest, an intermediate set of processors for transforming/cleaning the incoming data, followed by HandleHttpResponse.
I have provided the HTTP endpoint to client applications.
Various client applications post data through the REST endpoint provided by NiFi.
I had an issue with HandleHttpRequest running with execution mode set to all nodes: the NiFi UI becomes unresponsive after some time, and I see the error message below in nifi-app.log as well as on the processor (HandleHttpRequest) itself.
Changing the execution mode to primary node solves the issue.
So I need help with the following:
Can HandleHttpRequest not work in a multi-node cluster environment?
If so, how can I make use of multiple nodes to handle the high throughput of incoming data posted by client applications?
How do I put a load balancer in front of the HandleHttpRequest endpoint and distribute the incoming data across multiple nodes of the cluster?
Error message at the processor level:
HandleHttpRequest[id=74ee1128-2de6-3979-a818-2b598186f7aa] HandleHttpRequest[id=74ee1128-2de6-3979-a818-2b598186f7aa] failed to process due to org.apache.nifi.processor.exception.ProcessException: Failed to initialize the server; rolling back session: Failed to initialize the server
==========================
Nifi-app.log
2018-08-03 14:45:22,551 INFO [Timer-Driven Process Thread-6] org.eclipse.jetty.server.Server jetty-9.4.3.v20170317
2018-08-03 14:45:22,551 ERROR [Timer-Driven Process Thread-3] o.a.n.p.standard.HandleHttpRequest HandleHttpRequest[id=74ee1128-2de6-3979-a818-2b598186f7aa] Failed to process session due to org.apache.nifi.processor.exception.ProcessException: Failed to initialize the server: {}
org.apache.nifi.processor.exception.ProcessException: Failed to initialize the server
at org.apache.nifi.processors.standard.HandleHttpRequest.onTrigger(HandleHttpRequest.java:488)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1122)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:147)
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:128)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.BindException: Cannot assign requested address
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
==============================================
Any help pointing me in the right direction is appreciated.
Thanks,
Vish
This error generally means there is an issue with the server binding to the host or port. Since it sounds like the port is available, it is probably an issue with the hostname.
In a cluster, you would have to leave the hostname property blank in HandleHttpRequest so it will bind to the address of each node, or you could use a dynamic expression like ${hostname}.
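If it helps to see the root cause in isolation, here is a minimal plain-Java sketch (the address is a hypothetical documentation-range IP) that reproduces the same java.net.BindException: Cannot assign requested address by binding a server socket to an address the local machine does not own, which is exactly what happens when the hostname property resolves to a different node:

import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindDemo {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket()) {
            // 203.0.113.7 is almost certainly not assigned to this machine, so
            // bind() fails with "java.net.BindException: Cannot assign requested
            // address" -- the same root cause as in the stack trace above.
            server.bind(new InetSocketAddress("203.0.113.7", 8080));
        }
    }
}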
For load balancing, it should be the same as putting a load balancer in front of any other web application. There are some articles that cover it already:
https://pierrevillard.com/2017/02/10/haproxy-load-balancing-in-front-of-apache-nifi/

javax.net.ssl.SSLException: handshake timed out in corda node

While initiating a flow from a Spring web server, passing the required values from PartyA to PartyB in Corda, I get the following exception on my initiating node PartyA. Kindly do the needful.
entered verifysend method
E 12:01:47+0530 [Thread-4 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3#6cb2b947)] core.client.createConnection - AMQ214016: Failed to create netty connection
javax.net.ssl.SSLException: handshake timed out
at io.netty.handler.ssl.SslHandler.handshake(...)(Unknown Source) ~[netty-all-4.1.9.Final.jar:4.1.9.Final]
=========collecting ended=========
Even though my flow runs successfully, with the web server returning the response "Transaction id: .... committed to ledger", the flow takes around 5 minutes to create an unconsumed state.
I think you ran into a low-memory issue, so one of your nodes crashed, which led to the handshake error.
The current minimum requirement for starting a node is 1GB of JVM heap and 2GB of host RAM.
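As a quick sanity check (a generic JVM sketch, not a Corda API), you can print the heap actually available to a process launched with the same JVM options as your node; if it reports well under the 1GB minimum, memory pressure is a plausible culprit:

public class HeapCheck {
    public static void main(String[] args) {
        // maxMemory() reflects the effective -Xmx of this JVM.
        long maxHeapMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max JVM heap: " + maxHeapMb + " MB");
    }
}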

Oracle RAC database creates load on one server and client fails to connect

Almost all client connections are directed to one server, and after some time the application fails to connect with an IOException. It seems load balancing is not working properly; at this point we are forced to restart the listener.
JDBC URL: jdbc:oracle:thin:@(DESCRIPTION=(FAILOVER=ON)(LOAD_BALANCE=yes)(ADDRESS=(PROTOCOL=TCP)(HOST=scan-ip)(PORT=1538))(CONNECT_DATA=(service_name=production)(SERVER=DEDICATED)))
Exception:
Caused by: org.apache.commons.dbcp.SQLNestedException: Cannot create PoolableConnectionFactory (Io exception: Connection reset)
at org.apache.commons.dbcp.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:1549)
at org.apache.commons.dbcp.BasicDataSource.createDataSource(BasicDataSource.java:1388)
at org.apache.commons.dbcp.BasicDataSource.getConnection(BasicDataSource.java:1044)
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
... 60 more
Caused by: java.sql.SQLException: Io exception: Connection reset
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:112)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:146)
at oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:255)
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:387)
at oracle.jdbc.driver.PhysicalConnection.<init>(PhysicalConnection.java:414)
at oracle.jdbc.driver.T4CConnection.<init>(T4CConnection.java:165)
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:35)
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:801)
at org.apache.commons.dbcp.DriverConnectionFactory.createConnection(DriverConnectionFactory.java:38)
at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582)
at org.apache.commons.dbcp.BasicDataSource.validateConnectionFactory(BasicDataSource.java:1556)
at org.apache.commons.dbcp.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:1545)
... 64 more
Version:
11.2.0.4.0 - 64bit
Just to make sure: when you say "all connections are directed to one node", do you mean that the sessions are created on one node, or that the SCAN listener taking care of the connections is only being used on one node?
Check whether your SCAN address contains 3 IP addresses (nslookup scan-ip), and make sure your SCAN listeners are distributed across the nodes (srvctl status scan and srvctl status scan_listener). You can also check that the database service you're trying to use (production) is registered on all the appropriate listeners (check the database parameter remote_listener to see where the service is being registered).
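To see where new sessions actually land, a standalone JDBC sketch (outside the DBCP pool; the credentials are placeholders, the descriptor is the one from the question, and the Oracle JDBC driver must be on the classpath) can connect repeatedly and print the instance each session was created on; with working load balancing the sessions should not all report the same instance:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RacBalanceCheck {
    // Same descriptor as the application; note the thin URL starts with "jdbc:oracle:thin:@".
    private static final String URL =
            "jdbc:oracle:thin:@(DESCRIPTION=(FAILOVER=ON)(LOAD_BALANCE=yes)"
            + "(ADDRESS=(PROTOCOL=TCP)(HOST=scan-ip)(PORT=1538))"
            + "(CONNECT_DATA=(service_name=production)(SERVER=DEDICATED)))";

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < 10; i++) {
            // Each iteration opens a fresh session and asks which instance it landed on.
            try (Connection c = DriverManager.getConnection(URL, "user", "password");
                 Statement s = c.createStatement();
                 ResultSet rs = s.executeQuery("SELECT instance_name FROM v$instance")) {
                if (rs.next()) {
                    System.out.println("session " + i + " -> " + rs.getString(1));
                }
            }
        }
    }
}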

What can cause a ZooKeeper client session to time out

I deployed a long-running Storm topology. After several hours of running, the whole topology went down. I checked the worker logs and found the entries below. As they show, the ZooKeeper client session timed out, which caused a reconnection. I suspect this is related to my broken topology. Now I'm trying to find out what can cause a client to time out.
2016-02-29T10:34:12.386+0800 o.a.s.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 23789ms for sessionid 0x252f862028c0083, closing socket connection and attempting reconnect
2016-02-29T10:34:12.986+0800 o.a.s.c.f.s.ConnectionStateManager [INFO] State change: SUSPENDED
2016-02-29T10:34:13.059+0800 b.s.cluster [WARN] Received event :disconnected::none: with disconnected Zookeeper.
2016-02-29T10:34:13.197+0800 o.a.s.z.ClientCnxn [INFO] Opening socket connection to server zk-3.cloud.mos/172.16.13.147:2181. Will not attempt to authenticate using SASL (unknown error)
2016-02-29T10:34:13.241+0800 o.a.s.z.ClientCnxn [WARN] Session 0x252f862028c0083 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[na:1.8.0_31]
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716) ~[na:1.8.0_31]
at org.apache.storm.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361) ~[storm-core-0.9.6.jar:0.9.6]
at org.apache.storm.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) ~[storm-core-0.9.6.jar:0.9.6]
Your client can no longer talk to the ZooKeeper server. The first thing that happened was that there was no answer to the heartbeats within the negotiated session timeout:
2016-02-29T10:34:12.386+0800 o.a.s.z.ClientCnxn [INFO] Client session timed out, have not heard from server in 23789ms for sessionid 0x252f862028c0083, closing socket connection and attempting reconnect
Then when it tried to reconnect, it got a connection refused:
2016-02-29T10:34:13.241+0800 o.a.s.z.ClientCnxn [WARN] Session 0x252f862028c0083 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
This means that your ZooKeeper server:
Is not reachable (network connection down)
Is dead (so nothing is listening on the socket)
Is GCing itself to death and cannot communicate (although that might have issued a connection timeout error, I'm not sure)
To tell more, you will need to check the ZooKeeper server logs on your (Hadoop?) cluster.
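To make the mechanics concrete, here is a minimal sketch using the plain ZooKeeper client API (the connect string and timeout are illustrative): the timeout passed to the constructor is only a request, the server negotiates the actual session timeout, and missed heartbeats within that window produce exactly the "Client session timed out" message above:

import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkSessionDemo {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        // Request a 30s session timeout; the server may clamp it to its
        // [minSessionTimeout, maxSessionTimeout] range (derived from tickTime).
        ZooKeeper zk = new ZooKeeper("zk-3.cloud.mos:2181", 30000, event -> {
            System.out.println("state change: " + event.getState());
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();
        // The negotiated value, not necessarily the requested one.
        System.out.println("negotiated session timeout: " + zk.getSessionTimeout() + " ms");
        zk.close();
    }
}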
It worked for me after increasing the connection timeout in server.properties:
zookeeper.connection.timeout.ms=60000
One way this can happen is if you start ZooKeeper, then interrupt it (Ctrl+C) in the terminal, and then try to start Kafka.
To use Kafka, you really should use 3 terminal windows (or 3 PuTTY sessions if you are SSHing into your instance from Windows):
First session for the ZooKeeper server.
Second session for the Kafka server.
Third session for running Kafka commands to do things like creating topics.
I started Kafka in cluster mode with 3 ZooKeeper servers and 3 Kafka servers. All ZooKeeper servers started successfully, but while starting, the Kafka servers got disconnected with "fatal error during Kafka server startup. prepare to shutdown (kafka.server.KafkaServer)". While investigating, I found that the Kafka server got disconnected every time after 18 seconds (zookeeper.connection.timeout.ms = 18000 is the default value), so I increased that value and the issue was resolved.
Always use 2181 as the port number for the ZooKeeper connection, unless you have configured ZooKeeper to use a different one.

Reasons for a region server to become dead

I have a 3-node HBase cluster running on Amazon EC2, which was working perfectly fine. Now I'm trying to insert data from EMR to EC2 using two separate insert queries. The first insert query works fine and inserts the data, but after that all of my region servers become dead. Could you please suggest general guidelines for debugging this problem, and explain what generally causes region servers to die?
Moreover, even if I explicitly restart the region servers, after some time they become dead again.
Update:
Earlier I thought it might be a problem with HBASE_HEAPSIZE, which is set to 1GB by default, but even after increasing it to 5.5GB the region servers still become dead.
Below are the logs I get on every region server after it dies.
2013-10-07 18:16:27,949 WARN org.apache.zookeeper.ClientCnxn: Session 0x141916dfbe50000 for server null, unexpected error, closing socket connection and attempting rec$
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:597)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
2013-10-07 18:16:27,990 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: localhost/10.179.42.93:50020. Already tried 1 time(s).
2013-10-07 18:16:28,049 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server master/10.179.42.93:2181
2013-10-07 18:16:28,049 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client'$
2013-10-07 18:16:28,049 WARN org.apache.zookeeper.ClientCnxn: Session 0x141916dfbe50001 for server null, unexpected error, closing socket connection and attempting rec$
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:597)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
2013-10-07 18:16:28,177 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server slave/10.178.5.52:2181
2013-10-07 18:16:28,177 INFO org.apache.zookeeper.client.ZooKeeperSaslClient: Client will not SASL-authenticate because the default JAAS configuration section 'Client'$
2013-10-07 18:16:28,178 WARN org.apache.zookeeper.ClientCnxn: Session 0x141916dfbe50001 for server null, unexpected error, closing socket connection and attempting rec$
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:597)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:286)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1035)
You can check the RegionServer logs. Here is more information about log locations:
http://hbase.apache.org/book/trouble.log.html
If you have to explicitly restart the region servers every time, that is a problematic situation.
The best way is to spin up a new EMR instance with HBase.
