Kafka Cannot Configure Topics on Application Startup, but Later Can Communicate - spring-boot

We have a spring boot application using spring-kafka (2.2.5.RELEASE) that always gets this error when starting up:
Could not configure topics
org.springframework.kafka.KafkaException: Timed out waiting to get existing
topics; nested exception is java.util.concurrent.TimeoutException
However, the application continues to start up:
org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1]
INFO o.s.k.l.KafkaMessageListenerContainer - partitions revoked: []
INFO o.s.k.l.KafkaMessageListenerContainer - partitions assigned: [my-reply-topic-1]
INFO o.s.k.l.KafkaMessageListenerContainer - partitions assigned: [my-request-topic-0]
INFO o.s.b.w.e.tomcat.TomcatWebServer -
Tomcat started on port(s): 8080 (http) with context path ''
At this point, the application interacts with Kafka as expected.
We like to keep our logs clean, so we would like to understand why this exception is thrown. It is also a bit confusing because, when we move to a different environment where networking has not been established between the application and the Kafka broker(s), we get the same error, but the application does not function. Having the same exception occur both when there is truly a problem and when it can safely be ignored is irksome when trying to troubleshoot connectivity issues.
Is there a way, on application startup, to determine whether connectivity has been established with Kafka rather than just waiting for a timeout message (which may be a red herring anyway)?

If the topic(s) exist already, remove any NewTopic beans from the application context and the KafkaAdmin won't try to connect to the broker at all.
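A minimal sketch of one way to do that, only declaring the NewTopic beans in environments where the topics still need to be created (the app.create-topics property and the topic settings are illustrative assumptions, not from the original question):
import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class TopicConfig {

    // Only declare NewTopic beans when topic creation is explicitly enabled.
    // With no NewTopic beans in the context, KafkaAdmin has nothing to create
    // and does not contact the broker during startup, so the timeout disappears.
    @Bean
    @ConditionalOnProperty(name = "app.create-topics", havingValue = "true")
    public NewTopic myRequestTopic() {
        return new NewTopic("my-request-topic", 1, (short) 1);
    }
}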

Related

Spring Boot + Kafka handle if Kafka is not available

I'm trying to push some non-critical data into Kafka, but I'd still like the application to run without Kafka. However, my Spring Boot application (with spring-kafka 2.7.2) now won't start normally and is stuck in a loop of:
[AdminClient clientId=adminclient-1] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.
Basically this blocks all http requests to my app because I guess the startup process cannot be completed until it connects to Kafka. After maybe a minute, I get the "Tomcat started xx" message, so now the application runs, I guess?
Except that now whenever anything calls a Kafka-related service, it again goes into a loop of:
org.apache.kafka.clients.NetworkClient : [Producer clientId=producer-1] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.
[ad | producer-1] org.apache.kafka.clients.NetworkClient : [Producer clientId=producer-1] Bootstrap broker localhost:9092 (id: -1 rack: null) disconnected
This again goes on for 60 seconds, then it stops, and I get an internal server error.
I'd like to simply treat Kafka as an optional service: even though on production we will have it up constantly, this might not be true for any of the testing or development servers, developer machines, etc. Basically an "if it is running, push this data to this topic; if it's not running, fine" situation.
Is there a way to configure Spring with Kafka so it does not block everything like this (ideally this should all run in a background thread, which is another matter, but it is very weird that Kafka would block the main thread of a request)?
Thanks
I have experienced the same in one of my projects, so I enabled/disabled the Kafka service class through @ConditionalOnExpression like below.
@Service
@ConditionalOnExpression(value = "'${kafka.enabled}'.equalsIgnoreCase('true')")
public class KafkaPosting {
....
}
Please note to add the property (kafka.enabled) in your application.properties file.
Moreover, if you are autowiring the Kafka bean into another class, don't forget to mark the injection as optional, as below.
@Autowired(required = false)
KafkaClient kafkaClient;
This will bypass Kafka based on the configured property.
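For illustration, a hedged sketch of how a caller might then use the optional bean (OrderService, KafkaClient and its send method are placeholders building on the snippet above, not a real spring-kafka API):
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class OrderService {

    // Present only when kafka.enabled=true, hence required = false.
    @Autowired(required = false)
    private KafkaClient kafkaClient;

    public void process(String payload) {
        // ... handle the request as usual ...

        // Publish only if the Kafka service bean was actually created.
        if (kafkaClient != null) {
            kafkaClient.send("my-topic", payload); // hypothetical method on the placeholder client
        }
    }
}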

Hikari CP (Spring Boot) Connection Recovery Problem After DB Failure

We have several microservices built on Spring Boot (2.2.4) and Hikari CP (3.4.2) with PostgreSQL.
Recently we faced a DB failure of around 30 seconds. After the connections were lost, some of the containers failed to recover their connections, while others with exactly the same configuration and application were just fine. Unfortunately we don't have logs indicating the pool sizes (idle, active, waiting) at the time of the error.
We received some broken pipe and connection lost errors on all containers when the connections were lost. After DB recovery we got the following exception only on the few (2 of 18) containers that failed to recover.
StackTrace:
org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:402)
    ... 20 more
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.
    at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:689)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:196)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:161)
    at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:128)
    at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:122)
    at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38)
    at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:104)
    ... 30 more
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
    at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:857)
    at org.postgresql.jdbc.PgConnection.setNetworkTimeout(PgConnection.java:1639)
    at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:556)
    at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:169)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:185)
    ... 35 more
We have seen similar situations and tests (on the same system) where the DB fails over and the connections are restored by Hikari without any problem. But in this case one of the containers recovered by itself after 1 hour, and the others only after a restart.
As far as we know, Hikari does not return broken connections from the pool and evicts them once they are marked as broken or closed. Any ideas what might have happened to those containers while the others (exactly the same image and configuration) are just fine?
PS: we cannot reproduce the problem.
Hikari configuration:
allowPoolSuspension.............false
connectionInitSql...............none
connectionTestQuery.............none
connectionTimeout...............30000
idleTimeout.....................600000
initializationFailTimeout.......1
isolateInternalQueries..........false
leakDetectionThreshold..........0
maxLifetime.....................1800000
maximumPoolSize.................15
minimumIdle.....................15
validationTimeout...............5000
You can configure something like:
connectionTestQuery=select 1
This way Hikari tests that the connection is still alive before handing it over to Hibernate.
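In a Spring Boot application this can go into application.properties via the Hikari property binding (a sketch; the query itself is the one suggested above):
# Validate each connection with a lightweight query before handing it to the application,
# so connections broken by the DB failure are evicted instead of being returned from the pool.
spring.datasource.hikari.connection-test-query=SELECT 1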

Spring mongo driver connection pool close error

I have a Spring WebFlux/reactive server utilizing a singleton Mongo database instance running on the same machine. Now, I have a REST endpoint in the server which triggers an external ETL (a Python script using a pymongo connection) on the DB. But it then leads to a pool-closed error on my Spring server, and any subsequent database operations from the server fail.
2020-02-16T01:25:58.320+0530 [QUIET] [system.out] 2020-02-16 01:25:58.321 INFO 93553 --- [extShutdownHook] org.mongodb.driver.connection : Closed connection [connectionId{localValue:3, serverValue:133}] to localhost:27017 because the pool has been closed.
My ETL runs for 10 seconds, but the Mongo driver never reconnects after the connection is closed from the pymongo side.
I tried MongoDB configuration flags but failed; I don't know if there is a way. I am also willing to reconnect to MongoDB on every REST call to avoid this, but any ideas/suggestions there?
I was hoping mongo_client would provide an onClose() function for the application to handle disconnections, but I could not find any such handler.

Kafka Consumer Hangs Indefinitely after Rebalancing

I am trying to utilize a Kafka consumer library that is pre-written in my organization. It takes JSON data from a Kafka topic and stores it in a Mongo database. While I cannot post this code, it is a very simple architecture that uses Apache Camel routes, then stores consumed messages into Mongo using the Spring Boot Mongo dependency.
I am running into a situation where, when deploying to OpenShift and scaling up to more than 1 pod, I receive the below exception, and then the application hangs without any further input or processing. I believe the failure is happening within the logic of the Kafka client library(ies).
I have tried running two instances of the application locally, under different ports. That works perfectly without error. I have tried setting the heartbeat interval, session timeout, batch size, max fetch bytes, number of concurrent consumers, SEDA mode on/off, and request timeout. Whether those Kafka settings are changed up, down, on, off, or left undefined, the issues remain.
2019-05-23 16:15:51 [Camel (camel-1) thread #1 - KafkaConsumer[mytopic]] ERROR o.a.k.c.c.i.ConsumerCoordinator - Error UNKNOWN_MEMBER_ID occurred while committing offsets for group mytopic-status
2019-05-23 16:15:51 [Camel (camel-1) thread #7 - KafkaConsumer[mytopic]] ERROR o.a.k.c.c.i.ConsumerCoordinator - Error UNKNOWN_MEMBER_ID occurred while committing offsets for group mytopic-status
2019-05-23 16:15:51 [Camel (camel-1) thread #7 - KafkaConsumer[mytopic]] WARN o.a.c.component.kafka.KafkaConsumer - Error consuming mytopic-Thread 0 from kafka topic. Caused by: [org.apache.kafka.clients.consumer.CommitFailedException - Commit cannot be completed due to group rebalance]
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed due to group rebalance
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:552)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:493)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:358)
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:968)
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:936)
at org.apache.camel.component.kafka.KafkaConsumer$KafkaFetchRecords.run(KafkaConsumer.java:132)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Ballerina JMS connection pooling?

I have a small Ballerina program where I receive a message from one JMS queue, call a stored procedure via JDBC and send a reply to another JMS queue.
The DB can process multiple requests in parallel, so I set up a JDBC connection pool for it. How do I set up a similar JMS connection pool?
Or should I just have a pool of Ballerina services instead?
The current Ballerina implementation does not support pooling of JMS resources.
Ballerina's model, however, should allow parallel processing without explicit coding.
Using the code from the following gist, jmsreceiver.bal, the processing was done in parallel.
It produced the following log:
Initiating service(s) in 'receiver.bal'
2018-12-08 18:38:38,963 INFO [ballerina/jms] - Message receiver created for queue MyQueue
2018-12-08 18:38:57,445 INFO [] - rcv ID:EMS-SERVER.55865C0BF16270:1500
2018-12-08 18:38:58,461 INFO [] - snd ID:EMS-SERVER.55865C0BF16270:1500
2018-12-08 18:38:58,466 INFO [] - rcv ID:EMS-SERVER.55865C0BF16270:1501
2018-12-08 18:38:58,474 INFO [] - rcv ID:EMS-SERVER.55865C0BF16271:1502
2018-12-08 18:38:59,469 INFO [] - snd ID:EMS-SERVER.55865C0BF16270:1501
2018-12-08 18:38:59,472 INFO [] - rcv ID:EMS-SERVER.55865C0BF16270:1503
2018-12-08 18:38:59,478 INFO [] - snd ID:EMS-SERVER.55865C0BF16271:1502
I'm not really familiar with Ballerina, but reading through the Ballerina JMS tutorial it appears Ballerina can use Java libraries. If that is the case, then you should check out https://github.com/messaginghub/pooled-jms. It was forked from the mature ActiveMQ JMS Pool and enhanced to provide JMS 2.0 functionality. It is built on top of Apache Commons Pool, and it is generic (i.e. no ties to ActiveMQ) so it will work with any JMS implementation you choose.
Here's a simple example of how to use it. You just need to instantiate a JmsPoolConnectionFactory and then call setConnectionFactory with the connection factory you would normally get from JNDI. After that you just use it like any normal JMS connection factory.
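A minimal sketch in Java (the JNDI lookup name and the pool size are assumptions for illustration):
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.naming.InitialContext;

import org.messaginghub.pooled.jms.JmsPoolConnectionFactory;

public class PooledJmsExample {

    public static void main(String[] args) throws Exception {
        // Look up the vendor connection factory as usual (JNDI name is illustrative).
        InitialContext ctx = new InitialContext();
        ConnectionFactory vendorFactory = (ConnectionFactory) ctx.lookup("ConnectionFactory");

        // Wrap it in the pooling factory from messaginghub/pooled-jms.
        JmsPoolConnectionFactory pooledFactory = new JmsPoolConnectionFactory();
        pooledFactory.setConnectionFactory(vendorFactory);
        pooledFactory.setMaxConnections(8); // assumed pool size, tune as needed

        // From here on it behaves like any other JMS ConnectionFactory.
        try (Connection connection = pooledFactory.createConnection()) {
            connection.start();
            // ... create sessions, producers and consumers as usual ...
        } finally {
            pooledFactory.stop(); // release pooled connections on shutdown
        }
    }
}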
