HikariPool-2 - Thread starvation or clock leap detected when stopping tomcat service - spring

I have a Spring Boot 2.7.4 application running on Tomcat 9 and Java 8.
The application is hosted on Windows Server 2012 and connects to a SQL Server 2012 database.
Issue: I got several connection errors (HTTP 500) because the database was down, and after that the application ran out of memory. When I tried to stop the Tomcat service, it took a very long time, and the application log file is full of the following messages:
12:51:34,278 [WARN] com.zaxxer.hikari.pool.HikariPool : HikariPool-2 - Thread starvation or clock leap detected (housekeeper delta=3m4s836ms512µs970ns).
12:55:03,680 [WARN] com.zaxxer.hikari.pool.HikariPool : HikariPool-2 - Thread starvation or clock leap detected (housekeeper delta=1m27s135ms295µs753ns).
12:58:50,064 [WARN] com.zaxxer.hikari.pool.HikariPool : HikariPool-2 - Thread starvation or clock leap detected (housekeeper delta=4m13s693ms590µs504ns).
13:03:22,118 [WARN] com.zaxxer.hikari.pool.HikariPool : HikariPool-2 - Thread starvation or clock leap detected (housekeeper delta=2m27s243ms229µs567ns).
Question: why do I get an out-of-memory error after several connection failures?
Do I have to add a HikariPool configuration for that?
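The warning itself comes from HikariCP's housekeeper task, which runs every 30 seconds by default. As far as I can tell from HikariPool's source, it logs "Thread starvation or clock leap detected" when the measured gap between two housekeeper runs is larger than 1.5 times the housekeeping period; treat that threshold as my reading of the code rather than a documented contract. Gaps of several minutes, as in the log above, mean the housekeeper thread could not run at all for that long, which is consistent with a JVM that spends most of its time in garbage collection while running out of memory. A simplified, stdlib-only sketch of the check (the class name is hypothetical):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HousekeeperStarvationCheck {
    // Assumed housekeeping period; HikariCP's default is 30 seconds.
    static final long PERIOD_MS = 30_000L;

    /** True when the gap between two runs exceeds 1.5x the scheduling period. */
    public static boolean isStarved(long previousNanos, long nowNanos, long periodMs) {
        long deltaMs = TimeUnit.NANOSECONDS.toMillis(nowNanos - previousNanos);
        return deltaMs > (3 * periodMs) / 2;
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        long[] previous = {System.nanoTime()};
        // If GC pauses or CPU starvation delay this task, the measured gap grows.
        scheduler.scheduleWithFixedDelay(() -> {
            long now = System.nanoTime();
            if (isStarved(previous[0], now, PERIOD_MS)) {
                long deltaMs = TimeUnit.NANOSECONDS.toMillis(now - previous[0]);
                System.out.println("Thread starvation detected (delta=" + deltaMs + " ms)");
            }
            previous[0] = now;
        }, PERIOD_MS, PERIOD_MS, TimeUnit.MILLISECONDS);
    }
}
```

With the default period the threshold is 45 seconds, so every delta in the log above (1m27s to 4m13s) is far past it.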

Related

Why does my Spring app shutdown gracefully on POSIX but not on Windows?

I wrote a simple Spring app using H2 and an embedded Redis server, and I have a @Configuration for the embedded Redis server so that it shuts itself down (@PreDestroy) before the app shuts down. It works fine on POSIX: I've tested it on macOS, Arch, and Ubuntu, and it worked on all of them with every IDE or editor I tried (i.e. when I clicked the stop button in IntelliJ or VS Code, both the Spring app and the Redis server shut down gracefully after printing something like this):
[InstanceCleaner] --snipped-- : Stopping redis server...
[InstanceCleaner] --snipped-- : Redis exited
[ionShutdownHook] --snipped-- : Closing JPA EntityManagerFactory --snipped--
[ionShutdownHook] --snipped-- : HHH000477: Starting delayed evict --snipped--
[ionShutdownHook] --snipped-- : HikariPool-1 - Shutdown initiated...
[ionShutdownHook] --snipped-- : HikariPool-1 - Shutdown completed.
However, when I try to stop the app on Windows by doing exactly the same thing (i.e. clicking the stop button, the disconnect button, or the stop button in the Spring extension), the Spring app does not shut down gracefully.
It just stops without running any cleanup and leaves the embedded Redis server running. This still-running Redis server prevents me from starting the app again. To restart the app without problems, I either have to terminate the Redis process manually or shut the app down gracefully some other way. I found two methods: sending Ctrl+C to the terminal running the Spring app, or writing a controller mapped to /shutdown that triggers the shutdown and calling that URL every time I want to stop the app.
It's not a huge hassle, but I'd like to know why I get this behavior on Windows but not on POSIX (and a fix, if there is one).
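Spring Boot runs @PreDestroy callbacks from a JVM shutdown hook (the thread name truncated to ionShutdownHook in the log above appears to be Spring Boot's SpringApplicationShutdownHook). The Javadoc for Runtime.addShutdownHook states that hooks run on normal exit or on a user interrupt such as Ctrl+C, but not when the VM is aborted, for example with SIGKILL on Unix or the TerminateProcess call on Windows. A plausible but unconfirmed explanation is that the IDE stop button on Windows terminates the process forcibly, so no hook runs. The stdlib-only sketch below registers a hook the same way; `redisRunning` and `stopRedis` are hypothetical stand-ins for the embedded Redis server and its cleanup:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownHookDemo {
    // Hypothetical stand-in for the embedded Redis server from the question.
    public static final AtomicBoolean redisRunning = new AtomicBoolean(true);

    /** Plays the role of the @PreDestroy method; safe to call more than once. */
    public static void stopRedis() {
        if (redisRunning.compareAndSet(true, false)) {
            System.out.println("Stopping redis server...");
        }
    }

    /** Registers the cleanup as a JVM shutdown hook, the mechanism Spring also relies on. */
    public static Thread registerCleanupHook() {
        Thread hook = new Thread(ShutdownHookDemo::stopRedis, "InstanceCleaner");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws InterruptedException {
        registerCleanupHook();
        System.out.println("Running. Ctrl+C runs the hook; a forced kill does not.");
        Thread.sleep(Long.MAX_VALUE);
    }
}
```

Running this from a terminal and pressing Ctrl+C should print the cleanup message, while killing the process from Task Manager should not, which mirrors the difference described in the question.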

Connection was closed and evicted message with HikariCP after a certain idle time

My Spring Boot application uses HikariCP, and I am getting the following messages saying that the connection was closed and evicted:
com.zaxxer.hikari.pool.PoolBase: HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@1610c743 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
[nnection closer] com.zaxxer.hikari.pool.PoolBase: HikariPool-1 - Closing connection org.postgresql.jdbc.PgConnection@1610c743: (connection was evicted)
I got these messages after leaving my application running overnight and calling an API after roughly 10 hours of idle time. That first call took 11 seconds to return a response and produced the messages above in the logs.
Subsequent calls took around 2 seconds, and I did not see these messages again.
Does anyone know why I got these messages and why the first call after the idle period took so long? My application is deployed on Azure Spring Cloud. These are the library versions:
HikariCP version: 4.0.3
Spring Boot: 2.5.5
PostgreSQL JDBC driver: 42.2.23
This is the Hikari property I changed. I changed it because the default of 30 minutes was giving me connection timeouts and exceptions; after changing maxLifetime I no longer get any connection timeout exceptions.
hikari:
  maxLifetime: 300000
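For reference, HikariCP's README documents a few constraints on these timing properties: maxLifetime has a 30000 ms minimum (0 means no maximum lifetime) and should be several seconds shorter than any database- or infrastructure-imposed connection time limit; idleTimeout has a 10000 ms minimum; keepaliveTime (available since HikariCP 4.0) has a 30000 ms minimum and must be less than maxLifetime. The stdlib-only sketch below checks a configuration against those rules. It is not HikariCP's own validation code; the class name, the 5-second margin, and the `infraLimitMs` input (the idle cutoff of whatever sits between the app and the database, if known) are assumptions:

```java
import java.util.ArrayList;
import java.util.List;

public class HikariTimingCheck {
    // Assumed margin below the infrastructure limit ("several seconds" per the README).
    static final long MARGIN_MS = 5_000L;

    /**
     * Returns warnings; an empty list means no rule was violated.
     * 0 means "disabled" for idleTimeout/keepaliveTime and "no limit" for maxLifetime;
     * infraLimitMs <= 0 means the infrastructure limit is unknown.
     */
    public static List<String> check(long maxLifetimeMs, long idleTimeoutMs,
                                     long keepaliveTimeMs, long infraLimitMs) {
        List<String> warnings = new ArrayList<>();
        if (maxLifetimeMs != 0 && maxLifetimeMs < 30_000L) {
            warnings.add("maxLifetime is below the 30000 ms minimum");
        }
        if (idleTimeoutMs != 0 && idleTimeoutMs < 10_000L) {
            warnings.add("idleTimeout is below the 10000 ms minimum");
        }
        if (maxLifetimeMs > 0 && idleTimeoutMs >= maxLifetimeMs) {
            // Connections are retired by maxLifetime before they can ever hit idleTimeout.
            warnings.add("idleTimeout >= maxLifetime, so idle eviction never applies");
        }
        if (keepaliveTimeMs != 0) {
            if (keepaliveTimeMs < 30_000L) {
                warnings.add("keepaliveTime is below the 30000 ms minimum");
            }
            if (maxLifetimeMs > 0 && keepaliveTimeMs >= maxLifetimeMs) {
                warnings.add("keepaliveTime must be less than maxLifetime");
            }
        }
        if (infraLimitMs > 0 && (maxLifetimeMs == 0 || maxLifetimeMs > infraLimitMs - MARGIN_MS)) {
            warnings.add("maxLifetime should be several seconds shorter than the "
                    + infraLimitMs + " ms infrastructure limit");
        }
        return warnings;
    }

    public static void main(String[] args) {
        // The question's maxLifetime with idleTimeout and keepaliveTime disabled.
        System.out.println(check(300_000L, 0L, 0L, 0L)); // prints []
    }
}
```

If the actual cutoff imposed by the network path to the database were known, passing it as `infraLimitMs` would show whether 300000 ms leaves enough headroom.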

Hikari CP (Spring Boot) Connection Recovery Problem After DB Failure

We have several microservices built on Spring Boot (2.2.4) and HikariCP (3.4.2) with PostgreSQL.
Recently we had a DB outage lasting about 30 seconds. After the connections were lost, some of the containers failed to recover their connections, while others with exactly the same configuration and application were fine. Unfortunately, we don't have logs showing the pool sizes (idle, active, waiting) at the time of the error.
We received some broken pipe and connection lost errors on all containers when the connections were lost. After the DB recovered, we got the following exception only on the containers that failed to recover (2 of 18).
StackTrace:
org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:402)
... 20 more
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.
at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:689)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:196)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:161)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:128)
at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:122)
at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38)
at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:104)
... 30 more
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:857)
at org.postgresql.jdbc.PgConnection.setNetworkTimeout(PgConnection.java:1639)
at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:556)
at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:169)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:185)
... 35 more
We have seen similar situations (on the same system) and run tests where the DB failed over and Hikari restored the connections without any problem. But in this case, one of the containers recovered by itself after 1 hour, and the others only after a restart.
As far as we know, Hikari does not return broken connections from the pool; it evicts them once they are marked as broken or closed. Any ideas what might have happened to those containers while the others (exactly the same image and configuration) were fine?
PS: we cannot reproduce the problem.
Hikari configuration:
allowPoolSuspension.............false
connectionInitSql...............none
connectionTestQuery.............none
connectionTimeout...............30000
idleTimeout.....................600000
initializationFailTimeout.......1
isolateInternalQueries..........false
leakDetectionThreshold..........0
maxLifetime.....................1800000
maximumPoolSize.................15
minimumIdle.....................15
validationTimeout...............5000
You can configure something like:
connectionTestQuery=select 1
This way, Hikari tests that the connection is still alive before handing it over to Hibernate.
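The idea of validating a connection before handing it over can be sketched as a borrow loop: take an idle connection, test it, evict it if it is dead, otherwise fall back to opening a new one, and give up once a deadline passes. This is a deliberately simplified, single-threaded model with hypothetical names (`ValidatingPool`, `borrow`, and a `Predicate` standing in for the liveness test); it is not HikariCP's implementation, which uses `Connection.isValid()` unless `connectionTestQuery` is set. Note that the stack trace above already shows HikariCP running a liveness check on borrow (`PoolBase.isConnectionAlive`), so this illustrates the mechanism rather than guaranteeing a fix:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Simplified model of validate-on-borrow; not thread-safe. */
public class ValidatingPool<C> {
    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> factory;   // returns null while the database is unreachable
    private final Predicate<C> isAlive;  // stand-in for a test query or isValid()
    private int evicted = 0;

    public ValidatingPool(Supplier<C> factory, Predicate<C> isAlive) {
        this.factory = factory;
        this.isAlive = isAlive;
    }

    /** Returns a connection to the idle set. */
    public void release(C conn) {
        idle.push(conn);
    }

    public int evictedCount() {
        return evicted;
    }

    /** Hands out only connections that pass the liveness test, or fails after timeoutMs. */
    public C borrow(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        do {
            C conn = idle.poll();
            if (conn == null) {
                conn = factory.get();         // no idle connection: try to open a new one
            }
            if (conn != null) {
                if (isAlive.test(conn)) {
                    return conn;              // validated, safe to hand over
                }
                evicted++;                    // dead: drop it instead of returning it
            } else {
                sleepQuietly(10);             // database still down: back off briefly
            }
        } while (System.nanoTime() - deadline < 0);
        throw new IllegalStateException(
                "Connection is not available, request timed out after " + timeoutMs + "ms");
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this model, dead idle connections cost one failed test each before a fresh connection is returned; if the factory keeps failing, callers see a timeout much like the `SQLTransientConnectionException` above.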

Kafka Consumer Hangs Indefinitely after Rebalancing

I am trying to use a Kafka consumer library that was prewritten in my organization. It takes JSON data from a Kafka topic and stores it in a Mongo database. I cannot post the code, but it has a very simple architecture: it uses Apache Camel routes and stores consumed messages in Mongo using the Spring Boot Mongo dependency.
When I deploy to OpenShift and scale up to more than one pod, I receive the exception below, and then the application hangs without processing any further input. I believe the failure happens within the logic of the Kafka client library.
I have tried running two instances of the application locally on different ports, and that works perfectly without errors. I have tried adjusting the heartbeat interval, session timeout, batch size, max fetch bytes, number of concurrent consumers, SEDA mode on/off, and request timeout. Whether I raise, lower, enable, disable, or leave those Kafka settings undefined, the issue remains.
2019-05-23 16:15:51 [Camel (camel-1) thread #1 - KafkaConsumer[mytopic]] ERROR o.a.k.c.c.i.ConsumerCoordinator - Error UNKNOWN_MEMBER_ID occurred while committing offsets for group mytopic-status
2019-05-23 16:15:51 [Camel (camel-1) thread #7 - KafkaConsumer[mytopic]] ERROR o.a.k.c.c.i.ConsumerCoordinator - Error UNKNOWN_MEMBER_ID occurred while committing offsets for group mytopic-status
2019-05-23 16:15:51 [Camel (camel-1) thread #7 - KafkaConsumer[mytopic]] WARN o.a.c.component.kafka.KafkaConsumer - Error consuming mytopic-Thread 0 from kafka topic. Caused by: [org.apache.kafka.clients.consumer.CommitFailedException - Commit cannot be completed due to group rebalance]
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed due to group rebalance
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:552)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:493)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:665)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:644)
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:380)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:274)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:320)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:213)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:193)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:358)
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:968)
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:936)
at org.apache.camel.component.kafka.KafkaConsumer$KafkaFetchRecords.run(KafkaConsumer.java:132)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
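The UNKNOWN_MEMBER_ID error suggests the group coordinator had already removed this consumer from the group before the commit arrived. One common cause is taking too long between two poll() calls. Which setting governs that depends on the client version: since Kafka 0.10.1 (KIP-62) it is max.poll.interval.ms, while older clients send heartbeats only from poll(), so session.timeout.ms applies. The stack trace alone does not show which version is in use, or whether slow processing is actually the cause here. As a back-of-the-envelope sketch, the worst-case gap between polls is roughly the number of records per poll times the per-record processing time; the class and method names are hypothetical and the per-record time is an input you would have to measure:

```java
public final class PollBudget {
    private PollBudget() {}

    /** Worst-case time between two poll() calls when a whole batch is processed in between. */
    public static long worstCaseGapMs(int recordsPerPoll, long msPerRecord) {
        return (long) recordsPerPoll * msPerRecord;
    }

    /** True when that gap exceeds the limit, so the coordinator may evict the consumer. */
    public static boolean exceedsLimit(int recordsPerPoll, long msPerRecord, long limitMs) {
        return worstCaseGapMs(recordsPerPoll, msPerRecord) > limitMs;
    }

    /** Largest batch that still fits within the limit; 0 if even one record is too slow. */
    public static int largestSafeBatch(long msPerRecord, long limitMs) {
        if (msPerRecord <= 0) {
            throw new IllegalArgumentException("msPerRecord must be positive");
        }
        return (int) Math.min(Integer.MAX_VALUE, limitMs / msPerRecord);
    }

    public static void main(String[] args) {
        long limitMs = 300_000L;  // assumed max.poll.interval.ms (the default since 0.10.1)
        int batch = 500;          // assumed records returned per poll
        long perRecord = 1_000L;  // hypothetical: 1 s to store one record in Mongo
        System.out.println("worst-case gap: " + worstCaseGapMs(batch, perRecord) + " ms");
        System.out.println("exceeds limit: " + exceedsLimit(batch, perRecord, limitMs));
        System.out.println("largest safe batch: " + largestSafeBatch(perRecord, limitMs));
    }
}
```

With these assumed numbers the sketch reports a 500000 ms gap, which exceeds the limit, and a largest safe batch of 300 records.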

Kafka Cannot Configure Topics on Application Startup, but Later Can Communicate

We have a Spring Boot application using spring-kafka (2.2.5.RELEASE) that always gets this error when starting up:
Could not configure topics
org.springframework.kafka.KafkaException: Timed out waiting to get existing topics; nested exception is java.util.concurrent.TimeoutException
However, the application continues to startup:
org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1]
INFO o.s.k.l.KafkaMessageListenerContainer - partitions revoked: []
INFO o.s.k.l.KafkaMessageListenerContainer - partitions assigned: [my-reply-topic-1]
INFO o.s.k.l.KafkaMessageListenerContainer - partitions assigned: [my-request-topic-0]
INFO o.s.b.w.e.tomcat.TomcatWebServer - Tomcat started on port(s): 8080 (http) with context path ''
At this point, the application interacts with Kafka as expected.
We like to keep our logs clean, so we would like to understand why this Exception is thrown. Also, it is a bit confusing, because when we move to a different environment where the networking has not been established between the application and the kafka broker(s), we get the same error, but the application does not function. Having the same Exception occur when there is truly a problem and when it can be ignored is irksome when trying to troubleshoot connectivity issues.
Is there a way, on application startup, to determine whether connectivity has been established with Kafka rather than just waiting for a timeout message (which may be a red herring anyway)?
If the topic(s) exist already, remove any NewTopic beans from the application context and the KafkaAdmin won't try to connect to the broker at all.
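On the second question (checking connectivity at startup), one low-tech option is a plain TCP probe of the bootstrap servers before the rest of the application starts. This only proves that something accepts connections on that host and port, not that the broker is healthy or that the client can authenticate; a Kafka-level check would go through the Kafka AdminClient (for example `describeCluster()`) instead. The class below is a hypothetical stdlib-only sketch, and its `host:port,host:port` input simply mirrors the format of `bootstrap.servers`:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public final class BrokerReachability {
    private BrokerReachability() {}

    /** Parses a bootstrap.servers-style list such as "kafka-1:9092,kafka-2:9092" without DNS lookups. */
    public static List<InetSocketAddress> parse(String bootstrapServers) {
        List<InetSocketAddress> addresses = new ArrayList<>();
        for (String entry : bootstrapServers.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int colon = trimmed.lastIndexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("expected host:port but got '" + trimmed + "'");
            }
            String host = trimmed.substring(0, colon);
            int port = Integer.parseInt(trimmed.substring(colon + 1));
            addresses.add(InetSocketAddress.createUnresolved(host, port));
        }
        return addresses;
    }

    /** True if a TCP connection to at least one address succeeds within timeoutMs each. */
    public static boolean anyReachable(List<InetSocketAddress> addresses, int timeoutMs) {
        for (InetSocketAddress unresolved : addresses) {
            // Resolve now so DNS failures are treated like any other unreachable address.
            InetSocketAddress target = new InetSocketAddress(unresolved.getHostString(), unresolved.getPort());
            try (Socket socket = new Socket()) {
                socket.connect(target, timeoutMs);
                return true;
            } catch (IOException e) {
                // refused, timed out, or unknown host: try the next address
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String servers = args.length > 0 ? args[0] : "localhost:9092"; // placeholder address
        boolean ok = anyReachable(parse(servers), 3_000);
        System.out.println(ok ? "At least one broker port is reachable" : "No broker port is reachable");
    }
}
```

Failing fast with a clear message when no address is reachable would separate "no network path to Kafka" from the ambiguous topic-configuration timeout described above.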
