Hikari CP (Spring Boot) Connection Recovery Problem After DB Failure

We have several microservices built on Spring Boot (2.2.4) and HikariCP (3.4.2) with PostgreSQL.
Recently we experienced a DB failure that lasted around 30 seconds. After the connections were lost, some of the containers failed to recover their connections, while others with exactly the same configuration and application were just fine. Unfortunately we don't have logs indicating the pool sizes (idle, active, waiting) at the time of the error.
We received broken pipe and connection lost errors on all containers when the connections were lost. After the DB recovered, we got the following exception only on the few (2/18) containers that failed to recover.
Stack trace:
    at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:402)
    ... 20 more
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30000ms.
    at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:689)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:196)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:161)
    at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:128)
    at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:122)
    at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38)
    at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:104)
    ... 30 more
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
    at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:857)
    at org.postgresql.jdbc.PgConnection.setNetworkTimeout(PgConnection.java:1639)
    at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:556)
    at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:169)
    at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:185)
    ... 35 more
We have seen similar situations and tests on the same system where the DB fails over and the connections are restored by Hikari without any problem. But in this case one of the containers recovered by itself after 1 hour, and the others only after a restart.
As far as we know, Hikari does not hand out broken connections from the pool and evicts them once they are marked as broken or closed. Any ideas what might have happened to those containers while the others (exactly the same image and configuration) were just fine?
PS: we cannot reproduce the problem.
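(Note: Hikari logs its pool state (total, active, idle and waiting connections) at DEBUG level on every housekeeping cycle, which runs every 30 seconds by default. In a Spring Boot application this can be enabled with a single logging property, which would capture the idle/active/waiting counts for a future incident:
logging.level.com.zaxxer.hikari=DEBUG
)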
Hikari configuration:
allowPoolSuspension.............false
connectionInitSql...............none
connectionTestQuery.............none
connectionTimeout...............30000
idleTimeout.....................600000
initializationFailTimeout.......1
isolateInternalQueries..........false
leakDetectionThreshold..........0
maxLifetime.....................1800000
maximumPoolSize.................15
minimumIdle.....................15
validationTimeout...............5000
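(For reference, the above is Hikari's startup dump of its effective configuration; assuming the standard Spring Boot spring.datasource.hikari.* property binding, the same settings would be expressed in application.properties roughly as:
spring.datasource.hikari.connection-timeout=30000
spring.datasource.hikari.idle-timeout=600000
spring.datasource.hikari.max-lifetime=1800000
spring.datasource.hikari.maximum-pool-size=15
spring.datasource.hikari.minimum-idle=15
spring.datasource.hikari.validation-timeout=5000
)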

You can configure something like:
connectionTestQuery=select 1
This way Hikari tests that the connection is still alive before handing it over to Hibernate.
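In a Spring Boot application this can be set through the standard spring.datasource.hikari.* binding, e.g. in application.properties:
spring.datasource.hikari.connection-test-query=select 1
Note, though, that per the HikariCP docs quoted in a related answer below, this property is intended for legacy drivers; with a JDBC4-compliant driver such as the PostgreSQL driver, Hikari already validates idle connections using Connection.isValid() before handing them out.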

Related

HConnection Closed - JDBC - Connection-Pool

We're using a JDBC connection through HikariCP to connect to Apache Phoenix, and we're facing "HConnection closed" issues. It's because of stale connections present in the pool, and the pool does not get cleared until we restart the applications.
Has anyone faced something like the above (maybe not with the same DB)?
Is there any recommended approach to connect to Phoenix from Spring applications?

Connection was closed and evicted message with HikariCP after certain idle time

My Spring Boot application is using HikariCP. I am getting the following "connection closed" and "connection was evicted" error messages:
com.zaxxer.hikari.pool.PoolBase: HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection#1610c743 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.
[nnection closer] com.zaxxer.hikari.pool.PoolBase: HikariPool-1 - Closing connection org.postgresql.jdbc.PgConnection#1610c743: (connection was evicted)
I got these messages when I kept my application on overnight and tried to access an API from the application after approximately 10 hours of idle time. The first call after the idle period took 11 seconds to return a response and produced the above messages in the logs.
Subsequent calls took around 2 seconds to respond, and I did not see this particular message again.
Does anyone have any idea why I got these messages and why the first call after the idle time took so long? My application is deployed on Azure Spring Cloud. The library versions are:
HikariCP version: 4.0.3
Spring Boot: 2.5.5
PostgreSQL JDBC driver: 42.2.23
The Hikari property value I have changed is shown below (in the full configuration it nests under spring.datasource; the value is in milliseconds, so 300000 is 5 minutes). I changed it because the default of 30 minutes was giving me connection timeouts and exceptions. After changing the maxLifetime property I no longer get any connection timeout exceptions.
hikari:
  maxLifetime: 300000

Spring mongo driver connection pool close error

I have a Spring WebFlux/reactive server using a singleton Mongo database instance running on the same machine. I have a REST endpoint in the server which triggers an external ETL (a Python script using a pymongo connection) on the DB. But this then leads to a pool-closed error on my Spring server, and any subsequent database operations from the server fail.
2020-02-16T01:25:58.320+0530 [QUIET] [system.out] 2020-02-16 01:25:58.321 INFO 93553 --- [extShutdownHook] org.mongodb.driver.connection : Closed connection [connectionId{localValue:3, serverValue:133}] to localhost:27017 because the pool has been closed.
My ETL runs for 10 seconds, but the Mongo driver never reconnects after the connection is closed from the pymongo side.
I tried various MongoDB configuration flags without success; I don't know if there is a way. I am also willing to reconnect to MongoDB on every REST call to avoid this, if anyone has ideas/suggestions there.
I was hoping mongo_client would provide an onClose() function for the application to handle disconnections, but I could not find any such handler.

How to configure auto reconnection with hikari in SpringBoot application?

We are using Spring Boot 2.1.x, so Hikari is the default DataSource implementation. However, I am not sure how to configure the Hikari settings so that it auto-reconnects to our Oracle database after database maintenance/restarts or network connection issues.
We have the following Hikari settings, but they do not seem to help.
account.datasource.url: jdbc:oracle:thin:@myserver:1521:DEV
account.datasource.username: user
account.datasource.password: xxxx
account.datasource.driverClassName: oracle.jdbc.driver.OracleDriver
account.datasource.hikari.connection-timeout: 30000
account.datasource.hikari.maximum-pool-size: 3
account.datasource.hikari.idle-timeout: 60000
account.datasource.hikari.max-lifetime: 1800000
account.datasource.hikari.minimum-idle: 2
It failed to reconnect after the network connection to the database was restored:
Failed to obtain JDBC Connection; nested exception is java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30033ms.
Is there any other account.datasource.hikari.xxxxx setting that will help it auto-reconnect to the database?
From the HikariCP docs:
connectionTestQuery
If your driver supports JDBC4 we strongly
recommend not setting this property. This is for "legacy" drivers that
do not support the JDBC4 Connection.isValid() API. This is the query
that will be executed just before a connection is given to you from
the pool to validate that the connection to the database is still
alive. Again, try running the pool without this property, HikariCP
will log an error if your driver is not JDBC4 compliant to let you
know. Default: none
So I'd suggest verifying that your JDBC driver is actually JDBC4 compliant. If it's not, set the above property.
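As a quick check, Connection.isValid() can be called directly against a borrowed connection; a pre-JDBC4 driver will typically fail with an AbstractMethodError or SQLFeatureNotSupportedException. A minimal sketch, assuming the DataSource is whatever the application already configures:
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import javax.sql.DataSource;

public class Jdbc4Check {
    // Returns true if the driver implements the JDBC4 isValid() API
    // (and the freshly borrowed connection reports itself valid).
    public static boolean supportsIsValid(DataSource dataSource) throws SQLException {
        try (Connection conn = dataSource.getConnection()) {
            return conn.isValid(5); // JDBC4 API, 5-second validation timeout
        } catch (SQLFeatureNotSupportedException | AbstractMethodError e) {
            // Legacy (pre-JDBC4) driver: configure connectionTestQuery instead.
            return false;
        }
    }
}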

Webapp hangs when Active MQ broker is running

I have a strange problem with my Spring webapp (running on local jetty) which connects to a locally running ActiveMQ broker for JMS functionality.
As soon as I start the broker, the application becomes incredibly slow: the startup of the ApplicationContext with an active broker takes forever (> 10 mins; I have not yet waited long enough for it to complete). If I start the broker after the webapp (i.e. after the ApplicationContext was loaded), it runs, but very slowly (requests which usually take < 1s take > 30s). All operations take longer, even the ones without JMS involved. When I run the application without an ActiveMQ broker, everything runs smoothly (except the JMS-related stuff, of course ;-)).
Here's what I tried so far:
Updated the ActiveMQ version to 5.10.1
Used standalone ActiveMQ instead of the maven plugin
Moved the broker from a separate JVM (via the ActiveMQ maven plugin, connected via JNDI lookup in the jetty config) into the same JVM (started via Spring config, without JNDI)
Changed the ActiveMQ transport from tcp to vm
Tried several ActiveMQ settings (alwaysSyncSend, alwaysSessionAsync, producerWindowSize)
Used CachingConnectionFactory and PooledConnectionFactory
When analyzing a thread dump (jstack) I see many ActiveMQ threads parked, waiting on a monitor, which looks like this:
"ActiveMQ VMTransport: vm://localhost#0-3" daemon prio=6 tid=0x000000000b1a3000 nid=0x1840 waiting on condition [0x00000000177df000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000f786d670> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:196)
at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:424)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:323)
at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:874)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:955)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:917)
at java.lang.Thread.run(Thread.java:662)
Any help is greatly appreciated !
I found the cause of the issue and was able to fix it:
We were passing a transaction manager to the AbstractMessageListenerContainer. While in production an XA transaction manager is in use, in the local jetty environment only a JpaTransactionManager is used. Apparently JMS waits forever for an XA transaction to be committed, which never happens in the local environment.
By overriding the bean definition of the AbstractMessageListenerContainer for the local environment, not setting a transaction manager but using sessionTransacted="true" instead, everything works fine.
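A minimal sketch of such an override for the local environment (the profile name, queue name, and use of DefaultMessageListenerContainer are illustrative, not the actual beans from this project):
import javax.jms.ConnectionFactory;
import javax.jms.MessageListener;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;
import org.springframework.jms.listener.DefaultMessageListenerContainer;

@Configuration
@Profile("local") // hypothetical name of the local jetty profile
public class LocalJmsConfig {

    @Bean
    public DefaultMessageListenerContainer messageListenerContainer(
            ConnectionFactory connectionFactory, MessageListener listener) {
        DefaultMessageListenerContainer container = new DefaultMessageListenerContainer();
        container.setConnectionFactory(connectionFactory);
        container.setDestinationName("example.queue"); // hypothetical destination
        container.setMessageListener(listener);
        // No transaction manager set: use locally transacted JMS sessions,
        // so the container does not block waiting for an XA commit.
        container.setSessionTransacted(true);
        return container;
    }
}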
I got the idea that it might be related to transaction handling from enabling ActiveMQ logging. With this I saw that something was wrong with the transaction (transactionContext.getTransactionId() returned null).
