How to handle AmqpConnectException on pod restart of a Spring Boot app? - spring-boot

We're running a (reactive) Spring Boot app with RabbitMQ (Spring AMQP) on a Kubernetes cluster, and lately we've noticed some weird behavior in the application logs.
Whenever we make a new deployment to the Kubernetes cluster, we keep getting the following error repeatedly, with the corresponding exception stack trace, until the new pods are up and the old ones are destroyed:
Failed to check/redeclare auto-delete queue(s).
org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection refused
at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:61)
at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:602)
at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:724)
at org.springframework.amqp.rabbit.connection.ConnectionFactoryUtils.createConnection(ConnectionFactoryUtils.java:252)
at org.springframework.amqp.rabbit.core.RabbitTemplate.doExecute(RabbitTemplate.java:2163)
at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:2136)
at org.springframework.amqp.rabbit.core.RabbitTemplate.execute(RabbitTemplate.java:2116)
at org.springframework.amqp.rabbit.core.RabbitAdmin.getQueueInfo(RabbitAdmin.java:407)
at org.springframework.amqp.rabbit.core.RabbitAdmin.getQueueProperties(RabbitAdmin.java:391)
at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.attemptDeclarations(AbstractMessageListenerContainer.java:1914)
at org.springframework.amqp.rabbit.listener.AbstractMessageListenerContainer.redeclareElementsIfNecessary(AbstractMessageListenerContainer.java:1895)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.initialize(SimpleMessageListenerContainer.java:1347)
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1193)
at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.Net.pollConnect(Native Method)
at java.base/sun.nio.ch.Net.pollConnectNow(Net.java:660)
at java.base/sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:542)
at java.base/sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:333)
at java.base/java.net.Socket.connect(Socket.java:648)
at com.rabbitmq.client.impl.SocketFrameHandlerFactory.create(SocketFrameHandlerFactory.java:60)
at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:1220)
at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:1170)
at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.connectAddresses(AbstractConnectionFactory.java:640)
at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.connect(AbstractConnectionFactory.java:615)
at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:565)
... 12 common frames omitted
We can't seem to find the source of the issue.
I'd love to hear if anyone has any ideas. Thanks in advance!

Caused by: java.net.ConnectException: Connection refused - means the broker can't be reached.
That is normal behavior - the containers are trying to reconnect. You can control the frequency by setting the container's recoveryBackOff property. By default, the container will attempt to connect every 5 seconds, using a FixedBackOff - you can use a fixed backoff with a longer retry interval, or an ExponentialBackOff that will increase the delay each time.
/**
 * Specify the {@link BackOff} for interval between recovery attempts.
 * The default is 5000 ms, that is, 5 seconds.
 * With the {@link BackOff} you can supply the {@code maxAttempts} for recovery before
 * the {@link #stop()} will be performed.
 * @param recoveryBackOff The BackOff to recover.
 * @since 1.5
 */
public void setRecoveryBackOff(BackOff recoveryBackOff) {
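For example, a minimal sketch of wiring an ExponentialBackOff into the Boot listener container factory (the bean name and the intervals are illustrative, not the only valid choice):

import org.springframework.amqp.rabbit.config.SimpleRabbitListenerContainerFactory;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.util.backoff.ExponentialBackOff;

@Configuration
public class RabbitRecoveryConfig {

    @Bean
    public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory(ConnectionFactory connectionFactory) {
        SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
        factory.setConnectionFactory(connectionFactory);
        // Start retrying after 5s, double the delay on each failed attempt,
        // and cap the delay at 60s so the logs aren't flooded during a rollout.
        ExponentialBackOff backOff = new ExponentialBackOff(5000, 2.0);
        backOff.setMaxInterval(60000);
        factory.setRecoveryBackOff(backOff);
        return factory;
    }
}

Alternatively, new FixedBackOff(30000, FixedBackOff.UNLIMITED_ATTEMPTS) keeps a fixed 30-second interval between attempts.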

Related

Spring Boot and Hikari Connection Pool: ORA-12514 TNS:listener does not currently know of service requested in connect descriptor

I'm having an issue that has gone from "annoyance" to "flip the desk".
I have a very basic Spring Boot MVC service with the following Hikari Connection Pool setup:
spring.datasource.username=ME
spring.datasource.password=mypassword
#spring.datasource.url=jdbc:oracle:thin:@SERVICE
spring.datasource.url=jdbc:oracle:thin:@localhost:1521/SERVICE
# need the duplicate because hikari is "special", but it's the spring default
#spring.datasource.jdbc-url=jdbc:oracle:thin:@SERVICE
spring.datasource.jdbc-url=jdbc:oracle:thin:@localhost:1521/SERVICE
# HikariCP settings
spring.datasource.hikari.minimumIdle=5
spring.datasource.hikari.maximumPoolSize=5
spring.datasource.hikari.idleTimeout=30000
spring.datasource.hikari.maxLifetime=2000000
spring.datasource.hikari.connectionTimeout=30000
spring.datasource.hikari.poolName=HikariPool
# leak detection - 30 seconds/30k ms
spring.datasource.hikari.leak-detection-threshold=30000
It seems that a connection is being held. I've tried restarting my Oracle Service and Listener with no luck, and the only solution seems to be to reboot.
Can anyone assist?
Update: I removed Hikari and used this:
spring.datasource.type=org.springframework.jdbc.datasource.SimpleDriverDataSource
I no longer think this is directly related to Hikari.
Edit 2:
Here's the actual "caused by" root exception:
Caused by: oracle.net.ns.NetException: Listener refused the connection with the following error:
ORA-12514, TNS:listener does not currently know of service requested in connect descriptor
(CONNECTION_ID=Z0U0SUlgTNuHsXgisq9YSA==)
at oracle.net.ns.NSProtocolNIO.createRefusePacketException(NSProtocolNIO.java:816) ~[ojdbc8-21.5.0.0.jar:21.5.0.0.0]
at oracle.net.ns.NSProtocolNIO.handleConnectPacketResponse(NSProtocolNIO.java:396) ~[ojdbc8-21.5.0.0.jar:21.5.0.0.0]
at oracle.net.ns.NSProtocolNIO.negotiateConnection(NSProtocolNIO.java:207) ~[ojdbc8-21.5.0.0.jar:21.5.0.0.0]
at oracle.net.ns.NSProtocol.connect(NSProtocol.java:350) ~[ojdbc8-21.5.0.0.jar:21.5.0.0.0]
at oracle.jdbc.driver.T4CConnection.connect(T4CConnection.java:2372) ~[ojdbc8-21.5.0.0.jar:21.5.0.0.0]
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:657) ~[ojdbc8-21.5.0.0.jar:21.5.0.0.0]
... 85 common frames omitted
That particular CONNECTION_ID is randomized and different each time.
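To take Hikari and Spring entirely out of the picture, the same refusal can be reproduced with a bare DriverManager connection (a minimal diagnostic sketch, reusing the URL and credentials from the properties above; it assumes the ojdbc driver is on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;

public class JdbcSmokeTest {
    public static void main(String[] args) throws Exception {
        // Same URL as spring.datasource.url; if this also fails with ORA-12514,
        // the listener genuinely does not know the SERVICE name and the pool is innocent.
        String url = "jdbc:oracle:thin:@localhost:1521/SERVICE";
        try (Connection conn = DriverManager.getConnection(url, "ME", "mypassword")) {
            System.out.println("Connected to " + conn.getMetaData().getDatabaseProductVersion());
        }
    }
}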

Disable hikari pool in development environment

In the development environment (Spring Boot with Hikari), the JDBC connection is unstable: if it sits idle for some time and an API is then called again, the call fails (I suspect an unstable network causes it, because the production environment is fine):
2022-03-08 12:13:35.571 [http-nio-9090-exec-6] WARN com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate connection com.mysql.cj.jdbc.ConnectionImpl@72415749 (No operations allowed after connection closed.). Possibly consider using a shorter maxLifetime value.
### The error occurred while executing a query
### Cause: org.springframework.jdbc.CannotGetJdbcConnectionException: Failed to obtain JDBC Connection; nested exception is java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 30038ms.] with root cause
com.mysql.cj.exceptions.ConnectionIsClosedException: No operations allowed after connection closed.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:61)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:105)
at com.mysql.cj.exceptions.ExceptionFactory.createException(ExceptionFactory.java:151)
at com.mysql.cj.NativeSession.checkClosed(NativeSession.java:1209)
at com.mysql.cj.jdbc.ConnectionImpl.checkClosed(ConnectionImpl.java:567)
at com.mysql.cj.jdbc.ConnectionImpl.setNetworkTimeout(ConnectionImpl.java:2484)
at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:550)
at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:165)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:179)
at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:155)
at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:128)
at org.springframework.jdbc.datasource.DataSourceUtils.fetchConnection(DataSourceUtils.java:157)
but the next call works again.
Because this is a development environment with no performance requirements, I want to disable the connection pool, i.e., create a new connection every time one is used.
So how do I configure spring.datasource.hikari.XXX to disable the connection pool and create a new JDBC connection on every use?
You can use a different DataSource, such as SimpleDriverDataSource.
As of Spring Boot 2.3.x, the following works out of the box with no need to exclude anything:
spring.datasource.type=org.springframework.jdbc.datasource.SimpleDriverDataSource
You can also check the MySQL recommended settings; a sketch of them follows below.
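For reference, the MySQL settings recommended on the HikariCP wiki translate roughly to the following Spring Boot properties (a sketch; the values are the wiki's examples, so treat them as a starting point):

# HikariCP wiki "MySQL Configuration" statement-cache settings
spring.datasource.hikari.data-source-properties.cachePrepStmts=true
spring.datasource.hikari.data-source-properties.prepStmtCacheSize=250
spring.datasource.hikari.data-source-properties.prepStmtCacheSqlLimit=2048
spring.datasource.hikari.data-source-properties.useServerPrepStmts=true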

Spring Boot Micro Service Not Defined in Registry When JMS Server Not Reachable

I have a strange issue that took me several days to narrow down. Basically, I have a JHipster project based on Spring Boot version 2.1.10.RELEASE, which contains 4 microservices. We are interested here in 2 of them: the Gateway and the Corehub.
In the gateway, I have an Angular app that performs a POST to /services/corehub/api/someendpoint, which used to work and is now failing with different error messages; the one I get most regularly is
{
  "type": "https://www.jhipster.tech/problem/problem-with-message",
  "title": "Method Not Allowed",
  "status": 405,
  "detail": "Request method 'POST' not supported",
  "path": "/services/ambientcorehub/api/trips",
  "message": "error.http.405"
}
I ended up looking at the traces of the Registry, which keeps track of the microservices for internal communication, and I found that when this error occurs, I can no longer find the corehub in the traces. So it looks like the corehub microservice is not registered.
Another Git branch of this service does not have this problem, so I performed a diff between the two branches and removed changes until I could narrow down the problem.
So, in the corehub, I have a JMS listener based on this mq-jms-spring implementation. The maven dependency is as follows:
<dependency>
    <groupId>com.ibm.mq</groupId>
    <artifactId>mq-jms-spring-boot-starter</artifactId>
    <version>2.2.7</version>
</dependency>
I commented out my JmsListener class and its associated JmsContext (to get access to a topic), and kept only the configuration properties defining access to the server, along with the port, channel, topic name, etc.
If I comment the above maven dependency, my service works again.
If I keep the maven dependency, with the configuration only, my corehub microservice is not registered in the Registry and becomes not accessible anymore from the gateway and thus the Angular UI.
What is important to note is that I currently have a network issue that prevents me from accessing the JMS server.
So I believe that the exception raised by this IBM library when the JMS server is unreachable breaks the registration of the microservice with the Spring Boot registry.
Here are the traces that come over and over in the corehub console:
2020-07-22 08:21:45.316 WARN 7964 --- [nfoReplicator-0] o.s.boot.actuate.jms.JmsHealthIndicator : JMS health check failed
com.ibm.msg.client.jms.DetailedIllegalStateException: JMSWMQ0018: Failed to connect to queue manager '' with connection mode 'Client' and host name '172.31.14.1(9010)'.
at com.ibm.msg.client.wmq.common.internal.Reason.reasonToException(Reason.java:489)
at com.ibm.msg.client.wmq.common.internal.Reason.createException(Reason.java:215)
at com.ibm.msg.client.wmq.internal.WMQConnection.<init>(WMQConnection.java:448)
at com.ibm.msg.client.wmq.factories.WMQConnectionFactory.createV7ProviderConnection(WMQConnectionFactory.java:8475)
at com.ibm.msg.client.wmq.factories.WMQConnectionFactory.createProviderConnection(WMQConnectionFactory.java:7815)
at com.ibm.msg.client.jms.admin.JmsConnectionFactoryImpl._createConnection(JmsConnectionFactoryImpl.java:303)
at com.ibm.msg.client.jms.admin.JmsConnectionFactoryImpl.createConnection(JmsConnectionFactoryImpl.java:236)
at com.ibm.mq.jms.MQConnectionFactory.createCommonConnection(MQConnectionFactory.java:6005)
at com.ibm.mq.jms.MQConnectionFactory.createConnection(MQConnectionFactory.java:6030)
at org.springframework.jms.connection.SingleConnectionFactory.doCreateConnection(SingleConnectionFactory.java:409)
at org.springframework.jms.connection.SingleConnectionFactory.initConnection(SingleConnectionFactory.java:349)
at org.springframework.jms.connection.SingleConnectionFactory.getConnection(SingleConnectionFactory.java:327)
at org.springframework.jms.connection.SingleConnectionFactory.createConnection(SingleConnectionFactory.java:242)
at org.springframework.boot.actuate.jms.JmsHealthIndicator.doHealthCheck(JmsHealthIndicator.java:52)
at org.springframework.boot.actuate.health.AbstractHealthIndicator.health(AbstractHealthIndicator.java:82)
at org.springframework.boot.actuate.health.CompositeHealthIndicator.health(CompositeHealthIndicator.java:95)
at org.springframework.cloud.netflix.eureka.EurekaHealthCheckHandler.getHealthStatus(EurekaHealthCheckHandler.java:110)
at org.springframework.cloud.netflix.eureka.EurekaHealthCheckHandler.getStatus(EurekaHealthCheckHandler.java:106)
at com.netflix.discovery.DiscoveryClient.refreshInstanceInfo(DiscoveryClient.java:1406)
at com.netflix.discovery.InstanceInfoReplicator.run(InstanceInfoReplicator.java:117)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: com.ibm.mq.MQException: JMSCMQ0001: IBM MQ call failed with compcode '2' ('MQCC_FAILED') reason '2538' ('MQRC_HOST_NOT_AVAILABLE').
at com.ibm.msg.client.wmq.common.internal.Reason.createException(Reason.java:203)
... 24 common frames omitted
Caused by: com.ibm.mq.jmqi.JmqiException: CC=2;RC=2538;AMQ9204: Connection to host '172.31.14.1(9010)' rejected. [1=com.ibm.mq.jmqi.JmqiException[CC=2;RC=2538;AMQ9204: Connection to host '/172.31.14.1:9010' rejected. [1=java.net.ConnectException[Connection timed out: connect],3=/172.31.14.1:9010,4=TCP,5=Socket.connect]],3=172.31.14.1(9010),5=RemoteTCPConnection.bindAndConnectSocket]
at com.ibm.mq.jmqi.remote.api.RemoteFAP$Connector.jmqiConnect(RemoteFAP.java:13558)
at com.ibm.mq.jmqi.remote.api.RemoteFAP.jmqiConnect(RemoteFAP.java:1426)
at com.ibm.mq.jmqi.remote.api.RemoteFAP.jmqiConnect(RemoteFAP.java:1385)
at com.ibm.mq.ese.jmqi.InterceptedJmqiImpl.jmqiConnect(InterceptedJmqiImpl.java:377)
at com.ibm.mq.ese.jmqi.ESEJMQI.jmqiConnect(ESEJMQI.java:562)
at com.ibm.msg.client.wmq.internal.WMQConnection.<init>(WMQConnection.java:381)
... 23 common frames omitted
Caused by: com.ibm.mq.jmqi.JmqiException: CC=2;RC=2538;AMQ9204: Connection to host '/172.31.14.1:9010' rejected. [1=java.net.ConnectException[Connection timed out: connect],3=/172.31.14.1:9010,4=TCP,5=Socket.connect]
at com.ibm.mq.jmqi.remote.impl.RemoteTCPConnection.bindAndConnectSocket(RemoteTCPConnection.java:901)
at com.ibm.mq.jmqi.remote.impl.RemoteTCPConnection.protocolConnect(RemoteTCPConnection.java:1381)
at com.ibm.mq.jmqi.remote.impl.RemoteConnection.connect(RemoteConnection.java:976)
at com.ibm.mq.jmqi.remote.impl.RemoteConnectionSpecification.getNewConnection(RemoteConnectionSpecification.java:553)
at com.ibm.mq.jmqi.remote.impl.RemoteConnectionSpecification.getSessionFromNewConnection(RemoteConnectionSpecification.java:233)
at com.ibm.mq.jmqi.remote.impl.RemoteConnectionSpecification.getSession(RemoteConnectionSpecification.java:141)
at com.ibm.mq.jmqi.remote.impl.RemoteConnectionPool.getSession(RemoteConnectionPool.java:127)
at com.ibm.mq.jmqi.remote.api.RemoteFAP$Connector.jmqiConnect(RemoteFAP.java:13302)
... 28 common frames omitted
Caused by: java.net.ConnectException: Connection timed out: connect
at java.base/java.net.PlainSocketImpl.connect0(Native Method)
at java.base/java.net.PlainSocketImpl.socketConnect(PlainSocketImpl.java:101)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
at java.base/java.net.Socket.connect(Socket.java:609)
at java.base/java.net.Socket.connect(Socket.java:558)
at com.ibm.mq.jmqi.remote.impl.RemoteTCPConnection$4.run(RemoteTCPConnection.java:1022)
at com.ibm.mq.jmqi.remote.impl.RemoteTCPConnection$4.run(RemoteTCPConnection.java:1014)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at com.ibm.mq.jmqi.remote.impl.RemoteTCPConnection.connectSocket(RemoteTCPConnection.java:1014)
at com.ibm.mq.jmqi.remote.impl.RemoteTCPConnection.bindAndConnectSocket(RemoteTCPConnection.java:805)
... 35 common frames omitted
Here are some version numbers:
jhipster-dependencies.version: 3.0.7
Spring boot version: 2.1.10.RELEASE
ibmmq-jms-spring version(s) that are affected by this issue: Version 2.2.7
Java version (including vendor and platform): AdoptOpenJDK\jdk-11.0.6.10-hotspot
As a small sample that demonstrates the issue, here is my configuration in application.yml:
spring:
  jms:
    # Used for JMS Message reception.
    isPubSubDomain: false
application:
  oag:
    # This can be a queue or a topic (if subdomain is defined)
    # In case of a topic, sub domain must be set to public.
    queueName: "BRIDGE.XXX.TO.YYY.TST"
    isTopic: false
ibm:
  mq:
    queueManager:
    channel: XXX_GWT11.BT1
    connName: 172.31.14.1(9010)
    user: xxxx
    password:
Would it be possible to catch this exception to avoid breaking the regular Spring Boot registration mechanism?
I cannot afford to have my cluster down just because I cannot access the JMS server.
Besides this, I opened this message on the IBM MQ side here, and a person suggested that I stop the JMS health indicator. So I set the following property, to no avail:
management:
  endpoint:
    jms:
      # Prevent Unreachable JMS Server from unregistering corehub from the registry, leading to unreachable microservice from the Gateway
      enabled: false
Corresponding documentation is here
Any help would be greatly appreciated.
Thank you
Christophe
If you do not want your application to be considered unhealthy when JMS is down, disabling the JMS health indicator is what I would recommend. It hasn't worked for you as you have used management.endpoint.jms.enabled. The correct property to use is management.health.jms.enabled:
management:
  health:
    jms:
      enabled: false

Lettuce in Spring Integration 5: RedisQueueMessageDrivenEndpoint not really blocking

I've just switched from Spring Boot 1.X to Spring Boot 2.0 RC1, so now I'm using Spring Integration 5, which uses Lettuce as the default Redis library.
I have a RedisQueueMessageDrivenEndpoint that I want to block until a new message arrives, so I set the receive timeout to 0, which means it should block until something arrives. This worked with Spring Integration 4; however, with Lettuce there is a default connection timeout of 60 seconds, so I get the following exception:
Failed to execute listening task. Will attempt to resubmit in 5000 milliseconds.
org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out
at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70)
at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41)
at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44)
at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42)
at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:257)
at org.springframework.data.redis.connection.lettuce.LettuceListCommands.convertLettuceAccessException(LettuceListCommands.java:490)
at org.springframework.data.redis.connection.lettuce.LettuceListCommands.bRPop(LettuceListCommands.java:409)
at org.springframework.data.redis.connection.DefaultedRedisConnection.bRPop(DefaultedRedisConnection.java:478)
at org.springframework.data.redis.core.DefaultListOperations$5.inRedis(DefaultListOperations.java:215)
at org.springframework.data.redis.core.AbstractOperations$ValueDeserializingRedisCallback.doInRedis(AbstractOperations.java:59)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:224)
at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:184)
at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:95)
at org.springframework.data.redis.core.DefaultListOperations.rightPop(DefaultListOperations.java:211)
at org.springframework.data.redis.core.DefaultBoundListOperations.rightPop(DefaultBoundListOperations.java:154)
at org.springframework.integration.redis.inbound.RedisQueueMessageDrivenEndpoint.popMessageAndSend(RedisQueueMessageDrivenEndpoint.java:195)
at org.springframework.integration.redis.inbound.RedisQueueMessageDrivenEndpoint.access$200(RedisQueueMessageDrivenEndpoint.java:58)
at org.springframework.integration.redis.inbound.RedisQueueMessageDrivenEndpoint$ListenerTask.run(RedisQueueMessageDrivenEndpoint.java:340)
at org.springframework.integration.util.ErrorHandlingTaskExecutor.lambda$execute$0(ErrorHandlingTaskExecutor.java:53)
at java.lang.Thread.run(Thread.java:745)
Caused by: io.lettuce.core.RedisCommandTimeoutException: Command timed out
at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:114)
at io.lettuce.core.AbstractRedisAsyncCommands.select(AbstractRedisAsyncCommands.java:1185)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at io.lettuce.core.FutureSyncInvocationHandler.handleInvocation(FutureSyncInvocationHandler.java:52)
at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:80)
at com.sun.proxy.$Proxy156.select(Unknown Source)
at org.springframework.data.redis.connection.lettuce.LettuceConnection.getDedicatedConnection(LettuceConnection.java:888)
at org.springframework.data.redis.connection.lettuce.LettuceListCommands.bRPop(LettuceListCommands.java:407)
... 13 common frames omitted
What is the recommended way with Lettuce to have a blocking brpop operation without running into timeouts (and getting these exceptions)? I could increase the connection timeout using spring.redis.timeout, but this would affect all connections.
Lettuce is a non-blocking client based on netty and NIO. That matters here because the socket read timeout (known from SocketInputStream) is not applicable.
Lettuce exposes a command timeout to protect synchronous calls against an infinite wait. It makes sense to have a slightly lower receive timeout on RedisQueueMessageDrivenEndpoint (say 50 seconds if your spring.redis.timeout is 60 seconds).
The stack trace above shows a timeout in the SELECT command. This happens before BRPOP. It looks as if spring.redis.timeout was set to 0 (no timeout at all).
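A minimal sketch of that advice (the queue name, channel name, and 50-second value are illustrative, assuming spring.redis.timeout is left at its 60-second default):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.integration.redis.inbound.RedisQueueMessageDrivenEndpoint;

@Configuration
public class RedisQueueConfig {

    @Bean
    public RedisQueueMessageDrivenEndpoint redisQueueEndpoint(RedisConnectionFactory connectionFactory) {
        RedisQueueMessageDrivenEndpoint endpoint =
                new RedisQueueMessageDrivenEndpoint("my-queue", connectionFactory);
        // Keep the blocking BRPOP receive timeout below the Lettuce command
        // timeout (50s vs. 60s) so the pop returns before Lettuce cancels it.
        endpoint.setReceiveTimeout(50000);
        endpoint.setOutputChannelName("fromRedisChannel");
        return endpoint;
    }
}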

How to safely resume cache operation on client side after Hazelcast restart?

Whenever I restart the Hazelcast server without restarting the client in the Spring Boot app, I get the following error:
03-01-2018 16:44:17.966 [http-nio-8080-exec-7] ERROR o.a.c.c.C.[.[.[.[dispatcherServlet].log - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is com.hazelcast.client.HazelcastClientNotActiveException: Partition does not have owner. partitionId : 203] with root cause
java.io.IOException: Partition does not have owner. partitionId : 203
at com.hazelcast.client.spi.impl.ClientSmartInvocationServiceImpl.invokeOnPartitionOwner(ClientSmartInvocationServiceImpl.java:43)
at com.hazelcast.client.spi.impl.ClientInvocation.invokeOnSelection(ClientInvocation.java:142)
at com.hazelcast.client.spi.impl.ClientInvocation.invoke(ClientInvocation.java:122)
at com.hazelcast.client.spi.ClientProxy.invokeOnPartition(ClientProxy.java:152)
at com.hazelcast.client.spi.ClientProxy.invoke(ClientProxy.java:147)
at com.hazelcast.client.proxy.ClientMapProxy.getInternal(ClientMapProxy.java:245)
at com.hazelcast.client.proxy.ClientMapProxy.get(ClientMapProxy.java:240)
at com.hazelcast.spring.cache.HazelcastCache.lookup(HazelcastCache.java:139)
at com.hazelcast.spring.cache.HazelcastCache.get(HazelcastCache.java:57)
at org.springframework.cache.interceptor.AbstractCacheInvoker.doGet(AbstractCacheInvoker.java:71)
If I enable hot-restart, the issue is solved. But is there a way to resume the client application without restarting it, when hot-restart is disabled?
The Hazelcast client tries to reconnect to the cluster if the connection drops. It uses the ClientNetworkConfig.connectionAttemptLimit and ClientNetworkConfig.connectionAttemptPeriod elements to configure how frequently it will try: connectionAttemptLimit defines the number of attempts after a disconnection, and connectionAttemptPeriod defines the period between two retries in milliseconds. Please see the usage example below:
ClientConfig clientConfig = new ClientConfig();
clientConfig.getNetworkConfig().setConnectionAttemptLimit(5);
clientConfig.getNetworkConfig().setConnectionAttemptPeriod(5000);
Starting with Hazelcast 3.9, you can use reconnect-mode property to configure how the client will reconnect to the cluster after it disconnects. It has three options:
The option OFF disables the reconnection.
ON enables reconnection in a blocking manner where all the waiting invocations will be blocked until a cluster connection is established or failed.
The option ASYNC enables reconnection in a non-blocking manner where all the waiting invocations will receive a HazelcastClientOfflineException.
Its default value is ON. You can see a configuration example below:
ClientConfig clientConfig = new ClientConfig();
clientConfig.getConnectionStrategyConfig()
.setReconnectMode(ClientConnectionStrategyConfig.ReconnectMode.ON);
By using these configuration elements, you can resume your client without restarting it.
