Recovering Kafka clients (consumers/producers) after they went down - spring-boot

At the company I work for we use Spring for Kafka without authentication. Lately we ran some experiments to set up security in Kafka and enabled authentication for a brief moment, which caused a crash of all the consumers/producers within our microservices (the microservices themselves stayed up).
The exception:
Authorization Exception and no authorizationExceptionRetryInterval set
org.apache.kafka.common.errors.GroupAuthorizationException: Not authorized to access group: foo-group
After some research we found out that this is the expected behavior of the Kafka clients and that we needed to set the authorizationExceptionRetryInterval property:
public void setAuthorizationExceptionRetryInterval(java.time.Duration authorizationExceptionRetryInterval)
Set the interval between retries after AuthorizationException is
thrown by KafkaConsumer. By default the field is null and retries are
disabled. In such case the container will be stopped. The interval
must be less than max.poll.interval.ms consumer property.
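For reference, the property can be set on the listener container factory's container properties. Here is a minimal sketch, assuming a ConsumerFactory<String, String> bean is already defined and using an arbitrary 30-second interval (the class name and interval are just placeholders):

import java.time.Duration;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;

@Configuration
public class KafkaRetryConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // retry every 30 seconds after an AuthorizationException instead of stopping the container;
        // the interval must be less than max.poll.interval.ms
        factory.getContainerProperties().setAuthorizationExceptionRetryInterval(Duration.ofSeconds(30));
        return factory;
    }
}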
Here are some other useful links:
Setting authorizationExceptionRetryInterval for Spring Kafka
Why does the spring KafkaConsumer suspend all consumption from n topics when one fails to authorize
What I want to know is:
Is a failed authentication the only case in which consumers/producers go down?
If there are other cases, how do we make sure that our consumers/producers recover without human intervention (restarting the microservices)? In other words, how do we check whether the consumers/producers are up and restart them otherwise?

Containers are stopped only under the following circumstances:
AuthorizationException with no authorizationExceptionRetryInterval
NoOffsetForPartitionException - thrown when ConsumerConfig.AUTO_OFFSET_RESET_CONFIG is not earliest or latest and there is no existing offset for a partition with this consumer group.
FencedInstanceIdException - using transactions and static group members (meaning some other instance is using this instance id).
StopAfterFenceException - when stopContainerWhenFenced is true (default false) - only applies with transactions
Any Error (such as OOME)
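To the second part of the question (recovering without restarting the microservice): the listener containers publish Spring application events when they stop, so one option is to react to ContainerStoppedEvent and restart the container through the KafkaListenerEndpointRegistry. Below is a minimal sketch, assuming a @KafkaListener with the hypothetical id "foo-listener"; a real implementation should add a back-off and distinguish an error stop from a normal shutdown:

import org.springframework.context.event.EventListener;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.kafka.event.ContainerStoppedEvent;
import org.springframework.kafka.listener.MessageListenerContainer;
import org.springframework.stereotype.Component;

@Component
public class ListenerRestartGuard {

    private final KafkaListenerEndpointRegistry registry;

    public ListenerRestartGuard(KafkaListenerEndpointRegistry registry) {
        this.registry = registry;
    }

    @EventListener
    public void onContainerStopped(ContainerStoppedEvent event) {
        // "foo-listener" is a hypothetical @KafkaListener id; restart it if it is no longer running.
        // Note: this event also fires on normal shutdown, so a real version should check why the
        // container stopped and apply a back-off instead of restarting blindly.
        MessageListenerContainer container = registry.getListenerContainer("foo-listener");
        if (container != null && !container.isRunning()) {
            container.start();
        }
    }
}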

Related

Spring Boot cannot config database connection down to zero

I am deploying my Spring Boot REST API on AWS Fargate, which connects to AWS Aurora PostgreSQL Serverless V1.
I have configured Aurora to scale down to 0 ACU when idle, so that I am not charged too much when I don't use the API.
Initially, my Spring Boot app maintains 10 idle connections by default, so I have tried to bring that down to zero by adding this to application.properties:
spring.datasource.minimumIdle=0
After that I can see from the AWS console that the number of database connections has been reduced, but one connection remains forever.
Please suggest how to make it zero if you know how.
The Spring Boot database configuration is basically like this
@Bean
@ConfigurationProperties(prefix = "spring.datasource")
public DataSource dataSource() {
    return DataSourceBuilder.create().build();
}
Edit 1
I used the suggestion in the comments to check whether the connection really comes from Spring Boot.
It turns out there is no active connection, but /actuator/metrics/hikaricp.connections.idle always returns a value of 1:
{"name":"hikaricp.connections.idle","description":"Idle connections","baseUnit":null,"measurements":[{"statistic":"VALUE","value":1.0}],"availableTags":[{"tag":"pool","values":["HikariPool-1"]}]}
And it does not seem to be related to the health check, because I have tried running it locally and the result of /actuator/metrics/hikaricp.connections.idle remains 1.
I set logging.level.root=trace to see what is happening.
Only two things keep printing in the log periodically:
The Hikari connection report, showing 1 idle connection:
{"level":"DEBUG","ref":"|","marker":"INTERNAL","message":"HikariPool-1 - Before cleanup stats (total=1, active=0, idle=1, waiting=0)","logger":"com.zaxxer.hikari.pool.HikariPool","timestamp":"2022-06-14 16:15:16.799","thread":"HikariPool-1 housekeeper"}
{"level":"DEBUG","ref":"|","marker":"INTERNAL","message":"HikariPool-1 - After cleanup stats (total=1, active=0, idle=1, waiting=0)","logger":"com.zaxxer.hikari.pool.HikariPool","timestamp":"2022-06-14 16:15:16.800","thread":"HikariPool-1 housekeeper"}
{"level":"DEBUG","ref":"|","marker":"INTERNAL","message":"HikariPool-1 - Fill pool skipped, pool is at sufficient level.","logger":"com.zaxxer.hikari.pool.HikariPool","timestamp":"2022-06-14 16:15:16.800","thread":"HikariPool-1 housekeeper"}
The Tomcat NioEndpoint, which I think is not relevant:
{"level":"DEBUG","ref":"|","marker":"INTERNAL","message":"timeout completed: keys processed=0; now=1655198117181; nextExpiration=1655198117180; keyCount=0; hasEvents=false; eval=false","logger":"org.apache.tomcat.util.net.NioEndpoint","timestamp":"2022-06-14 16:15:17.181","thread":"http-nio-8445-Poller"}
Thanks to the suggestion in the comments: this turned out to be caused by the actuator health check, which can be solved with the following setting:
management.health.db.enabled=false
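Putting the two settings together, the relevant application.properties entries look roughly like this (just a recap of the properties discussed above; with the @ConfigurationProperties-based DataSource shown earlier, minimumIdle is bound straight onto the Hikari pool):

# allow the Hikari pool to shrink to zero idle connections
spring.datasource.minimumIdle=0
# stop the actuator health check from keeping one connection alive
management.health.db.enabled=false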

Error Handling with Apache Camel and ActiveMQ - so breaking out of pipeline for exchange

I've been back and forth with an issue on our system that, even after some research around the forums and several tests, we can't seem to be able to fix.
I'll try to be as clear as I can about what we are dealing with.
We have a main service with a route that reads from an ActiveMQ queue (Spring Boot with an embedded broker), sends the message to Route B and then ships everything to a final Route C. Route B lives in a dependency of the service.
Camel Version: 3.3.0
Spring-boot version: 2.3.3.RELEASE
Route A:
onException(Exception::class.java)
    .handled(true)
    .bean("foo.ErrorProcessor", "processError")

from("activemq:queue:myqueue")
    .routeId("myroute")
    .to("direct:my_external_route")
    .to(ExchangePattern.InOnly, "direct:myroute_result")
Route B:
onException(Exception::class.java)
    .handled(true)
    .bean("foo.ErrorProcessor", "processError")

from("direct:my_external_route")
    .routeId("my_external_route")
    .process { something() } // This processor can throw exceptions that are treated in our processor
Route C:
from("direct:myroute_result")
.process(someProcess())
.to(ExchangePattern.InOnly,"activemq:queue:results_queue")
Spring Boot activemq configs
spring:
  jmx:
    enabled: true
  activemq:
    broker-url: vm://localhost?broker.persistent=false,useShutdownHook=false
    in-memory: true
    non-blocking-redelivery: true
    packages:
      trust-all: false
      trusted: com.mypackage
    pool:
      block-if-full: true
      block-if-full-timeout: -1
      enabled: false
      idle-timeout: 30000
      max-connections: 10
      time-between-expiration-check: -1
      use-anonymous-producers: true
Everything runs very smoothly as long as Route B's processors do not throw exceptions. When they do, even though the exceptions are handled and a normal object is returned in the message body, all we get in the logs is
2021-04-10 15:33:32.354 DEBUG [#1 - JmsConsumer[consumerName]] o.a.c.p.Pipeline
: Message exchange has failed: so breaking out of pipeline for exchange: Exchange[ID-1234] Handled by the error handler. {}
We even added a default error handler to our ActiveMQ connection factory, but nothing happens there either. We have a DLQ consumer that also does not seem to receive anything. The error processor on Route A also does not catch anything, which is expected, since the exception was handled previously.
Has anyone ever had this issue or a similar one? I know that some issues between Camel and the JMS component regarding error handling were raised in the past, but we are struggling to understand the root cause of this one.
Thanks in advance,
Pedro
What you are probably looking for is the continued option on your Route B exception clause. This option lets the exchange continue through the original route as if the exception had not occurred. Do not use the handled option, as it does not allow routing to continue on the original route but breaks out of it.
So your Route B should be defined something like this:
onException(Exception::class.java)
    .continued(true)
    .bean("foo.ErrorProcessor", "processError")

from("direct:my_external_route")
    .routeId("my_external_route")
    .process { something() }
Refer to the Camel documentation for more details: CAMEL EXCEPTION CLAUSE

Netty - EventLoop Queue Monitoring

I am using the Netty server for a Spring Boot application. Is there any way to monitor the Netty server queue size so that we know when the queue is full and the server is not able to accept any new requests? Also, is there any logging by the Netty server when the queue is full or it is unable to accept a new request?
Netty does not have any logging for that purpose, but I implemented a way to find the pending tasks and added some logging along the lines of your question.
you can find all code here https://github.com/ozkanpakdil/spring-examples/tree/master/reactive-netty-check-connection-queue
The code is mostly self-explanatory, but NettyConfigure is what actually does the Netty configuration in the Spring Boot environment. At https://github.com/ozkanpakdil/spring-examples/blob/master/reactive-netty-check-connection-queue/src/main/java/com/mascix/reactivenettycheckconnectionqueue/NettyConfigure.java#L46 you can see how many pending tasks are in the queue. DiscardServerHandler may show you how to discard requests when the limit is reached. You can use JMeter for the test; here is the JMeter file: https://github.com/ozkanpakdil/spring-examples/blob/master/reactive-netty-check-connection-queue/PerformanceTestPlanMemoryThread.jmx
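If you want to collect that number yourself rather than read it from the repository, here is a minimal sketch (the helper class is my own and not part of the linked project) that sums the pending tasks across an EventLoopGroup using Netty's SingleThreadEventExecutor.pendingTasks():

import io.netty.channel.EventLoopGroup;
import io.netty.util.concurrent.EventExecutor;
import io.netty.util.concurrent.SingleThreadEventExecutor;

// Hypothetical helper: reports how many tasks are queued across the event loops of a group.
public final class EventLoopQueueMonitor {

    private EventLoopQueueMonitor() {
    }

    public static int totalPendingTasks(EventLoopGroup group) {
        int pending = 0;
        for (EventExecutor executor : group) {
            if (executor instanceof SingleThreadEventExecutor) {
                // pendingTasks() is the size of this event loop's internal task queue
                pending += ((SingleThreadEventExecutor) executor).pendingTasks();
            }
        }
        return pending;
    }
}

You could log this value periodically, for example from a scheduled task, and alert when it keeps growing.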
If you want to act when Netty's limit is reached, you can do it with code like the one below:
@Override
public void channelActive(ChannelHandlerContext ctx) throws Exception {
    totalConnectionCount.incrementAndGet();
    if (!ctx.channel().isWritable()) { // means we hit the max limit of netty
        System.out.println("I suggest we should restart or put a new server to our pool :)");
    }
    super.channelActive(ctx);
}
You should check https://stackoverflow.com/a/49823055/175554 for handling the limits, and here is another explanation of isWritable: https://stackoverflow.com/a/44564482/175554
As one more extra, I added the actuator endpoints; http://localhost:8080/actuator/metrics/http.server.requests is nice to check too.

How to safely resume cache operation on client side after Hazelcast restart?

Whenever I restart the Hazelcast server without restarting the client in my Spring Boot application, I get the following error:
03-01-2018 16:44:17.966 [http-nio-8080-exec-7] ERROR o.a.c.c.C.[.[.[.[dispatcherServlet].log - Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is com.hazelcast.client.HazelcastClientNotActiveException: Partition does not have owner. partitionId : 203] with root cause
java.io.IOException: Partition does not have owner. partitionId : 203
at com.hazelcast.client.spi.impl.ClientSmartInvocationServiceImpl.invokeOnPartitionOwner(ClientSmartInvocationServiceImpl.java:43)
at com.hazelcast.client.spi.impl.ClientInvocation.invokeOnSelection(ClientInvocation.java:142)
at com.hazelcast.client.spi.impl.ClientInvocation.invoke(ClientInvocation.java:122)
at com.hazelcast.client.spi.ClientProxy.invokeOnPartition(ClientProxy.java:152)
at com.hazelcast.client.spi.ClientProxy.invoke(ClientProxy.java:147)
at com.hazelcast.client.proxy.ClientMapProxy.getInternal(ClientMapProxy.java:245)
at com.hazelcast.client.proxy.ClientMapProxy.get(ClientMapProxy.java:240)
at com.hazelcast.spring.cache.HazelcastCache.lookup(HazelcastCache.java:139)
at com.hazelcast.spring.cache.HazelcastCache.get(HazelcastCache.java:57)
at org.springframework.cache.interceptor.AbstractCacheInvoker.doGet(AbstractCacheInvoker.java:71)
If I enable hot-restart, the issue is solved. But is there a way to resume the client application without restarting it when hot-restart is disabled?
The Hazelcast client tries to reconnect to the cluster if its connection drops. It uses the ClientNetworkConfig.connectionAttemptLimit and ClientNetworkConfig.connectionAttemptPeriod elements to configure how frequently it retries: connectionAttemptLimit defines the number of attempts after a disconnection, and connectionAttemptPeriod defines the period between two retries in milliseconds. Please see the usage example below:
ClientConfig clientConfig = new ClientConfig();
clientConfig.getNetworkConfig().setConnectionAttemptLimit(5);
clientConfig.getNetworkConfig().setConnectionAttemptPeriod(5000);
Starting with Hazelcast 3.9, you can use the reconnect-mode property to configure how the client reconnects to the cluster after it disconnects. It has three options:
The option OFF disables reconnection.
ON enables reconnection in a blocking manner, where all waiting invocations are blocked until a cluster connection is established or fails.
The option ASYNC enables reconnection in a non-blocking manner, where all waiting invocations receive a HazelcastClientOfflineException.
Its default value is ON. You can see a configuration example below:
ClientConfig clientConfig = new ClientConfig();
clientConfig.getConnectionStrategyConfig()
.setReconnectMode(ClientConnectionStrategyConfig.ReconnectMode.ON);
By using these configuration elements, you can resume your client without restarting it.

Spring Kafka disabling listening from a list of topics

We use Spring Kafka configuration to receive messages from upstream systems.
We have Java configuration for the topic setup:
#Bean(id="firstcontainer")
protected ConcurrentMessageListenerContainer createContainerInstance(...) {
//topics addition
}
#Bean(id="secondcontainer")
protected ConcurrentMessageListenerContainer createContainerInstance(...) {
//topics addition
}
#KafkaListener(firstcontainer)
public void listenerFirst(){
}
#KafkaListener(secondcontainer)
public void listenerSecond(){
}
This code works perfectly fine, as we have separate container factories.
Now we have a requirement to spin up multiple instances of this application, where one instance will listen with the first container and the second container will be disabled.
The second instance will only enable the second container and disable the first one.
Can someone help me understand whether it is possible to disable listening to a topic (or a list of topics)?
Your two instances (or many) can be identical and accept the topic list from external configuration; the @KafkaListener annotation allows you to do that.
There is also the Spring @Profile functionality, if you still want to keep several beans in your application. In that case you should separate your @KafkaListener methods into different classes and mark each component with an appropriate @Profile, which, again, can be activated externally, as in the sketch below.
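For example, here is a minimal sketch of the profile approach (the profile names, topic properties and class names are placeholders); each instance then activates its own profile externally, e.g. with spring.profiles.active=first:

import org.springframework.context.annotation.Profile;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
@Profile("first")
class FirstTopicsListener {

    // the topic list comes from external configuration, so each deployment can point at its own topics
    @KafkaListener(topics = "${app.first-topics}", containerFactory = "firstcontainer")
    public void listenerFirst(String payload) {
        // handle messages for the first group of topics
    }
}

@Component
@Profile("second")
class SecondTopicsListener {

    @KafkaListener(topics = "${app.second-topics}", containerFactory = "secondcontainer")
    public void listenerSecond(String payload) {
        // handle messages for the second group of topics
    }
}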
Apache Kafka has the concept of a consumer group, meaning that all consumers in the same group join the broker but only one of them consumes records from a given partition of a topic. This way, independently of the number of instances of your application, you still have consistency, because there is nothing to worry about regarding duplicates as long as consumer groups are used properly.
