spring kafka cloud stream: limit retry attempts in batch mode - spring

When an exception is thrown while consuming a message, Spring tries to read the same message over and over, and consumption of other messages basically stops. I've tried setting the defaultRetryable and retryableExceptions properties like this:
spring:
  cloud.stream:
    bindings:
      consumer-in-0:
        consumer:
          defaultRetryable: false
          retryable-exceptions:
            org.springframework.dao.DataIntegrityViolationException: false
as described at https://docs.spring.io/spring-cloud-stream/docs/3.1.0/reference/html/spring-cloud-stream.html#_retry_template_and_retrybackoff
but it had no effect. How can I disable repeated attempts to read a failed message, or limit the number of such attempts?
Update
Looking at the Spring source of KafkaMessageChannelBinder:
protected MessageProducer createConsumerEndpoint() {
    // ...
    if (!extendedConsumerProperties.isBatchMode()
            && extendedConsumerProperties.getMaxAttempts() > 1
            && transMan == null) {
        kafkaMessageDrivenChannelAdapter
                .setRetryTemplate(buildRetryTemplate(extendedConsumerProperties));
So it looks like the properties mentioned above only take effect when batch mode is not used, and I am using batch mode (batch==true). How can I handle retries in batch mode?

Batch listeners retry forever, by default, because the framework can't tell which record in the batch failed.
It's best to handle errors in batch mode in the listener itself.
You can add a ListenerContainerCustomizer bean to configure a different BatchErrorHandler. See https://docs.spring.io/spring-kafka/docs/current/reference/html/#annotation-error-handling for options.
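As a rough illustration of handling errors in the listener itself, the sketch below retries each record of a batch a bounded number of times and returns the records that still failed, so the listener can return normally instead of having the whole batch redelivered. It is plain Java with no Spring types; the class name BoundedBatchProcessor and its process method are hypothetical helpers made up for this example, not part of Spring Cloud Stream or Spring Kafka.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper (not a Spring class): bounded per-record retries inside a batch listener. */
final class BoundedBatchProcessor {

    private BoundedBatchProcessor() {
    }

    /**
     * Applies the handler to every record, trying each record at most maxAttempts times.
     * Returns the records that still failed after the last attempt, so the caller can
     * log them or send them elsewhere and then return normally.
     */
    static <T> List<T> process(List<T> batch, int maxAttempts, Consumer<T> handler) {
        List<T> failed = new ArrayList<>();
        for (T record : batch) {
            boolean done = false;
            for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {
                try {
                    handler.accept(record);
                    done = true;
                } catch (RuntimeException ex) {
                    // Swallow and retry; once attempts are exhausted the record is reported as failed.
                }
            }
            if (!done) {
                failed.add(record);
            }
        }
        return failed;
    }
}
```

A batch consumer could call BoundedBatchProcessor.process(batch, 3, this::handleOne), where handleOne is an assumed per-record method, and then decide what to do with the returned failures. Because no exception escapes the listener, the batch is not re-read.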

Related

Error Handling with Apache Camel and ActiveMQ - so breaking out of pipeline for exchange

I've been going back and forth on an issue in our system that we can't seem to fix, even after researching the forums and running several tests.
I'll try to be as clear as I can about what we are dealing with.
We have a main service with a route (A) that reads from an ActiveMQ queue (Spring Boot with an embedded broker), sends the message to Route B, and then ships everything to a final Route C. Route B lives in a dependency of the service.
Camel Version: 3.3.0
Spring-boot version: 2.3.3.RELEASE
Route A:
onException(Exception::class.java)
    .handled(true)
    .bean("foo.ErrorProcessor", "processError")
from("activemq:queue:myqueue")
    .routeId("myroute")
    .to("direct:my_external_route")
    .to(ExchangePattern.InOnly, "direct:myroute_result")
Route B:
onException(Exception::class.java)
    .handled(true)
    .bean("foo.ErrorProcessor", "processError")
from("direct:my_external_route")
    .routeId("my_external_route")
    .process { something() } // This processor can throw exceptions that are treated in our processor
Route C:
from("direct:myroute_result")
    .process(someProcess())
    .to(ExchangePattern.InOnly, "activemq:queue:results_queue")
Spring Boot activemq configs
spring:
  jmx:
    enabled: true
  activemq:
    broker-url: vm://localhost?broker.persistent=false,useShutdownHook=false
    in-memory: true
    non-blocking-redelivery: true
    packages:
      trust-all: false
      trusted: com.mypackage
    pool:
      block-if-full: true
      block-if-full-timeout: -1
      enabled: false
      idle-timeout: 30000
      max-connections: 10
      time-between-expiration-check: -1
      use-anonymous-producers: true
Everything runs smoothly when B's processors do not throw exceptions. When they do, even though the exceptions are handled and a normal object is returned in the message body, all we see in the logs is
2021-04-10 15:33:32.354 DEBUG [#1 - JmsConsumer[consumerName]] o.a.c.p.Pipeline : Message exchange has failed: so breaking out of pipeline for exchange: Exchange[ID-1234] Handled by the error handler. {}
We even added a default error handler to our ActiveMQ connection factory, but nothing happens there either. We have a DLQ consumer that also does not seem to receive anything. The error processor on Route A does not catch anything either, which is expected since the exception was already handled.
Has anyone had this issue or a similar one? I know that some issues between Camel and the JMS component regarding error handling were raised in the past, but we are struggling to understand the root of this one.
Thanks in advance,
Pedro
What you are probably looking for is the continued option on your Route B exception clause. This option lets routing continue in the original route as if the exception had not occurred. Do not use the handled option here: it breaks out of the original route instead of continuing it.
So your Route B should be defined as something like this:
onException(Exception::class.java)
    .continued(true)
    .bean("foo.ErrorProcessor", "processError")
from("direct:my_external_route")
    .routeId("my_external_route")
    .process { something() }
Refer to the Camel documentation on the exception clause for more details.

Recovering Kafka clients (consumers/producers) after they went down

At the company I work for, we use Spring for Kafka without authentication. Lately we ran some experiments to set up security in Kafka and enabled authentication for a brief moment, which caused a crash in all the consumers/producers within our microservices (the microservices themselves stayed up).
The exception :
Authorization Exception and no authorizationExceptionRetryInterval set
org.apache.kafka.common.errors.GroupAuthorizationException: Not authorized to access group: foo-group
After some research we found out that this is the expected behavior of Kafka clients and that we needed to set the authorizationExceptionRetryInterval property:
public void setAuthorizationExceptionRetryInterval(java.time.Duration authorizationExceptionRetryInterval)
Set the interval between retries after AuthorizationException is
thrown by KafkaConsumer. By default the field is null and retries are
disabled. In such case the container will be stopped. The interval
must be less than max.poll.interval.ms consumer property.
Here are some other useful links:
Setting authorizationExceptionRetryInterval for Spring Kafka
Why does the spring KafkaConsumer suspend all consumption from n topics when one fails to authorize
What I want to know is:
Is a failed authentication the only case in which consumers/producers go down?
If there are other cases, how can we make sure that our consumers/producers recover without human intervention (restarting the microservices)? In other words, how can we check whether the consumers/producers are up and restart them otherwise?
Containers are stopped only under the following circumstances:
AuthorizationException with no authorizationExceptionRetryInterval
NoOffsetForPartitionException - thrown when ConsumerConfig.AUTO_OFFSET_RESET_CONFIG is not earliest or latest and there is no existing offset for a partition with this consumer group.
FencedInstanceIdException - using transactions and static group members (meaning some other instance is using this instance id).
StopAfterFenceException - when stopContainerWhenFenced is true (default false) - only applies with transactions
Any Error (such as OOME)
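For the second part of the question (checking whether containers are up and restarting them), one possible approach is a periodic check that starts any container reporting that it is not running. The sketch below is plain Java under stated assumptions: Restartable is a hypothetical stand-in for the isRunning()/start() pair that Spring's Lifecycle interface exposes on listener containers, and ContainerWatchdog is a made-up name. In a real application the containers would presumably come from KafkaListenerEndpointRegistry.getListenerContainers(), and the scheduling is left to the caller.

```java
import java.util.Collection;

/** Hypothetical watchdog sketch; not part of Spring Kafka. */
final class ContainerWatchdog {

    /** Stand-in for the isRunning()/start() methods of Spring's Lifecycle interface. */
    interface Restartable {
        boolean isRunning();

        void start();
    }

    private ContainerWatchdog() {
    }

    /** Starts every container that is not running and returns how many were started. */
    static int restartStopped(Collection<? extends Restartable> containers) {
        int restarted = 0;
        for (Restartable container : containers) {
            if (!container.isRunning()) {
                container.start();
                restarted++;
            }
        }
        return restarted;
    }
}
```

A restart only helps if the underlying cause has gone away: a container stopped by an authorization failure will stop again while the client is still unauthorized, and an Error such as OOME is generally not something a restart can fix.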

Netty - EventLoop Queue Monitoring

I am using a Netty server for a Spring Boot application. Is there any way to monitor the Netty server queue size, so that we know when the queue is full and the server cannot accept new requests? Also, does Netty log anything when the queue is full or it is unable to accept a new request?
Netty does not have any logging for that purpose, but I implemented a way to find the pending tasks and log them, as your question asks.
You can find all the code here: https://github.com/ozkanpakdil/spring-examples/tree/master/reactive-netty-check-connection-queue
The code is fairly self-explanatory. NettyConfigure does the Netty configuration in the Spring Boot environment; at https://github.com/ozkanpakdil/spring-examples/blob/master/reactive-netty-check-connection-queue/src/main/java/com/mascix/reactivenettycheckconnectionqueue/NettyConfigure.java#L46 you can see how many pending tasks are in the queue. DiscardServerHandler may help you discard requests once the limit is reached. You can use JMeter for testing; here is the JMeter file: https://github.com/ozkanpakdil/spring-examples/blob/master/reactive-netty-check-connection-queue/PerformanceTestPlanMemoryThread.jmx
If you want to react to the Netty limit, you can do it like the code below:
@Override
public void channelActive(ChannelHandlerContext ctx) throws Exception {
    totalConnectionCount.incrementAndGet();
    if (!ctx.channel().isWritable()) { // means we hit the max limit of netty
        System.out.println("I suggest we should restart or put a new server to our pool :)");
    }
    super.channelActive(ctx);
}
You should check https://stackoverflow.com/a/49823055/175554 for handling the limits; another explanation of "isWritable" is at https://stackoverflow.com/a/44564482/175554
One more thing: I enabled the actuators, so http://localhost:8080/actuator/metrics/http.server.requests is worth checking too.

Starting Spring Boot application without check for Kafka Server

I have an application that uses SpringBoot 2.10.0.Release and Kafka version 2.10.0. The application has a simple producer and consumer: the sender works with KafkaTemplate and the consumer with KafkaListener.
What I am trying to achieve is to be able to start the Spring Boot application even if the Kafka server is not running.
Currently without a running KafkaBroker the application cannot be started with this error message:
org.springframework.context.ApplicationContextException:
Failed to start bean 'org.springframework.kafka.config.internalKafkaListenerEndpointRegistry';
nested exception is org.apache.kafka.common.errors.TimeoutException
Is there a way to achieve this, and if so, could anybody give me a hint or a keyword on how to manage it?
When running a Spring Boot application with a KafkaListener, the listener will by default try to connect to Kafka on startup. If the Kafka broker is invalid or missing, you will get an org.apache.kafka.common.KafkaException.
You can change the default behaviour of the container factory by setting the autoStartup property to false. One way to do this is by adding autoStartup = "false" element to your KafkaListener annotation:
@KafkaListener(topics = "some_topic", autoStartup = "false")
public void fooEventListener() {
    // ...
}
Now your Spring Boot application will start. You will still get an error when trying to use the KafkaListener if the broker is still down or invalid, but you will be able to handle the error within your Java code instead of having the Spring Boot application fail at startup.
See the documentation of the KafkaListener autoStartup element.
It has to be mentioned that the error you are receiving (TimeoutException) is not caused by the broker being down; it is what Kafka throws when the buffer is full.
The batched records will then be removed from the queue and will not be delivered to the broker. This error is not the reason why your application fails to start.

What's the correct exception type to NACK a message using annotation based listener in spring-boot with amqp?

I'm using spring boot with spring-amqp and annotation based listener to consume message from a rabbitmq broker.
I've a spring component which contains a method like this:
@RabbitListener(queues = "tasks")
public void receiveMessage(@Payload Task task) {...}
I'm using the AUTO mode to acknowledge messages after successful execution of receiveMessage(...). If I detect a special error, I throw AmqpRejectAndDontRequeueException to get the message into a configured dead letter queue. Now I need to only nack a message, so that it gets requeued into the main RabbitMQ queue and another consumer has the chance to work on it again.
Which exception should I throw for that? I would prefer not to use channel.basicNack(...) as described here (http://docs.spring.io/spring-integration/reference/html/amqp.html) if possible.
As long as defaultRequeueRejected is true (the default) in the container factory, throwing any exception other than AmqpRejectAndDontRequeueException will cause the message to be rejected and requeued.
The exception must not have an AmqpRejectAndDontRequeueException in its cause chain (the container traverses the causes to ensure there is no such exception).
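The cause-chain rule above can be pictured with a small sketch. This is not the container's actual code: CauseChain and RejectAndDontRequeue are hypothetical names, with RejectAndDontRequeue standing in for AmqpRejectAndDontRequeueException so that the example compiles without Spring AMQP on the classpath.

```java
/** Hypothetical illustration of the requeue decision; not Spring AMQP's implementation. */
final class CauseChain {

    /** Stand-in for AmqpRejectAndDontRequeueException. */
    static final class RejectAndDontRequeue extends RuntimeException {
        RejectAndDontRequeue(String message) {
            super(message);
        }
    }

    private CauseChain() {
    }

    /**
     * Returns false if the thrown exception or any of its causes is a RejectAndDontRequeue;
     * otherwise falls back to the defaultRequeueRejected setting.
     */
    static boolean shouldRequeue(Throwable thrown, boolean defaultRequeueRejected) {
        for (Throwable t = thrown; t != null; t = t.getCause()) {
            if (t instanceof RejectAndDontRequeue) {
                return false;
            }
        }
        return defaultRequeueRejected;
    }
}
```

Under these assumptions, throwing a plain IllegalStateException from the listener leads to a requeue when defaultRequeueRejected is true, while wrapping a RejectAndDontRequeue inside another exception still prevents it.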

Resources