I've been trying to implement retry logic for Spring Cloud Stream Kafka so that, if an exception is thrown while consuming an event from the topic sample-topic, it retries two more times.
I added the following configuration to the application.properties file:
spring.cloud.stream.bindings.processSampleEvent.destination=sample-topic
spring.cloud.stream.bindings.processSampleEvent.content-type=application/json
spring.cloud.stream.bindings.processSampleEvent.consumer.maxAttempts=2
I've written the listener so that it simply logs the received message and throws a NullPointerException, which lets me test the retry.
@StreamListener(ListenerBind.SAMPLE_CHANNEL)
public void processSampleEvent(String productEventDto) {
    System.out.println("Entering listener: " + productEventDto);
    throw new NullPointerException();
}
But when I test it by producing an event to sample-topic, the logs show the event being retried 20 times, even though I've specified in the properties to try only two times. Something weird also happens when I change it to 3: it retries 30 times.
I'm pretty new to Spring Cloud Stream, and any help on this would be much appreciated.
Thanks in Advance 😊
The default error handler in the listener container is now a SeekToCurrentErrorHandler with 10 delivery attempts. The binder's retries run inside each of those deliveries, so maxAttempts=2 gives 2 × 10 = 20 listener invocations, and maxAttempts=3 gives 30.
You can either disable the retries in the binder (maxAttempts=1) and configure a SeekToCurrentErrorHandler (STCEH) with the retry semantics you want, or keep the retries in the binder and replace the default error handler with a simple LoggingErrorHandler.
To configure the container's error handler, add a ListenerContainerCustomizer<AbstractMessageListenerContainer<?, ?>> @Bean.
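The 20 and 30 from the question come from multiplying the two retry layers. Here is a minimal stdlib-only model (not Spring code; `NestedRetryModel` and `countInvocations` are hypothetical names). Each container-level delivery runs the binder's retry loop, which calls an always-failing listener up to `maxAttempts` times:

```java
// Hypothetical model of the two nested retry layers; no Spring classes involved.
final class NestedRetryModel {

    /**
     * Counts listener invocations for a listener that always throws.
     * Outer loop: deliveries made by the container's error handler (10 by default).
     * Inner loop: attempts made by the binder's retry (consumer.maxAttempts).
     */
    static int countInvocations(int binderMaxAttempts, int containerDeliveries) {
        int invocations = 0;
        for (int delivery = 0; delivery < containerDeliveries; delivery++) {
            for (int attempt = 0; attempt < binderMaxAttempts; attempt++) {
                invocations++; // the listener runs and throws every time
            }
        }
        return invocations;
    }

    public static void main(String[] args) {
        System.out.println(countInvocations(2, 10)); // 20, as observed in the question
        System.out.println(countInvocations(3, 10)); // 30
        System.out.println(countInvocations(1, 3));  // 3: binder retries off, handler makes 3 deliveries
    }
}
```

Setting one of the two factors to 1 is what makes the total predictable. That is why both suggested fixes disable one layer.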
I faced the same problem.
My working solution was to create a ListenerContainerCustomizer bean that installs an error handler with the desired number of max attempts, and to set the consumer binding's maxAttempts to 1, which disables the binder's own retries:
@Bean
public ListenerContainerCustomizer<AbstractMessageListenerContainer<?, ?>> listenerContainerCustomizer() {
    return (container, dest, group) ->
            container.setErrorHandler(containerAwareErrorHandler());
}

// maxAttempts is a field holding the desired total number of deliveries.
// FixedBackOff counts retries after the first failure, hence maxAttempts - 1.
public SeekToCurrentErrorHandler containerAwareErrorHandler() {
    return new SeekToCurrentErrorHandler(new FixedBackOff(0L, maxAttempts - 1));
}
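The `maxAttempts - 1` is needed because Spring's `FixedBackOff(interval, maxAttempts)` counts retries after the first failure, not total deliveries. The 10-delivery default mentioned above corresponds to `new FixedBackOff(0L, 9L)`. The following stdlib-only model (hypothetical class, not the Spring implementation) shows that counting:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how an error handler consumes a FixedBackOff:
// the second argument is the number of retries, so deliveries = retries + 1.
final class FixedBackOffModel {

    private final long intervalMs;
    private final long maxRetries;

    FixedBackOffModel(long intervalMs, long maxRetries) {
        this.intervalMs = intervalMs;
        this.maxRetries = maxRetries;
    }

    /** Delays the handler waits before each redelivery. */
    List<Long> delays() {
        List<Long> result = new ArrayList<>();
        for (long retry = 0; retry < maxRetries; retry++) {
            result.add(intervalMs);
        }
        return result;
    }

    /** The first delivery plus one redelivery per back-off interval. */
    long totalDeliveries() {
        return 1 + delays().size();
    }

    public static void main(String[] args) {
        int maxAttempts = 3; // desired total deliveries, like the answer's maxAttempts field
        FixedBackOffModel backOff = new FixedBackOffModel(0L, maxAttempts - 1);
        System.out.println(backOff.delays());          // [0, 0]
        System.out.println(backOff.totalDeliveries()); // 3
    }
}
```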
Related
I am using Spring Boot (version 2.7.1) with the Spring Cloud Stream Kafka binder (2.8.5) to process Kafka messages.
I have a functional-style consumer that consumes messages in batches. Right now it retries 10 times and then commits the offsets of the failed records.
I now want to retry a certain number of times (this works with the error handler below), then stop processing messages and fail the entire batch without committing its offsets.
I read through the documentation and understand that CommonContainerStoppingErrorHandler can be used to stop the container from consuming messages.
My handler currently looks like this, and it retries with exponential backoff:
@Bean
public ListenerContainerCustomizer<AbstractMessageListenerContainer<String, Message>> errorHandler() {
    return (container, destinationName, group) -> {
        container.getContainerProperties().setAckMode(ContainerProperties.AckMode.BATCH);
        // 2 retries after the first failure; intervals are in milliseconds
        ExponentialBackOffWithMaxRetries backOffWithMaxRetries = new ExponentialBackOffWithMaxRetries(2);
        backOffWithMaxRetries.setInitialInterval(1);
        backOffWithMaxRetries.setMultiplier(2.0);
        backOffWithMaxRetries.setMaxInterval(5);
        container.setCommonErrorHandler(new DefaultErrorHandler(backOffWithMaxRetries));
    };
}
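With these settings the delays are tiny, because all intervals are in milliseconds. This stdlib-only sketch approximates the schedule an exponential back-off produces: multiply by the multiplier, cap at the max interval, and stop after `maxRetries`. It is a hypothetical model, not Spring's `ExponentialBackOff` code:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical approximation of the delays from ExponentialBackOffWithMaxRetries(maxRetries).
final class ExponentialScheduleModel {

    static List<Long> delays(int maxRetries, long initialIntervalMs, double multiplier, long maxIntervalMs) {
        List<Long> result = new ArrayList<>();
        long next = Math.min(initialIntervalMs, maxIntervalMs);
        for (int retry = 0; retry < maxRetries; retry++) {
            result.add(next);
            // grow the interval, but never beyond the configured maximum
            next = Math.min((long) (next * multiplier), maxIntervalMs);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same values as the customizer above
        System.out.println(delays(2, 1, 2.0, 5)); // [1, 2]
        // With more retries, the cap at 5 ms kicks in
        System.out.println(delays(4, 1, 2.0, 5)); // [1, 2, 4, 5]
    }
}
```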
How do I chain CommonContainerStoppingErrorHandler with the above error handler, so that the failed batch is not committed and is replayed upon restart?
When a BatchListenerFailedException is thrown from the consumer, is it possible to fail the entire batch (including the valid records that precede the problematic record)?
Add a custom recoverer to the error handler - see this answer for an example: How do you exit spring boot application programmatically when retries are exhausted, to prevent kafka offset commit
No; records before the failed one will have their offsets committed.
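The two answers combine as follows. This is a hypothetical stdlib model, not Spring API, of where the committed offset ends up when a batch listener reports the failing index and retries are exhausted. It assumes the linked approach of a recoverer that stops the container and throws. Kafka's committed offset is the next offset to read, so committing records 100 to 102 means committing 103:

```java
// Hypothetical model of the committed offset after retries are exhausted for the
// record at failedIndex (e.g. reported via BatchListenerFailedException).
final class BatchCommitModel {

    /** The next offset that will be committed, and whether the consumer keeps running. */
    record Outcome(long committedOffset, boolean consumerRunning) {}

    static Outcome afterRetriesExhausted(long firstOffset, int failedIndex, boolean recovererThrows) {
        // Records before the failed one are committed in either case.
        long committed = firstOffset + failedIndex;
        if (recovererThrows) {
            // Recovery failed and the container is stopped: the failed record (and the rest
            // of the batch) stays uncommitted and is consumed again after a restart.
            return new Outcome(committed, false);
        }
        // Recovery succeeded (e.g. logged): the failed record is committed as well.
        return new Outcome(committed + 1, true);
    }

    public static void main(String[] args) {
        System.out.println(afterRetriesExhausted(100, 3, true));  // Outcome[committedOffset=103, consumerRunning=false]
        System.out.println(afterRetriesExhausted(100, 3, false)); // Outcome[committedOffset=104, consumerRunning=true]
    }
}
```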
I am trying to implement a DLQ using Spring Cloud Stream with batch mode enabled:
@Bean
public ListenerContainerCustomizer<AbstractMessageListenerContainer<?, ?>> customizer(BatchErrorHandler handler) {
    return (container, destinationName, group) -> {
        // only containers for DLQ-enabled topics get the recovering handler
        if (dlqEnabledTopic.contains(destinationName)) {
            container.setBatchErrorHandler(handler);
        }
    };
}

@Bean
public BatchErrorHandler batchErrorHandler(KafkaOperations<String, byte[]> kafkaOperations) {
    // failed records go to "<topic>_dlq", on the same partition as the original record
    CustomDeadLetterPublishingRecoverer recoverer = new CustomDeadLetterPublishingRecoverer(kafkaOperations,
            (cr, e) -> new TopicPartition(cr.topic() + "_dlq", cr.partition()));
    // one retry after 1 second, then recover
    return new RecoveringBatchErrorHandler(recoverer, new FixedBackOff(1000, 1));
}
However, I have a few questions:
How do I configure the key/value serializers using properties? My message is a String, but KafkaOperations is using ByteArraySerializer.
A batch contains multiple messages, but when the first message failed, it went to the DLQ and I don't see the next message being processed.
Requirement: whichever index in the batch fails, only that message should be sent to the DLQ, and the rest of the messages should be processed again.
Is DLQ now supported in batch mode, just as it can be enabled with properties in record mode?
Use the spring.kafka.producer.* properties. However, the DLT publishing should use the same serializers as the main stream app, so ByteArraySerializer is generally correct.
The recovering batch error handler will perform seeks for the unprocessed records, so they will be returned by the next poll. Debug logging should help you figure out what's wrong. If you can't figure it out, provide a minimal, complete, reproducible example (MCRE) that exhibits the behavior you are seeing.
No; the binder does not support DLQ for batch mode; configuring the error handler is the correct approach.
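To make the expected split concrete, here is a hypothetical stdlib model (not spring-kafka code) of what a recovering batch error handler does with one batch. It assumes the listener throws a BatchListenerFailedException identifying the failed record, and it reuses the question's `topic + "_dlq"`, same-partition destination. The topic name `orders` is made up:

```java
import java.util.List;

// Hypothetical model of how a recovering batch error handler splits a batch
// when the record at failedIndex keeps failing.
final class RecoveringBatchModel {

    record Rec(String topic, int partition, long offset, String value) {}

    record Split(List<Rec> committed, Rec deadLettered, String dlqTopic, int dlqPartition, List<Rec> redelivered) {}

    static Split split(List<Rec> batch, int failedIndex) {
        Rec failed = batch.get(failedIndex);
        return new Split(
                batch.subList(0, failedIndex),                 // processed before the failure: committed
                failed,                                        // retried per the BackOff, then recovered
                failed.topic() + "_dlq",                       // same naming as the question's resolver
                failed.partition(),
                batch.subList(failedIndex + 1, batch.size())); // re-seeked, returned by the next poll
    }

    public static void main(String[] args) {
        List<Rec> batch = List.of(
                new Rec("orders", 0, 10, "a"),
                new Rec("orders", 0, 11, "b"),
                new Rec("orders", 0, 12, "c"));
        Split s = split(batch, 1);
        System.out.println(s.committed().size() + " committed, dead-lettered to "
                + s.dlqTopic() + "-" + s.dlqPartition() + ", " + s.redelivered().size() + " redelivered");
        // prints: 1 committed, dead-lettered to orders_dlq-0, 1 redelivered
    }
}
```

If the remaining records never show up again in the real application, debug logging around these seeks is the place to look, as the answer suggests.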
We have implemented an @SqsListener, since the Spring Cloud AWS documentation suggests it is the best way to receive AWS SQS messages:
There are two ways for receiving SQS messages, either use the receive
methods of the QueueMessagingTemplate or with annotation-driven
listener endpoints. The latter is by far the more convenient way to
receive messages.
Everything is working as expected. If the business process fails, we throw a runtime exception, so the message is not deleted and stays in the SQS queue for retry. When the visibility timeout has passed, the message reappears to the worker for processing.
Here is the sample code:
@SqsListener(value = "sample-standard-queue", deletionPolicy = SqsMessageDeletionPolicy.ON_SUCCESS)
public void receiveMessage(String message) {
    log.info("Message Received **************************** " + message);
    log.info("After Conversion" + new JSONObject(message).getString("payload"));
    throw new RuntimeException("An exception was thrown during the execution of the SQS listener method and Message will be still available in Queue");
}
But there are some examples where "Acknowledgment" is used instead of throwing a runtime exception, and the documentation doesn't suggest that.
Which is the best way to deal with a business logic failure? Is Acknowledgment necessary?
Thanks in advance.
One way is to keep track of the messages being processed in some RDS table. If a message gets retried, increase the retry count in the table for that particular message.
There should be a configured number of retries for any particular message. Once it is reached, you may want to move the message to a dead-letter queue, or simply log and discard it.
There are multiple ways of handling it. One way is:
@SqsListener(value = "sample-standard-queue", deletionPolicy = SqsMessageDeletionPolicy.ON_SUCCESS)
public void receiveMessage(String message) {
    try {
        log.info("Message Received **************************** " + message);
        log.info("After Conversion" + new JSONObject(message).getString("payload"));
    } catch (Exception e) {
        // check whether its retry count has been exhausted
        // if exhausted: acknowledge it (push it to the dead-letter queue), return, and don't throw
        // if not exhausted: increase the retry count in the table before throwing the exception
        throw new RuntimeException("An exception was thrown during the execution of the SQS listener method and Message will be still available in Queue");
    }
}
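The decision in the catch block can be sketched as follows. An in-memory `ConcurrentHashMap` stands in for the RDS table, the message key is assumed to be some ID you can extract from the message, and `RetryTracker`/`onFailure` are hypothetical names:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the retry-count idea; the map stands in for the RDS table.
final class RetryTracker {

    enum Decision { RETHROW_FOR_REDELIVERY, SEND_TO_DLQ }

    private final Map<String, Integer> failures = new ConcurrentHashMap<>();
    private final int maxRetries;

    RetryTracker(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /** Records one more failure for messageId and decides what the listener should do. */
    Decision onFailure(String messageId) {
        int count = failures.merge(messageId, 1, Integer::sum);
        if (count > maxRetries) {
            failures.remove(messageId);  // stop tracking this message
            return Decision.SEND_TO_DLQ; // acknowledge instead of throwing
        }
        return Decision.RETHROW_FOR_REDELIVERY; // throw, so SQS redelivers after the visibility timeout
    }

    public static void main(String[] args) {
        RetryTracker tracker = new RetryTracker(2);
        for (int i = 0; i < 3; i++) {
            // RETHROW_FOR_REDELIVERY, RETHROW_FOR_REDELIVERY, SEND_TO_DLQ
            System.out.println(tracker.onFailure("msg-1"));
        }
    }
}
```

With `maxRetries = 2`, a message is attempted three times in total (the first delivery plus two retries) before it is dead-lettered.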
I'm hoping this is a simple configuration issue but I can't seem to figure out what it might be.
Set-up
Spring Boot 2.2.2.RELEASE
cloud-starter
cloud-starter-aws
spring-jms
spring-cloud-dependencies Hoxton.SR1
amazon-sqs-java-messaging-lib 1.0.8
Problem
My application starts up fine and begins to process messages from Amazon SQS. After some time, I see the following warning:
2020-02-01 04:16:21.482 LogLevel=WARN 1 --- [ecutor-thread14] o.s.j.l.DefaultMessageListenerContainer : Number of scheduled consumers has dropped below concurrentConsumers limit, probably due to tasks having been rejected. Check your thread pool configuration! Automatic recovery to be triggered by remaining consumers.
The above warning gets printed multiple times, and eventually I see the following two INFO messages:
2020-02-01 04:17:51.552 LogLevel=INFO 1 --- [ecutor-thread40] c.a.s.javamessaging.SQSMessageConsumer : Shutting down ConsumerPrefetch executor
2020-02-01 04:18:06.640 LogLevel=INFO 1 --- [ecutor-thread40] com.amazon.sqs.javamessaging.SQSSession : Shutting down SessionCallBackScheduler executor
These two messages are displayed several times, and at some point no more messages are consumed from SQS. I don't see anything else in my log that indicates an issue, but my handlers (I have 2~) stop logging that they are processing messages, and I can see the AWS SQS queue growing in both message count and message age.
~: This exact code was working fine when I had a single handler; this problem started when I added the second one.
Configuration/Code
I realize the first warning is caused by the concurrency of the ThreadPoolTaskExecutor, but I cannot get a configuration that works properly. Here is my current configuration for the JMS setup. I have tried various max pool sizes with no real effect, other than the warnings starting sooner or later depending on the pool size:
public ThreadPoolTaskExecutor asyncAppConsumerTaskExecutor() {
    ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
    taskExecutor.setThreadGroupName("asyncConsumerTaskExecutor");
    taskExecutor.setThreadNamePrefix("asyncConsumerTaskExecutor-thread");
    taskExecutor.setCorePoolSize(10);
    // Allow the thread pool to grow up to 4 times the core size, evidently not
    // having the pool be larger than the max concurrency causes the JMS queue
    // to barf on itself with messages like
    // "Number of scheduled consumers has dropped below concurrentConsumers limit, probably due to tasks having been rejected. Check your thread pool configuration! Automatic recovery to be triggered by remaining consumers"
    taskExecutor.setMaxPoolSize(10 * 4);
    taskExecutor.setQueueCapacity(0); // do not queue up messages
    taskExecutor.setWaitForTasksToCompleteOnShutdown(true);
    taskExecutor.setAwaitTerminationSeconds(60);
    return taskExecutor;
}
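Here is why `setQueueCapacity(0)` turns a too-small pool into rejected tasks. As far as I know, Spring's ThreadPoolTaskExecutor backs a capacity of 0 with a `SynchronousQueue`, so a task is accepted only if a thread can run it right away. The scaled-down, stdlib-only demo below (hypothetical names) submits long-running tasks, standing in for the container's consumer loops, to such a pool. Once all `maxPoolSize` threads are busy, further submissions are rejected, which matches the "tasks having been rejected" wording of the warning:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Stdlib demo: a pool without a queue rejects work once every thread up to maxPoolSize is busy.
final class ZeroQueueRejectionDemo {

    /** Submits `tasks` long-running tasks and returns how many were rejected. */
    static int countRejected(int corePoolSize, int maxPoolSize, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                corePoolSize, maxPoolSize, 60, TimeUnit.SECONDS, new SynchronousQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // like a consumer loop: holds its thread until released
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // no idle thread and no queue slot: the default AbortPolicy rejects
                }
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // Scaled down: 6 long-lived consumers asking for a pool of at most 4 threads
        System.out.println(countRejected(2, 4, 6)); // 2
    }
}
```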
Here is the JMS container factory we create:
public DefaultJmsListenerContainerFactory jmsListenerContainerFactory(SQSConnectionFactory sqsConnectionFactory, ThreadPoolTaskExecutor asyncConsumerTaskExecutor) {
    DefaultJmsListenerContainerFactory factory = new DefaultJmsListenerContainerFactory();
    factory.setConnectionFactory(sqsConnectionFactory);
    factory.setDestinationResolver(new DynamicDestinationResolver());
    // The JMS processor will start 'concurrency' number of tasks
    // and supposedly will increase this to the max of '10 * 3'
    factory.setConcurrency(10 + "-" + (10 * 3));
    factory.setTaskExecutor(asyncConsumerTaskExecutor);
    // Let the task process 100 messages, default appears to be 10
    factory.setMaxMessagesPerTask(100);
    // Wait up to 5 seconds for a timeout, this keeps the task around a bit longer
    factory.setReceiveTimeout(5000L);
    factory.setSessionAcknowledgeMode(Session.CLIENT_ACKNOWLEDGE);
    return factory;
}
I added the setMaxMessagesPerTask and setReceiveTimeout calls based on things I found on the internet; the problem persists without them and at various settings (50, 2500L, 25, 1000L, etc.).
We create a default SQS connection factory:
public SQSConnectionFactory sqsConnectionFactory(AmazonSQS amazonSQS) {
    return new SQSConnectionFactory(new ProviderConfiguration(), amazonSQS);
}
Finally, the handlers look like this:
@JmsListener(destination = "consumer-event-queue")
public void receiveEvents(String message) throws IOException {
    MyEventDTO myEventDTO = jsonObj.readValue(message, MyEventDTO.class);
    //messageTask.process(myEventDTO);
}

@JmsListener(destination = "myalert-sqs")
public void receiveAlerts(String message) throws IOException, InterruptedException {
    final MyAlertDTO myAlert = jsonObj.readValue(message, MyAlertDTO.class);
    myProcessor.addAlertToQueue(myAlert);
}
As you can see, the first function (receiveEvents) just takes the message from the queue and exits; we have not implemented the processing code for it yet.
In the second function (receiveAlerts), myProcessor.addAlertToQueue creates a Runnable and submits it to a thread pool to be processed at some point in the future.
The problem (the warning, the INFO messages, and the failure to consume messages) only started when we added the receiveAlerts function. Previously the other function was the only one present, and we did not see this behavior.
More
This is part of a larger project and I am working on breaking this code out into a smaller test case to see if I can duplicate this issue. I will post a follow-up with the results.
In the Mean Time
I'm hoping this is just a config issue and someone more familiar with this can tell me what I'm doing wrong, or that someone can provide some thoughts and comments on how to correct this to work properly.
Thank you!
After fighting this one for a bit I think I finally resolved it.
The issue appears to be due to the "DefaultJmsListenerContainerFactory": this factory creates a new "DefaultMessageListenerContainer" for EACH method with a '@JmsListener' annotation. The person who originally wrote the code thought it was only called once for the application, and that the created container would be re-used. So the issue was two-fold:
1. The 'ThreadPoolTaskExecutor' attached to the factory had 40 threads. When the application had one '@JmsListener' method this worked fine, but when we added a second method, each method got 10 threads (a total of 20) for listening. That by itself is fine; however, since we stated that each listener could grow up to 30 consumers, we quickly ran out of threads in that pool. This caused the "Number of scheduled consumers has dropped below concurrentConsumers limit" warning.
2. This is probably obvious given the above, but I wanted to call it out explicitly. In the listener factory we set the concurrency to "10-30"; however, all of the listeners have to share the same thread pool. As such, the max concurrency has to be set so that, if every listener scales up to its maximum, the total doesn't exceed the maximum number of threads in the pool (e.g. with two '@JmsListener' annotated methods and a pool of 40 threads, the max value can be no more than 20).
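The sizing rule above can be written down as arithmetic. This hypothetical plain-Java helper (no Spring) uses the numbers from the question. Two listeners that may each scale to 30 consumers need 60 threads, but the pool allows only 40, so each listener's maximum must be at most 40 / 2 = 20:

```java
// Hypothetical helper for sizing a shared listener executor.
final class ListenerPoolSizing {

    /** Threads needed if every listener scales up to its max concurrency. */
    static int threadsNeeded(int listenerCount, int maxConcurrencyPerListener) {
        return listenerCount * maxConcurrencyPerListener;
    }

    /** Largest per-listener max concurrency that still fits in the shared pool. */
    static int maxConcurrencyThatFits(int poolMaxSize, int listenerCount) {
        return poolMaxSize / listenerCount;
    }

    public static void main(String[] args) {
        int poolMax = 10 * 4; // setMaxPoolSize(10 * 4) from the question
        int listeners = 2;    // receiveEvents and receiveAlerts
        System.out.println(threadsNeeded(listeners, 30) + " threads needed, " + poolMax + " available"); // 60 threads needed, 40 available
        System.out.println("max concurrency per listener: " + maxConcurrencyThatFits(poolMax, listeners)); // 20
    }
}
```

Each new `@JmsListener` method sharing the factory lowers the result of `maxConcurrencyThatFits`, so it is worth re-checking whenever a listener is added.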
Hopefully this helps someone else with a similar issue in the future.
In my Spring Boot/Kafka project I have the following listener:
@KafkaListener(topics = "${kafka.topic.update}", containerFactory = "updateKafkaListenerContainerFactory")
public void onUpdateReceived(ConsumerRecord<String, Update> consumerRecord, Acknowledgment ack) {
    // do some logic
    ack.acknowledge();
}
Inside the listener I need to check a condition according to my business logic and, if it is not met, skip processing of this particular message and let Kafka know to redeliver it.
The reason I need this: according to the business logic of my application, I need to avoid sending more than one post per second to a particular Telegram chat. This is why I'd like to check the chatLastSent time in the Kafka listener and postpone sending the message if needed (via redelivery of the message from this Kafka topic).
How do I do this properly? Do I only need to skip the ack.acknowledge() call this time, or is there another, more proper way to achieve it?
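Here is a minimal sketch of the chatLastSent check, assuming the last-send times can be kept in memory in a single application instance; several consumer instances would need shared storage. `ChatRateGate` and `tryAcquire` are hypothetical names. When `tryAcquire` returns false, the listener is in the situation the question describes: the record should be redelivered rather than acknowledged.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-chat gate: at most one post per second per chat.
final class ChatRateGate {

    private static final Duration MIN_GAP = Duration.ofSeconds(1);
    private final Map<Long, Instant> chatLastSent = new HashMap<>();

    /** Returns true and records the send time if the chat may receive a post at `now`. */
    synchronized boolean tryAcquire(long chatId, Instant now) {
        Instant last = chatLastSent.get(chatId);
        if (last != null && now.isBefore(last.plus(MIN_GAP))) {
            return false; // less than one second since the last post: postpone
        }
        chatLastSent.put(chatId, now);
        return true;
    }

    public static void main(String[] args) {
        ChatRateGate gate = new ChatRateGate();
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        System.out.println(gate.tryAcquire(42L, t0));                 // true
        System.out.println(gate.tryAcquire(42L, t0.plusMillis(500))); // false: too soon for chat 42
        System.out.println(gate.tryAcquire(42L, t0.plusSeconds(1)));  // true: exactly one second later
        System.out.println(gate.tryAcquire(7L, t0.plusMillis(500)));  // true: different chat
    }
}
```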
Use the SeekToCurrentErrorHandler.
When you throw an exception, the container will invoke the error handler which will re-seek the unprocessed messages so they will be fetched again on the next poll.
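To see why throwing is enough, here is a stdlib-only model of the seek the error handler performs. It is not spring-kafka code, and the class is hypothetical: the consumer position is reset to the failed offset, so the next poll fetches that record (and everything after it) again. As noted in an earlier answer, the default handler gives up after 10 delivery attempts, with no delay between them. A redelivery meant to wait out a one-second window would therefore likely need a BackOff with a suitable interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of seek-to-current on a single partition.
final class SeekToCurrentModel {

    private final List<String> partition; // records, indexed by offset
    private long position;                // next offset the consumer will fetch

    SeekToCurrentModel(List<String> partition) {
        this.partition = partition;
    }

    /** One poll: process from the current position; on failure, seek back to the failed offset. */
    List<String> poll(Predicate<String> listenerSucceeds) {
        List<String> processed = new ArrayList<>();
        for (long offset = position; offset < partition.size(); offset++) {
            String record = partition.get((int) offset);
            if (!listenerSucceeds.test(record)) {
                position = offset; // "seek to current": this record is fetched again next poll
                return processed;
            }
            processed.add(record);
            position = offset + 1;
        }
        return processed;
    }

    long position() {
        return position;
    }

    public static void main(String[] args) {
        SeekToCurrentModel consumer = new SeekToCurrentModel(List.of("m0", "m1", "m2"));
        int[] calls = {0};
        // the listener "throws" the first time it sees m1 (e.g. the chat was posted to < 1 s ago)
        Predicate<String> listener = r -> !(r.equals("m1") && calls[0]++ == 0);
        System.out.println(consumer.poll(listener)); // [m0]
        System.out.println(consumer.poll(listener)); // [m1, m2]
    }
}
```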
You can use a RecordFilterStrategy.
See doc here : https://docs.spring.io/spring-kafka/docs/2.0.5.RELEASE/reference/html/_reference.html#_filtering_messages