Spring Cloud Stream Kafka consumer error handling and retry issues

I need help with an error handling scenario in the Spring Cloud Stream Kafka binder. My application has a Java 8 functional consumer whose binding is specified in application.yaml. The consumer is written as:
@Bean
public Consumer<Message<Transaction>> doProcess() {
    return message -> {
        Transaction transaction = message.getPayload();
        if (true) {
            throw new RuntimeException("exception!! !!:)");
        }
        Acknowledgment acknowledgment = message.getHeaders().get(KafkaHeaders.ACKNOWLEDGMENT,
                Acknowledgment.class);
        if (acknowledgment != null) {
            System.out.println("Acknowledgment provided");
            acknowledgment.acknowledge();
        }
    };
}
application.yaml:
spring.application.name: appname
spring.cloud.stream:
  function.definition: doProcess
  kafka:
    default.consumer:
      startOffset: latest
      useNativeDecoding: true
    bindings:
      input.consumer.autoCommitOffset: false
  bindings:
    doProcess-in-0:
      destination: kafka.input.topic.name
      group: appGroup
      content-type: application/*+avro
      consumer:
        autoCommitOffset: false
Now I am struggling with error handling and have two issues:
First, I am acknowledging consumption of the message manually rather than setting autoCommitOffset to true. When I set autoCommitOffset to false and test the error scenario, I see odd behaviour: whenever an exception is thrown, the message is retried n times, and this redelivery of the failed message continues even after a restart of the service (if the restart happens before the n retries have completed). Once the n retries are done, the message is not picked up again, even after a restart. So does that mean the consumer commits the offset after the n retries/redeliveries of the message? That should not be the case, since autoCommitOffset is false.
Note: I have not configured any DLQ.
Second, we need to write a custom exception handler that can catch exceptions from both the application code and the framework, and send a notification to a user group via email in an AWS environment. But we are not able to find an error handler that can catch both types of exceptions: something like extending SeekToCurrentErrorHandler, or any other listener that is called on an error event.
Edit:
As per the solution provided by Gary, we can use the beans below to configure the custom error handler:
@Bean
public ListenerContainerCustomizer<AbstractMessageListenerContainer> MQLCC() {
    System.out.println(String.format("DEBUG: Bean %s has been created.", "MQLCC"));
    return new ListenerContainerCustomizerCustom();
}

private static class ListenerContainerCustomizerCustom implements ListenerContainerCustomizer<AbstractMessageListenerContainer> {

    @Override
    public void configure(AbstractMessageListenerContainer container, String destinationName, String group) {
        System.out.println(String.format("HELLO from container %s, destination: %s, group: %s", container, destinationName, group));
    }
}

The default error handler in the listener container will retry 10 times and then log the error and discard the record; for different behavior you need to configure a custom error handler and recovery strategy. Use a ListenerContainerCustomizer bean to configure the container.
See https://docs.spring.io/spring-kafka/docs/current/reference/html/#default-eh
and https://docs.spring.io/spring-kafka/docs/current/reference/html/#dead-letters
(3.2 and later)
or https://docs.spring.io/spring-kafka/docs/2.7.x/reference/html/#seek-to-current
and https://docs.spring.io/spring-kafka/docs/2.7.x/reference/html/#dead-letters
for earlier versions.
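For example, here is a minimal sketch of such a customizer, assuming spring-kafka 2.8+ and an existing KafkaTemplate (KafkaOperations) bean in the context; it retries a couple of times and then publishes the failed record to a <topic>.DLT dead-letter topic:
import org.springframework.cloud.stream.config.ListenerContainerCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.core.KafkaOperations;
import org.springframework.kafka.listener.AbstractMessageListenerContainer;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

@Bean
public ListenerContainerCustomizer<AbstractMessageListenerContainer<?, ?>> errorHandlerCustomizer(
        KafkaOperations<Object, Object> template) { // assumes a KafkaTemplate bean exists
    // retry twice more after the initial failure, 1 second apart, then publish to <topic>.DLT
    DefaultErrorHandler errorHandler = new DefaultErrorHandler(
            new DeadLetterPublishingRecoverer(template), new FixedBackOff(1000L, 2));
    return (container, destinationName, group) -> container.setCommonErrorHandler(errorHandler);
}
A custom ConsumerRecordRecoverer could be used instead of the DeadLetterPublishingRecoverer to send the email notification mentioned in the question. With spring-kafka 2.3-2.7 the equivalent is container.setErrorHandler(new SeekToCurrentErrorHandler(...)), as shown in the related question below.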

Related

spring cloud stream kafka binder retry

I have a Spring Cloud Stream application with the Kafka binder that consumes and sends messages.
In the application I configure a custom error handler with a retry policy, and add a not-retryable exception to the handler. Configuration example:
@Bean
public ListenerContainerCustomizer<AbstractMessageListenerContainer<?, ?>> customizer(
        SeekToCurrentErrorHandler customErrorHandler) {
    return (container, destinationName, group) -> container.setErrorHandler(customErrorHandler);
}

@Bean
public SeekToCurrentErrorHandler customErrorHandler() {
    var errorHandler = new SeekToCurrentErrorHandler(
            (consumerRecord, e) -> log.error("Got exception, skipping record: {}", consumerRecord, e),
            new FixedBackOff(1000L, 10));
    errorHandler.addNotRetryableException(App.MyCustomException.class);
    return errorHandler;
}
But I see that if the exception is thrown, the application retries processing the message 3 times.
Expected behavior: it should not re-consume the message when App.MyCustomException is thrown.
How do I configure the retry policy for a Spring Cloud Stream Kafka binder application?
Code example here: github
Run the test to reproduce the issue.
The customizations you provide are for the container-level error handler. The binder has a different retry mechanism. You can add the following to your configuration to ensure that the record is not retried when the exception occurs.
spring.cloud.stream:
  bindings:
    processor-in-0:
      ...
      consumer:
        retryableExceptions:
          ru.vichukano.kafka.binder.retry.App.MyCustomException: false
When I tried that, I didn't see the message being re-delivered.
Here are some explanations for this.
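Related to this, the binder's own retry is a RetryTemplate that defaults to 3 attempts, which is why you saw 3 deliveries. If you want the container-level error handler from the customizer to own retries instead, a common approach (a sketch for the same binding) is to disable binder retry entirely:
spring.cloud.stream:
  bindings:
    processor-in-0:
      consumer:
        maxAttempts: 1 # a single delivery attempt at the binder level; the container error handler then decides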

Kafka stream does not retry on deserialisation error

Spring Cloud Stream Kafka Streams does not retry upon a deserialization error, even with specific configuration. The expectation is that it should retry based on the configured retry policy and, in the end, push the failed message to the DLQ.
The configuration is as below.
spring.cloud.stream.bindings.input_topic.consumer.maxAttempts=7
spring.cloud.stream.bindings.input_topic.consumer.backOffInitialInterval=500
spring.cloud.stream.bindings.input_topic.consumer.backOffMultiplier=10.0
spring.cloud.stream.bindings.input_topic.consumer.backOffMaxInterval=100000
spring.cloud.stream.bindings.iinput_topic.consumer.defaultRetryable=true
public interface MyStreams {
    String INPUT_TOPIC = "input_topic";
    String INPUT_TOPIC2 = "input_topic2";
    String ERROR = "apperror";
    String OUTPUT = "output";

    @Input(INPUT_TOPIC)
    KStream<String, InObject> inboundTopic();

    @Input(INPUT_TOPIC2)
    KStream<Object, InObject> inboundTOPIC2();

    @Output(OUTPUT)
    KStream<Object, outObject> outbound();

    @Output(ERROR)
    MessageChannel outboundError();
}

@StreamListener(MyStreams.INPUT_TOPIC)
@SendTo(MyStreams.OUTPUT)
public KStream<Key, outObject> processSwft(KStream<Key, InObject> myStream) {
    return myStream.mapValues(this::transform);
}
The metadataRetryOperations in KafkaTopicProvisioner.java is always null and hence it creates a new RetryTemplate in the afterPropertiesSet().
public KafkaTopicProvisioner(KafkaBinderConfigurationProperties kafkaBinderConfigurationProperties, KafkaProperties kafkaProperties) {
    Assert.isTrue(kafkaProperties != null, "KafkaProperties cannot be null");
    this.adminClientProperties = kafkaProperties.buildAdminProperties();
    this.configurationProperties = kafkaBinderConfigurationProperties;
    this.normalalizeBootPropsWithBinder(this.adminClientProperties, kafkaProperties, kafkaBinderConfigurationProperties);
}

public void setMetadataRetryOperations(RetryOperations metadataRetryOperations) {
    this.metadataRetryOperations = metadataRetryOperations;
}

public void afterPropertiesSet() throws Exception {
    if (this.metadataRetryOperations == null) {
        RetryTemplate retryTemplate = new RetryTemplate();

        SimpleRetryPolicy simpleRetryPolicy = new SimpleRetryPolicy();
        simpleRetryPolicy.setMaxAttempts(10);
        retryTemplate.setRetryPolicy(simpleRetryPolicy);

        ExponentialBackOffPolicy backOffPolicy = new ExponentialBackOffPolicy();
        backOffPolicy.setInitialInterval(100L);
        backOffPolicy.setMultiplier(2.0D);
        backOffPolicy.setMaxInterval(1000L);
        retryTemplate.setBackOffPolicy(backOffPolicy);
        this.metadataRetryOperations = retryTemplate;
    }
}
The retry configuration only works with MessageChannel-based binders. With the KStream binder, Spring just helps with building the topology in a prescribed way; it's not involved with the message flow once the topology is built.
The next version of spring-kafka (used by the binder) has added the RecoveringDeserializationExceptionHandler (commit here); while it can't help with retry, it can be used with a DeadLetterPublishingRecoverer to send the record to a dead-letter topic.
You can use a RetryTemplate within your processors/transformers to retry specific operations.
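For example, a minimal sketch of that last suggestion, reusing the processSwft() method and transform() call from the question (the retry policy and back-off values are illustrative, and in practice the RetryTemplate would be built once, e.g. as a bean):
import org.springframework.retry.backoff.FixedBackOffPolicy;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

@StreamListener(MyStreams.INPUT_TOPIC)
@SendTo(MyStreams.OUTPUT)
public KStream<Key, outObject> processSwft(KStream<Key, InObject> myStream) {
    RetryTemplate retryTemplate = new RetryTemplate();
    retryTemplate.setRetryPolicy(new SimpleRetryPolicy(3));  // at most 3 attempts per record
    FixedBackOffPolicy backOff = new FixedBackOffPolicy();
    backOff.setBackOffPeriod(500L);                          // 500 ms between attempts
    retryTemplate.setBackOffPolicy(backOff);
    // retry only the individual operation inside the Streams processor
    return myStream.mapValues(value -> retryTemplate.execute(context -> transform(value)));
}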
Spring cloud Kafka stream does not retry upon deserialization error even after specific configuration.
The behavior you are seeing matches the default settings of Kafka Streams when it encounters a deserialization error.
From https://docs.confluent.io/current/streams/faq.html#handling-corrupted-records-and-deserialization-errors-poison-pill-records:
LogAndFailExceptionHandler implements DeserializationExceptionHandler and is the default setting in Kafka Streams. It handles any encountered deserialization exceptions by logging the error and throwing a fatal error to stop your Streams application. If your application is configured to use LogAndFailExceptionHandler, then an instance of your application will fail-fast when it encounters a corrupted record by terminating itself.
I am not familiar with Spring's facade for Kafka Streams, but you probably need to configure the desired org.apache.kafka.streams.errors.DeserializationExceptionHandler, instead of configuring retries (they are meant for a different purpose). Or, you may want to implement your own, custom handler (see link above for more information), and then configure Spring/KStreams to use it.
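For example, with the Spring Cloud Stream Kafka Streams binder you can pass the standard Kafka Streams property through the binder configuration to switch to a non-fatal handler (a sketch; the exact property path may vary between binder versions):
spring.cloud.stream.kafka.streams.binder.configuration.default.deserialization.exception.handler=org.apache.kafka.streams.errors.LogAndContinueExceptionHandler
This logs and skips the poison record instead of failing fast; a custom DeserializationExceptionHandler implementation can be plugged in the same way.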

Spring-Cloud-Streams Kafka - How to stop the consumers

I have a Spring-Cloud-Streams client reading from a Kafka topic consisting of several partitions. The client calls a webservice for every Kafka message it reads. If the webservice is unavailable after a few retries, I want to stop the consumer from reading from Kafka. Referring to a previous Stack Overflow question (Spring cloud stream kafka pause/resume binders), I autowired BindingsEndpoint and call the changeState() method to try to stop the consumer, but the logs show the consumer continuing to read messages from Kafka after changeState() is invoked.
I am using Spring Boot version 2.1.2.RELEASE with Spring Cloud version Greenwich.RELEASE. The managed version for spring-cloud-stream-binder-kafka is 2.1.0.RELEASE. I have set the properties autoCommitOffset=true and autoCommitOnError=false.
Below is a snippet of my code. Is there something I have missed? Is the first input parameter to changeState() supposed to be the topic name?
If I want the consumer application to exit when the webservice is not available, can I simply do System.exit() without needing to stop the consumer first?
@Autowired
private BindingsEndpoint bindingsEndpoint;
...
...
@StreamListener(MyInterface.INPUT)
public void read(@Payload MyDTO dto,
                 @Header(KafkaHeaders.RECEIVED_TOPIC) String topic,
                 @Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition,
                 @Header(KafkaHeaders.CONSUMER) Consumer<?, ?> consumer) {
    try {
        logger.info("Processing message " + dto);
        process(dto); // this is the method that calls the webservice
    } catch (Exception e) {
        if (e instanceof IllegalStateException || e instanceof ConnectException) {
            bindingsEndpoint.changeState("my.topic.name", BindingsEndpoint.State.STOPPED);
            // Binding<?> b = bindingsEndpoint.queryState("my.topic.name"); ==> Using topic name returns a valid Binding object
        }
        e.printStackTrace();
        throw (e);
    }
}
You can do so by utilising the binding visualization and control feature, which lets you visualize as well as stop/start/pause/resume bindings.
Also, you are aware that System.exit() will shut down the entire JVM?
Had the same issue; the first input parameter to changeState() should be the binding name, not the topic name. It worked for me.
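For example, based on the code in the question (MyInterface.INPUT is the binding/channel name declared on the @StreamListener):
// stop the binding by its name (the channel name), not by the Kafka topic name
bindingsEndpoint.changeState(MyInterface.INPUT, BindingsEndpoint.State.STOPPED);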

Spring Boot microservice @StreamListener retries unlimited times when it throws RuntimeException

I have a @StreamListener method that performs a REST call. When the REST call returns an exception, the @StreamListener method throws a RuntimeException and a retry is performed. The @StreamListener method retries an unlimited number of times when it throws a RuntimeException.
Spring Cloud Stream Retry configuration:
spring.cloud.stream.kafka.bindings.inputChannel.consumer.enableDlq=true
spring.cloud.stream.bindings.inputChannel.consumer.maxAttempts=3
spring.cloud.stream.bindings.inputChannel.consumer.concurrency=3
spring.cloud.stream.bindings.inputChannel.consumer.backOffInitialInterval=300000
spring.cloud.stream.bindings.inputChannel.consumer.backOffMaxInterval=600000
SpringBoot microservice dependencies version:
Spring Boot 2.0.3
Spring Cloud Stream Elmhurst.RELEASE
Kafka broker 1.1.0
Using a RetryTemplate or increasing the maxAttempts property has the restriction that retries must complete within max.poll.interval.ms; otherwise the Kafka broker will think that the consumer is down and will reassign the partition to another consumer (if available).
Another option is to make the listener re-read the same message from Kafka using the consumer.seek method.
#StreamListener("events")
public void handleEvent(#Payload String eventString, #Header(KafkaHeaders.CONSUMER) Consumer<?, ?> consumer,
#Header(KafkaHeaders.RECEIVED_PARTITION_ID) String partitionId,
#Header(KafkaHeaders.RECEIVED_TOPIC) String topic,
#Header(KafkaHeaders.OFFSET) String offset) {
try {
//do the logic (example: REST call)
} catch (Exception e) { // Catch only specific exceptions that can be retried
consumer.seek(new TopicPartition(topic, Integer.parseInt(partitionId)), Long.parseLong(offset));
}
}
You can certainly increase the number of attempts (maxAttempts property) to something like Integer.MAX_VALUE, or you can provide an instance of your own RetryTemplate bean which could be configured as you wish.
Here is where you can get more info https://docs.spring.io/spring-cloud-stream/docs/current/reference/htmlsingle/#_retry_template
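For example (illustrative, using the channel name from the question's configuration):
spring.cloud.stream.bindings.inputChannel.consumer.maxAttempts=2147483647
2147483647 is Integer.MAX_VALUE; combined with the back-off settings this retries more or less indefinitely, subject to the max.poll.interval.ms caveat mentioned above.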
After a bit of trial and error, we found out that the Kafka configuration max.poll.interval.ms defaults to 5 minutes. Due to our consumer retry mechanism, our whole retry process can take 15 minutes in the worst-case scenario.
So 5 minutes after the first message is consumed, the broker decides the consumer is not responding, triggers a rebalance, and assigns the partition (and the same message) to another consumer.
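If the retry window genuinely needs to exceed those 5 minutes, one option is to raise max.poll.interval.ms through the binder's per-binding consumer configuration (a sketch; the channel name comes from the question and the value is illustrative):
spring.cloud.stream.kafka.bindings.inputChannel.consumer.configuration.max.poll.interval.ms=960000
960000 ms (16 minutes) would comfortably cover the 15-minute worst case described above.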

Spring AMQP Consumer mysteriously dropping connection to queue

We're using spring-amqp 1.5.2, with RabbitMQ version 3.5.3. All queues work fine and we have consumers listening on them with no issues, except one consumer which keeps on dropping connections mysteriously. spring-amqp auto recovers, but after a few hours the consumers are disconnected and never come back up.
The queue is declared as
@Bean
public Queue analyzeTransactionsQueue() {
    Map<String, Object> args = new HashMap<>();
    args.put("x-message-ttl", 60000);
    return new Queue("analyze.txns", true, false, false, args);
}
Other queues are declared in a similar fashion, and have no issues.
The consumer (listener) is declared as
@Bean
public SimpleRabbitListenerContainerFactory analyzeTransactionListenerContainerFactory(ConnectionFactory connectionFactory, AsyncTaskExecutor asyncTaskExecutor) {
    SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
    factory.setConnectionFactory(connectionFactory);
    factory.setConcurrentConsumers(2);
    factory.setMaxConcurrentConsumers(4);
    factory.setTaskExecutor(asyncTaskExecutor);
    ConsumerTagStrategy consumerTagStrategy = new ConsumerTagStrategy() {
        @Override
        public String createConsumerTag(String queue) {
            return queue;
        }
    };
    factory.setConsumerTagStrategy(consumerTagStrategy);
    return factory;
}
Again, other consumers having no issues are declared in a similar fashion.
The code after the message is received has no exceptions. Even after turning on DEBUG logging for SimpleMessageListenerContainer, there are no errors in the logs.
LogLevel=DEBUG; category=org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer; msg=Cancelling Consumer: tags=[{}], channel=Cached Rabbit Channel: AMQChannel(amqp://guest@10.17.1.13:5672/,47), acknowledgeMode=AUTO local queue size=0;
LogLevel=DEBUG; category=org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer; msg=Idle consumer terminating: Consumer: tags=[{}], channel=Cached Rabbit Channel: AMQChannel(amqp://guest@10.17.1.13:5672/,47), acknowledgeMode=AUTO local queue size=0;
Any ideas on why this would be happening? I have tried DEBUG logging but to no avail.
One thing I have observed is that the consumer will disconnect if there's an exception during parsing, and it doesn't always log the problem, depending on your logging config...
Since then, I always wrap the handleDelivery method in a try/catch, to get better logging and no connection drops:
consumer = new DefaultConsumer(channel) {
    @Override
    public void handleDelivery(String consumerTag,
                               Envelope envelope,
                               AMQP.BasicProperties properties,
                               byte[] body) throws IOException {
        log.info("processing message - content : " + new String(body, "UTF-8"));
        try {
            MyEvent myEvent = objectMapper.readValue(new String(body, "UTF-8"), MyEvent.class);
            processMyEvent(myEvent);
        } catch (Exception exp) {
            log.error("couldn't process " + MyEvent.class + " message : ", exp);
        }
    }
};
Looking at the way you have configured things, it is pretty obvious that you have enabled dynamic scaling of consumers.
factory.setConcurrentConsumers(2);
factory.setMaxConcurrentConsumers(4);
There was a threading issue, which I submitted a fix for, that caused the number of consumers to drop to zero. This was happening while consumers were scaling down.
By the looks of it, you have been a victim of that problem. The fix has, I believe, been back-ported and can be seen here.
Try using the latest version and see whether you get the same problem.
