How to find out there are no more messages in a Kafka topic/partition & read only after writing to the topic is done - spring-boot

I'm using Spring boot version 1.5.4.RELEASE & spring Kafka version 1.3.8.RELEASE.
Some generic questions:
Is there a way to find out, in the consumer, that there are no more messages in a topic/partition?
How do I start the consumer consuming messages from a topic only after writing from the producer is done?

Spring Boot 1.5 is end of life and no longer supported; the current version is 2.2.5.
The latest 1.3.x version of Spring for Apache Kafka is 1.3.10. It will only be supported through the end of this year.
You should plan on upgrading.
You can start and stop containers using the KafkaListenerEndpointRegistry bean; set autoStartup to false on the container factory.
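For example, a minimal sketch, assuming a listener declared with @KafkaListener(id = "myListener", ...) on a factory whose autoStartup is false (the id and class names here are illustrative, not from the question):

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.kafka.config.KafkaListenerEndpointRegistry;
import org.springframework.stereotype.Component;

@Component
public class ConsumerLifecycle {

    @Autowired
    private KafkaListenerEndpointRegistry registry;

    public void startAfterProducerIsDone() {
        // Containers are registered under the @KafkaListener id and can be
        // started on demand, e.g. once the producer signals it has finished.
        registry.getListenerContainer("myListener").start();
    }

    public void stopConsuming() {
        registry.getListenerContainer("myListener").stop();
    }
}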
See Detecting Idle and Non-Responsive Consumers.
While efficient, one problem with asynchronous consumers is detecting when they are idle - users might want to take some action if no messages arrive for some period of time.
You can configure the listener container to publish a ListenerContainerIdleEvent when some time passes with no message delivery. While the container is idle, an event will be published every idleEventInterval milliseconds.
...
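A minimal sketch of reacting to that event (the 30-second interval is illustrative; set it via the container properties of the factory your listener uses):

import org.springframework.context.event.EventListener;
import org.springframework.kafka.event.ListenerContainerIdleEvent;
import org.springframework.stereotype.Component;

@Component
public class IdleEventHandler {

    // Enable publication with, e.g.:
    // factory.getContainerProperties().setIdleEventInterval(30_000L);
    @EventListener
    public void onIdle(ListenerContainerIdleEvent event) {
        // No records were delivered for idleEventInterval ms - the
        // topic/partition may be drained.
        System.out.println("Consumer is idle: " + event);
    }
}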


Spring Kafka consumer removed from consumer group when topic idle

Versions
Spring Boot 1.5.x,
Spring Boot 2.4.x,
Apache Kafka 0.10.2
The Situation
We have two service instances hosted on different servers. Each instance initializes multiple Kafka consumers. All consumers are listening to the same topic and are part of the same consumer group.
We are not relying on Spring Boot/Spring Kafka to configure the ConcurrentKafkaListenerContainerFactory and its DefaultKafkaConsumerFactory. All the consumer configuration properties are set to the default Apache Kafka consumer property values except for max.poll.records, session.timeout.ms, and heartbeat.interval.ms. Acknowledgement mode is set to record.
We are using the @KafkaListener annotation, setting its containerFactory property to the bean name of the initialized ConcurrentKafkaListenerContainerFactory, and setting its topics property.
The Problem
When a topic does not get any messages published to it for a day or two, all consumers are removed from the consumer group.
I can't find any reason for this to happen. From my reading of both the Apache Kafka and Spring Kafka documentation, if poll() is called within max.poll.interval.ms, the consumer is considered alive; and if heartbeats are continuously sent within session.timeout.ms, the consumer is considered alive. According to the documentation, poll() is called continuously and heartbeats are sent at the interval set by heartbeat.interval.ms.
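For reference, the liveness-related consumer properties in play look like this (the values below are illustrative, not the ones from our deployment):

# consumer configuration (illustrative values)
max.poll.records=500
session.timeout.ms=10000
heartbeat.interval.ms=3000
max.poll.interval.ms=300000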
The Questions
Is there a setting or property Spring Boot/Spring Kafka is setting that causes a consumer that hasn’t consumed any records from an idle topic for a day or two to be removed from the consumer group?
If yes, can this be turned off and what are the downsides?
If no, is there a way to rejoin the consumer group without having to restart the service and what are the downsides?
That Kafka version is very, very old.
Older versions removed the consumer offsets after no activity for 24 hours, even if the consumer is still connected. In 2.0, this was increased to 7 days. With newer brokers (since 2.1), consumer offsets are only removed if the consumers are not actually connected for 7 days.
See https://kafka.apache.org/documentation/#upgrade_200_notable
You can increase the broker's offsets.retention.minutes with older brokers.
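For example, a broker-side sketch (server.properties; 10080 minutes = 7 days):

# Keep committed offsets for 7 days instead of the old 24-hour default
offsets.retention.minutes=10080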

How to find the processing time of Kafka messages?

I have an application running Kafka consumers and want to monitor the processing time of each message consumed from the topic. The application is a Spring Boot application and exposes Kafka consumer metrics to the Spring Actuator Prometheus endpoint using the Micrometer registry.
Can I use kafka_consumer_commit_latency_avg_seconds or kafka_consumer_commit_latency_max_seconds to monitor or alert?
Those metrics have nothing to do with record processing time. spring-kafka provides metrics for that; see here.
Monitoring Listener Performance
Starting with version 2.3, the listener container will automatically create and update Micrometer Timers for the listener, if Micrometer is detected on the class path, and a single MeterRegistry is present in the application context. The timers can be disabled by setting the ContainerProperty micrometerEnabled to false.
Two timers are maintained - one for successful calls to the listener and one for failures.
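A minimal sketch of reading those timers from the MeterRegistry (the timer name spring.kafka.listener is the one documented for spring-kafka; verify it against your version):

import java.util.concurrent.TimeUnit;

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;

public class ListenerTimings {

    private final MeterRegistry registry;

    public ListenerTimings(MeterRegistry registry) {
        this.registry = registry;
    }

    public void log() {
        // One timer per listener and result (success/failure), distinguished by tags.
        for (Timer timer : registry.get("spring.kafka.listener").timers()) {
            System.out.printf("%s mean=%.1fms max=%.1fms%n",
                    timer.getId().getTags(),
                    timer.mean(TimeUnit.MILLISECONDS),
                    timer.max(TimeUnit.MILLISECONDS));
        }
    }
}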

duplicate consumption of messages with Spring Cloud Stream Kafka binder

We have several micro-services using Spring Boot and Spring Cloud Stream Kafka binder to communicate between them.
Occasionally, we observe bursts of duplicate messages received by a consumer - often several days after they were first consumed and processed (successfully).
While I understand that Kafka does not guarantee exactly-once delivery, it still looks very strange, given that there were no rebalancing events or any 'suspicious' activity in the logs of either the brokers or the services. Since the consumer interacts with external APIs, it is difficult to make it idempotent.
Any hints as to what might be causing the duplication? What should I be looking for to figure this out?
We are using Kafka broker 1.0.0, and this particular consumer uses Spring Cloud Stream Binder Kafka 2.0.0, which is based on kafka-client 1.0.2 (version of the other services might be a bit different).
You should show your configuration when asking questions like this.
Best guess is the broker's offsets.retention.minutes.
With modern broker versions (since 2.0), it defaults to 1 week; with older versions it was only one day.

What is the ideal way to store the consumer offset using spring boot kafka consumer client?

I have a Spring Kafka consumer application. The application acts as a pass-through that polls messages from the Kafka broker and sends them to IBM MQ. What would be the best/simplest approach to storing the offset in case of failure?
The simplest approach is to use the default mechanism of storing the offsets in Kafka itself.
If you add a SeekToCurrentErrorHandler, the container will keep redelivering records that fail in the listener, up to 10 times by default, but it can be configured for infinite retries.
If you add stateful retry, the listener adapter can add a delay between each delivery attempt.
See Stateful Retry.
ackOnError should be set to false.
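Putting those pieces together, a minimal sketch (bean names are illustrative; in older spring-kafka versions the error handler is set on the container properties rather than the factory):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;

@Configuration
public class KafkaConsumerConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // Re-seek and redeliver records whose listener threw an exception,
        // instead of committing their offsets.
        factory.setErrorHandler(new SeekToCurrentErrorHandler());
        // Do not acknowledge (commit) a record when the listener fails.
        factory.getContainerProperties().setAckOnError(false);
        return factory;
    }
}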

Spring Integration - Kafka Message Driven Channel - Auto Acknowledge

I have used the sample configuration as listed in the spring.io docs and it is working fine.
<int-kafka:message-driven-channel-adapter
id="kafkaListener"
listener-container="container1"
auto-startup="false"
phase="100"
send-timeout="5000"
channel="nullChannel"
message-converter="messageConverter"
error-channel="errorChannel" />
However, when I was testing it with a downstream application, where I consume from Kafka and publish to the downstream system, the messages were still getting consumed and were not replayed when the downstream was down.
Or let's say that after consuming from the Kafka topic, if I find some exception in the service activator, I want to throw an exception that rolls back the transaction so that the Kafka messages can be replayed.
In brief, if the consuming application has an issue, I want to roll back the transaction so that messages are not automatically acknowledged and are replayed again and again until successfully processed.
That's not how Apache Kafka works. There are no TX semantics similar to JMS. The offset in a Kafka topic has nothing to do with rollback or redelivery.
I suggest you study Apache Kafka more closely, starting from their official resources.
Spring Kafka brings nothing over the regular Apache Kafka protocol; however, you can consider using the retry capabilities in Spring Kafka to redeliver the same record locally, as sketched below: http://docs.spring.io/spring-kafka/docs/1.2.2.RELEASE/reference/html/_reference.html#_retrying_deliveries
And yes, the ack mode must be MANUAL; do not commit the offset to Kafka automatically after consuming.
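A minimal sketch of that local retry, assuming a spring-kafka version whose container factory accepts a RetryTemplate (the attempt count and back-off are illustrative):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.retry.backoff.FixedBackOffPolicy;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

@Configuration
public class RetryingConsumerConfig {

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> retryingFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);

        // Retry each failed delivery locally before giving up.
        RetryTemplate retryTemplate = new RetryTemplate();
        retryTemplate.setRetryPolicy(new SimpleRetryPolicy(3)); // 3 attempts
        FixedBackOffPolicy backOff = new FixedBackOffPolicy();
        backOff.setBackOffPeriod(1000L); // 1 second between attempts
        retryTemplate.setBackOffPolicy(backOff);
        factory.setRetryTemplate(retryTemplate);
        return factory;
    }
}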
