Spring Kafka Auto Commit Offset In Case of Failures - spring

I am using Spring Kafka 1.2.2.RELEASE. I have a Kafka listener as a consumer that listens to a topic and indexes each document in Elasticsearch.
My auto-commit property is set to true (the default).
I was under the impression that if there is an exception in the listener (for example, Elasticsearch is down), the offsets should not be committed and the same message should be processed again on the next poll.
However, this is not happening; the consumer commits the offset on the next poll. After reading posts and the documentation, I learned that with auto commit set to true, the next poll commits all offsets.
My questions are: why does the consumer call the next poll, and how can I prevent any offset from being committed with auto commit set to true? Or do I need to set this property to false and commit manually?

I prefer to set it to false; it is more reliable for the container to manage the offsets for you.
Set the container's AckMode to RECORD (it defaults to BATCH) and the container will commit the offset for you after the listener returns.
Also consider upgrading to at least 1.3.3 (current version is 2.1.4); 1.3.x introduced a much simpler threading model, thanks to KIP-62.
EDIT
With auto-commit, the offset will be committed regardless of success/failure. The container won't commit after a failure, unless ackOnError is true (another reason to not use auto commit).
However, that still won't help because the broker won't send the same record again. You have to perform a seek operation on the Consumer for that.
In 2.0.1 (current version is 2.1.4), we added the SeekToCurrentErrorHandler which will cause the failed and unprocessed records to be re-sent on the next poll. See the reference manual.
You can also use a ConsumerAwareListener to perform the seek yourself (also added in 2.0).
With older versions (>= 1.1) you have to use a ConsumerSeekAware listener which is quite a bit more complicated.
Another alternative is to add retry so the delivery will be re-attempted according to the retry settings.
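To make the seek behaviour concrete, here is a minimal sketch in plain Java. It is a toy model of a single partition, not the Kafka client or Spring Kafka API: the class SeekOnFailureSketch, its poll and seek methods, and the failOnceAt listener are all hypothetical stand-ins. It assumes that, without a seek, a listener failure is only logged and the loop moves on to the next record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

/** Toy model of one partition; NOT the Kafka or Spring Kafka API. */
class SeekOnFailureSketch {
    private final long logEnd;   // offsets 0 .. logEnd-1 exist in the log
    private long position = 0;   // next offset poll() will return

    SeekOnFailureSketch(long logEnd) { this.logEnd = logEnd; }

    /** Returns up to maxRecords offsets and advances the position past them. */
    List<Long> poll(int maxRecords) {
        List<Long> batch = new ArrayList<>();
        while (batch.size() < maxRecords && position < logEnd) {
            batch.add(position++);
        }
        return batch;
    }

    /** Moves the position so that the next poll starts at the given offset. */
    void seek(long offset) { position = offset; }

    /** Drains the log and returns every delivery attempt, in order. */
    static List<Long> run(long logEnd, int maxRecords, boolean seekOnFailure,
                          LongPredicate listener) {
        SeekOnFailureSketch consumer = new SeekOnFailureSketch(logEnd);
        List<Long> attempts = new ArrayList<>();
        List<Long> batch;
        while (!(batch = consumer.poll(maxRecords)).isEmpty()) {
            for (long offset : batch) {
                attempts.add(offset);
                boolean ok = listener.test(offset); // false means the listener failed
                if (!ok && seekOnFailure) {
                    consumer.seek(offset); // failed and unprocessed records come back
                    break;                 // drop the rest of this batch
                }
                // Without a seek the failure is only "logged"; the record is not retried.
            }
        }
        return attempts;
    }

    /** Hypothetical listener that fails the first time it sees badOffset. */
    static LongPredicate failOnceAt(long badOffset) {
        boolean[] alreadyFailed = {false};
        return offset -> {
            if (offset == badOffset && !alreadyFailed[0]) {
                alreadyFailed[0] = true;
                return false;
            }
            return true;
        };
    }

    public static void main(String[] args) {
        System.out.println(run(5, 3, false, failOnceAt(2))); // [0, 1, 2, 3, 4]
        System.out.println(run(5, 3, true, failOnceAt(2)));  // [0, 1, 2, 2, 3, 4]
    }
}
```

Without the seek, offset 2 is attempted once and never again. With it, the failed record and everything after it are fetched again on the next poll. A record that always fails would be retried forever in this model, which is why a retry limit matters in practice.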

Apparently there is message loss with Spring Kafka <= 1.3.3 @KafkaListener even with ackOnError=false, if you expect Spring Kafka to take care of this automatically (or at least document it) by retrying and "simply not polling again" :). The default behavior is just to log the error.
We were able to reproduce message loss/skipping on a consumer even with spring-kafka 1.3.3.RELEASE (no maven sources), with a single-partition topic, concurrency(1), AckOnError(false), and BatchListener(true) with AckMode(BATCH), for any runtime exception. We ended up either doing retries inside the template or exploring ConsumerSeekAware.
@GaryRussell, regarding "the broker won't send the same record again" and the consumer continuing to return the next batch of messages without a commit: is this because the consumer's poll is based on the current offset it has seeked to when fetching the next batch of records, and not on the last committed offsets? In other words, a consumer need not commit at all, even if a runtime exception occurs on every record, and it will keep consuming all messages on the topic. Only a restart will resume from the last committed offset (causing duplicates).
Upgrading to 2.0+ to use ConsumerAwareListenerErrorHandler seems to require upgrading to at least Spring 5.x, which is a major upgrade.

Related

Polling behavior when using ReactiveKafkaConsumerTemplate

I have a Spring Boot application using ReactiveKafkaConsumerTemplate for consuming messages from Kafka.
I consume messages using kafkaConsumerTemplate.receive(), so I'm manually acknowledging each message. Since I'm working asynchronously, messages are not processed sequentially.
I'm wondering how the commit and poll process works in this scenario. If I polled 100 messages but acknowledged only 99 of them (the unacknowledged message is in the middle of the 100 I polled, say number 50), what happens on the next poll operation? Will it poll only after all 100 messages are acknowledged (and the offset is committed), and until then will I keep getting the unacknowledged message delivered to my app over and over until I acknowledge it?
Kafka maintains 2 offsets for a consumer group/partition - the current position() and the committed offset. When a consumer starts, the position is set to the last committed offset.
Position is updated after each poll, so the next poll will never return the same record, regardless of whether it has been committed (unless a seek is performed).
However, with reactor, you must ensure that commits are performed in the right order, since records are not acknowledged individually, just the committed offset is retained.
If you commit out of order and restart your app, you may get some processed messages redelivered.
We recently added support in the framework for out-of-order commits.
https://projectreactor.io/docs/kafka/release/reference/#_out_of_order_commits
The current version is 1.3.11, including this feature.
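The ordering constraint can be sketched in plain Java for the scenario in the question (100 records polled, number 50 not yet acknowledged). This is only an illustration of the idea, not how reactor-kafka implements it; ContiguousCommitTracker and its methods are hypothetical names, and a single partition is assumed. The committed offset is the offset of the next record to read, so only the contiguous run of acknowledged offsets can be committed safely.

```java
import java.util.TreeSet;

/** Toy tracker for out-of-order acknowledgements on one partition (hypothetical helper). */
class ContiguousCommitTracker {
    private long nextToCommit;                                 // offset a restart would resume from
    private final TreeSet<Long> ackedAhead = new TreeSet<>();  // acks that arrived above a gap

    ContiguousCommitTracker(long startOffset) { this.nextToCommit = startOffset; }

    /** Records an acknowledgement and returns the offset that is now safe to commit. */
    long acknowledge(long offset) {
        if (offset >= nextToCommit) {
            ackedAhead.add(offset);
        }
        // Advance over every acknowledged offset that is contiguous with the commit point.
        while (ackedAhead.remove(nextToCommit)) {
            nextToCommit++;
        }
        return nextToCommit;
    }

    long committable() { return nextToCommit; }

    public static void main(String[] args) {
        ContiguousCommitTracker tracker = new ContiguousCommitTracker(0);
        for (long offset = 99; offset >= 0; offset--) {
            if (offset != 50) tracker.acknowledge(offset);     // everything except offset 50
        }
        System.out.println(tracker.committable());             // 50
        System.out.println(tracker.acknowledge(50));           // 100
    }
}
```

While offset 50 is outstanding, committing anything above 50 would mean a restart skips it. Committing 50 means a restart redelivers 51 to 99 even though they were processed, which is the redelivery mentioned above.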

Spring @KafkaListener auto commit offset or manual: Which is recommended?

From what I have read on the internet, a method annotated with Spring @KafkaListener commits the offset every 5 seconds by default.
Suppose the offset is committed after 5 seconds while processing is still going on, and the consumer then crashes because of some issue. After rebalancing, the partition will be assigned to another consumer, which will start processing from the next message because the previous message's offset was already committed.
This will result in loss of the message.
So, do I need to commit the offset manually after processing completes? What would be the recommended approach?
Also, if processing is done and the consumer crashes just before the commit, how do I avoid message duplication?
Please suggest an approach that avoids both message loss and duplication. I am using Spring KafkaListener with the default configuration.
As usual this depends on your use case and how you would like to deal with issues during your processing. The usage of auto-commit will change the delivery semantics of your application.
Enabling auto commit gives you something closer to "at-most-once" semantics, as you may read the data and commit it before you have actually processed it. If your processing fails, the message was already committed and you will not read it again; it is therefore "lost" for your application (for your particular consumer group, to be more precise).
Disabling auto commit gives you something closer to "at-least-once" semantics, as you commit the data only after processing it. Imagine you fetch 100 messages from the topic. 50 of them are processed successfully and your application fails while processing the 51st message. Because you disabled auto commit and commit all or none of the messages at the end of processing, you have not committed any of the 100 messages, so the next time your application reads the same 100 messages again. However, you have now created 50 duplicates, as those messages were already processed successfully.
To conclude, you need to figure out whether your use case can better tolerate data loss or duplicates. Duplicates can be handled safely if your application is idempotent.
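The arithmetic of the 100-message example can be checked with a small simulation. This is a plain-Java sketch with hypothetical names (DeliverySemanticsSketch, simulate). It assumes that "commit before processing" commits the whole batch up front, that the crash happens on the record at crashIndex, and that the restarted application processes everything from the committed offset without failing.

```java
/** Toy simulation of one crash in the middle of a batch (hypothetical helper). */
class DeliverySemanticsSketch {

    /** Returns {lost, duplicated}: records never processed, and records processed twice. */
    static int[] simulate(int batchSize, int crashIndex, boolean commitBeforeProcessing) {
        int[] timesProcessed = new int[batchSize];

        // Committing up front moves the committed offset past the whole batch;
        // committing at the end never happens because of the crash.
        int committed = commitBeforeProcessing ? batchSize : 0;

        // First run: records before crashIndex succeed, then the application crashes.
        for (int i = 0; i < crashIndex; i++) {
            timesProcessed[i]++;
        }

        // Restart: resume from the committed offset and process the rest of the batch.
        for (int i = committed; i < batchSize; i++) {
            timesProcessed[i]++;
        }

        int lost = 0;
        int duplicated = 0;
        for (int n : timesProcessed) {
            if (n == 0) lost++;
            if (n > 1) duplicated++;
        }
        return new int[] {lost, duplicated};
    }

    public static void main(String[] args) {
        int[] early = simulate(100, 50, true);
        int[] late = simulate(100, 50, false);
        System.out.println("commit first: lost=" + early[0] + " duplicated=" + early[1]);
        System.out.println("commit last:  lost=" + late[0] + " duplicated=" + late[1]);
    }
}
```

With the crash on the 51st message (index 50), committing first loses 50 messages and duplicates none, while committing last loses none and duplicates 50.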
You are asking how to prevent both data loss and duplicates, which means you are referring to "exactly-once semantics". This is a big topic in distributed streaming systems; check the spring-kafka docs to see whether it is supported, under which configuration, and how it depends on the output operation of your application.
Please also check the comment of GaryRussell on this post:
"the Spring team does not recommend using auto commit; the listener container Ackmode (BATCH or RECORD) will commit the offsets in a deterministic manner; recent versions of the framework disable auto commit (unless specifically enabled)"
If the consumer takes 5+ seconds to process the message then you have a problem in the code that needs to be fixed.
Auto commit is risky in production, as it can lead to problem scenarios (message loss etc.).
It is better to go with manual commit to have better control.
Make the consumer idempotent so that duplicate messages and a work-in-progress (WIP) state of the consumer are not a problem. For example, maintain the processing status in the consumer's DB so that if processing is half done, then on restart the consumer can clear the WIP state and process afresh. Similarly, if the processing status is Complete, then on restart it will see the Complete status and simply commit the duplicate message to Kafka.
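A minimal sketch of that status-tracking idea, in plain Java. The in-memory map stands in for the consumer's DB table, and the names IdempotentHandlerSketch, Status, and handle are hypothetical. It assumes each message carries a stable business id, and that the caller commits the Kafka offset after handle returns.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Toy idempotent handler; the map is a stand-in for a database table. */
class IdempotentHandlerSketch {
    enum Status { IN_PROGRESS, COMPLETE }

    private final Map<String, Status> statusStore = new ConcurrentHashMap<>();

    /** Returns true if the work ran now, false if the message was a duplicate. */
    boolean handle(String messageId, Runnable work) {
        if (statusStore.get(messageId) == Status.COMPLETE) {
            return false; // duplicate: skip the work, the caller only commits the offset
        }
        // Either a new message or a leftover IN_PROGRESS from a crash: start afresh.
        statusStore.put(messageId, Status.IN_PROGRESS);
        work.run(); // if this throws, the status stays IN_PROGRESS
        statusStore.put(messageId, Status.COMPLETE);
        return true;
    }

    public static void main(String[] args) {
        IdempotentHandlerSketch handler = new IdempotentHandlerSketch();
        System.out.println(handler.handle("order-1", () -> { })); // true
        System.out.println(handler.handle("order-1", () -> { })); // false
    }
}
```

In a real consumer the status update and the side effects of the work would have to be stored atomically (for example in one DB transaction); this sketch does not attempt that.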

What is the most efficient way to know that a Kafka event is visible in a K-Table?

We use Kafka topics as both events and a repository. Using the kafka-streams API we define a simple K-Table that represents all the events in the topic.
In our use case we publish events to the topic and subsequently reference the K-Table as the backing repository. The main issue is that the published events are not immediately visible on the K-Table.
We tried transactions and exactly once semantics as described here (https://kafka.apache.org/26/documentation/streams/core-concepts#streams_processing_guarantee) but there is always a delay we cannot control.
Publish Event
Undetermined amount of time
Published Event is visible in the K-Table
Is there a way to eliminate the delay, or otherwise know that a specific event has been consumed by the K-Table?
NOTE: We tried both partition and global tables with similar results.
Thanks
Because Kafka is an asynchronous system the observed delay is expected and you cannot do anything to avoid it.
However, if you publish a message to a topic, the KafkaProducer allows you to pass in a Callback to the send() method and the callback will be executed after the message was written to the topic providing the record's metadata like topic, partition, and offset.
After Kafka Streams has processed messages, it will eventually commit the offsets (you can configure the commit interval, too). Thus, you can know that the message is in the KTable once its offset has been committed. By default, committing happens only every 30 seconds, and it's not recommended to use a very short commit interval because it implies large overhead. Thus, I am not sure if this would help in your case, as it seems you want a more timely "response".
As an alternative, you can disable caching on the KTable and use a toStream().process() step: after each update to the KTable, the changelog stream provided by toStream() will contain the record, and you can access the record metadata (including its offset) in the Processor via the given ProcessorContext object. This should also allow you to figure out when the record is available in the KTable.
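One way to connect the two ends is a small in-process tracker: the producer side asks to be notified when a given offset is visible, and the Processor reports each offset it sees. The sketch below is plain Java with hypothetical names (OffsetVisibilityTracker, awaitVisible, onProcessed). It assumes a single partition, that the offset passed to awaitVisible comes from RecordMetadata.offset() in the send() callback, and that onProcessed is called from the Processor with ProcessorContext.offset(); it also assumes producer and Streams application run in the same JVM.

```java
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.CompletableFuture;

/** Toy single-partition tracker linking produced offsets to processed offsets. */
class OffsetVisibilityTracker {
    private final TreeMap<Long, CompletableFuture<Void>> waiters = new TreeMap<>();
    private long highestProcessed = -1;

    /** Returns a future that completes once the given offset has been processed. */
    synchronized CompletableFuture<Void> awaitVisible(long offset) {
        if (offset <= highestProcessed) {
            return CompletableFuture.completedFuture(null);
        }
        return waiters.computeIfAbsent(offset, o -> new CompletableFuture<>());
    }

    /** To be called by the processing side for every record it has handled. */
    synchronized void onProcessed(long offset) {
        highestProcessed = Math.max(highestProcessed, offset);
        // Complete and drop every waiter at or below the processed offset.
        NavigableMap<Long, CompletableFuture<Void>> ready = waiters.headMap(offset, true);
        ready.values().forEach(future -> future.complete(null));
        ready.clear();
    }

    public static void main(String[] args) {
        OffsetVisibilityTracker tracker = new OffsetVisibilityTracker();
        CompletableFuture<Void> visible = tracker.awaitVisible(7);
        tracker.onProcessed(6);
        System.out.println(visible.isDone()); // false
        tracker.onProcessed(7);
        System.out.println(visible.isDone()); // true
    }
}
```

This does not remove the delay; it only lets the caller wait for the specific record instead of guessing. With several partitions, one tracker per partition would be needed.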

Spring Kafka don't respect max.poll.records with strange behavior

Well, I'm trying the following scenario:
In application.properties set max.poll.records to 50.
In application.properties set enable-auto-commit=false and ack-mode to manual.
In my method annotated with @KafkaListener, I don't commit any message: I just read and log it, without sending an ACK.
Actually, in my Kafka topic, I have 500 messages to be consumed, so I'm expecting the following behavior:
1. Spring Kafka poll() returns 50 messages (offset 0 to 50).
2. As I said, I don't commit anything, I just log the 50 messages.
3. The next Spring Kafka poll() invocation returns the same 50 messages (offset 0 to 50) as in step 1. Spring Kafka, in my understanding, should continue in this loop (steps 1-3), always reading the same messages.
But what happens is the following:
1. Spring Kafka poll() returns 50 messages (offset 0 to 50).
2. As I said, I don't commit anything, I just log the 50 messages.
3. The next Spring Kafka poll() invocation returns the NEXT 50 messages, different from step 1 (offset 50 to 100).
Spring Kafka reads the 500 messages in blocks of 50 but doesn't commit anything. If I shut down the application and start it again, the 500 messages are received again.
So, my doubts:
If I configured max.poll.records to 50, how does Spring Kafka get the next 50 records if I didn't commit anything? I understood that the poll() method should return the same records.
Does Spring Kafka have some cache? If so, this could be a problem if I get 1 million records in the cache without committing.
Your first question:
If I configured max.poll.records to 50, how does Spring Kafka get the
next 50 records if I didn't commit anything? I understood that the poll()
method should return the same records.
First, to make sure that you did not commit anything, you must understand the following 3 parameters, which I believe you already do.
ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG: set it to false (which is also the recommended setting). When it is false, note that auto.commit.interval.ms becomes irrelevant. Check out this documentation:
Because the listener container has its own mechanism for committing
offsets, it prefers the Kafka ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG
to be false. Starting with version 2.3, it unconditionally sets it to
false unless specifically set in the consumer factory or the
container’s consumer property overrides.
factory.getContainerProperties().setAckMode(AckMode.MANUAL); you take responsibility for acknowledging (ignored when transactions are being used), and ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG can't be true.
factory.getContainerProperties().setSyncCommits(true/false); sets whether to call consumer.commitSync() or commitAsync() when the container is responsible for commits. The default is true. This only controls synchronization with Kafka, nothing else: if set to true, the call blocks until Kafka responds.
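As a minimal sketch, the consumer-side settings above could be collected like this. The property keys are the standard Kafka consumer configuration names; the class name, the bootstrap address, and the group id are placeholders for illustration. AckMode and syncCommits are container properties in Spring Kafka, so they are not part of this map.

```java
import java.util.HashMap;
import java.util.Map;

/** Builds consumer properties with auto commit disabled (placeholder class name). */
class ManualCommitConsumerProps {

    static Map<String, Object> build(String bootstrapServers, String groupId) {
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        // ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG: the container (or your code) commits instead.
        props.put("enable.auto.commit", false);
        // ConsumerConfig.MAX_POLL_RECORDS_CONFIG: upper bound on records returned per poll().
        props.put("max.poll.records", 50);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder address and group id, not taken from the question.
        System.out.println(build("localhost:9092", "your_group_name"));
    }
}
```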
Secondly, no, the consumer poll() will not return the same records. The currently running consumer tracks its offset in memory with an internal index, so committing offsets does not affect what it fetches next. Please also see @GaryRussell's explanation here.
In short, he explained:
Once the records have been returned by the poll (and offsets not
committed), they won't be returned again unless you restart the
consumer or perform seek() operations on the consumer to reset the
offset to the unprocessed ones.
Your second question:
Does Spring Kafka have some cache? If so, this could be a problem if I
get 1 million records in the cache without committing.
There is no "cache", it's all about offsets and commits, explanation as per above.
Now to achieve what you wanted to do, you can consider doing 2 things after fetching the first 50 records, i.e for the next poll():
Either restart the container programmatically
Or call consumer.seek(partition, offset);
BONUS:
Whatever configuration you choose, you can always check out the results, by looking at the LAG column of this output:
kafka-consumer-groups.bat --bootstrap-server localhost:9091 --describe --group your_group_name
A consumer not committing its offsets has an impact only in situations like:
Your consumer crashed after reading 200 messages; when you restart it, it will start again from 0.
Your consumer is no longer assigned a partition.
So in a perfect world you don't need to commit at all, and the consumer will still consume all the messages, because it first asks for 1-50, then 51-100, and so on.
But if the consumer crashes, nobody knows which offset it had read up to. If the consumer had committed the offset, then on restart it can check the offsets topic to see where the crashed consumer left off and start from there.
max.poll.records defines how many records to fetch at one go but it does not define which records to fetch.
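The behaviour observed in the question (500 messages, blocks of 50, nothing committed) can be reproduced with a toy model. This is a plain-Java sketch, not the Kafka client: PollWithoutCommitSketch and its methods are hypothetical names, a single partition is assumed, and lag is modelled as the log end offset minus the committed offset, which is what the LAG column reports.

```java
/** Toy model of position vs. committed offset on one partition (hypothetical names). */
class PollWithoutCommitSketch {
    private final long logEnd;   // offsets 0 .. logEnd-1 exist in the log
    private long committed = 0;  // stored on the broker, survives restarts
    private long position = 0;   // in memory only, advanced by every poll

    PollWithoutCommitSketch(long logEnd) { this.logEnd = logEnd; }

    /** Returns {firstOffset, lastOffset} of the fetched batch, or null when drained. */
    long[] poll(int maxPollRecords) {
        if (position >= logEnd) {
            return null;
        }
        long first = position;
        position = Math.min(position + maxPollRecords, logEnd);
        return new long[] {first, position - 1};
    }

    void commit()  { committed = position; }   // commit everything polled so far
    void restart() { position = committed; }   // a new consumer resumes from the commit
    long lag()     { return logEnd - committed; }

    public static void main(String[] args) {
        PollWithoutCommitSketch consumer = new PollWithoutCommitSketch(500);
        long[] batch;
        while ((batch = consumer.poll(50)) != null) {
            System.out.println(batch[0] + ".." + batch[1]); // 0..49, 50..99, and so on
        }
        System.out.println("lag=" + consumer.lag());        // lag=500, nothing committed
        consumer.restart();
        System.out.println(consumer.poll(50)[0]);           // 0, all records come again
    }
}
```

Every poll moves the position forward regardless of commits, which is why the second poll returns offsets 50 to 99. Only the restart falls back to the committed offset.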

Kafka 2.1 behaviour change for retentions and Kafka Stream application, what can we do so that retention works?

Following is from the Kafka Documentation for 2.1.
https://kafka.apache.org/documentation/
Offset expiration semantics has slightly changed in this version.
According to the new semantics, offsets of partitions in a group will
not be removed while the group is subscribed to the corresponding
topic and is still active (has active consumers). If group becomes
empty all its offsets will be removed after default offset retention
period (or the one set by broker) has passed (unless the group becomes
active again). Offsets associated with standalone (simple) consumers,
that do not use Kafka group management, will be removed after default
offset retention period (or the one set by broker) has passed since
their last commit.
If I understand this correctly, as long as the Stream Thread consumers are connected, no retention setting will be effective?
I also started to observe following Exception after the restart of stream application
stream thread - Restoring Stream Tasks failed. Deleting StreamTasks stores to recreate from scratch.
org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions:' but stream application uses the property 'StreamsConfig.consumerPrefix(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG), "earliest"'...
I think it has to do something with retention but I can't tell what?
If I understand this correctly, as long as the Stream Thread consumers are connected, no retention setting will be effective?
This applies to the __consumer_offsets topic only, which is a Kafka internal topic. For all regular/user topics, retention time is applied the same way as in all previous versions. Also note that this only applies if you upgrade your brokers to 2.1.
For the log message of Streams: you don't need to worry about it. It seems that your application was offline for a longer time, and thus, your local store is not in a consistent state any longer. Thus, it's deleted and recreated from scratch from the changelog topic.

Resources