Spring @KafkaListener auto commit offset or manual: Which is recommended? - spring-boot

From what I read on the internet, a method annotated with Spring @KafkaListener will commit the offset every 5 seconds by default.
Suppose the offset is committed after 5 seconds while processing is still going on, and in the meantime the consumer crashes because of some issue. In that case, after rebalancing, the partition will be assigned to another consumer, which will start processing from the next message because the previous message's offset was already committed.
This will result in loss of the message.
So, do I need to commit the offset manually after processing completes? What would be the recommended approach?
Again, if processing is done and the consumer crashes just before the commit, how do I avoid message duplication in this case?
Please suggest an approach that avoids both message loss and duplication. I am using a Spring @KafkaListener with the default configuration.

As usual, this depends on your use case and how you would like to deal with issues during processing. The use of auto-commit changes the delivery semantics of your application.
Enabling auto-commit gives you more of an "at-most-once" semantics, because you read the data and commit the offset before you have actually processed it. If your processing fails, the message was already committed and you will not read it again; it is therefore "lost" for your application (for your particular consumer group, to be more precise).
Disabling auto-commit gives you more of an "at-least-once" semantics, because you commit the offset only after processing the data. Imagine you fetch 100 messages from the topic, 50 of them are processed successfully, and your application fails while processing the 51st message. Since you disabled auto-commit and commit either all or none of the messages at the end of processing, you have not committed any of the 100 messages, so the next time your application will read the same 100 messages again. However, you have now created 50 duplicates, as they were already processed successfully the previous time.
To conclude, you need to figure out whether your use case can better tolerate data loss or duplicates. Dealing with duplicates is safe if your application is idempotent.
You are asking how to prevent data loss and duplicates, which means you are referring to "exactly-once semantics". This is a big topic in distributed streaming systems; you could check the spring-kafka docs to see whether this is supported, under which configuration, and depending on the output operation of your application.
Please also check the comment of GaryRussell on this post:
"the Spring team does not recommend using auto commit; the listener container Ackmode (BATCH or RECORD) will commit the offsets in a deterministic manner; recent versions of the framework disable auto commit (unless specifically enabled)"

If the consumer takes 5+ seconds to process a message, then you have a problem in the code that needs to be fixed.
Auto-commit is risky in production, as it can lead to problem scenarios (message loss, etc.).
It is better to go with manual commits to have more control.
Make the consumer idempotent so that duplicate messages and a work-in-progress (WIP) state in the consumer are not a problem. For example, maintain a processing status in the consumer's DB so that if processing is half done, then on restart the consumer can clear the WIP state and process afresh. Similarly, if the processing status is Complete, then on restart it will see the Complete status and simply commit the duplicate message to Kafka, as sketched below.
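A rough sketch of that idea (the statusRepository, topic/group names, and the manual-acknowledgment setup are assumptions for illustration, not the poster's code):

```java
// Assumes the container factory is configured with AckMode.MANUAL and a
// hypothetical statusRepository that persists a per-message processing status.
@KafkaListener(topics = "orders", groupId = "order-processor")
public void onMessage(ConsumerRecord<String, String> record, Acknowledgment ack) {
    String messageId = record.key(); // or a business id extracted from the payload

    ProcessingStatus status = statusRepository.find(messageId);
    if (status == ProcessingStatus.COMPLETE) {
        ack.acknowledge();                    // duplicate delivery: just commit and move on
        return;
    }
    if (status == ProcessingStatus.IN_PROGRESS) {
        statusRepository.clear(messageId);    // half-done work from a crash: reset and redo
    }

    statusRepository.markInProgress(messageId);
    process(record.value());                  // the actual business logic
    statusRepository.markComplete(messageId);

    ack.acknowledge();                        // manual commit only after processing succeeded
}
```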

Related

Polling behavior when using ReactiveKafkaConsumerTemplate

I have a Spring Boot application using ReactiveKafkaConsumerTemplate for consuming messages from Kafka.
I consume messages using kafkaConsumerTemplate.receive(), therefore I'm manually acknowledging each message. Since I'm working in an asynchronous manner, messages are not processed sequentially.
I'm wondering how the commit and poll process works in this scenario: if I polled 100 messages but acknowledged only 99 of them (the unacknowledged message is in the middle of the 100 messages I polled, say number 50), what happens on the next poll operation? Will it actually poll only after all 100 messages are acknowledged (and the offset is committed), and until then will I keep getting the unacknowledged message over and over until I acknowledge it?
Kafka maintains 2 offsets for a consumer group/partition - the current position() and the committed offset. When a consumer starts, the position is set to the last committed offset.
Position is updated after each poll, so the next poll will never return the same record, regardless of whether it has been committed (unless a seek is performed).
However, with reactor, you must ensure that commits are performed in the right order, since records are not acknowledged individually; only the committed offset is retained.
If you commit out of order and restart your app, you may get some processed messages redelivered.
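To make the two offsets concrete, here is a small illustration (not from the original post; it assumes an existing KafkaConsumer, kafka-clients 2.4+, and a hypothetical topic/partition):

```java
TopicPartition tp = new TopicPartition("my-topic", 0);            // hypothetical

// Each poll advances the consumer's position, regardless of commits.
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500)); // records processed elsewhere

long position = consumer.position(tp);                             // advances after every poll
OffsetAndMetadata committed =
        consumer.committed(Collections.singleton(tp)).get(tp);     // moves only when you commit

System.out.printf("position=%d, committed=%s%n",
        position, committed == null ? "none" : committed.offset());
// After a restart (or a seek), the position is reset to the committed offset,
// which is why out-of-order commits can cause redelivery of already-processed records.
```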
We recently added support in the framework for out-of-order commits.
https://projectreactor.io/docs/kafka/release/reference/#_out_of_order_commits
The current version is 1.3.11, which includes this feature.
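A minimal sketch of what that might look like with ReactiveKafkaConsumerTemplate (the topic, group, and async process() call are assumptions; maxDeferredCommits is the reactor-kafka option behind the out-of-order commit support linked above):

```java
Map<String, Object> props = new HashMap<>();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");        // assumption
props.put(ConsumerConfig.GROUP_ID_CONFIG, "reactive-group");                 // assumption
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

ReceiverOptions<String, String> options = ReceiverOptions.<String, String>create(props)
        .maxDeferredCommits(100)                       // tolerate out-of-order acknowledgments
        .subscription(Collections.singleton("my-topic"));

ReactiveKafkaConsumerTemplate<String, String> template =
        new ReactiveKafkaConsumerTemplate<>(options);

template.receive()
        .flatMap(record -> process(record)             // hypothetical async processing returning a Mono
                .doOnSuccess(v -> record.receiverOffset().acknowledge()))
        .subscribe();
```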

What is the most efficient way to know that a Kafka event is visible in a K-Table?

We use Kafka topics as both events and a repository. Using the kafka-streams API we define a simple K-Table that represents all the events in the topic.
In our use case we publish events to the topic and subsequently reference the K-Table as the backing repository. The main issue is that the published events are not immediately visible on the K-Table.
We tried transactions and exactly once semantics as described here (https://kafka.apache.org/26/documentation/streams/core-concepts#streams_processing_guarantee) but there is always a delay we cannot control.
Publish Event
Undetermined amount of time
Published Event is visible in the K-Table
Is there a way to eliminate the delay, or otherwise know that a specific event has been consumed by the K-Table?
NOTE: We tried both partition and global tables with similar results.
Thanks
Because Kafka is an asynchronous system, the observed delay is expected and you cannot do anything to avoid it.
However, if you publish a message to a topic, the KafkaProducer allows you to pass a Callback into the send() method; the callback is executed after the message was written to the topic and provides the record's metadata, such as topic, partition, and offset.
After Kafka Streams has processed messages, it will eventually commit the offsets (you can configure the commit interval, too). Thus, you can know that the message is in the KTable after its offset was committed. By default, committing happens only every 30 seconds, and it's not recommended to use a very short commit interval because it implies a large overhead. Thus, I am not sure if this would help in your case, as it seems you want a more timely "response".
As an alternative, you can also disable caching on the KTable and use a toStream().process() step: after each update to the KTable, the changelog stream provided by toStream() will contain the record, and you can access the record metadata (including its offset) in the Processor via the given ProcessorContext object. This should also allow you to figure out when the record is available in the KTable.
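A hedged sketch of that second approach, using the 2.6-era Processor API referenced in the question (the topic and store names are made up):

```java
StreamsBuilder builder = new StreamsBuilder();

KTable<String, String> table = builder.table(
        "events",                                                            // hypothetical topic
        Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("events-store")
                .withCachingDisabled());                                     // forward every update downstream immediately

// Every update that reaches the KTable also flows through this processor,
// so this is the point where you know the record is visible in the table.
table.toStream().process(() -> new Processor<String, String>() {

    private ProcessorContext context;

    @Override
    public void init(ProcessorContext context) {
        this.context = context;
    }

    @Override
    public void process(String key, String value) {
        // topic/partition/offset of the input record that caused this table update
        System.out.printf("KTable updated: key=%s topic=%s partition=%d offset=%d%n",
                key, context.topic(), context.partition(), context.offset());
    }

    @Override
    public void close() { }
});
```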

How does Mass Transit handle retries, deduplication, and message ID generation when using the in-memory outbox?

Mass Transit has an in-memory "outbox" implementation that I think will handle the majority of the concerns/challenges I am looking to overcome; however, I cannot find much documentation that describes its capabilities in the detail I am looking for. A lot of these questions came about after watching a video where Udi Dahan explains how to handle reliable messaging without distributed transactions (https://vimeo.com/111998645).
Does the in-memory outbox handle failures that may happen when trying to send a message to the queue? For example: a consumer generates 3 messages that are collected in the outbox. The consumer completes without issue. The collected messages in the outbox start being processed.
If for some reason, while processing the collected messages, there is a network issue (or other issue) and message 2 fails to be sent, what will happen to messages 2 and 3? Is there any sort of retry policy?
What happens if a message being processed in the outbox is successfully added to the queue but is unsuccessfully marked as sent in the outbox? Will there be another attempt to send the message to the queue?
Assuming the outbox will retry sending a message to a queue if there is some sort of failure is the message ID guaranteed to be consistent between attempts? Having a consistent Message ID is important for de-duplication to ensure we do not process the same message multiple times.
When a message is consumed is there any de-duplication that takes place? (This ties back to 1.C)
How does Mass Transit track processed records for each consumer? Do the storage engines take care of this responsibility?
Is there any sort of "transaction" exposed to the consumer that allows you to clear the collected messages in the outbox without throwing an exception, or is throwing an exception the only way to roll back the outbox?
What about messages that are generated outside of a consumer? Is there a way to roll back messages collected in the outbox (for example, in a WebAPI controller action)?
Is there a recommendation to use the DTC features of Mass Transit instead of the outbox, or vice versa, or to use them both?
Currently Mass Transit does not have an outbox implementation that can survive a process crash. Is there a plan to include such a feature? Is there a roadmap where this is tracked?
The in-memory outbox defers any message send/publish/respond calls until the consumer has completed all processing. This includes regular consumers and sagas. The very last thing the consumer does is send/publish any deferred messages, after which the incoming message is acknowledged (and removed from the queue). With that said, most of the remaining items in your question aren't relevant, because it isn't writing messages to a database and then processing them afterwards.
No
No
Don't use the DTC, it isn't even supported in .NET Core
No plans, nothing on the roadmap
As you said at the start, the in-memory outbox handles 99.9% of the cases. A well-designed saga and supporting services can push that even higher, ensuring idempotency and eventually successful command (or event) processing. Anything beyond what's there today is typically to support poorly designed systems and just creates way too much complexity with extra dependencies.

Spring Kafka Auto Commit Offset In Case of Failures

I am using Spring Kafka 1.2.2.RELEASE. I have a Kafka listener as a consumer that listens to a topic and indexes the documents in Elastic.
My auto-commit offset property is set to true (the default).
I was under the impression that if there is an exception in the listener (Elastic is down), the offsets should not be committed and the same message should be processed on the next poll.
However, this is not happening, and the consumer commits the offset on the next poll. After reading posts and documentation, I learnt that with auto-commit set to true, the next poll will commit all offsets.
My doubt is why the consumer is calling the next poll, and also how I can prevent any offset from being committed with auto-commit set to true, or whether I need to set this property to false and commit manually.
I prefer to set it to false; it is more reliable for the container to manage the offsets for you.
Set the container's AckMode to RECORD (it defaults to BATCH) and the container will commit the offset for you after the listener returns.
Also consider upgrading to at least 1.3.3 (the current version is 2.1.4); 1.3.x introduced a much simpler threading model, thanks to KIP-62.
EDIT
With auto-commit, the offset will be committed regardless of success/failure. The container won't commit after a failure, unless ackOnError is true (another reason to not use auto commit).
However, that still won't help because the broker won't send the same record again. You have to perform a seek operation on the Consumer for that.
In 2.0.1 (current version is 2.1.4), we added the SeekToCurrentErrorHandler which will cause the failed and unprocessed records to be re-sent on the next poll. See the reference manual.
You can also use a ConsumerAwareListener to perform the seek yourself (also added in 2.0).
With older versions (>= 1.1) you have to use a ConsumerSeekAware listener which is quite a bit more complicated.
Another alternative is to add retry so the delivery will be re-attempted according to the retry settings.
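For reference, a small hedged sketch of the SeekToCurrentErrorHandler and RECORD AckMode suggestions above, assuming a 2.x container factory that exposes setErrorHandler (the factory variable is illustrative, not the poster's code):

```java
// On failure, seek back so the failed and not-yet-processed records of the current
// poll are redelivered on the next poll instead of being skipped.
factory.setErrorHandler(new SeekToCurrentErrorHandler());

// Optionally commit per record rather than per batch.
factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.RECORD);
```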
Apparently, there will be message loss with the Spring Kafka <= 1.3.3 @KafkaListener even if you use ackOnError=false, if we expect Spring Kafka to automatically take care of this (or at least document it) by retrying and "simply not polling again"? :) And the default behavior is to just log.
We were able to reproduce message loss/skip on a consumer even with spring-kafka 1.3.3.RELEASE (no Maven sources) and with a single-partition topic, concurrency(1), AckOnError(false), BatchListener(true) with AckMode(BATCH), for any runtime exceptions. We ended up doing retries inside the template or exploring ConsumerSeekAware.
@GaryRussell, regarding "the broker won't send the same record again" (or does it continue returning the next batch of messages without a commit?): is this because the consumer poll is based on the current position it has seeked to for the next batch of records, and not exactly on the last committed offsets? Basically, consumers need not commit at all (assuming runtime exceptions on every processing attempt) and will keep consuming all the messages on the topic; only a restart will start again from the last committed offset (duplicates).
Upgrading to 2.0+ to use ConsumerAwareListenerErrorHandler seems to require upgrading to at least Spring 5.x, which is a major upgrade.

MDB CLIENT_ACKNOWLEDGEMENT mode with max-messages-in-transaction >1

I have a need where I want to group messages received from a system based on certain criteria. For performance reasons, I want to avoid persisting these individual messages before I can group them. I've seen that JMS implementations provide transaction batching over a set of messages, as described in:
Document 1
Document 2
But I also want the acknowledgement of the batch to be controlled by my code, because if there is some issue in grouping, I should be able to roll back the batch I am reading so that I can process the messages on the next try.
From the above links, since the transaction is managed by the container over a set of onMessage calls, I would not control the transaction commit and rollback.
Can someone let me know if I am misreading this, and what would be the way to achieve it?
