How is offset committed in Spring Kafka?

I am using the Spring Kafka implementation to integrate with Kafka, and I am struggling to find out how Spring Kafka handles offset commits internally.
I need this know-how to decide my strategy for Disaster Recovery when switching from one Kafka broker to the DR Kafka broker.
Please help, or point me to a post/blog that explains how offset commits are handled by Spring's Kafka implementation. Thanks.

See the documentation for some info: https://docs.spring.io/spring-kafka/docs/current/reference/html/#committing-offsets.
In the end, the commit is delegated to the KafkaConsumer anyway:
this.consumer.commitSync(commits, this.syncCommitTimeout);
or
this.consumer.commitAsync(commits, (offsetsAttempted, exception) -> {
So, when you switch from one broker to another without clustering between them, all of that committing and offset tracking makes no sense: the data on the new broker is entirely new and has its own offsets, even if the topic name and partitions are the same over there.
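For reference, here is a minimal sketch of where that delegated commit is configured (assuming spring-kafka 2.3+; the broker address and group id are made up): the container's AckMode decides when commitSync/commitAsync is issued, and setSyncCommits chooses between the two.
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;

public class CommitConfigSketch {

    public ConcurrentKafkaListenerContainerFactory<String, String> listenerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");              // assumed group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        // Let the listener container commit, not the kafka-clients auto-commit thread.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);

        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(new DefaultKafkaConsumerFactory<>(props));
        // RECORD = commit after each record; BATCH (the default) = after each poll;
        // MANUAL / MANUAL_IMMEDIATE = commit only when the listener acknowledges.
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.RECORD);
        // true -> the container calls consumer.commitSync(...); false -> commitAsync(...).
        factory.getContainerProperties().setSyncCommits(true);
        return factory;
    }
}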

Related

How to change offset of a topic during runtime?

I have a producer which keeps pushing messages to a Kafka topic, and another service reading these messages from that topic.
I have a business use case where the consumer sometimes needs to ignore all the messages already in the topic and start processing only newly arriving messages. Can this be achieved without stopping and restarting the Kafka server?
I am working in Go. If Kafka supports such a requirement, is there any way I can change the consumer configuration to start consuming from the latest message using the Sarama Go client?
Thank you in advance.
You could use a random UUID for the consumer group id, and/or disable auto commits; then you can start at the latest offset with
config := sarama.NewConfig()
config.Consumer.Offsets.Initial = sarama.OffsetNewest // start from the latest offset
(adapted from the Sarama example code, which uses sarama.OffsetOldest)
Otherwise, the Kafka consumer API has a seekToEnd function, but in Sarama this seems to be exposed as getting the high watermark for every partition from the consumer, then calling ResetOffset on a ConsumerGroup instance. Note: the group should be paused before doing that.
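For comparison, the seekToEnd call mentioned above comes from the Java consumer API; a rough sketch of what that looks like there (topic name, group id, and broker address are made up for illustration) might be:
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SkipBacklog {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "fresh-group");              // assumed
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));         // assumed topic
            // Poll until partitions are assigned, then jump every partition to its end,
            // so the existing backlog is skipped and only new messages are processed.
            while (consumer.assignment().isEmpty()) {
                consumer.poll(Duration.ofMillis(100));
            }
            consumer.seekToEnd(consumer.assignment());
            while (true) {
                consumer.poll(Duration.ofSeconds(1))
                        .forEach(record -> System.out.println(record.value()));
            }
        }
    }
}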

Persist state of Kafka Producer within Spring Cloud/Boot

I want to implement a Kafka producer with Spring that observes a cloud storage and emits meta information about newly arrived files.
Until now we did that with a Kafka connector, but for various reasons we now have to do this with a simple Kafka producer.
Now I need to persist the state of the producer (e.g. the timestamp of the last committed file) in a kind of offset topic, like the connector did, but I have not found a reasonable approach to do that.
My current idea is to hold the state by committing it to a topic that the producer also consumes, only acknowledging the last consumed state when committing a new one. That way, if the Kubernetes pod of the producer dies and comes up again, it consumes the last (unacknowledged) state and so knows where it stopped.
But this idea seems a bit complex just to hold the state of a Kafka app. Is there a better approach for that?
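A minimal sketch of the state-topic idea described above (the topic name, single-partition layout, and String-typed state are all assumptions for illustration, not an established pattern): on startup, read back the last record from the state topic; after each processed file, write the new state.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ProducerStateStore {

    // Assumed single-partition topic dedicated to the producer's own state.
    private static final TopicPartition STATE_TP = new TopicPartition("file-producer-state", 0);

    // Read the most recent state record written by a previous run, or null if none exists.
    public static String loadLastState(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(Collections.singletonList(STATE_TP));   // plain assign, no group needed
            long end = consumer.endOffsets(Collections.singletonList(STATE_TP)).get(STATE_TP);
            if (end == 0) {
                return null;                                        // nothing written yet
            }
            consumer.seek(STATE_TP, end - 1);                       // read only the last record
            String state = null;
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(5))) {
                state = record.value();
            }
            return state;
        }
    }
}
Writing the new state would then just be a regular send to the same topic (e.g. via KafkaTemplate), and making the topic compacted keeps it from growing indefinitely.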

Spring Integration - Kafka Message Driven Channel - Auto Acknowledge

I have used the sample configuration as listed in the spring.io docs, and it is working fine.
<int-kafka:message-driven-channel-adapter
id="kafkaListener"
listener-container="container1"
auto-startup="false"
phase="100"
send-timeout="5000"
channel="nullChannel"
message-converter="messageConverter"
error-channel="errorChannel" />
However, I was testing it with a downstream application where I consume from Kafka and publish to a downstream system. If the downstream system is down, the messages were still getting consumed and were not replayed.
Or, let's say that after consuming from the Kafka topic I find some exception in the service activator; I want to be able to throw an exception that rolls back the transaction so that the Kafka messages can be replayed.
In brief, if the consuming application has some issue, I want to roll back the transaction so that messages are not automatically acknowledged and are replayed again and again until they are successfully processed.
That's not how Apache Kafka works. There are no transaction semantics similar to JMS; the offset in a Kafka topic has nothing to do with rollback or redelivery.
I suggest you study Apache Kafka more closely from its official resources.
Spring Kafka brings nothing over the regular Apache Kafka protocol; however, you can consider using the retry capabilities in Spring Kafka to redeliver the same record locally: http://docs.spring.io/spring-kafka/docs/1.2.2.RELEASE/reference/html/_reference.html#_retrying_deliveries
And yes, the ack mode must be MANUAL: do not commit the offset to Kafka automatically after consuming.
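A hedged sketch of that MANUAL-ack approach, using the spring-kafka listener annotation rather than the XML adapter above and assuming a recent spring-kafka version (the topic, group, and downstream call are made-up names): the offset is committed only when the listener acknowledges, so a record that fails downstream is not silently committed.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;

public class DownstreamForwardingListener {

    // Requires a container factory configured with AckMode.MANUAL (or MANUAL_IMMEDIATE).
    @KafkaListener(topics = "inbound-topic", groupId = "forwarder")   // assumed names
    public void onMessage(ConsumerRecord<String, String> record, Acknowledgment ack) {
        publishDownstream(record.value());   // hypothetical call to the downstream system
        // Only reached on success: commit the offset so the record is not redelivered.
        ack.acknowledge();
        // If publishDownstream throws, the offset stays uncommitted and the container's
        // error handling / retry (see the link above) can redeliver the record.
    }

    private void publishDownstream(String payload) {
        // placeholder for the real downstream publish
    }
}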

For a Spring enterprise web application with multiple instances, what is the way to retrieve the offset value from Kafka and store it?

I'm working on an enterprise web application that has a requirement to read from a Kafka system and then trigger events. Can anyone suggest a way to get the offset and also an ideal way to store it (the ideal way should be able to handle access by multiple instances of the application)?
Note:-
I'm using spring-kafka and open for any further suggestions.
Thanks in advance.
With recent versions of Kafka, the offset is stored in a Kafka topic: Kafka keeps track of the consumer offset for each partition of a topic in __consumer_offsets, which is a compacted topic. In other words, Kafka itself keeps track of the offset for each consumer group.
With Spring for Apache Kafka, several options are provided for when the offset is committed.
In earlier versions of Kafka, offsets were often stored externally; it's now a lot simpler.
There may still be use cases for that, but such scenarios are all supported by Spring Kafka, especially with the upcoming 2.0 release.
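To make the multi-instance point concrete, a minimal sketch (the topic and group names are made up): every instance uses the same group id, so Kafka divides the partitions among the live instances and tracks each partition's committed offset in __consumer_offsets; no external offset store is needed.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;

public class EventTriggeringListener {

    // All application instances declare the same group id; the committed offset per
    // partition lives in __consumer_offsets and is shared by the whole group.
    @KafkaListener(topics = "business-events", groupId = "event-triggers")
    public void onEvent(ConsumerRecord<String, String> record) {
        // Trigger the event here; the container commits the offset afterwards
        // according to the configured ack mode (BATCH by default).
        System.out.println("partition " + record.partition() + " offset " + record.offset());
    }
}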

Connection between Apache Kafka and JMS

I was wondering whether Apache Kafka could communicate and send messages to JMS. Can I establish a connection between them? For example, I'm using JMS in my system and it should send messages to another system that uses Kafka.
Answering a bit late, but if I understood the requirement correctly:
If the requirement is synchronous messaging from
client -> JMS -> Kafka ---> consumer
then the following is not the solution. But if it is (and it most likely is) the asynchronous requirement, like
client -> JMS | ----> Kafka ---> consumer
then this is related to the Kafka Connect framework, which solves the problem of integrating different sources and sinks with Kafka.
http://docs.confluent.io/2.0.0/connect/
http://www.confluent.io/product/connectors
So what you need is a JMSSourceConnector.
Not directly. And the two are incomparable concepts. JMS is a vendor-neutral API specification of a messaging service.
While Kafka may be classified as a messaging service, it is not compatible with the JMS API, and to the best of my knowledge there is no trivial way of adapting JMS to fit Kafka's use cases without making significant compromises.
However, if your needs are simply to move messages between Kafka and a JMS-compliant broker, then this can easily be achieved either by writing a simple relay app that consumes from one and publishes onto the other, or by using something like Kafka Connect, which has pre-canned sinks for most data sources, including JMS brokers, databases, etc.
If the requirement is the reverse of the previous answer:
Kafka Producer -> Kafka Broker -> JMS Broker -> JMS Consumer
then you would need a Kafka Connect sink like the following one from Data Mountaineer:
http://docs.datamountaineer.com/en/latest/jms.html
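If Kafka Connect is not an option, the "simple relay app" mentioned earlier can be sketched roughly as below for the Kafka -> JMS direction; the broker address, destination names, and the JMS provider's ConnectionFactory (JMS 2.0 assumed, so Connection is auto-closeable) are all placeholders.
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Session;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class KafkaToJmsRelay {

    public static void relay(ConnectionFactory jmsFactory) throws Exception {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "kafka-to-jms-relay");        // assumed
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection connection = jmsFactory.createConnection()) {
            connection.start();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(session.createQueue("relayed.messages")); // assumed queue

            consumer.subscribe(Collections.singletonList("source-topic"));      // assumed topic
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    producer.send(session.createTextMessage(record.value()));   // one JMS message per Kafka record
                }
            }
        }
    }
}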
