For a Spring enterprise web application with multiple instances, what is the way to retrieve the offset value from Kafka and store it?

I'm working on an enterprise web application that needs to read from Kafka and then trigger events. Can anyone suggest a way to get the offset, and an ideal way to store it? The ideal approach should handle access by multiple instances of the application.
Note: I'm using spring-kafka and am open to any further suggestions.
Thanks in advance.

With recent versions of Kafka, the offset is stored in a Kafka topic. Kafka keeps track of the consumer offset for each partition in __consumer_offsets, which is a compacted topic; in other words, Kafka itself keeps track of the offset for each consumer group.
Spring for Apache Kafka provides several options for when the offset is committed.
In earlier versions of Kafka, offsets were often stored externally; it's now a lot simpler.
There may still be use cases for external storage, and such scenarios are all supported by Spring Kafka, especially with the upcoming 2.0 release.
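
As a rough illustration of the point above, the consumer settings below are a minimal sketch of what every instance of the application could share. The broker address, the group name `event-trigger-app`, and the class name are placeholders invented for this example; the property keys themselves (`group.id`, `enable.auto.commit`, `auto.offset.reset`) are standard Kafka consumer settings. Because all instances use the same `group.id`, Kafka assigns each partition to one instance at a time and keeps a single committed offset per partition for the group, so no external offset store is needed.

```java
import java.util.HashMap;
import java.util.Map;

class SharedConsumerConfig {

    // Builds the consumer properties that every application instance would use.
    // All values are placeholders chosen for illustration.
    static Map<String, Object> consumerProps() {
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("group.id", "event-trigger-app");        // hypothetical group name, identical on every instance
        props.put("enable.auto.commit", false);            // leave committing to the listener container
        props.put("auto.offset.reset", "earliest");        // starting point when the group has no committed offset
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Print the settings so they can be inspected without a running broker.
        consumerProps().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

In a Spring for Apache Kafka application, a map like this is typically passed to a `DefaultKafkaConsumerFactory`; the listener container then commits offsets according to its configured `AckMode` (for example `RECORD`, `BATCH`, or `MANUAL`).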

Related

Kafka cluster unavailable and how to listen to reconnected

In one of our Spring Boot applications I'm developing a retry mechanism. Below is a summary of what this new feature needs to do:
If the Kafka cluster is unavailable for whatever reason, the application should keep running, because records should still be inserted in the database.
When the Kafka cluster is available again, a producer should send all newly inserted records to a topic.
What I'm looking for is an event that tells me Kafka is back up and running. So far I haven't been able to find anything like this. What I did find are forum posts saying this is not supported. I was wondering if somebody has experience implementing this.
If you have any questions for me to clarify please let me know.
We are using spring-kafka 2.8.2
Kind regards,
Josip

Spring Batch and Kafka

I am a junior programmer in banking. I want to build a microservice system that gets data from Kafka and processes it, then saves it to a database and sends the final data to a client app. What technology can I use? I plan to use Spring Batch and Kafka. Can these technologies be used in my project, or is there a better alternative?
To process data from a Kafka topic, I recommend using the Kafka Streams API, especially Spring Kafka Streams.
Kafka Streams and Spring
And to store the data in a database, you should use a Kafka Sink Connector.
Kafka Connect
This approach is very common and easy if your company has a Kafka ecosystem.
In terms of alternatives, here you will find an interesting comparison:
https://scramjet.org/blog/welcome-to-the-family
3 in 1 serverless
Scramjet takes a slightly different approach - 3 platforms in one.
Both the free product https://hub.scramjet.org/ for installation on your server and the cloud platform are available - currently also free in the beta version https://scramjet.org/#join-beta

How is offset committed in Spring Kafka?

I am using the Spring Kafka implementation to integrate with Kafka, and I am struggling to find out how Spring Kafka handles offset commits internally.
I need this knowledge to decide my disaster recovery strategy for switching from one Kafka broker to the DR Kafka broker.
Please help or point me to a post or blog that explains how offset commits are handled by Spring's implementation of Kafka. Thanks.
See the documentation for some info: https://docs.spring.io/spring-kafka/docs/current/reference/html/#committing-offsets.
In the end, the commit is delegated to the KafkaConsumer anyway:
this.consumer.commitSync(commits, this.syncCommitTimeout);
or
this.consumer.commitAsync(commits, (offsetsAttempted, exception) -> {
So, when you switch from one broker to another without clustering between them, all those commits and the offset tracking no longer make sense, because the data on the new broker is entirely new and has its own offsets, even if the topic name and partitions are the same there.

Duplicate consumption of messages with Spring Cloud Stream Kafka binder

We have several micro-services using Spring Boot and Spring Cloud Stream Kafka binder to communicate between them.
Occasionally, we observe bursts of duplicate messages received by a consumer, often several days after they were first consumed and processed (successfully).
While I understand that Kafka does not guarantee exactly-once delivery, it still looks very strange, given that there were no rebalancing events or any 'suspicious' activity in the logs of either the brokers or the services. Since the consumer interacts with external APIs, it is a bit difficult to make it idempotent.
Any hints as to what might be causing the duplication? What should I be looking for to figure this out?
We are using Kafka broker 1.0.0, and this particular consumer uses Spring Cloud Stream Binder Kafka 2.0.0, which is based on kafka-client 1.0.2 (the versions of the other services might be a bit different).
You should show your configuration when asking questions like this.
Best guess is the broker's offsets.retention.minutes.
With modern broker versions (since 2.0), it defaults to 1 week; with older versions it was only one day.
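
To make the timing concrete, here is a simplified model of that retention check in plain Java. It is not broker code: the class and method names are invented for this sketch, and it assumes the older broker behaviour in which a committed offset expires a fixed time after it was last committed. The two constants are the default values of `offsets.retention.minutes` mentioned above.

```java
import java.time.Duration;
import java.time.Instant;

class OffsetRetentionCheck {

    // Defaults for offsets.retention.minutes, as described in the answer.
    static final Duration OLD_DEFAULT = Duration.ofMinutes(1440);   // one day (brokers before 2.0)
    static final Duration NEW_DEFAULT = Duration.ofMinutes(10080);  // one week (brokers 2.0 and later)

    // Simplified model: an offset counts as expired once more than
    // `retention` has elapsed since it was last committed.
    static boolean isExpired(Instant lastCommit, Instant now, Duration retention) {
        return Duration.between(lastCommit, now).compareTo(retention) > 0;
    }

    public static void main(String[] args) {
        Instant lastCommit = Instant.parse("2024-01-01T00:00:00Z");
        Instant threeDaysLater = lastCommit.plus(Duration.ofDays(3));

        // A partition that saw no new commits for three days:
        System.out.println(isExpired(lastCommit, threeDaysLater, OLD_DEFAULT)); // true
        System.out.println(isExpired(lastCommit, threeDaysLater, NEW_DEFAULT)); // false
    }
}
```

If the committed offset has expired by the time the consumer restarts or its partitions are reassigned, the consumer falls back to its `auto.offset.reset` policy. With `earliest`, it would re-read the partition from the start, which could look like a burst of old duplicates on a low-traffic topic.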

Kafka streams - exactly once configuration

I was trying to use the exactly-once capabilities of Kafka using the Kafka Streams library. I've only configured processing.guarantee as exactly_once. Along with this, the transaction state needs to be stored in an internal topic (__transaction_state).
My question is: how can I customize the name of that topic? If the Kafka cluster is shared by multiple consumers, does each consumer need a different topic for transaction management?
Thanks
Murthy
You don't need to worry about the topic __transaction_state -- it's an internal topic that will be automatically created for you -- you don't need to create it manually and it will always have this name (it's not possible to customize the name). It will be used for all producers that use transactions.
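
For reference, the configuration described in the question could look roughly like the sketch below. It uses plain string keys so that it runs without the Kafka Streams library on the classpath; the application name and broker address are placeholders invented for this example. Note that there is no setting for the transaction topic name, which matches the answer above.

```java
import java.util.Properties;

class ExactlyOnceStreamsConfig {

    // Builds a minimal Kafka Streams configuration with exactly-once enabled.
    static Properties streamsProps() {
        Properties props = new Properties();
        props.put("application.id", "payments-processor");  // hypothetical name, unique per application
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("processing.guarantee", "exactly_once");  // value used in the question
        return props;
    }

    public static void main(String[] args) {
        // Print the settings so they can be inspected without a running broker.
        streamsProps().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Applications that share a cluster are told apart by their own `application.id`, not by separate transaction topics. Newer Kafka Streams releases also offer an `exactly_once_v2` value for `processing.guarantee`, so it is worth checking which values your version supports.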
