Persist state of Kafka Producer within Spring Cloud/Boot

I want to implement a Kafka producer with Spring that watches a cloud storage location and emits meta information about newly arrived files.
Until now we did that with a Kafka Connector, but for various reasons we now have to do this with a simple Kafka producer.
Now I need to persist the state of the producer (e.g. the timestamp of the last committed file) in a kind of offset topic, like the Connector did, but I haven't found a reasonable approach for that.
My current idea is to hold the state by committing it to a topic that the producer also consumes, acknowledging the previous state only when committing a new one. So if the Kubernetes pod of the producer dies and comes up again, it consumes the last (unacknowledged) state and thus knows where it stopped.
But this idea seems a bit complex just to hold the state of a Kafka app. Is there a better approach?
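For illustration, here is a minimal sketch of the idea described above, assuming a single-partition (ideally compacted) state topic that the producer reads once on startup; the topic name, bootstrap address, and the string payload are illustrative assumptions, not a confirmed design:

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ProducerStateStore {

    private static final String STATE_TOPIC = "producer-state"; // hypothetical topic name

    // Reads the last committed state (e.g. the timestamp of the last processed file), if any.
    public static String readLastState(String bootstrapServers) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false); // no group/offset bookkeeping needed

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition(STATE_TOPIC, 0);
            consumer.assign(List.of(tp));
            consumer.seekToEnd(List.of(tp));
            long end = consumer.position(tp);
            if (end == 0) {
                return null; // no state committed yet
            }
            consumer.seek(tp, end - 1); // read only the last record
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            String state = null;
            for (ConsumerRecord<String, String> record : records) {
                state = record.value(); // last value wins
            }
            return state;
        }
    }
}

On every successfully processed file the producer would then just publish the new timestamp to the same topic with a regular send; no consumer-group acknowledgement bookkeeping is needed at all.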

Related

How can I retrieve Kafka messages inside a controller in Spring Boot?

The messages created by the producer are all being consumed as expected.
The thing is, I need to create an endpoint to retrieve the latest messages from the consumer.
Is there a way to do it?
Like an on-demand consumer?
I found this SO post, but it only covers consuming the last N records. I want to consume the latest messages without caring about offsets.
Spring Kafka Consumer, rewind consumer offset to go back 'n' records
I'm working with Kotlin but if you have the answer in Java I don't mind either.
There are several ways to create listener containers dynamically; you can then start/stop them on demand. To get the records back into the controller, you'd need to use something like a blocking queue, or make the controller itself a MessageListener; a sketch of the blocking-queue approach follows after the links below.
These answers show a couple of techniques for creating containers on demand:
How to dynamically create multiple consumers in Spring Kafka
Kafka Consumer in spring can I re-assign partitions programmatically?
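For illustration, a minimal sketch of the blocking-queue approach, assuming a recent Spring Kafka version and a ConsumerFactory<String, String> bean; the topic name, endpoint path, and timeouts are placeholders:

import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.listener.KafkaMessageListenerContainer;
import org.springframework.kafka.listener.MessageListener;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class LatestMessagesController {

    private final ConsumerFactory<String, String> consumerFactory;

    public LatestMessagesController(ConsumerFactory<String, String> consumerFactory) {
        this.consumerFactory = consumerFactory;
    }

    @GetMapping("/latest")
    public List<String> latest() throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // fresh group id + the default auto.offset.reset=latest => only new records are seen
        ContainerProperties containerProps = new ContainerProperties("my-topic");
        containerProps.setGroupId("on-demand-" + UUID.randomUUID());
        containerProps.setMessageListener(
                (MessageListener<String, String>) rec -> queue.add(rec.value()));

        KafkaMessageListenerContainer<String, String> container =
                new KafkaMessageListenerContainer<>(consumerFactory, containerProps);
        container.start();
        try {
            List<String> result = new ArrayList<>();
            String value = queue.poll(5, TimeUnit.SECONDS);     // wait briefly for new records
            while (value != null) {
                result.add(value);
                value = queue.poll(200, TimeUnit.MILLISECONDS); // drain whatever else arrived
            }
            return result;
        } finally {
            container.stop();
        }
    }
}

Because the container is created with a fresh group id each time and auto.offset.reset defaults to latest, the endpoint only sees records that arrive after it starts, which matches "the latest messages without caring about offsets".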

How to change offset of a topic during runtime?

I have a producer that keeps pushing messages to a Kafka topic, and another service reading these messages from that topic.
I have a business use case where the consumer sometimes needs to ignore all the messages already in the topic and start processing only newly arriving messages. Can this be achieved without stopping and restarting the Kafka server?
I am working in Go. If Kafka supports such a requirement, is there any way I can change the consumer configuration to start consuming from the latest message using the Sarama Go client?
Thank you in advance.
You could use a random UUID for the consumer group id, and/or disable auto commits; then you can start at the latest offset with
config := sarama.NewConfig()
config.Consumer.Offsets.Initial = sarama.OffsetNewest
(adapted from the Sarama example code, which uses sarama.OffsetOldest; OffsetNewest is what starts you at the latest offset)
Otherwise, the Kafka consumer API should have a seekToEnd function, but in Sarama it seems to be exposed as getting the high watermarks from the consumer for every partition and then calling ResetOffsets on a ConsumerGroup instance. Note: the group should be paused before doing that.

How is offset committed in Spring Kafka?

I am using Spring Kafka implementation to integrate with Kafka.
And I am struggling to find out how Spring Kafka handles offset commits internally.
I need this know-how to decide my Disaster Recovery strategy when switching from one Kafka broker to the DR Kafka broker.
Please help or route me to a post/blog which explains how offset commits are handled by Spring's implementation of Kafka. Thanks.
See the documentation for some info: https://docs.spring.io/spring-kafka/docs/current/reference/html/#committing-offsets.
In the end the commit is delegated to the KafkaConsumer anyway:
this.consumer.commitSync(commits, this.syncCommitTimeout);
or
this.consumer.commitAsync(commits, (offsetsAttempted, exception) -> {
So, when you switch from one broker to another without clustering between them, all those commits and offset tracking do not make sense, simply because the data on the new broker is entirely new and has its own offsets, even if the topic name and partitions are the same over there.
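For reference, a minimal sketch of where that commit behaviour can be controlled on the Spring side, assuming Spring Kafka 2.3+ (the AckMode enum lived elsewhere in older versions); topic and group names are placeholders:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.listener.ContainerProperties;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Configuration
class CommitConfig {

    @Bean
    ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory) {
        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);
        // commit right after each acknowledge() call, using commitSync rather than commitAsync
        factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
        factory.getContainerProperties().setSyncCommits(true);
        return factory;
    }
}

@Component
class MyListener {

    @KafkaListener(topics = "my-topic", groupId = "my-group")
    void listen(String value, Acknowledgment ack) {
        // process the record, then acknowledge; the container commits the offset for us
        ack.acknowledge();
    }
}

With the RECORD or default BATCH ack modes the container performs the same commitSync/commitAsync calls on its own after the listener returns.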

Does Spring Kafka producer guarantee delivery by default?

I wonder whether the Spring Kafka producer within Spring Boot guarantees delivery or not.
Does anybody know what happens if some listener fails to receive a message? Would Spring Kafka retry sending it?
There are some concepts here:
The producer produces events and sends them to the Kafka server. On the producer side you must take care of retries and similar concerns yourself if Kafka has downtime or other error scenarios specific to your context.
Consumers are assigned partitions by Kafka; each partition delivers events, and each event has an offset. Consumers poll Kafka for data (they request data; Kafka does not push it to them). Every event that Kafka delivers successfully to a consumer results in an acknowledgment, and Kafka commits the offset of that event, so the next event, with a higher offset, is delivered to the consumer. If a consumer goes down, its partitions are reassigned to the other consumers, so you won't lose your data. If you have only one consumer, the data stays stored in Kafka, and when the consumer comes back it will request data again from the latest/earliest offset.
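To make the producer-side guarantees explicit rather than relying on defaults, here is a minimal sketch using the standard Kafka producer settings wired through Spring Kafka; the broker address is a placeholder:

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

@Configuration
public class ReliableProducerConfig {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> props = new HashMap<>();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.ACKS_CONFIG, "all");                 // wait for all in-sync replicas
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);    // avoid duplicates on internal retries
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120000); // overall budget for retries
        return new DefaultKafkaProducerFactory<>(props);
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate(ProducerFactory<String, String> pf) {
        return new KafkaTemplate<>(pf);
    }
}

The future returned by KafkaTemplate.send() is where a failed delivery ultimately surfaces, so checking it (or attaching a callback) is what gives the application a chance to retry or alert.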

messages published to all consumers with same consumer-group in spring-cloud-stream project

I have my ZooKeeper and 3 Kafka brokers running locally.
I started one producer and one consumer, and I can see the consumer consuming messages.
I then started three consumers with the same consumer group name (on different ports, since it's a Spring Boot project). But what I found is that all the consumers are now consuming (receiving) all messages. I expected the messages to be load-balanced, i.e. each message delivered to only one consumer and not repeated across them. I don't know what the problem is.
Here is my property file
spring.cloud.stream.bindings.input.destination=timerTopicLocal
spring.cloud.stream.kafka.binder.zkNodes=localhost
spring.cloud.stream.kafka.binder.brokers=localhost
spring.cloud.stream.bindings.input.group=timerGroup
Here the group is timerGroup.
consumer code : https://github.com/codecentric/edmp-sample-stream-sink
producer code : https://github.com/codecentric/edmp-sample-stream-source
Can you please update the dependencies to Camden.RELEASE (and start using Kafka 0.9+)? In Brixton.RELEASE, Kafka consumers were 0.8-based and required passing instanceIndex/instanceCount as properties in order to distribute partitions correctly.
In Camden.RELEASE we are using the Kafka 0.9+ consumer client, which does the load-balancing the way you are expecting (we also support static partition allocation via instanceIndex/instanceCount, but I suspect this is not what you want). I can go into more detail on how to configure this with Brixton, but I guess an upgrade should be a much easier path.
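For reference, if you do stay on Brixton for now, the static allocation mentioned above is driven by properties along these lines (the count and index values are placeholders; each instance gets its own index). With the Camden/0.9+ client this is no longer needed for plain consumer groups:

spring.cloud.stream.instanceCount=3
spring.cloud.stream.instanceIndex=0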
