Read latest message from kafka - segmentio/kafka-go - go

I'm using the segmentio/kafka-go client to read messages from a topic.
I can't find how to start reading from the latest (newest) message.
Every time I start the code, it starts reading from the beginning offset of that partition.

What you need to know about consuming messages from Kafka is that each consumer client is part of a Consumer Group. Kafka stores the already processed offset for each Consumer Group at Topic-Partition level in an internal Kafka topic called __consumer_offsets. This enables a consumer of a Consumer Group to continue consumption from where it left off after a re-start.
In your case, that means you need to set the Consumer Group (in the KafkaConsumer API it is the configuration "group.id") and keep it constant. Only then will the consumer continue from where it left off, picking up the newest unread messages instead of starting from the beginning after a restart.

Related

What will happen if my kafka consumer group is changed after each restart

Let's say, for instance, that my Kafka consumer (in Consumer Group 1) is reading messages from Kafka Topic A.
Now suppose that consumer consumes 12 messages before failing.
When the consumer starts up again, it has a different consumer group (i.e. Consumer Group 2).
Question 1: On restart, will it continue from where it left off (because that offset is stored by Kafka and/or ZooKeeper), or will it start consuming from the first message?
Question 2: Is there a way to ensure that on restart (when the consumer has a different consumer group) it still starts consuming from where it left off before restarting?
Just to give you the context, I am trying to update in-memory caches in each node/server on receiving a message on a Kafka topic. To do that, I am using a different consumer group for each node/server so that each message is consumed by all the nodes/servers to update their in-memory caches. Please let me know if there are better ways to do this. Thanks!
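Consumer offsets are maintained per consumer group, so a new consumer group has no committed offset and cannot resume where the old group left off. If you have a different consumer group on each restart, you can control where it starts with the auto.offset.reset property.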
The auto.offset.reset property specifies what to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
- earliest: automatically reset the offset to the earliest offset
- latest: automatically reset the offset to the latest offset
- none: throw exception to the consumer if no previous offset is found for the consumer's group
- anything else: throw exception to the consumer.
That said, I believe you should revisit the design: it would be better to have a different consumer group per node, but keep the same consumer group name for each node across restarts. This suggestion is based on the information provided; there may be better solutions once the design and implementation are examined in detail.

Spring Boot Kafka - Message management with consumer different

My application is built with Spring Boot and runs in a cluster (two different instances on OpenShift).
Every instance has one consumer that reads messages from a topic with a replication factor.
I would like to find a mechanism to block the reading of a message from the replicated topic if it has already been read by one of the two consumers.
Example:
CONSUMER CLIENT A -- READ MSG_1 --> BROKER_1
- Offset increase
- Commit OK
CONSUMER CLIENT B --> NOT READ MSG_1 --> BROKER_1
-- Correct because already committed
Now BROKER_1 is shut down and the new leader is BROKER_2.
How can I block the already-read message on BROKER_2?
Thanks all!
Giuseppe.
Replication factor doesn't control if/how consumers read messages. The partition count does. If the topic only has one partition, then only one consumer instance in a consumer group is able to read messages, and all other instances in that group are "blocked". And if the message has already been read and committed, then it doesn't matter which broker is the leader, because the offsets are maintained per consumer group and topic partition, not per replica.
If you have more than one partition and you still want to block consumers from being able to read data, then you'll need to implement some external, coordinated lock, via ZooKeeper for example.
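The effect of the partition count can be illustrated with a toy assignment function. This is a simplified round-robin sketch using only the standard library, not Kafka's actual assignor; the assign function and the member names "A" and "B" are made up for the example.

```go
package main

import "fmt"

// assign distributes partitions 0..partitions-1 over the members of ONE
// consumer group in round-robin order. It is a simplified stand-in for
// Kafka's real partition assignors.
func assign(partitions int, members []string) map[string][]int {
	out := make(map[string][]int, len(members))
	for _, m := range members {
		out[m] = nil // every member appears, even with no partitions
	}
	if len(members) == 0 {
		return out
	}
	for p := 0; p < partitions; p++ {
		m := members[p%len(members)]
		out[m] = append(out[m], p)
	}
	return out
}

func main() {
	members := []string{"A", "B"}

	// One partition: only one instance receives it, the other stays idle.
	fmt.Println("1 partition: ", assign(1, members))

	// Two partitions: each instance reads a different partition, so a
	// given message is still delivered to only one of them.
	fmt.Println("2 partitions:", assign(2, members))
}
```

Either way, two instances that share a group never both receive the same partition, which is why a committed message is not redelivered to the other instance after a leader change.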

how to handle consumer recovery by offset with shopify sarama

I've read that kafka provides a consumer client library that allows recovery by saving the last offset read in zookeeper (not 100% sure about where it's stored).
Is it possible to do the same with Sarama consumers?
Let's say that I'm reading until offset 550, my consumer crashes for 5 min, we are now at offset 700 but I want to resume consuming from offset 550.
Is that possible without having to save the state by myself?
I would assume it does but I don't understand how.
I've found sarama.OffsetNewest/Oldest but that's not what I'm looking for...
Kafka consumers used to store offsets in Zookeeper but now they store them directly in Kafka. See the Consumer section in the Kafka docs.
Sarama handles that very well and Sarama consumers will commit (store) offsets in Kafka by default.
Have a look at the Sarama Consumer example. Initially this example starts at the end of the topic but upon restarting, it will restart from its last position.

How to rewind and look at previous offset in a partition using Kafka Go client's Consumer

I am new to Kafka. Currently I am experimenting with this Channel Consumer example from Confluent Inc's GitHub repo.
From what I know, consumers are separated into groups. Each group has its own offset in the partition. Let's say I have 40 messages in a particular topic; let's call it owner_commands. A consumer that belongs to the dog group joins and begins to consume those 40 messages.
When I disconnected and reconnected this consumer, I noticed that the messages don't show up anymore. It says that I have reached the end of file. However, if I join the cluster with another consumer that belongs to a different group (say cat), I get to read those 40 messages again.
Do you know if there is a way for consumers in the dog group to rewind and replay those messages using Kafka's Go API? I looked at the source code for the Kafka Golang API and couldn't find anything that indicates I can rewind and look at a particular message in the past.
Thank you
You could use CommitOffsets and just commit back to the offset you want to rewind to. The next poll will start from that offset.
CommitOffsets is documented here:
http://docs.confluent.io/current/clients/confluent-kafka-go/index.html#Consumer.CommitOffsets
Outside of the API, there's functionality in the kafka-consumer-groups command to move the position of consumer groups as well. This is released with Apache Kafka 0.11.

Does Kafka have Durable Subscriptions feature?

I'm interested in using Kafka in one of my projects, but there is a requirement that the messaging broker has to keep the messages when one of the subscribers (consumers) is disconnected.
I see that JMS has this feature.
The website says that Kafka has durability features.
Is that the same as in JMS, or does it have a different meaning?
The consumer pulls the data from Kafka (the brokers). The consumer specifies the offset from which it wants to read the data. If the consumer disconnects and comes back, it can continue where it left off. It can also start consuming data from an earlier point (by changing the offset).
Kafka does support a durable consumer style pattern, but there are a few ways to achieve it.
First you need to understand the concept of Offsets and Consumer Position:
Kafka maintains a numerical offset for each record in a partition.
This offset acts as a unique identifier of a record within that
partition, and also denotes the position of the consumer in the
partition. For example, a consumer which is at position 5 has consumed
records with offsets 0 through 4 and will next receive the record with
offset 5. There are actually two notions of position relevant to the
user of the consumer: The position of the consumer gives the offset of
the next record that will be given out. It will be one larger than the
highest offset the consumer has seen in that partition. It
automatically advances every time the consumer receives messages in a
call to poll(Duration).
The committed position is the last offset that has been stored
securely. Should the process fail and restart, this is the offset that
the consumer will recover to. The consumer can either automatically
commit offsets periodically; or it can choose to control this
committed position manually by calling one of the commit APIs (e.g.
commitSync and commitAsync).
The offset can be stored/persisted on either the Kafka server or the client side:
- The Kafka server persists/holds the consumer's position. In this case there are 2 sub-options:
  - The consumer explicitly commits the message consumption
  - The consumer automatically commits the message consumption
- The client application persists/holds the consumer's position
This is all as per https://kafka.apache.org/22/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html.
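The difference between the position and the committed position described in the quoted documentation can be sketched with a toy consumer. This is a standard-library model, not the KafkaConsumer API; the consumer type and its poll, commit, and restart methods are hypothetical.

```go
package main

import "fmt"

// consumer models the two notions of position for one partition.
type consumer struct {
	position  int64 // offset of the next record poll will hand out
	committed int64 // last position that was stored durably
}

// poll hands out up to max records and advances the position.
func (c *consumer) poll(records []string, max int) []string {
	end := c.position + int64(max)
	if end > int64(len(records)) {
		end = int64(len(records))
	}
	if end < c.position {
		end = c.position // guard against a negative max
	}
	out := records[c.position:end]
	c.position = end
	return out
}

// commit stores the current position as the committed position.
func (c *consumer) commit() { c.committed = c.position }

// restart models a crash and recovery: uncommitted progress is lost and
// the consumer resumes from the committed position.
func (c *consumer) restart() { c.position = c.committed }

func main() {
	records := []string{"r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9"}
	c := &consumer{}

	c.poll(records, 5) // consumes offsets 0 through 4
	c.commit()         // committed position is now 5
	c.poll(records, 3) // consumes offsets 5 through 7, not committed
	fmt.Println("before crash:", c.position, c.committed)

	c.restart()
	fmt.Println("after restart:", c.position, c.committed)
	fmt.Println("next record:", c.poll(records, 1))
}
```

In this model, records 5 through 7 are delivered again after the restart, which is the at-least-once behaviour that results from committing after processing.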

Resources