What will happen if my Kafka consumer group is changed after each restart - spring-boot

Let's say, for instance, that my Kafka consumer (in Consumer Group 1) is reading messages from Kafka Topic A, and it consumes 12 messages before failing.
When the consumer starts up again, it now has a different consumer group (i.e. Consumer Group 2).
Question 1: On restart, will it continue from where it left off in the offset (or position), because that offset is stored by Kafka and/or ZooKeeper, or will it start consuming from the first message?
Question 2: Is there a way to ensure that on restart (when the consumer has a different consumer group), it still starts consuming from where it left off before restarting?
Just to give you the context, I am trying to update in-memory caches on each node/server when a message arrives on a Kafka topic. To do that, I am using a different consumer group for each node/server, so that every message is consumed by all the nodes/servers and each in-memory cache gets updated. Please let me know if there are better ways to do this. Thanks!

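Consumer offsets are maintained per consumer group. So if the consumer joins a different consumer group on each restart, the new group has no committed offset and will not continue from where the old group left off; instead, the auto.offset.reset property decides where it starts.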
The auto.offset.reset property specifies what to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.
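To make the rule concrete, here is a toy in-memory model in Java (not the Kafka client; the class and method names are hypothetical) of how a consumer chooses its starting offset for one partition. Committed offsets are keyed by group id, so a consumer that restarts under a new group finds nothing committed and falls back to the auto.offset.reset policy:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (not the Kafka client): how a consumer picks its starting offset
// for one partition, given committed offsets that are stored per consumer group.
class OffsetResetModel {
    enum ResetPolicy { EARLIEST, LATEST, NONE }

    // Committed "next offset to read", keyed by group id (single partition).
    private final Map<String, Long> committed = new HashMap<>();

    void commit(String groupId, long nextOffset) {
        committed.put(groupId, nextOffset);
    }

    // logStart = earliest offset still on the broker; logEnd = offset the next produced record will get.
    long startingOffset(String groupId, ResetPolicy policy, long logStart, long logEnd) {
        Long c = committed.get(groupId);
        boolean valid = c != null && c >= logStart && c <= logEnd;
        if (valid) {
            return c; // resume where this group left off
        }
        // No committed offset for this group, or it is out of range: apply the reset policy.
        switch (policy) {
            case EARLIEST:
                return logStart;
            case LATEST:
                return logEnd;
            default:
                // NONE: Kafka throws an exception to the consumer instead of picking a position.
                throw new IllegalStateException("No valid committed offset for group " + groupId);
        }
    }

    public static void main(String[] args) {
        OffsetResetModel m = new OffsetResetModel();
        m.commit("group-1", 12); // group-1 has consumed offsets 0..11
        System.out.println(m.startingOffset("group-1", ResetPolicy.EARLIEST, 0, 20));
        System.out.println(m.startingOffset("group-2", ResetPolicy.EARLIEST, 0, 20));
        System.out.println(m.startingOffset("group-2", ResetPolicy.LATEST, 0, 20));
    }
}
```

With the question's numbers, group 1 resumes at offset 12, while group 2 starts at 0 with earliest or at the log end with latest; it never inherits group 1's position.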
Given the approach you describe, I believe you should revisit the design: it would be better to keep a separate consumer group per node, but make sure each node keeps the same consumer group name across restarts. This suggestion is based on the information provided; there could be better solutions once the details of the design and implementation are known.
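A minimal sketch of the suggested design, assuming a toy in-memory topic rather than a real broker (the group names such as cache-node-a are hypothetical): because each group tracks its own position, one stable group per node delivers every message to every node, and a node that restarts under the same group id only receives what it missed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model (not Kafka): every consumer group has its own position,
// so one stable group id per node broadcasts each message to every node,
// and a restarted node that keeps its group id resumes instead of replaying.
class CacheBroadcastModel {
    private final List<String> topic = new ArrayList<>();
    private final Map<String, Integer> groupPositions = new HashMap<>(); // stands in for committed offsets

    void publish(String message) {
        topic.add(message);
    }

    // Returns the records this group has not seen yet and "commits" its new position.
    // A group id seen for the first time starts at the beginning (like auto.offset.reset=earliest).
    List<String> poll(String groupId) {
        int from = groupPositions.getOrDefault(groupId, 0);
        List<String> batch = new ArrayList<>(topic.subList(from, topic.size()));
        groupPositions.put(groupId, topic.size());
        return batch;
    }

    public static void main(String[] args) {
        CacheBroadcastModel m = new CacheBroadcastModel();
        m.publish("evict:user-1");
        m.publish("evict:user-2");
        System.out.println(m.poll("cache-node-a")); // both messages
        System.out.println(m.poll("cache-node-b")); // both messages again: each node sees everything
        m.publish("evict:user-3");
        // node-a restarts but keeps its group id, so only the new message is delivered
        System.out.println(m.poll("cache-node-a"));
    }
}
```

In a real deployment the group id would have to come from something that survives restarts, such as a fixed node name in configuration, rather than a value generated at startup.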

Related

Duplicate events by consumer

We observed that one of the consumers picks up the same events multiple times from a Kafka topic. We have the settings below on the consumer application side:
spring.kafka.consumer.enable-auto-commit=false & spring.kafka.consumer.auto-offset-reset=earliest.
How can we avoid duplicates in the consumer application?
Do we need to fine-tune the above configuration settings to stop the consumer from picking up the events multiple times from the Kafka topic?
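Since you've disabled auto commit, you need to control exactly when you commit each record. Records that were processed but not yet committed are delivered again after a restart or rebalance; this is at-least-once processing, so duplicates are possible.
You could also look at Kafka's exactly-once processing capabilities, which use transactions and idempotent producers.
auto.offset.reset only applies when your consumer group has no valid committed offset: the group never committed anything, its offsets were removed, or the committed offset no longer exists on the broker. In that case, with earliest as in your configuration, you are always going to read from the beginning of the topic (the earliest offset still retained).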
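One common complement to committing after processing is to make the handler idempotent. The toy Java sketch below assumes each message carries a unique id (for example in the record key; this is an assumption, not something Kafka provides by default) and skips ids it has already applied, so a redelivered record has no effect. The in-memory set is only for illustration; to survive a restart it would have to live in durable storage.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy sketch (not the Kafka API): with enable.auto.commit=false, a record that was
// processed but not yet committed can be delivered again (at-least-once).
// Remembering which message ids were already applied makes the handler idempotent.
class IdempotentHandler {
    private final Set<String> seenIds = new HashSet<>(); // must be durable to survive a restart
    private final List<String> applied = new ArrayList<>();

    // Returns true if the message was applied, false if it was a duplicate and skipped.
    boolean handle(String messageId, String payload) {
        if (!seenIds.add(messageId)) {
            return false; // already applied earlier: ignore the redelivery
        }
        applied.add(payload);
        return true;
    }

    List<String> applied() {
        return applied;
    }

    public static void main(String[] args) {
        IdempotentHandler h = new IdempotentHandler();
        h.handle("id-1", "a");
        h.handle("id-2", "b");
        // the same two records are redelivered (e.g. after a rebalance), then a new one arrives
        h.handle("id-1", "a");
        h.handle("id-2", "b");
        h.handle("id-3", "c");
        System.out.println(h.applied());
    }
}
```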

Read latest message from kafka - segmentio/kafka-go

I'm using the segmentio/kafka-go client to read messages from a topic.
I'm unable to find out how to start reading from the last/newest message.
Every time I start the code, it starts reading from the beginning offset of the partition.
What you need to know about consuming messages from Kafka is that a consumer client that should resume after a restart is normally part of a consumer group. Kafka stores the committed (already processed) offset for each consumer group, per topic-partition, in an internal Kafka topic called __consumer_offsets. This lets a consumer in a consumer group continue consuming from where it left off after a restart.
In your case, this means you need to set the consumer group (in the Java KafkaConsumer API this is the configuration "group.id") and keep it constant. Only then will you be able to continue reading from the latest/newest message instead of starting from the beginning after a restart.
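For the Java client, a minimal sketch of consumer properties that resume from committed offsets could look like the following. The property names are standard Kafka consumer configs; the broker address and group name are placeholders. In segmentio/kafka-go, the analogous setting is the GroupID field of kafka.ReaderConfig.

```java
import java.util.Properties;

// Consumer settings for a client that should resume from its committed offsets after a restart.
// The keys are standard Kafka consumer configuration names; the values passed in are placeholders.
class ResumableConsumerConfig {
    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Committed offsets are stored per group id, so this value must stay the same across restarts.
        props.setProperty("group.id", groupId);
        // Used only when the group has no valid committed offset (e.g. its very first start):
        // "latest" skips the existing backlog and begins with newly produced messages.
        props.setProperty("auto.offset.reset", "latest");
        props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("localhost:9092", "my-app");
        // With kafka-clients on the classpath these would be passed to new KafkaConsumer<String, String>(props).
        System.out.println(props.getProperty("group.id"));
    }
}
```

The kafka-clients dependency is deliberately not imported, so the snippet stays runnable on the plain JDK; only the configuration values are shown.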

Spring Boot Kafka - Message management with different consumers

My application is built with Spring Boot and runs in a cluster (two different instances on OpenShift).
Each instance has one consumer that reads messages from a replicated topic.
I would like to find a mechanism that blocks reading a message from the replicated topic if it has already been read by one of the two consumers.
Example:
CONSUMER CLIENT A -- READ MSG_1 --> BROKER_1
- Offset increases
- Commit OK
CONSUMER CLIENT B --> DOES NOT READ MSG_1 --> BROKER_1
-- Correct, because it is already committed
Now BROKER_1 goes down and the new leader is BROKER_2.
How can I block the already-read message on BROKER_2?
Thanks all!
Giuseppe.
Replication factor doesn't control if/how consumers read messages; the partition count does. Within one consumer group, each partition is read by only one consumer instance, so if the topic has only one partition, only one instance in the group is able to read messages and all other instances are "blocked". And if a message has already been read and committed, it doesn't matter which broker is the leader, because committed offsets are stored per consumer group and topic-partition (in the internal, replicated __consumer_offsets topic), not per replica; a record also has the same offset on every replica.
If you have more than one partition and you still want to block consumers from being able to read data, then you'll need to implement some external, coordinated lock, for example via ZooKeeper.
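To illustrate why the partition count matters, here is a simplified round-robin model of how partitions are divided among the members of one consumer group. It is only a sketch: Kafka's real assignors (range, round-robin, sticky, cooperative-sticky) differ in detail, but they share the property that each partition is owned by exactly one member of the group at a time.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of partition assignment inside ONE consumer group (not Kafka's assignor code):
// each partition goes to exactly one member, so a record is read by only one member of the group.
class PartitionAssignmentModel {
    // Returns, for each consumer index, the list of partitions it owns.
    static List<List<Integer>> assignRoundRobin(int partitions, int consumers) {
        List<List<Integer>> result = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            result.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            result.get(p % consumers).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(assignRoundRobin(1, 2)); // [[0], []]  -> the second instance stays idle
        System.out.println(assignRoundRobin(4, 2)); // [[0, 2], [1, 3]]
    }
}
```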

Kafka 2.1 behaviour change for retention and Kafka Streams applications: what can we do so that retention works?

The following is from the Kafka documentation for 2.1:
https://kafka.apache.org/documentation/
Offset expiration semantics has slightly changed in this version. According to the new semantics, offsets of partitions in a group will not be removed while the group is subscribed to the corresponding topic and is still active (has active consumers). If group becomes empty all its offsets will be removed after default offset retention period (or the one set by broker) has passed (unless the group becomes active again). Offsets associated with standalone (simple) consumers, that do not use Kafka group management, will be removed after default offset retention period (or the one set by broker) has passed since their last commit.
If I understand this correctly, as long as the Stream Thread consumers are connected, no retention setting will be effective?
I also started to observe the following exception after restarting the stream application:
stream thread - Restoring Stream Tasks failed. Deleting StreamTasks stores to recreate from scratch.
org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions:
However, the stream application sets the property StreamsConfig.consumerPrefix(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG) to "earliest"...
I think it has something to do with retention, but I can't tell what.
As for whether retention settings stop working while the Stream Thread consumers are connected: the new semantics apply to the __consumer_offsets topic only, which is a Kafka-internal topic. For all regular/user topics, retention time is applied the same way as in all previous versions. Also note that this only applies if you upgrade your brokers to 2.1.
For the log message of Streams: you don't need to worry about it. It seems that your application was offline for a longer time, and thus, your local store is not in a consistent state any longer. Thus, it's deleted and recreated from scratch from the changelog topic.
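The 2.1 offset-expiration rule quoted in the question can be restated as a small predicate. This is a toy restatement in Java, not broker code; the parameter names are hypothetical, and the retention period corresponds to the broker setting offsets.retention.minutes (default 7 days since Kafka 2.0). It only decides when a group's committed offsets are removed; it has no effect on how long topic data is retained.

```java
import java.time.Duration;
import java.time.Instant;

// Toy restatement of the Kafka 2.1+ offset-expiration rule (not broker code).
// Returns true if the group's committed offsets would have been removed by `now`.
class OffsetExpiryModel {
    static boolean offsetsExpired(boolean usesGroupManagement,
                                  boolean groupActiveAndSubscribed,
                                  Instant groupEmptySince,
                                  Instant lastCommit,
                                  Duration retention,
                                  Instant now) {
        if (usesGroupManagement) {
            if (groupActiveAndSubscribed) {
                // Offsets are kept while the group has active members subscribed to the topic.
                return false;
            }
            // Once the group is empty, the retention clock starts from that moment.
            return !now.isBefore(groupEmptySince.plus(retention));
        }
        // Standalone (simple) consumers without group management: the clock starts at the last commit.
        return !now.isBefore(lastCommit.plus(retention));
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        Duration retention = Duration.ofDays(7);
        Instant now = t0.plus(Duration.ofDays(30));
        System.out.println(offsetsExpired(true, true, null, t0, retention, now)); // active group: kept
        System.out.println(offsetsExpired(true, false, t0, t0, retention, now));  // empty for 30 days: removed
    }
}
```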

Does Kafka have Durable Subscriptions feature?

I'm interested in using Kafka in one of my projects, but there is a requirement that the messaging broker has to keep the messages while one of the subscribers (consumers) is disconnected.
I see that JMS has this feature.
The website says that Kafka has durability features.
Is that the same as in JMS, or does it mean something different?
A consumer pulls data from Kafka (the brokers) and specifies the offset from which it wants to read. If the consumer disconnects and comes back, it can continue where it left off. It can also start consuming from an earlier point (by changing the offset).
Kafka does support a durable-consumer style pattern, and there are a few ways to achieve it.
First, you need to understand the concepts of offsets and consumer position:
Kafka maintains a numerical offset for each record in a partition. This offset acts as a unique identifier of a record within that partition, and also denotes the position of the consumer in the partition. For example, a consumer which is at position 5 has consumed records with offsets 0 through 4 and will next receive the record with offset 5. There are actually two notions of position relevant to the user of the consumer: The position of the consumer gives the offset of the next record that will be given out. It will be one larger than the highest offset the consumer has seen in that partition. It automatically advances every time the consumer receives messages in a call to poll(Duration).
The committed position is the last offset that has been stored securely. Should the process fail and restart, this is the offset that the consumer will recover to. The consumer can either automatically commit offsets periodically; or it can choose to control this committed position manually by calling one of the commit APIs (e.g. commitSync and commitAsync).
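The difference between the position and the committed position can be modelled with two counters. This is a toy single-partition model (not KafkaConsumer; the method names only loosely mirror poll and commitSync) showing that records read after the last commit are delivered again after a restart:

```java
// Toy model of one partition (not KafkaConsumer): the position advances on every poll,
// the committed offset moves only when commit() is called, and a restart resumes
// from the committed offset.
class PositionModel {
    private long position;  // offset of the next record poll() would return
    private long committed; // offset stored durably; the restart point

    // Simulates a poll that returns `count` records.
    void poll(int count) {
        position += count;
    }

    // Like a commit without explicit offsets: stores the current position.
    void commit() {
        committed = position;
    }

    // After a failure, the consumer recovers to the committed offset.
    void restart() {
        position = committed;
    }

    long position() { return position; }

    long committed() { return committed; }

    public static void main(String[] args) {
        PositionModel m = new PositionModel();
        m.poll(5);   // read offsets 0..4, position = 5
        m.commit();  // committed = 5
        m.poll(3);   // read offsets 5..7, position = 8, not committed
        m.restart(); // crash: position goes back to 5
        System.out.println(m.position());
    }
}
```

After the restart, records 5 through 7 are read a second time, which is the at-least-once behaviour discussed above.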
The offset can be stored/persisted on either the Kafka server or the client side:
- The Kafka server persists/holds the consumer's position. In this case there are two sub-options:
  - The consumer explicitly commits its message consumption.
  - The consumer automatically commits its message consumption.
- The client application persists/holds the consumer's position.
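For the client-side option, the idea described in the KafkaConsumer Javadoc is to store the consumer's position in the same place, and in the same atomic write, as the processing results, and on startup call KafkaConsumer.seek(TopicPartition, long) with the stored offset. The Java sketch below only models the storage part in memory; the "processed:" prefix stands in for real processing, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy sketch of client-side offset storage: the processed output and the next offset
// are replaced together in one step, so they can never disagree. In a real application
// this would be one database transaction, and on startup the consumer would seek to
// nextOffset() instead of relying on offsets committed to Kafka.
class ExternalOffsetStore {
    // Immutable snapshot; swapping the reference models the single atomic write.
    private static final class State {
        final List<String> results;
        final long nextOffset;

        State(List<String> results, long nextOffset) {
            this.results = results;
            this.nextOffset = nextOffset;
        }
    }

    private State state = new State(Collections.emptyList(), 0);

    // Skips records whose offset is already covered by the stored state (redeliveries).
    void process(long offset, String record) {
        State s = state;
        if (offset < s.nextOffset) {
            return;
        }
        List<String> results = new ArrayList<>(s.results);
        results.add("processed:" + record); // stand-in for real processing
        state = new State(results, offset + 1);
    }

    long nextOffset() { return state.nextOffset; }

    List<String> results() { return state.results; }

    public static void main(String[] args) {
        ExternalOffsetStore store = new ExternalOffsetStore();
        store.process(0, "a");
        store.process(1, "b");
        store.process(1, "b"); // redelivered record is ignored
        store.process(2, "c");
        System.out.println(store.nextOffset() + " " + store.results());
    }
}
```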
This is all as per https://kafka.apache.org/22/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html.
