CONSUME_FROM_LAST offset is noneffective - consumer

I changed my consumer group and started consuming. It consumed from the last offset by default, which is correct.
But when I restarted this consumer, it consumed historical messages. Why did it skip the historical messages on the first startup but consume them on the second?
The offsets on the brokers range from 3000 to 4000.

CONSUME_FROM_LAST_OFFSET only takes effect on a brand-new consumer group.
On your first start, you changed your consumer group (say, from CG-A to CG-B), so it was a brand-new consumer group and it consumed from the last offset, as you expected.
On your second attempt, after being offline for a long time, you restarted your instance with the same consumer group CG-B, so RocketMQ consumed from the offset the broker had remembered. This is working as designed.
So if you want to skip the historical messages and consume only from the last offset, you can either:
1. Change your consumer group to a brand-new one with CONSUME_FROM_LAST_OFFSET.
2. Keep your consumer group, but reset its consume offset to the latest offset before restarting your instance.
NOTE
If you are using a broadcast consumer, whose offset is stored in a local file (the file path depends on the instance name) rather than on the broker, please specify your instance name:
consumer.setInstanceName("YOUR_INSTANCE_NAME");
Otherwise, every time you restart, the instance name changes (it defaults to the process ID), so the consumer cannot find its offset file and may consume from the last offset every time you restart.
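The rule described in this answer can be pictured with a small toy model in plain Java. It is only a sketch of the decision, not RocketMQ's implementation: the class StartOffsetModel, its methods, and the offset values are hypothetical, chosen to match the 3000 to 4000 range from the question.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the start-offset decision; NOT RocketMQ code.
// All names here are hypothetical and exist only for illustration.
class StartOffsetModel {
    // Offsets the broker remembers, keyed by consumer group.
    private final Map<String, Long> storedOffsets = new HashMap<>();
    // Offset that the next produced message will receive.
    private long maxOffset;

    StartOffsetModel(long maxOffset) {
        this.maxOffset = maxOffset;
    }

    // Simulates producers appending messages to the queue.
    void produce(long count) {
        maxOffset += count;
    }

    // Simulates the consumer reporting its progress to the broker.
    void commit(String group, long offset) {
        storedOffsets.put(group, offset);
    }

    // A known group resumes from its stored offset; only a brand-new
    // group falls back to CONSUME_FROM_LAST_OFFSET (the current max offset).
    long startOffset(String group) {
        Long stored = storedOffsets.get(group);
        return stored != null ? stored : maxOffset;
    }

    public static void main(String[] args) {
        StartOffsetModel broker = new StartOffsetModel(3000);
        long first = broker.startOffset("CG-B");   // brand-new group
        broker.commit("CG-B", first);
        broker.produce(1000);                      // consumer is offline
        long second = broker.startOffset("CG-B");  // restart, same group
        System.out.println(first + " " + second);
    }
}
```

In this model the restarted CG-B resumes at 3000 even though the queue has grown to 4000, which is the "history" the question observed. A group name the broker has never seen would start at 4000.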

Related

Polling behavior when using ReactiveKafkaConsumerTemplate

I have a Spring Boot application using ReactiveKafkaConsumerTemplate for consuming messages from Kafka.
I consume messages using kafkaConsumerTemplate.receive(), so I'm manually acknowledging each message. Since I'm working in an asynchronous manner, messages are not processed sequentially.
I'm wondering how the commit and poll process works in this scenario. If I polled 100 messages but acknowledged only 99 of them (the unacknowledged message is in the middle of the 100 messages I polled, say number 50), what happens on the next poll operation? Will it actually poll only after all 100 messages are acknowledged (and the offset is committed), and until then will I keep getting the unacknowledged message over and over in my app until I acknowledge it?
Kafka maintains 2 offsets for a consumer group/partition - the current position() and the committed offset. When a consumer starts, the position is set to the last committed offset.
Position is updated after each poll, so the next poll will never return the same record, regardless of whether it has been committed (unless a seek is performed).
However, with Reactor, you must ensure that commits are performed in the right order, since records are not acknowledged individually; only the committed offset is retained.
If you commit out of order and restart your app, you may get some processed messages redelivered.
We recently added support in the framework for out-of-order commits.
https://projectreactor.io/docs/kafka/release/reference/#_out_of_order_commits
The current version is 1.3.11, including this feature.
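To make the ordering issue concrete, here is a minimal sketch in plain Java of gap-aware commit tracking: the committable offset only advances across a contiguous run of acknowledged records. It illustrates the idea only and is not Reactor Kafka's implementation; ContiguousAckTracker and its methods are hypothetical names.

```java
import java.util.TreeSet;

// Sketch of gap-aware commit tracking for one partition.
// Hypothetical helper, NOT part of Reactor Kafka or the Kafka client.
class ContiguousAckTracker {
    // Every offset below this value has been acknowledged, so this is
    // the value that is safe to commit (the next offset to consume).
    private long committable;
    // Acknowledged offsets that sit above a gap and cannot be committed yet.
    private final TreeSet<Long> pending = new TreeSet<>();

    ContiguousAckTracker(long startOffset) {
        this.committable = startOffset;
    }

    // Records an acknowledgement and returns the offset that is safe to commit.
    long acknowledge(long offset) {
        if (offset >= committable) {
            pending.add(offset);
        }
        // Advance while the next expected offset has been acknowledged.
        while (pending.remove(Long.valueOf(committable))) {
            committable++;
        }
        return committable;
    }

    long committable() {
        return committable;
    }

    public static void main(String[] args) {
        ContiguousAckTracker tracker = new ContiguousAckTracker(0);
        for (long offset = 0; offset < 100; offset++) {
            if (offset != 50) {
                tracker.acknowledge(offset);
            }
        }
        System.out.println(tracker.committable()); // held back by the gap at 50
        tracker.acknowledge(50);
        System.out.println(tracker.committable()); // gap closed
    }
}
```

With the scenario from the question (100 records, number 50 not acknowledged), the safe commit stays at 50 until record 50 is acknowledged and then jumps to 100. Polling itself is not blocked in the meantime, which matches the position behavior described in this answer.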

What will happen if my kafka consumer group is changed after each restart

Let's say, for instance, my Kafka consumer (in consumer group 1) is reading messages from Kafka topic A.
Suppose that consumer consumes 12 messages before failing.
When the consumer starts up again, it has a different consumer group (i.e. consumer group 2).
Question 1 -> On restart, will it continue from where it left off (because that offset is stored by Kafka and/or ZooKeeper), or will it start consuming from the first message?
Question 2 -> Is there a way to ensure that on restart (when the consumer has a different consumer group), it still starts consuming from where it left off before restarting?
Just to give you the context, I am trying to update in-memory caches in each node/server on receiving a message on a Kafka topic. To do that, I am using a different consumer group for each node/server so that each message is consumed by all the nodes/servers to update the in-memory cache. Please let me know if there are better ways to do this. Thanks!
Consumer offsets are maintained per consumer group, so a new consumer group has no committed offset to resume from. If you use a different consumer group on each restart, the starting point is determined by the auto.offset.reset property.
The auto.offset.reset property specifies
What to do when there is no initial offset in Kafka or if the current offset does not exist any more on the server (e.g. because that data has been deleted):
earliest: automatically reset the offset to the earliest offset
latest: automatically reset the offset to the latest offset
none: throw exception to the consumer if no previous offset is found for the consumer's group
anything else: throw exception to the consumer.
That said, I believe you should revisit the design: it would be better to have a different consumer group per node, but keep the same consumer group name for each node even after a restart. This suggestion is based on the information provided; there could be better solutions once the details of the design/implementation are known.
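As a rough sketch of that suggestion, the consumer properties could be built from a node identifier that stays the same across restarts. The property keys are standard Kafka consumer settings; the class name, the "cache-updater-" prefix, the broker address, and the choice of "latest" are assumptions made for illustration.

```java
import java.util.Properties;

// Hypothetical helper that builds per-node consumer settings.
class CacheConsumerConfig {
    // nodeId must be stable across restarts (for example a configured host
    // name); a random or per-process value would create a new group each time.
    static Properties forNode(String nodeId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed address
        // One group per node, so every node receives every message.
        props.setProperty("group.id", "cache-updater-" + nodeId);
        // Only consulted when the group has no committed offset yet.
        props.setProperty("auto.offset.reset", "latest");
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
```

These properties would be passed to a KafkaConsumer (which needs the kafka-clients dependency, not shown here). Because the group name is the same after a restart, the node resumes from its committed offset and auto.offset.reset only matters on its very first start.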

Kafka 2.1 behaviour change for retentions and Kafka Stream application, what can we do so that retention works?

Following is from the Kafka Documentation for 2.1.
https://kafka.apache.org/documentation/
Offset expiration semantics has slightly changed in this version.
According to the new semantics, offsets of partitions in a group will
not be removed while the group is subscribed to the corresponding
topic and is still active (has active consumers). If group becomes
empty all its offsets will be removed after default offset retention
period (or the one set by broker) has passed (unless the group becomes
active again). Offsets associated with standalone (simple) consumers,
that do not use Kafka group management, will be removed after default
offset retention period (or the one set by broker) has passed since
their last commit.
If I understand this correctly, as long as the Stream Thread consumers are connected, no retention setting will be effective?
I also started to observe the following exception after restarting the stream application:
stream thread - Restoring Stream Tasks failed. Deleting StreamTasks stores to recreate from scratch.
org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions:
But the stream application uses the property StreamsConfig.consumerPrefix(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG), "earliest"...
I think it has something to do with retention, but I can't tell what.
If I understand this correctly, as long as the Stream Thread consumers are connected, no retention setting will be effective?
This applies to the __consumer_offsets topic only, which is a Kafka internal topic. For all regular/user topics, retention time is applied the same way as in all previous versions. Also note that this only applies if you upgrade your brokers to 2.1.
For the log message of Streams: you don't need to worry about it. It seems that your application was offline for a longer time, and thus, your local store is not in a consistent state any longer. Thus, it's deleted and recreated from scratch from the changelog topic.
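The offset expiration rule quoted from the 2.1 documentation can be summarized as a small decision function. This is a simplified model in plain Java, not broker code: the class and method names are hypothetical, the retention period is passed in as a parameter, and the clause about the group still being subscribed to the topic is left out.

```java
// Simplified model of the Kafka 2.1 offset expiration semantics.
// Hypothetical names; NOT the broker's implementation.
class OffsetExpiryModel {
    // Offsets of a consumer group: kept while the group is active, removed
    // once the group has been empty for the whole retention period.
    // groupEmptySinceMs is negative while the group still has active consumers.
    static boolean groupOffsetsExpired(long groupEmptySinceMs, long nowMs, long retentionMs) {
        if (groupEmptySinceMs < 0) {
            return false; // active group: offsets are never expired
        }
        return nowMs - groupEmptySinceMs >= retentionMs;
    }

    // Offsets of a standalone (simple) consumer: removed once the retention
    // period has passed since the last commit.
    static boolean standaloneOffsetsExpired(long lastCommitMs, long nowMs, long retentionMs) {
        return nowMs - lastCommitMs >= retentionMs;
    }
}
```

Note that this only concerns committed offsets in __consumer_offsets. It says nothing about how long the records of a regular topic are retained.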

How to rewind and look at previous offset in a partition using Kafka Go client's Consumer

I am new to Kafka. Currently I am experimenting with this Channel Consumer example from Confluent Inc's GitHub repo.
From what I know, consumers are separated into groups. Each group has its own offset in the partition. Let's say I have 40 messages in a particular topic; let's call it owner_commands. A consumer that belongs to the dog group joins and begins to consume those 40 messages.
When I disconnected and reconnected this consumer, I noticed that the messages don't show up anymore. It says that I have reached the end of file. However, if I join the cluster with another consumer that belongs to a different group (say cat), I get to read those 40 messages again.
Do you know if there is a way for consumers in the dog group to rewind and replay those messages using Kafka's Go API? I looked at the source code for the Kafka Golang API, but I couldn't find anything that indicates that I can rewind and look at a particular message in the past.
Thank you
You could use CommitOffsets and just commit back to the offset you want to rewind to. The consumer will start from that offset the next time it is assigned the partition, for example after it reconnects.
CommitOffsets is documented here:
http://docs.confluent.io/current/clients/confluent-kafka-go/index.html#Consumer.CommitOffsets
Outside of the API, the kafka-consumer-groups command can also move the position of consumer groups. This was released with Apache Kafka 0.11.

Does Kafka have Durable Subscriptions feature?

I'm interested in using Kafka in one of my projects, but there is a requirement that the messaging broker has to keep the messages when one of the subscribers (consumers) is disconnected.
I see that JMS has this feature.
The website says that Kafka has durability features.
Is it the same as in JMS, or does it have a different meaning?
The consumer pulls data from Kafka (the brokers). The consumer specifies the offset from which it wants to read the data. If the consumer disconnects and comes back, it can continue where it left off. It can also start consuming data from an earlier point (by changing the offset).
Kafka does support a durable consumer style pattern, but there are a few ways to achieve it.
First you need to understand the concept of Offsets and Consumer Position
Kafka maintains a numerical offset for each record in a partition.
This offset acts as a unique identifier of a record within that
partition, and also denotes the position of the consumer in the
partition. For example, a consumer which is at position 5 has consumed
records with offsets 0 through 4 and will next receive the record with
offset 5. There are actually two notions of position relevant to the
user of the consumer: The position of the consumer gives the offset of
the next record that will be given out. It will be one larger than the
highest offset the consumer has seen in that partition. It
automatically advances every time the consumer receives messages in a
call to poll(Duration).
The committed position is the last offset that has been stored
securely. Should the process fail and restart, this is the offset that
the consumer will recover to. The consumer can either automatically
commit offsets periodically; or it can choose to control this
committed position manually by calling one of the commit APIs (e.g.
commitSync and commitAsync).
The offset can be stored/persisted on either the Kafka server or the client side:
1. The Kafka server persists/holds the consumer's position. In this case there are 2 sub-options:
   a. The consumer explicitly commits the message consumption
   b. The consumer automatically commits the message consumption
2. The client application persists/holds the consumer's position
This is all as per https://kafka.apache.org/22/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html.
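The two notions of position from the quoted Javadoc can be illustrated with a toy model of a single partition in plain Java. This is a sketch under simplifying assumptions, not the Kafka client: PartitionConsumerModel and its methods are hypothetical, and poll here simply hands out a fixed number of records.

```java
// Toy model of one consumer on one partition; NOT the Kafka client API.
// Hypothetical names, for illustration only.
class PartitionConsumerModel {
    private long position;  // offset of the next record poll would return
    private long committed; // last position that was stored durably

    PartitionConsumerModel(long committedOffset) {
        this.committed = committedOffset;
        this.position = committedOffset; // a starting consumer recovers to the committed offset
    }

    // Hands out `count` records and advances the position past them.
    // Returns the offset of the first record handed out.
    long poll(int count) {
        long first = position;
        position += count;
        return first;
    }

    // Stores the current position durably (manual commit).
    void commit() {
        committed = position;
    }

    // Moves the position explicitly, e.g. to replay earlier records.
    void seek(long offset) {
        position = offset;
    }

    // Simulates a process restart: only the committed offset survives.
    PartitionConsumerModel restart() {
        return new PartitionConsumerModel(committed);
    }

    long position() { return position; }
    long committed() { return committed; }

    public static void main(String[] args) {
        PartitionConsumerModel consumer = new PartitionConsumerModel(0);
        consumer.poll(5);      // records 0 through 4, position is now 5
        consumer.commit();     // committed position is 5
        consumer.poll(3);      // records 5 through 7, not committed
        PartitionConsumerModel restarted = consumer.restart();
        System.out.println(consumer.position() + " " + restarted.position());
    }
}
```

In this run the original consumer is at position 8, while the restarted one resumes at the committed position 5, so records 5 through 7 are delivered again. This is the durable-subscriber behavior the question asks about: whatever the consumer had not committed is still available when it comes back.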
