Possible Reasons of Reconsuming Kafka Messages

Possible Reasons of Reconsuming Kafka Messages - go

Yesterday I found from log the kafka was reconsuming some messages after the Kafka group coordinator initiated a group rebalance. These messages had been consumed two days ago (confirmed from log).
There were two other rebalancing reported in the log, but they didn't reconsume messages anymore. So why the first time reblancing would cause reconsuming messages? What were the problems?
I am using the golang kafka client. here are the code
config := sarama.NewConfig()
config.Version = version
config.Consumer.Offsets.Initial = sarama.OffsetOldest
and we are handling messges before claiming messages, so seems we are using the Send At Least Once strategy for kafka. We have three brokers in one machine, and only one consumer thread (go routine) in the other machine.
Any explanations for this phoenomenon?
I think the messages must have been committed, coz they were consumed two days ago, or why would kafka keep offsets for more than two days without committing?
Consuming Code sample:
func (consumer *Consumer) ConsumeClaim(session
sarama.ConsumerGroupSession, claim sarama.ConsumerGroupClaim) error {
for message := range claim.Messages() {
realHanlder(message) // consumed data here
session.MarkMessage(message, "") // mark offset
}
return nil
}
Added:
Rebalancing happened after app restarted. There were two other restarts which didn't cuase reconnsume
configs of kafka
log.retention.check.interval.ms=300000
log.retention.hours=168
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
delete.topic.enable = true
auto.create.topics.enable=false

By reading the source code of both golang saram client and kafka server, finally I found the reason as below
Consumer group offset retention time is 24hours, which is a default setting by kafka, while log retention is 7days explicitly set by us.
My server app is running in test environment where few people can visit, which means there may be few messages produced by kafka producer, and then the consumer group has few messages to consumes, thus the consumer may not commit any offset for long time.
When the consume offset is not updated for more than 24hours, due to the offset config, the kafka broker/coordinator will remove the consume offset from partitions. Next time the saram queries from kafka broker where the offset is, of course client gets nothing. Notice we are using sarama.OffsetOldest as initial value, then sarama client will consume messages from the start of messages kept by kafka broker, which results in messages reconsuming, and this is likely to happen because log retention is 7days

Related

Kafka: Sarama, idempotence and transactional.id

Does Shopify/sarama provide an option similar to transactional.id in JVM API?
The library supports idempotence (Config.Producer.Idemponent, similar to enable.idempotence), but I don't understand how to use it without transactional.id.
Please, correct me if I'm wrong, there is a bit lack of documentation about these options in Sarama. But according to JVM docs, idempotence without the identifier will be limited by a single producer session. In other words, we will loss the guarantee when producer fails and restart.
I found relevant properties in the source code and some tests (for example), but don't understand how to use them externally.

Shopify/sarama Provides Kafka Exactly Once (Idempotency) with idempotent enabled producer. But For that below configuration setup need to be there.
From Shopify/sarama/config.go
if c.Producer.Idempotent {
if !c.Version.IsAtLeast(V0_11_0_0) {
return ConfigurationError("Idempotent producer requires Version >= V0_11_0_0")
}
if c.Producer.Retry.Max == 0 {
return ConfigurationError("Idempotent producer requires Producer.Retry.Max >= 1")
}
if c.Producer.RequiredAcks != WaitForAll {
return ConfigurationError("Idempotent producer requires Producer.RequiredAcks to be WaitForAll")
}
if c.Net.MaxOpenRequests > 1 {
return ConfigurationError("Idempotent producer requires Net.MaxOpenRequests to be 1")
}
}
In Shopify/sarama How they do this is, There is a producerEpoch ID in AsyncProducer's transactionManager. You can refer the file in Shopify/sarama/async_producer.go. This Id initialise with the producer initialisation and increment when successfully producing each message. read bumpEpoch() function to see that in async_producer.go file.
This is the sequence id for that producer session with the broker and it is sending with each message. Increment when message published successfully.
Read this example. It describes how idempotence works.
You are correct on producer session fact. That exactly once promised for single producer session. When restating producer just after the sequence failure, there can be a duplicate.
When producer restarts, new PID gets assigned. So the idempotency is promised only for a single producer session. Even though producer retries requests on failures, each message is persisted in the log exactly once. There can still be duplicates depending on the source where the producer is getting data. Kafka won’t take care of the duplicate data received by the producer. So, in some cases, you may require an additional de-duplication system.

How to retry a kafka message when there is an error - spring cloud stream

I'm pretty new to Kafka. I'm using spring cloud stream Kafka to produce and consume
#StreamListener(Sink.INPUT)
public void process(Order order) {
try {
// have my message processing
}
catch( exception e ) {
//retry here that record..
}
}
}
Just want to know how can I implement a retry ? Any help on this is highly appreciated

Hy
There are multiple ways to handle "retries" and it depends on the kind of events you encounter.
For basic issues kafka framework will retry for you to recover from an error condition, for example in case of a short network downtime the consumer and producer api implement auto retry.
In particular kafka support "built-in producer/consumer retries" to correctly handle a large variety of errors without loss of messages, but as a developer, you must still be able to handle other types of errors with the try-catch block you mention.
Error in kafka can be divided in the following categories:
(producer & consumer side) Nonretriable broker errors such as errors regarding message size, authorization errors, etc -> you must handle them in "design phase" of your app.
(producer side) Errors that occur before the message was sent to the broker—for example, serialization errors --> you must handle them in the runtime app execution
(producer & consumer sideErrors that occur when the producer exhausted all retry attempts or when the
available memory used by the producer is filled to the limit due to using all of it to store messages while retrying -> you should handle these errors.
Another point of attention regarding "how to retry" is how to handle correctly the order of commits in case of auto-commit option is set to false.
A common and simple pattern to get commit order right is to use a monotonically increasing sequence number. Increase the sequence number every time you commit and add the sequence number at the time of the commit to the commit function.
When you’re getting ready to send a retry, check if the
commit sequence number the callback got is equal to the instance
variable; if it is, there was no newer commit and it is safe to retry. If
the instance sequence number is higher, don’t retry because a
newer commit was already sent.

Kafka Listener rollback transaction on session timeout

I'm using spring-kafka version 1.1.3 to consume messages from a topic. Auto commit is set to true and max.poll.records to 10 in the consumer config. session.timeout.ms is negotiated to 10 seconds with the server.
Upon receiving a message I persist part of it to the database. My database tends to be quite slow sometimes, which results in a session timeout by the kafka listener:
Auto offset commit failed for group mygroup: Commit cannot be completed
since the group has already rebalanced and assigned the partitions to
another member. This means that the time between subsequent calls to
poll() was longer than the configured session.timeout.ms, which
typically implies that the poll loop is spending too much time message
processing. You can address this either by increasing the session
timeout or by reducing the maximum size of batches returned in poll()
with max.poll.records.
Since I can't increase the session timeout on the server and the max.poll.records is already down at 10, I'd like to be able to wrap my database call in a transaction, which would rollback in case of a kafka session timeout.
Is this possible and how can I accomplish this?
Unfortunately I wasn't able to find a solution in the docs.

You have to consider to upgrade to Spring Kafka 1.2 and Kafka 0.10.x. The old Apache Kafka has a flaw with the heart-beat. So, using autoCommit and slow listener you end up with unexpected rebalancing and you are on your own with such a problem. The version of Spring Kafka you use has a logic like:
// if the container is set to auto-commit, then execute in the
// same thread
// otherwise send to the buffering queue
if (this.autoCommit) {
invokeListener(records);
}
else {
if (sendToListener(records)) {
if (this.assignedPartitions != null) {
// avoid group management rebalance due to a slow
// consumer
this.consumer.pause(this.assignedPartitions);
this.paused = true;
this.unsent = records;
}
}
}
So, you may consider to switch off the autoCommit and rely on the built-in pause feature which is turned on by default.

Decided to upgrade to Kafka 0.11 since it adds transaction support (see Release Notes).

Got error "CODE: 22 Not found, V3_0_6_SNAPSHOT maybe this group consumer boot first" when a consumer group boot,RocketMQ

version: 3.2.6
consumer type: PullConsumer
When a new consumer boots, I will try to fetch the consumer offset from mq:
long offset = pullConsumer.fetchConsumeOffset(mq, true) ;
But I happen to meet that this returns -1, and I saw error:
CODE: 22 Not found, V3_0_6_SNAPSHOT maybe this group consumer boot first
from error log.

This only happens when a totally new consumer group boots with EITHER one of below condition happens:
min offset >0 , indicating that the topic is an old topic/queue, which messages has been deleted before from this queue.
Message which consume offset is 0 is considered that it will be consumed from disk by checkInDiskByCommitOffset, which rocketmq think if you consume from 0, a lot of messages will be consume from disk rather than page cache.
When this happens, client should be responsible to determine where to consume. Propably from 0, but you may suffer from consuming a lot of messages from disk.

Changing state of messages which are "in delivery"

In my application, I have a queue (HornetQ) set up on JBoss 7 AS.
I have used Spring batch to do some work once the messages is received (save values in database etc.) and then the consumer commits the JMS session.
Sometimes when there is an exception while processing the message, the excecution of consumer is aborted abruptly.
And the message remains in "in delivery" state. There are about 30 messages in this state on my production queue.
I have tried restarting the consumer but the state of these messages is not changed. The only way to remove these
messages from the queue is to restart the queue. But before doing that I want a way to read these messages so
that they can be corrected and sent to the queue again to be processed.
I have tried using QueueBrowser to read them but it does not work. I have searched a lot on Google but could not
find any way to read these messages.
I am using a Transacted session, where once the message is processed, I am calling:
session.commit();
This sends the acknowledgement.
I am implementing spring's
org.springframework.jms.listener.SessionAwareMessageListener
to recieve messages and then to process them.
While processing the messages, I am using spring batch to insert some data in database.
For a perticular case, it tries to insert data too big to be inserted in a column.
It throws an exception and transaction is aborted.
Now, I have fixed my producer and consumer not to have such data, so that this case should not happen again.
But my question is what about the 30 "in delivery" state messages that are in my production queue? I want to read them so that they can be corrected and sent to the queue again to be processed. Is there any way to read these messages? Once I know their content, I can restart the queue and submit them again (after correcting them).
Thanking you in anticipation,
Suvarna

It all depends on the Transaction mode you are using.
for instance if you use transactions:
// session here is a TX Session
MessageConsumer cons = session.createConsumer(someQueue);
session.start();
Message msg = consumer.receive...
session.rollback(); // this will make the messages to be redelivered
if you are using non TX:
// session here is auto-ack
MessageConsumer cons = session.createConsumer(someQueue);
session.start();
// this means the message is ACKed as we receive, doing autoACK
Message msg = consumer.receive...
//however the consumer here could have a buffer from the server...
// if you are not using the consumer any longer.. close it
consumer.close(); // this will release messages on the client buffer
Alternatively you could also set consumerWindowSize=0 on the connectionFactory.
This is on 2.2.5 but it never changed on following releases:
http://docs.jboss.org/hornetq/2.2.5.Final/user-manual/en/html/flow-control.html
I"m covering all the possibilities I could think of since you're not being specific on how you are consuming. If you provide me more detail then I will be able to tell you more:

You can indeed read your messages in the queue using jmx (with for example jconsole)
In Jboss As7 you can do it the following way :
MBeans>jboss.as>messaging>default>myJmsQueue>Operations
listMessagesAsJson
[edit]
Since 2.3.0 You have a dedicated method for this specific case :
listDeliveringMessages
See https://issues.jboss.org/browse/HORNETQ-763

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio