Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member - spring-boot

I am using spring-kafka with a recordFilterStrategy.
#Bean("manualImmediateListenerContainerFactory")
public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<Object, Object>> manualImmediateListenerContainerFactory(
ConsumerFactory<Object, Object> consumerFactory) {
ConcurrentKafkaListenerContainerFactory<Object, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
factory.setConsumerFactory(consumerFactory);
factory.getContainerProperties().setPollTimeout(9999999);
factory.setBatchListener(false);
//配置手动提交offset
factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL_IMMEDIATE);
factory.setAckDiscarded(true);
factory.setRecordFilterStrategy(new RecordFilterStrategy<Object, Object>() {
#Override
public boolean filter(ConsumerRecord<Object, Object> consumerRecord) {
Shipment shipment = (Shipment) consumerRecord.value();
return shipment.getType().contains("YAW");
}
});
return factory;
}
Here I have set factory.setAckDiscarded(true). When the listener receives a message that should be discarded, it tries to ack the discarded message and then gets an exception like the one below.
I have already increased max.poll.interval.ms and decreased the maximum size of the batches returned by poll().
Any hints will be highly appreciated!
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.

I noticed in the Kafka console that the group was continuously preparing to rebalance. Basically, I think the issue is that the Kafka broker is not stable, unless the Spring application code itself has the issue.
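For reference, a minimal sketch of the two consumer properties the exception message suggests tuning, assuming the consumer factory is a DefaultKafkaConsumerFactory built from a properties map (the values are placeholders, not recommendations):

// Placeholder values; adjust so max.poll.interval.ms covers the processing time of one poll.
Map<String, Object> props = new HashMap<>(); // plus the usual bootstrap-server/deserializer settings
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 600_000); // more time allowed between poll() calls
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 100);         // fewer records returned per poll()
ConsumerFactory<Object, Object> consumerFactory = new DefaultKafkaConsumerFactory<>(props);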

Related

When will kafka retry to process the messages that have not been acknowledged?

I have a consumer which is configured with the manual ACK property:
@Bean
public ConcurrentKafkaListenerContainerFactory<String, MessageAvro> kafkaListenerContainerFactory() {
    final ConcurrentKafkaListenerContainerFactory<String, MessageAvro> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
    factory.setConsumerFactory(consumerFactory());
    return factory;
}
And a consumer with a @KafkaListener method which does some work like:
@KafkaListener(
        topics = "${tpd.topic-name}",
        containerFactory = "kafkaListenerContainerFactory",
        groupId = "${tpd.group-id}")
public void messageListener(final ConsumerRecord<String, MessageAvro> msg, @Payload final MessageAvro message, final Acknowledgment ack) {
    if (someCondition) {
        // do something
        ack.acknowledge();
    } else {
        // do not acknowledge the message here in order to retry it later.
    }
}
In the case where the condition is "false" and we move on to the "else" branch, when will my consumer try to read the unacknowledged message again?
And in case it doesn't do so on its own, how do I tell my @KafkaListener to take the unacknowledged messages into account?
As soon as you commit (or "acknowledge") an offset, all previous offsets are also committed, in the sense that the consumer group will not try to read them again.
That means: if you hit the "else" branch and your job keeps running until a later record hits the "if" branch with the acknowledgment, all offsets up to that point are committed.
The reason behind this is that a KafkaConsumer reports back to the brokers which offset to read next. To achieve this, Kafka stores that information in an internal topic called __consumer_offsets as a key/value pair, where
key: ConsumerGroup, Topic name, Partition
value: next offset to read
That internal topic is a compacted topic, which means it will eventually only store the latest value for the mentioned key. As a consequence, Kafka does not track the "un-acknowledged" messages in between.
Workaround
What people usually do is to fork those "un-acknowledged" messages into another topic so they can be inspected and consumed together at a later point in time. That way, you will not block your actual application from consuming further messages and you can deal with the un-acknowledged messages separately.
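A minimal sketch of that workaround in the listener from the question, assuming a KafkaTemplate<String, MessageAvro> bean is available and using a hypothetical retry topic name "my-topic.retry" (both the template field and the topic name are illustrative, not part of the original question):

@KafkaListener(topics = "${tpd.topic-name}", containerFactory = "kafkaListenerContainerFactory", groupId = "${tpd.group-id}")
public void messageListener(final ConsumerRecord<String, MessageAvro> msg, final Acknowledgment ack) {
    if (someCondition) {
        // happy path: process the record, then commit its offset
        process(msg.value()); // hypothetical business-logic method
        ack.acknowledge();
    } else {
        // fork the record to a separate topic for later inspection/reprocessing,
        // then acknowledge so the main listener is not blocked
        kafkaTemplate.send("my-topic.retry", msg.key(), msg.value());
        ack.acknowledge();
    }
}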

Spring kafka consume records with some delay

I'm using Spring Kafka in my application. I want to add a delay of 15 minutes when consuming records in one of the listeners (kafkaRetryListenerContainerFactory). I have two listeners. Below is my configuration:
@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(primaryConsumerFactory());
    return factory;
}

@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaRetryListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(retryConsumerFactory());
    factory.setConcurrency(this.kafkaConfigProperties.getConsumerConcurrency());
    factory.setAutoStartup(true);
    factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
    return factory;
}
Kafka retry listener:
@KafkaListener(topics = "${spring.kafka.retry.topic}", groupId = "${spring.kafka.consumer-group-id}",
        containerFactory = "kafkaRetryListenerContainerFactory", id = "retry.id")
public void retryMessage(ConsumerRecord<String, String> record, Acknowledgment acknowledgment) throws InterruptedException {
    Thread.sleep(900000);
    LOG.info(String.format("Consumed retry message -> %s", record.toString()));
    acknowledgment.acknowledge();
}
When I added the Thread.sleep(), I started getting a continuous rebalancing error in the logs:
Attempt to heartbeat failed since group is rebalancing
My spring kafka version is 2.3.4
Below are the config values:
max.poll.interval.ms = 1200000 (this is higher than the Thread.sleep duration)
heartbeat.interval.ms = 3000
session.timeout.ms = 10000
I have tried ack.nack(900000); I'm still getting the rebalancing error.
Any help will be appreciated
A filter is not the right approach; you need to Thread.sleep() the thread and make sure that max.poll.interval.ms is larger than the total sleep and processing time for the records received by the poll.
In 2.3, the container has the option to sleep between polls; with earlier versions, you have to do the sleep yourself.
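A minimal sketch of that 2.3 option for the retry factory from the question, assuming the ContainerProperties idle-between-polls setting is available in your 2.3.x version (the 15-minute value just mirrors the delay in the question):

@Bean
public ConcurrentKafkaListenerContainerFactory<String, Object> kafkaRetryListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(retryConsumerFactory());
    factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
    // have the container sleep between polls instead of sleeping inside the listener
    factory.getContainerProperties().setIdleBetweenPolls(900_000L);
    return factory;
}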
EDIT
I just found this in my server.properties (homebrew on Mac OS):
############################# Group Coordinator Settings #############################
# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
That explains why we see the partitions initially assigned to the first consumer (see comment below).
Setting it back to the default 3000 works for me.

Error handling - Consumer - apache kafka and spring

I am learning to use Kafka. I have two services: a producer and a consumer.
The producer produces messages that require processing (queries to services and a database). These messages are received by the consumer, which is responsible for processing them and saving the result to a database.
Producer
@Autowired
private KafkaTemplate<String, String> kafkaTemplate;
...
kafkaTemplate.send(topic, message);
Consumer
@KafkaListener(topics = "....")
public void listen(@Payload String message) {
    ....
}
I would like all messages to be processed correctly by the consumer.
I do not know how to handle errors on the consumer side in this context. For example, a database might be temporarily down and unable to handle certain messages.
What to do in these cases?
I know that the responsibility belongs to the consumer.
I could do retries, but retrying several times in a row while a database is down does not seem like a good idea. And if I continue to consume messages, the offset advances and I lose the events that I failed to process.
You have control over the Kafka consumer in the form of committing the offset of records read. Kafka will continue to return the same records as long as the offset is not committed. You can set offset committing to manual and, based on the success of your business logic, decide whether to commit or not. See a sample below:
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("group.id", "test");
props.put("enable.auto.commit", "false");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
final int minBatchSize = 200;
List<ConsumerRecord<String, String>> buffer = new ArrayList<>();
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        buffer.add(record);
    }
    if (buffer.size() >= minBatchSize) {
        insertIntoDb(buffer);
        consumer.commitSync();
        buffer.clear();
    }
}
consumer.commitSync() commits the offset.
Also see the Kafka consumer documentation to understand consumer offsets here.
This link was very helpful: https://dzone.com/articles/spring-for-apache-kafka-deep-dive-part-1-error-han
Spring provides the DeadLetterPublishingRecoverer class, which performs correct handling of errors.
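A minimal sketch of wiring it up, assuming spring-kafka 2.2+ and an existing KafkaTemplate bean (the retry count of 2 is just an example; in newer versions the second constructor argument may need to be a BackOff instead of an int):

@Bean
public DeadLetterPublishingRecoverer recoverer(KafkaTemplate<Object, Object> template) {
    // by default publishes failed records to a "<originalTopic>.DLT" topic
    return new DeadLetterPublishingRecoverer(template);
}

@Bean
public SeekToCurrentErrorHandler errorHandler(DeadLetterPublishingRecoverer recoverer) {
    // retry a failed record a couple of times, then hand it to the recoverer
    return new SeekToCurrentErrorHandler(recoverer, 2);
}

The error handler can then be set on the listener container factory (factory.setErrorHandler(...)) so failed records end up on the dead-letter topic instead of being lost.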

Kafka listener receiving List<ConsumerRecord<String, String>>, is it possible to consume?

I am super new to Kafka and I frankly have no idea about this type of consumer (as far as I understand, it looks like this because it is batch-ready), so I am struggling to figure out how to consume the list of these events.
I have something like this:
@KafkaListener(topics = "#{'${kafka.listener.list-of-topics}'.split(',')}")
public void readMessage(List<ConsumerRecord<String, String>> records,
        final Acknowledgment acknowledgment) {
    try {
        ....
I know that when I receive a single event it is of type "MyObject", so I can handle it fine when I get a single message.
I believe there must be a way to read/cast this List<ConsumerRecord<String, String>>, but I cannot figure out how.
Any ideas?
See the reference manual: Batch Listeners.
Starting with version 1.1, #KafkaListener methods can be configured to receive the entire batch of consumer records received from the consumer poll. To configure the listener container factory to create batch listeners, set the batchListener property:
@Bean
public KafkaListenerContainerFactory<?> batchFactory() {
    ConcurrentKafkaListenerContainerFactory<Integer, String> factory =
            new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.setBatchListener(true); // <<<<<<<<<<<<<<<<<<<<<<<<<
    return factory;
}
...
You can also receive a list of ConsumerRecord<?, ?> objects but it must be the only parameter (aside from optional Acknowledgment, when using manual commits, and/or Consumer<?, ?> parameters) defined on the method:
...
When using Spring Boot, set the property spring.kafka.listener.type=batch.
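Putting that together with the listener from the question, a minimal sketch (assuming the listener is wired to a batch-enabled factory like the batchFactory above; how each String value is mapped to MyObject is up to your existing deserialization):

@KafkaListener(topics = "#{'${kafka.listener.list-of-topics}'.split(',')}",
        containerFactory = "batchFactory") // the batch-enabled factory shown above
public void readMessage(List<ConsumerRecord<String, String>> records,
        final Acknowledgment acknowledgment) {
    for (ConsumerRecord<String, String> record : records) {
        String value = record.value(); // each element is one consumed message
        // convert/handle the value here, e.g. map it to MyObject the same way as for a single message
    }
    acknowledgment.acknowledge();
}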

Spring kafka Batch Listener- commit offsets manually in Batch

I am implementing a Spring Kafka batch listener, which reads a list of messages from a Kafka topic and posts the data to a REST service.
I would like to understand the offset management: in case the REST service goes down, the offsets for the batch should not be committed and the messages should be reprocessed on the next poll. I have read the Spring Kafka documentation, but I am confused about the difference between the listener error handler and the seek-to-current container error handlers for batches. I am using spring-boot 2.0.0.M7 and below is my code.
Listener Config:
@Bean
KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<String, String>> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.setConcurrency(Integer.parseInt(env.getProperty("spring.kafka.listener.concurrency")));
    // factory.getContainerProperties().setPollTimeout(3000);
    factory.getContainerProperties().setBatchErrorHandler(kafkaErrorHandler());
    factory.getContainerProperties().setAckMode(AckMode.BATCH);
    factory.setBatchListener(true);
    return factory;
}

@Bean
public Map<String, Object> consumerConfigs() {
    Map<String, Object> propsMap = new HashMap<>();
    propsMap.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, env.getProperty("spring.kafka.bootstrap-servers"));
    propsMap.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG,
            env.getProperty("spring.kafka.consumer.enable-auto-commit"));
    propsMap.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG,
            env.getProperty("spring.kafka.consumer.auto-commit-interval"));
    propsMap.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, env.getProperty("spring.kafka.session.timeout"));
    propsMap.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    propsMap.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    propsMap.put(ConsumerConfig.GROUP_ID_CONFIG, env.getProperty("spring.kafka.consumer.group-id"));
    return propsMap;
}
Listener Class:
@KafkaListener(topics = "${spring.kafka.consumer.topic}", containerFactory = "kafkaListenerContainerFactory")
public void listen(List<String> payloadList) throws Exception {
    if (payloadList.size() > 0) {
        // Post to the service
    }
}
Kafka Error Handler:
public class KafkaErrorHandler implements BatchErrorHandler {

    private static Logger LOGGER = LoggerFactory.getLogger(KafkaErrorHandler.class);

    @Override
    public void handle(Exception thrownException, ConsumerRecords<?, ?> data) {
        LOGGER.info("Exception occurred while processing::" + thrownException.getMessage());
    }
}
How do I handle the Kafka listener so that, if something happens while processing a batch of records, I don't lose data?
With Apache Kafka we never lose data. There is indeed an offset in the partition logs, so we can seek to any arbitrary position.
On the other hand, when we consume records from a partition there is no requirement to commit their offsets: the current consumer holds its position in memory. Committing only matters for other, new consumers in the same group that take over after the current one dies. Independently of any error, the current consumer always moves on and polls new data beyond its current in-memory offset.
So, to reprocess the same data in the same consumer we definitely have to use the seek operation to move the consumer back to the desired position. That's why Spring Kafka provides SeekToCurrentErrorHandler:
This allows implementations to seek all unprocessed topic/partitions so the current record (and the others remaining) will be retrieved by the next poll. The SeekToCurrentErrorHandler does exactly this.
https://docs.spring.io/spring-kafka/reference/htmlsingle/#_seek_to_current_container_error_handlers
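A minimal sketch of that idea for the batch listener above, swapping the logging-only KafkaErrorHandler for the framework's SeekToCurrentBatchErrorHandler (this mirrors the question's own setBatchErrorHandler call; whether that handler class is available depends on the spring-kafka version pulled in by this Boot milestone):

// In the kafkaListenerContainerFactory() bean from the question, configure the
// seek-to-current batch handler instead of KafkaErrorHandler: on a processing failure the
// consumer is sought back, so the whole batch is redelivered on the next poll and its
// offsets are not committed.
factory.getContainerProperties().setBatchErrorHandler(new SeekToCurrentBatchErrorHandler());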
