Reconsume Kafka 0.10.1 topic in Spring

In Spring-Kafka I want to reconsume a Kafka topic from the beginning. Doing this by changing the group.id to something unknown to Kafka of course works:
@KafkaListener(topics = "sensordata.t")
public void receiveMessage(String message) {
    ...
}

@Bean
public Map<String, Object> consumerConfigs() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "NewGroupID");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // it still commits though...
    return props;
}
However, starting over by setting the offset to 0 fails.
@KafkaListener(topicPartitions =
        { @TopicPartition(topic = "sensordata.t",
                partitionOffsets = @PartitionOffset(partition = "0", initialOffset = "0")) })
public void receiveMessage(String message) {
    ...
}

@Bean
public Map<String, Object> consumerConfigs() {
    Map<String, Object> props = new HashMap<>();
    ...
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "NewGroupID");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
    props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "10000"); // making the timeout window larger seems to have no influence
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "1"); // setting max records to 1 makes no difference
    return props;
}
The error I get:
2016-11-14 14:07:59.018 INFO 8165 --- [ main] c.i.t.s.server.SpringKafkaApplication : Started SpringKafkaApplication in 4.134 seconds (JVM running for 4.745)
2016-11-14 14:07:59.125 INFO 8165 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Discovered coordinator bto:9092 (id: 2147483647 rack: null) for group spring8.
2016-11-14 14:07:59.125 INFO 8165 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Discovered coordinator bto:9092 (id: 2147483647 rack: null) for group spring8.
2016-11-14 14:07:59.129 INFO 8165 --- [afka-consumer-1] o.a.k.c.c.internals.ConsumerCoordinator : Revoking previously assigned partitions [] for group spring8
2016-11-14 14:07:59.129 INFO 8165 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-11-14 14:07:59.129 INFO 8165 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : (Re-)joining group spring8
2016-11-14 14:07:59.338 ERROR 8165 --- [afka-consumer-1] essageListenerContainer$ListenerConsumer : Container exception
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:600) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:541) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:426) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1059) ~[kafka-clients-0.10.0.1.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:939) ~[spring-kafka-1.1.1.RELEASE.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:816) ~[spring-kafka-1.1.1.RELEASE.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:526) ~[spring-kafka-1.1.1.RELEASE.jar:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_92]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_92]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_92]
Anyone familiar with this?
I'm using Kafka 0.10.1.0 and
<properties>
    <java.version>1.8</java.version>
    <spring-kafka.version>1.1.1.RELEASE</spring-kafka.version>
</properties>

Why have you decided that the problem is the offset of 0?
Your stack trace says that the time between your poll() calls is longer than session.timeout.ms:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
How about adjusting those settings accordingly?
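For illustration, a minimal sketch of a consumer config following that advice; the concrete values are assumptions and need tuning against your actual per-record processing time (and must stay within the broker's group.max.session.timeout.ms):
@Bean
public Map<String, Object> consumerConfigs() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "NewGroupID");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    // allow more time between poll() calls before the group considers the consumer dead...
    props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
    // ...and/or hand the listener fewer records per poll so each loop iteration stays short
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "5");
    return props;
}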

Related

How to implement Retry logic in RabbitMQ?

In my project I'm configuring a SimpleRetryPolicy to register custom exceptions, and a RetryOperationsInterceptor that consumes this policy.
@Bean
public SimpleRetryPolicy rejectionRetryPolicy() {
    Map<Class<? extends Throwable>, Boolean> exceptionsMap = new HashMap<Class<? extends Throwable>, Boolean>();
    exceptionsMap.put(DoNotRetryException.class, false); // not retriable
    exceptionsMap.put(RetryException.class, true);       // retriable
    return new SimpleRetryPolicy(3, exceptionsMap, true);
}

@Bean
RetryOperationsInterceptor interceptor() {
    return RetryInterceptorBuilder.stateless()
            .retryPolicy(rejectionRetryPolicy())
            .backOffOptions(2000L, 2, 3000L)
            .recoverer(
                    new RepublishMessageRecoverer(rabbitTemplate(), "dlExchange", "dlRoutingKey"))
            .build();
}
But with these configurations, retries are not working for either RetryException or DoNotRetryException; I want RetryException to be retried a finite number of times and DoNotRetryException to be sent straight to the DLQ.
Please help with the issue; I'm attaching the repo link in case it's needed:
https://github.com/aviralnimbekar/RabbitMQ/tree/main/src
Your GlobalErrorHandler does its logic before the retry happens, and there you replace the exception with an AmqpRejectAndDontRequeueException; it also looks like you publish to the DLX from there. Consider moving your GlobalErrorHandler logic into a more general ErrorHandler set via factory.setErrorHandler() instead.
See more info in docs: https://docs.spring.io/spring-amqp/reference/html/#exception-handling
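For illustration, a hedged sketch of that suggestion; the ConditionalRejectingErrorHandler and the factory wiring here are assumptions for the sketch, not code taken from the linked repo:
@Bean
public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory(ConnectionFactory connectionFactory) {
    SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
    factory.setConnectionFactory(connectionFactory);
    // keep the retry interceptor from the question on the container
    factory.setAdviceChain(interceptor());
    // container-level error handling instead of a listener-level errorHandler attribute
    factory.setErrorHandler(new ConditionalRejectingErrorHandler());
    return factory;
}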
UPDATE
After removing errorHandler = "globalErrorHandler" from your @RabbitListener, I got this in the logs:
2022-08-03 16:02:08.093 INFO 16896 --- [nio-8080-exec-4] c.t.r.producer.RabbitMQProducer : Message sent -> retry
2022-08-03 16:02:08.095 INFO 16896 --- [ntContainer#0-1] c.t.r.consumer.RabbitMQConsumer : Retrying message...
2022-08-03 16:02:10.096 INFO 16896 --- [ntContainer#0-1] c.t.r.consumer.RabbitMQConsumer : Retrying message...
2022-08-03 16:02:13.099 INFO 16896 --- [ntContainer#0-1] c.t.r.consumer.RabbitMQConsumer : Retrying message...
2022-08-03 16:02:13.100 WARN 16896 --- [ntContainer#0-1] o.s.a.r.retry.RepublishMessageRecoverer : Republishing failed message to exchange 'dlExchange' with routing key dlRoutingKey
2022-08-03 16:02:17.736 INFO 16896 --- [nio-8080-exec-5] c.t.r.producer.RabbitMQProducer : Message sent -> 1231231
2022-08-03 16:02:17.738 INFO 16896 --- [ntContainer#0-1] c.t.r.consumer.RabbitMQConsumer : sending into dlq...
2022-08-03 16:02:17.739 WARN 16896 --- [ntContainer#0-1] o.s.a.r.retry.RepublishMessageRecoverer : Republishing failed message to exchange 'dlExchange' with routing key dlRoutingKey
Which definitely reflects your original requirements.

Spring kafka application with multiple consumer groups stops consuming messages

Kafka version: 2.3.1
Spring Boot version: 2.2.5.RELEASE
I have a Spring Boot Kafka application with 3 consumer groups. It stops consuming messages because of a failing heartbeat. I tried updating the consumer configuration as suggested by multiple Stack Overflow threads, but even after that I am still facing the issue.
According to the logs, the consumers take less than one second to consume a message right up to the point where consumption suddenly stops. Also, some of the processing in the consumer happens in an asynchronous thread.
Below is the configuration for one of the consumer factories.
I allowed a 10-second buffer per record and configured MAX_POLL_INTERVAL_MS_CONFIG based on that (50 records × 10 s = 500,000 ms).
@Bean
public ConsumerFactory<Object, Object> reqConsumerFactory()
{
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "req-event-group");
    props.put(ConsumerConfig.CLIENT_ID_CONFIG, "req-event-group");
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerConfig.getBootstrapAddress());
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false);
    props.put(ConsumerConfig.RETRY_BACKOFF_MS_CONFIG, 1000);
    props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 50*15*1000);
    props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, 5000);
    props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, 50*10*1000);
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 50);
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    return new DefaultKafkaConsumerFactory<>(props);
}

@Bean
public KafkaListenerContainerFactory<ConcurrentMessageListenerContainer<Object, Object>> reqKafkaListenerContainerFactory()
{
    ConcurrentKafkaListenerContainerFactory<Object, Object> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(reqConsumerFactory());
    factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
    factory.setErrorHandler(new SeekToCurrentErrorHandler(2));
    factory.setConcurrency(2);
    return factory;
}
One of the consumer methods:
@KafkaListener(topicPattern = "${process.update.requirement.topic.name}", containerFactory = "reqKafkaListenerContainerFactory", groupId = "req-event-group")
public void handleCompleteAndErrorRequirement(ConsumerRecord<String, Object> consumerRecord, Acknowledgment acknowledgment)
{
    RequirementEventMsg requirementEventMsg = (RequirementEventMsg) consumerRecord;
    acknowledgment.acknowledge();
    // asynchronous method call here
}
I don't see any error other than this
2020-06-23 13:28:06.815 [INFO ] AbstractCoordinator:855 - [Consumer clientId=consumer-6, groupId=process-consumer-group] Attempt to heartbeat failed since group is rebalancing
2020-06-23 13:28:06.835 [INFO ] AbstractCoordinator:855 - [Consumer clientId=consumer-4, groupId=process-consumer-group] Attempt to heartbeat failed since group is rebalancing
2020-06-23 13:28:07.175 [INFO ] ConsumerCoordinator:472 - [Consumer clientId=consumer-4, groupId=process-consumer-group] Revoking previously assigned partitions [UPDATE_REQUIREMENT_TOPIC-1, UPDATE_REQUIREMENT_TOPIC-0]
2020-06-23 13:28:07.176 [INFO ] KafkaMessageListenerContainer:394 - partitions revoked: [UPDATE_REQUIREMENT_TOPIC-1, UPDATE_REQUIREMENT_TOPIC-0]
2020-06-23 13:28:07.177 [INFO ] AbstractCoordinator:509 - [Consumer clientId=consumer-4, groupId=process-consumer-group] (Re-)joining group
2020-06-23 13:28:07.233 [INFO ] ConsumerCoordinator:472 - [Consumer clientId=consumer-6, groupId=process-consumer-group] Revoking previously assigned partitions [PROCESS_EVENT_TOPIC-0, PROCESS_EVENT_TOPIC-1]
2020-06-23 13:28:07.233 [INFO ] KafkaMessageListenerContainer:394 - partitions revoked: [PROCESS_EVENT_TOPIC-0, PROCESS_EVENT_TOPIC-1]

KAFKA : splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE

I am sending 10 messages. 2 messages are "right" and 1 message is over 1 MB in size, so it gets rejected by the Kafka broker due to a RecordTooLargeException.
I have 2 doubts:
1) MESSAGE_TOO_LARGE appears only from the second time the scheduler calls the method onwards. When the scheduler calls the method for the first time, "splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE" doesn't appear.
2) Why are the retries not getting decreased? I have set retries=1.
I am calling the Sender class using the Spring Boot scheduling mechanism, something like this:
@Scheduled(fixedDelay = 30000)
public void process() {
    sender.sendThem();
}
I am using Spring Boot KafkaTemplate.
@Configuration
@EnableKafka
public class KakfaConfiguration {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> config = new HashMap<>();
        // props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
        // props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG,
        //         appProps.getJksLocation());
        // props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG,
        //         appProps.getJksPassword());
        config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        config.put(ProducerConfig.ACKS_CONFIG, acks);
        config.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, retryBackOffMsConfig);
        config.put(ProducerConfig.RETRIES_CONFIG, retries);
        config.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        config.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "prod-99");
        return new DefaultKafkaProducerFactory<>(config);
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }

    @Bean(name = "ktm")
    public KafkaTransactionManager kafkaTransactionManager() {
        KafkaTransactionManager ktm = new KafkaTransactionManager(producerFactory());
        ktm.setTransactionSynchronization(AbstractPlatformTransactionManager.SYNCHRONIZATION_ON_ACTUAL_TRANSACTION);
        return ktm;
    }
}
@Component
@EnableTransactionManagement
class Sender {

    @Autowired
    private KafkaTemplate<String, String> template;

    private static final Logger LOG = LoggerFactory.getLogger(Sender.class);

    @Transactional("ktm")
    public void sendThem(List<String> toSend) throws InterruptedException {
        List<ListenableFuture<SendResult<String, String>>> futures = new ArrayList<>();
        CountDownLatch latch = new CountDownLatch(toSend.size());
        ListenableFutureCallback<SendResult<String, String>> callback = new ListenableFutureCallback<SendResult<String, String>>() {

            @Override
            public void onSuccess(SendResult<String, String> result) {
                LOG.info(" message sucess : " + result.getProducerRecord().value());
                latch.countDown();
            }

            @Override
            public void onFailure(Throwable ex) {
                LOG.error("Message Failed ");
                latch.countDown();
            }
        };
        toSend.forEach(str -> {
            ListenableFuture<SendResult<String, String>> future = template.send("t_101", str);
            futures.add(future); // keep the future so it can be checked below
            future.addCallback(callback);
        });
        if (latch.await(12, TimeUnit.MINUTES)) {
            LOG.info("All sent ok");
        } else {
            for (int i = 0; i < toSend.size(); i++) {
                if (!futures.get(i).isDone()) {
                    LOG.error("No send result for " + toSend.get(i));
                }
            }
        }
    }
}
I am getting the following logs
2020-05-01 15:55:18.346 INFO 6476 --- [ scheduling-1] o.a.kafka.common.utils.AppInfoParser : Kafka startTimeMs: 1588328718345
2020-05-01 15:55:18.347 INFO 6476 --- [ scheduling-1] o.a.k.c.p.internals.TransactionManager : [Producer clientId=producer-prod-991, transactionalId=prod-991] ProducerId set to -1 with epoch -1
2020-05-01 15:55:18.351 INFO 6476 --- [oducer-prod-991] org.apache.kafka.clients.Metadata : [Producer clientId=producer-prod-991, transactionalId=prod-991] Cluster ID: bL-uhcXlRSWGaOaSeDpIog
2020-05-01 15:55:48.358 INFO 6476 --- [oducer-prod-991] o.a.k.c.p.internals.TransactionManager : [Producer clientId=producer-prod-991, transactionalId=prod-991] ProducerId set to 13000 with epoch 10
Value of kafka template----- 1518752790
2020-05-01 15:55:48.377 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 8 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.379 INFO 6476 --- [oducer-prod-991] com.a.kafkaproducer.producer.Sender : message sucess : TTTT0
2020-05-01 15:55:48.379 INFO 6476 --- [oducer-prod-991] com.a.kafkaproducer.producer.Sender : message sucess : TTTT1
2020-05-01 15:55:48.511 ERROR 6476 --- [oducer-prod-991] com.a.kafkaproducer.producer.Sender : Message Failed
2020-05-01 15:55:48.512 ERROR 6476 --- [oducer-prod-991] o.s.k.support.LoggingProducerListener : Exception thrown when sending a message with key='null' and payload='
2020-05-01 15:55:48.514 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 10 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.518 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 11 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.523 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 12 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.527 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 13 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.531 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 14 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.534 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 15 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.538 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 16 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.542 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 17 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.546 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 18 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
Then, after some time, the program completes with the following log:
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 2 record(s) for t_101-0:120000 ms has passed since batch creation
2020-05-01 16:18:31.322 WARN 17816 --- [ scheduling-1] o.s.k.core.DefaultKafkaProducerFactory : Error during transactional operation; producer removed from cache; possible cause: broker restarted during transaction: CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#7085a4dd, txId=prod-991]
2020-05-01 16:18:31.322 INFO 17816 --- [ scheduling-1] o.a.k.clients.producer.KafkaProducer : [Producer clientId=producer-prod-991, transactionalId=prod-991] Closing the Kafka producer with timeoutMillis = 5000 ms.
2020-05-01 16:18:31.324 INFO 17816 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Aborting incomplete transaction due to shutdown
error message here
------ processing done in parent class------
At a high level, the producer workflow is as follows.
By setting the RETRIES_CONFIG property, we can guarantee that in case of failure this producer will try sending that message again.
If the batch is too large, we split the batch and send the split batches again. We do not decrement the retry attempts in this case.
You can go through the source code linked below and find the scenarios in which the retry count is decremented.
https://github.com/apache/kafka/blob/68ac551966e2be5b13adb2f703a01211e6f7a34b/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java

Enabling exactly once causes streams shutdown due to timeout while initializing transactional state

I've written a simple example to test the join functionality. As I sometimes get duplicated messages in the resulting topic and sometimes messages are missing from it, I thought I would enable exactly-once semantics while pinpointing the problem. However, when doing this through:
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
I get a timeout that causes Kafka Streams to shut down in my app:
2019-05-02 17:02:32.585 INFO 153056 --- [-StreamThread-1] o.a.kafka.common.utils.AppInfoParser : Kafka version : 2.0.1
2019-05-02 17:02:32.585 INFO 153056 --- [-StreamThread-1] o.a.kafka.common.utils.AppInfoParser : Kafka commitId : fa14705e51bd2ce5
2019-05-02 17:02:32.593 INFO 153056 --- [-StreamThread-1] o.a.k.c.p.internals.TransactionManager : [Producer clientId=join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1-0_0-producer, transactionalId=join-test-0_0] ProducerId set to -1 with epoch -1
2019-05-02 17:03:32.599 ERROR 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] Error caught during partition assignment, will abort the current process and re-throw at the end of rebalance: {}
org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 60000ms.
2019-05-02 17:03:32.599 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] partition assignment took 60044 ms.
current active tasks: []
current standby tasks: []
previous active tasks: []
2019-05-02 17:03:32.601 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] State transition from PARTITIONS_ASSIGNED to PENDING_SHUTDOWN
2019-05-02 17:03:32.601 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] Shutting down
2019-05-02 17:03:32.615 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD
2019-05-02 17:03:32.615 INFO 153056 --- [-StreamThread-1] org.apache.kafka.streams.KafkaStreams : stream-client [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72] State transition from REBALANCING to ERROR
2019-05-02 17:03:32.615 WARN 153056 --- [-StreamThread-1] org.apache.kafka.streams.KafkaStreams : stream-client [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72] All stream threads have died. The instance will be in error state and should be closed.
2019-05-02 17:03:32.615 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] Shutdown complete
Exception in thread "join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] Failed to rebalance.
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:870)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:810)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 60000ms.
static String ORIGINAL = "original-sensor-data";
static String ERROR = "error-score";

public static void main(String[] args) throws IOException {
    SpringApplication.run(JoinTest.class, args);
    Properties props = getProperties();
    final StreamsBuilder builder = new StreamsBuilder();
    final KStream<String, OriginalSensorData> original = builder.stream(ORIGINAL, Consumed.with(Serdes.String(), new OriginalSensorDataSerde()));
    final KStream<String, ErrorScore> error = builder.stream(ERROR, Consumed.with(Serdes.String(), new ErrorScoreSerde()));
    KStream<String, ErrorScore> result = original.join(
            error,
            (originalValue, errorValue) -> new ErrorScore(new Date(originalValue.getTimestamp()), errorValue.getE(),
                    originalValue.getData().get("TE700PV").doubleValue(), errorValue.getT(), errorValue.getR()),
            // KStream-KStream joins are always windowed joins, hence we must provide a join window.
            JoinWindows.of(Duration.ofMillis(3000).toMillis()),
            Joined.with(
                    Serdes.String(),               /* key */
                    new OriginalSensorDataSerde(), /* left value */
                    new ErrorScoreSerde()          /* right value */
            )
    ).through("atl-joined-data-repartition", Produced.with(Serdes.String(), new ErrorScoreSerde()));
    result.foreach((key, value) -> System.out.println("Join Stream: " + key + " " + value));
    KafkaStreams streams = new KafkaStreams(builder.build(), props);
    streams.start();
}

private static Properties getProperties() {
    Properties props = new Properties();
    // Url of the kafka broker, this can also be found in the Aiven console
    props.put("bootstrap.servers", "localhost:9095");
    props.put("group.id", "join-test");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put("application.id", "join-test");
    props.put("default.timestamp.extractor", "com.my.SensorDataTimestampExtractor");
    // The key of a message is a string
    props.put("key.deserializer",
            StringDeserializer.class.getName());
    props.put("value.deserializer",
            StringDeserializer.class.getName());
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
    props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);
    return props;
}
I'm expecting the app to start without the timeout and to keep working.

How to stop micro service with Spring Kafka Listener, when connection to Apache Kafka Server is lost?

I am currently implementing a micro service which reads data from an Apache Kafka topic. I am using "spring-boot, version: 1.5.6.RELEASE" for the micro service and "spring-kafka, version: 1.2.2.RELEASE" for the listener in the same micro service. This is my Kafka configuration:
@Bean
public Map<String, Object> consumerConfigs() {
    return new HashMap<String, Object>() {{
        put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, servers);
        put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        put(ConsumerConfig.GROUP_ID_CONFIG, groupIdConfig);
        put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, autoOffsetResetConfig);
    }};
}

@Bean
public ConsumerFactory<String, String> consumerFactory() {
    return new DefaultKafkaConsumerFactory<>(consumerConfigs());
}

@Bean
public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, String> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    return factory;
}
I have implemented the listener via the @KafkaListener annotation:
@KafkaListener(topics = "${kafka.dataSampleTopic}")
public void receive(ConsumerRecord<String, String> payload) {
    // business logic
    latch.countDown();
}
I need to be able to shut down the micro service when the listener loses its connection to the Apache Kafka server.
When I kill the Kafka server, I get the following message in the Spring Boot log:
2017-11-01 19:58:15.721 INFO 16800 --- [ 0-C-1] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator 192.168.0.4:9092 (id: 2145482646 rack: null) dead for group TestGroup
When I start the Kafka server, I get:
2017-11-01 20:01:37.748 INFO 16800 --- [ 0-C-1] o.a.k.c.c.internals.AbstractCoordinator : Discovered coordinator 192.168.0.4:9092 (id: 2145482646 rack: null) for group TestGroup.
So clearly the Spring Kafka listener in my micro service is able to detect when the Kafka server is up and running and when it's not. In the Confluent book Kafka: The Definitive Guide, in the chapter "But How Do We Exit?", it is said that the wakeup() method needs to be called on the consumer so that a WakeupException is thrown. So I tried to capture the two events (Kafka server down and Kafka server up) with the @EventListener annotation, as described in the Spring for Apache Kafka documentation, and then call wakeup(). But the example in the documentation shows how to detect an idle consumer, which is not my case. Could someone please help me with this? Thanks in advance.
I don't know how to get a notification of the server down condition (in my experience, the consumer goes into a tight loop within the poll()).
However, if you figure that out, you can stop the listener container(s) which will wake up the consumer and exit the tight loop...
@Autowired
private KafkaListenerEndpointRegistry registry;

...

this.registry.stop();
2017-11-01 16:29:54.290 INFO 21217 --- [ad | so47062346] o.a.k.c.c.internals.AbstractCoordinator : Marking the coordinator localhost:9092 (id: 2147483647 rack: null) dead for group so47062346
2017-11-01 16:29:54.346 WARN 21217 --- [ntainer#0-0-C-1] org.apache.kafka.clients.NetworkClient : Connection to node 0 could not be established. Broker may not be available.
...
2017-11-01 16:30:00.643 WARN 21217 --- [ntainer#0-0-C-1] org.apache.kafka.clients.NetworkClient : Connection to node 0 could not be established. Broker may not be available.
2017-11-01 16:30:00.680 INFO 21217 --- [ntainer#0-0-C-1] essageListenerContainer$ListenerConsumer : Consumer stopped
You can improve the tight loop by adding reconnect.backoff.ms, but the poll() never exits so we can't emit an idle event.
spring:
  kafka:
    consumer:
      enable-auto-commit: false
      group-id: so47062346
      properties:
        reconnect.backoff.ms: 1000
I suppose you could enable idle events and use a timer to detect if you've received no data (or idle events) for some period of time, and then stop the container(s).
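As a hedged sketch of that last idea, assuming idle events are enabled on the container factory via factory.getContainerProperties().setIdleEventInterval(...), and accepting that this simplistic version stops on the very first idle event rather than tracking a longer window of silence:
@Component
public class IdleShutdownHandler {

    @Autowired
    private KafkaListenerEndpointRegistry registry;

    // fires when no records have been received for the configured idle interval
    @EventListener
    public void onIdle(ListenerContainerIdleEvent event) {
        // stopping the containers wakes the consumer and exits the poll loop
        registry.stop();
    }
}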
