KAFKA : splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE - spring-boot

I am sending 10 messages. 2 messages are "right" and 1 message is over 1 MB in size, so it gets rejected by the Kafka broker with a RecordTooLargeException.
I have two questions:
1) MESSAGE_TOO_LARGE appears only from the second time the scheduler calls the method onwards. When the scheduler calls the method for the first time, "splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE" does not appear.
2) Why is the retry count not decreasing? I have set retries=1.
I am calling the Sender class using the Spring Boot scheduling mechanism, something like this:
@Scheduled(fixedDelay = 30000)
public void process() {
    sender.sendThem();
}
I am using Spring Boot KafkaTemplate.
@Configuration
@EnableKafka
public class KakfaConfiguration {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> config = new HashMap<>();
        // props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
        // props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG,
        // appProps.getJksLocation());
        // props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG,
        // appProps.getJksPassword());
        config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        config.put(ProducerConfig.ACKS_CONFIG, acks);
        config.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, retryBackOffMsConfig);
        config.put(ProducerConfig.RETRIES_CONFIG, retries);
        config.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        config.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "prod-99");
        return new DefaultKafkaProducerFactory<>(config);
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }

    @Bean(name = "ktm")
    public KafkaTransactionManager kafkaTransactionManager() {
        KafkaTransactionManager ktm = new KafkaTransactionManager(producerFactory());
        ktm.setTransactionSynchronization(AbstractPlatformTransactionManager.SYNCHRONIZATION_ON_ACTUAL_TRANSACTION);
        return ktm;
    }
}
@Component
@EnableTransactionManagement
class Sender {

    @Autowired
    private KafkaTemplate<String, String> template;

    private static final Logger LOG = LoggerFactory.getLogger(Sender.class);

    @Transactional("ktm")
    public void sendThem(List<String> toSend) throws InterruptedException {
        List<ListenableFuture<SendResult<String, String>>> futures = new ArrayList<>();
        CountDownLatch latch = new CountDownLatch(toSend.size());
        ListenableFutureCallback<SendResult<String, String>> callback = new ListenableFutureCallback<SendResult<String, String>>() {

            @Override
            public void onSuccess(SendResult<String, String> result) {
                LOG.info(" message sucess : " + result.getProducerRecord().value());
                latch.countDown();
            }

            @Override
            public void onFailure(Throwable ex) {
                LOG.error("Message Failed ");
                latch.countDown();
            }
        };
        toSend.forEach(str -> {
            ListenableFuture<SendResult<String, String>> future = template.send("t_101", str);
            future.addCallback(callback);
            futures.add(future); // keep the future so it can be inspected if the latch times out
        });
        if (latch.await(12, TimeUnit.MINUTES)) {
            LOG.info("All sent ok");
        } else {
            for (int i = 0; i < toSend.size(); i++) {
                if (!futures.get(i).isDone()) {
                    LOG.error("No send result for " + toSend.get(i));
                }
            }
        }
    }
}
I am getting the following logs
2020-05-01 15:55:18.346 INFO 6476 --- [ scheduling-1] o.a.kafka.common.utils.AppInfoParser : Kafka startTimeMs: 1588328718345
2020-05-01 15:55:18.347 INFO 6476 --- [ scheduling-1] o.a.k.c.p.internals.TransactionManager : [Producer clientId=producer-prod-991, transactionalId=prod-991] ProducerId set to -1 with epoch -1
2020-05-01 15:55:18.351 INFO 6476 --- [oducer-prod-991] org.apache.kafka.clients.Metadata : [Producer clientId=producer-prod-991, transactionalId=prod-991] Cluster ID: bL-uhcXlRSWGaOaSeDpIog
2020-05-01 15:55:48.358 INFO 6476 --- [oducer-prod-991] o.a.k.c.p.internals.TransactionManager : [Producer clientId=producer-prod-991, transactionalId=prod-991] ProducerId set to 13000 with epoch 10
Value of kafka template----- 1518752790
2020-05-01 15:55:48.377 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 8 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.379 INFO 6476 --- [oducer-prod-991] com.a.kafkaproducer.producer.Sender : message sucess : TTTT0
2020-05-01 15:55:48.379 INFO 6476 --- [oducer-prod-991] com.a.kafkaproducer.producer.Sender : message sucess : TTTT1
2020-05-01 15:55:48.511 ERROR 6476 --- [oducer-prod-991] com.a.kafkaproducer.producer.Sender : Message Failed
2020-05-01 15:55:48.512 ERROR 6476 --- [oducer-prod-991] o.s.k.support.LoggingProducerListener : Exception thrown when sending a message with key='null' and payload='
2020-05-01 15:55:48.514 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 10 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.518 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 11 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.523 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 12 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.527 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 13 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.531 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 14 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.534 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 15 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.538 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 16 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.542 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 17 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.546 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 18 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
Then, after some time, the program completes with the following log:
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 2 record(s) for t_101-0:120000 ms has passed since batch creation
2020-05-01 16:18:31.322 WARN 17816 --- [ scheduling-1] o.s.k.core.DefaultKafkaProducerFactory : Error during transactional operation; producer removed from cache; possible cause: broker restarted during transaction: CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#7085a4dd, txId=prod-991]
2020-05-01 16:18:31.322 INFO 17816 --- [ scheduling-1] o.a.k.clients.producer.KafkaProducer : [Producer clientId=producer-prod-991, transactionalId=prod-991] Closing the Kafka producer with timeoutMillis = 5000 ms.
2020-05-01 16:18:31.324 INFO 17816 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Aborting incomplete transaction due to shutdown
error message here
------ processing done in parent class------

A broad picture of the producer workflow is given below.
By setting the RETRIES_CONFIG property, we can guarantee that in case of failure the producer will try to send that message again.
If the batch is too large, the producer splits the batch and sends the split batches again; the retry attempts are not decremented in this case.
You can go through the source code linked below and find the scenarios in which the retry count is decremented.
https://github.com/apache/kafka/blob/68ac551966e2be5b13adb2f703a01211e6f7a34b/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java
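If the goal is to avoid the MESSAGE_TOO_LARGE path entirely rather than rely on batch splitting, the producer-side limit can be raised so it matches what the broker/topic accepts. The snippet below is only a sketch under the assumption that the broker's message.max.bytes (or the topic's max.message.bytes) is raised to the same value; the 2 MB figure is illustrative, not a recommendation.

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;

public class LargeMessageProducerProps {

    // Sketch only: copies the existing producer config and raises the
    // producer-side request limit. The broker/topic limit must be raised
    // separately, otherwise the broker still rejects the record.
    static Map<String, Object> withLargerRequestSize(Map<String, Object> config) {
        Map<String, Object> copy = new HashMap<>(config);
        copy.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 2 * 1024 * 1024); // assumed 2 MB
        return copy;
    }
}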

Related

How to make different instances of consumers in the same consumer group consume different shards of the same Kinesis stream?

I'm following the example given in spring-cloud-stream-samples with the following modifications.
application.yml
spring:
  cloud:
    stream:
      instanceCount: 2
      bindings:
        produceOrder-out-0:
          destination: test_stream
          content-type: application/json
          producer:
            partitionCount: 2
            partitionSelectorName: eventPartitionSelectorStrategy
            partitionKeyExtractorName: eventPartitionKeyExtractorStrategy
        processOrder-in-0:
          group: eventConsumers
          destination: test_stream
          content-type: application/json
      function:
        definition: processOrder;produceOrder
ProducerConfiguration.java
package demo.config;

import demo.stream.Event;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cloud.stream.binder.PartitionKeyExtractorStrategy;
import org.springframework.cloud.stream.binder.PartitionSelectorStrategy;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.Message;

@Configuration
public class ProducerConfiguration {

    private static Logger logger = LoggerFactory.getLogger(ProducerConfiguration.class);

    @Bean
    public PartitionSelectorStrategy eventPartitionSelectorStrategy() {
        return new PartitionSelectorStrategy() {
            @Override
            public int selectPartition(Object key, int partitionCount) {
                if (key instanceof Integer) {
                    int partition = (((Integer) key) % partitionCount + partitionCount) % partitionCount;
                    logger.info("key {} falls into partition {}", key, partition);
                    return partition;
                }
                return 0;
            }
        };
    }

    @Bean
    public PartitionKeyExtractorStrategy eventPartitionKeyExtractorStrategy() {
        return new PartitionKeyExtractorStrategy() {
            @Override
            public Object extractKey(Message<?> message) {
                if (message.getPayload() instanceof Event) {
                    return ((Event) message.getPayload()).hashCode();
                } else {
                    return 0;
                }
            }
        };
    }
}
When I run two instances of this application by setting --spring.cloud.stream.instanceIndex=0 and --spring.cloud.stream.instanceIndex=1, I'm able to see the events getting produced. However, only one of the instances is consuming the records from both partitions; the other instance is not consuming anything, despite the producer creating partitioned records.
Logs seen in KinesisProducer
2022-09-04 00:17:22.628 INFO 34029 --- [ main] a.i.k.KinesisMessageDrivenChannelAdapter : started KinesisMessageDrivenChannelAdapter{shardOffsets=[KinesisShardOffset{iteratorType=LATEST, sequenceNumber='null', timestamp=null, stream='test_stream', shard='shardId-000000000000', reset=false}], consumerGroup='eventConsumers'}
2022-09-04 00:17:22.658 INFO 34029 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 64398 (http) with context path ''
2022-09-04 00:17:22.723 INFO 34029 --- [ main] demo.KinesisApplication : Started KinesisApplication in 18.487 seconds (JVM running for 19.192)
2022-09-04 00:17:23.938 INFO 34029 --- [esis-consumer-1] a.i.k.KinesisMessageDrivenChannelAdapter : The [ShardConsumer{shardOffset=KinesisShardOffset{iteratorType=LATEST, sequenceNumber='null', timestamp=null, stream='test_stream', shard='shardId-000000000000', reset=false}, state=NEW}] has been started.
2022-09-04 00:17:55.222 INFO 34029 --- [io-64398-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring DispatcherServlet 'dispatcherServlet'
2022-09-04 00:17:55.222 INFO 34029 --- [io-64398-exec-1] o.s.web.servlet.DispatcherServlet : Initializing Servlet 'dispatcherServlet'
2022-09-04 00:17:55.224 INFO 34029 --- [io-64398-exec-1] o.s.web.servlet.DispatcherServlet : Completed initialization in 2 ms
2022-09-04 00:17:55.598 INFO 34029 --- [io-64398-exec-1] demo.stream.OrdersSource : Event sent: Event [id=null, subject=Order [id=5fbaca2f-d947-423d-a1f1-b1c9c268d2d0, name=pen], type=ORDER, originator=KinesisProducer]
2022-09-04 00:17:56.337 INFO 34029 --- [ask-scheduler-3] demo.config.ProducerConfiguration : key 1397835167 falls into partition 1
2022-09-04 00:18:02.047 INFO 34029 --- [io-64398-exec-2] demo.stream.OrdersSource : Event sent: Event [id=null, subject=Order [id=83021259-89b5-4451-a0ec-da3152d37a58, name=pen], type=ORDER, originator=KinesisProducer]
2022-09-04 00:18:02.361 INFO 34029 --- [ask-scheduler-3] demo.config.ProducerConfiguration : key 147530256 falls into partition 0
Logs seen in KinesisConsumer
2022-09-04 00:17:28.050 INFO 34058 --- [ main] a.i.k.KinesisMessageDrivenChannelAdapter : started KinesisMessageDrivenChannelAdapter{shardOffsets=[KinesisShardOffset{iteratorType=LATEST, sequenceNumber='null', timestamp=null, stream='test_stream', shard='shardId-000000000001', reset=false}], consumerGroup='eventConsumers'}
2022-09-04 00:17:28.076 INFO 34058 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 64399 (http) with context path ''
2022-09-04 00:17:28.116 INFO 34058 --- [ main] demo.KinesisApplication : Started KinesisApplication in 18.566 seconds (JVM running for 19.839)
2022-09-04 00:17:29.365 INFO 34058 --- [esis-consumer-1] a.i.k.KinesisMessageDrivenChannelAdapter : The [ShardConsumer{shardOffset=KinesisShardOffset{iteratorType=AFTER_SEQUENCE_NUMBER, sequenceNumber='49632927200161141377996226513172299243826807332967284754', timestamp=null, stream='test_stream', shard='shardId-000000000001', reset=false}, state=NEW}] has been started.
2022-09-04 00:17:57.346 INFO 34058 --- [esis-consumer-1] demo.stream.OrderStreamConfiguration : An order has been placed from this service Event [id=null, subject=Order [id=5fbaca2f-d947-423d-a1f1-b1c9c268d2d0, name=pen], type=ORDER, originator=KinesisProducer]
2022-09-04 00:18:04.384 INFO 34058 --- [esis-consumer-1] demo.stream.OrderStreamConfiguration : An order has been placed from this service Event [id=null, subject=Order [id=83021259-89b5-4451-a0ec-da3152d37a58, name=pen], type=ORDER, originator=KinesisProducer]
spring-cloud-stream-binder-kinesis version : 2.2.0
I have the following questions:
For static shard distribution within a single consumer group, is there any other parameter that needs to be configured that I have missed?
Do I need to specify the DynamoDB Checkpoint properties only for dynamic shard distribution?
EDIT
I have added the DEBUG logs seen in KinesisProducer below:
2022-09-07 08:30:38.120 INFO 4993 --- [io-64398-exec-1] demo.stream.OrdersSource : Event sent: Event [id=null, subject=Order [id=b3927132-a80d-481e-a219-dbd0c0c7d124, name=pen], type=ORDER, originator=KinesisProducer]
2022-09-07 08:30:38.806 INFO 4993 --- [ask-scheduler-3] demo.config.ProducerConfiguration : key 1842629003 falls into partition 1
2022-09-07 08:30:38.812 DEBUG 4993 --- [ask-scheduler-3] o.s.c.s.m.DirectWithAttributesChannel : preSend on channel 'bean 'produceOrder-out-0'', message: GenericMessage [payload=byte[126], headers={scst_partition=1, id=9cb8ec58-4a9e-7b6f-4263-c9d4d1eec906, contentType=application/json, timestamp=1662519638809}]
2022-09-07 08:30:38.813 DEBUG 4993 --- [ask-scheduler-3] tractMessageChannelBinder$SendingHandler : org.springframework.cloud.stream.binder.AbstractMessageChannelBinder$SendingHandler#63811d15 received message: GenericMessage [payload=byte[126], headers={scst_partition=1, scst_partitionOverride=0, id=731f444b-d3df-a51a-33de-8adf78e1e746, contentType=application/json, timestamp=1662519638813}]
2022-09-07 08:30:38.832 DEBUG 4993 --- [ask-scheduler-3] o.s.c.s.m.DirectWithAttributesChannel : postSend (sent=true) on channel 'bean 'produceOrder-out-0'', message: GenericMessage [payload=byte[126], headers={scst_partition=1, scst_partitionOverride=0, id=731f444b-d3df-a51a-33de-8adf78e1e746, contentType=application/json, timestamp=1662519638813}]
2022-09-07 08:35:51.153 INFO 4993 --- [io-64398-exec-2] demo.stream.OrdersSource : Event sent: Event [id=null, subject=Order [id=6a5b3084-11dc-4080-a80e-61cc73315139, name=pen], type=ORDER, originator=KinesisProducer]
2022-09-07 08:35:51.915 INFO 4993 --- [ask-scheduler-5] demo.config.ProducerConfiguration : key 1525662264 falls into partition 0
2022-09-07 08:35:51.916 DEBUG 4993 --- [ask-scheduler-5] o.s.c.s.m.DirectWithAttributesChannel : preSend on channel 'bean 'produceOrder-out-0'', message: GenericMessage [payload=byte[126], headers={scst_partition=0, id=115c5421-00f2-286d-de02-0020e9322a17, contentType=application/json, timestamp=1662519951916}]
2022-09-07 08:35:51.916 DEBUG 4993 --- [ask-scheduler-5] tractMessageChannelBinder$SendingHandler : org.springframework.cloud.stream.binder.AbstractMessageChannelBinder$SendingHandler#63811d15 received message: GenericMessage [payload=byte[126], headers={scst_partition=0, scst_partitionOverride=0, id=145be7e8-381f-af73-e430-9cb645ff785f, contentType=application/json, timestamp=1662519951916}]
2022-09-07 08:35:51.917 DEBUG 4993 --- [ask-scheduler-5] o.s.c.s.m.DirectWithAttributesChannel : postSend (sent=true) on channel 'bean 'produceOrder-out-0'', message: GenericMessage [payload=byte[126], headers={scst_partition=0, scst_partitionOverride=0, id=145be7e8-381f-af73-e430-9cb645ff785f, contentType=application/json, timestamp=1662519951916}]

How to implement Retry logic in RabbitMQ?

In my project I'm defining a SimpleRetryPolicy to register custom exceptions and a RetryOperationsInterceptor that consumes this policy.
@Bean
public SimpleRetryPolicy rejectionRetryPolicy() {
    Map<Class<? extends Throwable>, Boolean> exceptionsMap = new HashMap<Class<? extends Throwable>, Boolean>();
    exceptionsMap.put(DoNotRetryException.class, false); // not retriable
    exceptionsMap.put(RetryException.class, true);       // retriable
    return new SimpleRetryPolicy(3, exceptionsMap, true);
}

@Bean
RetryOperationsInterceptor interceptor() {
    return RetryInterceptorBuilder.stateless()
            .retryPolicy(rejectionRetryPolicy())
            .backOffOptions(2000L, 2, 3000L)
            .recoverer(
                    new RepublishMessageRecoverer(rabbitTemplate(), "dlExchange", "dlRoutingKey"))
            .build();
}
But with these configurations, retry is not working for either RetryException or DoNotRetryException, where I want RetryException to be retried a finite number of times and DoNotRetryException to be sent to the DLQ.
Please help with the issue; I'm attaching the repo link in case it is needed.
https://github.com/aviralnimbekar/RabbitMQ/tree/main/src
Your GlobalErrorHandler does its logic before the retry happens, and there you override the exception with an AmqpRejectAndDontRequeueException. It also looks like you publish to the DLX there. Consider moving your GlobalErrorHandler logic to a more general ErrorHandler set via factory.setErrorHandler() instead.
See more info in docs: https://docs.spring.io/spring-amqp/reference/html/#exception-handling
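For illustration only, here is a minimal sketch of that idea: the retry interceptor from the question goes into the container factory's advice chain, and error handling is configured at the factory level rather than through a per-listener errorHandler attribute. The bean names and the choice of ConditionalRejectingErrorHandler are assumptions, not the repo's actual wiring.

import org.springframework.amqp.rabbit.config.SimpleRabbitListenerContainerFactory;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.listener.ConditionalRejectingErrorHandler;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.retry.interceptor.RetryOperationsInterceptor;

@Configuration
public class ListenerFactoryConfig {

    // Sketch only: the interceptor bean is the one defined in the question.
    @Bean
    public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory(
            ConnectionFactory connectionFactory, RetryOperationsInterceptor interceptor) {
        SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
        factory.setConnectionFactory(connectionFactory);
        factory.setAdviceChain(interceptor);                              // retries handled here
        factory.setErrorHandler(new ConditionalRejectingErrorHandler());  // general error handling
        return factory;
    }
}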
UPDATE
After removing errorHandler = "globalErrorHandler" from your @RabbitListener, I got this in the logs:
2022-08-03 16:02:08.093 INFO 16896 --- [nio-8080-exec-4] c.t.r.producer.RabbitMQProducer : Message sent -> retry
2022-08-03 16:02:08.095 INFO 16896 --- [ntContainer#0-1] c.t.r.consumer.RabbitMQConsumer : Retrying message...
2022-08-03 16:02:10.096 INFO 16896 --- [ntContainer#0-1] c.t.r.consumer.RabbitMQConsumer : Retrying message...
2022-08-03 16:02:13.099 INFO 16896 --- [ntContainer#0-1] c.t.r.consumer.RabbitMQConsumer : Retrying message...
2022-08-03 16:02:13.100 WARN 16896 --- [ntContainer#0-1] o.s.a.r.retry.RepublishMessageRecoverer : Republishing failed message to exchange 'dlExchange' with routing key dlRoutingKey
2022-08-03 16:02:17.736 INFO 16896 --- [nio-8080-exec-5] c.t.r.producer.RabbitMQProducer : Message sent -> 1231231
2022-08-03 16:02:17.738 INFO 16896 --- [ntContainer#0-1] c.t.r.consumer.RabbitMQConsumer : sending into dlq...
2022-08-03 16:02:17.739 WARN 16896 --- [ntContainer#0-1] o.s.a.r.retry.RepublishMessageRecoverer : Republishing failed message to exchange 'dlExchange' with routing key dlRoutingKey
Which definitely reflects your original requirements.

Why does a Spring Integration QueueChannel release Kafka messages sequentially with a delay?

When using the Kafka integration and configuring a QueueChannel, the messages received after the queue channel are processed sequentially, with a delay of one second between them. I cannot understand the reason: the queue channel should accumulate messages (up to the configured limit) and release them as long as the queue is not empty and there is a consumer.
Why are messages released sequentially with a delay of one second?
The log follows. As can be seen, the messages are received immediately (according to the log timestamps) but are processed sequentially with a delay of 1 second:
2020-04-06 13:08:28.108 INFO 30718 --- [ntainer#0-0-C-1] o.s.integration.handler.LoggingHandler : readKafkaChannel: item: 2 - enriched
2020-04-06 13:08:28.109 INFO 30718 --- [ask-scheduler-3] o.s.integration.handler.LoggingHandler : channelThatIsProcessingSequential - item: 2 - enriched
2020-04-06 13:08:28.110 INFO 30718 --- [ntainer#0-0-C-1] o.s.integration.handler.LoggingHandler : readKafkaChannel: item: 7 - enriched
2020-04-06 13:08:28.111 INFO 30718 --- [ntainer#0-0-C-1] o.s.integration.handler.LoggingHandler : readKafkaChannel: item: 5 - enriched
2020-04-06 13:08:28.116 INFO 30718 --- [ntainer#0-1-C-1] o.s.integration.handler.LoggingHandler : readKafkaChannel: item: 6 - enriched
2020-04-06 13:08:28.119 INFO 30718 --- [ntainer#0-1-C-1] o.s.integration.handler.LoggingHandler : readKafkaChannel: item: 4 - enriched
2020-04-06 13:08:28.120 INFO 30718 --- [ntainer#0-1-C-1] o.s.integration.handler.LoggingHandler : readKafkaChannel: item: 1 - enriched
2020-04-06 13:08:28.121 INFO 30718 --- [ntainer#0-1-C-1] o.s.integration.handler.LoggingHandler : readKafkaChannel: item: 8 - enriched
2020-04-06 13:08:28.122 INFO 30718 --- [ntainer#0-1-C-1] o.s.integration.handler.LoggingHandler : readKafkaChannel: item: 3 - enriched
2020-04-06 13:08:28.123 INFO 30718 --- [ntainer#0-1-C-1] o.s.integration.handler.LoggingHandler : readKafkaChannel: item: 9 - enriched
2020-04-06 13:08:28.124 INFO 30718 --- [ntainer#0-1-C-1] o.s.integration.handler.LoggingHandler : readKafkaChannel: item: 10 - enriched
2020-04-06 13:08:29.111 INFO 30718 --- [ask-scheduler-2] o.s.integration.handler.LoggingHandler : channelThatIsProcessingSequential - item: 7 - enriched
2020-04-06 13:08:30.112 INFO 30718 --- [ask-scheduler-4] o.s.integration.handler.LoggingHandler : channelThatIsProcessingSequential - item: 5 - enriched
2020-04-06 13:08:31.112 INFO 30718 --- [ask-scheduler-1] o.s.integration.handler.LoggingHandler : channelThatIsProcessingSequential - item: 6 - enriched
2020-04-06 13:08:32.113 INFO 30718 --- [ask-scheduler-5] o.s.integration.handler.LoggingHandler : channelThatIsProcessingSequential - item: 4 - enriched
2020-04-06 13:08:33.113 INFO 30718 --- [ask-scheduler-3] o.s.integration.handler.LoggingHandler : channelThatIsProcessingSequential - item: 1 - enriched
2020-04-06 13:08:34.113 INFO 30718 --- [ask-scheduler-3] o.s.integration.handler.LoggingHandler : channelThatIsProcessingSequential - item: 8 - enriched
2020-04-06 13:08:35.113 INFO 30718 --- [ask-scheduler-3] o.s.integration.handler.LoggingHandler : channelThatIsProcessingSequential - item: 3 - enriched
2020-04-06 13:08:36.114 INFO 30718 --- [ask-scheduler-3] o.s.integration.handler.LoggingHandler : channelThatIsProcessingSequential - item: 9 - enriched
2020-04-06 13:08:37.114 INFO 30718 --- [ask-scheduler-3] o.s.integration.handler.LoggingHandler : channelThatIsProcessingSequential - item: 10 - enriched
package br.com.gubee.kafaexample

import org.apache.kafka.clients.admin.NewTopic
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.http.MediaType
import org.springframework.integration.annotation.Gateway
import org.springframework.integration.annotation.MessagingGateway
import org.springframework.integration.config.EnableIntegration
import org.springframework.integration.context.IntegrationContextUtils
import org.springframework.integration.dsl.IntegrationFlow
import org.springframework.integration.dsl.IntegrationFlows
import org.springframework.integration.kafka.dsl.Kafka
import org.springframework.kafka.core.ConsumerFactory
import org.springframework.kafka.core.KafkaTemplate
import org.springframework.kafka.listener.ContainerProperties
import org.springframework.scheduling.annotation.Async
import org.springframework.stereotype.Component
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.web.bind.annotation.PathVariable
import org.springframework.web.bind.annotation.RequestMapping
import org.springframework.web.bind.annotation.RestController

@RestController
@RequestMapping(path = ["/testKafka"], produces = [MediaType.APPLICATION_JSON_VALUE])
class TestKafkaResource(private val testKafkaGateway: TestKafkaGateway) {

    @GetMapping("init/{param}")
    fun init(@PathVariable("param", required = false) param: String? = null) {
        (1..10).forEach {
            println("Send async item $it")
            testKafkaGateway.init("item: $it")
        }
    }
}

@MessagingGateway(errorChannel = IntegrationContextUtils.ERROR_CHANNEL_BEAN_NAME)
@Component
interface TestKafkaGateway {

    @Gateway(requestChannel = "publishKafkaChannel")
    @Async
    fun init(param: String)
}

@Configuration
@EnableIntegration
class TestKafkaFlow(private val kafkaTemplate: KafkaTemplate<*, *>,
                    private val consumerFactory: ConsumerFactory<*, *>) {

    @Bean
    fun readKafkaChannelTopic(): NewTopic {
        return NewTopic("readKafkaChannel", 40, 1)
    }

    @Bean
    fun publishKafka(): IntegrationFlow {
        return IntegrationFlows
            .from("publishKafkaChannel")
            .transform<String, String> { "${it} - enriched" }
            .handle(
                Kafka.outboundChannelAdapter(kafkaTemplate)
                    .topic("readKafkaChannel")
                    .sendFailureChannel(IntegrationContextUtils.ERROR_CHANNEL_BEAN_NAME)
            )
            .get()
    }

    @Bean
    fun readFromKafka(): IntegrationFlow {
        return IntegrationFlows
            .from(
                Kafka.messageDrivenChannelAdapter(consumerFactory, "readKafkaChannel")
                    .configureListenerContainer { kafkaMessageListenerContainer ->
                        kafkaMessageListenerContainer.concurrency(2)
                        kafkaMessageListenerContainer.ackMode(ContainerProperties.AckMode.RECORD)
                    }
                    .errorChannel(IntegrationContextUtils.ERROR_CHANNEL_BEAN_NAME)
            )
            .channel { c -> c.queue(10) }
            .log<String> {
                "readKafkaChannel: ${it.payload}"
            }
            .channel("channelThatIsProcessingSequential")
            .get()
    }

    @Bean
    fun kafkaFlowAfter(): IntegrationFlow {
        return IntegrationFlows
            .from("channelThatIsProcessingSequential")
            .log<String> {
                "channelThatIsProcessingSequential - ${it.payload}"
            }
            .get()
    }
}
As Gary said, it is not good to shift Kafka messages into a QueueChannel. The consumption on the Kafka.messageDrivenChannelAdapter() is already async - there is really no reason to move messages to a separate thread.
Anyway, it looks like you use Spring Cloud Stream with its PollerMetadata configured to a 1-message-per-second polling policy.
If that doesn't fit your requirements, you can always change that .channel { c -> c.queue(10) } to use a second lambda and configure a custom poller over there.
BTW, we already have a Kotlin DSL implementation in Spring Integration: https://docs.spring.io/spring-integration/docs/5.3.0.M4/reference/html/kotlin-dsl.html#kotlin-dsl
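As a rough illustration of the poller point, a global default poller can be declared so that consumers of a queue channel drain more than one message per poll. This is only a sketch, shown in Java config, under the assumption that the one-per-second behaviour comes from the default poller; the interval and maxMessagesPerPoll values are arbitrary.

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.scheduling.PollerMetadata;
import org.springframework.scheduling.support.PeriodicTrigger;

@Configuration
public class DefaultPollerConfig {

    // Sketch only: a default poller that fires every 100 ms and releases up to
    // 10 queued messages per poll instead of 1 message per second.
    @Bean(name = PollerMetadata.DEFAULT_POLLER)
    public PollerMetadata defaultPoller() {
        PollerMetadata poller = new PollerMetadata();
        poller.setTrigger(new PeriodicTrigger(100));
        poller.setMaxMessagesPerPoll(10);
        return poller;
    }
}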

Enabling exactly once causes streams shutdown due to timeout while initializing transactional state

I've written a simple example to test the join functionality. As I sometimes get duplicated messages in the resulting topic, and sometimes missing messages, I wanted to enable exactly-once semantics while pinpointing the problem. However, when doing this through:
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
I get a timeout that causes Kafka Streams to shut down in my app:
2019-05-02 17:02:32.585 INFO 153056 --- [-StreamThread-1] o.a.kafka.common.utils.AppInfoParser : Kafka version : 2.0.1
2019-05-02 17:02:32.585 INFO 153056 --- [-StreamThread-1] o.a.kafka.common.utils.AppInfoParser : Kafka commitId : fa14705e51bd2ce5
2019-05-02 17:02:32.593 INFO 153056 --- [-StreamThread-1] o.a.k.c.p.internals.TransactionManager : [Producer clientId=join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1-0_0-producer, transactionalId=join-test-0_0] ProducerId set to -1 with epoch -1
2019-05-02 17:03:32.599 ERROR 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] Error caught during partition assignment, will abort the current process and re-throw at the end of rebalance: {}
org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 60000ms.
2019-05-02 17:03:32.599 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] partition assignment took 60044 ms.
current active tasks: []
current standby tasks: []
previous active tasks: []
2019-05-02 17:03:32.601 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] State transition from PARTITIONS_ASSIGNED to PENDING_SHUTDOWN
2019-05-02 17:03:32.601 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] Shutting down
2019-05-02 17:03:32.615 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD
2019-05-02 17:03:32.615 INFO 153056 --- [-StreamThread-1] org.apache.kafka.streams.KafkaStreams : stream-client [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72] State transition from REBALANCING to ERROR
2019-05-02 17:03:32.615 WARN 153056 --- [-StreamThread-1] org.apache.kafka.streams.KafkaStreams : stream-client [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72] All stream threads have died. The instance will be in error state and should be closed.
2019-05-02 17:03:32.615 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] Shutdown complete
Exception in thread "join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] Failed to rebalance.
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:870)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:810)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 60000ms.
static String ORIGINAL = "original-sensor-data";
static String ERROR = "error-score";

public static void main(String[] args) throws IOException {
    SpringApplication.run(JoinTest.class, args);

    Properties props = getProperties();
    final StreamsBuilder builder = new StreamsBuilder();
    final KStream<String, OriginalSensorData> original =
            builder.stream(ORIGINAL, Consumed.with(Serdes.String(), new OriginalSensorDataSerde()));
    final KStream<String, ErrorScore> error =
            builder.stream(ERROR, Consumed.with(Serdes.String(), new ErrorScoreSerde()));

    KStream<String, ErrorScore> result = original.join(
            error,
            (originalValue, errorValue) -> new ErrorScore(new Date(originalValue.getTimestamp()), errorValue.getE(),
                    originalValue.getData().get("TE700PV").doubleValue(), errorValue.getT(), errorValue.getR()),
            // KStream-KStream joins are always windowed joins, hence we must provide a join window.
            JoinWindows.of(Duration.ofMillis(3000).toMillis()),
            Joined.with(
                    Serdes.String(), /* key */
                    new OriginalSensorDataSerde(), /* left value */
                    new ErrorScoreSerde() /* right value */
            )
    ).through("atl-joined-data-repartition", Produced.with(Serdes.String(), new ErrorScoreSerde()));

    result.foreach((key, value) -> System.out.println("Join Stream: " + key + " " + value));

    KafkaStreams streams = new KafkaStreams(builder.build(), props);
    streams.start();
}

private static Properties getProperties() {
    Properties props = new Properties();
    // Url of the kafka broker, this can also be found in the Aiven console
    props.put("bootstrap.servers", "localhost:9095");
    props.put("group.id", "join-test");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put("application.id", "join-test");
    props.put("default.timestamp.extractor", "com.my.SensorDataTimestampExtractor");
    // The key of a message is a string
    props.put("key.deserializer", StringDeserializer.class.getName());
    props.put("value.deserializer", StringDeserializer.class.getName());
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
    props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);
    return props;
}
I'm expecting the app to start without the timeout and keep working.

reconsume Kafka 0.10.1 topic in Spring

In Spring-Kafka I want to reconsume a Kafka topic from the beginning. Doing this by changing the group.id to something unknown to Kafka of course works:
@KafkaListener(topics = "sensordata.t")
public void receiveMessage(String message) {
    ...
}

@Bean
public Map consumerConfigs() {
    Map props = new HashMap<>();
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "NewGroupID");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // it still commits though...
    return props;
}
However, starting over by setting the offset to 0 fails.
@KafkaListener(topicPartitions =
        { @TopicPartition(topic = "sensordata.t",
                partitionOffsets = @PartitionOffset(partition = "0", initialOffset = "0"))})
public void receiveMessage(String message) {
    ...
}

@Bean
public Map consumerConfigs() {
    Map props = new HashMap<>();
    ...
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "NewGroupID");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
    props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "10000"); // making the timeout window larger seems to have no influence
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "1");       // setting max records to 1 makes no difference
    return props;
}
The error I get:
2016-11-14 14:07:59.018 INFO 8165 --- [ main] c.i.t.s.server.SpringKafkaApplication : Started SpringKafkaApplication in 4.134 seconds (JVM running for 4.745)
2016-11-14 14:07:59.125 INFO 8165 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Discovered coordinator bto:9092 (id: 2147483647 rack: null) for group spring8.
2016-11-14 14:07:59.125 INFO 8165 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Discovered coordinator bto:9092 (id: 2147483647 rack: null) for group spring8.
2016-11-14 14:07:59.129 INFO 8165 --- [afka-consumer-1] o.a.k.c.c.internals.ConsumerCoordinator : Revoking previously assigned partitions [] for group spring8
2016-11-14 14:07:59.129 INFO 8165 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-11-14 14:07:59.129 INFO 8165 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : (Re-)joining group spring8
2016-11-14 14:07:59.338 ERROR 8165 --- [afka-consumer-1] essageListenerContainer$ListenerConsumer : Container exception
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:600) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:541) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:426) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1059) ~[kafka-clients-0.10.0.1.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:939) ~[spring-kafka-1.1.1.RELEASE.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:816) ~[spring-kafka-1.1.1.RELEASE.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:526) ~[spring-kafka-1.1.1.RELEASE.jar:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_92]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_92]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_92]
Anyone familiar with this?
I'm using Kafka 0.10.1.0 and
<properties>
<java.version>1.8</java.version>
<spring-kafka.version>1.1.1.RELEASE</spring-kafka.version>
</properties>
Why have you decided that the problem is with offset 0?
Your stack trace says that the time between your poll() calls is longer than session.timeout.ms:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
How about adjusting them properly?
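Following that advice, here is a hedged sketch of the two knobs the error message points at; the values are illustrative assumptions only, and the broker's group.max.session.timeout.ms must allow the chosen session timeout.

import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerConfig;

public class SlowListenerConsumerProps {

    // Sketch only: give the group more time before it rebalances and keep
    // each poll small so processing finishes between polls.
    static Map<String, Object> slowListenerOverrides() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000"); // assumed value, was 10000 in the question
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "1");       // already 1 in the question; kept small
        return props;
    }
}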
