Enabling exactly once causes streams shutdown due to timeout while initializing transactional state - apache-kafka-streams

I've written a simple example to test the join functionality. As I sometimes get duplicated messages in the resulting topic and sometimes missing messages, I thought that, while pinpointing the problem, I would enable exactly-once semantics. However, while doing this through:
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
I get a timeout that causes Kafka Streams to shut down in my app:
2019-05-02 17:02:32.585 INFO 153056 --- [-StreamThread-1] o.a.kafka.common.utils.AppInfoParser : Kafka version : 2.0.1
2019-05-02 17:02:32.585 INFO 153056 --- [-StreamThread-1] o.a.kafka.common.utils.AppInfoParser : Kafka commitId : fa14705e51bd2ce5
2019-05-02 17:02:32.593 INFO 153056 --- [-StreamThread-1] o.a.k.c.p.internals.TransactionManager : [Producer clientId=join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1-0_0-producer, transactionalId=join-test-0_0] ProducerId set to -1 with epoch -1
2019-05-02 17:03:32.599 ERROR 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] Error caught during partition assignment, will abort the current process and re-throw at the end of rebalance: {}
org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 60000ms.
2019-05-02 17:03:32.599 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] partition assignment took 60044 ms.
current active tasks: []
current standby tasks: []
previous active tasks: []
2019-05-02 17:03:32.601 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] State transition from PARTITIONS_ASSIGNED to PENDING_SHUTDOWN
2019-05-02 17:03:32.601 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] Shutting down
2019-05-02 17:03:32.615 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] State transition from PENDING_SHUTDOWN to DEAD
2019-05-02 17:03:32.615 INFO 153056 --- [-StreamThread-1] org.apache.kafka.streams.KafkaStreams : stream-client [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72] State transition from REBALANCING to ERROR
2019-05-02 17:03:32.615 WARN 153056 --- [-StreamThread-1] org.apache.kafka.streams.KafkaStreams : stream-client [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72] All stream threads have died. The instance will be in error state and should be closed.
2019-05-02 17:03:32.615 INFO 153056 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] Shutdown complete
Exception in thread "join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1" org.apache.kafka.streams.errors.StreamsException: stream-thread [join-test-90a0aa93-dfd8-4d4f-894b-85a3c5634f72-StreamThread-1] Failed to rebalance.
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:870)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:810)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:767)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:736)
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 60000ms.
static String ORIGINAL = "original-sensor-data";
static String ERROR = "error-score";

public static void main(String[] args) throws IOException {
    SpringApplication.run(JoinTest.class, args);
    Properties props = getProperties();
    final StreamsBuilder builder = new StreamsBuilder();

    final KStream<String, OriginalSensorData> original =
            builder.stream(ORIGINAL, Consumed.with(Serdes.String(), new OriginalSensorDataSerde()));
    final KStream<String, ErrorScore> error =
            builder.stream(ERROR, Consumed.with(Serdes.String(), new ErrorScoreSerde()));

    KStream<String, ErrorScore> result = original.join(
            error,
            (originalValue, errorValue) -> new ErrorScore(new Date(originalValue.getTimestamp()), errorValue.getE(),
                    originalValue.getData().get("TE700PV").doubleValue(), errorValue.getT(), errorValue.getR()),
            // KStream-KStream joins are always windowed joins, hence we must provide a join window.
            JoinWindows.of(Duration.ofMillis(3000).toMillis()),
            Joined.with(
                    Serdes.String(),               /* key */
                    new OriginalSensorDataSerde(), /* left value */
                    new ErrorScoreSerde()          /* right value */
            )
    ).through("atl-joined-data-repartition", Produced.with(Serdes.String(), new ErrorScoreSerde()));

    result.foreach((key, value) -> System.out.println("Join Stream: " + key + " " + value));

    KafkaStreams streams = new KafkaStreams(builder.build(), props);
    streams.start();
}

private static Properties getProperties() {
    Properties props = new Properties();
    // Url of the kafka broker, this can also be found in the Aiven console
    props.put("bootstrap.servers", "localhost:9095");
    props.put("group.id", "join-test");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put("application.id", "join-test");
    props.put("default.timestamp.extractor", "com.my.SensorDataTimestampExtractor");
    // The key of a message is a string
    props.put("key.deserializer", StringDeserializer.class.getName());
    props.put("value.deserializer", StringDeserializer.class.getName());
    props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
    props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 1000);
    return props;
}
I'm expecting the app to start without the timeout and continue working.

Related

How to make different instances of consumers in the same consumer group consume different shards of the same kinesis stream?

I'm following the example given in spring-cloud-stream-samples with the following modifications.
application.yml
spring:
  cloud:
    stream:
      instanceCount: 2
      bindings:
        produceOrder-out-0:
          destination: test_stream
          content-type: application/json
          producer:
            partitionCount: 2
            partitionSelectorName: eventPartitionSelectorStrategy
            partitionKeyExtractorName: eventPartitionKeyExtractorStrategy
        processOrder-in-0:
          group: eventConsumers
          destination: test_stream
          content-type: application/json
    function:
      definition: processOrder;produceOrder
ProducerConfiguration.java
package demo.config;

import demo.stream.Event;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.cloud.stream.binder.PartitionKeyExtractorStrategy;
import org.springframework.cloud.stream.binder.PartitionSelectorStrategy;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.Message;

@Configuration
public class ProducerConfiguration {

    private static Logger logger = LoggerFactory.getLogger(ProducerConfiguration.class);

    @Bean
    public PartitionSelectorStrategy eventPartitionSelectorStrategy() {
        return new PartitionSelectorStrategy() {
            @Override
            public int selectPartition(Object key, int partitionCount) {
                if (key instanceof Integer) {
                    int partition = (((Integer) key) % partitionCount + partitionCount) % partitionCount;
                    logger.info("key {} falls into partition {}", key, partition);
                    return partition;
                }
                return 0;
            }
        };
    }

    @Bean
    public PartitionKeyExtractorStrategy eventPartitionKeyExtractorStrategy() {
        return new PartitionKeyExtractorStrategy() {
            @Override
            public Object extractKey(Message<?> message) {
                if (message.getPayload() instanceof Event) {
                    return ((Event) message.getPayload()).hashCode();
                } else {
                    return 0;
                }
            }
        };
    }
}
When I run two instances of this application by setting --spring.cloud.stream.instanceIndex=0 and --spring.cloud.stream.instanceIndex=1, I'm able to see the events getting produced. However, only one of the instances is consuming the records from both partitions; the other instance is not consuming anything, despite the producer creating partitioned records.
Logs seen in KinesisProducer
2022-09-04 00:17:22.628 INFO 34029 --- [ main] a.i.k.KinesisMessageDrivenChannelAdapter : started KinesisMessageDrivenChannelAdapter{shardOffsets=[KinesisShardOffset{iteratorType=LATEST, sequenceNumber='null', timestamp=null, stream='test_stream', shard='shardId-000000000000', reset=false}], consumerGroup='eventConsumers'}
2022-09-04 00:17:22.658 INFO 34029 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 64398 (http) with context path ''
2022-09-04 00:17:22.723 INFO 34029 --- [ main] demo.KinesisApplication : Started KinesisApplication in 18.487 seconds (JVM running for 19.192)
2022-09-04 00:17:23.938 INFO 34029 --- [esis-consumer-1] a.i.k.KinesisMessageDrivenChannelAdapter : The [ShardConsumer{shardOffset=KinesisShardOffset{iteratorType=LATEST, sequenceNumber='null', timestamp=null, stream='test_stream', shard='shardId-000000000000', reset=false}, state=NEW}] has been started.
2022-09-04 00:17:55.222 INFO 34029 --- [io-64398-exec-1] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring DispatcherServlet 'dispatcherServlet'
2022-09-04 00:17:55.222 INFO 34029 --- [io-64398-exec-1] o.s.web.servlet.DispatcherServlet : Initializing Servlet 'dispatcherServlet'
2022-09-04 00:17:55.224 INFO 34029 --- [io-64398-exec-1] o.s.web.servlet.DispatcherServlet : Completed initialization in 2 ms
2022-09-04 00:17:55.598 INFO 34029 --- [io-64398-exec-1] demo.stream.OrdersSource : Event sent: Event [id=null, subject=Order [id=5fbaca2f-d947-423d-a1f1-b1c9c268d2d0, name=pen], type=ORDER, originator=KinesisProducer]
2022-09-04 00:17:56.337 INFO 34029 --- [ask-scheduler-3] demo.config.ProducerConfiguration : key 1397835167 falls into partition 1
2022-09-04 00:18:02.047 INFO 34029 --- [io-64398-exec-2] demo.stream.OrdersSource : Event sent: Event [id=null, subject=Order [id=83021259-89b5-4451-a0ec-da3152d37a58, name=pen], type=ORDER, originator=KinesisProducer]
2022-09-04 00:18:02.361 INFO 34029 --- [ask-scheduler-3] demo.config.ProducerConfiguration : key 147530256 falls into partition 0
Logs seen in KinesisConsumer
2022-09-04 00:17:28.050 INFO 34058 --- [ main] a.i.k.KinesisMessageDrivenChannelAdapter : started KinesisMessageDrivenChannelAdapter{shardOffsets=[KinesisShardOffset{iteratorType=LATEST, sequenceNumber='null', timestamp=null, stream='test_stream', shard='shardId-000000000001', reset=false}], consumerGroup='eventConsumers'}
2022-09-04 00:17:28.076 INFO 34058 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 64399 (http) with context path ''
2022-09-04 00:17:28.116 INFO 34058 --- [ main] demo.KinesisApplication : Started KinesisApplication in 18.566 seconds (JVM running for 19.839)
2022-09-04 00:17:29.365 INFO 34058 --- [esis-consumer-1] a.i.k.KinesisMessageDrivenChannelAdapter : The [ShardConsumer{shardOffset=KinesisShardOffset{iteratorType=AFTER_SEQUENCE_NUMBER, sequenceNumber='49632927200161141377996226513172299243826807332967284754', timestamp=null, stream='test_stream', shard='shardId-000000000001', reset=false}, state=NEW}] has been started.
2022-09-04 00:17:57.346 INFO 34058 --- [esis-consumer-1] demo.stream.OrderStreamConfiguration : An order has been placed from this service Event [id=null, subject=Order [id=5fbaca2f-d947-423d-a1f1-b1c9c268d2d0, name=pen], type=ORDER, originator=KinesisProducer]
2022-09-04 00:18:04.384 INFO 34058 --- [esis-consumer-1] demo.stream.OrderStreamConfiguration : An order has been placed from this service Event [id=null, subject=Order [id=83021259-89b5-4451-a0ec-da3152d37a58, name=pen], type=ORDER, originator=KinesisProducer]
spring-cloud-stream-binder-kinesis version : 2.2.0
I have the following questions:
For static shard distribution within a single consumer group, is there any other parameter that needs to be configured that I have missed?
Do I need to specify the DynamoDB Checkpoint properties only for dynamic shard distribution?
EDIT
I have added the DEBUG logs seen in KinesisProducer below:
2022-09-07 08:30:38.120 INFO 4993 --- [io-64398-exec-1] demo.stream.OrdersSource : Event sent: Event [id=null, subject=Order [id=b3927132-a80d-481e-a219-dbd0c0c7d124, name=pen], type=ORDER, originator=KinesisProducer]
2022-09-07 08:30:38.806 INFO 4993 --- [ask-scheduler-3] demo.config.ProducerConfiguration : key 1842629003 falls into partition 1
2022-09-07 08:30:38.812 DEBUG 4993 --- [ask-scheduler-3] o.s.c.s.m.DirectWithAttributesChannel : preSend on channel 'bean 'produceOrder-out-0'', message: GenericMessage [payload=byte[126], headers={scst_partition=1, id=9cb8ec58-4a9e-7b6f-4263-c9d4d1eec906, contentType=application/json, timestamp=1662519638809}]
2022-09-07 08:30:38.813 DEBUG 4993 --- [ask-scheduler-3] tractMessageChannelBinder$SendingHandler : org.springframework.cloud.stream.binder.AbstractMessageChannelBinder$SendingHandler#63811d15 received message: GenericMessage [payload=byte[126], headers={scst_partition=1, scst_partitionOverride=0, id=731f444b-d3df-a51a-33de-8adf78e1e746, contentType=application/json, timestamp=1662519638813}]
2022-09-07 08:30:38.832 DEBUG 4993 --- [ask-scheduler-3] o.s.c.s.m.DirectWithAttributesChannel : postSend (sent=true) on channel 'bean 'produceOrder-out-0'', message: GenericMessage [payload=byte[126], headers={scst_partition=1, scst_partitionOverride=0, id=731f444b-d3df-a51a-33de-8adf78e1e746, contentType=application/json, timestamp=1662519638813}]
2022-09-07 08:35:51.153 INFO 4993 --- [io-64398-exec-2] demo.stream.OrdersSource : Event sent: Event [id=null, subject=Order [id=6a5b3084-11dc-4080-a80e-61cc73315139, name=pen], type=ORDER, originator=KinesisProducer]
2022-09-07 08:35:51.915 INFO 4993 --- [ask-scheduler-5] demo.config.ProducerConfiguration : key 1525662264 falls into partition 0
2022-09-07 08:35:51.916 DEBUG 4993 --- [ask-scheduler-5] o.s.c.s.m.DirectWithAttributesChannel : preSend on channel 'bean 'produceOrder-out-0'', message: GenericMessage [payload=byte[126], headers={scst_partition=0, id=115c5421-00f2-286d-de02-0020e9322a17, contentType=application/json, timestamp=1662519951916}]
2022-09-07 08:35:51.916 DEBUG 4993 --- [ask-scheduler-5] tractMessageChannelBinder$SendingHandler : org.springframework.cloud.stream.binder.AbstractMessageChannelBinder$SendingHandler#63811d15 received message: GenericMessage [payload=byte[126], headers={scst_partition=0, scst_partitionOverride=0, id=145be7e8-381f-af73-e430-9cb645ff785f, contentType=application/json, timestamp=1662519951916}]
2022-09-07 08:35:51.917 DEBUG 4993 --- [ask-scheduler-5] o.s.c.s.m.DirectWithAttributesChannel : postSend (sent=true) on channel 'bean 'produceOrder-out-0'', message: GenericMessage [payload=byte[126], headers={scst_partition=0, scst_partitionOverride=0, id=145be7e8-381f-af73-e430-9cb645ff785f, contentType=application/json, timestamp=1662519951916}]

How to implement Retry logic in RabbitMQ?

In my project I'm setting up a SimpleRetryPolicy to add custom exceptions, and a RetryOperationsInterceptor which consumes this policy.
@Bean
public SimpleRetryPolicy rejectionRetryPolicy() {
    Map<Class<? extends Throwable>, Boolean> exceptionsMap = new HashMap<Class<? extends Throwable>, Boolean>();
    exceptionsMap.put(DoNotRetryException.class, false); // not retriable
    exceptionsMap.put(RetryException.class, true);       // retriable
    return new SimpleRetryPolicy(3, exceptionsMap, true);
}

@Bean
RetryOperationsInterceptor interceptor() {
    return RetryInterceptorBuilder.stateless()
            .retryPolicy(rejectionRetryPolicy())
            .backOffOptions(2000L, 2, 3000L)
            .recoverer(
                    new RepublishMessageRecoverer(rabbitTemplate(), "dlExchange", "dlRoutingKey"))
            .build();
}
But with these configurations, retry is not working for either RetryException or DoNotRetryException; I want RetryException to be retried a finite number of times and DoNotRetryException to be sent to the DLQ.
Please help with the issue; I'm attaching the repo link in case it's needed.
https://github.com/aviralnimbekar/RabbitMQ/tree/main/src
Your GlobalErrorHandler does its logic before the retry happens, and there you replace the exception with an AmqpRejectAndDontRequeueException. It also looks like you publish to the DLX there. Consider moving your GlobalErrorHandler logic to a more general ErrorHandler set via factory.setErrorHandler() instead.
See more info in docs: https://docs.spring.io/spring-amqp/reference/html/#exception-handling
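As a rough sketch of that suggestion (this is not taken from the linked project; the ConditionalRejectingErrorHandler and the exact bean wiring are assumptions), the container factory could carry both the retry advice from the question and a container-level error handler:

@Bean
public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory(ConnectionFactory connectionFactory) {
    SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
    factory.setConnectionFactory(connectionFactory);
    // The stateless retry interceptor from the question runs around the listener, so
    // RetryException is retried and DoNotRetryException goes straight to the recoverer (DLX).
    factory.setAdviceChain(interceptor());
    // Container-level error handling lives here instead of in a listener-level errorHandler.
    factory.setErrorHandler(new ConditionalRejectingErrorHandler());
    return factory;
}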
UPDATE
After removing errorHandler = "globalErrorHandler" from your @RabbitListener, I got this in logs:
2022-08-03 16:02:08.093 INFO 16896 --- [nio-8080-exec-4] c.t.r.producer.RabbitMQProducer : Message sent -> retry
2022-08-03 16:02:08.095 INFO 16896 --- [ntContainer#0-1] c.t.r.consumer.RabbitMQConsumer : Retrying message...
2022-08-03 16:02:10.096 INFO 16896 --- [ntContainer#0-1] c.t.r.consumer.RabbitMQConsumer : Retrying message...
2022-08-03 16:02:13.099 INFO 16896 --- [ntContainer#0-1] c.t.r.consumer.RabbitMQConsumer : Retrying message...
2022-08-03 16:02:13.100 WARN 16896 --- [ntContainer#0-1] o.s.a.r.retry.RepublishMessageRecoverer : Republishing failed message to exchange 'dlExchange' with routing key dlRoutingKey
2022-08-03 16:02:17.736 INFO 16896 --- [nio-8080-exec-5] c.t.r.producer.RabbitMQProducer : Message sent -> 1231231
2022-08-03 16:02:17.738 INFO 16896 --- [ntContainer#0-1] c.t.r.consumer.RabbitMQConsumer : sending into dlq...
2022-08-03 16:02:17.739 WARN 16896 --- [ntContainer#0-1] o.s.a.r.retry.RepublishMessageRecoverer : Republishing failed message to exchange 'dlExchange' with routing key dlRoutingKey
Which definitely reflects your original requirements.

KAFKA : splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE

I am sending 10 messages. 2 messages are "right" and 1 message has a size over 1 MB, which gets rejected by the Kafka broker due to a RecordTooLargeException.
I have 2 doubts:
1) MESSAGE_TOO_LARGE appears only from the second time the scheduler calls the method onwards. When the scheduler calls the method for the first time, "splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE" doesn't appear.
2) Why are the retries not getting decreased? I have given retries=1.
I am calling the Sender class using the Spring Boot scheduling mechanism, something like this:
@Scheduled(fixedDelay = 30000)
public void process() {
    sender.sendThem();
}
I am using Spring Boot KafkaTemplate.
@Configuration
@EnableKafka
public class KakfaConfiguration {

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        Map<String, Object> config = new HashMap<>();
        // props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
        // props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, appProps.getJksLocation());
        // props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, appProps.getJksPassword());
        config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        config.put(ProducerConfig.ACKS_CONFIG, acks);
        config.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, retryBackOffMsConfig);
        config.put(ProducerConfig.RETRIES_CONFIG, retries);
        config.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        config.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "prod-99");
        return new DefaultKafkaProducerFactory<>(config);
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }

    @Bean(name = "ktm")
    public KafkaTransactionManager kafkaTransactionManager() {
        KafkaTransactionManager ktm = new KafkaTransactionManager(producerFactory());
        ktm.setTransactionSynchronization(AbstractPlatformTransactionManager.SYNCHRONIZATION_ON_ACTUAL_TRANSACTION);
        return ktm;
    }
}
@Component
@EnableTransactionManagement
class Sender {

    @Autowired
    private KafkaTemplate<String, String> template;

    private static final Logger LOG = LoggerFactory.getLogger(Sender.class);

    @Transactional("ktm")
    public void sendThem(List<String> toSend) throws InterruptedException {
        List<ListenableFuture<SendResult<String, String>>> futures = new ArrayList<>();
        CountDownLatch latch = new CountDownLatch(toSend.size());
        ListenableFutureCallback<SendResult<String, String>> callback = new ListenableFutureCallback<SendResult<String, String>>() {
            @Override
            public void onSuccess(SendResult<String, String> result) {
                LOG.info(" message sucess : " + result.getProducerRecord().value());
                latch.countDown();
            }
            @Override
            public void onFailure(Throwable ex) {
                LOG.error("Message Failed ");
                latch.countDown();
            }
        };
        toSend.forEach(str -> {
            ListenableFuture<SendResult<String, String>> future = template.send("t_101", str);
            future.addCallback(callback);
        });
        if (latch.await(12, TimeUnit.MINUTES)) {
            LOG.info("All sent ok");
        } else {
            for (int i = 0; i < toSend.size(); i++) {
                if (!futures.get(i).isDone()) {
                    LOG.error("No send result for " + toSend.get(i));
                }
            }
        }
    }
}
I am getting the following logs
2020-05-01 15:55:18.346 INFO 6476 --- [ scheduling-1] o.a.kafka.common.utils.AppInfoParser : Kafka startTimeMs: 1588328718345
2020-05-01 15:55:18.347 INFO 6476 --- [ scheduling-1] o.a.k.c.p.internals.TransactionManager : [Producer clientId=producer-prod-991, transactionalId=prod-991] ProducerId set to -1 with epoch -1
2020-05-01 15:55:18.351 INFO 6476 --- [oducer-prod-991] org.apache.kafka.clients.Metadata : [Producer clientId=producer-prod-991, transactionalId=prod-991] Cluster ID: bL-uhcXlRSWGaOaSeDpIog
2020-05-01 15:55:48.358 INFO 6476 --- [oducer-prod-991] o.a.k.c.p.internals.TransactionManager : [Producer clientId=producer-prod-991, transactionalId=prod-991] ProducerId set to 13000 with epoch 10
Value of kafka template----- 1518752790
2020-05-01 15:55:48.377 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 8 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.379 INFO 6476 --- [oducer-prod-991] com.a.kafkaproducer.producer.Sender : message sucess : TTTT0
2020-05-01 15:55:48.379 INFO 6476 --- [oducer-prod-991] com.a.kafkaproducer.producer.Sender : message sucess : TTTT1
2020-05-01 15:55:48.511 ERROR 6476 --- [oducer-prod-991] com.a.kafkaproducer.producer.Sender : Message Failed
2020-05-01 15:55:48.512 ERROR 6476 --- [oducer-prod-991] o.s.k.support.LoggingProducerListener : Exception thrown when sending a message with key='null' and payload='
2020-05-01 15:55:48.514 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 10 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.518 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 11 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.523 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 12 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.527 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 13 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.531 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 14 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.534 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 15 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.538 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 16 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.542 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 17 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
2020-05-01 15:55:48.546 WARN 6476 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Got error produce response in correlation id 18 on topic-partition t_101-2, splitting and retrying (1 attempts left). Error: MESSAGE_TOO_LARGE
Then after some time the program completes with the following log:
Caused by: org.apache.kafka.common.errors.TimeoutException: Expiring 2 record(s) for t_101-0:120000 ms has passed since batch creation
2020-05-01 16:18:31.322 WARN 17816 --- [ scheduling-1] o.s.k.core.DefaultKafkaProducerFactory : Error during transactional operation; producer removed from cache; possible cause: broker restarted during transaction: CloseSafeProducer [delegate=org.apache.kafka.clients.producer.KafkaProducer#7085a4dd, txId=prod-991]
2020-05-01 16:18:31.322 INFO 17816 --- [ scheduling-1] o.a.k.clients.producer.KafkaProducer : [Producer clientId=producer-prod-991, transactionalId=prod-991] Closing the Kafka producer with timeoutMillis = 5000 ms.
2020-05-01 16:18:31.324 INFO 17816 --- [oducer-prod-991] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-prod-991, transactionalId=prod-991] Aborting incomplete transaction due to shutdown
error messahe here
------ processing done in parent class------
A broad picture of the producer workflow is given below.
By setting the RETRIES_CONFIG property, we can guarantee that in case of failure this producer will try sending that message again.
If the batch is too large, we split the batch and send the split batches again. We do not decrement the retry attempts in this case.
You can go through the source code given below and find the scenarios in which the retry count is decremented.
https://github.com/apache/kafka/blob/68ac551966e2be5b13adb2f703a01211e6f7a34b/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java
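As a hedged illustration of where the relevant size limits live on the client (the values shown are the defaults, spelled out explicitly; this fragment is not taken from the linked source), the producer config from the question could make them visible:

Map<String, Object> config = new HashMap<>();
config.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
config.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
config.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
// Client-side limit on a single request; records larger than this fail fast
// with RecordTooLargeException instead of ever reaching the broker.
config.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 1048576);
// Target size for batching records per partition. When the broker answers
// MESSAGE_TOO_LARGE, the producer splits the batch and resends the halves,
// and that resend does not consume one of the configured 'retries'.
config.put(ProducerConfig.BATCH_SIZE_CONFIG, 16384);
config.put(ProducerConfig.RETRIES_CONFIG, 1);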

How to properly structure a Spring Boot / Kafka app, to prevent the app from shutting down?

I am working on a Spring Boot 2.1.1 app, which needs to communicate with Kafka.
As it stands, the app starts up, connects to Kafka, reads / writes a few messages, and then exits.
The goal is to keep the app running, and listening on some Kafka topics, without ever exiting.
Here is the main app file:
package com.example.springbootstarter;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.ConfigurableApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.annotation.TopicPartition;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.KafkaHeaders;
import org.springframework.kafka.support.SendResult;
import org.springframework.messaging.handler.annotation.Header;
import org.springframework.messaging.handler.annotation.Payload;
import org.springframework.util.concurrent.ListenableFuture;
import org.springframework.util.concurrent.ListenableFutureCallback;
@SpringBootApplication
public class SpringBootStarterApplication {

    public static void main(String[] args) throws Exception {
        ConfigurableApplicationContext context = SpringApplication.run(SpringBootStarterApplication.class, args);
        MessageProducer producer = context.getBean(MessageProducer.class);
        MessageListener listener = context.getBean(MessageListener.class);
        /*
         * Sending a Hello World message to topic 'baeldung'.
         * Must be recieved by both listeners with group foo
         * and bar with containerFactory fooKafkaListenerContainerFactory
         * and barKafkaListenerContainerFactory respectively.
         * It will also be recieved by the listener with
         * headersKafkaListenerContainerFactory as container factory
         */
        producer.sendMessage("Hello, World!");
        listener.latch.await(10, TimeUnit.SECONDS);
        /*
         * Sending message to a topic with 5 partition,
         * each message to a different partition. But as per
         * listener configuration, only the messages from
         * partition 0 and 3 will be consumed.
         */
        for (int i = 0; i < 5; i++) {
            producer.sendMessageToPartion("Hello To Partioned Topic!", i);
        }
        listener.partitionLatch.await(10, TimeUnit.SECONDS);
        /*
         * Sending message to 'filtered' topic. As per listener
         * configuration, all messages with char sequence
         * 'World' will be discarded.
         */
        producer.sendMessageToFiltered("Hello Baeldung!");
        producer.sendMessageToFiltered("Hello World!");
        listener.filterLatch.await(10, TimeUnit.SECONDS);
        /*
         * Sending message to 'greeting' topic. This will send
         * and recieved a java object with the help of
         * greetingKafkaListenerContainerFactory.
         */
        producer.sendGreetingMessage(new Greeting("Greetings", "World!"));
        listener.greetingLatch.await(10, TimeUnit.SECONDS);
        context.close();
    }

    @Bean
    public MessageProducer messageProducer() {
        return new MessageProducer();
    }

    @Bean
    public MessageListener messageListener() {
        return new MessageListener();
    }

    public static class MessageProducer {

        @Autowired
        private KafkaTemplate<String, String> kafkaTemplate;

        @Autowired
        private KafkaTemplate<String, Greeting> greetingKafkaTemplate;

        @Value(value = "${message.topic.name}")
        private String topicName;

        @Value(value = "${partitioned.topic.name}")
        private String partionedTopicName;

        @Value(value = "${filtered.topic.name}")
        private String filteredTopicName;

        @Value(value = "${greeting.topic.name}")
        private String greetingTopicName;

        public void sendMessage(String message) {
            ListenableFuture<SendResult<String, String>> future = kafkaTemplate.send(topicName, message);
            future.addCallback(new ListenableFutureCallback<SendResult<String, String>>() {
                @Override
                public void onSuccess(SendResult<String, String> result) {
                    System.out.println("Sent message=[" + message + "] with offset=[" + result.getRecordMetadata().offset() + "]");
                }
                @Override
                public void onFailure(Throwable ex) {
                    System.out.println("Unable to send message=[" + message + "] due to : " + ex.getMessage());
                }
            });
        }

        public void sendMessageToPartion(String message, int partition) {
            kafkaTemplate.send(partionedTopicName, partition, null, message);
        }

        public void sendMessageToFiltered(String message) {
            kafkaTemplate.send(filteredTopicName, message);
        }

        public void sendGreetingMessage(Greeting greeting) {
            greetingKafkaTemplate.send(greetingTopicName, greeting);
        }
    }

    public static class MessageListener {

        private CountDownLatch latch = new CountDownLatch(3);
        private CountDownLatch partitionLatch = new CountDownLatch(2);
        private CountDownLatch filterLatch = new CountDownLatch(2);
        private CountDownLatch greetingLatch = new CountDownLatch(1);

        @KafkaListener(topics = "${message.topic.name}", groupId = "foo", containerFactory = "fooKafkaListenerContainerFactory")
        public void listenGroupFoo(String message) {
            System.out.println("Received Messasge in group 'foo': " + message);
            latch.countDown();
        }

        @KafkaListener(topics = "${message.topic.name}", groupId = "bar", containerFactory = "barKafkaListenerContainerFactory")
        public void listenGroupBar(String message) {
            System.out.println("Received Messasge in group 'bar': " + message);
            latch.countDown();
        }

        @KafkaListener(topics = "${message.topic.name}", containerFactory = "headersKafkaListenerContainerFactory")
        public void listenWithHeaders(@Payload String message, @Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition) {
            System.out.println("Received Messasge: " + message + " from partition: " + partition);
            latch.countDown();
        }

        @KafkaListener(topicPartitions = @TopicPartition(topic = "${partitioned.topic.name}", partitions = { "0", "3" }))
        public void listenToParition(@Payload String message, @Header(KafkaHeaders.RECEIVED_PARTITION_ID) int partition) {
            System.out.println("Received Message: " + message + " from partition: " + partition);
            this.partitionLatch.countDown();
        }

        @KafkaListener(topics = "${filtered.topic.name}", containerFactory = "filterKafkaListenerContainerFactory")
        public void listenWithFilter(String message) {
            System.out.println("Recieved Message in filtered listener: " + message);
            this.filterLatch.countDown();
        }

        @KafkaListener(topics = "${greeting.topic.name}", containerFactory = "greetingKafkaListenerContainerFactory")
        public void greetingListener(Greeting greeting) {
            System.out.println("Recieved greeting message: " + greeting);
            this.greetingLatch.countDown();
        }
    }
}
And here is the abridged log output of an app run:
2019-01-11 13:51:51.728 INFO 40885 --- [ main] c.e.s.SpringBootStarterApplication : Starting SpringBootStarterApplication on CHIMAC11592-2.local with PID 40885 (/Users/e602684/Documents/dev/spring-boot-mongo-crud/target/classes started by e602684 in /Users/e602684/Documents/dev/spring-boot-mongo-crud)
2019-01-11 13:51:51.731 INFO 40885 --- [ main] c.e.s.SpringBootStarterApplication : No active profile set, falling back to default profiles: default
2019-01-11 13:51:52.181 INFO 40885 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Bootstrapping Spring Data repositories in DEFAULT mode.
2019-01-11 13:51:52.223 INFO 40885 --- [ main] .s.d.r.c.RepositoryConfigurationDelegate : Finished Spring Data repository scanning in 39ms. Found 1 repository interfaces.
2019-01-11 13:51:52.389 INFO 40885 --- [ main] trationDelegate$BeanPostProcessorChecker : Bean 'org.springframework.kafka.annotation.KafkaBootstrapConfiguration' of type [org.springframework.kafka.annotation.KafkaBootstrapConfiguration$$EnhancerBySpringCGLIB$$6ef5e3a3] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
2019-01-11 13:51:52.666 INFO 40885 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat initialized with port(s): 8080 (http)
2019-01-11 13:51:52.684 INFO 40885 --- [ main] o.apache.catalina.core.StandardService : Starting service [Tomcat]
2019-01-11 13:51:52.684 INFO 40885 --- [ main] org.apache.catalina.core.StandardEngine : Starting Servlet Engine: Apache Tomcat/9.0.13
2019-01-11 13:51:52.689 INFO 40885 --- [ main] o.a.catalina.core.AprLifecycleListener : The APR based Apache Tomcat Native library which allows optimal performance in production environments was not found on the java.library.path: [/Users/e602684/Library/Java/Extensions:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java:.]
2019-01-11 13:51:52.773 INFO 40885 --- [ main] o.a.c.c.C.[Tomcat].[localhost].[/] : Initializing Spring embedded WebApplicationContext
2019-01-11 13:51:52.774 INFO 40885 --- [ main] o.s.web.context.ContextLoader : Root WebApplicationContext: initialization completed in 1012 ms
2019-01-11 13:51:52.993 INFO 40885 --- [ main] org.mongodb.driver.cluster : Cluster created with settings {hosts=[localhost:27017], mode=MULTIPLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
2019-01-11 13:51:52.993 INFO 40885 --- [ main] org.mongodb.driver.cluster : Adding discovered server localhost:27017 to client view of cluster
2019-01-11 13:51:53.031 INFO 40885 --- [localhost:27017] org.mongodb.driver.connection : Opened connection [connectionId{localValue:1, serverValue:12}] to localhost:27017
2019-01-11 13:51:53.034 INFO 40885 --- [localhost:27017] org.mongodb.driver.cluster : Monitor thread successfully connected to server with description ServerDescription{address=localhost:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 0, 4]}, minWireVersion=0, maxWireVersion=7, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=1632137}
2019-01-11 13:51:53.035 INFO 40885 --- [localhost:27017] org.mongodb.driver.cluster : Discovered cluster type of STANDALONE
2019-01-11 13:51:53.412 INFO 40885 --- [ main] o.s.s.concurrent.ThreadPoolTaskExecutor : Initializing ExecutorService 'applicationTaskExecutor'
2019-01-11 13:51:53.574 INFO 40885 --- [ main] o.a.k.clients.admin.AdminClientConfig : AdminClientConfig values:
bootstrap.servers = [127.0.0.1:9092]
client.id =
connections.max.idle.ms = 300000
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 120000
retries = 5
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
2019-01-11 13:51:53.968 INFO 40885 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka version : 2.0.1
2019-01-11 13:51:53.968 INFO 40885 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka commitId : fa14705e51bd2ce5
2019-01-11 13:51:53.973 INFO 40885 --- [ad | producer-1] org.apache.kafka.clients.Metadata : Cluster ID: QW5A9DYxTlSZVC2M8pxAsg
Sent message=[Hello, World!] with offset=[9]
2019-01-11 13:51:56.858 INFO 40885 --- [ntainer#2-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=headers] Successfully joined group with generation 9
2019-01-11 13:51:56.859 INFO 40885 --- [ntainer#2-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=headers] Setting newly assigned partitions [test-0]
2019-01-11 13:51:56.862 INFO 40885 --- [ntainer#2-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [test-0]
Received Messasge: Hello, World! from partition: 0
2019-01-11 13:51:56.895 INFO 40885 --- [ntainer#4-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-6, groupId=filter] Successfully joined group with generation 9
2019-01-11 13:51:56.896 INFO 40885 --- [ntainer#4-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-6, groupId=filter] Setting newly assigned partitions [test-0]
2019-01-11 13:51:56.898 INFO 40885 --- [ntainer#4-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [test-0]
2019-01-11 13:51:56.915 INFO 40885 --- [ntainer#5-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-8, groupId=greeting] Successfully joined group with generation 9
2019-01-11 13:51:56.916 INFO 40885 --- [ntainer#5-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-8, groupId=greeting] Setting newly assigned partitions [greetings-0]
2019-01-11 13:51:56.919 INFO 40885 --- [ntainer#5-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [greetings-0]
2019-01-11 13:51:56.930 INFO 40885 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-10, groupId=foo] Successfully joined group with generation 9
2019-01-11 13:51:56.931 INFO 40885 --- [ntainer#0-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-10, groupId=foo] Setting newly assigned partitions [test-0]
2019-01-11 13:51:56.933 INFO 40885 --- [ntainer#0-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [test-0]
Recieved greeting message: Greetings, World!!
Received Messasge in group 'foo': Hello, World!
2019-01-11 13:51:56.944 INFO 40885 --- [ntainer#1-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-12, groupId=bar] Successfully joined group with generation 9
2019-01-11 13:51:56.944 INFO 40885 --- [ntainer#1-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-12, groupId=bar] Setting newly assigned partitions [test-0]
2019-01-11 13:51:56.946 INFO 40885 --- [ntainer#1-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [test-0]
Received Messasge in group 'bar': Hello, World!
Received Message: Hello To Partioned Topic! from partition: 0
Received Message: Hello To Partioned Topic! from partition: 3
Received Messasge: Hello Baeldung! from partition: 0
Received Messasge: Hello World! from partition: 0
Received Messasge in group 'bar': Hello Baeldung!
Received Messasge in group 'bar': Hello World!
Received Messasge in group 'foo': Hello Baeldung!
Received Messasge in group 'foo': Hello World!
Recieved Message in filtered listener: Hello Baeldung!
2019-01-11 13:52:06.968 INFO 40885 --- [ main] o.a.k.clients.producer.ProducerConfig : ProducerConfig values:
acks = 1
batch.size = 16384
bootstrap.servers = [127.0.0.1:9092]
buffer.memory = 33554432
client.id =
compression.type = none
connections.max.idle.ms = 540000
enable.idempotence = false
interceptor.classes = []
key.serializer = class org.apache.kafka.common.serialization.StringSerializer
linger.ms = 0
max.block.ms = 60000
max.in.flight.requests.per.connection = 5
max.request.size = 1048576
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retries = 0
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
send.buffer.bytes = 131072
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
transaction.timeout.ms = 60000
transactional.id = null
value.serializer = class org.springframework.kafka.support.serializer.JsonSerializer
2019-01-11 13:52:06.971 INFO 40885 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka version : 2.0.1
2019-01-11 13:52:06.971 INFO 40885 --- [ main] o.a.kafka.common.utils.AppInfoParser : Kafka commitId : fa14705e51bd2ce5
2019-01-11 13:52:06.984 INFO 40885 --- [ad | producer-2] org.apache.kafka.clients.Metadata : Cluster ID: QW5A9DYxTlSZVC2M8pxAsg
2019-01-11 13:52:07.002 INFO 40885 --- [ntainer#3-0-C-1] o.s.s.c.ThreadPoolTaskScheduler : Shutting down ExecutorService
2019-01-11 13:52:07.003 INFO 40885 --- [ntainer#0-0-C-1] o.s.s.c.ThreadPoolTaskScheduler : Shutting down ExecutorService
2019-01-11 13:52:07.003 INFO 40885 --- [ntainer#2-0-C-1] o.s.s.c.ThreadPoolTaskScheduler : Shutting down ExecutorService
2019-01-11 13:52:07.003 INFO 40885 --- [ntainer#1-0-C-1] o.s.s.c.ThreadPoolTaskScheduler : Shutting down ExecutorService
2019-01-11 13:52:07.003 INFO 40885 --- [ntainer#4-0-C-1] o.s.s.c.ThreadPoolTaskScheduler : Shutting down ExecutorService
2019-01-11 13:52:07.005 INFO 40885 --- [ntainer#5-0-C-1] o.s.s.c.ThreadPoolTaskScheduler : Shutting down ExecutorService
2019-01-11 13:52:07.007 INFO 40885 --- [ntainer#3-0-C-1] essageListenerContainer$ListenerConsumer : Consumer stopped
2019-01-11 13:52:07.014 INFO 40885 --- [ntainer#0-0-C-1] essageListenerContainer$ListenerConsumer : Consumer stopped
2019-01-11 13:52:07.014 INFO 40885 --- [ntainer#2-0-C-1] essageListenerContainer$ListenerConsumer : Consumer stopped
2019-01-11 13:52:07.021 INFO 40885 --- [ntainer#1-0-C-1] essageListenerContainer$ListenerConsumer : Consumer stopped
2019-01-11 13:52:07.021 INFO 40885 --- [ntainer#4-0-C-1] essageListenerContainer$ListenerConsumer : Consumer stopped
2019-01-11 13:52:07.021 INFO 40885 --- [ntainer#5-0-C-1] essageListenerContainer$ListenerConsumer : Consumer stopped
2019-01-11 13:52:07.022 INFO 40885 --- [ main] o.s.s.concurrent.ThreadPoolTaskExecutor : Shutting down ExecutorService 'applicationTaskExecutor'
2019-01-11 13:52:07.022 INFO 40885 --- [ main] o.a.k.clients.producer.KafkaProducer : [Producer clientId=producer-2] Closing the Kafka producer with timeoutMillis = 30000 ms.
2019-01-11 13:52:07.024 INFO 40885 --- [ main] o.a.k.clients.producer.KafkaProducer : [Producer clientId=producer-1] Closing the Kafka producer with timeoutMillis = 30000 ms.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.catalina.loader.WebappClassLoaderBase (file:/Users/e602684/.m2/repository/org/apache/tomcat/embed/tomcat-embed-core/9.0.13/tomcat-embed-core-9.0.13.jar) to field java.io.ObjectStreamClass$Caches.localDescs
WARNING: Please consider reporting this to the maintainers of org.apache.catalina.loader.WebappClassLoaderBase
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Disconnected from the target VM, address: '127.0.0.1:57175', transport: 'socket'
Process finished with exit code 0
What should I change in the main app file, to keep this app from exiting?
Because I have the context assignment as a part of my run statement:
ConfigurableApplicationContext context =
SpringApplication.run(SpringBootStarterApplication.class, args);
All I had to do, to keep the app from shutting down was to comment out context.close():
//context.close();
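For context, a minimal sketch of the resulting main method under that change (the producer/listener demo calls above it stay exactly as in the question):

public static void main(String[] args) throws Exception {
    ConfigurableApplicationContext context =
            SpringApplication.run(SpringBootStarterApplication.class, args);

    // ... producer.sendMessage(...) / listener.latch.await(...) calls as before ...

    // Leaving the context open keeps the @KafkaListener containers polling,
    // so the application no longer exits once the demo messages are handled.
    //context.close();
}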

reconsume Kafka 0.10.1 topic in Spring

In Spring-Kafka I want to reconsume a Kafka topic from the beginning. Doing this by changing the group.id to something unknown to Kafka of course works:
@KafkaListener(topics = "sensordata.t")
public void receiveMessage(String message) {
    ...
}

@Bean
public Map consumerConfigs() {
    Map props = new HashMap<>();
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "NewGroupID");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // it still commits though...
    return props;
}
However, starting over by setting the offset to 0 fails.
@KafkaListener(topicPartitions =
        { @TopicPartition(topic = "sensordata.t",
                partitionOffsets = @PartitionOffset(partition = "0", initialOffset = "0")) })
public void receiveMessage(String message) {
    ...
}

@Bean
public Map consumerConfigs() {
    Map props = new HashMap<>();
    ...
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "NewGroupID");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
    props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "10000"); // making the timeout window larger seems to have no influence
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "1");       // setting max records to 1 makes no difference
    return props;
}
The error I get:
2016-11-14 14:07:59.018 INFO 8165 --- [ main] c.i.t.s.server.SpringKafkaApplication : Started SpringKafkaApplication in 4.134 seconds (JVM running for 4.745)
2016-11-14 14:07:59.125 INFO 8165 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Discovered coordinator bto:9092 (id: 2147483647 rack: null) for group spring8.
2016-11-14 14:07:59.125 INFO 8165 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Discovered coordinator bto:9092 (id: 2147483647 rack: null) for group spring8.
2016-11-14 14:07:59.129 INFO 8165 --- [afka-consumer-1] o.a.k.c.c.internals.ConsumerCoordinator : Revoking previously assigned partitions [] for group spring8
2016-11-14 14:07:59.129 INFO 8165 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-11-14 14:07:59.129 INFO 8165 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : (Re-)joining group spring8
2016-11-14 14:07:59.338 ERROR 8165 --- [afka-consumer-1] essageListenerContainer$ListenerConsumer : Container exception
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:600) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:541) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:426) ~[kafka-clients-0.10.0.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1059) ~[kafka-clients-0.10.0.1.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:939) ~[spring-kafka-1.1.1.RELEASE.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:816) ~[spring-kafka-1.1.1.RELEASE.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:526) ~[spring-kafka-1.1.1.RELEASE.jar:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_92]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_92]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_92]
Anyone familiar with this?
I'm using Kafka 0.10.1.0 and
<properties>
    <java.version>1.8</java.version>
    <spring-kafka.version>1.1.1.RELEASE</spring-kafka.version>
</properties>
Why have you decided that it is the problem of offset 0?
Your stack trace says that the time you spend between polls is longer than session.timeout.ms:
Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured session.timeout.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
How about adjusting them properly?
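Purely as an illustration of that advice (the concrete values are assumptions, not part of the answer), the two knobs mentioned in the error message map onto the consumer config like this:

@Bean
public Map<String, Object> consumerConfigs() {
    Map<String, Object> props = new HashMap<>();
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "NewGroupID");
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
    // Allow more time between polls before the group coordinator evicts the consumer...
    props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
    // ...and/or hand the listener fewer records per poll so each cycle finishes sooner.
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "10");
    return props;
}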

Resources