Offset out of range resetting offset 2nd consumer group

Offset out of range resetting offset 2nd consumer group - spring-boot

Offset is getting reset when single topic is consumed by two different consumer groups.
I am using Kafka version 0.10.1.0 and Spring-Kafka version 2.2.4 Release.
I produce a message in topic "topic_X" which should be consumed by two different consumer groups "consumerA" and "consumerB" with single consumers each.
Lets say "consumerB" is down I have produced 100 messages on "topic_X" and "consumerA" is already running and consumed all of them. When I bring "consumberB" up again the offset is set to 100 instead of starting from 0.
I have tried by setting auto-offset-reset:earliest and still it is not working. Below are the logs I got from console.
When I start the "consumberB" I dont want to reset the offset how can do that?
2019-05-15 11:28:44.309 INFO 16152 --- [eted-Data-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-4, groupId=consumer_group] (Re-)joining group
2019-05-15 11:28:44.536 INFO 16152 --- [ment-Data-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=consumer_group] Successfully joined group with generation 25
2019-05-15 11:28:44.539 INFO 16152 --- [ment-Data-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=consumer_group] Setting newly assigned partitions [topic_x-0]
2019-05-15 11:28:44.788 INFO 16152 --- [ment-Data-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [topic_x-0]
2019-05-15 11:28:45.906 INFO 16152 --- [ment-Data-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-2, groupId=consumer_group] Fetch offset 5411 is out of range for partition topic_x-0, resetting offset
2019-05-15 11:28:46.187 INFO 16152 --- [ment-Data-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-2, groupId=consumer_group] Resetting offset for partition topic_x-0 to offset 5651.
2019-05-15 11:28:47.864 INFO 16152 --- [ment-Data-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=consumer_group] Attempt to heartbeat failed since group is rebalancing
2019-05-15 11:28:48.142 INFO 16152 --- [ment-Data-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=consumer_group] Revoking previously assigned partitions [topic_x-0]
2019-05-15 11:28:48.142 INFO 16152 --- [ment-Data-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked: [topic_x-0]
2019-05-15 11:28:48.142 INFO 16152 --- [ment-Data-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=consumer_group] (Re-)joining group
2019-05-15 11:28:48.976 INFO 16152 --- [eted-Data-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-4, groupId=consumer_group] Successfully joined group with generation 26
2019-05-15 11:28:48.976 INFO 16152 --- [ment-Data-0-C-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=consumer-2, groupId=consumer_group] Successfully joined group with generation 26
2019-05-15 11:28:48.976 INFO 16152 --- [ment-Data-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-2, groupId=consumer_group] Setting newly assigned partitions [topic_x-0]
2019-05-15 11:28:48.976 INFO 16152 --- [eted-Data-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-4, groupId=consumer_group] Setting newly assigned partitions [topic_y-0]
2019-05-15 11:28:49.246 INFO 16152 --- [ment-Data-0-C-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned: [topic_x-0]
Below is the code I am using to create consumer. Another consumer in group "consumerA" is already running in different application. When I start my application consumer in group "consumerB" the offset is getting reset.
#KafkaListener(id="client-1", topics= "topic_x", groupId = "consumerB")
public void completed(final byte[] bytes) throws IOException {
//Handler code
}

Related

Kafka Consumer stuck in rebalancing state

We are having a Kafka consumer, which all of a sudden(without any activity) went into a rebalancing state and got stuck. This caused the CPU of the k8 pod to shoot and GC time was also nearly 70-80%. The node didn't recover from that state. Upon deleting all the topics, it recovered after almost 4-5 hours.
Kafka version - 2.1.1
No of topics - 520( with 10 partitions each)
Consumer groups - 1
partition assignment strategy - sticky
Attaching some of the info logs here at that time.
2021-10-12 10:41:21
2021-10-12 05:11:21.160 INFO 6 CID: UID: RID: --- [cation_consumer] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-10, groupId=staging_notification_consumer] Member consumer-staging_notification_consumer-10-8a7eb399-c4e0-4443-b330-bfac42ca89ae sending LeaveGroup request to coordinator kafka5:9092 (id: 2147483642 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
2021-10-12 05:10:27.340 INFO 6 CID:4967 UID: RID:17c72e2d517-4d5e --- [ntainer#9-0-C-1] s.consumer.internals.ConsumerCoordinator : [Consumer clientId=consumer-staging_notification_consumer-9, groupId=staging_notification_consumer] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
2021-10-12 10:40:27
2021-10-12 05:10:27.340 INFO 6 CID:4967 UID: RID:17c72e2d517-4d5e --- [ntainer#9-0-C-1] s.consumer.internals.ConsumerCoordinator : [Consumer clientId=consumer-staging_notification_consumer-9, groupId=staging_notification_consumer] Lost previously assigned partitions staging_notification_topic_app_low_mobi_3119-4, staging_notification_topic_app_low_mobi_2023-4, staging_notification_topic_app_low_mobi_5170-9, staging_notification_topic_app_low_mobi_3540-9, staging_notification_topic_app_low_mobi_5722-9,
2021-10-12 10:40:37
2021-10-12 05:10:37.247 INFO 6 CID: UID: RID: --- [cation_consumer] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-12, groupId=staging_notification_consumer] Attempt to heartbeat failed since group is rebalancing
2021-10-12 10:40:37
2021-10-12 05:10:37.183 INFO 6 CID: UID: RID: --- [cation_consumer] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-10, groupId=staging_notification_consumer] Attempt to heartbeat failed since group is rebalancing
2021-10-12 10:40:57
2021-10-12 05:10:57.435 INFO 6 CID: UID: RID: --- [cation_consumer] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-10, groupId=staging_notification_consumer] Attempt to heartbeat failed since group is rebalancing
2021-10-12 10:40:57
2021-10-12 05:10:57.435 INFO 6 CID: UID: RID: --- [cation_consumer] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-12, groupId=staging_notification_consumer] Attempt to heartbeat failed since group is rebalancing
2021-10-12 10:41:56
2021-10-12 05:11:56.922 INFO 6 CID:4967 UID: RID:17c72e2579c-22e6 --- [ntainer#9-3-C-1] s.consumer.internals.ConsumerCoordinator : [Consumer clientId=consumer-staging_notification_consumer-12, groupId=staging_notification_consumer] Failing OffsetCommit request since the consumer is not part of an active group
2021-10-12 10:41:56
2021-10-12 05:11:56.923 ERROR 6 CID:4967 UID: RID:17c72e2579c-22e6 --- [ntainer#9-3-C-1] essageListenerContainer$ListenerConsumer : Consumer exception
2021-10-12 10:41:56
java.lang.IllegalStateException: This error handler cannot process 'org.apache.kafka.clients.consumer.CommitFailedException's; no record information is available
2021-10-12 10:41:56
at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:151)
2021-10-12 10:41:56
at org.springframework.kafka.listener.SeekToCurrentErrorHandler.handle(SeekToCurrentErrorHandler.java:113)
2021-10-12 10:41:56
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.handleConsumerException(KafkaMessageListenerContainer.java:1427)
2021-10-12 10:41:56
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1124)
2021-10-12 10:41:56
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
2021-10-12 10:41:56
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2021-10-12 10:41:56
at java.lang.Thread.run(Thread.java:748)
2021-10-12 10:41:56
Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
2021-10-12 10:41:56
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1134)
2021-10-12 10:41:56
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:999)
2021-10-12 10:41:56
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1504)
2021-10-12 10:41:56
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doCommitSync(KafkaMessageListenerContainer.java:2396)
2021-10-12 10:41:56
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitSync(KafkaMessageListenerContainer.java:2391)
2021-10-12 10:41:56
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:2377)
2021-10-12 10:41:56
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:2191)
2021-10-12 10:41:56
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1149)
2021-10-12 10:41:56
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1075)
2021-10-12 10:41:56
... 3 common frames omitted
2021-10-12 10:41:58
2021-10-12 05:11:58.442 INFO 6 CID:4967 UID: RID:17c72e2579c-22e6 --- [ntainer#9-3-C-1] s.consumer.internals.ConsumerCoordinator : [Consumer clientId=consumer-staging_notification_consumer-12, groupId=staging_notification_consumer] Giving away all assigned partitions as lost since generation has been reset,indicating that consumer is no longer part of the group
021-10-12 10:58:36
2021-10-12 05:28:36.039 INFO 6 CID:4412 UID: RID:17c72ebadab-64d7 --- [ntainer#3-1-C-1] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-17, groupId=staging_notification_consumer] Join group failed with org.apache.kafka.common.errors.RebalanceInProgressException: The group is rebalancing, so a rejoin is needed.
2021-10-12 10:58:36
2021-10-12 05:28:36.044 INFO 6 CID:4412 UID: RID:17c72ebadab-64d7 --- [ntainer#3-1-C-1] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-17, groupId=staging_notification_consumer] (Re-)joining group
021-10-12 10:58:36
2021-10-12 05:28:36.039 INFO 6 CID:4412 UID: RID:17c72ebadab-64d7 --- [ntainer#3-1-C-1] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-17, groupId=staging_notification_consumer] Join group failed with org.apache.kafka.common.errors.RebalanceInProgressException: The group is rebalancing, so a rejoin is needed.
2021-10-12 10:58:36
2021-10-12 05:28:36.044 INFO 6 CID:4412 UID: RID:17c72ebadab-64d7 --- [ntainer#3-1-C-1] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-17, groupId=staging_notification_consumer] (Re-)joining group
2021-10-12 11:05:36
2021-10-12 05:35:36.955 INFO 6 CID:3435 UID: RID:17c72f29913-886c --- [ntainer#7-3-C-1] s.consumer.internals.AbstractCoordinator : [Consumer clientId=consumer-staging_notification_consumer-5, groupId=staging_notification_consumer] Join group failed with org.apache.kafka.common.errors.DisconnectException
This was resolved when I deleted the topics.
Similar behaviour is observed when I increase the number of threads in concurrent listener factory, the consumer fails to come up with similar logs of constant rebalancing

The first log indicates that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms (five minutes default).
When this happens, the consumer client will actively initiate a LeaveGroup request to the coordinator to trigger rebalance.
You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
Of course, a better way is to check the reason for the slow processing of the program and optimize it.

Kafka consumer does not fetch new records when using topic pattern and large messages

I hope someone of you can help me.
I'm using spring boot 2.3.4 with spring kafka 2.5.6. I recently had to reset an offset and saw some strange behavior. We consumed the messages, but after every X (variating) messages we had a timeout of 10 seconds before the consumption continued.
This is my configuration:
spring:
kafka:
bootstrap-servers: localhost:9092
consumer:
enable-auto-commit: false
auto-offset-reset: earliest
heartbeat-interval: 1000
max-poll-records: 50
group-id: kafka-fetch-demo
fetch-max-wait: 10000
listener:
type: single
concurrency: 1
poll-timeout: 1000
no-poll-threshold: 2
monitor-interval: 10
ack-mode: manual
producer:
acks: all
batch-size: 0
retries: 0
This is an examle listener code:
#KafkaListener(id = LISTENER_ID, idIsGroup = false, topicPattern = "#{demoProperties.getTopicPattern()}")
public void onEvent(Acknowledgment acknowledgment, ConsumerRecord<byte[], String> record) {
log.info("Received record on topic {}, partition {} and offset {}",
record.topic(),
record.partition(),
record.offset());
acknowledgment.acknowledge();
}
Analysis
I figured out that the 10 second timeout came from the fetch.max.wait.ms property. However I'm not able to figure out why this property applies.
As far as I understand the fetch-max-wait property only determines the maximum time the broker waits before providing the consumer with new records even if the fetch.min.bytes is not exceeded. (Which in my case is set to the default 1 and should always be fullfilled)
Furthermore I analyzed that this problem only applies when using topic patterns and "larger" messages.
Reproduction
I uploaded an demo application on Github to reproduce the issue: https://github.com/kraennix/kafka-fetch-demo.
How I did reproduce it:
I put a thousand messages with 17,1 KB per message on a kafka topic.
I start my consuming application that listens per topic pattern to this topic. Then you can see this stopping behaviour.
Note: If I do the same with "small" messages (89 Bytes) it works as expected.
Logs
In the logs you can see the successful commit, but then the it says Skipping fetch
2021-01-16 15:04:40.773 DEBUG 19244 --- [_LISTENER-0-C-1] essageListenerContainer$ListenerConsumer : Commit list: {publish.LargeTopic.2.test-0=OffsetAndMetadata{offset=488, leaderEpoch=null, metadata=''}}
2021-01-16 15:04:40.773 DEBUG 19244 --- [_LISTENER-0-C-1] essageListenerContainer$ListenerConsumer : Committing: {publish.LargeTopic.2.test-0=OffsetAndMetadata{offset=488, leaderEpoch=null, metadata=''}}
2021-01-16 15:04:40.773 TRACE 19244 --- [_LISTENER-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Sending OffsetCommit request with {publish.LargeTopic.2.test-0=OffsetAndMetadata{offset=488, leaderEpoch=null, metadata=''}} to coordinator localhost:9092 (id: 2147483647 rack: null)
2021-01-16 15:04:40.773 DEBUG 19244 --- [_LISTENER-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Using older server API v7 to send OFFSET_COMMIT {group_id=kafka-fetch-demo,generation_id=4,member_id=consumer-kafka-fetch-demo-1-cf8e747f-531d-457a-aca8-18960c518ef9,group_instance_id=null,topics=[{name=publish.LargeTopic.2.test,partitions=[{partition_index=0,committed_offset=488,committed_leader_epoch=-1,committed_metadata=}]}]} with correlation id 62 to node 2147483647
2021-01-16 15:04:40.778 TRACE 19244 --- [_LISTENER-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Completed receive from node 2147483647 for OFFSET_COMMIT with correlation id 62, received {throttle_time_ms=0,topics=[{name=publish.LargeTopic.2.test,partitions=[{partition_index=0,error_code=0}]}]}
2021-01-16 15:04:40.779 DEBUG 19244 --- [_LISTENER-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Committed offset 488 for partition publish.LargeTopic.2.test-0
2021-01-16 15:04:40.779 TRACE 19244 --- [_LISTENER-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Skipping fetch for partition publish.LargeTopic.1.test-0 because previous request to localhost:9092 (id: 0 rack: null) has not been processed
2021-01-16 15:04:40.779 TRACE 19244 --- [_LISTENER-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Skipping fetch for partition publish.LargeTopic.2.test-0 because previous request to localhost:9092 (id: 0 rack: null) has not been processed
2021-01-16 15:04:40.779 TRACE 19244 --- [_LISTENER-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Skipping fetch for partition publish.LargeTopic.1.test-0 because previous request to localhost:9092 (id: 0 rack: null) has not been processed
2021-01-16 15:04:40.779 TRACE 19244 --- [_LISTENER-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Skipping fetch for partition publish.LargeTopic.2.test-0 because previous request to localhost:9092 (id: 0 rack: null) has not been processed

When there is a change in the Size of the message, you might need to change below 2 Props
heartbeat-interval: 1000
max-poll-records: 50
Your heart beat interval is 1sec and Max poll wait is 10secs. If the size of the message is high and you are processing the consumed messages in the same thread, then Heartbeat check will fail by the time the next Pull triggered. Make sure to process messages by an Executor using Callable.
Increase the Heart Beat Interval to 5 to 10 secs and Reduce Max Poll records to 15 when the messages size is high. Hope, this can help

Why so many connections are used by Spring reactive with Mongo

I got the exception 'MongoWaitQueueFullException' and I realize the number of connections that my application is using. I use the default configuration of Spring boot (2.2.7.RELEASE) with reactive MongoDB (4.2.8). Transactions are used.
Even when running an integration test that basically creates a bit more than 200 elements then groups them (200 groups). 10 connections are used. When this algorithm is executed over a real data-set, this exception is thrown. The default limit of the waiting queue (500) was reached. This does not make the application scalable.
My question is: is there a way to design a reactive application that helps to reduce the number of connections?
This is the output of my test. Basically, it scans all translations of bundle files and them group them per translation key. An element is persisted per translation key.
return Flux
.fromIterable(bundleFile.getFiles())
.map(ScannedBundleFileEntry::getLocale)
.flatMap(locale ->
handler
.scanTranslations(bundleFileEntity.toLocation(), locale, context)
.index()
.map(indexedTranslation ->
createTranslation(
workspaceEntity,
bundleFileEntity,
locale.getId(),
indexedTranslation.getT1(), // index
indexedTranslation.getT2().getKey(), // bundle key
indexedTranslation.getT2().getValue() // translation
)
)
.flatMap(bundleKeyTemporaryRepository::save)
)
.thenMany(groupIntoBundleKeys(bundleFileEntity))
.then(bundleKeyTemporaryRepository.deleteByBundleFile(bundleFileEntity.getId()))
.then(Mono.just(bundleFileEntity));
The grouping function:
private Flux<BundleKeyEntity> groupIntoBundleKeys(BundleFileEntity bundleFile) {
return this
.findBundleKeys(bundleFile)
.groupBy(BundleKeyGroupKey::new)
.flatMap(bundleKeyGroup ->
bundleKeyGroup
.collectList()
.map(bundleKeys -> {
final BundleKeyGroupKey key = bundleKeyGroup.key();
final BundleKeyEntity entity = new BundleKeyEntity(key.getWorkspace(), key.getBundleFile(), key.getKey());
bundleKeys.forEach(entity::mergeInto);
return entity;
})
)
.flatMap(bundleKeyEntityRepository::save);
}
The test output:
560 [main] INFO o.s.b.t.c.SpringBootTestContextBootstrapper - Neither #ContextConfiguration nor #ContextHierarchy found for test class [be.sgerard.i18n.controller.TranslationControllerTest], using SpringBootContextLoader
569 [main] INFO o.s.t.c.s.AbstractContextLoader - Could not detect default resource locations for test class [be.sgerard.i18n.controller.TranslationControllerTest]: no resource found for suffixes {-context.xml, Context.groovy}.
870 [main] INFO o.s.b.t.c.SpringBootTestContextBootstrapper - Loaded default TestExecutionListener class names from location [META-INF/spring.factories]: [org.springframework.boot.test.mock.mockito.MockitoTestExecutionListener, org.springframework.boot.test.mock.mockito.ResetMocksTestExecutionListener, org.springframework.boot.test.autoconfigure.restdocs.RestDocsTestExecutionListener, org.springframework.boot.test.autoconfigure.web.client.MockRestServiceServerResetTestExecutionListener, org.springframework.boot.test.autoconfigure.web.servlet.MockMvcPrintOnlyOnFailureTestExecutionListener, org.springframework.boot.test.autoconfigure.web.servlet.WebDriverTestExecutionListener, org.springframework.test.context.web.ServletTestExecutionListener, org.springframework.test.context.support.DirtiesContextBeforeModesTestExecutionListener, org.springframework.test.context.support.DependencyInjectionTestExecutionListener, org.springframework.test.context.support.DirtiesContextTestExecutionListener, org.springframework.test.context.transaction.TransactionalTestExecutionListener, org.springframework.test.context.jdbc.SqlScriptsTestExecutionListener, org.springframework.test.context.event.EventPublishingTestExecutionListener, org.springframework.security.test.context.support.WithSecurityContextTestExecutionListener, org.springframework.security.test.context.support.ReactorContextTestExecutionListener]
897 [main] INFO o.s.b.t.c.SpringBootTestContextBootstrapper - Using TestExecutionListeners: [org.springframework.test.context.support.DirtiesContextBeforeModesTestExecutionListener#4372b9b6, org.springframework.boot.test.mock.mockito.MockitoTestExecutionListener#232a7d73, org.springframework.boot.test.autoconfigure.SpringBootDependencyInjectionTestExecutionListener#4b41e4dd, org.springframework.test.context.support.DirtiesContextTestExecutionListener#22ffa91a, org.springframework.test.context.transaction.TransactionalTestExecutionListener#74960bfa, org.springframework.test.context.jdbc.SqlScriptsTestExecutionListener#42721fe, org.springframework.test.context.event.EventPublishingTestExecutionListener#40844aab, org.springframework.security.test.context.support.WithSecurityContextTestExecutionListener#1f6c9cd8, org.springframework.security.test.context.support.ReactorContextTestExecutionListener#5b619d14, org.springframework.boot.test.mock.mockito.ResetMocksTestExecutionListener#66746f57, org.springframework.boot.test.autoconfigure.restdocs.RestDocsTestExecutionListener#447a020, org.springframework.boot.test.autoconfigure.web.client.MockRestServiceServerResetTestExecutionListener#7f36662c, org.springframework.boot.test.autoconfigure.web.servlet.MockMvcPrintOnlyOnFailureTestExecutionListener#28e8dde3, org.springframework.boot.test.autoconfigure.web.servlet.WebDriverTestExecutionListener#6d23017e]
1551 [background-preinit] INFO o.h.v.i.x.c.ValidationBootstrapParameters - HV000006: Using org.hibernate.validator.HibernateValidator as validation provider.
1677 [main] INFO b.s.i.c.TranslationControllerTest - Starting TranslationControllerTest on sgerard with PID 538 (started by sgerard in /home/sgerard/sandboxes/github-oauth/server)
1678 [main] INFO b.s.i.c.TranslationControllerTest - The following profiles are active: test
3250 [main] INFO o.s.d.r.c.RepositoryConfigurationDelegate - Bootstrapping Spring Data Reactive MongoDB repositories in DEFAULT mode.
3747 [main] INFO o.s.d.r.c.RepositoryConfigurationDelegate - Finished Spring Data repository scanning in 493ms. Found 9 Reactive MongoDB repository interfaces.
5143 [main] INFO o.s.c.s.PostProcessorRegistrationDelegate$BeanPostProcessorChecker - Bean 'org.springframework.security.config.annotation.method.configuration.ReactiveMethodSecurityConfiguration' of type [org.springframework.security.config.annotation.method.configuration.ReactiveMethodSecurityConfiguration] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
5719 [main] INFO org.mongodb.driver.cluster - Cluster created with settings {hosts=[localhost:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
5996 [cluster-ClusterId{value='5f42490f1c60f43aff9d7d46', description='null'}-localhost:27017] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:1, serverValue:4337}] to localhost:27017
6010 [cluster-ClusterId{value='5f42490f1c60f43aff9d7d46', description='null'}-localhost:27017] INFO org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=localhost:27017, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 2, 8]}, minWireVersion=0, maxWireVersion=8, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=12207332, setName='rs0', canonicalAddress=4802c4aff450:27017, hosts=[4802c4aff450:27017], passives=[], arbiters=[], primary='4802c4aff450:27017', tagSet=TagSet{[]}, electionId=7fffffff0000000000000013, setVersion=1, lastWriteDate=Sun Aug 23 12:46:30 CEST 2020, lastUpdateTimeNanos=384505436362981}
6019 [main] INFO org.mongodb.driver.cluster - Cluster created with settings {hosts=[localhost:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
6040 [cluster-ClusterId{value='5f42490f1c60f43aff9d7d47', description='null'}-localhost:27017] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:2, serverValue:4338}] to localhost:27017
6042 [cluster-ClusterId{value='5f42490f1c60f43aff9d7d47', description='null'}-localhost:27017] INFO org.mongodb.driver.cluster - Monitor thread successfully connected to server with description ServerDescription{address=localhost:27017, type=REPLICA_SET_PRIMARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[4, 2, 8]}, minWireVersion=0, maxWireVersion=8, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=30, roundTripTimeNanos=1727974, setName='rs0', canonicalAddress=4802c4aff450:27017, hosts=[4802c4aff450:27017], passives=[], arbiters=[], primary='4802c4aff450:27017', tagSet=TagSet{[]}, electionId=7fffffff0000000000000013, setVersion=1, lastWriteDate=Sun Aug 23 12:46:30 CEST 2020, lastUpdateTimeNanos=384505468960066}
7102 [nioEventLoopGroup-2-2] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:3, serverValue:4339}] to localhost:27017
11078 [main] INFO o.s.b.a.e.web.EndpointLinksResolver - Exposing 1 endpoint(s) beneath base path ''
11158 [main] INFO o.h.v.i.x.c.ValidationBootstrapParameters - HV000006: Using org.hibernate.validator.HibernateValidator as validation provider.
11720 [main] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:4, serverValue:4340}] to localhost:27017
12084 [main] INFO o.s.s.c.ThreadPoolTaskScheduler - Initializing ExecutorService 'taskScheduler'
12161 [main] INFO b.s.i.c.TranslationControllerTest - Started TranslationControllerTest in 11.157 seconds (JVM running for 13.532)
20381 [nioEventLoopGroup-2-3] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:5, serverValue:4341}] to localhost:27017
20408 [nioEventLoopGroup-2-2] INFO b.s.i.s.w.WorkspaceManagerImpl - Synchronize, there is no workspace for the branch [master], let's create it.
20416 [nioEventLoopGroup-2-3] INFO b.s.i.s.w.WorkspaceManagerImpl - The workspace [master] alias [e3cea374-0d37-4c57-bdbf-8bd14d279c12] has been created.
20421 [nioEventLoopGroup-2-3] INFO b.s.i.s.w.WorkspaceManagerImpl - Initializing workspace [master] alias [e3cea374-0d37-4c57-bdbf-8bd14d279c12].
20525 [nioEventLoopGroup-2-2] INFO b.s.i.s.i18n.TranslationManagerImpl - A bundle file has been found located in [server/src/main/resources/i18n] named [exception] with 2 file(s).
20812 [nioEventLoopGroup-2-4] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:6, serverValue:4342}] to localhost:27017
21167 [nioEventLoopGroup-2-8] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:10, serverValue:4345}] to localhost:27017
21167 [nioEventLoopGroup-2-6] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:8, serverValue:4344}] to localhost:27017
21393 [nioEventLoopGroup-2-5] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:7, serverValue:4343}] to localhost:27017
21398 [nioEventLoopGroup-2-7] INFO org.mongodb.driver.connection - Opened connection [connectionId{localValue:9, serverValue:4346}] to localhost:27017
21442 [nioEventLoopGroup-2-2] INFO b.s.i.s.i18n.TranslationManagerImpl - A bundle file has been found located in [server/src/main/resources/i18n] named [validation] with 2 file(s).
21503 [nioEventLoopGroup-2-2] INFO b.s.i.s.i18n.TranslationManagerImpl - A bundle file has been found located in [server/src/test/resources/be/sgerard/i18n/service/i18n/file] named [file] with 2 file(s).
21621 [nioEventLoopGroup-2-2] INFO b.s.i.s.i18n.TranslationManagerImpl - A bundle file has been found located in [front/src/main/web/src/assets/i18n] named [i18n] with 2 file(s).
22745 [SpringContextShutdownHook] INFO o.s.s.c.ThreadPoolTaskScheduler - Shutting down ExecutorService 'taskScheduler'
22763 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:4, serverValue:4340}] to localhost:27017 because the pool has been closed.
22766 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:9, serverValue:4346}] to localhost:27017 because the pool has been closed.
22767 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:6, serverValue:4342}] to localhost:27017 because the pool has been closed.
22768 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:8, serverValue:4344}] to localhost:27017 because the pool has been closed.
22768 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:5, serverValue:4341}] to localhost:27017 because the pool has been closed.
22769 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:10, serverValue:4345}] to localhost:27017 because the pool has been closed.
22770 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:7, serverValue:4343}] to localhost:27017 because the pool has been closed.
22776 [SpringContextShutdownHook] INFO org.mongodb.driver.connection - Closed connection [connectionId{localValue:3, serverValue:4339}] to localhost:27017 because the pool has been closed.
Process finished with exit code 0

Spring Reactive is asynchronous. Imagine you have 3 items in your dataset. It opens a connection for the save of the first item. But it won't wait for it to finish and use for the second save. Instead, it opens a second connection as soon as possible. Thus you'll end up overloading all the possible connections in the pool.

Spring Kafka Always rebalance after 5 min even i pause consumer

there is a Time-consuming operation (about 10 min) ,but kafka aways rebalance after 5 min,even i pause the consumer. the consumer method :
#KafkaListener(topics = {TopicAppoint.EXECUTE_SCHOOL_DATA_STATICS_TASK})
public void receiveMessage(#Payload String payload, Consumer<String, String> consumer) {
Set<TopicPartition> assignment = consumer.assignment();
consumer.pause(assignment);
if (StringUtils.isNotEmpty(payload)) {
SchoolStatisticsTaskDTO staticsTaskDTO = JSONObject.parseObject(payload, SchoolStatisticsTaskDTO.class);
Optional<SchoolStatisticsTaskDO> taskOptional = schoolStatisticsTaskRepository.findById(staticsTaskDTO.getTrackId());
taskOptional.ifPresent(schoolStaticsTaskDO -> {
// handler
});
}
consumer.resume(assignment);
}
this is my config:
kafka:
bootstrap-servers: 192.168.0.230:9092
producer:
key-serializer: org.apache.kafka.common.serialization.StringSerializer
value-serializer: org.apache.kafka.common.serialization.StringSerializer
retries: 3
properties:
max.request.size: 12582912
consumer:
key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
group-id: dc-fitness-data-consumer-group
properties:
max.partition.fetch.bytes: 12582912
#enable-auto-commit: false
listener:
ack-mode: record
concurrency: 6
LOGS
13:09:20.219 [org.springframework.kafka.KafkaListenerEndpointContainer#2-0-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-25, groupId=dc-fitness-data-consumer-group] Attempt to heartbeat failed since group is rebalancing
13:09:20.219 [org.springframework.kafka.KafkaListenerEndpointContainer#2-0-C-1] INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-25, groupId=dc-fitness-data-consumer-group] Revoking previously assigned partitions [ft.es.records.incremental.update.task-4, ft.es.records.incremental.update.task-5, ft.es.records.incremental.update.task-2, ft.es.records.incremental.update.task-3, ft.es.records.incremental.update.task-0, ft.es.records.incremental.update.task-1]
13:09:20.219 [org.springframework.kafka.KafkaListenerEndpointContainer#2-0-C-1] INFO o.s.k.l.KafkaMessageListenerContainer - partitions revoked: [ft.es.records.incremental.update.task-4, ft.es.records.incremental.update.task-5, ft.es.records.incremental.update.task-2, ft.es.records.incremental.update.task-3, ft.es.records.incremental.update.task-0, ft.es.records.incremental.update.task-1]
13:09:20.219 [org.springframework.kafka.KafkaListenerEndpointContainer#2-0-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-25, groupId=dc-fitness-data-consumer-group] (Re-)joining group
13:09:20.220 [org.springframework.kafka.KafkaListenerEndpointContainer#9-2-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-21, groupId=dc-fitness-data-consumer-group] Attempt to heartbeat failed since group is rebalancing
13:09:20.221 [org.springframework.kafka.KafkaListenerEndpointContainer#9-2-C-1] INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-21, groupId=dc-fitness-data-consumer-group] Revoking previously assigned partitions [ft.student.batch.upload.task-17, ft.student.batch.upload.task-14, ft.student.batch.upload.task-13, ft.student.batch.upload.task-16, ft.student.batch.upload.task-15, ft.student.batch.upload.task-12]
13:09:20.221 [org.springframework.kafka.KafkaListenerEndpointContainer#9-2-C-1] INFO o.s.k.l.KafkaMessageListenerContainer - partitions revoked: [ft.student.batch.upload.task-17, ft.student.batch.upload.task-14, ft.student.batch.upload.task-13, ft.student.batch.upload.task-16, ft.student.batch.upload.task-15, ft.student.batch.upload.task-12]
13:09:20.221 [org.springframework.kafka.KafkaListenerEndpointContainer#9-2-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-21, groupId=dc-fitness-data-consumer-group] (Re-)joining group
13:09:20.221 [org.springframework.kafka.KafkaListenerEndpointContainer#6-3-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-4, groupId=dc-fitness-data-consumer-group] Attempt to heartbeat failed since group is rebalancing
13:09:20.221 [org.springframework.kafka.KafkaListenerEndpointContainer#8-5-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-18, groupId=dc-fitness-data-consumer-group] Attempt to heartbeat failed since group is rebalancing
13:09:20.221 [org.springframework.kafka.KafkaListenerEndpointContainer#8-5-C-1] INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-18, groupId=dc-fitness-data-consumer-group] Revoking previously assigned partitions [ft.record.batch.upload.task-32, ft.record.batch.upload.task-31, ft.record.batch.upload.task-34, ft.record.batch.upload.task-33, ft.record.batch.upload.task-30, ft.record.batch.upload.task-35]
13:09:20.221 [org.springframework.kafka.KafkaListenerEndpointContainer#8-5-C-1] INFO o.s.k.l.KafkaMessageListenerContainer - partitions revoked: [ft.record.batch.upload.task-32, ft.record.batch.upload.task-31, ft.record.batch.upload.task-34, ft.record.batch.upload.task-33, ft.record.batch.upload.task-30, ft.record.batch.upload.task-35]
13:09:20.221 [org.springframework.kafka.KafkaListenerEndpointContainer#8-5-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-18, groupId=dc-fitness-data-consumer-group] (Re-)joining group
13:09:20.221 [org.springframework.kafka.KafkaListenerEndpointContainer#6-3-C-1] INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-4, groupId=dc-fitness-data-consumer-group] Revoking previously assigned partitions [ft.class.batch.upload.task-18, ft.class.batch.upload.task-19, ft.class.batch.upload.task-20, ft.class.batch.upload.task-21, ft.class.batch.upload.task-22, ft.class.batch.upload.task-23]
13:09:20.221 [org.springframework.kafka.KafkaListenerEndpointContainer#6-4-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, groupId=dc-fitness-data-consumer-group] Attempt to heartbeat failed since group is rebalancing
13:09:20.221 [org.springframework.kafka.KafkaListenerEndpointContainer#0-4-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-53, groupId=dc-fitness-data-consumer-group] Attempt to heartbeat failed since group is rebalancing
13:09:20.221 [org.springframework.kafka.KafkaListenerEndpointContainer#0-3-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-52, groupId=dc-fitness-data-consumer-group] Attempt to heartbeat failed since group is rebalancing
13:09:20.223 [org.springframework.kafka.KafkaListenerEndpointContainer#6-4-C-1] INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-5, groupId=dc-fitness-data-consumer-group] Revoking previously assigned partitions [ft.class.batch.upload.task-26, ft.class.batch.upload.task-27, ft.class.batch.upload.task-28, ft.class.batch.upload.task-29, ft.class.batch.upload.task-24, ft.class.batch.upload.task-25]
13:09:20.223 [org.springframework.kafka.KafkaListenerEndpointContainer#6-3-C-1] INFO o.s.k.l.KafkaMessageListenerContainer - partitions revoked: [ft.class.batch.upload.task-18, ft.class.batch.upload.task-19, ft.class.batch.upload.task-20, ft.class.batch.upload.task-21, ft.class.batch.upload.task-22, ft.class.batch.upload.task-23]
13:09:20.223 [org.springframework.kafka.KafkaListenerEndpointContainer#0-4-C-1] INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-53, groupId=dc-fitness-data-consumer-group] Revoking previously assigned partitions [ft.es.class.incremental.update.task-26, ft.es.class.incremental.update.task-27, ft.es.class.incremental.update.task-28, ft.es.class.incremental.update.task-29, ft.es.class.incremental.update.task-24, ft.es.class.incremental.update.task-25]
13:09:20.223 [org.springframework.kafka.KafkaListenerEndpointContainer#6-4-C-1] INFO o.s.k.l.KafkaMessageListenerContainer - partitions revoked: [ft.class.batch.upload.task-26, ft.class.batch.upload.task-27, ft.class.batch.upload.task-28, ft.class.batch.upload.task-29, ft.class.batch.upload.task-24, ft.class.batch.upload.task-25]
13:09:20.223 [org.springframework.kafka.KafkaListenerEndpointContainer#0-4-C-1] INFO o.s.k.l.KafkaMessageListenerContainer - partitions revoked: [ft.es.class.incremental.update.task-26, ft.es.class.incremental.update.task-27, ft.es.class.incremental.update.task-28, ft.es.class.incremental.update.task-29, ft.es.class.incremental.update.task-24, ft.es.class.incremental.update.task-25]
13:09:20.223 [org.springframework.kafka.KafkaListenerEndpointContainer#6-4-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-5, groupId=dc-fitness-data-consumer-group] (Re-)joining group
13:09:20.223 [org.springframework.kafka.KafkaListenerEndpointContainer#6-3-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-4, groupId=dc-fitness-data-consumer-group] (Re-)joining group
13:09:20.223 [org.springframework.kafka.KafkaListenerEndpointContainer#0-4-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-53, groupId=dc-fitness-data-consumer-group] (Re-)joining group
13:09:20.223 [org.springframework.kafka.KafkaListenerEndpointContainer#0-3-C-1] INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-52, groupId=dc-fitness-data-consumer-group] Revoking previously assigned partitions [ft.es.class.incremental.update.task-22, ft.es.class.incremental.update.task-23, ft.es.class.incremental.update.task-18, ft.es.class.incremental.update.task-19, ft.es.class.incremental.update.task-20, ft.es.class.incremental.update.task-21]
13:09:20.223 [org.springframework.kafka.KafkaListenerEndpointContainer#0-3-C-1] INFO o.s.k.l.KafkaMessageListenerContainer - partitions revoked: [ft.es.class.incremental.update.task-22, ft.es.class.incremental.update.task-23, ft.es.class.incremental.update.task-18, ft.es.class.incremental.update.task-19, ft.es.class.incremental.update.task-20, ft.es.class.incremental.update.task-21]
13:09:20.223 [org.springframework.kafka.KafkaListenerEndpointContainer#5-4-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-47, groupId=dc-fitness-data-consumer-group] Attempt to heartbeat failed since group is rebalancing
13:09:20.225 [org.springframework.kafka.KafkaListenerEndpointContainer#0-3-C-1] INFO o.a.k.c.c.i.AbstractCoordinator - [Consumer clientId=consumer-52, groupId=dc-fitness-data-consumer-group] (Re-)joining group

Lets look at the documentation for a while, begin with pause() method
public void pause(Collection partitions) here
Suspend fetching from the requested partitions. Future calls to poll(long) will not return any records from these partitions until they have been resumed using resume(Collection). Note that this method does not affect partition subscription. In particular, it does not cause a group rebalance when automatic assignment is used.
So from the above description pause() method will suspend the partition from fetching the messages but it will not pause the consumer thread, which means consumer thread will do the subsequent poll() request to paused partitions but will not fetch any records
In your application : partitions are paused but consumer thread is busy in executing Time consuming operations for more that 10 minutes without doing any poll request.
Detecting Consumer Failures : so when poll stops heartbeat will not sent to the cluster
After subscribing to a set of topics, the consumer will automatically join the group when poll(long) is invoked. The poll API is designed to ensure consumer liveness. As long as you continue to call poll, the consumer will stay in the group and continue to receive messages from the partitions it was assigned. Underneath the covers, the poll API sends periodic heartbeats to the server; when you stop calling poll (perhaps because an exception was thrown), then no heartbeats will be sent. If a period of the configured session timeout elapses before the server has received a heartbeat, then the consumer will be kicked out of the group and its partitions will be reassigned.
Solution : increase the timeouts for below properties since default time is 5 minutes you are seeing rebalancing for every 5 minutes (personally will not support increasing timeouts) here
The new Java Consumer now supports heartbeating from a background thread. There is a new configuration max.poll.interval.ms which controls the maximum time between poll invocations before the consumer will proactively leave the group (5 minutes by default). The value of the configuration request.timeout.ms must always be larger than max.poll.interval.ms because this is the maximum time that a JoinGroup request can block on the server while the consumer is rebalancing, so we have changed its default value to just above 5 minutes. Finally, the default value of session.timeout.ms has been adjusted down to 10 seconds, and the default value of max.poll.records has been changed to 500.
Solution 2: if you need to process huge data with less amount of time increase more partition and consume each partition with each thread (which means more concurrency), less data that can be processed in 5 minutes

Spring Kafka, Testing with Embedded Kafka

We are observing a strange behavior with our Servicetest and embedded Kafka.
The Test is a Spock Test, we use the JUnit Rule KafkaEmbedded and propagate brokersAsString as follows:
#ClassRule
#Shared
KafkaEmbedded embeddedKafka = new KafkaEmbedded(1)
#Autowired
KafkaListenerEndpointRegistry endpointRegistry
def setupSpec() {
System.setProperty("kafka.bootstrapServers", embeddedKafka.getBrokersAsString())
}
From inspecting the Code of KafkaEmbedded, constructing an Instance with KafkaEmbedded(int count) leads to one Kafka Server with two partitions per topic.
In order to tackle issues with partition assignment and server-client synchronization in the test, we follow the strategy as seen in ContainerTestUtils class from spring-kafka.
public static void waitForAssignment(KafkaMessageListenerContainer<String, String> container, int partitions)
throws Exception {
log.info(
"Waiting for " + container.getContainerProperties().getTopics() + " to connect to " + partitions + " " +
"partitions.")
int n = 0;
int count = 0;
while (n++ < 600 && count < partitions) {
count = 0;
container.getAssignedPartitions().each {
TopicPartition it ->
log.info(it.topic() + ":" + it.partition() + "; ")
}
if (container.getAssignedPartitions() != null) {
count = container.getAssignedPartitions().size();
}
if (count < partitions) {
Thread.sleep(100);
}
}
}
When we observe the logs we notice the following pattern:
2016-07-29 11:24:02.600 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 1 : {deliveryZipCode_v1=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.600 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 1 : {staggering=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.600 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 1 : {moa=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.696 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 3 : {staggering=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.699 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 3 : {moa=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.699 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 3 : {deliveryZipCode_v1=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.807 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 5 : {deliveryZipCode_v1=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.811 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 5 : {staggering=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.812 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 5 : {moa=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:03.544 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-07-29 11:24:03.544 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-07-29 11:24:03.544 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-07-29 11:24:03.602 INFO 1160 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : SyncGroup for group timeslot-service-group-06x failed due to coordinator rebalance, rejoining the group
2016-07-29 11:24:03.637 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[]
2016-07-29 11:24:03.637 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[]
2016-07-29 11:24:04.065 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[staggering-0]
2016-07-29 11:24:04.066 INFO 1160 --- [ main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 50810 (http)
2016-07-29 11:24:04.073 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : Started AllocationsDeliveryZonesServiceSpec in 20.616 seconds (JVM running for 25.456)
2016-07-29 11:24:04.237 INFO 1160 --- [ main] org.eclipse.jetty.server.Server : jetty-9.2.17.v20160517
2016-07-29 11:24:04.265 INFO 1160 --- [ main] o.e.jetty.server.handler.ContextHandler : Started o.e.j.s.ServletContextHandler#6a8598e7{/__admin,null,AVAILABLE}
2016-07-29 11:24:04.270 INFO 1160 --- [ main] o.e.jetty.server.handler.ContextHandler : Started o.e.j.s.ServletContextHandler#104ea372{/,null,AVAILABLE}
2016-07-29 11:24:04.279 INFO 1160 --- [ main] o.eclipse.jetty.server.ServerConnector : Started ServerConnector#3c9b416a{HTTP/1.1}{0.0.0.0:50811}
2016-07-29 11:24:04.430 INFO 1160 --- [ main] o.eclipse.jetty.server.ServerConnector : Started ServerConnector#7c214597{SSL-http/1.1}{0.0.0.0:50812}
2016-07-29 11:24:04.430 INFO 1160 --- [ main] org.eclipse.jetty.server.Server : Started #25813ms
2016-07-29 11:24:04.632 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : waiting...
2016-07-29 11:24:04.662 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : Waiting for [moa] to connect to 2 partitions.^
2016-07-29 11:24:13.644 INFO 1160 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Attempt to heart beat failed since the group is rebalancing, try to re-join group.
2016-07-29 11:24:13.644 INFO 1160 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Attempt to heart beat failed since the group is rebalancing, try to re-join group.
2016-07-29 11:24:13.644 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-07-29 11:24:13.644 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-07-29 11:24:13.655 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[staggering-0]
2016-07-29 11:24:13.655 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[moa-0]
2016-07-29 11:24:13.655 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[deliveryZipCode_v1-0]
2016-07-29 11:24:13.740 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : moa:0;
[...]
2016-07-29 11:24:16.644 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : moa:0;
2016-07-29 11:24:16.666 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[staggering-0]
2016-07-29 11:24:16.750 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : moa:0;
[...]
2016-07-29 11:24:23.559 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : moa:0;
2016-07-29 11:24:23.660 INFO 1160 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Attempt to heart beat failed since the group is rebalancing, try to re-join group.
2016-07-29 11:24:23.660 INFO 1160 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Attempt to heart beat failed since the group is rebalancing, try to re-join group.
2016-07-29 11:24:23.662 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : moa:0;
2016-07-29 11:24:23.686 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[moa-0]
2016-07-29 11:24:23.686 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[deliveryZipCode_v1-0]
2016-07-29 11:24:23.695 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[moa-0]
2016-07-29 11:24:23.695 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[staggering-0]
2016-07-29 11:24:23.695 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[deliveryZipCode_v1-0]
Please note the [..] indication omitted lines
We set metadata.max.age.ms to 3000 ms
As a result it tries to refresh the metadata information frequently.
What puzzles us now is, that if we wait for two partitions to connect, the wait will time out. Only if we wait for one partition to connect, after a while everything runs successfully.
Did we understand the code wrong, that there are two partitions per topic in the embedded Kafka? Is it normal that only one is assigned to our Listeners?

For testing, it is important to set spring.kafka.consumer.auto-offset-reset=earliest to avoid race condition (sequence or timing of consumer versus producer), see https://docs.spring.io/spring-kafka/reference/html/#junit
Starting with version 2.5, the consumerProps method sets the ConsumerConfig.AUTO_OFFSET_RESET_CONFIG to earliest. This is because, in most cases, you want the consumer to consume any messages sent in a test case. The ConsumerConfig default is latest which means that messages already sent by a test, before the consumer starts, will not receive those records. To revert to the previous behavior, set the property to latest after calling the method.

I can't explain the flakiness you're seeing; yes, each topic gets 2 partitions by default. I just ran one of the framework container tests and see this...
09:24:06.139 INFO [testSlow3-kafka-consumer-1][org.springframework.kafka.listener.KafkaMessageListenerContainer] partitions revoked:[]
09:24:06.611 INFO [testSlow3-kafka-consumer-1][org.springframework.kafka.listener.KafkaMessageListenerContainer] partitions assigned:[testTopic3-1, testTopic3-0]

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Offset out of range resetting offset 2nd consumer group - spring-boot

Related

Kafka Consumer stuck in rebalancing state

Kafka consumer does not fetch new records when using topic pattern and large messages

Why so many connections are used by Spring reactive with Mongo

Spring Kafka Always rebalance after 5 min even i pause consumer

Spring Kafka, Testing with Embedded Kafka

Categories

Resources