Given the setup https://gist.github.com/gel-hidden/0a8627cf93f5396d6b73c2a6e71aad3e, I would expect when I send a message that the ServiceActivator would be called with a delay of 10 000 between messages.
The first channel takes in a list, then split the messages and then call another QueueChannel. But for some reason each pull polls all the split messages. I know I am missing something stupid, or I'm just too stupid to understand whats happening.
Related test case: https://gist.github.com/gel-hidden/de7975fffd0853ec8ce49f9d6fa6531d
Output:
2022-10-26 15:22:02.708 INFO 78647 --- [ scheduling-1] com.example.demo.DemoApplicationTests : Received message Hello
2022-10-26 15:22:02.708 INFO 78647 --- [ scheduling-1] com.example.demo.UpdateLocationFlow : Doing some work for model with id 2
2022-10-26 15:22:03.009 INFO 78647 --- [ scheduling-1] com.example.demo.UpdateLocationFlow : Completed some work for model with id 2
2022-10-26 15:22:03.017 INFO 78647 --- [ scheduling-1] com.example.demo.DemoApplicationTests : Received message World
2022-10-26 15:22:03.018 INFO 78647 --- [ scheduling-1] com.example.demo.UpdateLocationFlow : Doing some work for model with id 3
2022-10-26 15:22:03.319 INFO 78647 --- [ scheduling-1] com.example.demo.UpdateLocationFlow : Completed some work for model with id 3
2022-10-26 15:22:04.322 INFO 78647 --- [ scheduling-1] o.s.i.a.AggregatingMessageHandler : Expiring MessageGroup with correlationKey[1]
My thoughts is that the messages should be something like:
00:01 Doing some work for model with id 2
00:02 Completed some work for model with id 2
00:12 Doing some work for model with id 3
00:13 Completed some work for model it id 3
So, it is a bug in the Spring Integration around lifecycle management for the IntegrationFlowAdapter management. It just starts twice.
As a workaround I suggest to pull your #ServiceActivator handle() into an individual component with its own #Poller configuration and an inputChannel and outputChannel. In other words int must go outside of your UpdateLocationFlow. This way the IntegrationFlowAdapter won't have a control for its lifecycle and won't start it twice.
Meanwhile I'm looking how to fix it.
Thank you for reporting this!
Related
I have a spring-boot application where using webflux and r2dbc-postgres. I have discovered a strange issue when trying to do some db operations in a flatMap().
Code example:
#Transactional
public Mono<Void> insertDummyFooBars() {
return Flux.fromIterable(IntStream.rangeClosed(1, 260).boxed().collect(Collectors.toList()))
.log()
.flatMap(i -> this.repository.save(FooBar.builder().foo("test-" + i).build()))
.log()
.concatMap(i -> this.repository.findAll())
.then();
}
It seems like flatMap can process max 256 elements in batches. (Queues.SMALL_BUFFER_SIZE default value is 256). So when I tried to run the code above (with 260 elements) I've got an exception - TransientDataAccessResourceException and the following message:
Cannot exchange messages because the request queue limit is exceeded; nested exception is io.r2dbc.postgresql.client.ReactorNettyClient$RequestQueueException
There is no Releasing R2DBC Connection after this exception. The pgdb connection/session remains in idle in transaction state and the app is not able to run properly when pool max size is reached and all of the connections are in idle in transaction state. I think the connection should be released even if an exception happened or not.
If I use concatMap instead of flatMap it works as expected - no exception, connection released! It's also ok with flatMap when the elements are less than or equal to 256.
Is it possible to force pgdb connection closure? What should I do If I have lot of db operations in flatMap like this? Should I replace all of them with concatMap? Is there a global solution for this?
Versions:
Postgres: 12.6, Spring-boot: 2.7.6
Demo project
LOG:
2022-12-08 16:32:13.092 INFO 17932 --- [actor-tcp-nio-1] reactor.Flux.Iterable.1 : | onNext(256)
2022-12-08 16:32:13.092 DEBUG 17932 --- [actor-tcp-nio-1] o.s.r2dbc.core.DefaultDatabaseClient : Executing SQL statement [INSERT INTO foo_bar (foo) VALUES ($1)]
2022-12-08 16:32:13.114 INFO 17932 --- [actor-tcp-nio-1] reactor.Flux.FlatMap.2 : onNext(FooBar(id=258, foo=test-1))
2022-12-08 16:32:13.143 DEBUG 17932 --- [actor-tcp-nio-1] o.s.r2dbc.core.DefaultDatabaseClient : Executing SQL statement [SELECT foo_bar.* FROM foo_bar]
2022-12-08 16:32:13.143 INFO 17932 --- [actor-tcp-nio-1] reactor.Flux.Iterable.1 : | request(1)
2022-12-08 16:32:13.143 INFO 17932 --- [actor-tcp-nio-1] reactor.Flux.Iterable.1 : | onNext(257)
2022-12-08 16:32:13.144 DEBUG 17932 --- [actor-tcp-nio-1] o.s.r2dbc.core.DefaultDatabaseClient : Executing SQL statement [INSERT INTO foo_bar (foo) VALUES ($1)]
2022-12-08 16:32:13.149 INFO 17932 --- [actor-tcp-nio-1] reactor.Flux.Iterable.1 : | onComplete()
2022-12-08 16:32:13.149 INFO 17932 --- [actor-tcp-nio-1] reactor.Flux.Iterable.1 : | cancel()
2022-12-08 16:32:13.160 ERROR 17932 --- [actor-tcp-nio-1] reactor.Flux.FlatMap.2 : onError(org.springframework.dao.TransientDataAccessResourceException: executeMany; SQL [INSERT INTO foo_bar (foo) VALUES ($1)]; Cannot exchange messages because the request queue limit is exceeded; nested exception is io.r2dbc.postgresql.client.ReactorNettyClient$RequestQueueException: [08006] Cannot exchange messages because the request queue limit is exceeded)
2022-12-08 16:32:13.167 ERROR 17932 --- [actor-tcp-nio-1] reactor.Flux.FlatMap.2 :
org.springframework.dao.TransientDataAccessResourceException: executeMany; SQL [INSERT INTO foo_bar (foo) VALUES ($1)]; Cannot exchange messages because the request queue limit is exceeded; nested exception is io.r2dbc.postgresql.client.ReactorNettyClient$RequestQueueException: [08006] Cannot exchange messages because the request queue limit is exceeded
at org.springframework.r2dbc.connection.ConnectionFactoryUtils.convertR2dbcException(ConnectionFactoryUtils.java:215) ~[spring-r2dbc-5.3.24.jar:5.3.24]
at org.springframework.r2dbc.core.DefaultDatabaseClient.lambda$inConnectionMany$8(DefaultDatabaseClient.java:147) ~[spring-r2dbc-5.3.24.jar:5.3.24]
at reactor.core.publisher.Flux.lambda$onErrorMap$29(Flux.java:7105) ~[reactor-core-3.4.25.jar:3.4.25]
at reactor.core.publisher.Flux.lambda$onErrorResume$30(Flux.java:7158) ~[reactor-core-3.4.25.jar:3.4.25]
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94) ~[reactor-core-3.4.25.jar:3.4.25]
I have tried to change the Queues.SMALL_BUFFER_SIZE, and also tried to add a concurrency value to the flatmap. It works when I reduced the value to 255 but I think it is not a good solution.
I hope someone of you can help me.
I'm using spring boot 2.3.4 with spring kafka 2.5.6. I recently had to reset an offset and saw some strange behavior. We consumed the messages, but after every X (variating) messages we had a timeout of 10 seconds before the consumption continued.
This is my configuration:
spring:
kafka:
bootstrap-servers: localhost:9092
consumer:
enable-auto-commit: false
auto-offset-reset: earliest
heartbeat-interval: 1000
max-poll-records: 50
group-id: kafka-fetch-demo
fetch-max-wait: 10000
listener:
type: single
concurrency: 1
poll-timeout: 1000
no-poll-threshold: 2
monitor-interval: 10
ack-mode: manual
producer:
acks: all
batch-size: 0
retries: 0
This is an examle listener code:
#KafkaListener(id = LISTENER_ID, idIsGroup = false, topicPattern = "#{demoProperties.getTopicPattern()}")
public void onEvent(Acknowledgment acknowledgment, ConsumerRecord<byte[], String> record) {
log.info("Received record on topic {}, partition {} and offset {}",
record.topic(),
record.partition(),
record.offset());
acknowledgment.acknowledge();
}
Analysis
I figured out that the 10 second timeout came from the fetch.max.wait.ms property. However I'm not able to figure out why this property applies.
As far as I understand the fetch-max-wait property only determines the maximum time the broker waits before providing the consumer with new records even if the fetch.min.bytes is not exceeded. (Which in my case is set to the default 1 and should always be fullfilled)
Furthermore I analyzed that this problem only applies when using topic patterns and "larger" messages.
Reproduction
I uploaded an demo application on Github to reproduce the issue: https://github.com/kraennix/kafka-fetch-demo.
How I did reproduce it:
I put a thousand messages with 17,1 KB per message on a kafka topic.
I start my consuming application that listens per topic pattern to this topic. Then you can see this stopping behaviour.
Note: If I do the same with "small" messages (89 Bytes) it works as expected.
Logs
In the logs you can see the successful commit, but then the it says Skipping fetch
2021-01-16 15:04:40.773 DEBUG 19244 --- [_LISTENER-0-C-1] essageListenerContainer$ListenerConsumer : Commit list: {publish.LargeTopic.2.test-0=OffsetAndMetadata{offset=488, leaderEpoch=null, metadata=''}}
2021-01-16 15:04:40.773 DEBUG 19244 --- [_LISTENER-0-C-1] essageListenerContainer$ListenerConsumer : Committing: {publish.LargeTopic.2.test-0=OffsetAndMetadata{offset=488, leaderEpoch=null, metadata=''}}
2021-01-16 15:04:40.773 TRACE 19244 --- [_LISTENER-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Sending OffsetCommit request with {publish.LargeTopic.2.test-0=OffsetAndMetadata{offset=488, leaderEpoch=null, metadata=''}} to coordinator localhost:9092 (id: 2147483647 rack: null)
2021-01-16 15:04:40.773 DEBUG 19244 --- [_LISTENER-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Using older server API v7 to send OFFSET_COMMIT {group_id=kafka-fetch-demo,generation_id=4,member_id=consumer-kafka-fetch-demo-1-cf8e747f-531d-457a-aca8-18960c518ef9,group_instance_id=null,topics=[{name=publish.LargeTopic.2.test,partitions=[{partition_index=0,committed_offset=488,committed_leader_epoch=-1,committed_metadata=}]}]} with correlation id 62 to node 2147483647
2021-01-16 15:04:40.778 TRACE 19244 --- [_LISTENER-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Completed receive from node 2147483647 for OFFSET_COMMIT with correlation id 62, received {throttle_time_ms=0,topics=[{name=publish.LargeTopic.2.test,partitions=[{partition_index=0,error_code=0}]}]}
2021-01-16 15:04:40.779 DEBUG 19244 --- [_LISTENER-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Committed offset 488 for partition publish.LargeTopic.2.test-0
2021-01-16 15:04:40.779 TRACE 19244 --- [_LISTENER-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Skipping fetch for partition publish.LargeTopic.1.test-0 because previous request to localhost:9092 (id: 0 rack: null) has not been processed
2021-01-16 15:04:40.779 TRACE 19244 --- [_LISTENER-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Skipping fetch for partition publish.LargeTopic.2.test-0 because previous request to localhost:9092 (id: 0 rack: null) has not been processed
2021-01-16 15:04:40.779 TRACE 19244 --- [_LISTENER-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Skipping fetch for partition publish.LargeTopic.1.test-0 because previous request to localhost:9092 (id: 0 rack: null) has not been processed
2021-01-16 15:04:40.779 TRACE 19244 --- [_LISTENER-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Skipping fetch for partition publish.LargeTopic.2.test-0 because previous request to localhost:9092 (id: 0 rack: null) has not been processed
When there is a change in the Size of the message, you might need to change below 2 Props
heartbeat-interval: 1000
max-poll-records: 50
Your heart beat interval is 1sec and Max poll wait is 10secs. If the size of the message is high and you are processing the consumed messages in the same thread, then Heartbeat check will fail by the time the next Pull triggered. Make sure to process messages by an Executor using Callable.
Increase the Heart Beat Interval to 5 to 10 secs and Reduce Max Poll records to 15 when the messages size is high. Hope, this can help
I have a distributed process which runs across two different servers (A and B) and I get two different log files A.log and B.log, I need this merged into a single
file.
I have referred to following links but I am unable to get a merged file from the same:
https://github.com/spring-cloud/spring-cloud-sleuth
https://howtodoinjava.com/spring-cloud/spring-cloud-zipkin-sleuth-tutorial/
Is there something that I am missing?
Edit: I need the logs in something along these lines
service1.log:2016-02-26 11:15:47.561 INFO [service1,2485ec27856c56f4,2485ec27856c56f4,true] 68058 --- [nio-8081-exec-1] i.s.c.sleuth.docs.service1.Application : Hello from service1. Calling service2
service2.log:2016-02-26 11:15:47.710 INFO [service2,2485ec27856c56f4,9aa10ee6fbde75fa,true] 68059 --- [nio-8082-exec-1] i.s.c.sleuth.docs.service2.Application : Hello from service2. Calling service3 and then service4
service3.log:2016-02-26 11:15:47.895 INFO [service3,2485ec27856c56f4,1210be13194bfe5,true] 68060 --- [nio-8083-exec-1] i.s.c.sleuth.docs.service3.Application : Hello from service3
service2.log:2016-02-26 11:15:47.924 INFO [service2,2485ec27856c56f4,9aa10ee6fbde75fa,true] 68059 --- [nio-8082-exec-1] i.s.c.sleuth.docs.service2.Application : Got response from service3 [Hello from service3]
service4.log:2016-02-26 11:15:48.134 INFO [service4,2485ec27856c56f4,1b1845262ffba49d,true] 68061 --- [nio-8084-exec-1] i.s.c.sleuth.docs.service4.Application : Hello from service4
service2.log:2016-02-26 11:15:48.156 INFO [service2,2485ec27856c56f4,9aa10ee6fbde75fa,true] 68059 --- [nio-8082-exec-1] i.s.c.sleuth.docs.service2.Application : Got response from service4 [Hello from service4]
service1.log:2016-02-26 11:15:48.182 INFO [service1,2485ec27856c56f4,2485ec27856c56f4,true] 68058 --- [nio-8081-exec-1] i.s.c.sleuth.docs.service1.Application : Got response from service2 [Hello from service2, response from service3 [Hello from service3] and from service4 [Hello from service4]]
We are observing a strange behavior with our Servicetest and embedded Kafka.
The Test is a Spock Test, we use the JUnit Rule KafkaEmbedded and propagate brokersAsString as follows:
#ClassRule
#Shared
KafkaEmbedded embeddedKafka = new KafkaEmbedded(1)
#Autowired
KafkaListenerEndpointRegistry endpointRegistry
def setupSpec() {
System.setProperty("kafka.bootstrapServers", embeddedKafka.getBrokersAsString())
}
From inspecting the Code of KafkaEmbedded, constructing an Instance with KafkaEmbedded(int count) leads to one Kafka Server with two partitions per topic.
In order to tackle issues with partition assignment and server-client synchronization in the test, we follow the strategy as seen in ContainerTestUtils class from spring-kafka.
public static void waitForAssignment(KafkaMessageListenerContainer<String, String> container, int partitions)
throws Exception {
log.info(
"Waiting for " + container.getContainerProperties().getTopics() + " to connect to " + partitions + " " +
"partitions.")
int n = 0;
int count = 0;
while (n++ < 600 && count < partitions) {
count = 0;
container.getAssignedPartitions().each {
TopicPartition it ->
log.info(it.topic() + ":" + it.partition() + "; ")
}
if (container.getAssignedPartitions() != null) {
count = container.getAssignedPartitions().size();
}
if (count < partitions) {
Thread.sleep(100);
}
}
}
When we observe the logs we notice the following pattern:
2016-07-29 11:24:02.600 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 1 : {deliveryZipCode_v1=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.600 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 1 : {staggering=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.600 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 1 : {moa=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.696 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 3 : {staggering=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.699 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 3 : {moa=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.699 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 3 : {deliveryZipCode_v1=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.807 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 5 : {deliveryZipCode_v1=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.811 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 5 : {staggering=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:02.812 WARN 1160 --- [afka-consumer-1] org.apache.kafka.clients.NetworkClient : Error while fetching metadata with correlation id 5 : {moa=LEADER_NOT_AVAILABLE}
2016-07-29 11:24:03.544 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-07-29 11:24:03.544 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-07-29 11:24:03.544 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-07-29 11:24:03.602 INFO 1160 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : SyncGroup for group timeslot-service-group-06x failed due to coordinator rebalance, rejoining the group
2016-07-29 11:24:03.637 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[]
2016-07-29 11:24:03.637 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[]
2016-07-29 11:24:04.065 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[staggering-0]
2016-07-29 11:24:04.066 INFO 1160 --- [ main] s.b.c.e.t.TomcatEmbeddedServletContainer : Tomcat started on port(s): 50810 (http)
2016-07-29 11:24:04.073 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : Started AllocationsDeliveryZonesServiceSpec in 20.616 seconds (JVM running for 25.456)
2016-07-29 11:24:04.237 INFO 1160 --- [ main] org.eclipse.jetty.server.Server : jetty-9.2.17.v20160517
2016-07-29 11:24:04.265 INFO 1160 --- [ main] o.e.jetty.server.handler.ContextHandler : Started o.e.j.s.ServletContextHandler#6a8598e7{/__admin,null,AVAILABLE}
2016-07-29 11:24:04.270 INFO 1160 --- [ main] o.e.jetty.server.handler.ContextHandler : Started o.e.j.s.ServletContextHandler#104ea372{/,null,AVAILABLE}
2016-07-29 11:24:04.279 INFO 1160 --- [ main] o.eclipse.jetty.server.ServerConnector : Started ServerConnector#3c9b416a{HTTP/1.1}{0.0.0.0:50811}
2016-07-29 11:24:04.430 INFO 1160 --- [ main] o.eclipse.jetty.server.ServerConnector : Started ServerConnector#7c214597{SSL-http/1.1}{0.0.0.0:50812}
2016-07-29 11:24:04.430 INFO 1160 --- [ main] org.eclipse.jetty.server.Server : Started #25813ms
2016-07-29 11:24:04.632 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : waiting...
2016-07-29 11:24:04.662 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : Waiting for [moa] to connect to 2 partitions.^
2016-07-29 11:24:13.644 INFO 1160 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Attempt to heart beat failed since the group is rebalancing, try to re-join group.
2016-07-29 11:24:13.644 INFO 1160 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Attempt to heart beat failed since the group is rebalancing, try to re-join group.
2016-07-29 11:24:13.644 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-07-29 11:24:13.644 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[]
2016-07-29 11:24:13.655 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[staggering-0]
2016-07-29 11:24:13.655 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[moa-0]
2016-07-29 11:24:13.655 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[deliveryZipCode_v1-0]
2016-07-29 11:24:13.740 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : moa:0;
[...]
2016-07-29 11:24:16.644 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : moa:0;
2016-07-29 11:24:16.666 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[staggering-0]
2016-07-29 11:24:16.750 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : moa:0;
[...]
2016-07-29 11:24:23.559 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : moa:0;
2016-07-29 11:24:23.660 INFO 1160 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Attempt to heart beat failed since the group is rebalancing, try to re-join group.
2016-07-29 11:24:23.660 INFO 1160 --- [afka-consumer-1] o.a.k.c.c.internals.AbstractCoordinator : Attempt to heart beat failed since the group is rebalancing, try to re-join group.
2016-07-29 11:24:23.662 INFO 1160 --- [ main] .t.s.AllocationsDeliveryZonesServiceSpec : moa:0;
2016-07-29 11:24:23.686 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[moa-0]
2016-07-29 11:24:23.686 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions revoked:[deliveryZipCode_v1-0]
2016-07-29 11:24:23.695 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[moa-0]
2016-07-29 11:24:23.695 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[staggering-0]
2016-07-29 11:24:23.695 INFO 1160 --- [afka-consumer-1] o.s.k.l.KafkaMessageListenerContainer : partitions assigned:[deliveryZipCode_v1-0]
Please note the [..] indication omitted lines
We set metadata.max.age.ms to 3000 ms
As a result it tries to refresh the metadata information frequently.
What puzzles us now is, that if we wait for two partitions to connect, the wait will time out. Only if we wait for one partition to connect, after a while everything runs successfully.
Did we understand the code wrong, that there are two partitions per topic in the embedded Kafka? Is it normal that only one is assigned to our Listeners?
For testing, it is important to set spring.kafka.consumer.auto-offset-reset=earliest to avoid race condition (sequence or timing of consumer versus producer), see https://docs.spring.io/spring-kafka/reference/html/#junit
Starting with version 2.5, the consumerProps method sets the ConsumerConfig.AUTO_OFFSET_RESET_CONFIG to earliest. This is because, in most cases, you want the consumer to consume any messages sent in a test case. The ConsumerConfig default is latest which means that messages already sent by a test, before the consumer starts, will not receive those records. To revert to the previous behavior, set the property to latest after calling the method.
I can't explain the flakiness you're seeing; yes, each topic gets 2 partitions by default. I just ran one of the framework container tests and see this...
09:24:06.139 INFO [testSlow3-kafka-consumer-1][org.springframework.kafka.listener.KafkaMessageListenerContainer] partitions revoked:[]
09:24:06.611 INFO [testSlow3-kafka-consumer-1][org.springframework.kafka.listener.KafkaMessageListenerContainer] partitions assigned:[testTopic3-1, testTopic3-0]
Does #JmsListener use a poller under the hood, or is it message-driven? When testing with concurrency=1, it seems to read one message per second:
2016-06-23 09:09:46.117 INFO 13044 --- [enerContainer-1] c.s.s.core.service.PolicyChangedHandler : Received: 1: This is a test
2016-06-23 09:09:46.922 INFO 13044 --- [enerContainer-1] c.s.s.core.service.PolicyChangedHandler : Received: 2: This is a test
2016-06-23 09:09:47.730 INFO 13044 --- [enerContainer-1] c.s.s.core.service.PolicyChangedHandler : Received: 3: This is a test
2016-06-23 09:09:48.535 INFO 13044 --- [enerContainer-1] c.s.s.core.service.PolicyChangedHandler : Received: 4: This is a test
2016-06-23 09:09:49.338 INFO 13044 --- [enerContainer-1] c.s.s.core.service.PolicyChangedHandler : Received: 5: This is a test
2016-06-23 09:09:50.155 INFO 13044 --- [enerContainer-1] c.s.s.core.service.PolicyChangedHandler : Received: 6: This is a test
If it is polling, how do I adjust the polling rate or increase the number of messages read per poll?
If it is message-driven, I don't why it is so slow???
Yes Spring JMSListener uses polling under the hood by default.
See DefaultMessageListenerContainer
See also the default receiveTimeout which is 1s.
The receive timeout for each attempt can be configured through the "receiveTimeout" property. setReceiveTimeout
Set the timeout to use for receive calls, in milliseconds. The default is 1000 ms, that is, 1 second.
NOTE: This value needs to be smaller than the transaction timeout used by the transaction manager (in the appropriate unit, of course). 0 indicates no timeout at all; however, this is only feasible if not running within a transaction manager and generally discouraged since such a listener container cannot cleanly shut down. A negative value such as -1 indicates a no-wait receive operation.