I have a spring-boot application where using webflux and r2dbc-postgres. I have discovered a strange issue when trying to do some db operations in a flatMap().
Code example:
#Transactional
public Mono<Void> insertDummyFooBars() {
return Flux.fromIterable(IntStream.rangeClosed(1, 260).boxed().collect(Collectors.toList()))
.log()
.flatMap(i -> this.repository.save(FooBar.builder().foo("test-" + i).build()))
.log()
.concatMap(i -> this.repository.findAll())
.then();
}
It seems like flatMap can process max 256 elements in batches. (Queues.SMALL_BUFFER_SIZE default value is 256). So when I tried to run the code above (with 260 elements) I've got an exception - TransientDataAccessResourceException and the following message:
Cannot exchange messages because the request queue limit is exceeded; nested exception is io.r2dbc.postgresql.client.ReactorNettyClient$RequestQueueException
There is no Releasing R2DBC Connection after this exception. The pgdb connection/session remains in idle in transaction state and the app is not able to run properly when pool max size is reached and all of the connections are in idle in transaction state. I think the connection should be released even if an exception happened or not.
If I use concatMap instead of flatMap it works as expected - no exception, connection released! It's also ok with flatMap when the elements are less than or equal to 256.
Is it possible to force pgdb connection closure? What should I do If I have lot of db operations in flatMap like this? Should I replace all of them with concatMap? Is there a global solution for this?
Versions:
Postgres: 12.6, Spring-boot: 2.7.6
Demo project
LOG:
2022-12-08 16:32:13.092 INFO 17932 --- [actor-tcp-nio-1] reactor.Flux.Iterable.1 : | onNext(256)
2022-12-08 16:32:13.092 DEBUG 17932 --- [actor-tcp-nio-1] o.s.r2dbc.core.DefaultDatabaseClient : Executing SQL statement [INSERT INTO foo_bar (foo) VALUES ($1)]
2022-12-08 16:32:13.114 INFO 17932 --- [actor-tcp-nio-1] reactor.Flux.FlatMap.2 : onNext(FooBar(id=258, foo=test-1))
2022-12-08 16:32:13.143 DEBUG 17932 --- [actor-tcp-nio-1] o.s.r2dbc.core.DefaultDatabaseClient : Executing SQL statement [SELECT foo_bar.* FROM foo_bar]
2022-12-08 16:32:13.143 INFO 17932 --- [actor-tcp-nio-1] reactor.Flux.Iterable.1 : | request(1)
2022-12-08 16:32:13.143 INFO 17932 --- [actor-tcp-nio-1] reactor.Flux.Iterable.1 : | onNext(257)
2022-12-08 16:32:13.144 DEBUG 17932 --- [actor-tcp-nio-1] o.s.r2dbc.core.DefaultDatabaseClient : Executing SQL statement [INSERT INTO foo_bar (foo) VALUES ($1)]
2022-12-08 16:32:13.149 INFO 17932 --- [actor-tcp-nio-1] reactor.Flux.Iterable.1 : | onComplete()
2022-12-08 16:32:13.149 INFO 17932 --- [actor-tcp-nio-1] reactor.Flux.Iterable.1 : | cancel()
2022-12-08 16:32:13.160 ERROR 17932 --- [actor-tcp-nio-1] reactor.Flux.FlatMap.2 : onError(org.springframework.dao.TransientDataAccessResourceException: executeMany; SQL [INSERT INTO foo_bar (foo) VALUES ($1)]; Cannot exchange messages because the request queue limit is exceeded; nested exception is io.r2dbc.postgresql.client.ReactorNettyClient$RequestQueueException: [08006] Cannot exchange messages because the request queue limit is exceeded)
2022-12-08 16:32:13.167 ERROR 17932 --- [actor-tcp-nio-1] reactor.Flux.FlatMap.2 :
org.springframework.dao.TransientDataAccessResourceException: executeMany; SQL [INSERT INTO foo_bar (foo) VALUES ($1)]; Cannot exchange messages because the request queue limit is exceeded; nested exception is io.r2dbc.postgresql.client.ReactorNettyClient$RequestQueueException: [08006] Cannot exchange messages because the request queue limit is exceeded
at org.springframework.r2dbc.connection.ConnectionFactoryUtils.convertR2dbcException(ConnectionFactoryUtils.java:215) ~[spring-r2dbc-5.3.24.jar:5.3.24]
at org.springframework.r2dbc.core.DefaultDatabaseClient.lambda$inConnectionMany$8(DefaultDatabaseClient.java:147) ~[spring-r2dbc-5.3.24.jar:5.3.24]
at reactor.core.publisher.Flux.lambda$onErrorMap$29(Flux.java:7105) ~[reactor-core-3.4.25.jar:3.4.25]
at reactor.core.publisher.Flux.lambda$onErrorResume$30(Flux.java:7158) ~[reactor-core-3.4.25.jar:3.4.25]
at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94) ~[reactor-core-3.4.25.jar:3.4.25]
I have tried to change the Queues.SMALL_BUFFER_SIZE, and also tried to add a concurrency value to the flatmap. It works when I reduced the value to 255 but I think it is not a good solution.
Related
Given the setup https://gist.github.com/gel-hidden/0a8627cf93f5396d6b73c2a6e71aad3e, I would expect when I send a message that the ServiceActivator would be called with a delay of 10 000 between messages.
The first channel takes in a list, then split the messages and then call another QueueChannel. But for some reason each pull polls all the split messages. I know I am missing something stupid, or I'm just too stupid to understand whats happening.
Related test case: https://gist.github.com/gel-hidden/de7975fffd0853ec8ce49f9d6fa6531d
Output:
2022-10-26 15:22:02.708 INFO 78647 --- [ scheduling-1] com.example.demo.DemoApplicationTests : Received message Hello
2022-10-26 15:22:02.708 INFO 78647 --- [ scheduling-1] com.example.demo.UpdateLocationFlow : Doing some work for model with id 2
2022-10-26 15:22:03.009 INFO 78647 --- [ scheduling-1] com.example.demo.UpdateLocationFlow : Completed some work for model with id 2
2022-10-26 15:22:03.017 INFO 78647 --- [ scheduling-1] com.example.demo.DemoApplicationTests : Received message World
2022-10-26 15:22:03.018 INFO 78647 --- [ scheduling-1] com.example.demo.UpdateLocationFlow : Doing some work for model with id 3
2022-10-26 15:22:03.319 INFO 78647 --- [ scheduling-1] com.example.demo.UpdateLocationFlow : Completed some work for model with id 3
2022-10-26 15:22:04.322 INFO 78647 --- [ scheduling-1] o.s.i.a.AggregatingMessageHandler : Expiring MessageGroup with correlationKey[1]
My thoughts is that the messages should be something like:
00:01 Doing some work for model with id 2
00:02 Completed some work for model with id 2
00:12 Doing some work for model with id 3
00:13 Completed some work for model it id 3
So, it is a bug in the Spring Integration around lifecycle management for the IntegrationFlowAdapter management. It just starts twice.
As a workaround I suggest to pull your #ServiceActivator handle() into an individual component with its own #Poller configuration and an inputChannel and outputChannel. In other words int must go outside of your UpdateLocationFlow. This way the IntegrationFlowAdapter won't have a control for its lifecycle and won't start it twice.
Meanwhile I'm looking how to fix it.
Thank you for reporting this!
I hope someone of you can help me.
I'm using spring boot 2.3.4 with spring kafka 2.5.6. I recently had to reset an offset and saw some strange behavior. We consumed the messages, but after every X (variating) messages we had a timeout of 10 seconds before the consumption continued.
This is my configuration:
spring:
kafka:
bootstrap-servers: localhost:9092
consumer:
enable-auto-commit: false
auto-offset-reset: earliest
heartbeat-interval: 1000
max-poll-records: 50
group-id: kafka-fetch-demo
fetch-max-wait: 10000
listener:
type: single
concurrency: 1
poll-timeout: 1000
no-poll-threshold: 2
monitor-interval: 10
ack-mode: manual
producer:
acks: all
batch-size: 0
retries: 0
This is an examle listener code:
#KafkaListener(id = LISTENER_ID, idIsGroup = false, topicPattern = "#{demoProperties.getTopicPattern()}")
public void onEvent(Acknowledgment acknowledgment, ConsumerRecord<byte[], String> record) {
log.info("Received record on topic {}, partition {} and offset {}",
record.topic(),
record.partition(),
record.offset());
acknowledgment.acknowledge();
}
Analysis
I figured out that the 10 second timeout came from the fetch.max.wait.ms property. However I'm not able to figure out why this property applies.
As far as I understand the fetch-max-wait property only determines the maximum time the broker waits before providing the consumer with new records even if the fetch.min.bytes is not exceeded. (Which in my case is set to the default 1 and should always be fullfilled)
Furthermore I analyzed that this problem only applies when using topic patterns and "larger" messages.
Reproduction
I uploaded an demo application on Github to reproduce the issue: https://github.com/kraennix/kafka-fetch-demo.
How I did reproduce it:
I put a thousand messages with 17,1 KB per message on a kafka topic.
I start my consuming application that listens per topic pattern to this topic. Then you can see this stopping behaviour.
Note: If I do the same with "small" messages (89 Bytes) it works as expected.
Logs
In the logs you can see the successful commit, but then the it says Skipping fetch
2021-01-16 15:04:40.773 DEBUG 19244 --- [_LISTENER-0-C-1] essageListenerContainer$ListenerConsumer : Commit list: {publish.LargeTopic.2.test-0=OffsetAndMetadata{offset=488, leaderEpoch=null, metadata=''}}
2021-01-16 15:04:40.773 DEBUG 19244 --- [_LISTENER-0-C-1] essageListenerContainer$ListenerConsumer : Committing: {publish.LargeTopic.2.test-0=OffsetAndMetadata{offset=488, leaderEpoch=null, metadata=''}}
2021-01-16 15:04:40.773 TRACE 19244 --- [_LISTENER-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Sending OffsetCommit request with {publish.LargeTopic.2.test-0=OffsetAndMetadata{offset=488, leaderEpoch=null, metadata=''}} to coordinator localhost:9092 (id: 2147483647 rack: null)
2021-01-16 15:04:40.773 DEBUG 19244 --- [_LISTENER-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Using older server API v7 to send OFFSET_COMMIT {group_id=kafka-fetch-demo,generation_id=4,member_id=consumer-kafka-fetch-demo-1-cf8e747f-531d-457a-aca8-18960c518ef9,group_instance_id=null,topics=[{name=publish.LargeTopic.2.test,partitions=[{partition_index=0,committed_offset=488,committed_leader_epoch=-1,committed_metadata=}]}]} with correlation id 62 to node 2147483647
2021-01-16 15:04:40.778 TRACE 19244 --- [_LISTENER-0-C-1] org.apache.kafka.clients.NetworkClient : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Completed receive from node 2147483647 for OFFSET_COMMIT with correlation id 62, received {throttle_time_ms=0,topics=[{name=publish.LargeTopic.2.test,partitions=[{partition_index=0,error_code=0}]}]}
2021-01-16 15:04:40.779 DEBUG 19244 --- [_LISTENER-0-C-1] o.a.k.c.c.internals.ConsumerCoordinator : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Committed offset 488 for partition publish.LargeTopic.2.test-0
2021-01-16 15:04:40.779 TRACE 19244 --- [_LISTENER-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Skipping fetch for partition publish.LargeTopic.1.test-0 because previous request to localhost:9092 (id: 0 rack: null) has not been processed
2021-01-16 15:04:40.779 TRACE 19244 --- [_LISTENER-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Skipping fetch for partition publish.LargeTopic.2.test-0 because previous request to localhost:9092 (id: 0 rack: null) has not been processed
2021-01-16 15:04:40.779 TRACE 19244 --- [_LISTENER-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Skipping fetch for partition publish.LargeTopic.1.test-0 because previous request to localhost:9092 (id: 0 rack: null) has not been processed
2021-01-16 15:04:40.779 TRACE 19244 --- [_LISTENER-0-C-1] o.a.k.c.consumer.internals.Fetcher : [Consumer clientId=consumer-kafka-fetch-demo-1, groupId=kafka-fetch-demo] Skipping fetch for partition publish.LargeTopic.2.test-0 because previous request to localhost:9092 (id: 0 rack: null) has not been processed
When there is a change in the Size of the message, you might need to change below 2 Props
heartbeat-interval: 1000
max-poll-records: 50
Your heart beat interval is 1sec and Max poll wait is 10secs. If the size of the message is high and you are processing the consumed messages in the same thread, then Heartbeat check will fail by the time the next Pull triggered. Make sure to process messages by an Executor using Callable.
Increase the Heart Beat Interval to 5 to 10 secs and Reduce Max Poll records to 15 when the messages size is high. Hope, this can help
I have a distributed process which runs across two different servers (A and B) and I get two different log files A.log and B.log, I need this merged into a single
file.
I have referred to following links but I am unable to get a merged file from the same:
https://github.com/spring-cloud/spring-cloud-sleuth
https://howtodoinjava.com/spring-cloud/spring-cloud-zipkin-sleuth-tutorial/
Is there something that I am missing?
Edit: I need the logs in something along these lines
service1.log:2016-02-26 11:15:47.561 INFO [service1,2485ec27856c56f4,2485ec27856c56f4,true] 68058 --- [nio-8081-exec-1] i.s.c.sleuth.docs.service1.Application : Hello from service1. Calling service2
service2.log:2016-02-26 11:15:47.710 INFO [service2,2485ec27856c56f4,9aa10ee6fbde75fa,true] 68059 --- [nio-8082-exec-1] i.s.c.sleuth.docs.service2.Application : Hello from service2. Calling service3 and then service4
service3.log:2016-02-26 11:15:47.895 INFO [service3,2485ec27856c56f4,1210be13194bfe5,true] 68060 --- [nio-8083-exec-1] i.s.c.sleuth.docs.service3.Application : Hello from service3
service2.log:2016-02-26 11:15:47.924 INFO [service2,2485ec27856c56f4,9aa10ee6fbde75fa,true] 68059 --- [nio-8082-exec-1] i.s.c.sleuth.docs.service2.Application : Got response from service3 [Hello from service3]
service4.log:2016-02-26 11:15:48.134 INFO [service4,2485ec27856c56f4,1b1845262ffba49d,true] 68061 --- [nio-8084-exec-1] i.s.c.sleuth.docs.service4.Application : Hello from service4
service2.log:2016-02-26 11:15:48.156 INFO [service2,2485ec27856c56f4,9aa10ee6fbde75fa,true] 68059 --- [nio-8082-exec-1] i.s.c.sleuth.docs.service2.Application : Got response from service4 [Hello from service4]
service1.log:2016-02-26 11:15:48.182 INFO [service1,2485ec27856c56f4,2485ec27856c56f4,true] 68058 --- [nio-8081-exec-1] i.s.c.sleuth.docs.service1.Application : Got response from service2 [Hello from service2, response from service3 [Hello from service3] and from service4 [Hello from service4]]
guys !
I notices that from time to time, my response is generated about approx. 10 sec. which is not normal. This normally happens when i submit a post form, but not always. I'm using Spring Boot 1.5.6 with Thymeleaf 3.0.2.
2017-10-16 09:09:21.737 TRACE 19366 --- [https-jsse-nio-8443-exec-1] o.t.s.e.LinkExpression : [THYMELEAF][https-jsse-nio-8443-exec-1] Evaluating link: "#{'/resources/js/lightbox.js'}"
2017-10-16 09:09:21.737 TRACE 19366 --- [https-jsse-nio-8443-exec-1] o.t.s.e.TextLiteralExpression : [THYMELEAF][https-jsse-nio-8443-exec-1] Evaluating text literal: "'/resources/js/lightbox.js'"
2017-10-16 09:09:21.737 TRACE 19366 --- [https-jsse-nio-8443-exec-1] o.t.TemplateEngine : [THYMELEAF][https-jsse-nio-8443-exec-1] FINISHED PROCESS AND OUTPUT OF TEMPLATE "distributor/createDistributor" WITH LOCALE en_US
2017-10-16 09:09:21.737 TRACE 19366 --- [https-jsse-nio-8443-exec-1] o.t.T.TIMER : [THYMELEAF][https-jsse-nio-8443-exec-1][distributor/createDistributor][en_US][8336337999][8336] TEMPLATE "distributor/createDistributor" WITH LOCALE en_US PROCESSED IN 8336337999 nanoseconds (approx. 8336ms)
2017-10-16 09:09:21.738 TRACE 19366 --- [https-jsse-nio-8443-exec-1] ationConfigEmbeddedWebApplicationContext : Publishing event in org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext#1c6b6478: ServletRequestHandledEvent: url=[/distributor/add]; client=[192........]; method=[GET]; servlet=[dispatcherServlet]; session=[F2E366C8DA7EC3C5BD2026FA388F3437]; user=[admin]; time=[8338ms]; status=[OK]
On line 4 ( TIMER ) is showing that processing time is time=[8338ms]
Can somebody help me how to debug this ?
Does #JmsListener use a poller under the hood, or is it message-driven? When testing with concurrency=1, it seems to read one message per second:
2016-06-23 09:09:46.117 INFO 13044 --- [enerContainer-1] c.s.s.core.service.PolicyChangedHandler : Received: 1: This is a test
2016-06-23 09:09:46.922 INFO 13044 --- [enerContainer-1] c.s.s.core.service.PolicyChangedHandler : Received: 2: This is a test
2016-06-23 09:09:47.730 INFO 13044 --- [enerContainer-1] c.s.s.core.service.PolicyChangedHandler : Received: 3: This is a test
2016-06-23 09:09:48.535 INFO 13044 --- [enerContainer-1] c.s.s.core.service.PolicyChangedHandler : Received: 4: This is a test
2016-06-23 09:09:49.338 INFO 13044 --- [enerContainer-1] c.s.s.core.service.PolicyChangedHandler : Received: 5: This is a test
2016-06-23 09:09:50.155 INFO 13044 --- [enerContainer-1] c.s.s.core.service.PolicyChangedHandler : Received: 6: This is a test
If it is polling, how do I adjust the polling rate or increase the number of messages read per poll?
If it is message-driven, I don't why it is so slow???
Yes Spring JMSListener uses polling under the hood by default.
See DefaultMessageListenerContainer
See also the default receiveTimeout which is 1s.
The receive timeout for each attempt can be configured through the "receiveTimeout" property. setReceiveTimeout
Set the timeout to use for receive calls, in milliseconds. The default is 1000 ms, that is, 1 second.
NOTE: This value needs to be smaller than the transaction timeout used by the transaction manager (in the appropriate unit, of course). 0 indicates no timeout at all; however, this is only feasible if not running within a transaction manager and generally discouraged since such a listener container cannot cleanly shut down. A negative value such as -1 indicates a no-wait receive operation.