Rate limiting for Flux flatMap() - spring

An application I am working on uses Spring WebFlux + Hazelcast IQueue for inbound data processing.
The problem I am facing is that the application becomes overloaded during peak hours. This happens when the messages (consumed from the Hazelcast IQueue and converted to a Flux) are processed by Flux.flatMap().
My question: is there any way to know accurately how "busy" the processing inside flatMap is, so that I can write some code to slow down message consumption from the Hazelcast distributed queue?

In your situation the simplest way to solve the problem is to limit concurrency. By default, flatMap processes up to Queues.SMALL_BUFFER_SIZE = 256 in-flight inner sequences at the same time.
You can control concurrency with flatMap(item -> process(item), concurrency), or use the concatMap operator if you want to process items sequentially. See flatMap(..., int concurrency, int prefetch) for details.
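
A minimal sketch of the same idea outside Reactor, using only the JDK: a Semaphore caps the number of in-flight tasks, and the loop that hands out work pauses when the cap is reached, much like an upstream consumer that stops pulling from the queue. The class name BoundedInFlight, the method peakConcurrency, and the 5 ms sleep are illustrative assumptions, not part of Reactor or Hazelcast.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedInFlight {
    /** Runs `tasks` jobs with at most `limit` in flight; returns the peak concurrency observed. */
    static int peakConcurrency(int tasks, int limit) {
        Semaphore permits = new Semaphore(limit);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            for (int i = 0; i < tasks; i++) {
                permits.acquire(); // the hand-out loop pauses here once the limit is reached
                pool.execute(() -> {
                    int now = inFlight.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(5); // stand-in for the real processing (assumption)
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release(); // frees a slot for the next item
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return peak.get();
    }
}
```

With Reactor the cap is simply the concurrency argument of flatMap; the sketch only makes the resulting pause visible.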


Spring Boot @KafkaListener with blocking queue

I am new to Spring Boot @KafkaListener. My application receives almost 200K messages per second on a topic. I want to separate the message listener from the processing of the messages.
How can I use java.util.concurrent.BlockingQueue with @KafkaListener? Can I do it using CompletableFuture?
Any sample code would help.
I believe you want a consumer with pipelining implemented. It's not uncommon to implement this in a scenario like yours. Why? Well, the KafkaConsumer falls short here because decompressing and deserializing can be time-consuming, even before counting the time it takes to do the processing. Since these operations are all stacked behind one thread, it would be ideal to separate the polling from the processing, which is achieved through a couple of buffers.
One way to do this: your EventReceiver spins up a thread for the polling. That thread does the same thing you always do, but instead of firing off the listeners for each event, it passes the event to a receivedEvents buffer, which could be a BlockingQueue<RecieveEvent>. So in the for loop, you pass each record to the blocking queue. Once the for loop is over, this thread reads from another buffer, such as a Queue<Map<TopicPartition, OffsetAndMetadata>>, and commits the offsets that the processingThread has successfully processed.
Next, your EventReceiver spins up another thread, the processingThread. It pulls records from the buffer, fires the event to all the listeners for this receiver, and then updates the Queue's state so that the pollingThread can commit.
Why doesn't the processingThread just commit the events itself instead of passing them back to the pollingThread? Because KafkaConsumer is not thread-safe: the same thread that calls .poll() must be the one that calls consumer.commitAsync(...), or else you'll get a ConcurrentModificationException.
This approach doesn't work with auto commit enabled.
As for how to do this using Spring Kafka, I'm not completely sure. However, I do know that Spring Kafka separates the EventReceiver from the EventListener (@KafkaListener), which separates the low-level Kafka work from the business logic. In theory, you'd have to tune their implementation, but I think implementing this without the Spring Kafka library would be easier.
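
A rough sketch of the two-buffer pipeline described above, using only the JDK. It does not use the Kafka client at all: records are plain offsets, the "commit" is just appending to a list, and the names PollProcessPipeline, run, and POISON are hypothetical. The point is only to show that the polling thread hands records over through a BlockingQueue and is the only thread that ever "commits".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;

final class PollProcessPipeline {
    private static final long POISON = -1L; // hypothetical end-of-stream marker

    /** Pushes offsets 0..records-1 through the pipeline; returns the offsets "committed" by the polling thread. */
    static List<Long> run(int records) {
        BlockingQueue<Long> received = new ArrayBlockingQueue<>(16); // polling -> processing
        Queue<Long> processed = new ConcurrentLinkedQueue<>();        // processing -> polling
        List<Long> committed = new ArrayList<>();                     // touched by the polling thread only

        Thread processingThread = new Thread(() -> {
            try {
                for (long offset = received.take(); offset != POISON; offset = received.take()) {
                    // ... fire the listeners / business logic here ...
                    processed.add(offset); // report success back to the polling thread
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        processingThread.start();

        try {
            for (long offset = 0; offset < records; offset++) {
                received.put(offset);        // blocks when the processor falls behind
                drain(processed, committed); // commit happens on the polling thread
            }
            received.put(POISON);
            processingThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        drain(processed, committed);         // commit whatever finished last
        return committed;
    }

    private static void drain(Queue<Long> processed, List<Long> committed) {
        for (Long offset = processed.poll(); offset != null; offset = processed.poll()) {
            committed.add(offset);           // stand-in for consumer.commitAsync(...)
        }
    }
}
```

The bounded ArrayBlockingQueue is a design choice in this sketch: when processing is slow, put() blocks and the polling loop slows down instead of buffering without limit.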

Retry after delay on back pressure with Spring Project Reactor?

I'm trying to implement something similar to a simple non-blocking rate-limiter with Spring Project Reactor version 3.3.0. For example, to limit the number to 100 requests per second I use this implementation:
.bufferTimeout(100, Duration.ofSeconds(1))
This works fine for my use case but if the subscriber doesn't keep up with the rate of the myFlux publisher it'll (rightly) throw an OverflowException:
reactor.core.Exceptions$OverflowException: Could not emit buffer due to lack of requests
at reactor.core.Exceptions.failWithOverflow(Exceptions.java:215)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Assembly trace from producer [reactor.core.publisher.FluxLift] :
In my case it's important that all elements are consumed by the subscriber so e.g. dropping on back pressure (onBackpressureDrop()) is not acceptable.
Is there a way, instead of dropping elements on back pressure, to just pause the publishing of messages until the subscriber has caught up? In my case myFlux is publishing a finite but large set of elements persisted in a durable database, so dropping elements should not be required imho.
bufferTimeout(int maxSize, Duration maxTime) requests an unbounded amount of messages, thus being insensitive to backpressure. That makes it unsuitable for your case.
On a conceptual level, bufferTimeout cannot be backpressure sensitive, because you clearly instruct the publisher to emit one batch (even if it is empty) for every elapsed duration. If the subscriber is too slow, this will - rightfully - cause an overflow.
Instead, try:
buffer(int maxSize) requests the correct amount upstream (request * maxSize), and so is sensitive to backpressure from the subscribers.

Is it acceptable to use Mono<Object>.publishOn(Schedulers.elastic) for blocking operations?

I understand that when using blocking operations in reactive streams, we should use Publisher<Object>.publishOn(Schedulers.elastic).subscribe(//blocking operations go here)
I understand that this makes sense when my publisher emits a list of items (for example, a Flux): later items do not have to wait while the current item is held up by a blocking operation. But is it necessary in the case of a Mono? There will be only one item flowing through my pipeline.
PS. I am using a Spring Boot 2 reactive WebFlux controller, something like this:
public Mono<Response> saveItem(Mono<Item> item) {
    return item.publishOn(Schedulers.elastic()) // Do I need this?
               .map(i -> new Response(i));
}
Yes, absolutely!
If you don't do it you are blocking on the main processing/event loop threads. Of these, you should have only as many as your machine has (effective) CPUs.
Let's say that's 8. This means with just 8 concurrent requests that are waiting for the blocking operation you bring your application to a full stop!
Also, make sure to move the processing that follows the blocking operation back to a thread pool intended for CPU-intensive work.
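
To make the thread hop concrete without pulling in Reactor, here is a small JDK-only sketch with CompletableFuture: the blocking call runs on one pool and the follow-up work on another. This is an analogue of the idea, not Reactor's API; the class name OffloadBlockingCall, the pool sizes, the thread names, and the 10 ms sleep are all assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class OffloadBlockingCall {
    /** Returns {thread that ran the blocking call, thread that ran the follow-up work}. */
    static String[] run() {
        ExecutorService blockingPool =
                Executors.newFixedThreadPool(2, r -> new Thread(r, "blocking-io"));
        ExecutorService cpuPool =
                Executors.newFixedThreadPool(2, r -> new Thread(r, "cpu-work"));
        try {
            return CompletableFuture
                    .supplyAsync(() -> {
                        pause(10); // stand-in for a blocking JDBC or HTTP call
                        return Thread.currentThread().getName();
                    }, blockingPool)
                    // follow-up work is moved off the blocking pool
                    .thenApplyAsync(ioThread ->
                            new String[] {ioThread, Thread.currentThread().getName()}, cpuPool)
                    .join();
        } finally {
            blockingPool.shutdown();
            cpuPool.shutdown();
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Neither step runs on the caller's thread, which is the property you want for event-loop threads: they never wait on the blocking call.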

How do you actually "manage" the max number of webthreads using Spring 5's Reactive Programming?

When using a classical Tomcat approach, you can give your server a maximum number of threads it can use to handle web requests from users. Using the Reactive Programming paradigm, and Reactor in Spring 5, we are able to scale better vertically, making sure we are blocked minimally.
It seems to me that it makes this less manageable than the classical Tomcat approach, where you simply define the max number of concurrent requests. When you have a max number of concurrent requests, it's easier to estimate the maximum memory your application will need and scale accordingly. When you use Spring 5's Reactive Programming this seems like more of a hassle.
When I talk about these new technologies to sysadmin friends, they reply with worry about applications running out of RAM, or even threads on the OS level. So how can we deal with this better?
No blocking I/O at ALL
First of all, if you don't have any blocking operations, you should not worry at all about how many threads to provide for managing concurrency. In that case, a single worker can process all connections asynchronously and without blocking. We can also easily scale out the workers that serve connections: they process their connections without contention or coherence overhead (each worker has its own queue of received connections and runs on its own CPU), so the application scales better (a shared-nothing design).
Summary: in that case you manage the max number of web threads exactly as before, by configuring the application container (Tomcat, WebSphere, etc.) or the equivalent for non-Servlet servers such as Netty, or the hybrid Undertow. The benefit: you can process many more user requests with the same resource consumption.
Blocking Database and Non-Blocking Web API (such as WebFlux over Netty).
If we have to deal with blocking I/O, for instance communication with a DB over blocking JDBC, the most appropriate way to keep the app as scalable and efficient as possible is to use a dedicated thread pool for I/O.
Thread-pool requirements
First of all, we should create a thread pool with exactly the same number of workers as there are connections available in the JDBC connection pool. That way, we have exactly as many threads blocked waiting for responses as there are connections, and we use our resources as efficiently as possible: no more memory is consumed for thread stacks than is actually needed (in other words, a thread-per-connection model).
How to configure the thread pool according to the size of the connection pool
Since the way to read these properties varies by database and JDBC driver, we can always externalize the configuration to a dedicated property, which in turn means that it can be configured by devops or a sysadmin.
A thread-pool configuration (in our example, a Scheduler from Project Reactor 3) may look like this:
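@Configuration
public class ReactorJdbcSchedulerConfig {
    // Externalized setting; inject it (for example with @Value) from your own property.
    int schedulerSize;

    @Bean
    public Scheduler jdbcScheduler() {
        return Schedulers.fromExecutor(new ForkJoinPool(schedulerSize));
        // similarly
        // ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        // taskExecutor.setCorePoolSize(schedulerSize);
        // taskExecutor.setMaxPoolSize(schedulerSize);
        // taskExecutor.setQueueCapacity(schedulerSize);
        // taskExecutor.initialize();
        // return Schedulers.fromExecutor(taskExecutor);
    }
}

// In the class that talks to the database:
@Autowired
Scheduler jdbcScheduler;

public Mono myJdbcInteractionIsolated(String id) {
    return Mono.fromCallable(() -> jpaRepo.findById(id))
               .subscribeOn(jdbcScheduler)      // the blocking call runs on the JDBC pool
               .publishOn(Schedulers.single()); // results are processed off the JDBC pool
}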
As you may have noticed, with this technique we can delegate the shared thread-pool configuration to an external team (sysadmins, for instance) and let them manage the memory consumed by the Java threads that get created.
Keep your blocking I/O thread pool only for I/O work
This means that an I/O thread should be used only for operations that block while waiting. In turn, once the thread has finished waiting for the response, you should move result processing to another thread.
That is why in the above code-snippet I put .publishOn right after .subscribeOn.
So, to summarize, with this technique we can let an external team manage application sizing by matching the thread-pool size to the connection-pool size. All result processing is executed within one thread, so there is no redundant, uncontrolled memory consumption.
Finally, Blocking API (Spring MVC) and blocking I/O (Database access)
In that case, there is no need for the reactive paradigm at all, since you gain nothing from it. First of all, Reactive Programming requires a particular shift in mindset, especially in understanding how to use functional techniques with reactive libraries such as RxJava or Project Reactor. For unprepared users, in turn, it adds complexity and causes more "What ****** is going on here???". So, when operations block at both ends, you should think twice about whether you really need Reactive Programming here.
Also, there is no magic for free. Reactive Extensions come with a lot of internal complexity, and by using all those magical .map, .flatMap, etc., you may lose in overall performance and memory consumption instead of winning, as you would with end-to-end non-blocking, async communication.
That means good old imperative programming is more suitable here, and it will be much easier to control your application's memory sizing using good old Tomcat configuration management.
Can you try this:
@Configuration
@EnableAsync
public class AsyncConfig implements AsyncConfigurer {
    @Override
    public Executor getAsyncExecutor() {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        taskExecutor.initialize(); // required because the executor is not a Spring-managed bean
        return taskExecutor;
    }
}
This works for async in Spring 4, but I'm not sure it will work in Spring 5 with reactive.

Spring Integration message processing partitioned by header information

I want to be able to process messages with Spring Integration in parallel. The messages come from multiple devices and we need to process messages from the same device in sequential order but the devices can be processed in multiple threads. There can be thousands of devices so I'm trying to figure out how to assign processor based on mod of the device ID using Spring Integration's semantics as much as possible. What approach should I be looking at?
It's difficult to generalize without knowing other requirements (transaction semantics etc) but probably the simplest approach would be a router sending messages to a number of QueueChannels using some kind of hash algorithm on the device id (so all messages for a particular device go to the same channel).
Then, have a single-threaded poller pulling messages from each queue.
EDIT: (response to comment)
Again, difficult to generalize, but...
See AbstractMessageRouter.determineTargetChannels() - a router actually returns a physical channel object (actually a list, but in most cases a list of 1). So, yes, you can create the QueueChannels programmatically and have the router return the appropriate one, based on the message.
Assuming you want all the messages to then be handled by the same downstream flow, you would also need to create a <bridge/> for each queue channel to bridge it to the input channel of the next component in the flow.
create a QueueChannel
create a BridgeHandler (set the outputChannel to the input channel of the next component)
create a PollingConsumer (constructor takes the channel and handler; set trigger etc)
start() the consumer.
All of this can be done in your custom router's initialization; then implement determineTargetChannels() to select the queue.
Depending on the processing time for your events, I would generally recommend running the downstream flow on the poller thread rather than setting a taskExecutor to avoid issues with the next poll trying to schedule another task before this one's done. You might need to increase the default taskScheduler's pool size.
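
To illustrate the routing idea independently of Spring Integration, here is a JDK-only sketch: each partition is a single-threaded executor (standing in for a QueueChannel with a single-threaded poller), and a message is routed by the hash of its device id modulo the number of partitions. The class DevicePartitioner and its methods are hypothetical names for this sketch, not Spring Integration API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class DevicePartitioner {
    private final List<ExecutorService> lanes = new ArrayList<>();
    /** Sequence numbers in the order they were handled, per device (for inspection). */
    final Map<String, List<Integer>> handled = new ConcurrentHashMap<>();

    DevicePartitioner(int partitions) {
        for (int i = 0; i < partitions; i++) {
            lanes.add(Executors.newSingleThreadExecutor()); // one thread per partition
        }
    }

    /** Same device id always maps to the same partition. */
    int partitionFor(String deviceId) {
        return Math.floorMod(deviceId.hashCode(), lanes.size());
    }

    void submit(String deviceId, int sequence) {
        lanes.get(partitionFor(deviceId)).execute(() ->
                handled.computeIfAbsent(deviceId, k -> new CopyOnWriteArrayList<>())
                       .add(sequence)); // stand-in for the downstream flow
    }

    void shutdownAndWait() {
        lanes.forEach(ExecutorService::shutdown);
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because every message for a given device lands on the same single-threaded lane, per-device order is preserved while different devices can be handled in parallel. Math.floorMod is used rather than % so that negative hash codes still give a valid index.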
