How do you actually "manage" the max number of web threads using Spring 5's Reactive Programming?

When using a classical Tomcat approach, you can give your server a maximum number of threads it can use to handle web requests from users. Using the Reactive Programming paradigm, and Reactor in Spring 5, we are able to scale better vertically, making sure we are blocked minimally.
It seems to me that this is less manageable than the classical Tomcat approach, where you simply define the maximum number of concurrent requests. With a fixed maximum number of concurrent requests, it's easier to estimate the maximum memory your application will need and scale accordingly. With Spring 5's Reactive Programming, this seems like more of a hassle.
When I talk about these new technologies to sysadmin friends, they reply with worry about applications running out of RAM, or even threads on the OS level. So how can we deal with this better?

No blocking I/O at ALL
First of all, if you don't have any blocking operations, then you should not worry at all about how many threads to provide for managing concurrency. In that case, a single worker can process all connections asynchronously and without blocking. We can also easily scale out such connection-serving workers without contention or coherence overhead (each worker has its own queue of received connections and runs on its own CPU), so the application scales better (shared-nothing design).
Summary: in that case you manage the maximum number of web threads exactly as before, by configuring the application container (Tomcat, WebSphere, etc.) or the equivalent settings for non-Servlet servers such as Netty or the hybrid Undertow. The benefit is that you can process far more user requests with the same resource consumption.
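As a hedged illustration (assuming Spring WebFlux on Reactor Netty; the bean, thread-name prefix, and count of 4 are placeholders, and exactly how Boot picks up this bean depends on the Boot version), the event-loop size itself remains a small, explicit knob:
@Configuration
public class EventLoopConfig {

    @Bean
    public ReactorResourceFactory reactorResourceFactory() {
        ReactorResourceFactory factory = new ReactorResourceFactory();
        factory.setUseGlobalResources(false);
        // four event-loop threads serve all connections asynchronously
        factory.setLoopResources(LoopResources.create("webflux-loops", 4, true));
        return factory;
    }
}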
Blocking Database and Non-Blocking Web API (such as WebFlux over Netty).
In case we have to deal with blocking I/O somehow, for instance communication with a database over blocking JDBC, the most appropriate way to keep your app as scalable and efficient as possible is to use a dedicated thread pool for I/O.
Thread-pool requirements
First of all, we should create a thread pool with exactly the same number of workers as there are available connections in the JDBC connection pool. Hence, we will have exactly as many threads as can blockingly wait for a response, and we utilize our resources as efficiently as possible, so no more memory is consumed for thread stacks than is actually needed (in other words, a thread-per-connection model).
How to configure the thread pool according to the size of the connection pool
Since access to these properties varies by database and JDBC driver, we can always externalize that configuration as a property, which in turn means that it may be configured by devops or a sysadmin.
A configuration of the thread pool (in our example, the configuration of a Project Reactor 3 Scheduler) may look like the following:
@Configuration
public class ReactorJdbcSchedulerConfig {

    @Value("${my.awesome.scheduler-size}")
    int schedulerSize;

    @Bean
    public Scheduler jdbcScheduler() {
        return Schedulers.fromExecutor(new ForkJoinPool(schedulerSize));
        // alternatively, a fixed-size ThreadPoolTaskExecutor:
        // ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        // taskExecutor.setCorePoolSize(schedulerSize);
        // taskExecutor.setMaxPoolSize(schedulerSize);
        // taskExecutor.setQueueCapacity(schedulerSize);
        // taskExecutor.initialize();
        // return Schedulers.fromExecutor(taskExecutor);
    }
}
...
@Autowired
Scheduler jdbcScheduler;

public Mono myJdbcInteractionIsolated(String id) {
    return Mono.fromCallable(() -> jpaRepo.findById(id))
               .subscribeOn(jdbcScheduler)
               .publishOn(Schedulers.single());
}
...
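For completeness, a hedged sketch of the matching JDBC connection pool, assuming HikariCP (the JDBC URL is a placeholder), sized from the same externalized property so that the number of threads blocked on the database never exceeds the number of available connections:
@Bean
public DataSource dataSource(@Value("${my.awesome.scheduler-size}") int poolSize) {
    HikariConfig config = new HikariConfig();
    config.setJdbcUrl("jdbc:postgresql://localhost:5432/mydb"); // placeholder URL
    config.setMaximumPoolSize(poolSize); // same size as the jdbcScheduler above
    return new HikariDataSource(config);
}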
As you may have noticed, with this technique we can delegate our thread-pool configuration to an external team (sysadmins, for instance) and allow them to manage the amount of memory consumed by the created Java threads.
Keep your blocking I/O thread pool only for I/O work
This means that the I/O threads should be used only for operations that blockingly wait. In turn, once a thread has finished waiting for its response, you should move the result processing to another thread.
That is why, in the code snippet above, I put .publishOn right after .subscribeOn.
So, to summarize, with this technique we let an external team manage application sizing by keeping the thread-pool size in line with the connection-pool size. All result processing is executed on one thread, so there is no redundant, uncontrolled memory consumption.
Finally, Blocking API (Spring MVC) and blocking I/O (Database access)
In that case, there is no need for the reactive paradigm at all, since you don't get any benefit from it. First of all, Reactive Programming requires a particular mind shift, especially in understanding the use of functional techniques with reactive libraries such as RxJava or Project Reactor. For unprepared users it therefore adds complexity and causes more "What ****** is going on here???" moments. So, in the case of blocking operations at both ends, you should think twice about whether you really need Reactive Programming here.
Also, there is no magic for free. Reactive extensions come with a lot of internal complexity, and with all those magical .map, .flatMap, etc. calls you may lose overall performance and memory instead of winning, as you would with end-to-end non-blocking, asynchronous communication.
That means that good old imperative programming will be more suitable here, and it will be much easier to control your application's memory sizing using good old Tomcat configuration management.

Can you try this:
@Configuration
@EnableAsync
public class AsyncConfig implements AsyncConfigurer {

    @Override
    public Executor getAsyncExecutor() {
        ThreadPoolTaskExecutor taskExecutor = new ThreadPoolTaskExecutor();
        taskExecutor.setCorePoolSize(15);
        taskExecutor.setMaxPoolSize(100);
        taskExecutor.setQueueCapacity(100);
        taskExecutor.initialize();
        return taskExecutor;
    }
}
This works for async in Spring 4, but I'm not sure it will work in Spring 5 with reactive.

Related

Quarkus Mutiny and Imperative vs Reactive

TL;DR: which is the better pattern? Using Mutiny + imperative RESTEasy, or just using Reactive RESTEasy?
My understanding is that Mutiny allows me to pass Quarkus a longer-running action and have it handle the specifics of how that code gets run in its context. Does using Reactive provide equal or more benefit than Mutiny + imperative? If, from a thread-handling perspective, it's equal or better, then Reactive would be great, as it requires less code to maintain (creating Unis, etc.). However, if passing a Uni back is significantly better, then it might make sense to use that.
https://quarkus.io/guides/getting-started-reactive#imperative-vs-reactive-a-question-of-threads
Mutiny + imperative:
@GET
@Path("/getState")
@Produces(MediaType.APPLICATION_JSON)
public Uni<State> getState() throws InterruptedException {
    return this.serialService.getStateUni();
}
Reactive:
@GET
@Path("/getState")
@Produces(MediaType.APPLICATION_JSON)
public State getState() throws InterruptedException {
    return this.serialService.getState();
}
As always, it depends.
First, I recommend you to read https://quarkus.io/blog/resteasy-reactive-smart-dispatch/, which explains the difference between the two approaches.
It's not about longer actions (async != longer); it's about dispatching strategies.
When using RESTEasy Reactive, it uses the I/O thread (event loop) to process the request and switches to a worker thread only if the signature of the endpoint requires it. Using the I/O thread allows better concurrency (as you do not use worker threads), reduces memory usage (because you do not need to create the worker thread), and also tends to make the response time lower (as you save a few context switches).
Quarkus detects if your method can be called on the I/O thread or not. The heuristics are based on the signature of the methods (including annotations). To reuse the example from the question:
a method returning a State is considered blocking and so will be called on a worker thread
a method returning a Uni<State> is considered as non-blocking and so will be called on the I/O thread
a method returning a State but explicitly annotated with @NonBlocking is considered non-blocking and so will be called on the I/O thread (see the sketch below)
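As a small sketch of that last case (reusing the endpoint from the question; the @NonBlocking annotation comes from SmallRye Common), the same blocking-looking signature is dispatched on the I/O thread, so the service call behind it must not block:
@GET
@Path("/getState")
@Produces(MediaType.APPLICATION_JSON)
@NonBlocking
public State getState() {
    return this.serialService.getState(); // must be non-blocking to be safe here
}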
So, the question is, which dispatching strategy should you use?
It really depends on your application and context. If you do not expect many concurrent requests (it's hard to give a general threshold, but it's often between 200 and 500 req/sec), it is perfectly fine to use a blocking/imperative approach. If your application acts as an API Gateway with potential peaks of requests, non-blocking will provide better results.
Remember that even if you choose the imperative/blocking approach, RESTEasy Reactive provides many benefits. As most of the heavy-lifting request/response processing is done on the I/O thread, you get faster and use less memory for... free.

Is it acceptable to use Mono<Object>.publishOn(Schedulers.elastic) for blocking operations?

I understand that when using blocking operations in reactive streams, we should use Publisher<Object>.publishOn(Schedulers.elastic()).subscribe(/* blocking operations go here */).
I understand that this makes sense when my publisher publishes a stream of items (e.g. a Flux): future items do not have to wait for the current item that is held up by a blocking operation. But in the case of a Mono, is it necessary? There will be only one item flowing through my pipe.
P.S. I am using a Spring Boot 2 reactive WebFlux controller, something like this:
// inside a @RestController class
@PostMapping("/item")
public Mono<Response> saveItem(@RequestBody Mono<Item> item) {
    return item
        .publishOn(Schedulers.elastic()) // Do I need this?
        .map(i -> blockingDB.save(i))
        .map(saved -> new Response(saved));
}
Yes, absolutely!
If you don't do it you are blocking on the main processing/event loop threads. Of these, you should have only as many as your machine has (effective) CPUs.
Let's say that's 8. This means with just 8 concurrent requests that are waiting for the blocking operation you bring your application to a full stop!
Also, make sure to move the processing that happens after the blocking operation back to a thread pool intended for CPU-intensive work.
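A minimal sketch of that hand-off (blockingRepo, Item, and Response are placeholder names): the blocking call is subscribed on the elastic scheduler, and the downstream mapping is published back onto the parallel scheduler meant for CPU-bound work:
Mono<Response> result = Mono.fromCallable(() -> blockingRepo.save(item))
        .subscribeOn(Schedulers.elastic())   // blocking call runs on the elastic pool
        .publishOn(Schedulers.parallel())    // results continue on the CPU-bound pool
        .map(saved -> new Response(saved));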

Hard limit connections Spring Boot

I'm working on a simple microservice written in Spring Boot. This service will act as a proxy towards another resource that has a hard concurrent-connection limit, and the requests take a while to process.
I would like to impose a hard limit on the concurrent connections allowed to my microservice and reject any excess, either with a 503 or at the TCP/IP level. I've looked into the different configurations available for Jetty/Tomcat/Undertow but haven't found anything completely convincing yet.
I found some settings regulating thread pools:
server.tomcat.max-threads=0 # Maximum amount of worker threads.
server.undertow.io-threads= # Number of I/O threads to create for the worker.
server.undertow.worker-threads= # Number of worker threads.
server.jetty.acceptors= # Number of acceptor threads to use.
server.jetty.selectors= # Number of selector threads to use.
But if I understand correctly, these all configure thread-pool sizes and will just result in connections being queued at some level.
This seems really interesting, but it hasn't been merged yet and is targeted for Spring Boot 1.5: https://github.com/spring-projects/spring-boot/pull/6571
Am I out of luck using a setting for now? I could of course implement a filter, but I would rather block it at an earlier level and not have to reinvent the wheel. I guess putting Apache or something else in front is also an option, but that still feels like overkill.
Try looking at EmbeddedServletContainerCustomizer.
This gist should give you an idea of how to do that:
TomcatEmbeddedServletContainerFactory factory = ...;
factory.addConnectorCustomizers(connector ->
((AbstractProtocol) connector.getProtocolHandler()).setMaxConnections(10000));
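For a more complete picture, a hedged sketch of wiring this up as a bean (Spring Boot 1.x API; the limit of 100 is arbitrary, and connections beyond the limit queue in the accept backlog or are refused at the TCP level rather than receiving a 503):
@Bean
public EmbeddedServletContainerCustomizer connectionLimitCustomizer() {
    return container -> {
        if (container instanceof TomcatEmbeddedServletContainerFactory) {
            ((TomcatEmbeddedServletContainerFactory) container).addConnectorCustomizers(connector ->
                ((AbstractProtocol) connector.getProtocolHandler()).setMaxConnections(100));
        }
    };
}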

Big difference in throughput when using MSMQ as subscription storage for NServiceBus as opposed to RavenDB

Recently we noticed that our NServiceBus subscribers were not able to handle the increasing load. We have a fairly constant input stream of events (measurement data from embedded devices), so it is very important that the throughput keeps up with the input.
After some profiling, we concluded that it was not the handling of the events that was taking a lot of time, but rather the NServiceBus process of retrieving and publishing events. To try to get a better idea of what goes on, I recreated the Pub/Sub sample (http://particular.net/articles/nservicebus-step-by-step-publish-subscribe-communication-code-first).
On my laptop, using all the NServiceBus defaults, the maximum throughput of the Ordering.Server is about 10 events/second. The only thing it does is
class PlaceOrderHandler : IHandleMessages<PlaceOrder>
{
    public IBus Bus { get; set; }

    public void Handle(PlaceOrder message)
    {
        Bus.Publish<OrderPlaced>(
            e => { e.Id = message.Id; e.Product = message.Product; });
    }
}
I then started to play around with configuration settings. None seemed to have any impact on this (very low) performance, except for the subscription storage:
Configure.With()
    .DefaultBuilder()
    .UseTransport<Msmq>()
    .MsmqSubscriptionStorage();
With this configuration, the throughput instantly went up to 60 messages/sec.
I have two questions:
When using MSMQ as subscription storage, the performance is much better than with RavenDB. Why does something as trivial as the storage for subscription data have such an impact?
I would have expected much higher performance. Are there any other configuration settings I should use to get at least an order of magnitude better than this? On our servers, the maximum throughput when running this sample is about 200 msg/s. That is far from spectacular for a system that doesn't even do anything useful yet.
MSMQ doesn't have native pub/sub capabilities, so NServiceBus adds support for this by storing the list of subscribers and then looping over that list, sending a copy of the event to each subscriber. This translates to X message-queuing operations, where X is the number of subscribers. This also explains why RabbitMQ is faster: it has native pub/sub, so you only need one operation against the broker.
The reason the storage based on an MSMQ queue is faster is that it is local storage (it can't be used if you need to scale out the endpoint), which means we can cache the data, since no other endpoint instance can be updating the storage. In short, we get away with an in-memory lookup, which, as you can see, is the fastest option.
There are plans to add native caching across all storages:
https://github.com/Particular/NServiceBus/issues/1320
200 msg/s sounds quite low; what number do you get if you skip the Bus.Publish? (Just to get a baseline.)
Possibility 1: distributed transactions
Distributed transactions are created when processing messages because of the queue + database combination.
Try measuring without transactional handling of the messages. How does that compare?
Possibility 2: msmq might not be the best queueing system for your needs
Have you ever considered switching to RabbitMQ for transport? I have very good experiences with RabbitMQ in combination with MassTransit; it far exceeds the numbers you mention in your question.

Can Netty efficiently handle scores of outgoing connections as a client?

I'm creating a client-server relationship whereby a single client will be connected to an arbitrary number of servers using persistent TCP connections. The actual number of servers is as yet undetermined, but the design goal is to shoot for 1000.
I found an example using direct Java NIO that nearly completely matches my mental model of how this could work:
http://drdobbs.com/jvm/184406242
In general, it opens all of the channels and adds them to a single thread monitoring a java.nio.channels.Selector. The use of the Selector, in particular, is what allows this to scale far better than the standard thread-per-channel approach.
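A rough sketch of that pattern (serverAddresses is a placeholder list; error handling omitted): one thread registers every outgoing channel with a single Selector and reacts to readiness events:
Selector selector = Selector.open();
for (InetSocketAddress addr : serverAddresses) {
    SocketChannel ch = SocketChannel.open();
    ch.configureBlocking(false);
    ch.connect(addr);                               // non-blocking connect
    ch.register(selector, SelectionKey.OP_CONNECT); // switch interest to OP_READ once connected
}
while (true) {
    selector.select();                              // block until some channel is ready
    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
    while (it.hasNext()) {
        SelectionKey key = it.next();
        it.remove();
        if (key.isConnectable()) {
            ((SocketChannel) key.channel()).finishConnect();
            key.interestOps(SelectionKey.OP_READ);
        } else if (key.isReadable()) {
            // read from key.channel() and hand the data to the application
        }
    }
}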
I would rather use a (slightly) higher level socket framework like Netty, than direct Java NIO. Unfortunately, I have not been able to determine how Netty would handle a case like this. That is, the examples and discussions I've found all tend to center around the server side, with accepting scores of concurrent connections.
But what about doing this from the client side? If I create a large number of channels and just wait on their events, how is Netty going to handle this at the back-end?
This isn't a direct answer to your question but I hope it is helpful nonetheless. Below, I describe a way for you to determine the answer that you are looking for. This is something that I recently did myself for an upcoming project.
Compared to OIO (Old IO) the asynchronous nature of the Netty framework and NIO will indeed provide much better memory and CPU usage characteristics for your application. The way buffers are handled in Netty will also be of benefit as it will help you to avoid copying byte buffers. The point is that all of the thread pool and NIO details will be handled for you allowing you to focus on your business logic. You mentioned the NIO Selector and you will benefit from that; the nice thing about Netty is that you get the benefits without having to worry about that implementation yourself because it is already done for you.
My understanding of the client side is that it is very similar to the server side and should provide you with commensurate performance gains (as long as your business logic doesn't introduce any performance issues).
My advice would be to throw together a prototype that more or less does what you want. Leave out any time consuming details and just add in the basic Netty handlers that you need to make something that works.
Then I would use JMeter to invoke your client and apply load to the server and client. Using something like jconsole or jvisualvm will show you the performance characteristics of the client and server under load. You could also try JProbe. You can add a listener in JMeter that will indicate the throughput. I would advise using JMeter in server mode, with the client on another machine and the server on yet another. This is a bit of up-front work, but if you decide to move forward you will have these tools ready to go for further testing as you proceed.
I suspect a decent Netty implementation that doesn't introduce any extraneous poorly performing components will give you the performance characteristics you are looking for, but, the only way to know for sure is to measure the system under the expected load.
You need to define what the expected load looks like and the desired performance characteristics under such load. Given these inputs you can measure your system to find out if it will meet your expectations. I personally don't think anyone can tell you if it will behave in the desired manner. You have to measure it. It's the only reliable way to know if the system can meet your needs.
I would rather use a (slightly) higher level socket framework like Netty, than direct Java NIO.
This is the correct approach. You can try implementing your own NIO server and client but why do that when you have the benefit of a highly refined framework at your fingertips already?
Netty will use up to x worker threads that handle the work for you. Each worker thread has one Selector that is used to register Channels to it. The number of workers is configurable and defaults to 2 * the CPU count.
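A hedged sketch (Netty 3.x client API; the value 4 is arbitrary): the extra NioClientSocketChannelFactory constructor argument caps the number of worker threads, and therefore Selectors, shared by all outgoing channels.
ChannelFactory factory = new NioClientSocketChannelFactory(
        Executors.newCachedThreadPool(),   // boss threads (handle the connect phase)
        Executors.newCachedThreadPool(),   // worker threads (handle channel I/O)
        4);                                // at most 4 worker threads / Selectors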
As you can see in the example from Netty's documentation (http://netty.io/docs/stable/guide/html/#start.9), you can control exactly the number of worker threads (meaning the number of underlying Selectors) on the client side.
Netty solves a number of issues that are very hard to handle in a simple way, such as NIO with SSL, and it has a lot of default encoders/decoders for Zip, etc.
I started using Netty a few weeks ago and it was quite quick to get into. (I recommend downloading the project with all the example code inside; there is a lot of documentation in it that cannot be found at the URL above.)
ChannelFactory factory = new NioClientSocketChannelFactory(
        Executors.newCachedThreadPool(),
        Executors.newCachedThreadPool());

ClientBootstrap bootstrap = new ClientBootstrap(factory);
bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
    public ChannelPipeline getPipeline() {
        return Channels.pipeline(new TimeClientHandler());
    }
});
bootstrap.setOption("tcpNoDelay", true);
bootstrap.setOption("keepAlive", true);
bootstrap.connect(new InetSocketAddress(host, port));
Good luck,
Renaud
