Spring RSocket high CPU usage after client disconnect - spring

I've created a simple RSocket endpoint with Spring RSocket support:
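import org.springframework.messaging.handler.annotation.MessageMapping
import org.springframework.stereotype.Controller
import reactor.core.publisher.Flux
import java.time.Duration

@Controller
class SampleController {
    @MessageMapping("sample")
    fun sample(): Flux<ByteArray> {
        return Flux
            .fromIterable(generateSequence(1) { it + 1 }.asIterable())
            .delayElements(Duration.ofSeconds(2))
            .doOnNext { println(it) }
            .map { it.toString().toByteArray() }
    }
}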
This works fine as long as a client is connected and pulling data:
rsc tcp://localhost:8888 --stream --route sample
Once I cancel the client, CPU usage rises to about 20%. If I run more rsc clients in parallel, CPU usage goes above 50% when I cancel all of them.
I looked for samples that do some clean-up on the server side, but didn't manage to find anything. I've also enabled the global DEBUG logging level (logging.level.root=DEBUG), but there is nothing in the logs after the clients are canceled. I was thinking maybe there's some sort of automatic reconnection mechanism that kicks in and starts spamming requests.
So, am I missing something, and is there a way to avoid such behavior?

Related

Spring WebFlux: Refactoring blocking API with Reactive API, or should I?

I have a legacy Spring Boot REST app that interacts with downstream services that block. I'm new to reactive programming, and am unsure how to handle these blocking requests. Most Webflux examples I've seen are pretty trivial. Here's the flow-of-control of my app:
1. User queries MyApp at http://myapp.com
2. MyApp then queries partner REST API, which is BLOCKING.
3. Depending on account type, data from the blocking app needs to be queried to make another call to another blocking REST application.
4. All data is enriched and rendered by MyApp to the browser.
Where to start? I'm using WebClient currently, so that part's done. I know I should perform the blocking steps on a different scheduler (parallel or boundedElastic?) Should I use a Flux or Mono, since the partner APIs return the data all at once?
Both apps return thousands of rows of data, and the user just waits... Steps 1-2 take about 4 secs; add in step 3, and we're looking at over 30 seconds due to the inefficiency of the API. Can Flux help my users' wait time at all?
EDIT Below is a (long) example of what my application is doing. Notice that I block my first call to the API to get a count of what's being returned, then I fetch the rest in batches of TASK_QUERY_LIMIT.
@Bean
public WebClient authWebClient(WebClient.Builder builder) {
    MultiValueMap<String, String> map = new LinkedMultiValueMap<>();
    map.set(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE);
    final int size = 48 * 1024 * 1024;
    final ExchangeStrategies strategies = ExchangeStrategies.builder()
            .codecs(codecs -> codecs.defaultCodecs().maxInMemorySize(size))
            .build();
    return builder.baseUrl(configProperties.getUrl())
            .exchangeStrategies(strategies)
            .defaultHeaders(httpHeaders -> httpHeaders.addAll(map))
            .filters(exchangeFilterFunctions -> {
                exchangeFilterFunctions.add(logResponseStatus());
                exchangeFilterFunctions.add(logRequest());
            })
            .build();
}
public Mono<Response<Task>> getTasksMono() {
    return getAuthWebClient()
            .get()
            .uri("http://MyApp.com")
            .accept(MediaType.APPLICATION_JSON)
            .retrieve()
            .onStatus(HttpStatus::isError, this::onHttpStatusError)
            .bodyToMono(new ParameterizedTypeReference<Response<Task>>() {});
}
// Service method
public List<Task> getTasks() {
    Mono<Response<Task>> monoTasks = getTasksMono();
    Response<Task> tasks = monoTasks.block();
    int taskCount = tasks.getCount();
    List<Task> returnTasks = new ArrayList<>(tasks.getData());
    List<Mono<Response<Task>>> tasksMonoList = new ArrayList<>();
    // query API-ONE for all remaining tasks
    if (taskCount > TASK_QUERY_LIMIT) {
        retrieveAdditionalTasks(key, taskCount, tasksMonoList);
    }
    // Send out all of the calls at once, and subscribe to their results.
    Flux.mergeSequential(tasksMonoList)
            .map(Response::getData)
            .doOnNext(returnTasks::addAll)
            .blockLast();
    return returnTasks.stream()
            .map(this::transform) // This method performs business logic on the data before returning to user
            .collect(Collectors.toList());
}
private void retrieveAdditionalTasks(String key, int taskCount,
        List<Mono<Response<Task>>> tasksMonoList) {
    int offset = TASK_QUERY_LIMIT;
    int numRequests = (taskCount - offset) / TASK_QUERY_LIMIT + 1;
    for (int i = 0; i < numRequests; i++) {
        tasksMonoList.add(getTasksMono(processDefinitionKey, encryptedIacToken,
                TASK_QUERY_LIMIT, offset));
        offset += TASK_QUERY_LIMIT;
    }
}
There are multiple questions here. I will try to highlight the main points.
1. Does it make sense refactoring to Reactive API?
At first glance your application is IO-bound, and reactive applications are typically much more efficient for such workloads because all IO operations are async and non-blocking. A reactive application will not be faster, but you will need fewer resources to handle the same workload. The only caveat is that, in order to get all the benefits of the reactive API, your app should be reactive end-to-end (reactive drivers for DB, reactive WebClient, …). All reactive logic is executed on Schedulers.parallel(), and you need only a small number of threads (by default, the number of CPU cores) to execute non-blocking logic. It’s still possible to use blocking APIs by “offloading” them to Schedulers.boundedElastic(), but that should be the exception (not the rule) if you want your app to be efficient. For more details, check Flight of the Flux 3 - Hopping Threads and Schedulers.
2. Blocking vs non-blocking.
It looks like there is some misunderstanding of what a blocking API is. It’s not about response time but about the underlying API. By default, Spring WebFlux uses Reactor Netty as the underlying HTTP client library, which is a reactive implementation of a Netty client that uses an event loop instead of a thread-per-request model. Even if a request takes 30-60 sec to get a response, no thread will be blocked, because all IO operations are async. For such APIs reactive applications behave much better, because a non-reactive (thread-per-request) application would need a large number of threads and, as a result, much more memory to handle the same workload.
To quantify the efficiency, we can apply Little's Law to calculate the required number of threads in a “traditional” thread-per-request model:
workers >= throughput x latency, where workers is the number of threads
For example, to handle 100 QPS with 30 sec latency we would need 100 x 30 = 3000 threads. In a reactive app the same workload could be handled by just a few threads and, as a result, with much less memory. For scalability this means that IO-bound reactive apps typically scale by CPU usage, whereas “traditional” apps most probably scale by memory.
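The arithmetic above can be checked with a few lines of Java. This is only a sketch of Little's Law as stated here; the class and method names are made up for illustration.

```java
// Hypothetical helper illustrating Little's Law: workers >= throughput x latency.
final class LittlesLaw {

    private LittlesLaw() {}

    /**
     * Minimum number of concurrently busy workers (threads in a
     * thread-per-request model) needed to sustain the given load.
     *
     * @param requestsPerSecond throughput in requests per second
     * @param latencySeconds    average time one request occupies a worker
     */
    static long requiredWorkers(double requestsPerSecond, double latencySeconds) {
        if (requestsPerSecond < 0 || latencySeconds < 0) {
            throw new IllegalArgumentException("inputs must be non-negative");
        }
        // Round up: a fractional worker still needs a whole thread.
        return (long) Math.ceil(requestsPerSecond * latencySeconds);
    }

    public static void main(String[] args) {
        // The example from the text: 100 QPS at 30 s latency.
        System.out.println(requiredWorkers(100, 30));  // 3000
        // The same throughput at 0.5 s latency needs far fewer threads.
        System.out.println(requiredWorkers(100, 0.5)); // 50
    }
}
```

The second call shows why latency matters as much as throughput in a thread-per-request model: the thread count grows linearly with how long each request holds a thread.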
Sometimes it's not obvious which code is blocking. One very useful tool for testing reactive code is BlockHound, which you can integrate into unit tests.
3. How to refactor?
I would migrate layer by layer but block only once. Moving remote calls to WebClient could be a first step in refactoring the app to the reactive API. I would create all request/response logic using the reactive API and then block (if required) at the very top level (e.g. in the controller). Do’s and Don’ts: Avoiding First-Time Reactive Programmer Mines is a great overview of the common pitfalls and a possible migration strategy.
4. Flux vs Mono.
Flux will not help you improve performance. It’s more about the downstream logic. If you process data record by record, use Flux<T>; if you process data in batches, use Mono<List<T>>.
Your current code is not really reactive and is very hard to understand, because it mixes the reactive API and the stream API and blocks multiple times. As a first step, try to rewrite it as a single flow using the reactive API and block only once.
I'm not really sure about your internal types, but here is a skeleton that could give you an idea of the flow.
// Service method
public Flux<Task> getTasks() {
    return getTasksMono()
            .flatMapMany(response -> {
                List<Mono<Response<Task>>> taskRequests = new ArrayList<>();
                taskRequests.add(Mono.just(response));
                if (response.getCount() > TASK_QUERY_LIMIT) {
                    retrieveAdditionalTasks(key, response.getCount(), taskRequests);
                }
                return Flux.mergeSequential(taskRequests);
            })
            .flatMapIterable(Response::getData)
            .map(this::transform); // use flatMap in case transform is async
}
As I mentioned before, try to keep internal API reactive returning Mono or Flux and block only once in the upper layer.
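One detail worth checking when fanning out the remaining page requests is the offset arithmetic. With the formula (taskCount - offset) / TASK_QUERY_LIMIT + 1, a remainder that is an exact multiple of the page size yields one extra request for a page with no rows. The sketch below is not the original retrieveAdditionalTasks; it is a stand-alone illustration, with a hypothetical PageOffsets class and assuming zero-based offsets, of computing the remaining offsets with a ceiling division instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the name and the zero-based offset convention are assumptions.
final class PageOffsets {

    private PageOffsets() {}

    /**
     * Offsets of the pages still to fetch after the first page
     * (offset 0, size pageSize) has already been retrieved.
     */
    static List<Integer> remainingOffsets(int totalCount, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Integer> offsets = new ArrayList<>();
        int remaining = totalCount - pageSize;
        if (remaining <= 0) {
            return offsets; // everything fit in the first page
        }
        // Ceiling division: number of pages needed to cover the remainder.
        int pages = (remaining + pageSize - 1) / pageSize;
        for (int i = 0; i < pages; i++) {
            offsets.add(pageSize * (i + 1));
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(remainingOffsets(250, 100)); // [100, 200]
        System.out.println(remainingOffsets(200, 100)); // [100]
        System.out.println(remainingOffsets(80, 100));  // []
    }
}
```

Each returned offset would correspond to one Mono added to the request list before it is handed to Flux.mergeSequential.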

coordinating multiple outgoing requests in a reactive manner

This is more of a best-practice question.
In my current system (a monolith), a single incoming HTTP API request might need to gather similarly structured data from several backend sources, aggregate it, and only then return the data to the client in the API response.
In the current implementation I simply use a thread pool to send all requests to the backend sources in parallel and a countdown latch of sorts to know when all requests have returned.
I am trying to figure out the best practice for transforming the above using reactive stacks like Vert.x/Quarkus. I want to keep the service reactive: it accepts this API call, calls multiple (similar) backend sources via HTTP, and aggregates the data.
I can roughly guess that I can use things like RESTEasy Reactive for the incoming request and maybe the MP HTTP client for the backend requests (not sure if it's reactive), but I am not sure what can replace my thread pool to execute things in parallel, or what the best way is to aggregate the data that returns.
I assume that with a reactive HTTP client I can invoke all the backend sources in a loop, and because it's reactive it will 'feel' like parallel work. And maybe the returned data should be aggregated via the stream API (to join streams of data)? But TBH I am not sure.
I know it's a long question, but some pointers would be great.
thanks!
You can drop the thread pool, you don't need it to invoke your backend services in parallel.
Yes, the MP RestClient is reactive. Let's say you have this service which invokes a backend to get a comic villain:
@RegisterRestClient(configKey = "villain-service")
public interface VillainService {
    @GET
    @Path("/")
    @NonBlocking
    @CircuitBreaker
    Uni<Villain> getVillain();
}
And a similar one for heroes, HeroService. You can inject them in your endpoint class, retrieve a villain and a hero, and then compute the fight:
@Path("/api")
public class Api {
    @RestClient
    VillainService villains;
    @RestClient
    HeroService heroes;
    @Inject
    FightService fights;

    @GET
    public Uni<Fight> fight() {
        Uni<Villain> villain = villains.getVillain();
        Uni<Hero> hero = heroes.getRandomHero();
        return Uni.combine().all().unis(hero, villain).asTuple()
                .chain(tuple -> {
                    Hero h = tuple.getItem1();
                    Villain v = tuple.getItem2();
                    return fights.computeResult(h, v);
                });
    }
}
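For readers coming from the thread-pool-and-latch approach, the same "start both calls, combine when both are done" shape can be sketched with the JDK's CompletableFuture. This is not Mutiny and not what Quarkus does internally; fetchHero, fetchVillain and the string-based result are placeholders standing in for real non-blocking HTTP calls.

```java
import java.util.concurrent.CompletableFuture;

// JDK-only analogue of combining two async results; all names here are hypothetical.
final class FightSketch {

    private FightSketch() {}

    // Stand-ins for non-blocking backend calls. supplyAsync is used only to
    // simulate asynchronous completion; a real client would return a future
    // that completes when the HTTP response arrives.
    static CompletableFuture<String> fetchHero() {
        return CompletableFuture.supplyAsync(() -> "hero");
    }

    static CompletableFuture<String> fetchVillain() {
        return CompletableFuture.supplyAsync(() -> "villain");
    }

    /** Starts both calls, then combines the results once both have completed. */
    static CompletableFuture<String> fight() {
        CompletableFuture<String> hero = fetchHero();       // already running
        CompletableFuture<String> villain = fetchVillain(); // runs concurrently
        return hero.thenCombine(villain, (h, v) -> h + " vs " + v);
    }

    public static void main(String[] args) {
        // join() blocks only here, at the outermost edge of the program.
        System.out.println(fight().join()); // hero vs villain
    }
}
```

No latch is needed: thenCombine plays the role of "wait until both have returned", and the caller receives a future rather than a blocked thread.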

Webflux - hanging requests when using bounded elastic Scheduler

I have a service written with WebFlux that is under high load (40 requests per second),
and I'm encountering really bad latency and performance issues with behaviour I can't explain: at some point during peaks, the service hangs in random locations, as if it doesn't have any threads left to handle the request.
The service does, however, make several calls to different services that aren't reactive, using WebClient, and another call to a main service that retrieves the main data through an SDK wrapped in Mono.fromCallable(..).publishOn(Schedulers.boundedElastic()).
So the flow is:
1. upon request such as Mono<Request>
2. convert to internal object Mono<RequestAggregator>
3. call GCP to get JWT token and then call some service to get data using webclient
4. call the main service using Mono.fromCallable(MainService.getData(RequestAggregator)).publishOn(Schedulers.boundedElastic())
5. call another service to get more data (same as 3)
6. call another service to get more data (same as 3)
7. do some manipulation with all the data and return a Mono<Response>
the webclient calls look something like that:
Mono.fromCallable(() -> GoogleService.getToken(account, clientId)
                .buildIapRequest(REQUEST_URL))
        .map(httpRequest -> httpRequest.getHeaders().getAuthorization())
        .flatMap(authToken -> webClient.post()
                .uri("/call/some/endpoint")
                .header(HttpHeaders.AUTHORIZATION, authToken)
                .header(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
                .header(HttpHeaders.ACCEPT, MediaType.APPLICATION_JSON_VALUE)
                .body(BodyInserters.fromValue(countries))
                .retrieve()
                .onStatus(HttpStatus::isError, clientResponse -> {
                    log.error("{} got status code: {}",
                            ERROR_MSG_ERROR, clientResponse.statusCode());
                    return Mono.error(new SomeWebClientException(STATE_ABBREVIATIONS_ERROR));
                })
                .bodyToMono(SomeData.class));
Sometimes step 6 hangs for more than 11 minutes, even though the service it calls does not have any issues. That service is not reactive, but its responses take ~400 ms.
Another thing worth mentioning is that MainService is a heavy IO operation that might take 1 minute or more.
I feel like a lot of requests hang on MainService and there aren't any threads left for the other operations. Does that make sense? If so, how does one solve something like that?
Can someone suggest any reason for this issue? I'm all out of ideas.
It's not possible to tell for sure without knowing the full application, but indeed the blocking IO operation is the most likely culprit.
Schedulers.boundedElastic(), as its name suggests, is bounded. By default the bound is "ten times the number of available CPU cores", so on a 2-core machine it would be 20 threads. If you have more concurrent tasks than the limit, the rest are put into a queue, where they wait indefinitely for a free thread. If you need more concurrency than that, you should consider setting up your own scheduler with a higher limit using Schedulers.fromExecutor.
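The queuing effect described above can be reproduced without Reactor. The sketch below uses a plain ThreadPoolExecutor with a small thread cap as a stand-in for a bounded scheduler (it is not how boundedElastic is implemented); the class name, method names and numbers are made up for illustration, and the "10 x cores" default is taken from the answer above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: a bounded pool as a stand-in for a bounded scheduler.
final class BoundedPoolDemo {

    private BoundedPoolDemo() {}

    /** The default thread cap described in the text: 10 x available CPU cores. */
    static int defaultThreadCap() {
        return 10 * Runtime.getRuntime().availableProcessors();
    }

    /**
     * Submits the given number of blocking jobs to a pool capped at threadCap
     * threads and returns how many of them had to wait in the queue.
     */
    static int queuedWhileBlocked(int threadCap, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threadCap, threadCap, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await(); // simulates a slow blocking IO call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Every task beyond the thread cap is parked in the queue.
            return pool.getQueue().size();
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(queuedWhileBlocked(2, 5)); // 3
        System.out.println(defaultThreadCap());
    }
}
```

With a one-minute blocking call and 40 requests per second, the same effect would fill a cap of a few dozen threads almost immediately, which matches the symptom of unrelated steps appearing to hang while they wait for a free thread.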

ktor server - when to move to another coroutine context

This may be a question about coroutines in general, but in my ktor server (netty engine, default configuration) application I perform several asynchronous calls to a database and an API endpoint, and I want to make sure I am using coroutines efficiently. My questions are as follows:
Is there a tool or method to work out whether my code is using coroutines effectively, or do I just need to use curl to spam my endpoint and measure the performance of moving processes to another context, e.g. compute?
I don't want to start moving tasks/jobs to another context 'just in case', but should I treat the default coroutine context in my Route.route() like the Android main thread and perform the minimum amount of work on it?
Here is a rough example of the code that I'm using:
fun Route.route() {
    get("/") {
        call.respondText(getRemoteText())
    }
}
suspend fun getRemoteText(): String? {
    return suspendCoroutine { cont ->
        val document = 3rdPartyLibrary.get()
        if (success) {
            cont.resume(data)
        } else {
            cont.resume(null)
        }
    }
}
You could use something like Apache JMeter, but writing a script and spamming your server with curl also seems like a good option to me.
Coroutines are pretty efficient when it comes to context/thread switching, and with Dispatchers.Default and Dispatchers.IO you get a thread pool. There is some documentation on this, but I think you can definitely leverage these Dispatchers for heavy operations.
There are a few tools for testing endpoints. JMeter is good; there are also command-line tools like wrk, wrk2 and siege.
Of course, context switching has a cost. The coroutine in routing is safe for running blocking operations unless you have the option shareWorkGroup set. However, it's usually good to use a separate thread pool, because you can control its size (maximum number of threads) so that you don't bring your database down.

Spring Boot Webflux/Netty - Detect closed connection

I've been working with spring-boot 2.0.0.RC1 using the webflux starter (spring-boot-starter-webflux). I created a simple controller that returns an infinite Flux. I would like the Publisher to do its work only if there is a client (Subscriber). Let's say I have a controller like this one:
@RestController
public class Demo {
    @GetMapping(value = "/")
    public Flux<String> getEvents() {
        return Flux.create((FluxSink<String> sink) -> {
            while (!sink.isCancelled()) {
                // TODO e.g. fetch data from somewhere
                sink.next("DATA");
            }
            sink.complete();
        }).doFinally(signal -> System.out.println("END"));
    }
}
Now, when I try to run that code and access the endpoint http://localhost:8080/ with Chrome, then I can see the data. However, once I close the browser the while-loop continues since no cancel event has been fired. How can I terminate/cancel the streaming as soon as I close the browser?
From this answer I quote that:
Currently with HTTP, the exact backpressure information is not
transmitted over the network, since the HTTP protocol doesn't support
this. This can change if we use a different wire protocol.
I assume that, since backpressure is not supported by the HTTP protocol, it means that no cancel request will be made either.
Investigating a little bit further, by analyzing the network traffic, showed that the browser sends a TCP FIN as soon as I close the browser. Is there a way to configure Netty (or something else) so that a half-closed connection will trigger a cancel event on the publisher, making the while-loop stop?
Or do I have to write my own adapter similar to org.springframework.http.server.reactive.ServletHttpHandlerAdapter where I implement my own Subscriber?
Thanks for any help.
EDIT:
An IOException will be raised on the attempt to write data to the socket if there is no client. As you can see in the stack trace.
But that's not good enough, since it might take a while before the next chunk of data will be ready to send and therefore it takes the same amount of time to detect the gone client. As pointed out in Brian Clozel's answer it is a known issue in Reactor Netty. I tried to use Tomcat instead by adding the dependency to the POM.xml. Like this:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-tomcat</artifactId>
</dependency>
Although it replaces Netty and uses Tomcat instead, it does not seem reactive due to the fact that the browser does not show any data. However, there is no warning/info/exception in the console. Is spring-boot-starter-webflux as of this version (2.0.0.RC1) supposed to work together with Tomcat?
Since this is a known issue (see Brian Clozel's answer), I ended up using one Flux to fetch my real data and having another one in order to implement some sort of ping/heartbeat mechanism. As a result, I merge both together with Flux.merge().
Here you can see a simplified version of my solution:
@RestController
public class Demo {
    public interface Notification{}
    public static class MyData implements Notification{
        …
        public boolean isEmpty(){…}
    }

    @GetMapping(value = "/", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ServerSentEvent<? extends Notification>> getNotificationStream() {
        return Flux.merge(getEventMessageStream(), getHeartbeatStream());
    }

    private Flux<ServerSentEvent<Notification>> getHeartbeatStream() {
        return Flux.interval(Duration.ofSeconds(2))
                .map(i -> ServerSentEvent.<Notification>builder().event("ping").build())
                .doFinally(signalType -> System.out.println("END"));
    }

    private Flux<ServerSentEvent<MyData>> getEventMessageStream() {
        return Flux.interval(Duration.ofSeconds(30))
                .map(i -> {
                    // TODO e.g. fetch data from somewhere,
                    // if there is no data return an empty object
                    return data;
                })
                .filter(data -> !data.isEmpty())
                .map(data -> ServerSentEvent
                        .builder(data)
                        .event("message").build());
    }
}
I wrap everything up as ServerSentEvent<? extends Notification>. Notification is just a marker interface. I use the event field of the ServerSentEvent class to distinguish between data and ping events. Since the heartbeat Flux sends events constantly and at short intervals, the time it takes to detect that the client is gone is at most the length of that interval. Remember, I need that because it might take a while before I get some real data that can be sent and, as a result, it might also take a while before the server detects that the client is gone. This way, it will detect that the client is gone as soon as it can’t send the ping (or possibly the message event).
One last note on the marker interface, which I called Notification. It is not really necessary, but it gives some type safety. Without it, we could write Flux<ServerSentEvent<?>> instead of Flux<ServerSentEvent<? extends Notification>> as the return type of the getNotificationStream() method. It would also be possible to make getHeartbeatStream() return Flux<ServerSentEvent<MyData>>. However, the wildcard version would allow any object to be sent, which I don’t want. That is why I added the interface.
I'm not sure why this behaves like this, but I suspect it is because of the choice of generation operator. I think using the following would work:
return Flux.interval(Duration.ofMillis(500))
        .map(input -> {
            return "DATA";
        });
According to Reactor's reference documentation, you're probably hitting the key difference between generate and push (I believe a quite similar approach using generate would probably work as well).
My comment was referring to the backpressure information (how many elements a Subscriber is willing to accept), but the success/error information is communicated over the network.
Depending on your choice of web server (Reactor Netty, Tomcat, Jetty, etc), closing the client connection might result in:
a cancel signal being received on the server side (I think this is supported by Netty)
an error signal being received by the server when it's trying to write on a connection that's been closed (I believe the Servlet spec does not provide that callback and we're missing the cancel information).
In short: you don't need to do anything special, it should be supported already, but your Flux implementation might be the actual problem here.
Update: this is a known issue in Reactor Netty

Resources