Running a Mono in the background while returning a response when using Spring WebFlux - spring-boot

This question is related to Return immediately in spring web flux, but I don't think it's the same (at least the answer there is not satisfactory for me).
I have a function returning a Mono that when invoked starts a long-running job. This function is invoked when a call is made to a Spring Webflux HTTP API. Here's an example:
@PutMapping("/{jobId}")
fun startNewJob(@PathVariable("jobId") jobId: String,
                request: ServerHttpRequest): Mono<ResponseEntity<Unit>> {
    val longRunningJob: Mono<Job> = startNewJob(jobId)
    return longRunningJob.map { job ->
        val jobUri = generateJobUri(request, job.id)
        ResponseEntity.created(jobUri).build<Unit>()
    }
}
The problem with the code above is that the "201 Created" response is only returned after the long-running job has completed. I want to kick off the longRunningJob in the background and return "201 Created" immediately.
I could perhaps do something like this:
@PutMapping("/{jobId}")
fun startNewJob(@PathVariable("jobId") jobId: String,
                request: ServerHttpRequest): Mono<ResponseEntity<Unit>> {
    startNewJob(jobId)
        .subscribeOn(Schedulers.newSingle("thread"))
        .subscribe()
    val jobUri = generateJobUri(request, jobId)
    val response = ResponseEntity.created(jobUri).build<Unit>()
    return Mono.just(response)
}
But it doesn't seem very idiomatic to me to have to call subscribe() manually (e.g. IntelliJ complains that I'm calling subscribe() in a non-blocking scope). Isn't there a better way to compose the two "streams" without using an explicit subscribe? If so, how do I modify the startNewJob function above to achieve this?

AFAIK, using one of the subscribe methods is the only way to really start a job in the background with its own lifecycle (not tied to the returned publisher).
If you were to use one of the operators to combine the job publisher and the response publisher (e.g. zip or merge), then the lifecycle of the job publisher would be tied to the response publisher, which is not what you want for a background job.
One thing you might want to consider is kicking off the background job within the response publisher stream, rather than directly in the method body, e.g. via doOnSubscribe or from an operator upstream of the response.
This would tie the start of the background job to the onSubscribe events of the response publisher, but still allow it to complete in the background.
Also note that if you want to be able to cancel the background job (e.g. during application shutdown), you'll need to save the Disposable returned from subscribe so you can later call dispose on it. This might be better done from some type of BackgroundJobManager that keeps track of all the running jobs.
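For illustration, a minimal Java sketch of that approach, reusing generateJobUri from the question and assuming a hypothetical jobService and backgroundJobManager for starting and tracking jobs:
@PutMapping("/{jobId}")
public Mono<ResponseEntity<Void>> startNewJob(@PathVariable("jobId") String jobId,
                                              ServerHttpRequest request) {
    // Build the 201 response eagerly; the job is only kicked off when the
    // response publisher is subscribed (i.e. when WebFlux handles the request).
    return Mono.just(ResponseEntity.created(generateJobUri(request, jobId)).<Void>build())
            .doOnSubscribe(subscription -> {
                // jobService and backgroundJobManager are hypothetical components;
                // jobService.startNewJob(jobId) stands for the Mono<Job> from the question.
                Disposable job = jobService.startNewJob(jobId)
                        .subscribeOn(Schedulers.boundedElastic())
                        .subscribe();
                backgroundJobManager.register(jobId, job); // keep it so we can dispose() later
            });
}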

// Alternative: schedule the work directly on a dedicated Scheduler instead of subscribing to a Mono.
private static final Scheduler backgroundTaskScheduler = Schedulers.newParallel("backgroundTaskScheduler", 2);
backgroundTaskScheduler.schedule(() -> doBackgroundJob());

Related

FireAndForget call to WebApi from Azure Function

I want to be able to call an HTTP endpoint (that I own) from an Azure Function at the end of the Azure Function request.
I do not need to know the result of the request
If there is a problem in the HTTP endpoint that is called I will log it there
I do not want to hold up the return to the client calling the initial Azure Function
Offloading the call of the secondary WebApi onto a background job queue is considered overkill for this requirement
Do I simply call HttpClient.PutAsync without an await?
I realise that the dependencies I have used up until the point that the call is made may well not be available when the call returns. Is there a safe way to check if they are?
My answer may cause some controversy, but you can always start a background task and execute it that way.
For anyone reading this answer, this is far from recommended. The OP has been very clear that they don't care about exceptions or understanding what sort of result the request is returning ...
Task.Run(async () =>
{
    using (var httpClient = new HttpClient())
    {
        await httpClient.PutAsync(...);
    }
});
If you want to ensure that the call has fired, it may be worth waiting for a second or two after the call is made to ensure it's actually on its way.
await Task.Delay(1000);
If you're worried about dependencies in the call, be sure to construct your payload (i.e. serialise it, etc.) outside the Task.Run; basically, minimise any work the background task does.

Project Reactor - react to a timeout that happened downstream

Project Reactor has a variety of timeout() operators.
The most basic variant raises a TimeoutException when no item arrives within the given Duration. The exception is propagated downstream, and a cancel signal is sent upstream.
Basically my question is: is it possible to somehow react (and do something) specifically to a timeout that happened downstream, not just to the cancellation that is sent after the timeout?
My question is based on the requirements of my real business case, and I'm also wondering whether there is a straightforward solution.
I'll simplify my code for better understanding what I want to achieve.
Let's say I have the following reactive pipeline:
Flux.fromIterable(List.of(firstClient, secondClient))
    .concatMap(Client::callApi) // making API calls sequentially
    .collectList() // collecting results of API calls for further processing
    .timeout(Duration.ofMillis(3000)) // the entire process should not take more than duration specified
    .subscribe();
I have multiple clients for making API calls. The business requirement is to call them sequentially, so I call them with concatMap(). Then I should collect all the results, and the entire process should not take more than the specified Duration.
The Client interface:
interface Client {
    Mono<Result> callApi();
}
And the implementations:
Client firstClient = () ->
        Mono.delay(Duration.ofMillis(2000L)) // simulating delay of first api call
                .map(__ -> new Result())
                // !!! Pseudo-operator just to demonstrate what I want to achieve
                .doOnTimeoutDownstream(() ->
                        log.info("First API call canceled due to downstream timeout!")
                );

Client secondClient = () ->
        Mono.delay(Duration.ofMillis(1500L)) // simulating delay of second api call
                .map(__ -> new Result())
                // !!! Pseudo-operator just to demonstrate what I want to achieve
                .doOnTimeoutDownstream(() ->
                        log.info("Second API call canceled due to downstream timeout!")
                );
So, if I have not received and collected all the results within the specified time, I need to know which API call was actually cancelled due to the downstream timeout and have some callback for this "event".
I know I could put a doOnCancel() callback on every client call (instead of the pseudo-operator I demonstrated) and it would work, but this callback reacts to cancellation, which may happen due to any error.
Of course, with proper exception handling (onErrorResume(), for example) it would work as I expect; however, I'm interested in whether there is a more direct way to react specifically to the timeout in this case.
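As far as I know there is no dedicated operator like the pseudo doOnTimeoutDownstream above; a rough sketch of the doOnCancel() workaround described in the previous paragraphs would look like this:
Client firstClient = () ->
        Mono.delay(Duration.ofMillis(2000L))
                .map(tick -> new Result())
                // Reacts to any cancellation, including the one triggered by the
                // downstream timeout(); the cancel signal itself carries no cause.
                .doOnCancel(() -> log.info("First API call cancelled (possibly by downstream timeout)"));

Flux.fromIterable(List.of(firstClient, secondClient))
        .concatMap(Client::callApi)
        .collectList()
        .timeout(Duration.ofMillis(3000))
        // The timeout surfaces here as a java.util.concurrent.TimeoutException.
        .doOnError(TimeoutException.class, e -> log.info("Collecting the results timed out"))
        .subscribe();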

How to use Spring WebClient to make multiple calls sequentially?

I read the topic
How to use Spring WebClient to make multiple calls simultaneously?, but my case is a bit different. I'm calling 2 different external services using WebClient, let's say from method Mono<Void> A(), followed by Mono<Void> B(). My goal is to extract data from A(), then pass it to B(). Is there a correct way to avoid:
an asynchronous call (which leads to an IllegalArgumentException, since B requests its args BEFORE A() completes);
a blocking call, because the system is reactive.
Is there a standard way to achieve this?
First scenario:
Mono<?> a = getFromAByWebClient();
and you want to send this data to service B via a POST or PUT request.
Here, since a Mono wraps a single object and you want to send it through the API in a POST or PUT, you must have that data available; you have to wait until the data has arrived from the first service, otherwise you would hit the API with blank data or get an exception.
Second scenario:
Since B depends on A, why not call service A inside service B and get the data there?
Since in Spring reactive everything is a stream, you can operate on one piece of data while others are still on their way, but the operation being performed must have its data available.
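For reference, the non-blocking way to express "extract data from A, then pass it to B" is to chain the two calls with flatMap; a minimal sketch (webClient, ResultA and the URIs are placeholders):
public Mono<Void> callAThenB() {
    // webClient, ResultA and the URIs below are placeholders for this sketch.
    return webClient.get()
            .uri("/serviceA")
            .retrieve()
            .bodyToMono(ResultA.class)               // extract the data from A
            .flatMap(resultA -> webClient.put()      // B is only called after A has emitted
                    .uri("/serviceB")
                    .bodyValue(resultA)
                    .retrieve()
                    .toBodilessEntity())
            .then();                                 // expose it as Mono<Void>, never blocking
}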
Well, I was told how to refactor the code. The problem was fixed, and for the record, here is the solution:
the original code returned
Mono.fromRunnable(() -> apply(param));
and the 'apply' method subscribed to the call of the external resource itself:
void apply(Param param) {
    service.callRemote(x).subscribe();
    // <--- some business logic --->
}
So it seems that when beanA.process() is followed by beanB.process(), the reactive pipeline falls apart, and the lambda from fromRunnable() branches off onto a separate thread.
What was changed:
beanA and beanB now return the following from their methods:
Mono.just(param).flatMap(p -> service.callRemote(x)).then();
apply() has been removed; the remote call is wrapped in flatMap() and integrated into the pipeline. Now it works as expected, calling the remote resource sequentially.
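For context, the composition that now behaves correctly would look roughly like this (a sketch; beanA and beanB are the names used above, and their exact signatures are assumed):
// The second bean starts only after the first Mono has completed, which holds
// because the remote call is now part of the Mono returned by each bean.
beanA.process()
        .then(beanB.process())
        .subscribe();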

Spring WebFlux: how to return a 200 response to the client before processing a large file

I am working on a Spring WebFlux project.
I want to send a success message to the client as soon as the API call is made, and perform the large file operation in the background,
so the client does not have to wait until the entire file is processed.
To try this out, I made the sample code below.
REST controller
@GetMapping(value = "/{jobId}/process")
@ApiOperation("Start import job")
public Mono<Integer> process(@PathVariable("jobId") Integer jobId) {
    return service.process(jobId);
}
File processing Service
public Mono<Integer> process(Integer jobId) {
    return repository
        .findById(jobId)
        .map(job -> {
            File file = new File("read.csv");
            return processFile(file);
        });
}
Following is my stack
Spring Webflux 2.2.2.RELEASE
I tried to make this call using WebClient, but I do not get a response until the entire file has been processed.
As one of the options, you can run the processing in a different thread.
For example:
Create an event listener
Enable @Async with @EnableAsync (see the sketch after this list)
Or use different types of Executors from the Java concurrency package
Or manually run a thread
Also, for Kotlin you can use coroutines
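For example, a rough sketch of the @Async event-listener option (JobAcceptedEvent, JobEventListener and processLargeFile are placeholder names):
@Configuration
@EnableAsync
class AsyncConfig { }

// Simple event object (placeholder) published once the job has been accepted.
class JobAcceptedEvent {
    final long jobId;
    JobAcceptedEvent(long jobId) { this.jobId = jobId; }
}

@Component
class JobEventListener {

    @Async
    @EventListener
    public void onJobAccepted(JobAcceptedEvent event) {
        // Runs on the async executor, after the HTTP response has already been sent.
        processLargeFile(event.jobId);
    }
}
The controller would publish the event with ApplicationEventPublisher.publishEvent(new JobAcceptedEvent(jobId)) and return its response right away.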
You can use the subscribe method and start a job with its own scope in the background.
Mono.delay(Duration.ofSeconds(10)).subscribeOn(Schedulers.newElastic("myBackgroundTask")).subscribe(System.out::println);
As long as you do not tie this to your response publisher using one of the zip/merge or similar operators, your job will run in the background on its own scheduler pool.
The subscribe() method returns a Disposable instance which can later be used to cancel the background job by calling its dispose() method.
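Applied to the controller from the question, that could look roughly like this (a sketch reusing the question's service; the returned jobId is just an acknowledgement):
@GetMapping(value = "/{jobId}/process")
public Mono<Integer> process(@PathVariable("jobId") Integer jobId) {
    // Kick off the file processing on its own scheduler, detached from the response.
    service.process(jobId)
            .subscribeOn(Schedulers.boundedElastic())
            .subscribe();
    // Respond immediately instead of waiting for processFile() to finish.
    return Mono.just(jobId);
}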

Running Plone subscriber events asynchronously

Using Plone 4, I have successfully created a subscriber event to do extra processing when a custom content type is saved. I accomplished this by using the Products.Archetypes.interfaces.IObjectInitializedEvent interface.
configure.zcml
<subscriber
for="mycustom.product.interfaces.IRepositoryItem
Products.Archetypes.interfaces.IObjectInitializedEvent"
handler=".subscribers.notifyCreatedRepositoryItem"
/>
subscribers.py
def notifyCreatedRepositoryItem(repositoryitem, event):
    """
    This gets called on IObjectInitializedEvent - which occurs when a new object is created.
    """
    # my custom processing goes here - should be asynchronous
However, the extra processing can sometimes take too long, and I was wondering if there is a way to run it in the background i.e. asynchronously.
Is it possible to run subscriber events asynchronously for example when one is saving an object?
Not out of the box. You'd need to add async support to your environment.
Take a look at plone.app.async; you'll need a ZEO environment and at least one extra instance. The latter will run async jobs you push into the queue from your site.
You can then define methods to be executed asynchronously and push tasks into the queue to execute such a method asynchronously.
Example code, push a task into the queue:
from plone.app.async.interfaces import IAsyncService
async = getUtility(IAsyncService)
async.queueJob(an_async_task, someobject, arg1_value, arg2_value)
and the task itself:
def an_async_task(someobject, arg1, arg2):
# do something with someobject
where someobject is a persistent object in your ZODB. The IAsyncService.queueJob takes at least a function and a context object, but you can add as many further arguments as you need to execute your task. The arguments must be pickleable.
The task will then be executed by an async worker instance when it can, outside of the context of the current request.
Just to give more options, you could try collective.taskqueue for that; it is really simple and really powerful (and avoids some of the drawbacks of plone.app.async).
The description on PyPI already has enough to get you up to speed in no time, and you can use Redis for queue management, which is a big plus.
