Chaining multiple publishers transforming the responses - spring

I want to chain one mono after each flux event. The mono publisher will need information from each event published by the flux. The response should be a flux with data of the flux event and the mono response.
After some digging, I ended up with a map inside a flatMap. The code looks like this:
override fun searchPets(petSearch: PetSearch): Flux<Pet> {
    return petRepository
        .searchPets(petSearch) // returns Flux<Pet>
        .flatMap { pet ->
            petService
                .getCollarForMyPet() // returns Mono<Collar>
                .map { collar -> PetConverter.addCollarToPet(pet, collar) } // returns the pet, now with its collar
        }
}
My main concerns are:
Is using a map inside a flatMap a code smell?
Will the contents of the pet variable suffer race conditions when multiple flux events, and the corresponding mono events, arrive?
Is there a better way to approach this kind of behaviour?

This approach is perfectly fine.
The Reactive Streams specification mandates that onNext events don't overlap, so there won't be an issue with race conditions.
flatMap does introduce concurrency, though, so multiple calls to the PetService can be in flight at once. This shouldn't be an issue unless searchPets emits the same Pet instance twice.
Note that, because of that concurrency, flatMap can reorder pets in this scenario. Imagine the search returns petA then petB, but the petService call for petA takes longer. In the output of the flatMap, petB would be emitted first (with its collar set), then petA.
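To make the reordering concrete, here is a minimal sketch, assuming Reactor (reactor-core) is on the classpath. The collarFor function and the string "pets" are hypothetical stand-ins for petService and the Pet type; the delays are invented so that petA's collar arrives last. It also shows concatMap, which keeps source order by running the inner calls one at a time.

```kotlin
import reactor.core.publisher.Flux
import reactor.core.publisher.Mono
import java.time.Duration

// Hypothetical stand-in for petService.getCollarForMyPet(): the collar for
// petA is deliberately slower to arrive than the one for petB.
fun collarFor(pet: String): Mono<String> {
    val delayMs = if (pet == "petA") 200L else 20L
    return Mono.just("collar").delayElement(Duration.ofMillis(delayMs))
}

// flatMap subscribes to the inner Monos eagerly and emits results as they arrive.
fun withFlatMap(): List<String> =
    Flux.just("petA", "petB")
        .flatMap { pet -> collarFor(pet).map { collar -> "$pet+$collar" } }
        .collectList()
        .block()!!

// concatMap waits for each inner Mono to finish before starting the next one,
// so source order is preserved at the cost of running the calls one by one.
fun withConcatMap(): List<String> =
    Flux.just("petA", "petB")
        .concatMap { pet -> collarFor(pet).map { collar -> "$pet+$collar" } }
        .collectList()
        .block()!!

fun main() {
    println(withFlatMap())   // petB is expected first
    println(withConcatMap()) // petA is expected first
}
```

If order matters but you still want the calls to overlap, flatMapSequential is worth a look: it subscribes eagerly like flatMap but emits in source order.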

Related

Using yield in nested object in Kotlin sequence

I want to stream result objects captured by a Spring JDBC RowCallbackHandler via a Kotlin Sequence.
The code looks basically like this:
fun findManyObjects(): Sequence<Thing> = sequence {
    val rowHandler = object : RowCallbackHandler {
        override fun processRow(resultSet: ResultSet) {
            val thing = // create from resultSet
            yield(thing) // ERROR! No coroutine scope
        }
    }
    jdbcTemplate.query("select * from ...", rowHandler)
}
But I get the compilation error:
Suspension functions can be called only within coroutine body.
However, exactly this "coroutine body" should exist, because the whole block is wrapped in a sequence builder. But it doesn't seem to work with a nested object.
Minimal example to show that it doesn't compile with a nested object:
// compiles
sequence {
    yield(1)
}
// doesn't compile
sequence {
    object {
        fun doit() {
            yield(1) // Suspension functions can be called only within coroutine body.
        }
    }
}
How can I pass an object from the ResultSet into the Sequence?
Use Flow for asynchronous data streams
The reason you can't call yield inside your RowCallbackHandler object is twofold.
The processRow function isn't a suspending function (and can't be, because it's declared in and called by Java). A suspending function like yield can only be called by another suspending function.
A sequence always ends when the sequence { ... } builder returns. Even if you and I know that the query method will invoke the RowCallbackHandler before returning from the sequence, the Kotlin compiler has no way of knowing that. Yielding sequence values from functions and objects other than the body of the sequence itself is never allowed, because there's no way of knowing where or when they will run.
To solve this problem, we need to introduce a different kind of coroutine: one that can suspend itself while it waits for the RowCallbackHandler to be invoked.
Unfortunately, because we're talking about JDBC here, there may not be much to gain by introducing full-blown coroutines. Under the hood, calls to the database will always be made in a blocking way, removing a lot of the benefit. It might well be simpler not to try and 'stream' results, and just iterate over them in a boring, old-fashioned way. But let's explore the possibilities all the same.
The problem with sequences
Sequences are designed for on-demand computation, and are not asynchronous. They can't wait for other asynchronous operations, such as callbacks. The sequence builder's yield function simply suspends while waiting for the caller to retrieve the next item, and it's the only suspending function a sequence is ever allowed to call. You can demonstrate this if you try to use a simple suspending call like delay inside a sequence. You'll get a compile error letting you know that you're operating in a restricted coroutine scope.
sequence<String> { delay(1000) } // doesn't compile
Without the ability to call suspending functions, there's no way to wait for a callback to be invoked. Recognising this limitation, Kotlin provides an alternative mechanism for streams of on-demand values that do provide data in an asynchronous way. It's called a Flow.
Callback flows
The mechanism for using Flows to provide values from a callback interface is described very nicely by Roman Elizarov in his Medium article Callbacks and Kotlin Flows.
If you did want to use a callback flow, you'd simply replace sequence with callbackFlow, and replace yield with sendBlocking (deprecated since kotlinx.coroutines 1.5 in favour of trySendBlocking).
Your code might look something like this:
fun findManyObjects(): Flow<Thing> = callbackFlow {
    val rowHandler = object : RowCallbackHandler {
        override fun processRow(resultSet: ResultSet) {
            val thing = // create from resultSet
            sendBlocking(thing)
        }
    }
    jdbcTemplate.query("select * from ...", rowHandler)
    close() // the query is finished, so there are no more rows
}
A simpler flow
While that's the idiomatic way to stream values provided by a callback, it might not be the simplest approach to this problem. By avoiding callbacks altogether, you can use the much more common flow builder, passing each value to its emit function. But now that you have asynchrony in the form of coroutines, you can't just return a flow and then allow Spring to immediately close the result set. You need to be able to delay the closing of the result set until the flow has actually been consumed. That means peeling back the abstractions provided by RowCallbackHandler or ResultSetExtractor, which expect to process all the results in a blocking way, and instead providing your own implementation.
fun Connection.findManyObjects(): Flow<Thing> = flow {
    prepareStatement("select * from ...").use { statement ->
        statement.executeQuery().use { resultSet ->
            while (resultSet.next()) {
                val thing = // create from resultSet
                emit(thing)
            }
        }
    }
}
Note the use blocks, which will deal with closing the statement and result set. Because we don't reach the end of the use blocks until the while loop has completed and all the values have been emitted, the flow is free to suspend while the result set remains open.
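The timing of the close can be seen in a small sketch. To keep it runnable with the standard library alone, it uses sequence and yield as a stand-in for flow and emit, and a hypothetical FakeResultSet instead of a real JDBC ResultSet; the log list is only there to record the order of events.

```kotlin
import java.io.Closeable

// Hypothetical stand-in for a JDBC ResultSet: a cursor over in-memory rows
// that records when it is closed.
class FakeResultSet(
    private val rows: List<String>,
    private val log: MutableList<String>
) : Closeable {
    private var index = -1
    fun next(): Boolean = ++index < rows.size
    fun current(): String = rows[index]
    override fun close() { log += "closed" }
}

fun streamRows(rows: List<String>, log: MutableList<String>): Sequence<String> = sequence {
    FakeResultSet(rows, log).use { resultSet ->
        while (resultSet.next()) {
            yield(resultSet.current()) // suspends here until the consumer asks for more
        }
    }
}

// Consumes every row and returns the recorded order of events.
fun consumeAll(): List<String> {
    val log = mutableListOf<String>()
    for (row in streamRows(listOf("a", "b"), log)) {
        log += "got $row"
    }
    return log
}

fun main() {
    println(consumeAll()) // [got a, got b, closed]
}
```

The "closed" entry comes last because the use block is only left once the loop has run out of rows. One caveat of this sketch: if the consumer stops iterating early, the sequence body is never resumed, so the use block never gets a chance to close the resource.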
So why use a flow at all?
You might notice that if you do it this way, you can actually replace flow and emit with sequence and yield. So have we come full circle? Well, sort of. The difference is that a flow can only be consumed from a coroutine, whereas with sequence, you can iterate over the resulting values without suspending at all. In this particular case, it's a hard call to make, because JDBC operations are always blocking.
If you use a sequence, the calling thread will block as it waits to receive the data. Values in a sequence are always computed by the thing consuming the sequence, so if the sequence invokes a blocking function, the consumer's thread will block waiting for the value. In a non-coroutine application, that might be okay, but if you're using coroutines, you really want to avoid hiding blocking calls inside innocuous-looking sequences.
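A quick way to see this for yourself is the following sketch, which only needs the standard library. The slowRows function is hypothetical, and Thread.sleep stands in for a blocking JDBC call; each row records the name of the thread that produced it.

```kotlin
// Hypothetical blocking source: every value is produced only when the
// consumer pulls it, on whichever thread the consumer is running.
fun slowRows(): Sequence<String> = sequence {
    repeat(3) { i ->
        Thread.sleep(50) // stand-in for a blocking JDBC call
        yield("row $i on ${Thread.currentThread().name}")
    }
}

fun main() {
    val caller = Thread.currentThread().name
    val rows = slowRows().toList() // the caller blocks here while rows are produced
    println(rows.all { it.endsWith(caller) }) // true: the body ran on the caller's thread
}
```

With a flow, the equivalent producer could be moved off the caller's thread, for example with flowOn(Dispatchers.IO), which is the isolation the next paragraph is about.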
If you use a flow, you can at least isolate the blocking calls by having the flow run on a particular dispatcher. For example, you could use the built-in IO dispatcher to perform the JDBC call, then switch back to the default dispatcher for any further processing. If you definitely want to stream values, I think this is a better approach than using a sequence.
With all this in mind, you'll need to be careful with your use of coroutines and dispatchers if you do choose one of these solutions. If you'd rather not worry about that, there's nothing wrong with using a regular ResultSetExtractor and forgetting about both sequences and flows for now.

Are hot non completing database observables a Rx usecase? Side-effect writing issue

I have more of an opinion question: whether this, which many people do, should be an Rx use case.
Apps usually have an SQL database that the UI queries as an observable, which emits once the query has loaded and again any time the data changes (Room, SqlDelight, etc.).
Reads sound okay; however, is it possible to have "pure" writes to the database?
Writing to the database might look like this
fun sync() = Completable.fromCallable {
    // do something
    database.writeSomethingSynchronously()
}
SomeUi {
    init {
        database.someQueryObservable()
            .subscribe { show list }
    }
}
Imagine you want to display progressbar while this Completable is in flight.
What is effectively happening here is side-effecting the database. This means the open database observable will re-emit when the data is written, but still before sync() returns (assuming a single thread for simplicity).
Now there is a point in time where the new data is in the UI while the progress bar is still shown (and it gets worse with multithreaded timings). This is an invalid state.
In the imperative world, sync would provide a completion callback, in which one would reload the query manually and show/hide the progress bar synchronously. (And somehow block the database change listener for the duration of the sync writes?)
Is there a way around this at all?

Running a Mono in background while returning a response when using Spring Webflux

This question is related to Return immediately in spring web flux, but I don't think it's the same (at least the answer there is not satisfactory for me).
I have a function returning a Mono that when invoked starts a long-running job. This function is invoked when a call is made to a Spring Webflux HTTP API. Here's an example:
@PutMapping("/{jobId}")
fun startNewJob(@PathVariable("jobId") jobId: String,
                request: ServerHttpRequest): Mono<ResponseEntity<Unit>> {
    val longRunningJob: Mono<Job> = startNewJob(jobId)
    return longRunningJob.map { job ->
        val jobUri = generateJobUri(request, job.id)
        ResponseEntity.created(jobUri).build<Unit>()
    }
}
The problem with the code above is that "201 Created" is returned only after the long-running job has completed. I want to kick off the longRunningJob in the background and return "201 Created" immediately.
I could perhaps do something like this:
@PutMapping("/{jobId}")
fun startNewJob(@PathVariable("jobId") jobId: String,
                request: ServerHttpRequest): Mono<ResponseEntity<Unit>> {
    startNewJob(jobId)
        .subscribeOn(Schedulers.newSingle("thread"))
        .subscribe()
    val jobUri = generateJobUri(request, jobId)
    val response = ResponseEntity.created(jobUri).build<Unit>()
    return Mono.just(response)
}
But it doesn't seem very idiomatic to me to have to call subscribe() manually (e.g. IntelliJ complains that I call subscribe() in a non-blocking scope). Isn't there a better way to compose the two "streams" without using an explicit subscribe? If so, how do I modify the startNewJob function above to achieve this?
AFAIK, using one of the subscribe methods is the only way to really start a job in the background with its own lifecycle (not tied to the returned publisher).
If you were to use one of the operators to combine the job publisher and the response publisher (e.g. zip or merge), then the lifecycle of the job publisher would be tied to the response publisher, which is not what you want for a background job.
One thing you might want to consider is kicking off the background job within the response publisher stream, rather than directly in the method body, e.g. via doOnSubscribe or from an operator upstream of the response.
This would tie the start of the background job to the onSubscribe events of the response publisher, but still allow it to complete in the background.
Also note, that if you want to be able to cancel the background job (e.g. maybe during application shutdown), you'll need to save the Disposable returned from subscribe so you can later call dispose on it. This might be better done from some type of BackgroundJobManager that could keep track of all the jobs running.
private static final Scheduler backgroundTaskScheduler = Schedulers.newParallel("backgroundTaskScheduler", 2);
backgroundTaskScheduler.schedule(() -> doBackgroundJob());
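As a rough illustration of the job-tracking idea, here is a minimal sketch assuming Reactor (reactor-core) on the classpath. The BackgroundJobManager name comes from the suggestion above, but its API (start, cancelAll) is invented for this example, as is the choice of the boundedElastic scheduler.

```kotlin
import reactor.core.Disposable
import reactor.core.Disposables
import reactor.core.publisher.Mono
import reactor.core.scheduler.Schedulers
import java.time.Duration

// Hypothetical manager: keeps the Disposable of every job it starts so the
// jobs can be cancelled together later.
class BackgroundJobManager {
    private val running: Disposable.Composite = Disposables.composite()

    // Subscribes on a worker thread so the caller is not blocked, and keeps
    // the Disposable so the job can be cancelled later.
    fun <T : Any> start(job: Mono<T>): Disposable {
        val handle = job
            .subscribeOn(Schedulers.boundedElastic())
            .subscribe()
        running.add(handle)
        return handle
    }

    // Cancels every job that was started through this manager, e.g. during shutdown.
    fun cancelAll() = running.dispose()
}

// Starts a long-running dummy job, cancels everything, and reports whether
// the job's handle ended up disposed.
fun startThenCancel(): Boolean {
    val manager = BackgroundJobManager()
    val handle = manager.start(Mono.delay(Duration.ofSeconds(60)))
    manager.cancelAll()
    return handle.isDisposed
}

fun main() {
    println(startThenCancel())
}
```

This sketch never removes finished jobs from the composite and does not pass an error handler to subscribe, so a real version would probably want both.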

Angular/RxJS Should I unsubscribe on every ajax call?

Should I unsubscribe on every ajax call? According to the RxJS contract, I should. But AJAX calls are not streams or events; once they are done, they are done. What is the reason for using RxJS at all in this particular case? Over time it becomes a mess (I know about takeUntil, that's not the point here).
public remove(data: IData): void {
    // unsubscribe from the previous possible call
    if (this.dataSubscription &&
        this.dataSubscription.unsubscribe) {
        this.dataSubscription.unsubscribe();
    }
    this.dataSubscription = this.dataService
        .delete(data.id)
        .subscribe(() => {
            this.refresh();
        });
}
public ngOnDestroy(): void {
    // unsubscribe on deletion
    if (this.dataSubscription &&
        this.dataSubscription.unsubscribe) {
        this.dataSubscription.unsubscribe();
    }
}
What is the advantage over a simple promise, which looks cleaner and is destroyed right after execution?
public remove(data: IData): void {
    this.dataService
        .delete(data.id)
        .then(() => {
            this.refresh();
        });
}
This is DataService code
@Injectable()
export class DataService {
    constructor(private _httpClient: HttpClient) { }
    public delete(id: number): Observable<IModel> {
        return this._httpClient.delete<IModel>(`${this._entityApiUrl}/${id}`);
    }
}
Finite, cold Observables usually don't need to be unsubscribed. They work just like Promises in this regard. Assuming you're using Angular's HttpClient in your service, no unsubscription is necessary--it's much like a Promise in that situation.
First off, to clear some things up -- in your Promise example, you are imperatively managing the Promise by assigning it to this.dataSubscription. After that call is made, anything that calls this.dataSubscription.then() an arbitrary amount of time after the HTTP call will receive a Promise.resolve() and invoke that .then() function. The new Promise returned by Promise.resolve() will be cleaned up after it executes, but it's not until your class is destroyed that your this.dataSubscription Promise will be cleaned up.
However, not assigning that Promise as a property is even cleaner:
public remove(data: IData): void {
    this.dataService
        .delete(data.id)
        .then(() => {
            this.refresh();
        });
}
Plus, the Promise will be cleaned up at the end of its scope, not on the destruction of the class.
Observables, at least finite 'Promise-like' ones like this, work in much the same way. You don't need to manage the Subscription returned by the .subscribe() method imperatively, as it will execute and then be cleaned up, since it's not assigned as a property:
public remove(data: IData): void {
    this.dataService
        .delete(data.id)
        .subscribe(() => {
            this.refresh();
        });
}
It's a finite Observable and completes after the subscription, so calling subscribe again will return a new Subscription and re-call the Observable's function.
Imperatively managing those subscriptions is indeed messy and usually a sign things could be done better.
The difference with RxJS's subscription management is that RxJS can become an incredibly powerful tool, one that is useful for way, way more than managing async AJAX calls. You can have hot Observables that publish data to hundreds of subscribers, Subjects that manage their own stream to many subscribers, infinite Observables that never stop emitting, higher-order Observables that manage state and return other Observables, etc. In those cases unsubscribing is best practice, but skipping it is honestly not going to cause performance issues outside of extreme cases.
A good comparison is the Observable.fromEvent() method. Just like it's best practice to use removeEventListener correctly after addEventListener, you should unsubscribe from this Observable correctly. However, just like removeEventListener, it's not really done all the time and usually doesn't cause issues on today's platforms.
Also, in reference to the 'RxJS contract' that was stated: here's an excerpt from the same doc:
When an Observable issues an OnError or OnComplete notification to its observers, this ends the subscription. Observers do not need to issue an Unsubscribe notification to end subscriptions that are ended by the Observable in this way.
Finite Observables complete themselves after their emissions and don't need to be unsubscribed.
Usually you don't need to unsubscribe from HttpClient calls, since all HttpClient calls complete the stream once they receive a response from the server. Once an observable stream completes or errors, it's the responsibility of the producer to release resources. For more information, read Insider’s guide into interceptors and HttpClient mechanics in Angular. You should unsubscribe only if you want to cancel the request.
Because AJAX calls are not streams or events, once they are done they are done... What is the advantage over simple promise, that looks cleaner and destroyed right after execution?
AJAX calls are not just a one-time event. For example, you can have multiple progress events with XHR. A Promise resolves with only one value, while HttpClient can emit multiple HTTP events:
export type HttpEvent<T> =
    HttpSentEvent | HttpHeaderResponse | HttpResponse<T> | HttpProgressEvent | HttpUserEvent<T>
You don't need to unsubscribe on every ajax call. But then you lose one of the core benefits of Observables: being able to cancel them.
You really need to think about what your code does and what your standard workflow is. What happens if the delete response takes a long time and the user clicks it again, or clicks back, or goes to some other page?
Would you like refresh to still happen (since the observable will still keep the callback in memory), or would you rather cancel it?
In the end, it's up to you and your application. By using unsubscribe you save yourself from unplanned side effects.
In your case, it's just a refresh, so it's not a big deal. Then again, you will keep the callback in memory and it might cause some side effects.

Spring Integration Usage and Approach Validation

I am testing out using Spring Integration to tie together disparate modules (within the same Spring Boot application, for now) and services into a unified flow starting from a single entry point.
I am looking for the following clarifications with Spring Integration if possible:
Is the below code the right way to structure flows using the DSL?
In "C" below, can I bubble up the result to the "B" flow?
Is using the DSL vs. the XML the better approach?
I am confused as to how to correctly "terminate" a flow?
Flow Overview
In the code below, I am just publishing a page to a destination. The overall flow goes like this.
Publisher flow listens for the payload and splits it into parts.
Content flow filters out pages and splits them into parts.
AWS flow subscribes and handles the part.
File flow subscribes and handles the part.
Eventually, there may be additional and very different types of consumers to the Publisher flow which are not content which is why I split the publisher from the content.
A) Publish Flow (publisher.jar):
This is my "main" flow, initiated through a gateway. The intent is that this serves as the entry point that triggers all publishing flows.
Receive the message
Preprocess the message and save it.
Split the payload into individual entries contained in it.
Enrich each of the entries with the rest of the data
Put each entry on the output channel.
Below is the code:
@Bean
IntegrationFlow flowPublish()
{
    return f -> f
        .channel(this.publishingInputChannel())
        //Prepare the payload
        .<Package>handle((p, h) -> this.save(p))
        //Split the artifact resolved items
        .split(Package.class, Package::getItems)
        //Find the artifact associated to each item (if available)
        .enrich(
            e -> e.<PackageEntry>requestPayload(
                m ->
                {
                    final PackageEntry item = m.getPayload();
                    final Publishable publishable = this.findPublishable(item);
                    item.setPublishable(publishable);
                    return item;
                }))
        //Send the results to the output channel
        .channel(this.publishingOutputChannel());
}
B) Content Flow (content.jar)
This module's responsibility is to handle incoming "content" payloads (i.e. Page in this case) and split/route them to the appropriate subscriber(s).
Listen on the publisher output channel
Filter the entries by Page type only
Add the original payload to the header for later
Transform the payload into the actual type
Split the page into its individual elements (blocks)
Route each element to the appropriate PubSub channel.
At least for now, the subscribed flows do not return any response (they should just fire and forget), but I would like to know how to bubble up the result when using the pub-sub channel.
Below is the code:
@Bean
@ContentChannel("asset")
MessageChannel contentAssetChannel()
{
    return MessageChannels.publishSubscribe("assetPublisherChannel").get();
    //return MessageChannels.queue(10).get();
}
@Bean
@ContentChannel("page")
MessageChannel contentPageChannel()
{
    return MessageChannels.publishSubscribe("pagePublisherChannel").get();
    //return MessageChannels.queue(10).get();
}
@Bean
IntegrationFlow flowPublishContent()
{
    return flow -> flow
        .channel(this.publishingChannel)
        //Filter for root pages (which contain elements)
        .filter(PackageEntry.class, p -> p.getPublishable() instanceof Page)
        //Put the publishable details in the header
        .enrichHeaders(e -> e.headerFunction("item", Message::getPayload))
        //Transform the item to a Page
        .transform(PackageEntry.class, PackageEntry::getPublishable)
        //Split page into components and put the type in the header
        .split(Page.class, this::splitPageElements)
        //Route content based on type to the subscriber
        .<PageContent, String>route(PageContent::getType, mapping -> mapping
            .resolutionRequired(false)
            .subFlowMapping("page", sf -> sf.channel(this.contentPageChannel()))
            .subFlowMapping("image", sf -> sf.channel(this.contentAssetChannel()))
            .defaultOutputToParentFlow())
        .channel(IntegrationContextUtils.NULL_CHANNEL_BEAN_NAME);
}
C) AWS Content (aws-content.jar)
This module is one of many potential subscribers to the content specific flows. It handles each element individually based off of the routed channel published to above.
Subscribe to the appropriate channel.
Handle the action appropriately.
There can be multiple modules with flows that subscribe to the above routed output channels, this is just one of them.
As an example, the "contentPageChannel" could invoke the flowPageToS3 below (in the aws module) and also a flowPageToFile (in another module).
Below is the code:
@Bean
IntegrationFlow flowAssetToS3()
{
    return flow -> flow
        .channel(this.assetChannel)
        .publishSubscribeChannel(c -> c
            .subscribe(s -> s
                .<PageContent>handle((p, h) ->
                {
                    return this.publishS3Asset(p);
                })));
}
@Bean
IntegrationFlow flowPageToS3()
{
    return flow -> flow
        .channel(this.pageChannel)
        .publishSubscribeChannel(c -> c
            .subscribe(s -> s
                .<Page>handle((p, h) -> this.publishS3Page(p))
                .enrichHeaders(e -> e.header("s3Command", Command.UPLOAD.name()))
                .handle(this.s3MessageHandler())));
}
First of all, there is a lot of content in your question; it's too hard to keep all of it in mind while reading. This is your project, so you know the subject well, but for us it is all new, and readers may give up on reading it, let alone attempt an answer.
Anyway, I'll try to answer the questions from the beginning of your post, although I feel like you're going to start a long discussion of "what?, how?, why?"...
Is the below code the right way to structure flows using the DSL?
It really depends on your logic. It is a good idea to separate the logical components, but having a separate jar for each might be overhead. From your code, it looks to me like you still collect everything into a single Spring Boot application and just @Autowired the appropriate channels into the @Configuration. So yes, a separate @Configuration is a good idea, but a separate jar is overhead, IMHO.
In "C" below, can i bubble up the result to the "B" flow?
Well, since the story is about publish-subscribe, it is really unusual to wait for a reply. How many replies are you going to get from those subscribers? Right, that is the problem: we can send to many subscribers, but we can't collect the replies from all of them into a single return. Compare with Java code: a method can have several arguments, but only one return value. The same applies here in messaging. Anyway, you may take a look at the Scatter-Gather pattern implementation.
Is using the DSL vs. the XML the better approach?
Both are just high-level APIs; underneath are the same integration components. Looking at your app, you'd arrive at the same distributed solution with the XML configuration. I don't see a reason to step back from the Java DSL. At the very least, it is less verbose.
I am confused as to how to correctly "terminate" a flow?
That's quite unclear from your long description. If you send to S3 or to a file, that is a termination: there is no reply from those components, so there is nowhere to go and nothing more to do. The flow just stops, the same as a Java method returning void. If you are worried about your entry point gateway, just make it void and don't wait for any replies. See Messaging Gateway for more info.

Resources