How to enforce only one subscriber across multiple instances of the same Single/Observable? - rx-java

I have a sync operation modeled as a Single, and only one sync can be running at a time.
I'm trying to subscribe the "job" on Schedulers.single(), which mostly works, but inside the chain there are scheduler hops (to a DB-write scheduler), which break the natural queue created by single().
Then I looked at flatMap(maxConcurrency = 1), but this won't work, as it requires everything to go through the same instance. I.e., from what I understand, some sort of Subject of sync requests, which however is not composable, as my use case mostly looks like this:
fun someAction1AndSync(): Single<Unit> {
    return someAction1()
        .flatMap { sync() }
}

fun someAction2AndSync(): Single<Unit> {
    return someAction2()
        .flatMap { sync() }
}
...
As you can see, these are separate sync Single instances :/
Also note that someActionXAndSync should not emit until the sync is also done.
Basically, I'm looking for the coroutines Semaphore equivalent.
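For reference, this is roughly the behavior I mean (a kotlinx.coroutines Semaphore sketch, just to illustrate the semantics I'm after; doSync() is a placeholder):

import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit

// Illustration only: a single-permit Semaphore serializes sync() calls,
// and each caller suspends until its own sync has completed.
val syncSemaphore = Semaphore(permits = 1)

suspend fun syncExclusive() = syncSemaphore.withPermit {
    doSync() // placeholder for the actual suspending sync work
}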

I can think of three ways:
use a single thread for the whole sync operation (decoupling through a queue)
use a semaphore to protect the sync method from being entered multiple times (not recommended, because it will block the caller)
fast return when a sync is in progress (AtomicBoolean)
There might be other solutions I am not aware of.
Fast return when a sync is in progress
Also note that someActionXAndSync should not emit until the sync is also done
This solution will not queue up sync requests but will fail fast. The caller must handle the error appropriately.
SyncService
class SyncService {
    val isSync: AtomicBoolean = AtomicBoolean(false)

    fun sync(): Completable {
        return if (isSync.compareAndSet(false, true)) {
            Completable.fromCallable { "" }.doOnEvent { isSync.set(false) }
        } else {
            Completable.error(IllegalStateException("whatever"))
        }
    }
}
Handling
When a sync process is already happening, you will receive an onError. This must be handled somehow, because the onError will be emitted to the subscriber. Either you are fine with that, or you can just ignore it with onErrorComplete.
fun someAction1AndSync(): Completable {
    return Single.just("")
        .flatMapCompletable {
            sync().onErrorComplete()
        }
}
Use a single thread for the whole sync operation
You have to make sure that the whole sync process is processed as a single job. When the sync process is composed of multiple reactive steps on other threads, another sync process could be started while one is already in progress.
How?
You need a scheduler with one thread. Each sync invocation must be invoked on that scheduler, and the sync operation must complete as one running job.
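A minimal sketch of that idea (assuming RxJava 2; doFullSyncBlocking() is a placeholder for the actual sync work):

import io.reactivex.Completable
import io.reactivex.schedulers.Schedulers
import java.util.concurrent.Executors

// one dedicated thread shared by every sync() subscription
val syncScheduler = Schedulers.from(Executors.newSingleThreadExecutor())

fun sync(): Completable {
    return Completable.fromAction {
        // the entire sync runs here on that one thread, with no scheduler hops,
        // so concurrent sync() subscriptions queue up naturally
        doFullSyncBlocking() // placeholder for the actual sync work
    }.subscribeOn(syncScheduler)
}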

I would use this:
fun Observable<Unit>.forceOnlyOneSubscriber(): Observable<Unit> {
    val subscriberCount = AtomicInteger(0)
    return doOnSubscribe { subscriberCount.incrementAndGet() }
        .doFinally { subscriberCount.decrementAndGet() }
        .doOnSubscribe { check(subscriberCount.get() <= 1) }
}
You can always generalize away from Unit with a type parameter if you need to.
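For example, a generic variant could look like this (same logic as above, just parameterized; a second concurrent subscriber fails with the IllegalStateException thrown by check()):

import io.reactivex.Observable
import java.util.concurrent.atomic.AtomicInteger

fun <T : Any> Observable<T>.forceOnlyOneSubscriber(): Observable<T> {
    val subscriberCount = AtomicInteger(0)
    return doOnSubscribe { subscriberCount.incrementAndGet() }
        .doFinally { subscriberCount.decrementAndGet() }
        .doOnSubscribe { check(subscriberCount.get() <= 1) }
}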

Related

Springboot coroutine bean scope or local scope

I have a requirement where we want to asynchronously handle some upstream request/payload via a coroutine. I see that there are several ways to do this, but I'm wondering which is the right approach:
Provide an explicit Spring service class that implements CoroutineScope
Autowire a singleton scope/context backed by a defined thread-pool dispatcher
Define a method-local CoroutineScope object
Following on from this question, I'm wondering what the trade-off is if we define method-local scopes like below:
fun testSuspensions(count: Int) {
    val launchTime = measureTimeMillis {
        val parentJob = CoroutineScope(Dispatchers.IO).launch {
            repeat(count) {
                this.launch {
                    process() // some long-running process
                }
            }
        }
    }
}
Alternative approach to autowire explicit scope object backed by custom dispatcher -
@KafkaListener(
    topics = ["test_topic"],
    concurrency = "1",
    containerFactory = "someListenerContainerConfig"
)
private fun testKafkaListener(consumerRecord: ConsumerRecord<String, ByteArray>, ack: Acknowledgment) {
    try {
        this.coroutineScope.launch {
            consumeRecordAsync(consumerRecord)
        }
    } finally {
        ack.acknowledge()
    }
}
suspend fun consumeRecordAsync(record: ConsumerRecord<String, ByteArray>) {
    println("[${Thread.currentThread().name}] Starting to consume record - ${record.key()}")
    val statusCode = initiateIO(record) // Add error-handling depending on kafka topic commit semantics.
    // Chain any other business logic (depending on status-code) as suspending functions.
    consumeStatusCode(record.key(), statusCode)
}

suspend fun initiateIO(record: ConsumerRecord<String, ByteArray>): Int {
    return withContext(Dispatchers.IO) { // Switch context to IO thread for http.
        println("[${Thread.currentThread().name}] Executing network call - ${record.key()}")
        delay(1000 * 2) // Simulate IO call
        200 // Return status-code
    }
}

suspend fun consumeStatusCode(recordKey: String, statusCode: Int) {
    delay(1000 * 1) // Simulate work.
    println("[${Thread.currentThread().name}] consumed record - $recordKey, status-code - $statusCode")
}
Autowiring the bean as follows in some upstream config class:
@Bean(name = ["testScope"])
fun defineExtensionScope(): CoroutineScope {
    val threadCount: Int = 4
    return CoroutineScope(Executors.newFixedThreadPool(threadCount).asCoroutineDispatcher())
}
It depends on what your goal is. If you just want to avoid the thread-per-request model, you can use Spring's support for suspend functions in controllers instead (by using WebFlux), and that removes the need for an external scope at all:
suspend fun testSuspensions(count: Int) {
    val execTime = measureTimeMillis {
        coroutineScope {
            repeat(count) {
                launch {
                    process() // some long running process
                }
            }
        }
    }
    // all child coroutines are done at this point
}
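For context, a suspending WebFlux controller endpoint might look roughly like this (a sketch; the controller, mapping, and parameter names are made up):

import org.springframework.web.bind.annotation.PostMapping
import org.springframework.web.bind.annotation.RequestParam
import org.springframework.web.bind.annotation.RestController

@RestController
class TestController {
    // Spring WebFlux runs this suspending handler inside a coroutine it manages,
    // so no external CoroutineScope has to be created or wired in.
    @PostMapping("/test")
    suspend fun test(@RequestParam count: Int) {
        testSuspensions(count) // the suspend version shown above
    }
}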
If you really want your method to return immediately and schedule coroutines that outlive it, you do indeed need that extra scope.
Regarding option 1), making custom classes implement CoroutineScope is no longer encouraged (as far as I understand). It's usually suggested to use composition instead (declare a scope as a property instead of having your own classes implement the interface). So I would suggest your option 2.
I would say option 3) is out of the question, because there is no point in using CoroutineScope(Dispatchers.IO).launch { ... }. It's no better than using GlobalScope.launch(Dispatchers.IO) { ... } (it has the same pitfalls) - you can read about the pitfalls of GlobalScope in its documentation.
The main problem is that you run your coroutines outside structured concurrency: your running coroutines are not children of a parent job and may accumulate and hold resources if they are not well behaved and you forget about them. In general it's better to define a scope that is cancelled when you no longer need any of the coroutines it runs, so you can clean up rogue coroutines.
That said, in some circumstances you do need to run coroutines "forever" (for the whole life of your application). In that case it's OK to use GlobalScope, or a custom application-wide scope if you need to customize things like the thread pool or exception handler. But in any case, don't create a scope on the spot just to launch a coroutine without keeping a handle to it.
In your case, it seems there is no clear moment when you would stop caring about the long-running coroutines, so you may be OK with the fact that your coroutines can live forever and are never cancelled. In that case, I would suggest a custom application-wide scope that you wire into your components.
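For illustration, such an application-wide scope bean could look like this (a sketch only; the SupervisorJob, exception handler, and bean name are my additions, not from the question):

import kotlinx.coroutines.CoroutineExceptionHandler
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.SupervisorJob
import org.springframework.context.annotation.Bean

// declared inside a @Configuration class
@Bean(name = ["applicationScope"])
fun applicationCoroutineScope(): CoroutineScope {
    // SupervisorJob: one failed coroutine does not cancel its siblings;
    // the handler surfaces otherwise-silent failures
    val handler = CoroutineExceptionHandler { _, e ->
        println("Unhandled coroutine failure: $e")
    }
    return CoroutineScope(SupervisorJob() + Dispatchers.IO + handler)
}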

Kotlin coroutines running sequentially even with keyword async

Hi guys, I'm trying to improve the performance of some computation in my system. Basically I want to generate a series of actions based on some data. This doesn't scale well, and I want to try doing it in parallel and getting a result afterwards (a bit like how futures work).
I have an interface with a series of implementations that get a collection of actions. I want to call all of these in parallel and await the results at the end.
The issue is that, when I view the logs, it's clearly doing this sequentially and waiting on each action getter before going to the next one. I thought async would do this asynchronously, but it's not.
The method the runBlocking is in is within a Spring transaction. Maybe that has something to do with it.
runBlocking {
    val actions = actionsReportGetters.map { actionReportGetter ->
        async {
            getActions(actionReportGetter, abstractUser)
        }
    }.awaitAll().flatten()
    allActions.addAll(actions)
}

private suspend fun getActions(actionReportGetter: ActionReportGetter, traderUser: TraderUser): List<Action> {
    return actionReportGetter.getActions(traderUser)
}

interface ActionReportGetter {
    fun getActions(traderUser: TraderUser): List<Action>
}
Looks like you are doing some blocking operation in ActionReportGetter.getActions in a single-threaded environment (probably on the main thread).
For such IO operations you should launch your coroutines on Dispatchers.IO, which provides a thread pool with multiple threads.
Update your code to this:
async(Dispatchers.IO) { // Switch to IO dispatcher
    getActions(actionReportGetter, abstractUser)
}
Also getActions need not be a suspending function here. You can remove the suspend modifier from it.
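Putting it together, the original block would then look roughly like this (a sketch; actionsReportGetters, abstractUser, and allActions are the names from the question):

runBlocking {
    val actions = actionsReportGetters.map { actionReportGetter ->
        async(Dispatchers.IO) { // each blocking getActions call now runs on the IO pool
            actionReportGetter.getActions(abstractUser)
        }
    }.awaitAll().flatten()
    allActions.addAll(actions)
}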

Sequential execution of Reactive tasks in reactor Java

I'm working on converting a blocking, sequential orchestration framework to reactive. Right now, these tasks are dynamic and are fed into the engine by a JSON input. The engine pulls classes, executes the run() method, and saves the state with the responses from each task.
How do I achieve the same chaining in Reactor? If this were a static DAG, I would have chained it with flatMap or then operators, but since it is dynamic, how do I proceed with executing a reactive task and collecting the output from each task?
Examples:
Non-reactive interface:
public interface OrchestrationTask {
    OrchestrationContext run(IngestionContext ctx);
}
Core Engine
public Status executeDAG(String id) {
    IngestionContext ctx = ContextBuilder.getCtx(id);
    List<OrchestrationTask> tasks = app.getEligibleTasks(id);
    for (OrchestrationTask task : tasks) {
        // Eligible tasks are executed sequentially and results are collected.
        OrchestrationContext stepContext = task.run(ctx);
        if (!evaluateResult(stepContext)) break;
    }
    return Status.SUCCESS;
}
Following the above example, if I convert tasks to return Mono<?>, how do I wait for or chain other tasks to operate on the result of previous tasks?
Any help is appreciated. Thanks.
Update:
Reactive task example:
public class SampleTask implements OrchestrationTask {
    @Override
    public Mono<OrchestrationContext> run(OrchestrationContext context) {
        // I'm simulating a delay here. Treat this as a long-running task (web call), but the next task needs the response from the call below.
        return Mono.just(context).delayElement(Duration.ofSeconds(2));
    }
}
So I will have a series of tasks that accomplish various things, but the response from each task depends on the previous one and is stored in the OrchestrationContext. Any time an error occurs, the orchestration context flag will be set to false and the Flux should stop.
Sure, we can:
Create the Flux from the task list (if it's appropriate to generate the task list reactively then you can replace that ArrayList with the Flux directly; if not, keep it as-is);
flatMap() each task to your task.run() method (which, as per the question, now returns a Mono);
Ensure we only consume elements while evaluateResult() is true;
...then finally just return the SUCCESS status as before.
So putting all that together, just replace your loop & return statement with:
return Flux.fromIterable(tasks)
        .flatMap(task -> task.run(ctx))
        .takeWhile(stepContext -> evaluateResult(stepContext))
        .then(Mono.just(Status.SUCCESS));
(Since we've made it reactive, your method will obviously need to return a Mono<Status> rather than just Status too.)
Update as per the comment - if you just want this to execute "one at a time" rather than with multiple concurrently, you can use concatMap() instead of flatMap().

Reactor Flux conditional emit

Is it possible to allow emitting values from a Flux conditionally, based on a global boolean variable?
I'm working with Flux delayUntil(...) but I'm not able to fully grasp the functionality, or my assumptions are wrong.
I have a global AtomicBoolean that represents the availability of a downstream connection, and I only want the upstream Flux to emit if the downstream is ready to process.
To represent the scenario, I created a (not working) test sample:
// Randomly generates a boolean value every 5 seconds
private Flux<Boolean> signalGenerator() {
    return Flux.range(1, Integer.MAX_VALUE)
            .delayElements(Duration.ofMillis(5000))
            .map(integer -> new Random().nextBoolean());
}
and
Flux.range(1, Integer.MAX_VALUE)
        .delayElements(Duration.ofMillis(1000))
        .delayUntil(evt -> signalGenerator()) // ?? Only proceed when signalGenerator returns true
        .subscribe(System.out::println);
I have another scenario where a downstream process can accept only x messages per second. In the current non-reactive implementation we have a Semaphore of x permits, and the thread blocks if no more permits are available, with the Semaphore's permits being reset every second.
In both scenarios I want the upstream Flux to emit only when there is demand from the downstream process, and I do not want to buffer.
You might consider using Mono.fromRunnable() as the input to delayUntil(), like below.
Helper class:
public class FluxCondition {
    CountDownLatch latch = new CountDownLatch(10); // it depends, might be managed somehow
    Runnable r = () -> {
        try { latch.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    };
    public Mono<Void> lock() { return Mono.fromRunnable(r); }
    public void release() { latch.countDown(); }
}
Usage:
FluxCondition delayCondition = new FluxCondition();
Flux.range(1, 10).delayUntil(o -> delayCondition.lock()).subscribe();
.....
delayCondition.release(); // call this once for each element
I guess there might be a better solution using sink.emitNext, but that would also require a condition variable to control the Flux flow.
From my understanding of reactive programming, your data should be considered at every operator step, so it might be better to design your consumer as a reactive processor. In my case I had no other option and followed the approach described above.

Creating the instance of Kotlin Coroutine's flow similar to channel or broadcast channel

Similar to channels and broadcast channels, can flows also be instantiated once and reused in multiple places?
The general way of creating flows is to wrap the logic that emits the data inside the flow's body and return it.
Snippet:
fun listenToDataChanges(): Flow<T> {
    return flow {
        dataSource.querySomeInfo()?.consumeEach { data ->
            if (someCondition) {
                emit(data)
            }
        }
    }
}
Every time listenToDataChanges() is called, a new flow instance is created and multiple subscriptions would be made. Instead, is it possible to create and reuse the instance to avoid multiple subscriptions?
Yes, you just need to store it in a variable instead of recreating the flow each time.
By the way, it seems like you could simplify it this way:
val customFlow = dataSource.querySomeInfo()?.filter { someCondition }
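For example, the stored-in-a-variable version might look like this (a sketch; dataSource, someCondition, and the Data type are the question's placeholders, and note that a cold flow still runs its block once per collector):

import kotlinx.coroutines.channels.consumeEach
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.flow

// Build the flow once and hand out the same instance everywhere.
val dataChanges: Flow<Data> = flow {
    dataSource.querySomeInfo()?.consumeEach { data ->
        if (someCondition) {
            emit(data)
        }
    }
}

fun listenToDataChanges(): Flow<Data> = dataChanges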
