What is a `Scheduler` in RxJS?

I've seen the term Scheduler very frequently in the documentation.
But what does this term mean? I don't even know how to use a so-called Scheduler. The official documentation didn't tell me what exactly a Scheduler is. Is this a common concept or one specific to RxJS?

Rx schedulers provide an abstraction that allows work to be scheduled to run, possibly in the future, without the calling code needing to be aware of the mechanism used to schedule the work.
Whenever an Rx method needs to generate a notification, it schedules the work on a scheduler. By supplying a scheduler to the Rx method instead of using the default, you can subtly control how those notifications are sent out.
In server-side implementations of Rx (such as Rx.NET), schedulers play an important role. They allow you to schedule heavy duty work on the thread pool or dedicated threads, and run the final subscription on the UI thread so you can update your UI.
When using RxJS, it is actually pretty rare that you need to worry about the scheduler argument to most methods. Since JavaScript is essentially single-threaded, there are not a lot of options for scheduling, and the default schedulers are usually the right choice.
The only real choices are:
immediateScheduler - Runs the work synchronously and immediately. Sort of like not using a scheduler at all. Work scheduled thus is guaranteed to run synchronously.
currentThreadScheduler - Similar to immediateScheduler in that the work is run immediately. However, it does not run work recursively: if the work is running and schedules more work, that additional work is put into a queue to be run after the current work finishes. Thus work sometimes runs synchronously and sometimes asynchronously. This scheduler is useful for avoiding stack overflows or infinite recursion. For example, Rx.Observable.of(42).repeat().subscribe() would cause infinite recursion if it ran on the immediate scheduler, but since of runs on the currentThread scheduler by default, infinite recursion is avoided.
timeoutScheduler - The only scheduler that supports work scheduled to be run in the future. Essentially uses setTimeout to schedule all work (though if you schedule the work to be run "now", then it uses other faster asynchronous methods to schedule the work). Any work scheduled on this scheduler is guaranteed to be run asynchronously.
There may be some more now, such as a scheduler that schedules work on the browser animation frames, etc.
If you are trying to write testable code, then you almost always want to supply the scheduler argument. This is because in your unit tests, you will be creating testScheduler instances, which will let your unit test control the clock used by your Rx code (and thus control the exact timing of the operations).
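To make the difference between the immediate and currentThread styles concrete, here is a minimal sketch of the two strategies. It is written in Java purely as a language-neutral illustration; the class names ImmediateScheduler and TrampolineScheduler are hypothetical and are not the RxJS API. It is single-threaded and only shows the ordering of nested work.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TrampolineDemo {
    /** Hypothetical "immediate" strategy: runs work right away, recursing on nested calls. */
    static final class ImmediateScheduler {
        void schedule(Runnable work) {
            work.run();
        }
    }

    /** Hypothetical "current thread" strategy: nested work is queued, not run recursively. */
    static final class TrampolineScheduler {
        private final Deque<Runnable> queue = new ArrayDeque<>();
        private boolean running = false;

        void schedule(Runnable work) {
            queue.add(work);
            if (running) {
                return; // nested call: the outer drain loop below will pick this up
            }
            running = true;
            try {
                Runnable next;
                while ((next = queue.poll()) != null) {
                    next.run();
                }
            } finally {
                running = false;
            }
        }
    }

    public static List<String> runImmediate() {
        List<String> log = new ArrayList<>();
        ImmediateScheduler scheduler = new ImmediateScheduler();
        scheduler.schedule(() -> {
            log.add("outer start");
            scheduler.schedule(() -> log.add("inner")); // runs inside the outer work
            log.add("outer end");
        });
        return log;
    }

    public static List<String> runTrampolined() {
        List<String> log = new ArrayList<>();
        TrampolineScheduler scheduler = new TrampolineScheduler();
        scheduler.schedule(() -> {
            log.add("outer start");
            scheduler.schedule(() -> log.add("inner")); // deferred until the outer work returns
            log.add("outer end");
        });
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runImmediate());   // [outer start, inner, outer end]
        System.out.println(runTrampolined()); // [outer start, outer end, inner]
    }
}
```

Because the trampolined version never nests one run() call inside another, work that keeps rescheduling itself uses a constant amount of stack, which is the property the currentThread scheduler relies on.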

Related

Invoking Mono.block() on "nioEventLoopGroup-*" threads ends up hanging all the threads

The project I am working on uses Spring WebFlux. I came across a very odd issue.
Some pieces of the code are written purely in Reactor style (a couple of Flux/Mono pipelines). However, in an inner publisher, I have to call a method that has "Mono.block()" inside.
The weird thing I noticed is that the whole service becomes totally stuck, and when I captured a thread dump, I saw that all the "nioEventLoopGroup-*" threads were hung.
A fun fact is that if I use a "simple" thread (new Thread(..)) to call the method (the one with .block inside), everything works fine.
So my question is: are those "nioEventLoopGroup-*" threads not allowed to call any blocking code?
Sorry for asking a dumb question, but it's a blocking issue for now, so I am looking forward to your insight.
Reactor, by default, uses a fixed-size thread pool. When you use block(), the actual work needs to be done in some thread or another, which depends on the nature of the subscription and the Mono/Flux. Most likely a set of new tasks will be scheduled on the same scheduler, but block() will suspend its thread, waiting for those tasks to complete, so there is one fewer thread for those other tasks to be scheduled on. Evidently you have enough of these calls to exhaust the entire thread pool. All your block() calls are waiting for other tasks to complete, but there are no threads available for them to be run on.
There's no reason to call block() inside a mapping in a reactive stream. There are always other ways of achieving the same goal without blocking, such as flatMap(), zip() and so on.
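The starvation described above can be reproduced without Reactor. The following is a minimal sketch using only the JDK, where a one-thread ExecutorService stands in for the event-loop pool, a timed Future.get() stands in for block(), and CompletableFuture.thenComposeAsync stands in for flatMap(). The lookup method and the class name are hypothetical, and this is an analogy rather than Reactor's actual behavior.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ComposeInsteadOfBlock {
    // Hypothetical inner operation that the outer task depends on.
    static String lookup(String key) {
        return "value-for-" + key;
    }

    /**
     * Blocking style. The outer task holds a worker while waiting for an inner
     * task queued on the same pool. With a one-thread pool the inner task can
     * never start, so the wait times out (a real block() would wait forever).
     */
    public static boolean blockingVersionStalls(ExecutorService singleThreadPool) {
        Callable<String> outerTask = () -> {
            Callable<String> innerTask = () -> lookup("a");
            Future<String> inner = singleThreadPool.submit(innerTask);
            return inner.get(200, TimeUnit.MILLISECONDS); // stand-in for block()
        };
        try {
            singleThreadPool.submit(outerTask).get();
            return false;
        } catch (ExecutionException e) {
            return e.getCause() instanceof TimeoutException;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /**
     * Composed style. The outer stage returns the inner future instead of
     * waiting on it, so the worker is released between stages.
     */
    public static CompletableFuture<String> composedVersion(Executor pool) {
        return CompletableFuture
                .supplyAsync(() -> "a", pool)
                .thenComposeAsync(
                        key -> CompletableFuture.supplyAsync(() -> lookup(key), pool),
                        pool);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            System.out.println("blocking stalled: " + blockingVersionStalls(pool));
            System.out.println("composed result: " + composedVersion(pool).join());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The composed version completes even on a single worker because no thread ever sits idle waiting for work that is queued behind it.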

How to keep webserver responsive while executing many asynchronous background tasks

I am working on a web application that lets its users optionally execute long-running processes 'in background'. An example would be some long-running report generation, or deleting thousands of objects simultaneously.
I've implemented this using an ExecutorService defined as FixedThreadPool using a ThreadFactory. The ThreadFactory is built like this:
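// ThreadFactoryBuilder here appears to be Guava's com.google.common.util.concurrent.ThreadFactoryBuilder
ThreadFactory threadFactory = new ThreadFactoryBuilder()
    .setNameFormat(clientId + "-BackgroundTask-%d")
    .setDaemon(true)
    .setPriority(Thread.MIN_PRIORITY)
    .build();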
I execute the task like this:
Future<TaskStatus> future = clientExecutors.get(clientId).submit(
    backgroundTask::execute);
taskFutures.put(backgroundTask.getTaskId(), future);
How can I force my webserver to always prioritize handling new incoming requests (as fast as possible) over executing background tasks?
In other words: it should never happen that a user has to wait a long time while browsing the site just because there are a lot of background tasks executing. As you can see from above, I tried to do this by setting .setPriority(Thread.MIN_PRIORITY). However, that does not seem to be sufficient.
Furthermore, as for now, I've set some arbitrary value for the FixedThreadPool size (10) and use it globally for the entire background-handling of the application (and all its customers).
Instead I would like to define a threadpool for each customer, to make sure each customer has the same privilege to run a certain amount of tasks in the background. Say, each customer has a FixedThreadPool of size 5, and on the server I'll have a max. of 50 different customers. That would add up to 250 running background tasks at the same time.
The most important requirement here is: it does not matter, how long these background-tasks need to execute (say 2 minutes, or 20 minutes). What is important, is that each customer has the ability to send 5 tasks to be executed in background, and each of those are worked on equally.
I've tested running 30 cpu-intensive background tasks and it turns out that while these are running and cpu is near 100%, new incoming requests take a very long time to be handled.
So obviously, I am doing it wrong.
Update 12.09.2017
I've read about microservices and while it sounds great I see a great challenge in splitting the necessary parts from our monolithic application. Mostly because nearly every operation might turn into a long running process given a big enough data selection.
Furthermore, wouldn't I run into the same problem with my microservice, i.e. the server running the microservice would suffer the same performance degradation? The only good thing would be that the rest of the web app would not suffer from it anymore.
I've read some posts about introducing Thread.sleep(1), or Thread.sleep in general, into CPU-heavy operations to reduce the amount of CPU used in these operations. I've also read about someone who introduced this as an aspect so that he could even change the amount of time waited dynamically in order to have some control over how much CPU would be used.
However, my gut tells me that ain't right either. What do you think about introducing Thread.sleep to lower the amount of CPU used for a task? Is this common practice? If not, what would be the right approach?
I would strongly consider changing your system architecture to offload these long-running requests to a separate instance instead of running them in-process with the general request-serving application. In general I think it is an anti-pattern to handle both batch and online (or long- and short-running) processing in the same application instance.
Ideally you'd build a standalone microservice to handle these requests, but you could also simply just deploy X instances of your existing application, and configure your load balancer to route requests to the long running invocation paths (e.g. POST /myapp/longrunningjob) only to the instances dedicated to running these long-running processes.

Is it safe to call the Sidekiq API from inside perform?

Nothing seems to prevent a perform method from using the Sidekiq API. It should be safe in read-only mode.
What if it calls a "write" method? Especially when this method acts on the current job itself.
We would like to reschedule a job without creating a new job because we need to track the job completion with the sidekiq-status gem from another worker.
Using MyWorker.perform_in or MyWorker.perform_at to reschedule the job from inside the worker creates a new job, making it difficult to track the total completion. We're thinking of using Sidekiq::ScheduledSet.new.find and the reschedule method, but it seems awkward and potentially dangerous to reschedule a job that is about to complete.
Do Sidekiq and its API support this use case?
You might be able to hack something together but it'll be really slow if you try to modify the Sets and Lists in Redis directly. They aren't designed to be used that way.
The official Sidekiq solution to this problem is a Batch.
https://github.com/mperham/sidekiq/wiki/Batches#status
You create a one-job batch. If the job needs to be rescheduled, it adds a new job to the Batch to be executed later. Your other worker just checks the status of the overall Batch to see whether it is 100% complete.

How is wait_for_completion different from wake_up_interruptible

How is wait_for_completion different from wake_up_interruptible?
Actually, the question is how completions differ from wait queues.
They look like the same concept to me.
The completion structure internally uses wait queues and locks.
The completion structure was introduced to address a very commonly occurring scenario, where multiple threads are waiting on some event. Once that event happens, you want only one of the waiting threads to start running.
The key here is that kernel developers don't have to implement and maintain the wait queue themselves, which makes the life of a kernel developer easier.
Adding to Harman's answer, I would also say that those two functions are called in different contexts: wake_up_interruptible() will wake up all threads waiting on a wait_queue, whereas wait_for_completion() will wait until a specific task completes. Those are two different things to me.

Clarification on Threads and Run Loops In Cocoa

I'm trying to learn about threading and I'm thoroughly confused. I'm sure all the answers are there in the Apple docs, but I just found them really hard to break down and digest. Maybe somebody could clear a thing or two up for me.
1) performSelectorOnMainThread
Does the above simply register an event in the main run loop or is it somehow a new thread even though the method says "mainThread"? If the purpose of threads is to relieve processing on the main thread how does this help?
2) RunLoops
Is it true that if I want to create a completely separate thread I use
"detachNewThreadSelector"? Does calling start on this initiate a default run loop for the thread that has been created? If so where do run loops come into it?
3) And finally, I've seen examples using NSOperationQueue. Is it true to say that if you use performSelectorOnMainThread the threads are in a queue anyway so NSOperation is not needed?
4) Should I forget about all of this and just use the Grand Central Dispatch instead?
Run Loops
You can think of a run loop as an event-processing loop associated with a thread. The system provides one for every thread, but it is only run automatically for the main thread.
Note that running run loops and executing a thread are two distinct concepts. You can execute a thread without running a run loop, when you're just performing long calculations and you don't have to respond to various events.
If you want to respond to various events from a secondary thread, you retrieve the run loop associated with the thread by
[NSRunLoop currentRunLoop]
and run it. The events that run loops can handle are called input sources. You can add input sources to a run loop.
PerformSelector
performSelectorOnMainThread: adds the target and the selector to a special input source called performSelector input source. The run loop of the main thread dequeues that input source and handles the method call one by one, as part of its event processing loop.
NSOperation/NSOperationQueue
I think of NSOperation as a way to explicitly declare various tasks inside an app that take some time but can be run mostly independently. It's also easier than detaching a new thread and managing everything yourself. The main NSOperationQueue automatically maintains a set of background threads which it reuses, and runs NSOperations in parallel.
So yes, if you just need to queue up operations in the main thread, you can do away with NSOperationQueue and just use performSelectorOnMainThread:, but that's not the main point of NSOperation.
GCD
GCD is a new infrastructure introduced in Snow Leopard. NSOperationQueue is now implemented on top of it.
It works at the level of functions / blocks. Feeding blocks to dispatch_async is extremely handy, but for a larger chunk of operations I prefer to use NSOperation, especially when that chunk is used from various places in an app.
Summary
You need to read the official Apple documentation! There are many informative blog posts on this point, too.
1) performSelectorOnMainThread
Does the above simply register an event in the main run loop …
You're asking about implementation details. Don't worry about how it works.
What it does is perform that selector on the main thread.
… or is it somehow a new thread even though the method says "mainThread"?
No.
If the purpose of threads is to relieve processing on the main thread how does this help?
It helps you when you need to do something on the main thread. A common example is updating your UI, which you should always do on the main thread.
There are other methods for doing things on new secondary threads, although NSOperationQueue and GCD are generally easier ways to do it.
2) RunLoops
Is it true that if I want to create a completely separate thread I use "detachNewThreadSelector"?
That has nothing to do with run loops.
Yes, that is one way to start a new thread.
Does calling start on this initiate a default run loop for the thread that has been created?
No.
I don't know what you're “calling start on” here, anyway. detachNewThreadSelector: doesn't return anything, and it starts the thread immediately. I think you mixed this up with NSOperations (which you also don't start yourself—that's the queue's job).
If so where do run loops come into it?
Run loops just exist, one per thread. On the implementation side, they're probably lazily created upon demand.
3) And finally, I've seen examples using NSOperationQueue. Is it true to say that if you use performSelectorOnMainThread the threads are in a queue anyway so NSOperation is not needed?
These two things are unrelated.
performSelectorOnMainThread: does exactly that: Performs the selector on the main thread.
NSOperations run on secondary threads, one per operation.
An operation queue determines the order in which the operations (and their threads) are started.
Threads themselves are not queued (except maybe by the scheduler, but that's part of the kernel, not your application). The operations are queued, and they are started in that order. Once started, their threads run in parallel.
4) Should I forget about all of this and just use the Grand Central Dispatch instead?
GCD is more or less the same set of concepts as operation queues. You won't understand one as long as you don't understand the other.
So what are all these things good for?
Run loops
Within a thread, a way to schedule things to happen. Some may be scheduled at a specific date (timers), others simply “whenever you get around to it” (sources). Most of these are zero-cost when idle, only consuming any CPU time when the thing happens (timer fires or source is signaled), which makes run loops a very efficient way to have several things going on at once without any threads.
You generally don't handle a run loop yourself when you create a scheduled timer; the timer adds itself to the run loop for you.
Threads
Threads enable multiple things to happen at the exact same time on different processors. Thing 1 can happen on thread A (on processor 1) while thing 2 happens on thread B (on processor 0).
This can be a problem. Multithreaded programming is a dance, and when two threads try to step in the same place, pain ensues. This is called contention, and most discussion of threaded programming is on the topic of how to avoid it.
NSOperationQueue and GCD
You have a thing you need done. That's an operation. You can't have it done on the main thread, or you'd simply send a message like normal; you need to run it in the background, on a secondary thread.
To achieve this, express it as either an NSOperation object (you create a subclass of NSOperation and instantiate it) or a block (or both), then add it to either an NSOperationQueue (NSOperations, including NSBlockOperation) or a dispatch queue (bare block).
GCD can be used to make things happen on the main thread, as well; you can create serial queues and add blocks to them. A serial queue, as its name suggests, will run exactly one block at a time, rather than running a bunch of them in parallel.
So what should I do?
I would not recommend creating threads directly. Use NSOperationQueue or GCD instead; they force you into better thinking habits that will reduce the risk of your threaded code inducing headaches.
For things that run periodically, not fitting into the “thing I need done” model of NSOperations and GCD blocks, consider just using the run loop on the main thread. Chances are, you don't need to put it on a thread after all. A rendering loop in a 3D game, for example, can be a simple timer.

Resources