Control scheduling priority of asyncio coroutines possible? - python-asyncio

Is there any way to control the scheduling priority among all coroutines that are ready to run?
Specifically, I have several coroutines handling streaming I/O from the network into several queues, a second set of coroutines that ingest the data from the queues into a data structure. These ingestion coroutines signal a third set of coroutines that analyze that data structure whenever new data is ingested.
Data arrival from the network is an infinite stream with a non-deterministic message rate. I want the analysis step to run as soon as new data arrives, but not before all pending data is processed. The problem I see is that depending on the order of scheduling, an analysis coroutine could run before a reader coroutine that also had data ready, so the analysis coroutine can't even check the ingestion queues for pending data because it may not have been read off the network yet, even though those reader coroutines were ready to run.
One solution might be to structure the coroutines into priority groups so that the reader coroutines would always be scheduled before the analysis coroutines if they were both able to run, but I didn't see a way to do this.
Is there a feature of asyncio that can accomplish this prioritization? Or perhaps I'm asking the wrong question and I can restructure the coroutines such that this can't happen (but I don't see it).
-- edit --
Basically I have a N coroutines that look something like this:
while True:
data = await socket.get()
ingestData(data)
self.event.notify()
So the problem I'm running into is that there's no way for me to know that any of the other N-1 sockets have data ready while executing this coroutine so I can't know whether or not I should notify the event. If I could prioritize these coroutines above the analysis coroutine (which is awaiting self.event.wait()) then I could be sure none of them were runnable when the analysis coroutine is scheduled.

asyncio doesn't support explicitly specifying coroutine priorities, but it is straightforward to achieve the same effect them with the tools provided by the library. Given the example in your question:
async def process_pending():
while True:
data = await socket.get()
ingestData(data)
self.event.notify()
You could await the sockets directly using asyncio.wait, and then you would know which sockets are actionable, and only notify the analyzers after all have been processed. For example:
def _read_task(self, socket):
loop = asyncio.get_event_loop()
task = loop.create_task(socket.get())
task.__process_socket = socket
return task
async def process_pending_all(self):
tasks = {self._read_task(socket) for socket in self.sockets}
while True:
done, not_done = await asyncio.wait(
tasks, return_when=asyncio.FIRST_COMPLETED)
for task in done:
ingestData(task.result())
not_done.add(self._read_task(task.__process_socket))
tasks = not_done
self.event.notify()

Related

Long running async method vs firing an event upon completion

I have to create a library that communicates with a device via a COM port.
In the one of the functions, I need to issue a command, then wait for several seconds as it performs a test (it varies from 10 to 1000 seconds) and return the result of the test:
One approach is to use async-await pattern:
public async Task<decimal> TaskMeasurementAsync(CancellationToken ctx = default)
{
PerformTheTest();
// Wait till the test is finished
await Task.Delay(_duration, ctx);
return ReadTheResult();
}
The other that comes to mind is to just fire an event upon completion.
The device performs a test and the duration is specified prior to performing it. So in either case I would either have to use Task.Delay() or Thread.Sleep() in order to wait for the completion of the task on the device.
I lean towards async-await as it easy to build in the cancellation and for the lack of a better term, it is self contained, i.e. I don't have to declare an event, create a EventArgs class etc.
Would appreciate any feedback on which approach is better if someone has come across a similar dilemma.
Thank you.
There are several tools available for how to structure your code.
Events are a push model (so is System.Reactive, a.k.a. "LINQ over events"). The idea is that you subscribe to the event, and then your handler is invoked zero or more times.
Tasks are a pull model. The idea is that you start some operation, and the Task will let you know when it completes. One drawback to tasks is that they only represent a single result.
The coming-soon async streams are also a pull model - one that works for multiple results.
In your case, you are starting an operation (the test), waiting for it to complete, and then reading the result. This sounds very much like a pull model would be appropriate here, so I recommend Task<T> over events/Rx.

Run async function in specific thread

I would like to run specific long-running functions (which execute database queries) on a separate thread. However, let's assume that the underlying database engine only allows one connection at a time and the connection struct isn't Sync (I think at least the latter is true for diesel).
My solution would be to have a single separate thread (as opposed to a thread pool) where all the database-work happens and which runs as long as the main thread is alive.
I think I know how I would have to do this with passing messages over channels, but that requires quite some boilerplate code (e.g. explicitly sending the function arguments over the channel etc.).
Is there a more direct way of achieving something like this with rust (and possibly tokio and the new async/await notation that is in nightly)?
I'm hoping to do something along the lines of:
let handle = spawn_thread_with_runtime(...);
let future = run_on_thread!(handle, query_function, argument1, argument2);
where query_function would be a function that immediately returns a future and does the work on the other thread.
Rust nightly and external crates / macros would be ok.
If external crates are an option, I'd consider taking a look at actix, an Actor Framework for Rust.
This will let you spawn an Actor in a separate thread that effectively owns the connection to the DB. It can then listen for messages, execute work/queries based on those messages, and return either sync results or futures.
It takes care of most of the boilerplate for message passing, spawning, etc. at a higher level.
There's also a Diesel example in the actix documentation, which sounds quite close to the use case you had in mind.

How to implement a scatter-gather pattern in an ASP.NET web application?

Suppose an ASP.NET WebAPI request arrives at a controller method.
Suppose the request represents an 'event' which needs processed. The event has multiple operations associated with it that should be performed in parallel. For example, each operation may need to call out to a particular REST endpoint on other servers, which are I/O bound operations that should get started as soon as possible and should not wait for one to return before starting the next one.
What is the most correct/performant way to implement this pattern?
I've read that using Task.Run is a bad idea, because it just grabs additional ThreadPool threads, leaving the main request thread idle/blocked. While that makes sense if I was running a single task, I'm not sure that advice applies in this case.
For example, if the event has 4 operations that needed completed (each having possibly multiple I/O bound calls of their own), I would call Task.Run in a loop 4 times to initialize each operation, then wait on the resulting tasks with Task.WaitAll.
Q1: Would the main request thread be returned to the ThreadPool for use by another request while waiting for Task.WaitAll to return, or would it just hog the main thread leaving it idle until Task.WaitAll completes?
Q2: If it hogs the main thread, could that be resolved by marking the controller method with the async keyword, and using an await Task.WhenAll call instead? I'd imaging that this would return the main thread to the pool while waiting, allowing it to be used for other requests or event operations.
Q3: Since Task.Run queues up a work item that could be blocked on an I/O bound call, would performance improve if the operations were all implemented with async and used await calls on Task-based asynchronous I/O methods?
Regarding the whole approach of using Task.Run for the event's operations, the goal is just get all of the operation's I/O bound calls started as soon as possible. I suppose if (as in Q3) all operations were async methods, I could just get them all started on the main request thread in a loop, but I'm not sure that would be better than starting them with separate Task.Run calls. Maybe there's a completely different approach that I'm unaware of.

Is a blocking function on an asynchronous api idiomatic?

Is it more idiomatic to have an async api, with a blocking function as the synchronous api that simply calls the async api and waits for an answer before returning, rather than using a non-concurrent api and let the caller run it in their own goroutine if they want it async?
In my current case I have a worker goroutine that reads from a request channel and sends the return value down the response channel (that it got in a request struct from the request channel).
This seems to differ from the linked question since I need the return values, or to synchronize so that I can be sure the api call finishes before I do something else, to avoid race conditions.
For golang, I recommend Effective Go-concurrency. Especially I think everyone using golang need to known the basics of goroutine and parallelization:
Goroutines are multiplexed onto multiple OS threads so if one should block, such as while waiting for I/O, others continue to run. Their design hides many of the complexities of thread creation and management.
The current implementation of the Go runtime dedicates only a single core to user-level processing. An arbitrary number of goroutines can be blocked in system calls, but by default only one can be executing user-level code at any time.

How would you implement a save thread cooperation with signals in ruby 2.0?

I just began to work with threads. I know the theory and understand the main aspects of it, but I've got only a little practice on this topic.
I am looking for a good solution (or pattern, if available) for the following problem.
Assume there should be a transaction component which holds a pool of threads processing tasks from a queue, which is also part of this transaction component.
Each thread of this pool waits until there's a task to do, pops it from the queue, processes it and then waits for the next turn.
Assume also, there are multiple threads adding tasks to this queue. Then I want these threads to suspend until their tasks are processed.
If a task is processed, the thread, which enqueued the processed task, should be made runnable again.
The ruby class Thread provides the methods Thread#stop and Thread#run. However, I read, that you should not use these methods, if you want a stable implementation. And to use some kind of signalling mechanism.
In ruby, there are some classes which deal with synchronization and thread cooperation in general like Thread, Mutex, Monitor, ConditionVariable, etc.
Maybe ConditionVariable could be my friend, because it allows to emit signals, but I'm just not sure.
How would you implement this?
Ruby provides a threadsafe Queue class that will handles some of this for you:
queue.pop
Will block until a value is pushed to the queue. You can have as many threads as you want waiting on the queue in this fashion. If one of the things you push onto the queue is another queue or a condition variable then you could use that to signal task completion.
Threads are notoriously hard to reason about effectively. You may find that an alternative higher level approach such as celluloid easier to work with.

Resources