I am new to asyncio and have a question about how to handle IO in the asyncio event loop.
Given some sort of asynchronous IO stream or session (aiologger, asynchronous SQLAlchemy), whenever I await it to send or commit, the event loop jumps out of that local function and into some other function. Can't this lead to IO collisions? For example, I start a SQLAlchemy session, await to add something, and the event loop might hop on to another command which begins another session while the first one is still open. Can this even happen in an application? I doubt it does, but if it can, how is it usually handled?
Python complains that RuntimeWarning: coroutine 'Queue.put' was never awaited
I have searched and seen that a library like Janus exists to solve such a problem. But on Python 3.8, is there a better way?
Update: I was able to use create_task to put the item in the queue, but it blocks on get or put until some other async event occurs in the system, even though there should now be an item in the queue so it shouldn't need to block. Any ideas why that could be happening? It takes about 10-20 s before it unblocks on its own, but if I send another event it immediately unblocks on the previous one, and the new one is then delayed in the same way unless I push yet another event through.
You are calling create_task from outside the thread that runs the event loop. You should use asyncio.run_coroutine_threadsafe instead:
if result:
    # tell asyncio to enqueue the result
    fut = asyncio.run_coroutine_threadsafe(
        tasks.completed.put(result), loop)
    # wait for the result to be enqueued
    fut.result()
(You should retrieve the loop while in the main thread and pass it to the thread.)
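For example, here is a minimal self-contained sketch of the whole pattern; the worker function, the one-second sleep and the "done" value are placeholders, not part of your code:

import asyncio
import threading
import time

def worker(loop, completed):
    # Runs in a plain thread; there is no event loop here, so we can't await.
    time.sleep(1)                 # stand-in for the blocking work
    result = "done"
    if result:
        # tell asyncio to enqueue the result
        fut = asyncio.run_coroutine_threadsafe(completed.put(result), loop)
        # wait for the result to be enqueued
        fut.result()

async def main():
    completed = asyncio.Queue()
    # retrieve the loop while in the event-loop thread, then hand it to the worker
    loop = asyncio.get_running_loop()
    threading.Thread(target=worker, args=(loop, completed), daemon=True).start()
    print(await completed.get())  # unblocks as soon as the worker enqueues

asyncio.run(main())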
If your queue is unbounded and you don't need to handle backpressure, you can call put_nowait using call_soon_threadsafe:
if result:
    # tell asyncio to enqueue the result
    loop.call_soon_threadsafe(
        tasks.completed.put_nowait, result)
I was able to use create_task to put the item in the queue, but it blocks on get or put until some other async event occurs in the system, even though there should now be an item in the queue so it shouldn't need to block.
This is because loop.create_task is not thread-safe, so it doesn't correctly notify the event loop that something has happened.
I noticed the asyncio library has a loop.add_signal_handler(signum, callback, *args) method.
So far I have just been catching Unix signals in the main file using the signal module alongside my asynchronous code, like this:
signal.signal(signal.SIGHUP, callback)
async def main():
...
Is that an oversight on my part?
The add_signal_handler documentation is sparse1, but looking at the source, it appears that the main added value compared to signal.signal is that add_signal_handler will ensure that the signal wakes up the event loop and allow the loop to invoke the signal handler along with other queued callbacks and runnable coroutines.
So far I have just been catching unix signals in the main file using the signals module [...] Is that an oversight on my part?
That depends on what the signal handler is doing. Printing a message or updating a global is fine, but if it is invoking anything in any way related to asyncio, it's most likely an oversight. A signal can be delivered at (almost) any time, including during execution of an asyncio callback, a coroutine, or even during asyncio's own bookkeeping.
For example, the implementation of asyncio.Queue freely assumes that the access to the queue is single-threaded and non-reentrant. A signal handler adding something to a queue using q.put_nowait() would be disastrous if it interrupted an on-going invocation of q.put_nowait() on the same queue. Similar to typical race conditions experienced in multi-threaded code, an interruption in the middle of assignment to _unfinished_tasks might well cause it to get incremented only once instead of twice (once for each put_nowait).
Asyncio code is designed for cooperative multi-tasking, where the points at which a function may suspend are clearly denoted by the await and related keywords. The add_signal_handler function ensures that your signal handler gets invoked at such a point, so you're free to implement it as you'd implement any other asyncio callback.
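For example, here is a minimal sketch of the safe variant; the asyncio.Queue and the choice of SIGHUP are only for illustration:

import asyncio
import signal

async def main():
    queue = asyncio.Queue()
    loop = asyncio.get_running_loop()
    # The handler runs as a regular asyncio callback, never in the middle of
    # other asyncio code, so calling put_nowait here is safe.
    loop.add_signal_handler(signal.SIGHUP, queue.put_nowait, "got SIGHUP")
    while True:
        print(await queue.get())

asyncio.run(main())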
1 When this answer was originally written, the add_signal_handler documentation was briefer than today and didn't cover the difference to signal.signal at all. This question prompted it getting expanded in the meantime.
Is there any way to control the scheduling priority among all coroutines that are ready to run?
Specifically, I have several coroutines handling streaming I/O from the network into several queues, a second set of coroutines that ingest the data from the queues into a data structure. These ingestion coroutines signal a third set of coroutines that analyze that data structure whenever new data is ingested.
Data arrival from the network is an infinite stream with a non-deterministic message rate. I want the analysis step to run as soon as new data arrives, but not before all pending data is processed. The problem I see is that depending on the order of scheduling, an analysis coroutine could run before a reader coroutine that also had data ready, so the analysis coroutine can't even check the ingestion queues for pending data because it may not have been read off the network yet, even though those reader coroutines were ready to run.
One solution might be to structure the coroutines into priority groups so that the reader coroutines would always be scheduled before the analysis coroutines if they were both able to run, but I didn't see a way to do this.
Is there a feature of asyncio that can accomplish this prioritization? Or perhaps I'm asking the wrong question and I can restructure the coroutines such that this can't happen (but I don't see it).
-- edit --
Basically I have N coroutines that look something like this:
while True:
    data = await socket.get()
    ingestData(data)
    self.event.notify()
So the problem I'm running into is that there's no way for me to know that any of the other N-1 sockets have data ready while executing this coroutine so I can't know whether or not I should notify the event. If I could prioritize these coroutines above the analysis coroutine (which is awaiting self.event.wait()) then I could be sure none of them were runnable when the analysis coroutine is scheduled.
asyncio doesn't support explicitly specifying coroutine priorities, but it is straightforward to achieve the same effect with the tools provided by the library. Given the example in your question:
async def process_pending():
    while True:
        data = await socket.get()
        ingestData(data)
        self.event.notify()
You could await the sockets directly using asyncio.wait, and then you would know which sockets are actionable, and only notify the analyzers after all have been processed. For example:
def _read_task(self, socket):
    # Wrap a single socket read in a task and remember which socket it serves.
    loop = asyncio.get_event_loop()
    task = loop.create_task(socket.get())
    task.__process_socket = socket
    return task

async def process_pending_all(self):
    tasks = {self._read_task(socket) for socket in self.sockets}
    while True:
        done, not_done = await asyncio.wait(
            tasks, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            # Ingest every read that completed, and immediately start a new
            # read on the same socket.
            ingestData(task.result())
            not_done.add(self._read_task(task.__process_socket))
        tasks = not_done
        # Only notify the analyzers once all completed reads are ingested.
        self.event.notify()
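On the analysis side, the notify()/wait() pair used above could be a thin wrapper over asyncio.Event; this is only a sketch of one possible implementation, not something asyncio provides:

import asyncio

class IngestEvent:
    # Hypothetical notify()/wait() wrapper matching the calls in the snippets above.
    def __init__(self):
        self._event = asyncio.Event()

    def notify(self):
        self._event.set()

    async def wait(self):
        await self._event.wait()
        self._event.clear()

Because process_pending_all() only calls notify() after ingesting everything that asyncio.wait() reported as done, an analyzer that loops on await self.event.wait() will not run ahead of readers that already had data ready.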
I'm trying to figure out how to handle events using coroutines (in Lua). A common way of doing it seems to be creating wrapper functions that yield the current coroutine and then resume it when the thing you're waiting for has occurred. That seems like a nice solution, but what about these problems?
How do you wait for multiple events at the same time, and branch depending on which one comes first? Or should the program be redesigned to avoid such situations?
How to cancel the waiting after a certain period? The event loop can have timeout parameters in its socket send/receive wrappers, but what about custom events?
How do you trigger the coroutine to change its state from outside? For example, I would want a function that when called, would cause the coroutine to jump to a different step, or start waiting for a different event.
EDIT:
Currently I have a system where I register a coroutine with an event, and the coroutine gets resumed with the event name and info as parameters every time the event occurs. With this system, 1 and 2 are not issues, and 3 can be solved by having the coro expect a special event name that makes it jump to the different step, and resuming it with that name as an arg. Also, custom objects can have methods to register event handlers the same way.
I just wonder if this is considered the right way to use coroutines for event handling. For example, if I have a read event and a timer event (as a timeout for the read), and the read event happens first, I have to manually cancel the timer. It just doesn't seem to fit the sequential nature of handling events with coroutines.
How do you wait for multiple events at the same time, and branch depending on which one comes first?
If you need to use coroutines for this, rather than just a Lua function that you register (for example, if you have a function that does stuff, waits for an event, then does more stuff), then this is pretty simple. coroutine.yield will return all of the values passed to coroutine.resume when the coroutine is resumed.
So just pass the event, and let the script decide for itself if that's the one it's waiting for or not. Indeed, you could build a simple function to do this:
function WaitForEvents(...)
    local events = {...}
    assert(#events ~= 0, "You must pass at least one parameter")
    repeat
        RegisterForAnyEvent(coroutine.running()) --Registers the coroutine with the system, so that it will be resumed when an event is fired.
        local event = coroutine.yield()
        for i, testEvt in ipairs(events) do
            if(event == testEvt) then
                return
            end
        end
    until(false)
end
This function will continue to yield until one of the events it is given has been fired. The loop assumes that RegisterForAnyEvent is temporary, registering the function for just one event, so you need to re-register every time an event is fired.
How to cancel the waiting after a certain period?
Put a counter in the above loop, and leave after a certain period of time. I'll leave that as an exercise for the reader; it all depends on how your application measures time.
How do you trigger the coroutine to change its state from outside?
You cannot magic a Lua function into a different "state". You can only call functions and have them return results. So if you want to skip around within some process, you must write your Lua function system so that it can be skipped ahead.
How you do that is up to you. You could have each set of non-waiting commands be a separate Lua function. Or you could just design your wait states to be able to skip ahead. Or whatever.
I'm implementing my app as a drag source. When I call DoDragDrop (Win32 call, not MFC) it enters a modal loop, and I don't get repaint messages in my main window until DoDragDrop returns. Unfortunately, if I drop a file in the shell and the filename is already there, the shell asks if I want to replace the file. But since my app is blocked because DoDragDrop hasn't returned, it isn't repainting and looks 'frozen'.
Any clues?
Have you tried a timer? I ran into the same problem with DoDragDrop() and other blocking calls like SHFileOperation() and solved it using a call to SetTimer().
EDIT: If you want more control over DoDragDrop() then a worker thread works well. You can try calling DoDragDrop() in the worker thread, as someone suggested, but I couldn't get the mouse capture to work properly. An easier solution is to call DoDragDrop() in the main thread and have the worker thread periodically post a WM_USER message to the main thread's queue. DoDragDrop() will then retrieve the message and dispatch it to your window's WndProc(), at which time you can perform idle processing for as long as the queue remains empty. If you give the worker thread a lower priority than the main thread, it will execute and post the WM_USER message as soon as the main thread becomes idle (i.e., as soon as DoDragDrop() finishes processing all user input and calls MsgWaitForMultipleObjects() internally). This method is better than the SetTimer() method because it gives your application full control over the CPU. You don't have to wait up to 10 ms (the shortest interval SetTimer() allows) after your WM_TIMER handler returns before the next WM_TIMER message arrives.
I suggest running the drag-and-drop operation on a different thread. That way, DoDragDrop() will block the message loop in the new thread rather than the message loop in your UI thread. If you take this approach, you should also consider (off the top of my head):
Any code that might be run from both your main thread and your drag-and-drop thread will need to be re-entrant. As a corollary, you will need to protect any data structures used by both your main thread and your drag-and-drop thread. If your application is already multi-threaded, you should be familiar with these concerns.
You should think about what happens if your user never responds to the shell's dialog box. Can he continue to interact with your UI? Can he invalidate the data that would have been 'dropped' in the pending operation? Can he quit your application?
Seems like the real answer is to implement IAsyncOperation in my data object.