How to convert event-based communication into async/await using asyncio.Future - python-asyncio

I'm trying to expose an event-based communication as a coroutine. Here is an example:
class Terminal:
async def start(self):
loop = asyncio.get_running_loop()
future = loop.create_future()
t = threading.Thread(target=self.run_cmd, args=future)
t.start()
return await future
def run_cmd(self, future):
time.sleep(3) # imitating doing something
future.set_result(1)
But when I run it like this:
async def main():
t = Terminal()
result = await t.start()
print(result)
asyncio.run(main())
I get the following error: RuntimeError: await wasn't used with future
Is it possible to achieve the desired behavior?

There are two issues with your code. One is that the args argument to the Thread constructor requires a sequence or iterable, so you need to write wrap the argument in a container, e.g. args=(future,). Since future is iterable (for technical reasons unrelated to this use case), args=future is not immediately rejected, but leads to the misleading error later down the line.
The other issue is that asyncio objects aren't thread-safe, so you cannot just call future.set_result from another thread. This causes the test program to hang even after fixing the first issue. The correct way to resolve the future from another thread is through the call_soon_threadsafe method on the event loop:
class Terminal:
async def start(self):
loop = asyncio.get_running_loop()
future = loop.create_future()
t = threading.Thread(target=self.run_cmd, args=(loop, future,))
t.start()
return await future
def run_cmd(self, loop, future):
time.sleep(3)
loop.call_soon_threadsafe(future.set_result, 1)
If your thread is really just calling a blocking function whose result you're interested in, consider using run_in_executor instead of manually spawning threads:
class Terminal:
async def start(self):
loop = asyncio.get_running_loop()
return await loop.run_in_executor(None, self.run_cmd)
# Executed in a different thread; `run_in_executor` submits the
# callable to a thread pool, suspends the awaiting coroutine until
# it's done, and transfers the result/exception back to asyncio.
def run_cmd(self):
time.sleep(3)
return 1

Related

Python asyncio: awaiting a future you don't have yet

Imagine that I have a main program which starts many async activities which all wait on queues to do jobs, and then on ctrl-C properly closes them all down: it might look something like this:
async def run_act1_forever():
# this is the async queue loop
while True:
job = await inputQueue1.get()
# do something with this incoming job
def run_activity_1(loop):
# run the async queue loop as a task
coro = loop.create_task(run_act1_forever())
return coro
def mainprogram():
loop = asyncio.get_event_loop()
act1 = run_activity_1(loop)
# also start act2, act3, etc here
try:
loop.run_forever()
except KeyboardInterrupt:
pass
finally:
act1.cancel()
# also act2.cancel(), act3.cancel(), etc
loop.close()
This all works fine. However, starting up activity 1 is actually more complex than this; it happens in three parts. Part 1 is to wait on the queue until a particular job comes in, one time; part 2 is a synchronous part which has to run in a thread with run_in_executor, one time, and then part 3 is the endless waiting on the queue for jobs as above. How do I structure this? My initial thought was:
async def run_act1_forever():
# this is the async queue loop
while True:
job = await inputQueue1.get()
# do something with this incoming job
async def run_act1_step1():
while True:
job = await inputQueue1.get()
# good, we have handled that first task; we're done
break
def run_act1_step2():
# note: this is sync, not async, so it's in a thread
# do whatever, here, and then exit when done
time.sleep(5)
def run_activity_1(loop):
# run step 1 as a task
step1 = loop.create_task(run_act1_step1())
# ERROR! See below
# now run the sync step 2 in a thread
self.loop.run_in_executor(None, run_act1_step2())
# finally, run the async queue loop as a task
coro = loop.create_task(run_act1_forever())
return coro
def mainprogram():
loop = asyncio.get_event_loop()
act1 = run_activity_1(loop)
# also start act2, act3, etc here
try:
loop.run_forever()
except KeyboardInterrupt:
pass
finally:
act1.cancel()
# also act2.cancel(), act3.cancel(), etc
loop.close()
but this does not work, because at the point where we say "ERROR!", we need to await the step1 task and we never do. We can't await it, because run_activity_1 is not an async function. So... what should I do here?
I thought about getting the Future back from calling run_act1_step1() and then using future.add_done_callback to handle running steps 2 and 3. However, if I do that, then run_activity_1() can't return the future generated by run_act1_forever(), which means that mainprogram() can't cancel that run_act1_forever() task.
I thought of generating an "empty" Future in run_activity_1() and returning that, and then making that empty Future "chain" to the Future returned by run_act1_forever(). But Python asyncio doesn't support chaining Futures.
You say that things are difficult because run_activity_1 is not an async function, but don't really detail why it can't be async.
async def run_activity_1(loop):
await run_act1_step1()
await loop.run_in_executor(None, run_act1_step2)
await run_act1_forever()
The returned coroutine won't be the same as the one returned by run_act1_forever(), but cancellation should propagate if you've got as far as executing that step.
With this change, run_activity_1 is no longer returning a task, so the invocation inside mainprogram would need to change to:
act1 = loop.create_task(run_activity_1(loop))
I think you were on the right track when you said, "I thought about getting the Future back from calling run_act1_step1() and then using future.add_done_callback to handle running steps 2 and 3." That's the logical way to structure this application. You have to manage the various returned objects correctly, but a small class solves this problem.
Here is a program similar to your second code snippet. It runs (tested with Python3.10) and handles Ctrl-C gracefully.
Python3.10 issues a deprecation warning when the function asyncio.get_event_loop() is called without a running loop, so I avoided doing that.
Activities.run() creates task1, then attaches a done_callback that starts task2 and the rest of the activities. The Activities object keeps track of task1 and task2 so they can be cancelled. The main program keeps a reference to Activities, and calls cancel_gracefully() to do the right thing, depending on how far the script progressed through the sequence of start-up activities.
Some care needs to be taken to catch the CancelledExceptions; otherwise stuff gets printed on the console when the program terminates.
The important difference between this program and your second code snippet is that this program immediately stores task1 and task2 in variables so they can be accessed later. Therefore they can be cancelled any time after their creation. The done_callback trick is used to launch all the steps in the proper order.
#! python3.10
import asyncio
import time
async def run_act1_forever():
# this is the async queue loop
while True:
await asyncio.sleep(1.0)
# job = await inputQueue1.get()
# do something with this incoming job
print("Act1 forever")
async def run_act1_step1():
while True:
await asyncio.sleep(1.0)
# job = await inputQueue1.get()
# good, we have handled that first task; we're done
break
print("act1 step1 finished")
def run_act1_step2():
# note: this is sync, not async, so it's in a thread
# do whatever, here, and then exit when done
time.sleep(5)
print("Step2 finished")
class Activities:
def __init__(self, loop):
self.loop = loop
self.task1: asyncio.Task = None
self.task2: asyncio.Task = None
def run(self):
# run step 1 as a task
self.task1 = self.loop.create_task(run_act1_step1())
self.task1.add_done_callback(self.run2)
# also start act2, act3, etc here
def run2(self, fut):
try:
if fut.exception() is not None: # do nothing if task1 failed
return
except asyncio.CancelledError: # or if it was cancelled
return
# now run the sync step 2 in a thread
self.loop.run_in_executor(None, run_act1_step2)
# finally, run the async queue loop as a task
self.task2 = self.loop.create_task(run_act1_forever())
async def cancel_gracefully(self):
if self.task2 is not None:
# in this case, task1 has already finished without error
self.task2.cancel()
try:
await self.task2
except asyncio.CancelledError:
pass
elif self.task1 is not None:
self.task1.cancel()
try:
await self.task1
except asyncio.CancelledError:
pass
# also act2.cancel(), act3.cancel(), etc
def mainprogram():
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
acts = Activities(loop)
loop.call_soon(acts.run)
try:
loop.run_forever()
except KeyboardInterrupt:
pass
loop.run_until_complete(acts.cancel_gracefully())
if __name__ == "__main__":
mainprogram()
You can do this with a combination of threading events and asyncio events. You'll need two events, one to signal the first item has arrived. The thread will wait on this event, so it needs to be a threading Event. You'll also need one to signal the thread is finished. Your run_act1_forever coroutine will await this, so it will need to be an asyncio Event. You can then return the task for run_act1_forever normally and cancel it as you need.
Note that when setting the asyncio event from the separate thread you'll need to use loop.call_soon_threadsafe as asyncio Events are not thread safe.
import asyncio
import time
import threading
import functools
from asyncio import Queue, AbstractEventLoop
async def run_act1_forever(inputQueue1: Queue,
thread_done_event: asyncio.Event):
await thread_done_event.wait()
print('running forever')
while True:
job = await inputQueue1.get()
async def run_act1_step1(inputQueue1: Queue,
first_item_event: threading.Event):
print('Waiting for queue item')
job = await inputQueue1.get()
print('Setting event')
first_item_event.set()
def run_act1_step2(loop: AbstractEventLoop,
first_item_event: threading.Event,
thread_done_event: asyncio.Event):
print('Waiting for event...')
first_item_event.wait()
print('Got event, processing...')
time.sleep(5)
loop.call_soon_threadsafe(thread_done_event.set)
def run_activity_1(loop):
inputQueue1 = asyncio.Queue(loop=loop)
first_item_event = threading.Event()
thread_done_event = asyncio.Event(loop=loop)
loop.create_task(run_act1_step1(inputQueue1, first_item_event))
inputQueue1.put_nowait('First item to test the code')
loop.run_in_executor(None, functools.partial(run_act1_step2,
loop,
first_item_event,
thread_done_event))
return loop.create_task(run_act1_forever(inputQueue1, thread_done_event))
def mainprogram():
loop = asyncio.new_event_loop()
act1 = run_activity_1(loop)
# also start act2, act3, etc here
try:
loop.run_forever()
except KeyboardInterrupt:
pass
finally:
act1.cancel()
# also act2.cancel(), act3.cancel(), etc
loop.close()
mainprogram()

Pytest a function which invokes asyncio coroutines

I am trying to write a unit testcase using mock and pytest-asyncio. I have a normal function launching an asyncio coroutine using asyncio.run. [Using python3.7]
import asyncio
async def sample_async(arg2):
# do something
proc = await asyncio.create_subprocess_shell(arg2)
# some more asyncio calls
rc = proc.returncode
return rc
def launcher(arg1, arg2):
if arg1 == "no_asyncio":
# do something
print("No asyncio")
return 0
else:
return_code = asyncio.run(sample_async(arg2))
# do something
return return_code
I was able to write unittest for the asyncio function sample_async but not for launcher. Here is what I have tried:
class AsyncMock(MagicMock):
async def __call__(self, *args, **kwargs):
return super(AsyncMock, self).__call__(*args, **kwargs)
#patch("asyncio.create_subprocess_shell", new_callable=AsyncMock)
def test_launcher(async_shell):
arg1 = "async"
arg2 = "/bin/bash ls"
sample.launcher(arg1, arg2)
async_shell.assert_called_once()
When I try to run the pytest, I keep getting RuntimeError: There is no current event loop in thread 'MainThread'. I can't use #pytest.mark.asyncio for this test as the function launcher is not an asyncio coroutine. What am I doing wrong here?
See here about explicitly creating an event loop:
RuntimeError: There is no current event loop in thread in async + apscheduler
See the documentation for get_event_loop() in particular here:
https://docs.python.org/3/library/asyncio-eventloop.html
If there is no current event loop set in the current OS thread, the OS thread is main, and set_event_loop() has not yet been called, asyncio will create a new event loop and set it as the current one.
Likely, because your OS thread is not main, you need to do some manual init'ing of the loop. Make sure you have a clear mental picture of the event loop and what these functions do in its lifecycle, think of the loop as nothing more than an object with methods and other objects (tasks) it works on. As the docs say, the main OS thread does some "magic" for you that isn't really magic just "behind the scenes".

FastAPI awaits on task behaviour

Why FastAPI doesn't complain about the following task
app.state["my_task"] = asyncio.create_task(my_task())
not being awaited in any part of the code?
app is an instance of FastAPI() whereas app.state a simple dictionary.
I can even app.state["my_task"].cancel() in a shutdown callback.
From the asyncio docs:
When a coroutine function is called, but not awaited (e.g. coro() instead of await coro()) or the coroutine is not scheduled with asyncio.create_task(), asyncio will emit a RuntimeWarning
Thus, it is expected that, when using create_task(), no warning is emmited. This is no exclusive behavior to FastAPI. You can check it in the minimal snippet below:
import asyncio
async def task():
print("foobar")
async def main():
# task() # Emits a RuntimeWarning
# await task() # Does not emmit
x = asyncio.create_task(task()) # Doesn't either
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Edit PS: Calling the coroutine function as-is does not schedule the underlying task (and returns a coroutine object), while calling create_task() does (and returns a Task object). For more info, again, check the asyncio docs.

How to break out of an (asyncio) websocket fetch loop that doesn't have any incoming messages?

This code prints all messages from a websocket connection:
class OrderStreamer:
def __init__(ᬑ):
ᬑ.terminate_flag = False
# worker thread to receive data stream
ᬑ.worker_thread = threading.Thread(
target=ᬑ.worker_thread_func,
daemon=True
)
def start_streaming(ᬑ, from_scheduler = False):
ᬑ.worker_thread.start()
def terminate(ᬑ):
ᬑ.terminate_flag = True
def worker_thread_func(ᬑ):
asyncio.run(ᬑ.aio_func()) # blocks
async def aio_func(ᬑ):
async with \
aiohttp.ClientSession() as session, \
session.ws_connect(streams_url) as wsock, \
anyio.create_task_group() as tg:
async for msg in wsock:
print(msg.data)
if ᬑ.terminate_flag:
await wsock.close()
The problem is that if no messages arrive, the loop never gets the chance to check terminate_flag and never exits.
I tried creating an external reference to the runloop and websocket:
async with \
aiohttp.ClientSession() as session, \
session.ws_connect(streams_url) as wsock, \
anyio.create_task_group() as tg:
ᬑ.wsock = wsock
ᬑ.loop = asyncio.get_event_loop()
... and modifying my terminate function:
def terminate(ᬑ):
# ᬑ.loop.stop()
asyncio.set_event_loop(ᬑ.loop)
async def kill():
await ᬑ.wsock.close()
asyncio.run(kill())
... but it does not work.
I can't afford to rearchitect my entire application to use asyncio at this point in time.
How to break out of the loop?
You should use asyncio.wait_for or asyncio.wait and call wsock.__anext__() directly instead of using async for loop.
The loop with asyncio.wait should look something like this:
next_message = asyncio.create_task(wsock.__anext__())
while not self.terminate_flag:
await asyncio.wait([next_message], timeout=SOME_TIMEOUT,)
if next_message.done():
try:
msg = next_message.result()
except StopAsyncIteration:
break
else:
print(msg.data)
next_message = asyncio.create_task(wsock.__anext__())
SOME_TIMEOUT should be replaced with the amount of seconds you want to wait continuously for the next incoming message
Here is the documentation for asyncio.wait
P.S. I replaced ᬑ with self, but I hope you get the idea
Note that to read data you should not create a new task as mentioned here:
Reading from the WebSocket (await ws.receive()) must only be done inside the request handler task;
You can simply use timeout.
async def handler(request):
ws = web.WebSocketResponse() # or web.WebSocketResponse(receive_timeout=5)
await ws.prepare(request)
while True:
try:
msg = await ws.receive(timeout=5)
except asyncio.TimeoutError:
print('TimeoutError')
if your_terminate_flag is True:
break
aiohttp/web_protocol.py/_handle_request() will dump errors if you don't write try/except or don't catch the right exception. Try testing except Exception as err: or check its source code.

Why does using asyncio.ensure_future for long jobs instead of await run so much quicker?

I am downloading jsons from an api and am using the asyncio module. The crux of my question is, with the following event loop as implemented as this:
loop = asyncio.get_event_loop()
main_task = asyncio.ensure_future( klass.download_all() )
loop.run_until_complete( main_task )
and download_all() implemented like this instance method of a class, which already has downloader objects created and available to it, and thus calls each respective download method:
async def download_all(self):
""" Builds the coroutines, uses asyncio.wait, then sifts for those still pending, loops """
ret = []
async with aiohttp.ClientSession() as session:
pending = []
for downloader in self._downloaders:
pending.append( asyncio.ensure_future( downloader.download(session) ) )
while pending:
dne, pnding= await asyncio.wait(pending)
ret.extend( [d.result() for d in dne] )
# Get all the tasks, cannot use "pnding"
tasks = asyncio.Task.all_tasks()
pending = [tks for tks in tasks if not tks.done()]
# Exclude the one that we know hasn't ended yet (UGLY)
pending = [t for t in pending if not t._coro.__name__ == self.download_all.__name__]
return ret
Why is it, that in the downloaders' download methods, when instead of the await syntax, I choose to do asyncio.ensure_future instead, it runs way faster, that is more seemingly "asynchronously" as I can see from the logs.
This works because of the way I have set up detecting all the tasks that are still pending, and not letting the download_all method complete, and keep calling asyncio.wait.
I thought that the await keyword allowed the event loop mechanism to do its thing and share resources efficiently? How come doing it this way is faster? Is there something wrong with it? For example:
async def download(self, session):
async with session.request(self.method, self.url, params=self.params) as response:
response_json = await response.json()
# Not using await here, as I am "supposed" to
asyncio.ensure_future( self.write(response_json, self.path) )
return response_json
async def write(self, res_json, path):
# using aiofiles to write, but it doesn't (seem to?) support direct json
# so converting to raw text first
txt_contents = json.dumps(res_json, **self.json_dumps_kwargs);
async with aiofiles.open(path, 'w') as f:
await f.write(txt_contents)
With full code implemented and a real API, I was able to download 44 resources in 34 seconds, but when using await it took more than three minutes (I actually gave up as it was taking so long).
When you do await in each iteration of for loop it will await to download every iteration.
When you do ensure_future on the other hand it doesn't it creates task to download all the files and then awaits all of them in second loop.

Resources