How does work varibles with tasks in asyncio? - python-asyncio

During learning asyncio, I maked misstake, and make link to variable "task1" three times.
import asyncio
import time
async def say_after(delay, what):
await asyncio.sleep(delay)
print(what)
async def main():
print(f"start at {time.strftime('%X')}")
task1 = asyncio.create_task(say_after(1, 'a'))
task1 = asyncio.create_task(say_after(2, 'aba'))
task1 = asyncio.create_task(say_after(3, 'faf'))
await task1
#await task2
print(f"finished at {time.strftime('%X')}")
asyncio.run(main())
But when I ran code, I got this output
start at 19:05:19
a
aba
faf
finished at 19:05:22
Why it don't print only last created task?
Why first and second tasks don't cleared?
In clear python, if I write some like this:
page = 0
page = 1
page = 2
print(page)
I will get "2", because previous links to the objects will removed from the memmory.
Why there it's different? How it works?

You are starting three tasks and waiting for the slowest of the three to complete.
await task1 is not running the task, it is waiting for the third task object assigned to task1 to finish. The other two tasks still exist in the event loop and are still running, these finish first because they are faster.
Another way to look at this is to slightly modify your code. This will do what you expect to start with, and then throw errors because the program finishes before the slower tasks.
task1 = asyncio.create_task(say_after(3, 'a')) # make slowest
task1 = asyncio.create_task(say_after(2, 'aba'))
task1 = asyncio.create_task(say_after(1, 'faf')) # make fastest
Rewriting your code is to remove task1= entirely to make it clear that await task1 has more to do with waiting for the task to finish and little to do with making it run.
asyncio.create_task(say_after(1, 'a'))
asyncio.create_task(say_after(2, 'aba'))
asyncio.create_task(say_after(3, 'faf'))
while len(asyncio.all_tasks()) > 1:
await asyncio.sleep(0.1)
# await task1
A more usual way is to collect references to all tasks and await the list of tasks being complete.
tasks = []
tasks.append(asyncio.create_task(say_after(3, 'a')))
tasks.append(asyncio.create_task(say_after(2, 'aba')))
tasks.append(asyncio.create_task(say_after(1, 'faf')))
await asyncio.gather(*tasks)

Related

Python asyncio: awaiting a future you don't have yet

Imagine that I have a main program which starts many async activities which all wait on queues to do jobs, and then on ctrl-C properly closes them all down: it might look something like this:
async def run_act1_forever():
# this is the async queue loop
while True:
job = await inputQueue1.get()
# do something with this incoming job
def run_activity_1(loop):
# run the async queue loop as a task
coro = loop.create_task(run_act1_forever())
return coro
def mainprogram():
loop = asyncio.get_event_loop()
act1 = run_activity_1(loop)
# also start act2, act3, etc here
try:
loop.run_forever()
except KeyboardInterrupt:
pass
finally:
act1.cancel()
# also act2.cancel(), act3.cancel(), etc
loop.close()
This all works fine. However, starting up activity 1 is actually more complex than this; it happens in three parts. Part 1 is to wait on the queue until a particular job comes in, one time; part 2 is a synchronous part which has to run in a thread with run_in_executor, one time, and then part 3 is the endless waiting on the queue for jobs as above. How do I structure this? My initial thought was:
async def run_act1_forever():
# this is the async queue loop
while True:
job = await inputQueue1.get()
# do something with this incoming job
async def run_act1_step1():
while True:
job = await inputQueue1.get()
# good, we have handled that first task; we're done
break
def run_act1_step2():
# note: this is sync, not async, so it's in a thread
# do whatever, here, and then exit when done
time.sleep(5)
def run_activity_1(loop):
# run step 1 as a task
step1 = loop.create_task(run_act1_step1())
# ERROR! See below
# now run the sync step 2 in a thread
self.loop.run_in_executor(None, run_act1_step2())
# finally, run the async queue loop as a task
coro = loop.create_task(run_act1_forever())
return coro
def mainprogram():
loop = asyncio.get_event_loop()
act1 = run_activity_1(loop)
# also start act2, act3, etc here
try:
loop.run_forever()
except KeyboardInterrupt:
pass
finally:
act1.cancel()
# also act2.cancel(), act3.cancel(), etc
loop.close()
but this does not work, because at the point where we say "ERROR!", we need to await the step1 task and we never do. We can't await it, because run_activity_1 is not an async function. So... what should I do here?
I thought about getting the Future back from calling run_act1_step1() and then using future.add_done_callback to handle running steps 2 and 3. However, if I do that, then run_activity_1() can't return the future generated by run_act1_forever(), which means that mainprogram() can't cancel that run_act1_forever() task.
I thought of generating an "empty" Future in run_activity_1() and returning that, and then making that empty Future "chain" to the Future returned by run_act1_forever(). But Python asyncio doesn't support chaining Futures.
You say that things are difficult because run_activity_1 is not an async function, but don't really detail why it can't be async.
async def run_activity_1(loop):
await run_act1_step1()
await loop.run_in_executor(None, run_act1_step2)
await run_act1_forever()
The returned coroutine won't be the same as the one returned by run_act1_forever(), but cancellation should propagate if you've got as far as executing that step.
With this change, run_activity_1 is no longer returning a task, so the invocation inside mainprogram would need to change to:
act1 = loop.create_task(run_activity_1(loop))
I think you were on the right track when you said, "I thought about getting the Future back from calling run_act1_step1() and then using future.add_done_callback to handle running steps 2 and 3." That's the logical way to structure this application. You have to manage the various returned objects correctly, but a small class solves this problem.
Here is a program similar to your second code snippet. It runs (tested with Python3.10) and handles Ctrl-C gracefully.
Python3.10 issues a deprecation warning when the function asyncio.get_event_loop() is called without a running loop, so I avoided doing that.
Activities.run() creates task1, then attaches a done_callback that starts task2 and the rest of the activities. The Activities object keeps track of task1 and task2 so they can be cancelled. The main program keeps a reference to Activities, and calls cancel_gracefully() to do the right thing, depending on how far the script progressed through the sequence of start-up activities.
Some care needs to be taken to catch the CancelledExceptions; otherwise stuff gets printed on the console when the program terminates.
The important difference between this program and your second code snippet is that this program immediately stores task1 and task2 in variables so they can be accessed later. Therefore they can be cancelled any time after their creation. The done_callback trick is used to launch all the steps in the proper order.
#! python3.10
import asyncio
import time
async def run_act1_forever():
# this is the async queue loop
while True:
await asyncio.sleep(1.0)
# job = await inputQueue1.get()
# do something with this incoming job
print("Act1 forever")
async def run_act1_step1():
while True:
await asyncio.sleep(1.0)
# job = await inputQueue1.get()
# good, we have handled that first task; we're done
break
print("act1 step1 finished")
def run_act1_step2():
# note: this is sync, not async, so it's in a thread
# do whatever, here, and then exit when done
time.sleep(5)
print("Step2 finished")
class Activities:
def __init__(self, loop):
self.loop = loop
self.task1: asyncio.Task = None
self.task2: asyncio.Task = None
def run(self):
# run step 1 as a task
self.task1 = self.loop.create_task(run_act1_step1())
self.task1.add_done_callback(self.run2)
# also start act2, act3, etc here
def run2(self, fut):
try:
if fut.exception() is not None: # do nothing if task1 failed
return
except asyncio.CancelledError: # or if it was cancelled
return
# now run the sync step 2 in a thread
self.loop.run_in_executor(None, run_act1_step2)
# finally, run the async queue loop as a task
self.task2 = self.loop.create_task(run_act1_forever())
async def cancel_gracefully(self):
if self.task2 is not None:
# in this case, task1 has already finished without error
self.task2.cancel()
try:
await self.task2
except asyncio.CancelledError:
pass
elif self.task1 is not None:
self.task1.cancel()
try:
await self.task1
except asyncio.CancelledError:
pass
# also act2.cancel(), act3.cancel(), etc
def mainprogram():
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
acts = Activities(loop)
loop.call_soon(acts.run)
try:
loop.run_forever()
except KeyboardInterrupt:
pass
loop.run_until_complete(acts.cancel_gracefully())
if __name__ == "__main__":
mainprogram()
You can do this with a combination of threading events and asyncio events. You'll need two events, one to signal the first item has arrived. The thread will wait on this event, so it needs to be a threading Event. You'll also need one to signal the thread is finished. Your run_act1_forever coroutine will await this, so it will need to be an asyncio Event. You can then return the task for run_act1_forever normally and cancel it as you need.
Note that when setting the asyncio event from the separate thread you'll need to use loop.call_soon_threadsafe as asyncio Events are not thread safe.
import asyncio
import time
import threading
import functools
from asyncio import Queue, AbstractEventLoop
async def run_act1_forever(inputQueue1: Queue,
thread_done_event: asyncio.Event):
await thread_done_event.wait()
print('running forever')
while True:
job = await inputQueue1.get()
async def run_act1_step1(inputQueue1: Queue,
first_item_event: threading.Event):
print('Waiting for queue item')
job = await inputQueue1.get()
print('Setting event')
first_item_event.set()
def run_act1_step2(loop: AbstractEventLoop,
first_item_event: threading.Event,
thread_done_event: asyncio.Event):
print('Waiting for event...')
first_item_event.wait()
print('Got event, processing...')
time.sleep(5)
loop.call_soon_threadsafe(thread_done_event.set)
def run_activity_1(loop):
inputQueue1 = asyncio.Queue(loop=loop)
first_item_event = threading.Event()
thread_done_event = asyncio.Event(loop=loop)
loop.create_task(run_act1_step1(inputQueue1, first_item_event))
inputQueue1.put_nowait('First item to test the code')
loop.run_in_executor(None, functools.partial(run_act1_step2,
loop,
first_item_event,
thread_done_event))
return loop.create_task(run_act1_forever(inputQueue1, thread_done_event))
def mainprogram():
loop = asyncio.new_event_loop()
act1 = run_activity_1(loop)
# also start act2, act3, etc here
try:
loop.run_forever()
except KeyboardInterrupt:
pass
finally:
act1.cancel()
# also act2.cancel(), act3.cancel(), etc
loop.close()
mainprogram()

Python asyncio, aiohttp: dynamically updating tasks loop should work until particular condition

I have simple script for work with api. I create number_of_user users. I go thru loop, create random user create_random_user(), for every user I create task and append task to loop. Task for user creation async def fetch_user_create after getting response create another task for user log in async def fetch_user_log and add it to all tasks. My question: how can I wait until, for example, len(tasks_user) == 2 * number_of_user.
I tried to place await asyncio.sleep(1) - it's work but depends of number_of_user.
Target: wait until condition Is it possible? Or what I did wrong?
async def fetch_user_log(session, data_user):
async with session.post(url_user_login, data=data_user) as response:
async def fetch_user_create(session, data_user, tasks_user_create):
async with session.post(url_user_create, data=data_user) as response:
task2 = asyncio.ensure_future(fetch_user_log(session, data_user))
tasks_user.append(task2)
#await asyncio.sleep(1)
await response.read()
#await asyncio.gather(*tasks_user) - tried
async def run():
tasks_user = []
async with ClientSession() as session:
for i in range(number_of_user):
data_user = create_random_user()
task = asyncio.ensure_future(fetch_user_create(session, data_user, tasks_user_create))
tasks_user.append(task)
await asyncio.wait(tasks_user)
#await asyncio.gather(*tasks_user) - tried
loop = asyncio.get_event_loop()
future = asyncio.ensure_future(run())
loop.run_until_complete(future)
wait until condition can be implemented with asyncio.Condition:
import asyncio
async def test_wait_for(cond, tasks):
print(".")
async with cond:
await cond.wait_for(lambda: (len(tasks)>3))
print("!")
async def test_add_task(cond, tasks):
for i in range(6):
print(i)
await asyncio.sleep(1)
async with cond:
tasks.append(i)
cond.notify_all()
async def run():
cond = asyncio.Condition()
tasks = []
asyncio.ensure_future(test_wait_for(cond, tasks))
await test_add_task(cond, tasks)
loop = asyncio.get_event_loop()
future = asyncio.ensure_future(run())
loop.run_until_complete(future)
Condition is Event+Lock. Event sent with notify_all unlocks all wait and wait_for coroutines. Lock here also locks tasks array.
Also it can be implemented with asyncio.Event if you move condition to task that send notify.

Asyncio task vs coroutine

Reading the asyncio documentation, I realize that I don't understand a very basic and fundamental aspect: the difference between awaiting a coroutine directly, and awaiting the same coroutine when it's wrapped inside a task.
In the documentation examples the two calls to the say_after coroutine are running sequentially when awaited without create_task, and concurrently when wrapped in create_task. So I understand that this is basically the difference, and that it is quite an important one.
However what confuses me is that in the example code I read everywhere (for instance showing how to use aiohttp), there are many places where a (user-defined) coroutine is awaited (usually in the middle of some other user-defined coroutine) without being wrapped in a task, and I'm wondering why that is the case. What are the criteria to determine when a coroutine should be wrapped in a task or not?
What are the criteria to determine when a coroutine should be wrapped in a task or not?
You should use a task when you want your coroutine to effectively run in the background. The code you've seen just awaits the coroutines directly because it needs them running in sequence. For example, consider an HTTP client sending a request and waiting for a response:
# these two don't make too much sense in parallel
await session.send_request(req)
resp = await session.read_response()
There are situations when you want operations to run in parallel. In that case asyncio.create_task is the appropriate tool, because it turns over the responsibility to execute the coroutine to the event loop. This allows you to start several coroutines and sit idly while they execute, typically waiting for some or all of them to finish:
dl1 = asyncio.create_task(session.get(url1))
dl2 = asyncio.create_task(session.get(url2))
# run them in parallel and wait for both to finish
resp1 = await dl1
resp2 = await dl2
# or, shorter:
resp1, resp2 = asyncio.gather(session.get(url1), session.get(url2))
As shown above, a task can be awaited as well. Just like awaiting a coroutine, that will block the current coroutine until the coroutine driven by the task has completed. In analogy to threads, awaiting a task is roughly equivalent to join()-ing a thread (except you get back the return value). Another example:
queue = asyncio.Queue()
# read output from process in an infinite loop and
# put it in a queue
async def process_output(cmd, queue, identifier):
proc = await asyncio.create_subprocess_shell(cmd)
while True:
line = await proc.readline()
await queue.put((identifier, line))
# create multiple workers that run in parallel and pour
# data from multiple sources into the same queue
asyncio.create_task(process_output("top -b", queue, "top")
asyncio.create_task(process_output("vmstat 1", queue, "vmstat")
while True:
identifier, output = await queue.get()
if identifier == 'top':
# ...
In summary, if you need the result of a coroutine in order to proceed, you should just await it without creating a task, i.e.:
# this is ok
resp = await session.read_response()
# unnecessary - it has the same effect, but it's
# less efficient
resp = await asyncio.create_task(session.read_reponse())
To continue with the threading analogy, creating a task just to await it immediately is like running t = Thread(target=foo); t.start(); t.join() instead of just foo() - inefficient and redundant.

Does await guarantee execution order?

Consider a single-threaded Python program. A coroutine named "first" is blocked on I/O. The subsequent instruction is "await second." Is the coroutine "second" guaranteed to execute immediately until it blocks on I/O? Or, can "first" resume executing (due to the I/O operation completing) before "second" is invoked?
Asyncio implemented a way that second would start executing until it would return control to event loop (it usually happens when it reaches some I/O operation) and only after it first can be resumed. I don't think it somehow guaranteed to you, but hardly believe this implementation will be changed either.
If for some reason you don't want first to resume executing until some part of second reached, it's probably better explicitly to use Lock to block first from executing before moment you want.
Example to show when control returns to event loop and execution flow can be changed:
import asyncio
async def async_print(text):
print(text)
async def first():
await async_print('first 1')
await async_print('first 2')
await asyncio.sleep(0) # returning control to event loop
await async_print('first 3')
async def second():
await async_print('second 1')
await async_print('second 2')
await asyncio.sleep(0) # returning control to event loop
await async_print('second 3')
async def main():
asyncio.ensure_future(first())
asyncio.ensure_future(second())
await asyncio.sleep(1)
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
loop.run_until_complete(main())
finally:
loop.run_until_complete(loop.shutdown_asyncgens())
loop.close()
Output:
first 1
first 2
second 1
second 2
first 3
second 3

Why does using asyncio.ensure_future for long jobs instead of await run so much quicker?

I am downloading jsons from an api and am using the asyncio module. The crux of my question is, with the following event loop as implemented as this:
loop = asyncio.get_event_loop()
main_task = asyncio.ensure_future( klass.download_all() )
loop.run_until_complete( main_task )
and download_all() implemented like this instance method of a class, which already has downloader objects created and available to it, and thus calls each respective download method:
async def download_all(self):
""" Builds the coroutines, uses asyncio.wait, then sifts for those still pending, loops """
ret = []
async with aiohttp.ClientSession() as session:
pending = []
for downloader in self._downloaders:
pending.append( asyncio.ensure_future( downloader.download(session) ) )
while pending:
dne, pnding= await asyncio.wait(pending)
ret.extend( [d.result() for d in dne] )
# Get all the tasks, cannot use "pnding"
tasks = asyncio.Task.all_tasks()
pending = [tks for tks in tasks if not tks.done()]
# Exclude the one that we know hasn't ended yet (UGLY)
pending = [t for t in pending if not t._coro.__name__ == self.download_all.__name__]
return ret
Why is it, that in the downloaders' download methods, when instead of the await syntax, I choose to do asyncio.ensure_future instead, it runs way faster, that is more seemingly "asynchronously" as I can see from the logs.
This works because of the way I have set up detecting all the tasks that are still pending, and not letting the download_all method complete, and keep calling asyncio.wait.
I thought that the await keyword allowed the event loop mechanism to do its thing and share resources efficiently? How come doing it this way is faster? Is there something wrong with it? For example:
async def download(self, session):
async with session.request(self.method, self.url, params=self.params) as response:
response_json = await response.json()
# Not using await here, as I am "supposed" to
asyncio.ensure_future( self.write(response_json, self.path) )
return response_json
async def write(self, res_json, path):
# using aiofiles to write, but it doesn't (seem to?) support direct json
# so converting to raw text first
txt_contents = json.dumps(res_json, **self.json_dumps_kwargs);
async with aiofiles.open(path, 'w') as f:
await f.write(txt_contents)
With full code implemented and a real API, I was able to download 44 resources in 34 seconds, but when using await it took more than three minutes (I actually gave up as it was taking so long).
When you do await in each iteration of for loop it will await to download every iteration.
When you do ensure_future on the other hand it doesn't it creates task to download all the files and then awaits all of them in second loop.

Resources