asyncio.gather with selective return_exceptions - python-asyncio

I want asyncio.gather to immediately raise any exception except for one particular exception class, which should instead be returned in the results list. Right now I use a slightly modified copy of the asyncio.gather implementation from CPython, but I wonder whether there is a more canonical way to do it.

You can implement such semantics using the more powerful asyncio.wait primitive and its return_when=asyncio.FIRST_EXCEPTION option:
async def xgather(*coros, allowed_exc):
    results = {}
    pending = futures = list(map(asyncio.ensure_future, coros))
    while pending:
        done, pending = await asyncio.wait(
            pending, return_when=asyncio.FIRST_EXCEPTION)
        for fut in done:
            try:
                results[fut] = fut.result()
            except allowed_exc as e:
                results[fut] = e
    return [results[fut] for fut in futures]
The idea is to call wait until either all futures are done or an exception is observed. The exception is in turn either stored or propagated, depending on whether it matches allowed_exc. If all the results and allowed exceptions have been successfully collected, they are returned in the correct order, as with asyncio.gather.
The approach of modifying the implementation of asyncio.gather might easily fail on a newer Python version, since the code accesses private attributes of Future objects. Also, alternative event loops like uvloop could make their gather and wait more efficient, which would automatically benefit an xgather based on the public API.
Test code:
import asyncio

async def fail():
    1/0

async def main():
    print(await xgather(asyncio.sleep(1), fail(), allowed_exc=OSError))

# asyncio.run creates and closes the event loop for us
asyncio.run(main())
When run, the code raises immediately, which is expected because ZeroDivisionError doesn't match the allowed OSError exception. Changing OSError to ZeroDivisionError causes the code to sleep for 1 second and output [None, ZeroDivisionError('division by zero')].
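One detail the xgather above leaves open is what happens to futures that are still pending when a non-allowed exception propagates: asyncio.wait does not cancel them, so they keep running. The following is a minimal sketch of a variant that cancels them, assuming that is the behavior you want. The name xgather_cancelling and the demo in main are illustrative, not part of asyncio.

```python
import asyncio


async def xgather_cancelling(*coros, allowed_exc):
    """Like xgather, but cancel the remaining futures if we bail out early."""
    results = {}
    pending = futures = list(map(asyncio.ensure_future, coros))
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_EXCEPTION)
            for fut in done:
                try:
                    results[fut] = fut.result()
                except allowed_exc as e:
                    results[fut] = e
    except BaseException:
        # A non-allowed exception (or our own cancellation) is propagating:
        # cancel everything; cancel() is a no-op on futures already done.
        for fut in futures:
            fut.cancel()
        raise
    return [results[fut] for fut in futures]


async def fail():
    1 / 0


async def main():
    slow = asyncio.ensure_future(asyncio.sleep(10))
    try:
        await xgather_cancelling(slow, fail(), allowed_exc=OSError)
    except ZeroDivisionError:
        # Give the loop a chance to deliver the cancellation to the task.
        await asyncio.gather(slow, return_exceptions=True)
        print("slow task cancelled:", slow.cancelled())


if __name__ == "__main__":
    asyncio.run(main())
```

Whether to cancel is a design choice: asyncio.gather itself leaves the other awaitables running when one of them raises, so the original xgather mirrors that behavior more closely.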

Related

How do I control an event loop?

I can't figure out how to handle an event loop such that I can run other code concurrently. I want the handler to print data when it receives it, without affecting anything else the program is doing.
I have tried wrapping trading_stream.run in an asyncio task, but this produces an error and isn't what I really want. It's like once I run the stream, my program is stuck inside the update_handler function.
from alpaca.trading.stream import TradingStream

trading_stream = TradingStream('api-key', 'secret-key', paper=True)

async def update_handler(data):
    # trade updates will arrive in our async handler
    print(data)

# subscribe to trade updates and supply the handler as a parameter
trading_stream.subscribe_trade_updates(update_handler)

# start our websocket streaming
trading_stream.run()
Premise: it would probably be best to find out which event loop TradingStream is using and, if possible, schedule tasks on that loop once you have retrieved it, e.g.
trading_stream = TradingStream('api-key', 'secret-key', paper=True)
# some_evt_loop_getter is a placeholder, not a real TradingStream method
evt_loop = trading_stream.some_evt_loop_getter()
# my_concurrent_task is assumed to be a coroutine function
evt_loop.create_task(my_concurrent_task())
If TradingStream is using asyncio.get_event_loop() under the hood, then the following is also possible.
import asyncio
trading_stream = TradingStream('api-key', 'secret-key', paper=True)
evt_loop = asyncio.get_event_loop()
evt_loop.create_task(my_concurrent_task())
If you cannot determine whether either of the above applies, the following hack does solve your problem, but I would not resort to it unless the alternatives are not viable.
import asyncio

OTHER_LOGIC_FLAG = True

async def my_other_async_logic():
    # Concurrent logic here
    ...

async def update_handler(data):
    global OTHER_LOGIC_FLAG
    if OTHER_LOGIC_FLAG:
        # Start the concurrent logic once, on the loop that runs the handler
        asyncio.create_task(my_other_async_logic())
        OTHER_LOGIC_FLAG = False
    # trade updates will arrive in our async handler
    print(data)
Again, do try to get a handle to the event loop first.
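To illustrate the preferred approach (scheduling the extra work on the same loop that drives the handler), here is a minimal sketch that uses plain asyncio only. fake_stream is a hypothetical stand-in for the streaming client, not the Alpaca API, and the sketch assumes the real stream runs on an asyncio loop you can get hold of.

```python
import asyncio


async def fake_stream(handler, n=3):
    # Hypothetical stand-in for a streaming client: calls the handler
    # once per incoming message.
    for i in range(n):
        await asyncio.sleep(0.01)
        await handler({"update": i})


async def demo():
    seen = []   # messages received by the handler
    ticks = []  # work done by the concurrent task

    async def update_handler(data):
        seen.append(data)

    async def other_logic():
        # Runs concurrently with the stream on the same loop.
        while True:
            ticks.append(len(ticks))
            await asyncio.sleep(0.005)

    loop = asyncio.get_running_loop()
    background = loop.create_task(other_logic())
    await fake_stream(update_handler)

    # Stop the background work once the stream has ended.
    background.cancel()
    await asyncio.gather(background, return_exceptions=True)
    return seen, ticks


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The handler and the background task interleave because both yield to the loop at every await; neither blocks the other.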

When and how to use asyncio queues?

I have multiple API routes, each of which returns data by querying the database individually.
Now I'm trying to build a dashboard which queries the above APIs. How should I put the API calls in the queue so that they are executed asynchronously?
I tried
await queue.put({'response_1': await api_1(**kwargs), 'response_2': await api_2(**kwargs)})
It seems as though data is returned while task is being put in the queue.
Now I'm using
await queue.put(('response_1', api_1(**args_dict)))
in the producer, and in the consumer I'm parsing the tuple and making the API calls, which I think I'm doing wrong.
Question1
Is there a better way to do it?
This is code I'm using to create tasks
producers = [create_task(producer(queue, **args_dict)) for row in stats]
consumers = [create_task(consumer(queue)) for row in stats]
await gather(*producers)
await queue.join()
for con in consumers:
    con.cancel()
Question2 Should I use create_task or ensure_future? Sorry if it's repetitive but I can't understand the difference and after searching online I became more confused.
I'm using FastAPI, databases(async) packages.
I'm using tuple instead of dictionary like await queue.put('response_1', api_1(**kwargs))
./app/dashboard.py:90: RuntimeWarning: coroutine 'api_1' was never awaited
item: Tuple = await queue.get_nowait()
My code for consumer is
async def consumer(return_obj: dict, que: Queue):
    item: Tuple = await queue.get_nowait()
    print(f'consumer took {item[0]} from queue')
    return_obj.update({f'{item[0]}': await item[1]})
    await queue.task_done()
If I don't use get_nowait, the consumer gets stuck because the queue may be empty,
but if I use get_nowait, the above error is shown.
I haven't defined a max queue length.
-----------EDIT-----------
Producer
async def producer(queue: Queue, **kwargs):
    await queue.put('response_1', api_1(**kwargs))
You can drop the await from your first snippet and send the coroutine object in the queue. A coroutine object is what you get by calling a coroutine function without awaiting the result; it does not start running until it is awaited or wrapped in a task.
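# producer:
await queue.put({'response_1': api_1(**kwargs),
                 'response_2': api_2(**kwargs)})
...
# consumer:
while True:
    dct = await queue.get()
    # iterate over (name, coroutine) pairs; iterating the dict itself yields only keys
    for name, api_coro in dct.items():
        result = await api_coro
        print('result of', name, ':', result)
    # mark the item as processed so that queue.join() can return
    queue.task_done()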
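Putting the pieces together, a complete producer/consumer round trip might look like the sketch below. api_1 and api_2 are hypothetical stand-ins for the real route functions, and dashboard is an illustrative name; the structure (put coroutine objects, await them in the consumer, join the queue, then cancel the consumers) follows the question's own setup.

```python
import asyncio


async def api_1(x):
    # Hypothetical stand-in for a real API route.
    await asyncio.sleep(0.01)
    return x + 1


async def api_2(x):
    # Hypothetical stand-in for a real API route.
    await asyncio.sleep(0.01)
    return x * 10


async def producer(queue, **kwargs):
    # Enqueue (name, coroutine object) pairs; nothing runs yet.
    await queue.put(("response_1", api_1(**kwargs)))
    await queue.put(("response_2", api_2(**kwargs)))


async def consumer(queue, results):
    while True:
        # Blocks while the queue is empty; the task is cancelled at the end.
        name, api_coro = await queue.get()
        try:
            results[name] = await api_coro
        finally:
            # task_done() is a plain method, not a coroutine.
            queue.task_done()


async def dashboard(**kwargs):
    queue = asyncio.Queue()
    results = {}
    consumers = [asyncio.create_task(consumer(queue, results))
                 for _ in range(2)]
    await producer(queue, **kwargs)
    await queue.join()  # wait until every queued item has been processed
    for con in consumers:
        con.cancel()
    await asyncio.gather(*consumers, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(asyncio.run(dashboard(x=1)))
```

Note that the consumer uses await queue.get() rather than get_nowait(): get_nowait() is a regular method that raises QueueEmpty on an empty queue, so it must not be awaited, while a consumer blocked in get() is simply cancelled once queue.join() has returned.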
Should I use create_task or ensure_future?
If the argument is the result of invoking a coroutine function, you should use create_task (see this comment by Guido for explanation). As the name implies, it will return a Task instance that drives that coroutine. The task can be awaited, but it runs in the background whether or not you await it.
ensure_future is a much more specialized function that converts various kinds of awaitable objects to their corresponding futures. It is useful when implementing functions like asyncio.gather(), which accept different kinds of awaitable objects for convenience and need to convert them into futures before working with them.
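The difference can be seen in a small sketch (compute and demo are illustrative names): create_task accepts only coroutine objects, while ensure_future wraps a coroutine in a Task and returns an existing Future unchanged.

```python
import asyncio


async def compute():
    await asyncio.sleep(0)
    return 42


async def demo():
    # create_task: coroutine object in, Task out.
    task = asyncio.create_task(compute())

    # ensure_future: a coroutine object is wrapped in a Task as well...
    wrapped = asyncio.ensure_future(compute())

    # ...but an existing Future is returned as-is.
    fut = asyncio.get_running_loop().create_future()
    same = asyncio.ensure_future(fut)

    # create_task refuses anything that is not a coroutine.
    try:
        asyncio.create_task(fut)
        rejected = False
    except TypeError:
        rejected = True

    fut.set_result("ready")
    return {
        "task_is_task": isinstance(task, asyncio.Task),
        "wrapped_is_task": isinstance(wrapped, asyncio.Task),
        "same_future": same is fut,
        "create_task_rejects_future": rejected,
        "values": (await task, await wrapped, await same),
    }


if __name__ == "__main__":
    print(asyncio.run(demo()))
```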

How to return a value to a function then starts a thread in the same function?

Is there a way to return a value from a function and then start a thread in that same function? For example:
def foo
  return fast_function
  Thread.new do
    slow_function
  end
end
The reason behind this is that both fast_function and slow_function write to the same resource. But I want to ensure that fast_function runs and completes first, and returns its value to foo before slow_function writes to the shared resource. There are some cases where slow_function completes before fast_function and I am hit with a race condition.
EDIT:
More context on the problem: this is related to server-side events I am trying to implement. I am trying to get fast_function to compute an event id and return the HTML, while slow_function is responsible for notifying the client via the event id that the process is done. However, in some cases, slow_function notifies the client before the client even knows where to listen, because fast_function has not returned the event id yet.
No, a return will exit the function; it would also exit the function from inside a yield block. In my opinion there are multiple solutions to this problem.
Actually, it would be a perfect fit for a Promise from Concurrent Ruby (https://github.com/ruby-concurrency/concurrent-ruby).
You could use it somewhat like this:
def foo
  fast = Concurrent::Promise.execute { fast_function }
  # chain the slow function so that it only starts after the fast one
  slow = fast.then { slow_function }
             .on_success { notify_client }
  return fast.value
end
As you can guess, it will return the value of your fast function.
But it will also call the on_success block (or a proc) once the slow function has finished. Most importantly, it will guarantee the order.
NOTE: I am not sure I understood you correctly. If you want to start both threads at the same time but ensure that the fast one has finished first, you can do something like this:
fast = Concurrent::Promise.execute { fast_function }
slow = Concurrent::Promise.execute { slow_function }
render fast.value # Or whatever you do with the HTML.
# .value will wait for the Promise to finish.
result_slow = slow.value
This way you would start both functions in parallel, but be sure to get the answer from the fast one first.
Edit 1: I thought about this, and I am not really sure whether you want an asynchronous task at all. It is hard to tell since you posted a minimal example (which is the right thing to do, of course).
If you just want a function which returns the results of both functions in the right order, you could just use yield:
def foo
  yield fast_function
  yield slow_function
end

EventMachine and looping

Here is my code:
EventMachine.run {
  conn = EM::Protocols::HttpClient2.connect request.host, 80
  req = conn.get(request.query)
  req.callback { |response|
    p(response.status)
    p(response.headers)
    p(response.content)
  }
}
The callbacks fire, that is, I get the string outputs of the status, etc.
But what I want it to do is fire the callbacks, then repeat. There is more logic I plan to implement, such as tweaking the URL each time, but for now, I just want it to:
Retrieve the URL
Fire the callbacks
Repeat...
My understanding about this pattern was that everything in that loop fires, then returns, then goes on forever until I do an EM.stop.
Right now, it retrieves the URL data, and just seems to hang.
Do I need to do a return of some sort to continue here? Why is it hanging, and not looping over and over?
If I surround the entire above code block with a loop do ... end, it works as expected. Is that the correct way to implement this? I suppose I am confused, as I thought everything within EM.run repeats when it completes.
The run block you give runs only once. The event loop is not exposed directly to you but is something that's intended to be invisible. Don't confuse the run block with a while loop. It's run once and once only, but it is run while the event loop is executing.
If you want to repeat an operation you need to create some kind of a stack and work through that, with each callback checking the stack if there's more work to do and then issuing another call. EventMachine applications are built using this callback-chaining method.
You will need to implement something like:
def do_stuff(queue, request = nil)
  request ||= queue.pop
  return unless (request)
  conn = EM::Protocols::HttpClient2.connect request.host, 80
  req = conn.get(request.query)
  req.callback { |response|
    p(response.status)
    p(response.headers)
    p(response.content)
    EventMachine.next_tick do
      # This schedules an operation to be performed the next time through
      # the event-loop. Usually this is almost immediate.
      do_stuff(queue)
    end
  }
end
Inside your event loop you kick off this chain:
EventMachine.run do
  queue = [ ... ] # List of things to do
  do_stuff(queue)
end
You can probably find a more elegant way to implement this once you get a better sense of how EventMachine works.

MonitorMixin condition variable -> deadlock

I have a synchronized queue that provides a condition variable.
That condition variable signals when data is added to the queue.
I have 5 threads:
Thread.new do
  loop do
    @queue.synchronize {
      cond.wait_until { @queue.has_data? || @queue.finished? }
    }
    # some processing code that can also call @queue.enqueue
  end
end
Then I do:
@queue.enqueue some_data
@threads.each(&:join)
MyQueue#enqueue looks like this:
def enqueue(data)
  synchronize do
    @pending << data unless queued?(data) || processed?(data) || processing?(data)
    data_cond.signal
  end
end

def finished?
  @started && @processing.empty? && @pending.empty?
end

def has_data?
  !@pending.empty?
end
And I get on #join
deadlock detected
How exactly does this cause a deadlock and how would one fix it?
I wonder if the problem is that all of the threads are blocked on the same condition variable, and there isn't a thread available to enqueue data, which would release the other threads.
Based on the comment in this code:
Thread.new do
  loop do
    @queue.synchronize {
      cond.wait_until { @queue.has_data? || @queue.finished? }
    }
    # some processing code that can also call @queue.enqueue
  end
end
Your comment mentions "some processing code that can also call @queue.enqueue". Is this the only place where @queue.enqueue is called? If so, then all of the threads will be blocked on the condition variable and none will be able to reach the point where it could call enqueue. I'm sure Ruby can detect that all threads are blocked on the same entity and none is available to release it, hence the deadlock.
If you do indeed have a separate thread that only enqueues (which would be a typical producer/consumer situation), make sure that it doesn't also wait on the condition variable, which could also cause a deadlock.
It's a little hard to help you because you are only posting code fragments...
You should try the work_queue gem, or at least take a look at the source code.
There is no need to wait for has_data? || finished? inside the synchronize block. The code should look like:
Thread.new do
  loop do
    cond.wait_until { @queue.has_data? || @queue.finished? }
    enq = nil
    @queue.synchronize {
      enq = @queue.pop
    }
    # some processing code that can also call @queue.enqueue
  end
end
In that case you block other threads only while operating on the queue contents. What you need to do is synchronize on queue state changes, such as finished.
A better solution is to wrap all thread-critical variables with a mutex, like here in rails. It will make the code a little slower, since it eliminates simultaneous variable access.

Resources