What is the use case for future.add_done_callback()? - python-asyncio

I understand how to add a callback method to a future and have it called when the future is done. But why is this helpful when you can already call functions from inside coroutines?
Callback version:
def bar(future):
    # do stuff using future.result()
    ...

async def foo(future):
    await asyncio.sleep(3)
    future.set_result(1)

loop = asyncio.get_event_loop()
future = loop.create_future()
future.add_done_callback(bar)
loop.run_until_complete(foo(future))
Alternative:
async def foo():
    await asyncio.sleep(3)
    bar(1)

loop = asyncio.get_event_loop()
loop.run_until_complete(foo())
When would the second version not be available/suitable?

In the code as shown, there is no reason to use an explicit future and add_done_callback; you could always await. A more realistic use case arises when the situation is reversed: if bar() spawned foo() and needed access to its result:
def bar():
    fut = asyncio.create_task(foo())
    def when_finished(_fut):
        print("foo returned", fut.result())
    fut.add_done_callback(when_finished)
If this reminds you of "callback hell", you are on the right track - Future.add_done_callback is a rough equivalent of the then operator of pre-async/await JavaScript promises. (Details differ because then() is a combinator that returns another promise, but the basic idea is the same.)
A large part of asyncio is implemented in this style, using non-async functions that orchestrate async futures. That basic layer of transports and protocols feels like a modernized version of Twisted, with coroutines and streams implemented as a separate, higher-level layer of sugar on top of it. Application code written using the basic toolset looks like this.
Even when working with non-coroutine callbacks, there is rarely a good reason to use add_done_callback, other than inertia or copy-paste. For example, the above function could be trivially transformed to use await:
def bar():
    async def coro():
        ret = await foo()
        print("foo returned", ret)
    asyncio.create_task(coro())
This is more readable than the original, and much much easier to adapt to more complex awaiting scenarios. It is similarly easy to plug coroutines into the lower-level asyncio plumbing.
So, what then are the use cases when one needs to use the Future API and add_done_callback? I can think of several:
Writing new combinators.
Connecting coroutines code with code written in the more traditional callback style, such as this or this.
Writing Python/C code where async def is not readily available.
To illustrate the first point, consider how you would implement a function like asyncio.gather(). It must allow the passed coroutines/futures to run and wait until all of them have finished. Here add_done_callback is a very convenient tool, allowing the function to request notification from all the futures without awaiting them in series. In its most basic form that ignores exception handling and various features, gather() could look like this:
async def gather(*awaitables):
    loop = asyncio.get_event_loop()
    futs = list(map(asyncio.ensure_future, awaitables))
    remaining = len(futs)
    finished = loop.create_future()

    def fut_done(fut):
        nonlocal remaining
        remaining -= 1
        if not remaining:
            finished.set_result(None)  # wake up

    for fut in futs:
        fut.add_done_callback(fut_done)
    await finished
    # all awaitables done, we can return the results
    return tuple(f.result() for f in futs)
Even if you never use add_done_callback, it's a good tool to understand and know about for that rare situation where you actually need it.

Related

How to detect break during yield

For a gem I intend to publish, I want to create an enumerable interface wrapping an external library. (Called via FFI)
I have this code (stripped for clarity)
def each_shape(&block)
  callback = lambda do |*args|
    yield
  end
  cpSpaceEachShape(callback) # FFI function, calls callback
end
which is called with a block like this
space.each_shape do
  # ...
end
cpSpaceStep() # Other FFI function
cpSpaceEachShape is an external C library function which in turn calls callback a number of times (synchronously, with no way to cancel it).
This works great, until I use break in the top-level block, like this:
space.each_shape do
  break
end
cpSpaceStep() # Error from C library because iteration is still running
I'm not sure how Ruby, FFI and the C library are interacting here. From the Ruby perspective it looks like the break "jumps" out of the top-level block, out of the callback block, out of cpSpaceEachShape, and out of each_shape.
Side question: What happens here from a low-level perspective, e.g. what happens to the stack?
Main question: Can I capture the break from the top-level block?
I was hoping for something like this: (not working, pseudo code)
def each_shape(&block)
  @each_shape_cancelled = false
  callback = lambda do |*args|
    yield unless @each_shape_cancelled
  rescue StopIteration
    @each_shape_cancelled = true
  end
  cpSpaceEachShape(callback)
end
EDIT: This is for a gem I intend to publish, and I want my users to be able to write regular Ruby. If they were required to use throw… how would they know…? If there is no good solution I will, begrudgingly, construct an array beforehand, collecting/caching everything from the callbacks before yielding its contents.
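For reference, the collect-first fallback can be sketched in plain Ruby. Here cp_space_each_shape is a stand-in for the real FFI call, just to make the example runnable; the idea is that by the time the user's block runs, the C iteration has already completed, so a break no longer unwinds through the C stack frame:

```ruby
# Sketch of the collect-first fallback: gather all shapes while the C
# iteration runs, then yield them afterwards.
# `cp_space_each_shape` is a stand-in for the real FFI function.
def cp_space_each_shape(callback)
  # Pretend the C library reports three shapes.
  [:circle, :segment, :poly].each { |shape| callback.call(shape) }
end

def each_shape
  shapes = []
  collector = lambda { |shape| shapes << shape } # runs inside the C iteration
  cp_space_each_shape(collector)                 # C iteration completes fully
  shapes.each { |shape| yield shape }            # break here is now safe
end

collected = []
each_shape do |shape|
  collected << shape
  break if shape == :segment # safe: the C iteration already finished
end
collected # => [:circle, :segment]
```

The cost is an extra array per call, but the break semantics users expect from a regular each-style method are preserved.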

Ruby Async Gem: What is a basic usage example?

With Ruby 3.0 the async gem is now compatible with blocking IO in standard library functions and I wanted to understand the basic functionality but am already confused by a simple example:
require 'async'

n = 10
n.times do |i|
  Async do
    HTTParty.get("https://httpbin.org/delay/1.6")
  end
end
This doesn't show any parallelism. The gem's documentation for Kernel#Async says:
Run the given block of code in a task, asynchronously, creating a
reactor if necessary.
But the project documentation seems to clear it up:
When invoked at the top level, will create and run a reactor, and
invoke the block as an asynchronous task. Will block until the reactor
finishes running.
So to make the example from above work:
require 'async'

n = 10
Async do
  n.times do |i|
    Async do
      HTTParty.get("https://httpbin.org/delay/1.6")
    end
  end
end
This works, but it seems confusing to the reader: how would we know, as readers, that the first Async do blocks while the inner ones do not?
Thus the question: What is the canonical basic usage of the async gem?
Further reading:
Async Gem Project documentation: Getting started
Blog Post about Async Ruby

In Ruby, why wrap "yield" in a call you are making anyway?

I am new to Ruby. I am confused by something I am reading here:
http://alma-connect.github.io/techblog/2014/03/rails-pub-sub.html
They offer this code:
# app/pub_sub/publisher.rb
module Publisher
  extend self

  # delegate to ActiveSupport::Notifications.instrument
  def broadcast_event(event_name, payload={})
    if block_given?
      ActiveSupport::Notifications.instrument(event_name, payload) do
        yield
      end
    else
      ActiveSupport::Notifications.instrument(event_name, payload)
    end
  end
end
What is the difference between doing this:
ActiveSupport::Notifications.instrument(event_name, payload) do
  yield
end
versus doing this:
ActiveSupport::Notifications.instrument(event_name, payload)
yield
If this were another language, I might assume that we first call the method instrument(), and then we call yield so as to call the block. But that is not what they wrote. They show yield being nested inside of ActiveSupport::Notifications.instrument().
Should I assume that ActiveSupport::Notifications.instrument() is returning some kind of iterable, that we will iterate over? Are we calling yield once for every item returned from ActiveSupport::Notifications.instrument()?
While blocks are frequently used for iteration they have many other uses. One is to ensure proper resource cleanup, for example
ActiveRecord::Base.with_connection do
  ...
end
Checks out a database connection for the thread, yields to the block and then checks the connection back in.
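That checkout/yield/checkin shape is easy to see in plain Ruby. The "pool" below is just an array standing in for a real connection pool, but the ensure-around-yield structure is the same one with_connection relies on:

```ruby
# Minimal sketch of the checkout/yield/checkin pattern; the pool is a
# plain array stand-in, not a real connection pool.
POOL = [:conn_a, :conn_b]

def with_connection
  conn = POOL.shift        # check a connection out
  yield conn               # run the caller's block with it
ensure
  POOL.push(conn)          # check it back in, even if the block raised
end

result = with_connection { |conn| "used #{conn}" }
result    # => "used conn_a"
POOL.size # => 2 (the connection was returned after the block finished)
```

The ensure clause is what makes the block form safe: the resource is released no matter how the block exits, including via an exception or break.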
In the specific case of the instrument method you found, what it does is add to the event data it is about to broadcast information about how long its block took to execute. The actual implementation is more complicated, but in broad terms it's not so different from:
event = Event.new(event_name, payload)
event.start = Time.now
yield
event.end = Time.now
event
The use of yield allows it to wrap the execution of your code with some timing code. In your second example no block is passed to instrument; it detects this and records the event as having no duration.
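The timing sketch above can be made runnable. Event here is a hypothetical stand-in for ActiveSupport's event object, not the real class; it only exists to show the wrap-with-timing idea concretely:

```ruby
# Hypothetical stand-in for ActiveSupport's event object.
Event = Struct.new(:name, :payload, :start, :end) do
  def duration
    self.end - self.start
  end
end

def instrument(name, payload = {})
  event = Event.new(name, payload)
  event.start = Time.now
  yield if block_given?    # run the caller's code between the two timestamps
  event.end = Time.now
  event
end

event = instrument("render", :template => "index") { sleep 0.01 }
event.name     # => "render"
event.duration # positive Float: roughly how long the block took
```

Everything between the two timestamps is the caller's block, which is exactly what yield makes possible here.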
The broadcast_event method has been designed to accept an optional block (which allows you to pass a code block to the method).
ActiveSupport::Notifications.instrument also takes an optional block.
Your first example simply takes the block passed in to broadcast_event and forwards it along to ActiveSupport::Notifications.instrument. If there's no block, you can't yield anything, hence the different calls.
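The forwarding pattern itself is small enough to demonstrate on its own. Both methods below are stand-ins (inner plays the role of ActiveSupport::Notifications.instrument, outer the role of broadcast_event):

```ruby
# `inner` stands in for instrument: it takes an optional block and behaves
# differently depending on whether it received one.
def inner(name)
  if block_given?
    "#{name}: #{yield}"
  else
    "#{name}: no block"
  end
end

# `outer` stands in for broadcast_event: it forwards its own block (if any)
# to `inner` by yielding inside the block it passes along.
def outer(name)
  if block_given?
    inner(name) { yield }
  else
    inner(name)
  end
end

outer("event") { "payload" } # => "event: payload"
outer("event")               # => "event: no block"
```

The `{ yield }` is the key move: outer hands inner a fresh block whose only job is to invoke the block outer itself received.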

How do I create asynchronous HTTP requests, while still offering an object interface?

What is a good way to provide an easy to use object-style interface to an HTTP API, while still allowing asynchronous HTTP requests to be made? For example, given the following code:
user = User.find(1) # /api/v1/users/1
f = user.foo # /api/v1/users/1/foo
b = user.bar # /api/v1/users/1/bar
Calls to the foo and bar methods could logically be called in parallel, but if possible I'd like to let the calling user have a clean way to declare their intent for parallel calls without getting into the details of the underlying HTTP lib.
I don't think invisibly automating the parallelization is a good idea, so that the calling code is explicit about its expectations. But, I do think something like the following block syntax could be extremely useful for front-end developers to be able to use when they know that a set of requests do not depend on each other.
# possible implementation?
user = User.find(1)
User.parallel do
  f = user.foo
  b = user.bar
end
Is this possible? How can I accomplish this using Ruby 1.9.2?
Transparent parallelization is generally accomplished through something called a future. In ruby, both the lazy and promise gems have implementations of futures. Here's an example with the promise gem:
require 'future'
user = User.find(1)
f = future{user.foo}
b = future{user.bar}
A future will run the computation in a block in a background thread, and any attempt to operate on f or b will block unless the background thread has already run to completion.
As a library designer, so long as your library is thread-safe, this should be fine in many cases, though it probably doesn't scale all that well in Ruby.
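Under the hood a thread-based future is quite small. This is a minimal sketch of the idea (not the promise gem's actual implementation): start the computation in a background thread immediately, and block on Thread#value only when the result is needed:

```ruby
# Minimal thread-backed future: the block starts running immediately in a
# background thread; Thread#value blocks until it finishes and returns the
# block's result (re-raising any exception raised in the thread).
class SimpleFuture
  def initialize(&block)
    @thread = Thread.new(&block)
  end

  def value
    @thread.value
  end
end

def future(&block)
  SimpleFuture.new(&block)
end

f = future { 21 * 2 } # runs in the background
f.value # => 42        # blocks only if the computation hasn't finished yet
```

The one-thread-per-call design is also why this approach doesn't scale well for large numbers of outstanding requests.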
The way this works using asynchronous SQL is something like this:
User.where(:id => 1).async_each do |user|
  Foo.where(:id => user.foo_id).async_each do |foo|
    # ... Do stuff with foo ...
  end
  Bar.where(:id => user.bar_id).async_each do |bar|
    # ... Do stuff with bar ...
  end
end
This is my experience with asynchronous Sequel, which is like ActiveRecord.
Keep in mind this is a lot of callbacks stacked together, so things can get sideways if you're not careful.

May a Recursive Function Release its own Mutex?

I have some code, a class method FootballSeries.find(123) on a Ruby class, which performs an API call… owing to concerns about thread safety, only one thread may enter this method at a time. Due to some recent changes to the API, I also support the following: FootballSeries.find('Premiership'). The second variety (see implementation below) simply makes an interim call to see whether an ID can be found, then calls itself recursively with that ID.
class FootballSeries
  @find_mutex = Mutex.new

  class << self
    def find(series_name_or_id)
      @find_mutex.synchronize do
        if series_name_or_id.is_a?(String)
          if doc = search_xml_document(series_name_or_id)
            if doc.xpath('//SeriesName').try(:first).try(:content) == series_name_or_id
              @find_mutex.unlock
              series = find(doc.xpath('//seriesid').first.content.to_i)
              @find_mutex.lock
              return series
            end
          end
        elsif series_name_or_id.is_a?(Integer)
          if doc = xml_document(series_name_or_id)
            Series.new(doc)
          end
        end
      end
    end
  end
end
Without the unlock and re-lock around the recursive call, there's a recursive mutex lock: deadlock error (which makes enough sense)… therefore my question is: may I release and re-lock the mutex? (I re-lock so that when synchronize exits, I won't get an error from unlocking a mutex that I don't own… but I haven't tested whether this is required.)
Is this a sane implementation, or would I be better served having find() call two individual methods, each protected with its own mutex (e.g. find_by_id and find_by_name)?
What I have now works (or at least appears to work).
Finally, bonus points for - how would I test such a method for safety?
This doesn't look good to me, as @find_mutex.unlock will allow other threads to enter the method at the same time. Also, using recursion for this kind of method dispatch is unusual: you actually have two methods stuffed into one. I would certainly separate the two, and if you want to be able to call one method with different argument types, just check the argument's type and invoke one or the other. If you don't need to expose find_by_id and find_by_name, you can make them private, and put mutex.synchronize only in find.
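Concretely, that refactor looks something like the sketch below. The lookup helpers are stubbed out (lookup_id and the string result stand in for search_xml_document and xml_document/Series.new), but the locking structure is the point: only the public entry point synchronizes, and the private helpers assume the lock is already held, so no recursive locking can occur:

```ruby
class FootballSeries
  @find_mutex = Mutex.new

  class << self
    # Only the public entry point takes the lock; the private helpers
    # run with the lock already held, so there is no recursive locking.
    def find(series_name_or_id)
      @find_mutex.synchronize do
        if series_name_or_id.is_a?(String)
          find_by_name(series_name_or_id)
        else
          find_by_id(series_name_or_id)
        end
      end
    end

    private

    def find_by_name(name)
      id = lookup_id(name)   # the interim API call in the real code
      find_by_id(id) if id
    end

    def find_by_id(id)
      "series-#{id}"         # stand-in for xml_document + Series.new
    end

    def lookup_id(name)      # stand-in for search_xml_document
      name == "Premiership" ? 123 : nil
    end
  end
end

FootballSeries.find("Premiership") # => "series-123"
FootballSeries.find(123)           # => "series-123"
```

Since find_by_name calls find_by_id directly rather than re-entering find, the unlock/lock dance disappears entirely.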
