Here is an example of how to do non-blocking socket connects (as a client) with asyncore. Since that module is deprecated ('Deprecated since version 3.6: Please use asyncio instead.'), how is this possible with asyncio? Creating a socket and connecting it inside a coroutine works synchronously and causes the problem described in the linked question.
A connect inside a coroutine appears synchronous to that coroutine, but is in fact asynchronous with respect to the event loop. This means that you can create any number of coroutines working in parallel without blocking each other, and yet all running inside a single thread.
If you are doing HTTP, look at examples of parallel downloads using aiohttp. If you need low-level TCP connections, look at the examples in the documentation and use asyncio.gather to run them in parallel:
import asyncio

async def talk(host):
    # wait until the connection is established, but without blocking
    # other coroutines
    r, w = await asyncio.open_connection(host, 80)
    # use the streams r, w to talk to the server - for example, echo:
    while True:
        line = await r.readline()
        if not line:
            break
        w.write(line)
    w.close()

async def talk_many(hosts):
    coros = [talk(host) for host in hosts]
    await asyncio.gather(*coros)

asyncio.run(talk_many(["host1", "host2", ...]))
The documentation for asyncio.run states:
This function always creates a new event loop and closes it at the end.
It should be used as a main entry point for asyncio programs, and should
ideally only be called once.
But it does not say why. I have a non-async program that needs to invoke something async. Can I just use asyncio.run every time I get to the async portion, or is this unsafe/wrong?
In my case, I have several async coroutines that I want to gather and run in parallel to completion. When they have all completed, I want to move on with my synchronous code.
async def my_task(url):
    # request some urls or whatever
    ...

integration_tasks = [my_task(url1), my_task(url2)]

async def gather_tasks(*integration_tasks):
    return await asyncio.gather(*integration_tasks)

def complete_integrations(*integration_tasks):
    return asyncio.run(gather_tasks(*integration_tasks))

print(complete_integrations(*integration_tasks))
Can I use asyncio.run() to run coroutines multiple times?
This actually is an interesting and very important question.
As the asyncio documentation (Python 3.9) says:
This function always creates a new event loop and closes it at the end. It should be used as a main entry point for asyncio programs, and should ideally only be called once.
It does not prohibit calling it multiple times. Moreover, the old way of calling coroutines from synchronous code, which was:
loop = asyncio.get_event_loop()
loop.run_until_complete(coroutine)
is now deprecated because of the get_event_loop() function, whose documentation says:
Consider also using the asyncio.run() function instead of using lower level functions to manually create and close an event loop.
Deprecated since version 3.10: Deprecation warning is emitted if there is no running event loop. In future Python releases, this function will be an alias of get_running_loop().
So in future releases it will not spawn a new event loop if there is no running one! The docs propose using asyncio.run() if you want a new loop to be spawned automatically when there is no running one.
There is a good reason for this decision. Even if you have an event loop and successfully use it to execute coroutines, there are a few more things you must remember to do:
closing an event loop
consuming unconsumed generators (most important in case of failed coroutines)
...probably more, which I will not even attempt to cover here
What exactly needs to be done to properly finalize an event loop you can read in this source code.
Managing an event loop manually (when there is no running one) is a subtle procedure, and it is better not to do it unless you know what you are doing.
So yes, I think the proper way of running an async function from synchronous code is to call asyncio.run(). But it is only suitable in a fully synchronous application. If there is already a running event loop, it will probably fail (not tested). In that case, just await the coroutine, or use get_running_loop().run_until_complete(coro).
For such synchronous apps, asyncio.run() is a safe way, and actually the only safe way, of doing this, and it can be invoked multiple times.
The reason the docs say you should call it only once is that there is usually a single entry point to the whole asynchronous application. That simplifies things and actually improves performance, because setting things up for an event loop also takes some time. But if there is no such single entry point in your application, you can use multiple calls to asyncio.run() to run coroutines multiple times.
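Here is a minimal sketch of that guard (my own illustration, not from the docs; run_from_sync is a hypothetical helper name): purely synchronous code can check for an already-running loop before calling asyncio.run().

import asyncio

def run_from_sync(coro):
    # asyncio.run() raises RuntimeError when called while a loop is already
    # running in this thread, so detect that case up front
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(coro)  # no running loop: safe, may be called many times
    raise RuntimeError("already inside an event loop; just 'await' the coroutine instead")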
Is there any performance gain?
Besides discussing multiple calls to asyncio.run(), I want to address one more concern. In the comments, @jwal says:
asyncio is not parallel processing. Says so in the docs. [...] If you want parallel, run in a separate processes on a computer with a separate CPU core, not a separate thread, not a separate event loop.
This suggests that asyncio is not suitable for parallel processing, which can be misread as a claim that it will not bring a performance gain. That is not always true; in fact, it is usually false!
So, any time you can delegate a job to an external process (not only a Python process: a database worker, an HTTP call, ideally anything behind a TCP socket), you can get a performance gain with asyncio. In the vast majority of cases, when you are using a library that exposes an async interface, the author of that library made an effort to eventually await a result from a network/socket/process call. While the response from such a socket is not ready, the event loop is completely free to do other tasks. If the loop has more than one such task, it gains performance.
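To see concretely what "free to do other tasks" means, here is a toy sketch (my own, not the weather example below) in which two tasks wait concurrently; the simulated one-second waits overlap instead of adding up:

import asyncio
import time

async def fake_io(name: str, delay: float) -> str:
    # simulate a network wait; while this task sleeps, the loop runs others
    await asyncio.sleep(delay)
    return name

async def main():
    start = time.perf_counter()
    await asyncio.gather(fake_io("a", 1.0), fake_io("b", 1.0))
    print(f"took {time.perf_counter() - start:.1f}s")  # ~1s total, not ~2s

asyncio.run(main())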
A canonical example of such a case is making calls to HTTP endpoints. At some point there will be a network call, so the Python thread is free to do other work while waiting for data to appear in the TCP socket buffer. I have an example!
The example uses the httpx library to compare the performance of making multiple calls to the OpenWeatherMap API. There are two functions:
get_weather_async()
get_weather_sync()
The first one makes 8 requests to the HTTP API, but schedules those requests to run cooperatively (not concurrently!) on an event loop using asyncio.gather().
The second one performs 8 synchronous requests in sequence.
To call the asynchronous function, I use asyncio.run(). Moreover, I use the timeit module to perform that call to asyncio.run() 4 times. So in a single Python application, asyncio.run() is called 4 times, just to put my earlier considerations to the test.
from time import time
import httpx
import asyncio
import timeit
from random import uniform

class AsyncWeatherApi:
    def __init__(
        self, base_url: str = "https://api.openweathermap.org/data/2.5"
    ) -> None:
        self.client: httpx.AsyncClient = httpx.AsyncClient(base_url=base_url)

    async def weather(self, lat: float, lon: float, app_id: str) -> dict:
        response = await self.client.get(
            "/weather",
            params={
                "lat": lat,
                "lon": lon,
                "appid": app_id,
                "units": "metric",
            },
        )
        response.raise_for_status()
        return response.json()

class SyncWeatherApi:
    def __init__(
        self, base_url: str = "https://api.openweathermap.org/data/2.5"
    ) -> None:
        self.client: httpx.Client = httpx.Client(base_url=base_url)

    def weather(self, lat: float, lon: float, app_id: str) -> dict:
        response = self.client.get(
            "/weather",
            params={
                "lat": lat,
                "lon": lon,
                "appid": app_id,
                "units": "metric",
            },
        )
        response.raise_for_status()
        return response.json()

def get_random_locations() -> list[tuple[float, float]]:
    """generate 8 random locations in +/-europe"""
    return [(uniform(45.6, 52.3), uniform(-2.3, 29.4)) for _ in range(8)]

async def get_weather_async(locations: list[tuple[float, float]]):
    api = AsyncWeatherApi()
    return await asyncio.gather(
        *[api.weather(lat, lon, api_key) for lat, lon in locations]
    )

def get_weather_sync(locations: list[tuple[float, float]]):
    api = SyncWeatherApi()
    return [api.weather(lat, lon, api_key) for lat, lon in locations]

api_key = "secret"

def time_async_job(repeat: int = 1):
    locations = get_random_locations()

    def run():
        return asyncio.run(get_weather_async(locations))

    duration = timeit.Timer(run).timeit(repeat)
    print(
        f"[ASYNC] In {duration}s: done {len(locations)} API calls, all"
        f" repeated {repeat} times"
    )

def time_sync_job(repeat: int = 1):
    locations = get_random_locations()

    def run():
        return get_weather_sync(locations)

    duration = timeit.Timer(run).timeit(repeat)
    print(
        f"[SYNC] In {duration}s: done {len(locations)} API calls, all repeated"
        f" {repeat} times"
    )

if __name__ == "__main__":
    time_sync_job(4)
    time_async_job(4)
At the end, a performance comparison is printed:
[SYNC] In 5.5580058859995916s: done 8 API calls, all repeated 4 times
[ASYNC] In 2.865574334995472s: done 8 API calls, all repeated 4 times
Those 4 repetitions were just to show that you can safely call asyncio.run() multiple times. They actually hurt the measured performance of the asynchronous HTTP calls, because all 32 requests were run in four sequential batches of 8 asynchronous tasks. For comparison, the performance of a single batch of 32 requests:
[SYNC] In 4.373898585996358s: done 32 API calls, all repeated 1 times
[ASYNC] In 1.5169846520002466s: done 32 API calls, all repeated 1 times
So yes, it can, and usually will, result in a performance gain, provided a proper async library is used (if a library exposes an async API, it usually does so intentionally, knowing that there will be a network call somewhere).
I am trying to figure out a correct way of processing streaming data using streamz. My streaming data is loaded using websocket-client, after which I do this:
from streamz import Stream
from websocket import create_connection
from tornado import gen
from tornado.ioloop import IOLoop

# open a stream and push updates into the stream
stream = Stream()

# establish a connection
ws = create_connection("ws://localhost:8765")

# get continuous updates
async def f():
    while True:
        await gen.sleep(0.001)
        data = ws.recv()
        stream.emit(data)

IOLoop.current().add_callback(f)
While this works, I find that my stream is not able to keep pace with the streaming data (the data I see in the stream is several seconds behind the source, which is both high volume and high frequency). I tried setting gen.sleep(0.001) to a smaller value (removing it completely halts Jupyter Lab), but the problem remains.
Is this a correct way of connecting streamz with streaming data using websocket?
I don't think websocket-client provides an async API, so it's blocking the event loop.
You should use an async websocket client, such as the one Tornado provides:
from tornado import gen
from tornado.websocket import websocket_connect

async def f():
    # websocket_connect() returns a Future, so await it to get the connection
    ws = await websocket_connect("ws://localhost:8765")
    while True:
        data = await ws.read_message()
        if data is None:
            break
        else:
            await stream.emit(data)
            # considering you're receiving data from a localhost
            # socket, it will be really fast, and the `await`
            # statement above won't pause the while-loop for
            # enough time for the event loop to have a chance to
            # run other things.
            # Therefore, sleep for a small time to suspend the
            # while-loop.
            await gen.sleep(0.0001)
You don't need to sleep if you're receiving/sending data from/to a remote connection which will be slow enough to suspend the while loop at await statements.
I'm fairly new to programming in Python; I've been programming for about half a year. I've decided to try to build a functional trading bot. While coding this bot, I stumbled upon the asyncio module. I would really like to understand the module better, but it's hard to find simple tutorials or documentation about asyncio.
My script gathers the volume for each coin. This works perfectly, but it takes a really long time to gather all the volumes. Is my script running synchronously, and if so, how do I fix this? I'm using an API wrapper to communicate with the Binance exchange.
import binance
import asyncio
import time

s = time.time()

names = [name for name in binance.ticker_prices()]  # Gathering all the coin names

loop = asyncio.get_event_loop()

async def get_volume(name):
    async def get_data():
        return binance.ticker_24hr(name)  # Returns per coin a dict of the data of the last 24hr
    data = await get_data()
    return (name, data['volume'])

tasks = [asyncio.ensure_future(get_volume(name)) for name in names]
results = loop.run_until_complete(asyncio.gather(*tasks))

print('Total time:', time.time() - s)
Since binance.ticker_24hr does not look like it's a coroutine, it is almost certainly blocking the event loop and therefore preventing asyncio.gather from doing its job. As a quick fix, you can use run_in_executor to run the blocking function in a separate thread:
async def get_volume(name):
    loop = asyncio.get_event_loop()
    data = await loop.run_in_executor(None, binance.ticker_24hr, name)
    return name, data['volume']
This will work just fine for a reasonable number of parallel tasks. The downside is that it uses threads, so it might not scale to a huge number of parallel requests (or it would require unnecessary waiting). The correct solution in the long run is to use a library that natively supports asyncio.
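For illustration, here is a rough sketch of what a natively asynchronous version could look like, using aiohttp against what I understand to be Binance's public 24hr-ticker REST endpoint (the URL, the "symbol" parameter, and the "volume" field are my assumptions, not part of the wrapper used in the question):

import asyncio
import aiohttp

async def get_volume(session: aiohttp.ClientSession, name: str):
    # one request per symbol, awaited natively instead of run in a thread
    async with session.get(
        "https://api.binance.com/api/v3/ticker/24hr", params={"symbol": name}
    ) as response:
        data = await response.json()
        return name, data["volume"]

async def get_volumes(names):
    # a single session is reused for all requests, gathered concurrently
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(get_volume(session, n) for n in names))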
Maarten, firstly, you are calling get_ticker for every symbol, which means you're making many unnecessary requests. If you call it without a symbol value, you get all tickers in one request. This also removes the need for any loops or async if you aren't performing other tasks. It looks like the binance library you're using doesn't support this; you can use python-binance to do it:
return client.get_ticker()
That said, I've been testing an asyncio version of python-binance. It's currently in a feature branch if you want to try it.
pip install git+https://github.com/sammchardy/python-binance#feature/asyncio
Import the asyncio version of the client and initialise it:
from binance.client_async import AsyncClient as Client
client = Client("<api_key>", "<api_secret>")
Then you can await the calls to get the ticker for a particular symbol
return await client.get_ticker(symbol=name)
Or for all symbol tickers don't pass the symbol parameter
return await client.get_ticker()
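Putting those pieces together, get_volume() from the question might look roughly like this (a sketch, assuming the feature-branch AsyncClient behaves as shown above):

async def get_volume(name):
    # awaiting the async client keeps the event loop free between requests
    data = await client.get_ticker(symbol=name)
    return name, data['volume']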
Hope that helps
Disclaimer: This is my first time working with WS and MQTT, so structure may be wrong. Please point this out.
I am using autobahn with asyncio to receive messages from and send messages to a Home Assistant (HA) instance through websockets.
Once my Python code receives messages, I want to forward them over MQTT to the AWS IoT service. This communication needs to work both ways.
I have made this work as a script where everything sits loose in a single file.
I am trying to make this work in a class structure, which is how my final work will be done.
To do that, I need my WebSocketClientProtocol to have access to AWSIoTClient's .publish and .subscribe. However, WebSocketClientProtocol is instantiated through a factory, so I am not sure how to pass any arguments to it. For instance:
if __name__ == "__main__":
    aws_iot_client = AWSIoTClient(...)

    factory = WebSocketServerFactory('ws://localhost:8123/api/websocket')
    factory.protocol = HomeAssistantProtocol
How can I pass aws_iot_client to HomeAssistantProtocol?
I have found Autobahn|Twisted examples that do this using self.factory on the WebSocketClientProtocol subclass, but this is not available for asyncio.
I found that run_until_complete returns the transport and protocol instances, so I can then pass the AWS client to the protocol:
loop = asyncio.get_event_loop()
coro = loop.create_connection(factory, '127.0.0.1', 9000)
transport, protocol = loop.run_until_complete(coro)
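For example (a sketch with an assumed attribute name), the client can then be attached to the protocol instance before starting the loop:

# hypothetical attribute; HomeAssistantProtocol can read self.aws_iot_client
# to call .publish / .subscribe
protocol.aws_iot_client = aws_iot_client
loop.run_forever()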
I'm pulling my hair out over this one. I'm trying to get the simplest of examples working with zeromq and gevent. I changed this script to use PUB/SUB sockets and when I run it the 'server' socket loops forever. If I uncomment the gevent.sleep(0.1) line then it works as expected and yields to the other green thread, which in this case is the client.
The problem is, why should I have to add a sleep call manually? I thought that when I import the zmq.green version of zmq, the send and receive calls are non-blocking and do the task switching underneath.
In other words, why should I have to add the gevent.sleep() call to get this example working? In Jeff Lindsay's original example, he's doing REQ/REP sockets and he doesn't need to add sleep calls... but when I changed this to PUB/SUB I need it there for this to yield to the client for processing.
#Notes: Code taken from slide: http://www.google.com/url?sa=t&rct=j&q=zeromq%20gevent&source=web&cd=27&ved=0CFsQFjAGOBQ&url=https%3A%2F%2Fraw.github.com%2Fstrangeloop%2F2011-slides%2Fmaster%2FLindsay-DistributedGeventZmq.pdf&ei=JoDNUO6OIePQiwK8noHQBg&usg=AFQjCNFa5g9ZliRVoN_yVH7aizU_fDMtfw&bvm=bv.1355325884,d.cGE
#Jeff Lindsey talk on gevent and zeromq
import gevent
from gevent import spawn
import zmq.green as zmq

context = zmq.Context()

def serve():
    print 'server online'
    socket = context.socket(zmq.PUB)
    socket.bind("ipc:///tmp/jeff")
    while True:
        print 'send'
        socket.send("World")
        #gevent.sleep(0.1)

def client():
    print 'client online'
    socket = context.socket(zmq.SUB)
    socket.connect("ipc:///tmp/jeff")
    socket.setsockopt(zmq.SUBSCRIBE, '')
    while True:
        print 'recv'
        message = socket.recv()

cl = spawn(client)
server = spawn(serve)

print 'joinall'
gevent.joinall([cl, server])
print 'end'
I thought when I import the zmq.green version of zmq that the send and receive calls are non blocking and underneath do the task switching.
zmq.green will only yield if these calls would block; it does not yield if they are ready (there's nothing to wait for). In your case the sender is always ready, so it never has a reason to yield.
Some pointers:
a minimal explicit yield is gevent.sleep(0); the sleep doesn't need to have a nonzero duration.
zmq.green only yields on blocking calls. That is, if a socket is always ready to send/recv when you ask it to, it will never yield.
socket.send only blocks when the socket is not ready to send (not (socket.events & zmq.POLLOUT)), which can never actually be true of a PUB socket (you will see it at HWM for PUSH, DEALER, etc.).
in general, don't trust send to yield, because of the way zeromq works this will rarely be the case unless you are exceeding the capacity of your configuration.
unlike send, recv regularly blocks in normal usage, so it yields on most calls. But if a peer is flooding your incoming buffer, repeated recv calls will not yield until there is nothing ready to receive, so you may again need to explicitly yield every so often to prevent starvation.
What zmq.green amounts to is turning send/recv into:
try:
    socket.send(msg, zmq.NOBLOCK) # or recv
except zmq.ZMQError as e:
    if e.errno == zmq.EAGAIN:
        yield # and wait for socket to be ready, then try again
so if send/recv with NOBLOCK are always succeeding, the socket never yields.
To put it another way: If a socket has nothing to wait for, it won't wait.
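Applied to the serve() loop above, a minimal fix (a sketch of the same idea) is an explicit zero-length yield after each send; gevent.sleep(0) reschedules the greenlet without adding a real delay:

def serve():
    socket = context.socket(zmq.PUB)
    socket.bind("ipc:///tmp/jeff")
    while True:
        socket.send(b"World")
        gevent.sleep(0)  # explicit zero-length yield so the client greenlet can run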