aiohttp: separate request sending from response waiting - python-asyncio

I have a specific use case where I need to send an HTTP request ASAP, but cannot wait for the HTTP response to come back before doing some necessary work.
Conceptually, I need to do this:
async with aiohttp.ClientSession() as session:
    request = await session.send_request(method='GET', url='https://httpbin.org/get')
    # do some necessary work
    response = await request.get_response()
    # process response...
With the straightforward approach I can send the HTTP request as soon as I want, but the same await also waits for the response, so I cannot get on with other work in the meantime:
async with aiohttp.ClientSession() as session:
    # this blocks until both the request is sent AND response arrived
    response = await session.request(method='GET', url='https://httpbin.org/get')
    # process response...
I tried to spin up a new coroutine so as to not have to wait for the HTTP response to arrive:
async def foo(url):
    async with aiohttp.ClientSession() as session:
        response = await session.request(method='GET', url=url)
        # process response...
asyncio.create_task(foo('https://httpbin.org/get'))
# do necessary work
but since the task created by create_task() only starts at "the first chance the event loop gets", the request sometimes goes out half a second or a second after I call create_task(), which is too slow for my purpose.
My question(s):
(a) is there a way to separate HTTP request sending from HTTP response waiting in aiohttp?
(b) if not, can you suggest an alternative way to send HTTP request ASAP but to await the response asynchronously?
Thanks!
Update #1
Following @Isabi's suggestion in the comments, I tried only using await after the necessary work is done, but the HTTP request is never sent until the await is used, e.g.:
async with aiohttp.ClientSession() as session:
    # send out an HTTP request that takes ~2 seconds before response comes back
    request = session.request(method='GET', url='https://httpbin.org/delay/2')
    await asyncio.sleep(4)  # simulate 4 seconds of necessary work
    # the following line still takes 2 seconds, indicating the request
    # didn't go out before `await` is used
    response = await request
    # process response...
Update #2
I worked out a way to make my application behave the way I want (send the HTTP request ASAP, but don't block waiting for the HTTP response). The solution uses a call to asyncio.sleep(0), inspired by this thread. However, it is not aesthetically satisfying:
async def foo(url):
    async with aiohttp.ClientSession() as session:
        response = await session.request(method='GET', url=url)
        # process response...
asyncio.create_task(foo('https://httpbin.org/get'))
await asyncio.sleep(0)
# do necessary work
It doesn't feel right to me that what should be a not uncommon use case requires such an inelegant solution. Am I missing something?

Are you sure the task is being run half a second or even one second later? That should not be the case unless the loop is busy, and the loop shouldn't be busy unless it's under heavy load or you have blocking code running at the same time.
You can use logging to check exactly when the request is being sent and when it's received:
import asyncio
import aiohttp
import logging
logging.basicConfig(format="%(asctime)s.%(msecs)03d %(levelname)s %(message)s", datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO)
async def foo(url):
    async with aiohttp.ClientSession() as session:
        logging.info("request started")
        response = await session.request(method="GET", url=url)
        logging.info("response received")
        text = await response.text()
        logging.info(f"response read {len(text)} bytes")
        # process response...
async def test():
    logging.info("sleep 0.1")
    await asyncio.sleep(0.1)
    logging.info("create task")
    asyncio.create_task(foo("https://httpbin.org/get"))
    logging.info("task created, sleep 2")
    await asyncio.sleep(2)
    logging.info("finished")
if __name__ == "__main__":
    asyncio.run(test())
Output:
2020-06-10 10:52:00.017 INFO sleep 0.1
2020-06-10 10:52:00.118 INFO create task
2020-06-10 10:52:00.118 INFO task created, sleep 2
2020-06-10 10:52:00.119 INFO request started
2020-06-10 10:52:00.621 INFO response received
2020-06-10 10:52:00.622 INFO response read 308 bytes
2020-06-10 10:52:02.121 INFO finished
Notice that the coroutine starts running about 1 ms after the task is created and that the HTTP request takes about 0.5 seconds to complete. Since 0.5 seconds is close to what you were seeing, I believe you were measuring the time to complete the request, not the time to start it.
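To see the effect of blocking code on the start of a task, here is a minimal sketch that uses only asyncio. fake_request is a hypothetical stand-in for the aiohttp call (it is not part of aiohttp), and the 0.2 second "work" is an arbitrary value chosen for illustration. When the necessary work blocks the loop, the task cannot start until the work is done; when the work awaits, the task starts almost immediately.

```python
import asyncio
import time


async def fake_request(events, t0):
    # Hypothetical stand-in for the aiohttp call: record when it really starts.
    events.append(("request started", time.perf_counter() - t0))
    await asyncio.sleep(0.05)  # pretend to wait for the response
    events.append(("response received", time.perf_counter() - t0))


async def demo(blocking_work: bool):
    events = []
    t0 = time.perf_counter()
    task = asyncio.create_task(fake_request(events, t0))
    if blocking_work:
        time.sleep(0.2)           # blocking call: the loop cannot run the task yet
    else:
        await asyncio.sleep(0.2)  # cooperative wait: the task starts right away
    await task
    return events


def start_delay(blocking_work: bool) -> float:
    # Seconds between create_task() and the first line of the task running.
    events = asyncio.run(demo(blocking_work))
    return events[0][1]


if __name__ == "__main__":
    print(f"blocking work:    started after {start_delay(True):.3f}s")
    print(f"cooperative work: started after {start_delay(False):.3f}s")
```

If the "necessary work" in the question is synchronous, this would also explain why the asyncio.sleep(0) workaround helps: it gives the loop one chance to start the task before the blocking work begins.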

Related

Rate-limiting aiohttp.ClientSession() to make N requests per second

I want to create an asynchronous SDK using aiohttp client for our service. I haven't been able to figure out how to throttle the ClientSession() to make only N requests per second.
class AsyncHTTPClient:
    def __init__(self, api_token, per_second_limit=10):
        self._client = aiohttp.ClientSession(
            headers={"Authorization": f"Bearer {api_token}"}
        )
        self._throttler = asyncio.Semaphore(per_second_limit)
    async def _make_request(self, method, url, **kwargs):
        async with self._throttler:
            return await self._client.request(method, url, **kwargs)
    async def get(self, url, **params):
        return await self._make_request("GET", url, **params)
    async def close(self):
        await self._client.close()
I have this class with get, post, patch, put, delete methods implemented as a call to _make_request.
# As a user of the SDK I run the following code.
async def main():
    client = AsyncHTTPClient(my_token, per_second_limit=20)
    try:
        urls = [some_url] * 100
        await asyncio.gather(*[client.get(url) for url in urls])
    finally:
        await client.close()
asyncio.run(main())
asyncio.Semaphore limits the concurrency. That is, when main() is called, the async with self._throttler in client._make_request limits the client to 20 requests in flight at once. However, if those 20 requests finish within 1 second, new requests start immediately. What I want is to make sure that only N requests (here, 20) are made per second: if all 20 requests finish in 0.8 seconds, sleep for 0.2 seconds and then process the next batch.
I looked at some asyncio.Queue examples with workers, but I am not sure how I would implement that in my SDK: the workers would have to be created by the user of the SDK, and I want to avoid that. I want AsyncHTTPClient itself to enforce the requests-per-second limit.
Any suggestions, advice, or samples will be greatly appreciated.
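One way to get a per-second limit without asking the user to create workers is a small sliding-window limiter used in place of the semaphore. The following is a sketch under assumptions, not an aiohttp feature: RateLimiter, max_calls, period and fake_get are names invented here, and the demo records timestamps instead of making real requests.

```python
import asyncio
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` entries in any `period`-second window."""

    def __init__(self, max_calls: int, period: float = 1.0):
        self._max_calls = max_calls
        self._period = period
        self._starts = deque()       # monotonic timestamps of recent entries
        self._lock = asyncio.Lock()  # create the limiter inside a running loop

    async def __aenter__(self):
        # The lock is held while sleeping on purpose: waiters queue up in order.
        async with self._lock:
            now = time.monotonic()
            # Forget entries that have already left the window.
            while self._starts and now - self._starts[0] >= self._period:
                self._starts.popleft()
            if len(self._starts) >= self._max_calls:
                # Window is full: wait until the oldest entry expires.
                await asyncio.sleep(self._period - (now - self._starts[0]))
                self._starts.popleft()
            self._starts.append(time.monotonic())
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return False  # never swallow exceptions from the request


async def demo(n_calls=6, max_calls=3, period=0.3):
    limiter = RateLimiter(max_calls, period)
    stamps = []

    async def fake_get(i):
        async with limiter:
            # A real client would call self._client.request(...) here.
            stamps.append(time.monotonic())

    await asyncio.gather(*(fake_get(i) for i in range(n_calls)))
    return stamps


if __name__ == "__main__":
    stamps = asyncio.run(demo())
    print([round(s - stamps[0], 2) for s in stamps])
```

In the class from the question, this would presumably mean replacing asyncio.Semaphore(per_second_limit) with RateLimiter(per_second_limit) and keeping async with self._throttler as it is. Note that this limits how many requests start per window, not how many are in flight; the semaphore can be kept alongside it if both limits are wanted.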

Asyncio: Fastapi with aio-pika, consumer ignores Await

I am trying to hook my websocket endpoint up to RabbitMQ (aio-pika). The goal is to have a listener in that endpoint and, on any new message from the queue, pass the message to the browser client over websockets.
I tested the consumer in a standalone script with an asyncio loop. It works when I follow the aio-pika documentation (source: https://aio-pika.readthedocs.io/en/latest/rabbitmq-tutorial/2-work-queues.html, worker.py).
However, when I use it in a FastAPI websocket endpoint, I can't make it work. Somehow the listener:
await queue.consume(on_message)
is completely ignored.
This is my attempt (I put it all in one function so it's more readable):
@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    print("Entering websockets")
    await manager.connect(websocket)
    print("got connection")
    # params
    queue_name = "task_events"
    routing_key = "user_id.task"
    con = "amqp://rabbitmq:rabbitmq@rabbit:5672/"
    connection = await connect(con)
    channel = await connection.channel()
    await channel.set_qos(prefetch_count=1)
    exchange = await channel.declare_exchange(
        "topic_logs",
        ExchangeType.TOPIC,
    )
    # Declaring queue
    queue = await channel.declare_queue(queue_name)
    # Binding the queue to the exchange
    await queue.bind(exchange, routing_key)
    async def on_message(message: IncomingMessage):
        async with message.process():
            # here will be the message passed over websockets to browser client
            print("sent", message.body)
    try:
        ######### Not working as expected ###########
        # await does not await and websockets finishes, as there is no loop
        await queue.consume(on_message)
        #############################################
        ################ This alternative code at least receives some messages #############
        # If I use this part, I at least get some messages when I trigger a backend task that publishes new messages to the queue.
        # It seems like the messages are somehow stuck: a new task releases all the stuck messages, but not the new one.
        while True:
            await queue.consume(on_message)
            await asyncio.sleep(1)
        ################## one part #############
    except WebSocketDisconnect:
        manager.disconnect(websocket)
I am quite new to async in Python. I am not sure where the problem is, and I have not managed to implement an async consuming loop based on worker.py from aio-pika.
You could use an async iterator, which is the second canonical way to consume messages from a queue.
In your case, this means:
async with queue.iterator() as queue_iter:
    async for message in queue_iter:
        async with message.process():
            # do something with message
            pass
It will block as long as no message is received and will be suspended again after processing a message.
The solution was simple.
Even though we await it, aio-pika's queue.consume is non-blocking, so this is how we start consuming:
consumer_tag = await queue.consume(on_message, no_ack=True)
and when the connection ends we cancel the consumer:
await queue.cancel(consumer_tag)
The core of the solution for me was to have something that blocks the handler (in the asyncio sense), so I added this code after the consume call:
while True:
    data = await websocket.receive_text()
    x = await manager.send_message(data, websocket)
I don't actually need what this code receives, but it is useful because it waits for messages from the frontend websocket. If this part is missing, the client connects only to be disconnected straight away (the websocket endpoint runs to completion successfully), as there is nothing blocking.

Asyncio wait not running all coroutines

I have a speech to text client set up as follows. Client sends audio packets to server and server returns the text results back, which client prints on stdout.
async def record_audio(websocket):
    # Audio recording parameters
    rate = 16000
    chunk = int(rate / 10)  # 100ms
    with BufferedMicrophoneStream(rate, chunk) as stream:
        audio_generator = stream.generator()  # Buffer is an asynchronous queue instance
        for message in audio_generator:
            await websocket.send(message)
async def collect_results(websocket):
    async for message in websocket:
        print(message)
async def combine():
    uri = "ws://localhost:8765"
    async with websockets.connect(uri) as websocket:
        await asyncio.wait([
            record_audio(websocket),
            collect_results(websocket)
        ])
def main():
    loop = asyncio.get_event_loop()
    loop.run_until_complete(combine())
    loop.close()
As you can see, both the coroutines are infinite loops.
When I run the program, each coroutine works correctly on its own: if I pass only record_audio or only collect_results, I can confirm that it works. They do not work simultaneously, though.
However, if I put an asyncio.sleep(10) statement inside the loop in record_audio, then I do see output from collect_results, and audio chunks are sent to the server in a burst every 10 seconds.
What gives?
Thanks.
Update #1:
I replaced the above code with the following, still to no avail:
async with websockets.connect(uri) as websocket:
    futures = [
        await loop.run_in_executor(executor, record_audio, websocket),
        await loop.run_in_executor(executor, collect_results, websocket)
    ]
    await asyncio.gather(*futures)
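No answer is recorded for this question here. One possible explanation, offered as an assumption since BufferedMicrophoneStream is not shown, is that stream.generator() is a regular blocking generator: the plain for loop then waits for audio without ever yielding to the event loop, so collect_results only gets to run when something like asyncio.sleep() hands control back. A minimal sketch of one way around that is to pull each chunk in a worker thread. blocking_chunks is a hypothetical stand-in for the microphone generator, and the log list replaces the websocket calls.

```python
import asyncio
import time


def blocking_chunks(n, delay=0.02):
    # Hypothetical stand-in for stream.generator(): blocks between chunks.
    for i in range(n):
        time.sleep(delay)
        yield f"chunk-{i}"


async def send_chunks(log, n):
    loop = asyncio.get_running_loop()
    chunks = blocking_chunks(n)
    done = object()  # sentinel returned by next() when the generator is exhausted
    while True:
        # Fetch the next chunk in the default thread pool so the loop stays free.
        chunk = await loop.run_in_executor(None, next, chunks, done)
        if chunk is done:
            break
        log.append(("sent", chunk))  # a real client would await websocket.send(chunk)


async def collect_results(log, n, delay=0.01):
    # Stand-in for the receiving coroutine from the question.
    for i in range(n):
        await asyncio.sleep(delay)
        log.append(("received", i))


async def combine(n=5):
    log = []
    await asyncio.gather(send_chunks(log, n), collect_results(log, n))
    return log


if __name__ == "__main__":
    for entry in asyncio.run(combine()):
        print(entry)
```

Note that run_in_executor expects a regular function, which is why only the blocking next() call goes to the thread pool while both coroutines stay on the event loop.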

pynng: how to setup, and keep using, multiple Contexts on a REP0 socket

I'm working on a "server" thread, which takes care of some IO calls for a bunch of "clients".
The communication is done using pynng v0.5.0, the server has its own asyncio loop.
Each client "registers" by sending a first request, and then loops receiving the results and sending back READY messages.
On the server, the goal is to treat the first message of each client as a registration request, and to create a dedicated worker task which will loop doing IO stuff, sending the result and waiting for the READY message of that particular client.
To implement this, I'm trying to leverage the Context feature of REP0 sockets.
Side notes
I would have liked to tag this question with nng and pynng, but I don't have enough reputation.
Although I'm an avid consumer of this site, it's my first question :)
I do know about the PUB/SUB pattern, let's just say that for self-instructional purposes, I chose not to use it for this service.
Problem:
After a few iterations, some READY messages are intercepted by the registration coroutine of the server, instead of being routed to the proper worker task.
Since I can't share the code, I wrote a reproducer for my issue and included it below.
Worse, as you can see in the output, some result messages are sent to the wrong client (ERROR:root:<Worker 1>: worker/client mismatch, exiting.).
It looks like a bug, but I'm not entirely sure I understand how to use the contexts correctly, so any help would be appreciated.
Environment:
winpython-3.8.2
pynng v0.5.0+dev (46fbbcb2), with nng v1.3.0 (ff99ee51)
Code:
import asyncio
import logging
import pynng
import threading
NNG_DURATION_INFINITE = -1
ENDPOINT = 'inproc://example_endpoint'
class Server(threading.Thread):
    def __init__(self):
        super(Server, self).__init__()
        self._client_tasks = dict()
    @staticmethod
    async def _worker(ctx, client_id):
        while True:
            # Remember, the first 'receive' has already been done by self._new_client_handler()
            logging.debug(f"<Worker {client_id}>: doing some IO")
            await asyncio.sleep(1)
            logging.debug(f"<Worker {client_id}>: sending the result")
            # I already tried sending synchronously here instead, just in case the issue was related to that
            # (but it's not)
            await ctx.asend(f"result data for client {client_id}".encode())
            logging.debug(f"<Worker {client_id}>: waiting for client READY msg")
            data = await ctx.arecv()
            logging.debug(f"<Worker {client_id}>: received '{data}'")
            if data != bytes([client_id]):
                logging.error(f"<Worker {client_id}>: worker/client mismatch, exiting.")
                return
    async def _new_client_handler(self):
        with pynng.Rep0(listen=ENDPOINT) as socket:
            max_workers = 3 + 1  # Try setting it to 3 instead, to stop creating new contexts => now it works fine
            while await asyncio.sleep(0, result=True) and len(self._client_tasks) < max_workers:
                # The issue is here: at some point, the existing client READY messages get
                # intercepted here, instead of being routed to the proper worker context.
                # The intent here was to open a new context only for each *new* client, I was
                # assuming that a 'recv' on older worker contexts would take precedence.
                ctx = socket.new_context()
                data = await ctx.arecv()
                client_id = data[0]
                if client_id in self._client_tasks:
                    logging.error(f"<Server>: We already have a task for client {client_id}")
                    continue  # just let the client block on its 'recv' for now
                logging.debug(f"<Server>: New client : {client_id}")
                self._client_tasks[client_id] = asyncio.create_task(self._worker(ctx, client_id))
            await asyncio.gather(*list(self._client_tasks.values()))
    def run(self) -> None:
        # The "server" thread has its own asyncio loop
        asyncio.run(self._new_client_handler(), debug=True)
class Client(threading.Thread):
    def __init__(self, client_id: int):
        super(Client, self).__init__()
        self._id = client_id
    def __repr__(self):
        return f'<Client {self._id}>'
    def run(self):
        with pynng.Req0(dial=ENDPOINT, resend_time=NNG_DURATION_INFINITE) as socket:
            while True:
                logging.debug(f"{self}: READY")
                socket.send(bytes([self._id]))
                data_str = socket.recv().decode()
                logging.debug(f"{self}: received '{data_str}'")
                if data_str != f"result data for client {self._id}":
                    logging.error(f"{self}: client/worker mismatch, exiting.")
                    return
def main():
    logging.basicConfig(level=logging.DEBUG)
    threads = [Server(),
               *[Client(i) for i in range(3)]]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
if __name__ == '__main__':
    main()
Output:
DEBUG:asyncio:Using proactor: IocpProactor
DEBUG:root:<Client 1>: READY
DEBUG:root:<Client 0>: READY
DEBUG:root:<Client 2>: READY
DEBUG:root:<Server>: New client : 1
DEBUG:root:<Worker 1>: doing some IO
DEBUG:root:<Server>: New client : 0
DEBUG:root:<Worker 0>: doing some IO
DEBUG:root:<Server>: New client : 2
DEBUG:root:<Worker 2>: doing some IO
DEBUG:root:<Worker 1>: sending the result
DEBUG:root:<Client 1>: received 'result data for client 1'
DEBUG:root:<Client 1>: READY
ERROR:root:<Server>: We already have a task for client 1
DEBUG:root:<Worker 1>: waiting for client READY msg
DEBUG:root:<Worker 0>: sending the result
DEBUG:root:<Client 0>: received 'result data for client 0'
DEBUG:root:<Client 0>: READY
DEBUG:root:<Worker 0>: waiting for client READY msg
DEBUG:root:<Worker 1>: received 'b'\x00''
ERROR:root:<Worker 1>: worker/client mismatch, exiting.
DEBUG:root:<Worker 2>: sending the result
DEBUG:root:<Client 2>: received 'result data for client 2'
DEBUG:root:<Client 2>: READY
DEBUG:root:<Worker 2>: waiting for client READY msg
ERROR:root:<Server>: We already have a task for client 2
Edit (2020-04-10): updated both pynng and the underlying nng.lib to their latest version (master branches), still the same issue.
After digging into the sources of both nng and pynng, and confirming my understanding with the maintainers, I can now answer my own question.
When using a context on a REP0 socket, there are a few things to be aware of.
As advertised, send/asend() is guaranteed to be routed to the same peer you last received from.
The data from the next recv/arecv() on this same context, however, is NOT guaranteed to be coming from the same peer.
Actually, the underlying nng call to rep0_ctx_recv() merely reads the next socket pipe with available data, so there's no guarantee that said data is coming from the same peer as the last recv/send pair.
In the reproducer above, I was concurrently calling arecv() both on a new context (in the Server._new_client_handler() coroutine), and on each worker context (in the Server._worker() coroutine).
So what I had previously described as the next request being "intercepted" by the main coroutine was merely a race condition.
One solution would be to only receive from the Server._new_client_handler() coroutine, and have the workers only handle one request. Note that in this case, the workers are no longer dedicated to a particular peer. If this behavior is needed, the routing of incoming requests must be handled at application level.
import random  # needed for the random delay below; other imports as in the reproducer
class Server(threading.Thread):
    @staticmethod
    async def _worker(ctx, data: bytes):
        client_id = int.from_bytes(data, byteorder='big', signed=False)
        logging.debug(f"<Worker {client_id}>: doing some IO")
        await asyncio.sleep(1 + 10 * random.random())
        logging.debug(f"<Worker {client_id}>: sending the result")
        await ctx.asend(f"result data for client {client_id}".encode())
    async def _new_client_handler(self):
        with pynng.Rep0(listen=ENDPOINT) as socket:
            while await asyncio.sleep(0, result=True):
                ctx = socket.new_context()
                data = await ctx.arecv()
                asyncio.create_task(self._worker(ctx, data))
    def run(self) -> None:
        # The "server" thread has its own asyncio loop
        asyncio.run(self._new_client_handler(), debug=False)
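If per-client workers are still wanted, the routing "at application level" mentioned above could look roughly like the following sketch. It deliberately uses only asyncio and leaves pynng out: dispatch, client_worker and the (client_id, payload) tuples are hypothetical names for illustration, with a single receiver handing each incoming request to a per-client asyncio.Queue. With pynng, each routed item would presumably also need to carry the context it arrived on, so that the worker can reply through that context.

```python
import asyncio


async def client_worker(client_id, inbox, handled):
    # Dedicated task: it only ever sees requests routed to its own inbox.
    while True:
        payload = await inbox.get()
        if payload is None:  # shutdown marker used by this sketch
            return
        handled.append((client_id, payload))


async def dispatch(incoming, handled):
    # Single receiver: the only place that reads incoming requests.
    inboxes = {}
    tasks = []
    for client_id, payload in incoming:
        if client_id not in inboxes:
            # First request from this client: create its queue and worker.
            inboxes[client_id] = asyncio.Queue()
            tasks.append(asyncio.create_task(
                client_worker(client_id, inboxes[client_id], handled)))
        await inboxes[client_id].put(payload)
    for inbox in inboxes.values():
        await inbox.put(None)  # tell every worker to stop
    await asyncio.gather(*tasks)


if __name__ == "__main__":
    handled = []
    asyncio.run(dispatch([(1, "a"), (0, "b"), (1, "c")], handled))
    print(handled)
```

The point of the design is that only one coroutine ever receives, so there is no race between a "registration" receive and the workers' receives.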

Tornado cancel httpclient.AsyncHTTPClient fetch() from on_chunk()

Inside one of the handlers I am doing the following:
async def get(self):
    client = httpclient.AsyncHTTPClient()
    url = 'some url here'
    request = httpclient.HTTPRequest(url=url, streaming_callback=self.on_chunk, request_timeout=120)
    result = await client.fetch(request)
    self.write("done")
@gen.coroutine
def on_chunk(self, chunk):
    self.write(chunk)
    yield self.flush()
The requests can sometimes be quite large, and the client may leave while the response is still being fetched and pumped to the client. If this happens, an exception is raised in on_chunk when self.write() is attempted. My question is: how do I abort the remaining download if my client has gone away?
If your streaming_callback raises an exception, the client request should be aborted. This will spam the logs with stack traces, but there's not currently a cleaner way to do it. You can override on_connection_close to detect when the client has disconnected and set an attribute on self that you can check in on_chunk.
