Create DB connections before running the aiohttp loop - python-asyncio

Is it a mistake to create some connections (to DB, AMQP, etc.) before calling web.run_app in aiohttp?
Some example:

async def init_app():
    app = web.Application()
    app['db'] = await create_db_connection()
    app['amqp'] = await create_amqp_connection()
    return app

if __name__ == '__main__':
    app = asyncio.get_event_loop().run_until_complete(init_app())
    web.run_app(app)
It works, but I'm not sure whether this is right or not.
I know about app.on_startup, but I'd like to handle all connection errors before starting the main application.

The code is correct as long as you don't care about closing resources before the server exits.
Most people don't, and that's fine.
Otherwise the app.on_cleanup signal should be used.
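For completeness, a minimal sketch of the signal-based alternative, assuming the question's create_db_connection helper (and that the returned connection has an awaitable close()); with on_startup, connection errors surface during run_app startup rather than before it:

from aiohttp import web

async def init_db(app):
    # Raising here aborts startup, so bad credentials still fail fast.
    app['db'] = await create_db_connection()

async def close_db(app):
    # Runs on shutdown, so the connection is closed cleanly.
    await app['db'].close()

app = web.Application()
app.on_startup.append(init_db)
app.on_cleanup.append(close_db)
web.run_app(app)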

Related

How to keep the browser open after the code finishes with playwright-python?

I want to use playwright-python to fill some forms automatically, and then double-check the filled values before submitting. But it always closes the browser at the end of the run. Even when I used the handleSIGHUP=False, handleSIGINT=False, handleSIGTERM=False launch arguments and didn't call page.close() or browser.close() anywhere in my code, it still closes the browser after the code finishes.
Does anyone know how to do it?
The browser is started by the Python script, so it will end when the script ends.
So you need to keep the script alive.
For the async part it is a bit tricky.
I assume you have something like:

asyncio.get_event_loop().run_until_complete(main())

so in the async main routine you need to keep the async loop running, e.g. by waiting on the keyboard:
import asyncio
import sys

from playwright.async_api import async_playwright

async def main(run_forever: bool):
    async with async_playwright() as p:
        # Launch whichever browser you need; chromium is used here.
        browser = await p.chromium.launch(headless=False, timeout=10000)
        page = await browser.new_page()
        if run_forever:
            print('Press CTRL-D to stop')
            # Hook stdin into the event loop and block until EOF.
            reader = asyncio.StreamReader()
            loop = asyncio.get_event_loop()
            await loop.connect_read_pipe(
                lambda: asyncio.StreamReaderProtocol(reader), sys.stdin)
            async for line in reader:
                print(f'Got: {line.decode()!r}')
        else:
            await browser.close()

asyncio.get_event_loop().run_until_complete(main(run_forever=True))
Context:
Playwright Version: 1.16
Operating System: Windows
Node.js version: 14.18.1
Browser: All
Overview
There are multiple ways to turn on the Inspector, as explained in https://playwright.dev/docs/inspector
As per the docs, when we set "PWDEBUG", it also disables the timeout by setting it to 0. This is a good idea, as it allows the user to click around in the Inspector without a time limit.
However, this is not the case with page.pause(). This option also opens the Inspector, but the timeout is not set to 0, so the Inspector will force-exit after 30s with e.g.:
Slow test: tests\my_test.test.js (30s)
The same behavior is mentioned in a feature request here: #10132
I believe it should be fixed to match the default behaviour of the other options, at least for consistency. For example, in the Java binding, page.pause() does not require the timeout to be explicitly disabled.
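A rough illustration of the difference (Python async binding; assumes an existing page object, and the manual set_default_timeout workaround is my suggestion, not something from this report):

# PWDEBUG=1 python my_script.py  ->  opens the Inspector with timeouts disabled.
# page.pause() alone             ->  opens the Inspector, but the 30s limit stays.
page.set_default_timeout(0)  # possible workaround: disable Playwright timeouts manually
await page.pause()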

Broadcasting messages from a gRPC server to all/some connected clients in Python

I am learning how to use gRPC streams to exchange messages between clients and a server in Python. I found a base example that enables simple message sending between server and client. I am trying to modify it so that I can keep track of all the clients connected to the gRPC server (on the server side) and do two things: 1) broadcast from the server to all clients, 2) send a message to a particular connected client.
Here is the .proto file
syntax = 'proto3';

service Scenario {
    rpc Chat(stream DPong) returns (stream DPong) {}
}

message DPong {
    string name = 1;
}
And here is the client.py, which starts a daemon thread to listen for incoming messages while the main thread waits on stdin for outgoing messages:
import queue
import threading
import time

import grpc

import scenario_pb2, scenario_pb2_grpc

# new changes
msgQueue = queue.Queue()

def run():
    channel = grpc.insecure_channel('localhost:50052')
    stub = scenario_pb2_grpc.ScenarioStub(channel)
    print('client connected')

    def inputStream():
        while 1:
            msg = input('>>Enter message\n>>')
            yield scenario_pb2.DPong(name=msg)

    input_stream = stub.Chat(inputStream())

    def read_incoming():
        while 1:
            print('receivedFromServer: {}\n>>'.format(next(input_stream).name))

    thread = threading.Thread(target=read_incoming)
    thread.daemon = True
    thread.start()

    while 1:
        time.sleep(1)

if __name__ == '__main__':
    print('client starting ...')
    run()
Below is the server.py
import threading
import time
from concurrent import futures

import grpc

import scenario_pb2, scenario_pb2_grpc

clientList = []

class Scenario(scenario_pb2_grpc.ScenarioServicer):
    def Chat(self, request_iterator, context):
        def stream():
            while 1:
                time.sleep(1)
                msg = input('>>Enter message\n>>')
                for i in clientList:
                    yield msg

        output_stream = stream()

        def read_incoming():
            while 1:
                received = next(request_iterator).name
                if (context, request_iterator) not in clientList:
                    clientList.append((context, request_iterator))
                print('receivedFromClient: {}'.format(received), len(clientList))

        thread = threading.Thread(target=read_incoming)
        thread.daemon = True
        thread.start()

        while 1:
            msg = output_stream
            yield scenario_pb2.DPong(name=next(msg))

if __name__ == '__main__':
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    scenario_pb2_grpc.add_ScenarioServicer_to_server(Scenario(), server)
    server.add_insecure_port('[::]:50052')
    server.start()
    print('listening ...')
    while 1:
        time.sleep(1)
So far, I have tried to maintain a list object clientList that contains the context and request_iterator objects of each client and is updated every time a new client joins the server. But how do I use these objects from clientList when sending an outgoing message? I have tried iterating the list, but the server sends the message to the same client (the last client heard from) several times instead of sending it to all the clients once.
Any help is highly appreciated!
This is certainly possible. The problem that you're running into here is that each call to Scenario.Chat on the server side corresponds to a single client connection. That is, this function is called when the streaming RPC starts and as soon as the function exits, the RPC ends.
So if you want n connected clients, you'll need n instances of Scenario.Chat running concurrently, each on its own thread. This does mean that the number of concurrently connected clients is limited by the size of the threadpool with which you instantiate your server.
So, let's say you have n threads in your server process dedicated to maintaining client connections. Then you need another n+1th thread (perhaps the main thread) determining when the server will broadcast a message to all clients (maybe by looking for input from STDIN?). When this extra thread determines that a message should be broadcast, it needs to communicate this intent to all of the threads maintaining connections to a client. There are many ways to make this happen. A threading.Condition and a global collections.deque, or a collections.deque per client connection (somewhat like channels between goroutines) would be two ways. The tricky bit here is ensuring that each client connection will receive the message regardless of how long the client connection thread takes to wake up and how many messages the n+1th thread decides to send in the interim.
If this is still unclear, I can follow up with some actual code demonstrating the idea.
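A rough sketch of the per-client-queue idea, using queue.Queue (which bundles the deque and the condition variable into one object); scenario_pb2 and scenario_pb2_grpc are the question's generated modules, and reading request_iterator is omitted to keep the broadcast path clear:

import queue
import threading

import scenario_pb2, scenario_pb2_grpc

client_queues = []                     # one Queue per connected client
client_queues_lock = threading.Lock()

class Scenario(scenario_pb2_grpc.ScenarioServicer):
    def Chat(self, request_iterator, context):
        my_queue = queue.Queue()
        with client_queues_lock:
            client_queues.append(my_queue)
        try:
            # Block on this client's queue; whatever lands here is
            # streamed out to this client only. A real server would
            # also need to notice disconnects, e.g. via context.is_active().
            while True:
                msg = my_queue.get()
                yield scenario_pb2.DPong(name=msg)
        finally:
            with client_queues_lock:
                client_queues.remove(my_queue)

def broadcast(msg):
    # Called from the extra (n+1)th thread, e.g. after reading STDIN.
    with client_queues_lock:
        for q in client_queues:
            q.put(msg)

def send_to(index, msg):
    # Targeting one client is just putting on one queue.
    with client_queues_lock:
        client_queues[index].put(msg)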
You can spin up multiple ports in one application.
gRPC can be running on port 50011 and Flask with Socket.IO can be running on port 8080.
With Python, you can use the Flask framework and the flask_socketio library in your server.py,
e.g. server.py

from flask import Flask
from flask_socketio import SocketIO, emit

app = Flask(__name__)
socketio = SocketIO(app)

@app.route('/')
def index():
    return "Hello, World!"

if __name__ == '__main__':
    # socketio.run wraps app.run and serves both Flask and Socket.IO.
    socketio.run(app, port=8080, debug=True)
Instead of using the gRPC streaming API, use WebSocket to broadcast to all connected clients, and to specific/selected clients using rooms.
e.g.

@socketio.on('message')
def handle_message(data):
    # logic to send large data in chunks: call the emit function from
    # Socket.IO and emit an event that sends the large data in chunks,
    # e.g. emit('my response', chunkData)
    pass
gRPC is primarily built around a single client's request/response stream, whereas WebSocket makes it easy to address many connected clients at once.
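A sketch of the rooms idea for targeting specific clients (the event and room names here are illustrative, not fixed by the library):

from flask_socketio import join_room

@socketio.on('join')
def on_join(data):
    # Each client asks to join a named room after connecting.
    join_room(data['room'])

def notify_room(room, payload):
    # Emit only to the clients that joined that room.
    socketio.emit('my response', payload, room=room)

def notify_all(payload):
    # socketio.emit without a room broadcasts to all connected clients.
    socketio.emit('my response', payload)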

How do I pass my Postgres connection pool to Dask workers with psycopg2 or asyncpg?

I want my Dask workers to grab a Postgres connection from a ThreadedConnectionPool, but when I pass the pool like so
import dask.bag as db
from psycopg2.pool import ThreadedConnectionPool

def worker_pg(n, pool) -> None:
    print(n)

work = db.from_sequence(range(4))
tcp = ThreadedConnectionPool(1, 800, "db_string")
work.map(worker_pg, pool=tcp).compute()
I get serialization errors such as:
TypeError: ('Could not serialize object of type ThreadedConnectionPool.', '<psycopg2.pool.ThreadedConnectionPool object at 0x7f99dc57b128>')
Also, while I have been trying this with psycopg2, I'd really like this to work with asyncpg too (for performance reasons). However, asyncpg has the added wrinkle of using async and await from asyncio:
import asyncio
import asyncpg

async def get_pool():
    p = await asyncpg.create_pool("db_string")
    return p

pool = asyncio.get_event_loop().run_until_complete(get_pool())
work.map(worker_pg, pool=pool).compute()
although I end up with the same type of error:
TypeError: ('Could not serialize object of type Pool.', '<asyncpg.pool.Pool object at 0x7fdee9127818>')
Any suggestions (or alternatives?) are much appreciated!
As suggested in the comments, you might consider having each of your tasks open up a connection to Postgres, perform a query, and then close that connection.
Unfortunately Dask can not move an active database connection between machines. These objects are closely tied to the process in which they are started.
You can write a simple wrapper on top of your job to make it use its own database connection. Here's a quick example; it can probably be optimized further according to your needs.
import dask
import psycopg2

def my_task(conn, more_args):
    """Use a psycopg2 conn to run a task"""
    # Something complicated here
    pass

def run_my_task(more_args):
    """Wraps my_task and gives it its own conn"""
    with psycopg2.connect(...) as conn:
        my_task(conn, more_args)

run_my_task = dask.delayed(run_my_task)

jobs = []
for i in range(10):
    jobs.append(run_my_task(i))

dask.compute(*jobs)
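The same per-task-connection idea carries over to asyncpg; a sketch, assuming the question's "db_string" DSN and Python 3.7+ for asyncio.run:

import asyncio

import asyncpg
import dask

async def my_async_task(more_args):
    # Each task opens and closes its own connection on the worker.
    conn = await asyncpg.connect("db_string")
    try:
        await conn.fetch('SELECT 1')  # something complicated here
    finally:
        await conn.close()

def run_my_async_task(more_args):
    # asyncio.run gives the task a private event loop on the worker thread.
    return asyncio.run(my_async_task(more_args))

jobs = [dask.delayed(run_my_async_task)(i) for i in range(10)]
dask.compute(*jobs)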

Autobahn asyncio client arguments

Disclaimer: This is my first time working with WS and MQTT, so the structure may be wrong. Please point this out.
I am using Autobahn with asyncio to receive and send messages to an HA (Home Assistant) instance through WebSockets.
Once my Python code receives messages, I want to forward them using MQTT to the AWS IoT service. This communication needs to work both ways.
I have made this work as a script where everything lives in a single file.
I am trying to make this work in a class structure, which is how my final work will be done.
In order to do that, I need my WebSocketClientProtocol to have access to AWSIoTClient's .publish and .subscribe. However, WebSocketClientProtocol initialization is done through a factory, so I am not sure how to pass any arguments to it. For instance:
if __name__ == "__main__":
    aws_iot_client = AWSIoTClient(...)
    factory = WebSocketServerFactory('ws://localhost:8123/api/websocket')
    factory.protocol = HomeAssistantProtocol
How can I pass aws_iot_client to HomeAssistantProtocol?
I have found examples for Autobahn-Twisted that do this using self.factory on the WebSocketClientProtocol subclass, but this is not available for asyncio.
I found that calling run_until_complete returns the transport and protocol instances, so I can then pass the AWS client to the protocol:
loop = asyncio.get_event_loop()
coro = loop.create_connection(factory, '127.0.0.1', 9000)
transport, protocol = loop.run_until_complete(coro)
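For example, attaching the client as an attribute before starting the loop (the aws_iot_client attribute name is my assumption, not Autobahn API):

# Hand the AWS client to the connected protocol instance, then keep
# the loop alive so the connection stays open.
protocol.aws_iot_client = aws_iot_client
loop.run_forever()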

aiohttp Error Rate Increases with Number of Connections

I am trying to get the status code from millions of different sites using asyncio and aiohttp. I run the code below with different numbers of connections (but the same timeout on each request) and get very different results, specifically a much higher count of the following exception:

concurrent.futures._base.TimeoutError
The code:

import pandas as pd
import asyncio
import aiohttp

out = []
CONNECTIONS = 1000
TIMEOUT = 10

async def fetch(url, session, loop):
    try:
        async with session.get(url, timeout=TIMEOUT) as response:
            res = response.status
            out.append(res)
            return res
    except Exception as e:
        _exception = 'Error: ' + str(type(e))
        out.append(_exception)
        return _exception

async def bound_fetch(sem, url, session, loop):
    async with sem:
        await fetch(url, session, loop)

async def run(urls, loop):
    tasks = []
    sem = asyncio.Semaphore(value=CONNECTIONS, loop=loop)
    _connector = aiohttp.TCPConnector(limit=CONNECTIONS, loop=loop)
    async with aiohttp.ClientSession(connector=_connector, loop=loop) as session:
        for url in urls:
            task = asyncio.ensure_future(bound_fetch(sem, url, session, loop))
            tasks.append(task)
        responses = await asyncio.gather(*tasks, return_exceptions=True)
    return responses

## BEGIN ##
tlds = open('data/sample_1k.txt').read().splitlines()
urls = ['http://{}'.format(x) for x in tlds[1:]]
loop = asyncio.get_event_loop()
future = asyncio.ensure_future(run(urls, loop))
ans = loop.run_until_complete(future)
print(str(pd.Series(out).value_counts()))
Results: the value_counts output for CONNECTIONS=1000 and CONNECTIONS=100 (not reproduced here).
Is this a bug? These sites do respond with a status code, and when run sequentially or with a lower connection count there is no timeout error, so why is this happening? The other exceptions stay stable as the number of connections changes. The ClientOSErrors are from sites that actually time out or fail to respond; honestly, I don't know where the concurrent.futures._base.TimeoutError errors are coming from.
Imagine you opened 1000 URLs in a browser simultaneously. I bet you'd notice many of them aren't loaded after 10 seconds. It's not a bug, it's a limit of your machine's resources.
The more parallel requests you make -> the less network capacity, CPU time, and RAM for each one -> the higher the chance each request won't finish before its timeout.
If you see many timeouts with 1000 connections, make fewer connections (and maybe increase the timeout). Based on the aiohttp documentation, using different ClientSession instances may also help:
Unless you are connecting to a large, unknown number of different
servers over the lifetime of your application, it is suggested you use
a single session for the lifetime of your application
I've had the same issue. Have a look at the details of the ClientOSErrors and you might see Too many open files; if so, you need to increase the OS's limit on file descriptors.
Either way, you'll get more information if you print the whole exceptions, not just their types.
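For instance, recording repr(e) instead of just the type in the question's fetch() keeps the class and the message (a minimal sketch of the changed handler, not a fix for the timeouts themselves):

async def fetch(url, session, loop):
    try:
        async with session.get(url, timeout=TIMEOUT) as response:
            out.append(response.status)
            return response.status
    except Exception as e:
        # repr(e) preserves the class and message, e.g.
        # ClientOSError(24, 'Too many open files')
        _exception = 'Error: {!r}'.format(e)
        out.append(_exception)
        return _exception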
