I noticed that one URL makes pyppeteer hang forever. That's weird because I've set a timeout. Do you know where the problem is? Or is it an asyncio problem?
import asyncio
import fake_useragent
from pyppeteer import launch

async def test():
    url = 'https://ig.com.br/'
    browser = await launch(headless=True)
    page = await browser.newPage()
    page.setDefaultNavigationTimeout(1000)
    await page.setUserAgent(fake_useragent.UserAgent().random)
    await page.goto(url, {'waitUntil': 'networkidle0', 'timeout': 1000})
    await browser.close()

asyncio.run(test())
Other websites work fine: they time out because the timeout value is too small. But this website makes the call hang forever.
Do you know why? I want to make sure that any website is either fetched within 20 seconds or raises an error.
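For context, here is a minimal sketch of the kind of hard cap I am after (my own workaround attempt, assuming asyncio.wait_for is the right tool, not a confirmed pyppeteer fix): wrap the whole fetch in asyncio.wait_for so that, even if pyppeteer's internal timeout is ignored, the coroutine is cancelled after 20 seconds and asyncio.TimeoutError is raised.

import asyncio
import fake_useragent
from pyppeteer import launch

async def fetch(url):
    browser = await launch(headless=True)
    try:
        page = await browser.newPage()
        await page.setUserAgent(fake_useragent.UserAgent().random)
        await page.goto(url, {'waitUntil': 'networkidle0', 'timeout': 1000})
    finally:
        # Try to clean up the browser even if goto fails or the wait times out.
        await browser.close()

async def test():
    # Raises asyncio.TimeoutError if the whole fetch takes longer than 20 seconds.
    await asyncio.wait_for(fetch('https://ig.com.br/'), timeout=20)

asyncio.run(test())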
I want to create an asynchronous SDK using aiohttp client for our service. I haven't been able to figure out how to throttle the ClientSession() to make only N requests per second.
import asyncio
import aiohttp

class AsyncHTTPClient:
    def __init__(self, api_token, per_second_limit=10):
        self._client = aiohttp.ClientSession(
            headers={"Authorization": f"Bearer {api_token}"}
        )
        self._throttler = asyncio.Semaphore(per_second_limit)

    async def _make_request(self, method, url, **kwargs):
        async with self._throttler:
            return await self._client.request(method, url, **kwargs)

    async def get(self, url, **params):
        return await self._make_request("GET", url, **params)

    async def close(self):
        await self._client.close()
I have this class with get, post, patch, put, delete methods implemented as a call to _make_request.
# As a user of the SDK I run the following code.
async def main():
    client = AsyncHTTPClient(my_token, per_second_limit=20)
    try:
        urls = [some_url] * 100
        await asyncio.gather(*[client.get(url) for url in urls])
    finally:
        await client.close()

asyncio.run(main())
asyncio.Semaphore limits the concurrency. That is, when main() runs, the async with self._throttler in client._make_request limits the number of in-flight requests to 20. However, if those 20 requests finish within one second, new requests are made continuously. What I want is to make sure that only N requests (i.e. 20) are started per second: if all 20 finish in 0.8 seconds, sleep for 0.2 seconds and then process the next batch.
I looked up some asyncio.Queue examples with workers, but I am not sure how I would implement that in my SDK, since creating the workers would have to be done by the user of the SDK and I want to avoid that; I want AsyncHTTPClient itself to handle the requests-per-second limit.
Any suggestions/advice/samples will be greatly appreciated.
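For reference, here is a minimal sketch of one direction I am considering (my own idea, not an established aiohttp pattern): keep the semaphore, but release each slot one second after it was acquired, so at most per_second_limit requests start in any one-second window.

import asyncio

class PerSecondLimiter:
    """Allow at most `rate` acquisitions per rolling second."""

    def __init__(self, rate):
        self._sem = asyncio.Semaphore(rate)

    async def __aenter__(self):
        await self._sem.acquire()
        # Give the slot back one second after it was taken, regardless of
        # how quickly the request itself finishes.
        asyncio.get_running_loop().call_later(1.0, self._sem.release)

    async def __aexit__(self, *exc):
        pass

The idea would be to construct self._throttler = PerSecondLimiter(per_second_limit) in __init__ and keep the existing async with self._throttler: block in _make_request unchanged; whether this is the right approach is exactly what I am asking.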
I have a speech-to-text client set up as follows. The client sends audio packets to the server and the server returns the text results, which the client prints on stdout.
import asyncio
import websockets

async def record_audio(websocket):
    # Audio recording parameters
    rate = 16000
    chunk = int(rate / 10)  # 100 ms per chunk
    with BufferedMicrophoneStream(rate, chunk) as stream:
        audio_generator = stream.generator()  # the buffer is an asynchronous queue instance
        for message in audio_generator:
            await websocket.send(message)

async def collect_results(websocket):
    async for message in websocket:
        print(message)

async def combine():
    uri = "ws://localhost:8765"
    async with websockets.connect(uri) as websocket:
        await asyncio.wait([
            record_audio(websocket),
            collect_results(websocket),
        ])

def main():
    loop = asyncio.get_event_loop()
    loop.run_until_complete(combine())
    loop.close()
As you can see, both coroutines are infinite loops.
When I run the program, the server behaves correctly for either of the waited coroutines on its own; that is, if I pass only record_audio or only collect_results, I can confirm that each works individually, but not simultaneously.
However, if I put an asyncio.sleep(10) inside the loop in record_audio, then I do see output from collect_results, and audio chunks are sent to the server in a burst every 10 seconds.
What gives?
Thanks.
Update #1:
I replaced the above code with the following, still to no avail:
async with websockets.connect(uri) as websocket:
    futures = [
        await loop.run_in_executor(executor, record_audio, websocket),
        await loop.run_in_executor(executor, collect_results, websocket)
    ]
    await asyncio.gather(*futures)
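In case it helps to see what I have in mind, here is a minimal sketch of another variant I am considering (my own guess, not tested): pull each chunk from the blocking audio generator in the default executor, so the event loop stays free to run collect_results between sends.

import asyncio

async def record_audio(websocket):
    rate = 16000
    chunk = int(rate / 10)  # 100 ms per chunk
    loop = asyncio.get_event_loop()
    with BufferedMicrophoneStream(rate, chunk) as stream:
        audio_generator = stream.generator()
        while True:
            # next(audio_generator) blocks; run it in a thread so other
            # coroutines can make progress while we wait for the next chunk.
            message = await loop.run_in_executor(None, next, audio_generator)
            await websocket.send(message)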
I have a specific use case where I need to send an HTTP request ASAP, but cannot wait for the HTTP response to come back before doing some necessary work.
Conceptually, I need to do this:
async with aiohttp.ClientSession() as session:
    request = await session.send_request(method='GET', url='https://httpbin.org/get')
    # do some necessary work
    response = await request.get_response()
    # process response...
The problem with the plain approach is that, while I can send the HTTP request right away, I cannot get on with the necessary work while waiting for the response:
async with aiohttp.ClientSession() as session:
    # this blocks until both the request is sent AND the response has arrived
    response = await session.request(method='GET', url='https://httpbin.org/get')
    # process response...
I tried to spin up a new coroutine so as not to have to wait for the HTTP response to arrive:
async def foo(url):
    async with aiohttp.ClientSession() as session:
        response = await session.request(method='GET', url=url)
        # process response...

asyncio.create_task(foo('https://httpbin.org/get'))
# do necessary work
but since the task only runs at "the first chance the event loop gets", the request sometimes goes out half a second or even a second after I call create_task(), which is too slow for my purpose.
My question(s):
(a) is there a way to separate HTTP request sending from HTTP response waiting in aiohttp?
(b) if not, can you suggest an alternative way to send HTTP request ASAP but to await the response asynchronously?
Thanks!
Update #1
Following @Isabi's suggestion in the comments, I tried using await only after the necessary work is done, but the HTTP request is never sent until the await is reached, e.g.:
async with aiohttp.ClientSession() as session:
    # send out an HTTP request that takes ~2 seconds before the response comes back
    request = session.request(method='GET', url='https://httpbin.org/delay/2')
    await asyncio.sleep(4)  # simulate 4 seconds of necessary work
    # the following line still takes 2 seconds, indicating the request
    # didn't go out before `await` was used
    response = await request
    # process response...
Update #2
I worked out a way that makes my application behave the way I want (send the HTTP request ASAP, but don't block waiting for the HTTP response). The solution uses a call to asyncio.sleep(0), inspired by this thread. However, it is not aesthetically satisfying:
async def foo(url):
    async with aiohttp.ClientSession() as session:
        response = await session.request(method='GET', url=url)
        # process response...

asyncio.create_task(foo('https://httpbin.org/get'))
await asyncio.sleep(0)
# do necessary work
It doesn't feel right to me that what should be a not uncommon use case requires such an inelegant solution. Am I missing something?
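For completeness, here is the same idea written so that I keep the task handle and await the response only when I actually need it (my own sketch, under the same assumptions as above):

import asyncio
import aiohttp

async def fetch(session, url):
    # Runs as a background task: sends the request and returns the body text.
    async with session.request('GET', url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        task = asyncio.create_task(fetch(session, 'https://httpbin.org/get'))
        await asyncio.sleep(0)  # yield once so the task starts sending now
        # ... do the necessary work; any await in it also lets the task run ...
        body = await task       # block only here, when the response is needed

asyncio.run(main())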
Are you sure the task is being run half a second or even a second later? That should not be the case unless the loop is busy, and the loop shouldn't be busy unless it's under heavy load or you have blocking code running at the same time.
You can use logging to check exactly when the request is being sent and when it's received:
import asyncio
import aiohttp
import logging

logging.basicConfig(
    format="%(asctime)s.%(msecs)03d %(levelname)s %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    level=logging.INFO,
)

async def foo(url):
    async with aiohttp.ClientSession() as session:
        logging.info("request started")
        response = await session.request(method="GET", url=url)
        logging.info("response received")
        text = await response.text()
        logging.info(f"response read {len(text)} bytes")
        # process response...

async def test():
    logging.info("sleep 0.1")
    await asyncio.sleep(0.1)
    logging.info("create task")
    asyncio.create_task(foo("https://httpbin.org/get"))
    logging.info("task created, sleep 2")
    await asyncio.sleep(2)
    logging.info("finished")

if __name__ == "__main__":
    asyncio.get_event_loop().run_until_complete(test())
Output:
2020-06-10 10:52:00.017 INFO sleep 0.1
2020-06-10 10:52:00.118 INFO create task
2020-06-10 10:52:00.118 INFO task created, sleep 2
2020-06-10 10:52:00.119 INFO request started
2020-06-10 10:52:00.621 INFO response received
2020-06-10 10:52:00.622 INFO response read 308 bytes
2020-06-10 10:52:02.121 INFO finished
Notice that the coroutine starts running about 1 ms after it is created, and the HTTP request takes about 0.5 seconds to complete; since 0.5 seconds is close to what you were seeing, I believe you were measuring the time to complete the request, not the time to start it.
I am trying to use aiohttp 3.6.2 as both server and client:
The webhook performs this work:
1) Get a JSON request from the service
2) Quickly send HTTP 200 OK back to the service
3) Do additional work afterwards: make an HTTP request to a slow web service (which answers in 2-5 seconds)
I don't understand how to perform work after the view (or handler) has returned web.Response(text="OK").
Current view (it's slow because the slow HTTP request is performed before the response is returned):
view.py:
import aiohttp

async def make_http_request(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            print(await resp.text())

async def work_on_request(request):
    url = (await request.json())['url']
    await make_http_request(url)
    return aiohttp.web.Response(text='all ok')
routes.py:
from views import work_on_request

def setup_routes(app):
    app.router.add_get('/', work_on_request)
server.py:
from aiohttp import web
from routes import setup_routes
import asyncio
app = web.Application()
setup_routes(app)
web.run_app(app)
So, the workaround for me is to start one more thread with a different event loop, or maybe you know how to add some work to the current event loop?
This is no longer an open question, because I found a solution: add one more task to the main event loop.
# Additionally, I created one global queue so the coroutines can pass work to each other.
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
queue = asyncio.Queue(maxsize=100000)
loop.create_task(worker('Worker1', queue))
app = web.Application()
app['global_queue'] = queue
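For completeness, here is a minimal sketch of the worker and the updated view that go with the snippet above (the worker body and the queue.put in the view are my own additions, not the original code):

import asyncio
import aiohttp
from aiohttp import web

async def worker(name, queue):
    # Runs forever on the main event loop, draining URLs queued by the view.
    async with aiohttp.ClientSession() as session:
        while True:
            url = await queue.get()
            try:
                async with session.get(url) as resp:
                    print(await resp.text())
            finally:
                queue.task_done()

async def work_on_request(request):
    url = (await request.json())['url']
    # Hand the slow call to the background worker and answer immediately.
    await request.app['global_queue'].put(url)
    return web.Response(text='all ok')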
I am downloading JSON documents from an API using the asyncio module. The crux of my question is this: with the event loop implemented like this:
loop = asyncio.get_event_loop()
main_task = asyncio.ensure_future( klass.download_all() )
loop.run_until_complete( main_task )
and download_all() implemented as the following instance method of a class that already has downloader objects created and available to it, and thus calls each downloader's download method:
async def download_all(self):
    """ Builds the coroutines, uses asyncio.wait, then sifts for those still pending, loops """
    ret = []
    async with aiohttp.ClientSession() as session:
        pending = []
        for downloader in self._downloaders:
            pending.append(asyncio.ensure_future(downloader.download(session)))
        while pending:
            dne, pnding = await asyncio.wait(pending)
            ret.extend([d.result() for d in dne])
            # Get all the tasks, cannot use "pnding"
            tasks = asyncio.Task.all_tasks()
            pending = [tks for tks in tasks if not tks.done()]
            # Exclude the one that we know hasn't ended yet (UGLY)
            pending = [t for t in pending if not t._coro.__name__ == self.download_all.__name__]
    return ret
Why is it that, in the downloaders' download methods, when I use asyncio.ensure_future instead of await, the whole thing runs much faster, that is, seemingly more "asynchronously", as I can see from the logs?
This works because of the way I detect all the tasks that are still pending, never letting the download_all method complete, and keep calling asyncio.wait.
I thought that the await keyword allowed the event loop mechanism to do its thing and share resources efficiently. How come doing it this way is faster? Is there something wrong with it? For example:
async def download(self, session):
    async with session.request(self.method, self.url, params=self.params) as response:
        response_json = await response.json()
        # Not using await here, as I am "supposed" to
        asyncio.ensure_future(self.write(response_json, self.path))
        return response_json

async def write(self, res_json, path):
    # using aiofiles to write, but it doesn't (seem to?) support direct json
    # so converting to raw text first
    txt_contents = json.dumps(res_json, **self.json_dumps_kwargs)
    async with aiofiles.open(path, 'w') as f:
        await f.write(txt_contents)
With full code implemented and a real API, I was able to download 44 resources in 34 seconds, but when using await it took more than three minutes (I actually gave up as it was taking so long).
When you use await in each iteration of the for loop, every iteration waits for its download to finish before the next one starts.
When you use ensure_future instead, it doesn't wait: it creates a task for each file, lets the rest of the work continue, and all of those tasks are then awaited together in the second loop.
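If it helps, here is a minimal sketch of a tidier variant of the pattern in the question (the _write_tasks list and the gather-based download_all are my own additions, not the original code): keep the fire-and-forget write, but track the write tasks explicitly so they can be awaited with asyncio.gather instead of being fished out of asyncio.Task.all_tasks().

async def download(self, session):
    async with session.request(self.method, self.url, params=self.params) as response:
        response_json = await response.json()
    # Schedule the file write without blocking on it.
    self._write_tasks.append(asyncio.ensure_future(self.write(response_json, self.path)))
    return response_json

async def download_all(self):
    self._write_tasks = []
    async with aiohttp.ClientSession() as session:
        downloads = [asyncio.ensure_future(d.download(session)) for d in self._downloaders]
        results = await asyncio.gather(*downloads)
    # Wait for the background writes to finish before returning.
    await asyncio.gather(*self._write_tasks)
    return results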