aiohttp timeout doesn't work properly - python-asyncio

I have code that makes HTTP requests to sites (using aiohttp) with async_timeout. If I run all the requests together, some of them raise TimeoutError (even with timeout=20s). But if I run a single request, it works.
async def coro(url):
    with async_timeout.timeout(TIMEOUT, loop=loop):
        async with session.get(url) as response:
            text, status = (await response.text()), response.status
            ...
Is this an async_timeout problem/bug, or mine?
I tried using a TCPConnector (aiohttp.TCPConnector(limit=None, verify_ssl=False, loop=loop)), but it didn't help.

There is nothing strange about a request taking more than 20 seconds when a very large number of requests runs at once (and the same request is much faster when executed alone).
To make sure, just insert timestamp printouts before and after the .get()/.text() calls.
The timeout code is dead simple and highly tested; don't suspect an error in it.
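For example, here is a minimal diagnostic sketch along those lines. The timestamp printouts show where the time goes, and the semaphore is one way to cap the number of in-flight requests so that individual ones stay under the timeout. The concurrency cap and all names here are illustrative assumptions, not part of the original code:

import asyncio
import time

import aiohttp
import async_timeout

TIMEOUT = 20      # seconds; assumed to match the question's setting
CONCURRENCY = 50  # assumed cap on simultaneous requests

async def coro(session, semaphore, url):
    async with semaphore:  # limit how many requests run at once
        started = time.monotonic()
        # older async_timeout releases accept plain `with`; newer ones require `async with`
        with async_timeout.timeout(TIMEOUT):
            async with session.get(url) as response:
                text, status = (await response.text()), response.status
        print(url, status, 'took %.2fs' % (time.monotonic() - started))
        return text, status

async def main(urls):
    semaphore = asyncio.Semaphore(CONCURRENCY)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(coro(session, semaphore, u) for u in urls))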

Related

python requests get invalid url lightning speed

I have a list of 10^6 URLs that I want to check by status code.
The thing is, requests.get is too slow for me even with a timeout specified, and sometimes I cannot be sure whether a URL is valid or not even with a 1-second timeout (say the server's response is slow).
So, currently I do:
import requests

url = "https://dupa.ucho.elo.8"
r = requests.get(url, headers={'Connection': 'close'}, timeout=1)
How can I quickly check whether a URL is valid or not, without setting a timeout, and instantly get an answer for invalid URLs?
Note 1: I want to avoid the grequests module.
Note 2: I do not want to use multithreading.
I have read https://stackoverflow.com/questions/17782142/why-doesnt-requests-get-return-what-is-the-default-timeout-that-requests-geta, but it still involves setting a timeout.
While this might not give you lightning speed, since it avoids multithreading, you can check whether the response of the URL contains what you want to see (a 200 status code) and terminate right after.
import requests
import sys

url_list = ['http://google12121.com/', 'https://google.com/']

for url in url_list:
    try:
        response = requests.get(url)
        if response.status_code == 200:
            print("Yes")
        else:
            print("No")
    except Exception:
        print("Error: " + str(sys.exc_info()[0]))
        continue
You might want to write more specific error-catching logic, because catching all errors indiscriminately is generally bad practice.
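If "invalid" mostly means a host that does not exist (like the example URL in the question), one option is to fail fast on DNS resolution before issuing the HTTP request at all. This is only a sketch under that assumption; quick_check is a hypothetical helper, not part of the answer above:

import socket
from urllib.parse import urlparse

import requests

def quick_check(url):
    host = urlparse(url).hostname
    if host is None:
        return "No"  # not even a parseable URL
    try:
        socket.gethostbyname(host)  # unresolvable hosts fail here almost instantly
    except socket.gaierror:
        return "No"
    try:
        response = requests.get(url, headers={'Connection': 'close'}, timeout=1)
    except requests.RequestException:
        return "No"
    return "Yes" if response.status_code == 200 else "No"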

How to access a python object from a previous HTTP request?

I have some confusion about how to design the asynchronous part of a web app. My setup is simple: a visitor uploads a file, a bunch of computation is done on the file, and the results are returned. Right now I'm doing this all in one request. There is no user model and the file is not stored on disk.
I'd like to change it so that the results are delivered in two parts. The first part comes back with the request response because it's fast. The second part might be heavy computation and a lot of data, so I want it to load asynchronously, whenever it's done. What's a good way to do this?
Here are some things I do know about this. Usually, asynchronicity is done with ajax requests. The request will be to some route, let's say /results. In my controller, there'll be a method written to respond to /results. But this method will no longer have any information from the previous request, because HTTP is stateless. To get around this, people pass info through the request. I could either pass all the data through the request, or I could pass an id which the controller would use to look up the data somewhere else.
My data is a big python object (a pandas DataFrame). I don't want to pass it through the network. If I use an id, the controller will have to look it up somewhere. I'd rather not spin up a database just for these short durations, and I'd also rather not convert it out of python and write to disk. How else can I give the ajax request access to the python object across requests?
My only idea so far is to have the initial request trigger my framework to render a second route, /uuid/slow_results. This would be served until the ajax request hits it. I think this would work, but it feels pretty ad hoc and unnatural.
Is this a reasonable solution? Is there another method I don't know? Or should I bite the bullet and use one of the aforementioned solutions?
(I'm using the web framework Flask, though this question is probably framework agnostic.
PS: I'm trying to get better at writing SO questions, so let me know how to improve it.)
So if your app is only being served by one Python process, you could just have a global object that's a map from ids to DataFrames, but you'd also need some way of expiring entries out of the map so you don't leak memory.
If your app is running on multiple machines, you're out of luck. Even on one machine it might be sitting behind Apache, which may spawn multiple Python processes, and then you'd still be stuck. You can find out by running ps aux and counting the instances of python.
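A minimal sketch of such an expiring map, assuming a single Python process; the 600-second TTL and all names here are illustrative, not from the answer above:

import threading
import time

TTL = 600  # seconds to keep a result around; assumed value
_results = {}
_lock = threading.Lock()

def store(result_id, df):
    # remember a DataFrame under an id that was handed back to the client
    with _lock:
        _results[result_id] = (time.monotonic(), df)

def fetch(result_id):
    # return the DataFrame once, or None if it is missing or expired
    with _lock:
        cutoff = time.monotonic() - TTL
        stale = [k for k, (ts, _) in _results.items() if ts < cutoff]
        for key in stale:
            del _results[key]  # expire old entries so memory isn't leaked
        entry = _results.pop(result_id, None)
    return entry[1] if entry is not None else None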
Serializing to a temporary file or a database are fine choices in general, but if you don't like either in this case and don't want to set up e.g. Celery just for this one thing, then multiprocessing.connection is probably the tool for the job. Copying and lightly modifying from here, the box running your webserver (or another one, if you want) would have a separate process that runs this:
from multiprocessing.connection import Listener
import traceback

RESULTS = dict()

def do_thing(data):
    return "your stuff"

def worker_client(conn):
    try:
        while True:
            msg = conn.recv()
            if msg['type'] == 'answer':  # request for calculated result
                answer = RESULTS.get(msg['id'])
                conn.send(answer)
                if answer:
                    del RESULTS[msg['id']]
            else:
                conn.send("doing thing on {}".format(msg['id']))
                RESULTS[msg['id']] = do_thing(msg)
    except EOFError:
        print('Connection closed')

def job_server(address, authkey):
    serv = Listener(address, authkey=authkey)
    while True:
        try:
            client = serv.accept()
            worker_client(client)
        except Exception:
            traceback.print_exc()

if __name__ == '__main__':
    job_server(('', 25000), authkey=b'Alex Altair')
and then your web app would include:
from multiprocessing.connection import Client

client = Client(('localhost', 25000), authkey=b'Alex Altair')

def respond(request):
    client.send(request)
    return client.recv()
Design could probably be improved but that's the basic idea.
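To make that concrete, a hypothetical pair of Flask routes driving the job server might look like the following. It reuses the client connection from the snippet above; the route names, the message format, and the use of uuid are assumptions layered on the sketch, and it ignores thread-safety of the shared connection:

import uuid
from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/results', methods=['POST'])
def fast_results():
    job_id = uuid.uuid4().hex
    client.send({'type': 'start', 'id': job_id})  # the worker starts do_thing()
    client.recv()  # consume the "doing thing on ..." acknowledgement
    return jsonify(id=job_id)  # the fast part of the response

@app.route('/slow_results/<job_id>')
def slow_results(job_id):
    client.send({'type': 'answer', 'id': job_id})
    answer = client.recv()  # None until the result is ready
    return jsonify(ready=answer is not None)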

AJAX (XmlHttpRequest) timeout length by browser

I've been scouring the web trying to find a straight answer to this. Does anyone know the default timeout lengths for AJAX requests by browser? And by version, if it has changed?
According to the specs, the timeout value defaults to zero, which means there is no timeout. However, you can set a timeout value on the XHR.timeout property; the value is in milliseconds.
Sources:
http://www.w3.org/TR/2011/WD-XMLHttpRequest2-20110816/#the-timeout-attribute
http://msdn.microsoft.com/en-us/library/cc304105(v=vs.85).aspx
I don't think browsers have a timeout for AJAX; there are only synchronous and asynchronous requests. A synchronous request freezes JavaScript execution until the request returns; an asynchronous one does not freeze execution, it simply takes the request out of the execution flow, and if you have a callback function it will execute the function in parallel with the running scripts (similar to a thread).
sync flow:

running JS script
|
ajax
(wait for response)
|
execute callback
|
running JS script

async flow:

running JS script
|
ajax --------------------
|                       |
running JS script       execute callback
I did a modest amount of testing. To test, I loaded my website, stopped the local server, and then attempted an AJAX request. I set the timeout to something low like 1000ms until I could ensure I had minimal code (you must set xhr.timeout after open() and before send()).
Once I got it working, my initial goal was to determine the appropriate amount of time to allow; however, I was surprised how quickly the timeout would be outright ignored by browsers. My goal changed to determining the maximum timeout before error handling was no longer viable; past these fairly short spans of time your timeout handler will not run at all. What I found was pretty pathetic.
Chrome 60: 995ms; 996ms will throw a dirty evil error into the console.
Firefox 52 ESR: ~3000ms; the position of the mouse or some other issue may cause no response at around or just under three seconds.
So...
xhr.open(method, url, true);
xhr.timeout = 995; // REALLY short; set after open() and before send()
xhr.ontimeout = function ()
{
    // Code will only execute if at or below the *effective* timeouts listed above.
    // Good spot to make a second attempt.
};
xhr.send(null);
So if your timeout is set higher than 995ms, Chrome will ignore your code and puke an error onto the nice clean empty console that you worked hard to keep clean. Firefox is not much better: some unreliable requests just time out well beyond any patience I have, and in doing so ignore the ontimeout handler.
Browsers do have a timeout value, though the behavior depends on the browser: Chrome has a timeout of 5 minutes, and after 5 minutes it resends the AJAX call.

Ruby Eventmachine queueing problem

I have an HTTP client written in Ruby that can make synchronous requests to URLs. However, to execute multiple requests quickly I decided to use EventMachine. The idea is to queue all the requests and execute them using EventMachine.
class EventMachineBackend
  ...
  def execute(request)
    $q ||= EM.Queue.new
    $q.push(request)
    $q.pop { |request| request.invoke }
    EM.run { EM.next_tick { EM.stop } }
  end
  ...
end
Forgive my use of a global queue variable; I will refactor it later. Is what I am doing in EventMachineBackend#execute the right way of using EventMachine queues?
One problem I see in my implementation is that it is essentially synchronous: I push a request, pop it, execute it, and wait for it to complete.
Could anyone suggest a better implementation?
Your request logic has to be asynchronous for it to work with EventMachine; I suggest that you use em-http-request. You can find an example of how to use it here; it shows how to run requests in parallel. An even better interface for running multiple connections in parallel is the MultiRequest class from the same gem.
If you want to queue requests and only run a fixed number of them in parallel you can do something like this:
EM.run do
  urls = [...] # regular array with URLs
  active_requests = 0
  launch_next = nil # declared up front so the when_done closure can see it

  # this routine will be used as the callback and will
  # be run when each request finishes
  when_done = proc do
    active_requests -= 1
    if urls.empty? && active_requests == 0
      # if there are no more urls, and there are no active
      # requests, it means we're done, so shut down the reactor
      EM.stop
    elsif !urls.empty?
      # if there are more urls, launch a new request
      launch_next.call
    end
  end

  # this routine launches a request
  launch_next = proc do
    # get the next url to fetch
    url = urls.pop
    # launch the request, and register the callback
    request = EM::HttpRequest.new(url).get
    request.callback(&when_done)
    request.errback(&when_done)
    # increment the number of active requests; this
    # is important since it tells us when all requests
    # are done
    active_requests += 1
  end

  # launch three requests in parallel; each will launch
  # a new request when done, so there will always be
  # three requests active at any one time, unless there
  # are no more urls to fetch
  3.times do
    launch_next.call
  end
end
Caveat emptor, there may very well be some detail I've missed in the code above.
If you think it's hard to follow the logic in my example, welcome to the world of evented programming. It's really tricky to write readable evented code. It all goes backwards. Sometimes it helps to start reading from the end.
I've assumed that you don't want to add more requests after you've started downloading (it doesn't look like it from the code in your question), but should you want to, you can rewrite my code to use an EM::Queue instead of a regular array and remove the part that does EM.stop, since you will not be stopping. You can probably remove the code that keeps track of the number of active requests too, since that's no longer relevant. The important part would look something like this:
launch_next = proc do
  urls.pop do |url|
    request = EM::HttpRequest.new(url).get
    request.callback(&launch_next)
    request.errback(&launch_next)
  end
end
Also, bear in mind that my code doesn't actually do anything with the response. The response will be passed as an argument to the when_done routine (in the first example). I also do the same thing for success and error, which you may not want to do in a real application.

What's the fastest way for a true sinatra(ruby/rack) after_filter?

Okay, it's a simple task. After I render HTML to the client I want to execute a DB call with information from the request.
I am using Sinatra because it's a lightweight microframework, but really I'm up for anything in Ruby if it's faster/easier (Rack?). I just want to get the URL and redirect the client somewhere else based on it.
So how does one go about a real after_filter with Rack/Sinatra? And by after_filter I mean one that runs after the response is sent to the client. Or is that just not doable without threads?
I forked Sinatra and added after filters, but there is no way to flush the response; even send_data, which is supposed to stream files (which is obviously for binary), waits for the after_filters.
I've seen this question: Multipart-response-in-ruby, but the answer is for Rails, and I am not sure whether it really flushes the response to the client and then allows for processing afterwards.
Rack::Callbacks has some before and after callbacks, but even those look like they would run before the response is ever sent to the client. Here's the Rack::Callbacks implementation (comment added by me):
def call(env)
  @before.each { |c| c.call(env) }
  response = @app.call(env)
  @after.each { |c| c.call(env) }
  response
  # I am guessing that when this method returns, the response is sent to the client.
end
So I know I could call a background task through the shell with rake, but it would be nice not to have to. There is also NeverBlock, but is that good for executing a separate process without delaying the response, or would it still make the app wait as a whole? (I think it would.)
I know this is a lot, but in short: I want a simple after_filter that really runs after the response is sent, in Ruby/Sinatra/Rack.
Thanks for reading or answering my question! :-)
A modified port of Rails' run_later does the trick; the file is available here:
http://github.com/pmamediagroup/sinatra_run_later/tree/master
