Slow facebook API via Python on Google App Engine (GAE) - performance

I am fetching data from my news stream to filter it. This takes Facebook sometimes more than 5 seconds. I hit the url_fetch() timeout of Google App Engine.
Now is there any way to work around this timeout or to improve the speed with which Facebook replies to my request? This is the part where I get my exceptions:
params[u'access_token'] = self.access_token
result = json.loads(
urlfetch.fetch(
url=u'https://graph.facebook.com/me/home?limit=1000,
payload=urllib.urlencode(params),
method=urlfetch.POST,
headers={u'Content-Type': u'application/x-www-form-urlencoded'}
).content)

There's nothing you can do to speed it up - how fast it is is up to facebook. You can pass the deadline argument to URLFetch to set the maximum deadline for requests (in seconds). If you're doing a lot of calls, you probably want to look into using the asynchronous API to do calls in parallel.

I had a similar problem with a different project. You can use the mechanize library very adequately in GAE and it allows you to specify timeouts. Just copy the folder into your GAE project and you're good to go.
Use it sparingly though as long waits really drives up costs.

Related

Should I be using AJAX or WebSockets.

Oh the joyous question of HTTP vs WebSockets is at it again, however even after quit a bit of reading on the hundreds of versus blog posts, SO questions, etc, etc.. I'm still at a complete loss as to what I should be working towards for our application. In this post I will be supplying information on application functionality, and the types of requests/responses used in our application currently.
Currently our application is a sloppy piece of work, thrown together using AngularJS and AJAX requests to a Apache server running PHP, namely XAMPP. With the launch of our application I've noticed that we're having problems with response times when the server is under any kind of load. This probably has something to do with the sloppy architecture of our server, the hardware, and the fact that our MySQL database isn't exactly optimized.
However, with such a loyal fanbase and investors seeing potential in our application and giving us a chance to roll out a 2.0 I've been studying hard into how to turn this application into a powerhouse of low latency scalability. Honestly the best option would be hire someone with experience, but unfortunately I'm a hobbyist, and a one-man-army without much experience.
After some extensive research, I've decided on writing the backend using NodeJS this time. However I'm having a hard time deciding on HTTP or Websockets. Here's the types of transactions that are done between the Server/Client.
Client sends a request to the server in JSON format. The request has a few different things.
A request id (For processing logic based on the request)
The data associated with the request ID.
The server receives the request, polls the database (if necessary) and then responds to the client in JSON format. Sometimes the server is serving files to the client. Namely images in Base64 format.
Currently the application (When being used) sends a request to the server every time an interface is changed, which on average for our application is once every few seconds. Every action on our interfaces sends another request to the server. The application also sends requests to check for notifications/messages every 8 seconds, (or two seconds depending on if they're on the messaging interface).
Currently here are the benefits I see of a stated connection over a stateless connection with our application.
If the connection is stated, I can eliminate the requests for notifications and messages, as the server can just tell the client whenever one comes available. This can eliminate x(n)/4 requests per second to the server alone.
Handling something like a disconnection from the server is as simple as attempting to reconnect, opposed to handling timeouts/errors per request, this would only be handled on the socket.
Additional security can be obtained by removing security keys for database interaction, this should prevent the possibility of Hijacking(?) of a session_key and using it to manipulate or access another users data. The session_key is only needed due to there being no state in the AJAX setup.
However, I'm someone who started learning programming through TCP game server emulation. So I understand some benefits of a STATED connection, while I don't understand the benefits of a STATELESS connection very much at all. I know they both have their benefits and quirks, but I'm curious what would be the best approach for us.
We're mainly looking for Scalability, as we had a local application launch and managed to bottleneck at nearly 10,000 users in under 48 hours. Luckily I announced this as a BETA and the users are cutting me a lot of slack after learning that I did it all on my own as a learning project. I've disabled registrations while looking into improving the application's front and backend.
IMPORTANT:
If using WebSockets, would we be able to asynchronously download pictures from the server like we can with AJAX? For example, I can make 5 requests to the server using AJAX for 5 different images, and they will all start downloading immediately, using a stated connection would I have to wait for each photo to be streamed before moving to the next request? Would this only bottle-neck a single user, or every user that is waiting on a request to be completed?
It all boils down on how your application works and how it needs to scale. I would use bare WebSockets rather than any wrapper, since it is an already easy to use API and your hands won't be tied when you need to scale out.
Here some links that will give you insight, although not concrete answers to your questions because as I said, it depends on your expectations.
Hard downsides of long polling?
WebSocket/REST: Client connections?
Websockets, and identifying unique peers[PHP]
How HTML5 Web Sockets Interact With Proxy Servers
If your question is Should I use HTTP over Websockets ?, the response is: You should not.
Even if it is faster because you don't lose time opening the connection, you lose also all the HTTP specification like verbs (GET, POST, PATCH, PUT, ...), path, body, and also response, status code. This seams simple but you'll have to re-implement all or part of these protocol things.
So you should use Ajax, as long as it is one ponctual request.
When you need to make an ajax request every 2 seconds, you need in fact that the server sends you data, not YOU request server to check Api change (if changed). So this is a sign that you should implement a websocket server.

Do calls to the Google Drive Changes feed count against API quota?

I have an application that needs to stay in sync with google drive. To that end, I'm using the Changes feed that is described on this page.
I know the idea is to poll the changes feed so that I don't have to request a list of files and do a comparison. Right now I have it set to query every 30 seconds, and initiate a sync operation when the latest change number is updated. But, to make the application feel a little more responsive, I would like to query API more frequently (but still initiate a sync only when necessary)
Given that, I was wondering if requests against the Changes feed count toward the API quota? I don't want to query more frequently if it's going to double my quota consumption rate.
It looks like requests to the changes API do count toward the quota. I found the Reports section in the cloud console. It gives a detailed breakdown of requests by user, location, method, and more. Looking through the methods, I found that drive.changes.list accounts for the majority of my usage.
It's unfortunate, but better than burning through the quota with multiple calls to get the status of every file.

Proxy and Cache certain ajax requests during development process

I'm working on a single page web app which needs to load a lot of data on startup. An inital load can take up to 10 seconds, which can be quite frustrating when you just want to fix/check a minor change.
There are two ajax calls on startup which require the most time. Ideally I'd have some proxy running which can cache these calls as long as I like. It should also be possible to disable these responses easily.
This can be achieved quite easily using the AutoResponder functionality of Fiddler.
Just save the response of query and map it to a rule in the autoresponder rules.

AppEngine response time is slow

I am using a modified version of the TaskCloud example to try and read/write my own data.
While testing on a a deployed version, I've noticed that the round-trip response time is slow.
From my Android device, I have a 100ms ping response to appspot.com.
I have changed the AppEngine application to do nothing (The Google Dashboard shows insignificant Average Latency.
The problem is that the time it takes for HttpClient client .execute(post) is about 3 seconds.
(This is the time when an instance is already loaded)
Any suggestions would be greatly appreciated.
EDIT: I've watched the video of Google I/O showing the CloudTasks Android-AppEngine app, and you can see that refreshing the list (a single call to AppEngine) takes about 3 seconds as well. The guy is saying something about performance which I didn't fully get (debuggers are running at both ends?)
The video: http://www.youtube.com/watch?v=M7SxNNC429U&feature=related
Time location: 0:46:45
I'll keep investigating...
Thanks for your help so far.
EDIT 2: Back to this issue...
I've used shark packet sniffer to find out what is happening. Some of the time is spent negotiating a SSL connection for each server call. Using http (and ACSID) is faster than https (and SACSID).
new DefaultHttpClient() and new HttpPost() are used for each server call.
EDIT 3:
Looking at the sniffer logs again, there is an almost 2 seconds delay before the actual POST.
I have also found out that the issue exists with Android 2.2 (all versions) but is resolved with Android 2.3
EDIT 4: It's been resolved. Please see my answer below.
It's difficult to answer your question since no detail about your app is provided. Anyway you can try to use appstats tool provided by Google to analyze the bottleneck.
After using the Shark sniffer, I was able to understand the exact issue and I've found the answer in this question.
I have used Liudvikas Bukys's comment and solved the problem using the suggested line:
post.getParams().setBooleanParameter(CoreProtocolPNames.USE_EXPECT_CONTINUE, false);
Often the first call to your GAE app will take longer than subsequent calls. You should make yourself familiar with loading and warm-up requests and how GAE handles instances of your app: http://code.google.com/intl/de-DE/appengine/docs/adminconsole/instances.html
Some things you could also try:
make your app handle more than one request per instance (make sure your app is threadsafe!) http://code.google.com/intl/de-DE/appengine/docs/java/config/appconfig.html#Using_Concurrent_Requests
enable always on feature in app admin (this will cost you)

AJAX Polling Question - Blocking Or Frequent?

I have a web application that relies on very "live" data - so it needs an update every 1 second if something has changed.
I was wondering what the pros and cons of the following solutions are.
Solution 1 - Poll A Lot
So every 1 second, I send a request to the server and get back some data. Once I have the data, I wait for 1 second before doing it all again. I would detect client-side if the state had changed and take action appropriately.
Solution 2 - Block A Lot
So I start a request to the server that will time-out after 30 seconds. The server keeps an eye on the data on the server by checking it once per second. If the server notices the data has changed it sends the data back to the client, which takes action appropriately.
Scenario
Essentially, the data is reasonably small in size, but changes at random intervals based on live events. The thing is, the web UI will be running something in the region of 2,000 instances, so do I have 2,000 requests per second coming from the UI or do I have 2,000 long-running requests that take up to 30 seconds?
Help and advice would be much appreciated, especially if you have worked with AJAX requests under similar volumes.
One common solution for such cases is to use static json files. Server-side scripts update them when the data is changed and they are served by fast and light webserver (like nginx). Since files are static and small - webserver will do that right in cache, in very fast manner.
Consider a better architecture. Implementing this kind of messaging system is trivial to do right in something like nodeJS. Message dispatch will be instantaneous, and you won't need to poll for your data on either side.
You don't need to rewrite your whole system: The data producer could simply POST the updates to the nodeJS server instead of writing them to a file, and as a bonus, you don't even need to waste time on disk IO.
If you started without knowing any nodeJS, you could still be done in a couple hours, because you can just hack up the chat example.
I can't comment yet, but I would agree with geocar. Running live or almost live web services with just polling will be solution stuck between a rock and a hard place.
You could also look into web sockets to allow push as this sounds a better solution for this than just updating every second to 30 seconds.
Good luck!

Resources