I am trying to figure out how to use long polling to trigger a webpage refresh (the entire page, as opposed to just a single section). Although it would be nicer to update only part of the page instead of the whole page, I would rather get the full page refresh working first and then move on from there. Can anyone point me in the right direction? I have been searching for examples of long polling online, but have not found anything similar yet. Essentially, I would have a webpage that I could remotely refresh using long polling, based on some condition on the server (Apache on Debian). For instance, if I had a bash-script-based CGI page that showed AM or PM based on the server time, then when the time on the server changes from AM to PM or vice versa, the server would trigger a page refresh on the client side, so the CGI page would reload and display the correct data.
Well, first of all: if you use long polling requests, keep in mind that there will be an open connection to your server for each page that is being viewed in a browser.
That requires a server infrastructure that can handle this without huge memory consumption and that won't run out of free connections for the long polling requests.
I don't assume you use PHP, but it is a good example: if you run Apache with the PHP module, there is on the one hand a configured limit on the maximum number of connections in Apache, and on the other hand the whole PHP module is loaded for each connection, which uses a lot of memory if you have many page views. If you use php-fpm via FastCGI, there is also a maximum number of available clients, and you don't want to increase this number beyond a certain limit either.
So generally I would suggest not using long polling requests for public websites unless you have a good server backend with some nice logic for handling them.
Depending on the requirements, you could consider the following solution if you know at which intervals the page should check for a refresh:
You could add the attributes data-check-for-refresh-at and data-modified-at to your html node:
<html data-check-for-refresh-at='2013-02-04 12:00:00 GMT' data-modified-at='2013-01-01 12:00:00 GMT'>
Parse this with JavaScript and then do a refresh check at that time, submitting the modified-at time with the request. If the content has changed, the server responds with the new content and the next time at which the client should check for updates.
Another important thing: the client should add a random offset to this refresh time, otherwise you will probably DDoS yourself, because all clients would send a refresh request at the same time.
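To make the random offset concrete, here is a minimal sketch of how the client could compute its waiting time. The function name refreshDelayMs, the maxJitterMs parameter and the injectable random argument are assumptions made for illustration; they are not part of the original suggestion.

```javascript
// Compute how long a client should wait before its refresh check.
// checkAtMs:   scheduled check time in ms since the epoch, e.g. parsed from
//              the data-check-for-refresh-at attribute.
// nowMs:       current time in ms since the epoch.
// maxJitterMs: hypothetical upper bound for the random offset.
// random:      optional source of randomness in [0, 1); defaults to Math.random.
function refreshDelayMs(checkAtMs, nowMs, maxJitterMs, random) {
  var rnd = random || Math.random;
  var base = Math.max(0, checkAtMs - nowMs); // never wait a negative time
  var jitter = Math.floor(rnd() * maxJitterMs); // spreads clients over the window
  return base + jitter;
}

// Possible browser wiring (not executed here; checkForRefresh is hypothetical,
// and Date.parse on this non-ISO date format is implementation-dependent):
//   var attr = document.documentElement.getAttribute('data-check-for-refresh-at');
//   setTimeout(checkForRefresh, refreshDelayMs(Date.parse(attr), Date.now(), 30000));
```

With a 30 second jitter window, clients that all received the same check time would spread their requests over roughly half a minute instead of hitting the server in the same instant.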
EDIT (Based on comments)
First, a short explanation of how it should be done for a real system:
The server should not use one thread or process per connection; instead it should use an event-driven approach (registering callbacks to be informed when streams are ready to read or write). When a long polling request arrives, the server stores the information about which changes the client wants to be informed of. Then the connection sleeps: no CPU cycles are wasted on it until the client needs to be informed, and the memory usage is quite low. When a URL changes, the server is told to notify all clients that listen for changes to this URL, and it then submits the responses to those clients (a publish/subscribe system). Depending on the number of clients to be notified, the notifications should probably be queued and handled in an intelligent way, so that the outgoing traffic is better balanced. With this approach you are more likely to run into the limit on open ports/file descriptors than to have problems with CPU or memory usage.
Of course this is a very simplistic description, but I think it is sufficient to get an idea of how it would be implemented.
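The publish/subscribe part of that description can be sketched in a few lines. This is only an in-memory illustration, not a real server: ChangeNotifier, subscribe and publish are made-up names, and each parked long-polling request is represented by a plain callback that would send its response.

```javascript
// Minimal in-memory registry of parked long-polling requests.
function ChangeNotifier() {
  this.waiting = new Map(); // url -> array of callbacks for parked requests
}

// Park a long-polling request until the given URL changes.
ChangeNotifier.prototype.subscribe = function (url, respond) {
  if (!this.waiting.has(url)) {
    this.waiting.set(url, []);
  }
  this.waiting.get(url).push(respond);
};

// Called when a URL was updated: answer every request parked for it.
// Returns the number of clients that were notified.
ChangeNotifier.prototype.publish = function (url) {
  var callbacks = this.waiting.get(url) || [];
  this.waiting.delete(url); // each parked request is answered only once
  callbacks.forEach(function (respond) {
    respond('changed');
  });
  return callbacks.length;
};
```

Nothing runs while a request is parked; work only happens in subscribe and publish, which is the point of the event-driven approach. A real implementation would also need timeouts for parked requests and the queued delivery mentioned above.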
Quick&Dirty Solution
This is closer to pseudocode than to real code, so don't expect it to work by copy and paste. It is also assumed that the server creates the $notificationFile files before any long polling request arrives.
The long polling request will call a PHP script like this:
set_time_limit(0);
/*
$urlToCheck and $modificationTimeToCheckAgainst should be initialized with the values sent by the client as parameters of the long polling request.
$someTime should be the maximum time (in seconds) the long polling request is kept alive.
*/
$someTime = 30; // example value
$forceResponseTimeout = microtime(true) + $someTime;
$urlToCheck = "the/url/to/observe.html";
$modificationTimeToCheckAgainst = strtotime("2013-02-05 00:00:00 GMT"); // Unix timestamp in seconds, comparable with filemtime()
$notificationFile = "./tmp/observer-file-".sha1($urlToCheck);
$responseStatus = "did-not-change";
while( microtime(true) < $forceResponseTimeout ) {
    clearstatcache(); // need to clear the cache, otherwise we don't get the right modification date (also not the best idea for keeping CPU usage low)
    if( filemtime($notificationFile) > $modificationTimeToCheckAgainst ) {
        $responseStatus = "changed";
        break;
    }
    usleep(100000); // sleep 100 ms between checks; polling the file in a loop is still a bad idea because it costs CPU, even with the sleep
}
echo $responseStatus; // here some JSON response should be created; the client then learns whether it should resend the long polling request or do a refresh.
The update script should look like this:
$urlThatIsUpdated = "the/url/to/observe.html";
//doing the update of the file
$notificationFile = "./tmp/observer-file-".sha1($urlThatIsUpdated);
touch($notificationFile); //updates the modification time of the notification file, which should be recognized by the script above.
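On the client side, the matching loop could look roughly like the sketch below. The names startLongPolling, sendRequest, onChanged and schedule are assumptions for illustration; only the two status strings "changed" and "did-not-change" come from the script above. The request and the scheduling are passed in as functions so that the loop itself stays independent of any Ajax library.

```javascript
// sendRequest(callback): issues one long-polling request and calls
//                        callback(status) with "changed" or "did-not-change".
// onChanged():           invoked once when the server reports a change.
// schedule(fn):          defers the next request.
function startLongPolling(sendRequest, onChanged, schedule) {
  function pollOnce() {
    sendRequest(function (status) {
      if (status === 'changed') {
        onChanged();
      } else {
        schedule(pollOnce); // request timed out without a change: ask again
      }
    });
  }
  pollOnce();
}

// Possible browser wiring (not executed here; sendXhr is hypothetical):
//   startLongPolling(sendXhr,
//                    function () { window.location.reload(); },
//                    function (fn) { setTimeout(fn, 0); });
```

For the whole-page refresh asked about in the question, onChanged would simply reload the page; a partial update would replace only the affected section there instead.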
I just read that some browsers would prevent HTTP polling (I guess by limiting the rate of requests)...
From https://github.com/sstrigler/JSJaC:
Note: As security restrictions of most modern browsers prevent HTTP
Polling from being usable anymore this module is disabled by default
now. If you want to compile it in use 'make polling'.
This could explain some misbehavior of some of my JavaScript code (sometimes requests are just not sent, or they are retried even though they were actually successful). But I couldn't find any further details.
Questions
If it's "max. number of requests n per x seconds", what are the usual/default settings for x and n?
Is there any good resource for this?
Any way to detect if a request has been "delayed" or "rejected" because of a rate limit?
Thanks for your help...
Stefan
Yes, as far as I am aware there is a default pool limit of 10 and a default timeout of 30 seconds per request; however, both the timeout and the pool limit can be configured, and different browsers implement different limitations!
Check out this Google implementation.
and this is an awesome implementation of catching a timeout error!
You can find the Firefox specifics HERE!
Internet Explorer specifics are controlled from inside the Windows registry.
Also have a look at this question.
Basically, the way you stay in control is not by changing the browser limitations, but by abiding by them. So you apply a technique called throttling.
Think of it as creating a FIFO/priority queue of functions. A queue struct that takes XHR requests as members and enforces a delay between them is an Xhr Poll. For instance, I am using JSONP to get data from a node.js server located on another domain, and I am polling, of course, due to browser limitations. Otherwise, I get zero response back from the server, and that is only because of browser limitations.
I am actually doing a console log for every request that's supposed to be sent, but not all of them are being logged. So the browser limits them.
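A FIFO queue that enforces a delay between requests can be sketched as follows. This is not the code I actually use, just a minimal illustration: ThrottledQueue, minGapMs and the injectable setTimer argument are made-up names, and each task is assumed to be a function that starts one request.

```javascript
// FIFO queue that starts at most one task per minGapMs milliseconds.
// setTimer is injectable for testing; by default it wraps setTimeout.
function ThrottledQueue(minGapMs, setTimer) {
  this.minGapMs = minGapMs;
  this.setTimer = setTimer || function (fn, ms) { setTimeout(fn, ms); };
  this.tasks = [];
  this.draining = false;
}

// Add a task; it runs immediately if the queue is idle, otherwise in order.
ThrottledQueue.prototype.push = function (task) {
  this.tasks.push(task);
  if (!this.draining) {
    this.draining = true;
    this.drain();
  }
};

// Run the oldest task, then wait minGapMs before looking at the queue again.
ThrottledQueue.prototype.drain = function () {
  var self = this;
  var task = this.tasks.shift();
  if (!task) {
    this.draining = false; // queue is empty: the next push starts immediately
    return;
  }
  task();
  this.setTimer(function () { self.drain(); }, this.minGapMs);
};
```

Usage would be along the lines of queue.push(function () { /* send one XHR or JSONP request */ }); the gap value has to be chosen by experiment, since the browsers do not publish one.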
I'll be even more specific with helping you out. I have a page on my website which is supposed to render a view for tens or even hundreds of articles. You go through them using a cool horizontal slider.
The current value of the slider matches the current 'page'. Since I am only displaying 5 articles per page and I can't exactly load thousands of articles 'onload' without severe performance implications, I load only the articles for the current page. I get them from a MongoDB by sending a cross-domain request to a Python script.
The script is supposed to return an array of five objects with all the details I need to build the DOM elements for a 'page'. However, there are a couple of issues.
First, the slider works extremely fast, as it's more or less a value change. Even with drag-and-drop functionality, key down events etc., the actual change takes milliseconds. However, the code of the slider looks something like this:
goog.events.listen(slider, goog.events.EventType.CHANGE, function() {
    myProject.Articles.page(slider.getValue());
});
The slider.getValue() method returns an int with the current page number, so basically I have to load from:
currentPage * articlesPerPage to (currentPage + 1) * articlesPerPage - 1
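As a small worked example of that range, assuming zero-based page numbers (the helper name pageRange is made up for illustration):

```javascript
// First and last article index (inclusive) for a zero-based page number.
function pageRange(currentPage, articlesPerPage) {
  var start = currentPage * articlesPerPage;
  return { start: start, end: start + articlesPerPage - 1 };
}
```

With 5 articles per page, page 0 covers indexes 0 to 4 and page 3 covers indexes 15 to 19.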
But in order to load, I do something like this:
I have a storage engine (think of it as an array).
I check whether the content is already there.
If it is, there is no point in making another request, so I go ahead and take the already created DOM elements from the array.
If it isn't, then I need to get it, so I send the request I was mentioning, which would look something like this (without accounting for browser limitations):
JSONP.send({'action': 'getMeSomeArticles', 'start': start, 'length': itemsPerPage}, function(callback) {
    // now I just parse the callback quickly to make sure it is consistent,
    // create DOM elements, and populate the client side storage
    // and update the view for the user.
});
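The check-the-storage-first steps can be sketched like this. It is a simplified stand-in for my real code: loadPage, fetchPage and render are hypothetical names, the storage engine is represented by a Map, and fetchPage stands for whatever sends the actual request (the JSONP call above, for example).

```javascript
// cache:     Map from page number to the data already loaded for that page.
// fetchPage: function (page, callback) that requests the page from the server.
// render:    function (articles) that updates the view.
// Returns true if a request had to be sent, false if the cache was used.
function loadPage(page, cache, fetchPage, render) {
  if (cache.has(page)) {
    render(cache.get(page)); // already loaded: no request needed
    return false;
  }
  fetchPage(page, function (articles) {
    cache.set(page, articles); // populate the client-side storage
    render(articles);
  });
  return true;
}
```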
The problem comes from the speed with which you can change that slider. Since every change supposedly triggers a request (the same would happen for normal XHR requests), you are basically crossing the limitations of all browsers, so without throttling there would be no 'callback' for most of the requests. 'callback' is the JS code returned by the JSONP request (which is more of a remote script inclusion than anything else).
So what I do is push a request to a priority queue, not a POLL, as now I don't need to send multiple simultaneous requests. If the queue is empty, the newly added member is executed and everyone is happy. If it's not, then all non-completed requests in progress are cancelled and only the last one is executed.
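One way to get the "only the last one counts" behaviour is sketched below. Note the swap: a script-based JSONP request generally cannot be aborted once it has started, so this sketch does not abort anything; it "cancels" superseded requests by ignoring their results when they arrive. LatestOnly, send and onResult are names invented for this example.

```javascript
// send(payload, done): starts a request and later calls done(result).
function LatestOnly(send) {
  this.send = send;
  this.latestId = 0; // id of the most recently issued request
}

// Issue a request; onResult is only called if no newer request was issued
// in the meantime.
LatestOnly.prototype.request = function (payload, onResult) {
  var self = this;
  var id = ++this.latestId;
  this.send(payload, function (result) {
    if (id !== self.latestId) {
      return; // superseded by a newer request: drop the stale result
    }
    onResult(result);
  });
};
```

With the slider, every CHANGE event would call request() with the new page number, and only the page the slider finally rests on gets rendered.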
Now in my particular case, I do a binary search (O(log n)) to see whether the storage engine has data for the previous requests yet, which tells me if the previous request has been completed or not. If it has, it's removed from the queue and the current one is processed; otherwise the new one fires. And so on and so forth.
Again, for speed considerations and shit browser wannabes such as Internet Explorer, I do the above described procedure about 3-4 steps ahead. So I pre-load 20 pages ahead until everything is in the client-side storage engine. This way, every limitation is successfully dealt with.
The cooldown time is covered by the minimum time it would take to slide through 20 pages, and the throttling makes sure there is no more than 1 active request at any given time (with backwards compatibility going as far back as Internet Explorer 5).
The reason I wrote all this is to give you an example showing that you cannot always enforce a delay directly from the FIFO structure, as your calls may need to turn into what a user sees, and you don't exactly want to make a user wait 10-15 seconds for a single page to render.
Also, always minimize the polling and the need to poll (simultaneously fired Ajax events, as not all browsers actually do good things with them). For instance, instead of sending one request to get content and another for that content to be tracked as viewed in your app metrics, do as many tasks at server level as you possibly can!
Of course, you probably want to track your errors properly. The XHR object from your library of choice implements error handling for Ajax, and because you are an awesome developer you want to make use of it.
So say you have a try-catch block in place.
The scenario is this:
An Ajax call has finished and is supposed to return JSON, but the call somehow failed. However, you still try to parse the JSON and do whatever you need to do with it.
So:
function onAjaxSuccess(ajaxResponse) {
    try {
        var yourObj = JSON.parse(ajaxResponse);
    } catch (err) {
        // I've actually seen this on a number of occasions: to log that an error occurred,
        // a lot of developers will attempt to send yet another Ajax request to log the
        // failure of the previous one.
        // For these reasons, workers exist.
        myProject.worker.message('preferably a pre-determined error code should go here');
        // Then only the worker should again throttle and poll the Ajax requests that log the
        // specific error.
    }
}
While I have seen various implementations that try to fire as many XHR requests at the same time as they possibly can until they hit the browser limitations, and then do quite a good job of stalling the ones that haven't fired while waiting for the browser 'cooldown', what I can advise you is to think about the following:
How important is speed for your app?
Just how scalable does it have to be, and how intensive will the I/O be?
If the answer to the first one is 'very' and to the latter 'OMFG modern technology', then try to optimize your code and architecture as much as you can so that you never need to send 10 simultaneous Xhr requests. Also, for large scale apps, multi-thread your processes. The JavaScript way to accomplish that is by using workers. Or you could call the ECMA board, tell them to make this a default, and then post it here so that the rest of us JS devs can enjoy native multi-threading in JS:)(how dafuq did they not think about this?!?!)
Stefan, quick answers below:
-if it's "max. number of requests n per x seconds", what are the usual/default settings for x and n?
This sounds more like a server restriction. The browser ones usually sound like:
-"the maximum requests for the same hostname is x"
-"the maximum connections for ANY hostname is y"
-Is there any good resource for this?
http://www.browserscope.org/?category=network (also hover over table headers to see what is measured)
http://www.stevesouders.com/blog/2008/03/20/roundup-on-parallel-connections
-Any way to detect if a request has been "delayed" or "rejected" because of a rate limit?
You could look at the HTTP headers for "Connection: close" to detect server restrictions, but I am not aware of any way to read these settings from JavaScript in a consistent, browser-independent way. (For Firefox, you could read this: http://support.mozilla.org/en-US/questions/746848)
Hope this quick answer helps.
No, the browser does not affect polling in any way. I think what was meant on that page is the same origin policy: you can only access the same host and port as your original page.
The only known limitation on the connections themselves is that you can usually have only two to four simultaneous connections to the same host.
I've written some apps with long poll, some with C++ backend with my own webserver, and one with PHP backend with Apache2.
My long poll timeout is 4..10 s. When something occurs, or 4..10 s pass, my server returns an empty response. Then the client immediately starts another AJAX request. I found that some browsers hang when I start an AJAX call from the previous AJAX handler, so I am using setTimeout() with a small value to start the next AJAX request.
When something happens on the client side that should be sent to the server, I use another AJAX request for it, but it's a one-way thing: the server does not send any response, and the client does not process anything. The result of the operation (if any) will be received on the long poll. This requires at most 2 connections to the server, which all browsers support.
Keep in mind that 500 clients mean 500 server-side webserver threads, and that they will move together, causing load peaks: when something happens, the server has to report it to every client at the same time, the clients will take about the same time to process it, they will start the next long request at the same time, and from then on the timeouts will expire at the same time too, and so will the following ones. You can play tricks with a random timeout, say 4 + rnd(0..4), but it's of little use: as soon as anything happens, they will "sync" again, because all the requests have to be served at the same time when something reportable happens.
I've tested it through a router, and it works. I assume routers tolerate a 4..10 s lag; that's about the response time of a slow webpage (far, far away), which no router would decide to cancel.
My PHP work is a collaborative spreadsheet, it looks amazing when you hit enter and the stuff is updating simultaneously in several browsers. Have fun!
There is no limit on the number of Ajax requests. However, they have to go to the same host and port.
The server can limit the number of requests from a machine, depending on its settings.
For example, a server can be set up so that if there are more than a few requests from the same machine within a specified time, it rejects further requests.
After a small mistake in my JavaScript code, I created a never-ending loop in which each step issued 2 Ajax requests. In Firebug I could see more and more requests until Firefox started to slow down, stopped responding and finally crashed.
So, yes, there is a "limit" ;)
I've got an odd problem here with Prototype 1.7.0 and an AJAX form submission using form.request().
The response status is either 202 or 200 depending on whether the server expects to be polled again with the same form submission after a timeout. 200 indicates that the response contents are done and are to be displayed to the user (backend uses WebWork's execAndWait-interceptor to execute a long-running job).
The problem is that most of the time, everything works just fine. However, occasionally, the response comes back as status code 0 and XMLHttpRequest readyState 1. Firebug indicates that correct response codes are coming from the backend and that the actual response contents are fine; it's just that Prototype's on200 and on202 handlers do not fire (on0 does).
It appears there are similar issues reported over the Internet, but there is no conclusive solution. Is this some well known problem?
A response code of 0 from Prototype means that it can't communicate with the server. You can handle this by adding an "on0: function() {}" event handler to your request.
How you handle it is up to you: either alert the user that something went wrong and redisplay their form, or silently try to re-submit your request to the backend in a loop. If you choose the second option, set a wait timeout and multiply it by some factor each time you can't talk to the server, so you don't lock up their browser in an infinite loop.
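The growing wait timeout could be computed along these lines. This is only a sketch: retryDelayMs and its parameters are illustrative names, and the cap maxMs is an addition of mine so the delay cannot grow without bound.

```javascript
// Delay before retry number `attempt` (0 for the first retry).
// baseMs: initial delay, factor: multiplier per failed attempt,
// maxMs:  upper bound for the delay.
function retryDelayMs(attempt, baseMs, factor, maxMs) {
  var delay = baseMs * Math.pow(factor, attempt);
  return Math.min(delay, maxMs);
}

// Possible use inside the on0 handler (not executed here; resubmit and
// attempt are hypothetical):
//   setTimeout(resubmit, retryDelayMs(attempt++, 1000, 2, 30000));
```

With a base of one second and a factor of 2, the retries wait 1 s, 2 s, 4 s, 8 s and so on, until they level off at the 30 s cap.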
You might also want to look into queuing these requests on the client-side so you're only firing one at a time, in order.
Hope that helps.
What is the best way to deal with out-of-sequence Ajax requests (preferably using jQuery)?
For example, an Ajax request is sent from the user's browser anytime a field changes. A user may change dog_name to "Fluffy", but a moment later, she changes it to "Spot". The first request is delayed for whatever reason, so it arrives at the server after the second, and her dog ends up being called "Fluffy" instead of "Spot".
I could pass along a client-side timestamp along with each request, and have the server track it as part of each Dog record and disregard earlier requests to change the same field (but only if there is a difference of less than 5 minutes, in case the user changes the time on her machine).
Is this approach sufficiently robust, or is there a better, more standardized approach?
EDIT:
Matt made a great point in his comment. It's much better to serialize requests to change the same field, so is there a standard way of implementing Ajax request queues?
EDIT #2
In response to @cherouvim's comment, I don't think I'd have to lock the form. The field changes to reflect the user's change, and a change request is placed into the queue. If a request to change the same field is already waiting in the queue, that old request is deleted. 2 things I would still have to address:
Placing a request into the queue is an asynchronous task. I could have the callback handler from the previous Ajax request send the next request in the queue. Javascript code isn't multi-threaded (or... is it?)
If a request fails, I would need the user interface to reflect the state of the last successful request. So, if the user changes the dog's name to "Spot" and the Ajax request fails, the field would have to be set back to "Fluffy" (the last value successfully committed).
What issues am I missing?
First of all you need to serialize server side processing for each client. If you are programming in Java then synchronizing execution on the http session object is sufficient. Serializing will help in case the second update comes while the first is being processed.
A second enhancement you can implement in your entity updating is optimistic concurrency control (http://en.wikipedia.org/wiki/Optimistic_concurrency_control). You add a version property (and column) to your entity. Each time an update happens, it is incremented by one. In fact the update statement looks like:
update ... set version=6 ... where id=? and version=5;
If the number of affected rows from the above pseudoquery is 0, then someone else has managed to update the entity first. What you do then is up to you. Note that you need to render the version in the HTML update form of the entity as a hidden parameter and send it back to the server each time you update. On return you have to write back the updated version.
Generally the first enhancement would be enough. The second one will improve the system in case many people are editing the same entities at the same time. It solves the "lost update" problem.
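The version check can be illustrated without a database. The sketch below is an in-memory stand-in for the update statement above, not real persistence code: updateIfVersionMatches is a made-up name, the table is a Map from id to row, and the return value plays the role of the affected-rows count.

```javascript
// In-memory equivalent of:
//   update ... set version = version + 1 ... where id = ? and version = ?
// Returns the number of "affected rows": 1 on success, 0 if the row is
// missing or someone else updated it first.
function updateIfVersionMatches(table, id, expectedVersion, changes) {
  var row = table.get(id);
  if (!row || row.version !== expectedVersion) {
    return 0;
  }
  Object.keys(changes).forEach(function (key) {
    row[key] = changes[key];
  });
  row.version = expectedVersion + 1; // set last so `changes` cannot override it
  return 1;
}
```

In the dog example, a delayed request still carrying the old version simply affects 0 rows, so "Fluffy" can no longer overwrite "Spot".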
I would implement a queue on the client side with chaining of successful requests or rollbacks on unsuccessful requests.
You need to define "unsuccessful", be it a timeout or a returned value.
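A rough sketch of such a client-side queue is shown below. All names here (FieldUpdateQueue, send, setField, initialEntries) are invented for illustration, and "unsuccessful" is reduced to a boolean passed to the callback; a timeout would have to be mapped onto that by the send function.

```javascript
// send(field, value, done): performs the Ajax call, later calls done(true|false).
// setField(field, value):   writes a value back into the form (for rollback).
// initialEntries:           optional [[field, value], ...] of committed values.
function FieldUpdateQueue(send, setField, initialEntries) {
  this.send = send;
  this.setField = setField;
  this.queue = []; // changes not yet sent, oldest first
  this.committed = new Map(initialEntries || []); // last value the server accepted
  this.busy = false;
}

FieldUpdateQueue.prototype.change = function (field, value) {
  // A waiting change to the same field is obsolete: replace it.
  this.queue = this.queue.filter(function (item) { return item.field !== field; });
  this.queue.push({ field: field, value: value });
  this.next();
};

FieldUpdateQueue.prototype.next = function () {
  var self = this;
  if (this.busy || this.queue.length === 0) {
    return; // only one request in flight at a time
  }
  var item = this.queue.shift();
  this.busy = true;
  this.send(item.field, item.value, function (ok) {
    self.busy = false;
    var newerWaiting = self.queue.some(function (q) { return q.field === item.field; });
    if (ok) {
      self.committed.set(item.field, item.value);
    } else if (!newerWaiting) {
      // Roll the form back to the last value the server accepted.
      self.setField(item.field, self.committed.get(item.field));
    }
    self.next(); // chain the next request
  });
};
```

Because only one request is in flight at a time, the server never sees changes to the same field out of order, which is the serialization asked for in the question.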
I am trying to check the POP and SMTP values entered by the user. I want to validate whether the POP and SMTP servers the user entered (for example pop.gmail.com and smtp.gmail.com) are correct.
For that I send a single request to the server containing both values, and the server performs two tasks:
1. It checks the POP server the user entered by connecting to it.
2. It checks the SMTP server the user entered by sending one mail to a dummy mail address.
I have finished all of these tasks.
My requirement now is to show the user the result after each validation step. That is, the UI should show something like:
POP connection Checked.. ok
SMTP connection Checked.. ok
But I send only one request to the server for both tasks, so I need to get an intermediate status from the server after each task finishes; only then can I update the client-side UI. I don't know whether it is possible to get intermediate responses from the server for a single request. Any ideas, friends? If so, could you provide a little bit of code?
Looking forward to your suggestions.
You should take a look at the long polling technique. It is possible to retrieve a partial response, but it doesn't work in all browsers.
You can use a HEAD request instead of GET or POST; it returns only the HTTP headers.
Slightly off topic - but sending a dummy mail can be "dangerous".
Many servers "note" if you try to send to a local address which does not exist. For example, if the server's domain is "whatever.com" and you send to a random address, say aaa@whatever.com, and "aaa" is not a valid user, then the server notices this.
The server may then take an action like blocking you, as a sender, for a period of time. (This helps to reduce spam from dictionary attacks.) So your "test" ends up effectively blocking the real mail from being delivered.
The reverse is also true. Let's say you try to send to an external address, which you know is valid (your own email address for example) as the test. In this case the from address must be a valid internal address. If you use an invalid internal address, or worse an address which is not internal, it's likely the server will refuse to deliver the mail (at best) and at worst again institute a temporary block.
The key factor in both these situations is that although the SMTP protocol is very "loose", SMTP servers watch very closely for "bad behavior", because this is one way of distinguishing a spamming program. So any hint of "incorrect" behavior can lead to the server arbitrarily refusing to accept your mails (usually for a limited period of time).
Incidentally, back to your original question.
Both of your tests are pretty much instantaneous. Even if the email server is on the other side of the world, you can do both checks within a couple of seconds. So chances are that even if you send back 2 packets, to the user they'll appear as "arriving together". And since 1 request from the browser can only handle 1 response from the server, you would need to send the response in 2 packets.
I.e. do first test - send first part of response - do second test - send second part of response.
For a normal HTTP packet this is no big deal. Do some sort of flush / send after the first response is ready, and then again after the second response. The browser is used to displaying partial pages as they arrive.
However for an AJAX request you'll need to get into your framework at quite a low level. Most frameworks, that I'm aware of, require the incoming Async packet to be "complete" before they start to parse it. This is especially true if the packet is formatted as say xml where partial parsing is useless in pretty much all cases.
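If you do go low-level, one way around the "packet must be complete" problem is to make each part of the response a complete message on its own line and to parse only the complete lines seen so far. The sketch below assumes exactly that newline-delimited format, which is my assumption and not something your server does today; readNewMessages is a made-up helper name.

```javascript
// Returns the complete, newline-terminated messages that appeared in `text`
// at or after position `offset`, plus the offset to pass in on the next call.
function readNewMessages(text, offset) {
  var lastNewline = text.lastIndexOf('\n');
  if (lastNewline < offset) {
    return { messages: [], offset: offset }; // nothing complete yet
  }
  var chunk = text.substring(offset, lastNewline);
  var messages = chunk.split('\n').filter(function (m) { return m.length > 0; });
  return { messages: messages, offset: lastNewline + 1 };
}

// Possible browser wiring (not executed here; showStatus is hypothetical, and
// reading responseText before the request completes does not work everywhere):
//   var seen = 0;
//   xhr.onreadystatechange = function () {
//     if (xhr.readyState >= 3) {
//       var r = readNewMessages(xhr.responseText, seen);
//       seen = r.offset;
//       r.messages.forEach(showStatus);
//     }
//   };
```

The server would then flush "POP ok" followed by a newline after the first test and the SMTP result after the second, and the UI could update as each line arrives.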