Related
I'm making a search agreggator and I've been wondering how could I improve the performance of the search.
Given that I'm getting results from different websites, currently I need to wait to receive the results for each provider but this is done one after another so the whole request takes a while to respond.
The easiest solution would be to just make a request from the client for each provider, but this would end up with a ton of request per search, (but if this is the proper way I'll just do it.)
Why I've been wondering is if there's way to return results everytime a provider responds, so if we have providers A, B and C and B already returned results then send it back to the client. In order for this to work all the searchs would need to run in parallel of course.
Do you know a way of doing this?
I'm trying to build a search experience similar to SkyScanner, that loads results but then you can see it still keeps getting more records and it sorts them on the fly (on client side as far as I can see).
Caching is the key here. Best practices for external API (or scraping) is to be as little of a 'taker' as possible. So in your Laravel setup, get your results, but cache the results for as long as makes sense for your app. Although the odds in a skyscanner situation is low that two users will make the exact same request, the odds are much higher that a user will make the same request multiple times, or may share the link, etc.
https://laravel.com/docs/8.x/cache
cache(['key' => 'value'], now()->addMinutes(10));
$value = cache('key');
To actually scrape the content, you could use this:
https://github.com/softonic/laravel-intelligent-scraper
Or to use an API which is the nicer route:
https://docs.guzzlephp.org/en/stable/
On the client side, you could just make a few calls to your own service in separate requests and that would give you your asynchronous feel you're looking for.
I just read that some browsers would prevent HTTP polling (I guess by limiting the rate of requests)...
From https://github.com/sstrigler/JSJaC:
Note: As security restrictions of most modern browsers prevent HTTP
Polling from being usable anymore this module is disabled by default
now. If you want to compile it in use 'make polling'.
This could explain some misbehavior of some of my JavaScripts (sometimes requests are just not sent or retried, even if they were actually successful). But I couldn't find further information on details..
Questions
if it's "max. number of requests n per x seconds", what are the usual/default settings for x and n?
Is there any way good resource for this?
Any way to detect if a request has been "delayed" or "rejected" because of a rate limit?
Thanks for your help...
Stefan
Yes, as far as I am aware there is a default pool limit of 10 and a default request timeout of 30 seconds per request, however the timeout and poll limits can be controlled and different browsers implement different limitations!
Check out this Google implementation.
and this is an awesome implementation of catching a timeout error!
You can find the Firefox specifics HERE!
Internet Explorer specifics are controlled from inside the Windows registry.
Also have a look at this question.
Basically, the way you control is not by changing the browser limitations, but by abiding them. So you apply a technique called throttle-ing.
Think of it as creating a FIFO/priority queue of functions. A queue struct that takes xhr requests as members and enforces delay between them is an Xhr Poll. For instance, I am using
Jsonp to get data from a node.js server located on another domain and I am polling of course due to browser limitations. Otherwise, I get zero response back from the server and that is only because of browser limitations.
I am actually doing a console log for every request that's supposed to be sent, but not all of them are being logged. So the browser limits them.
I'll be even more specific with helping you out. I have a page on my website which is supposed to render a view for tens or even hundreds of articles. You go through them using a cool horizontal slider.
The current value of the slider matches the currrent 'page'. Since I am only displaying 5 articles per page and I can't exactly load thousands of articles 'onload' without severe performance implications, I load the articles for the current page. I get them from a MongoDB by sending a cross-domain request to a Python script.
The script is supposed to return an array of five objects with all the details I need to build the DOM elements for a 'page'. However, there are a couple of issues.
First, the slider works extremely fast, as it's more or less a value change. Even if there is drag drop functionality, key down events etc, the actual change takes miliseconds. However, the code of the slider looks something like this:
goog.events.listen(slider, goog.events.EventType.CHANGE, function() {
myProject.Articles.page(slider.getValue());
}
The slider.getValue() method returns an int with the current page number, so basically I have to load from:
currentPage * articlesPerPage to (currentPage * articlesPerPage + 1) - 1
But in order to load, i do something like this:
I have a storage engine(think of it as an array):
I check if the content is not already there
If it is, there is no point to make another request, so go forward with getting the DOM elements from the array with the already created DOM elements in place.
If it isn't, then I need to get it so I need to send that request I was mentioning, which would look something like(without accounting for browser limitations):
JSONP.send({'action':'getMeSomeArticles','start':start,'length': itemsPerPage, function(callback){
// now I just parse the callback quickly to make sure it is consistent
// create DOM elements, and populate the client side storage
// and update the view for the user.
}}
The problem comes from the speed with which you can change that slider. Since every change supposedly triggers a request(same would happen for normal Xhr requests), then you are basically crossing the limitations of all browsers, so without throttle-ing, there would be no 'callback' for most of the requests. 'callback' is the JS code returned by the JSONP request(which is more of a remote script inclusion than anything else).
So what I do is push a request to a priority queue, not POLL, as now I don't need to send multiple simultaneous requests. If the queue is empty, the recently added member is executed and everyone is happy. If it's not, then all non-completed requests in progress are cancelled and only the last one is executed.
Now in my particular case, I do a binary search(0(log n)) to see if the storage engine doesn't have data for the previous requests yet, which tells me if the previous request has been completed or not. If it has, then it's removed from the queue and the current one is processed, otherwise the new one fires. So an and so forth.
Again, for speed consideration and shit browser wanna-bes such as Internet Explorer, I do the above described procedure about 3-4 steps ahead. So I pre-load 20 pages ahead till everything is the client side storage engine. This way, every limitation is successfully dealt with.
The cooldown time is covered by the minimum time it would take to slide through 20 pages and the throttle-ing makes sure there are no more than 1 active requests at any given time(with backwards compatibility going as far as Internet Explorer 5).
The reason why I wrote all this is to give you an example trying to say that you cannot always enforce delay directly from the FIFO structure, as your calls may need to turn into what a user sees, and you don't exactly want to make a user wait 10-15 seconds for a single page to render.
Also, always minimize the polling and the need to poll(simultaneously fired Ajax events, as not all browsers actually do good things with them). For instance, instead of doing something like sending one request to get content and sending another for that content to be tracked as viewed in your app metrics, do as many tasks at server level as you possibly can!
Of course, you probably want to track your errors properly, so your Xhr object from your library of choice implement error handling for ajax and because you are an awesome developer you want to make use of them.
so say you have a try - catch block in place
The scenario is this:
An Ajax call has finished and it's supposed to return a JSON, but the call somehow failed. However, you try to parse the JSON and do whatever you need to do with it.
so
function onAjaxSuccess (ajaxResponse) {
try {
var yourObj = JSON.parse(ajaxRespose);
} catch (err) {
// Now I've actually seen this on a number of occasions, to log that an error occur
// a lot of developers will attempt to send yet another ajax request to log the
// failure of the previous one.
// for these reasons, workers exist.
myProject.worker.message('preferrably a pre-determined error code should go here');
// Then only the worker should again throttle and poll the ajax requests that log the
//specific error.
};
};
While I have seen various implementations that try to fire as many Xhr requests at the same time as they possible can until they encounter browser limitations, then do quite a good job at stalling the ones that haven't fired in wait for the browser 'cooldown', what I can advise you is to think about the following:
How important is speed for your app?
Just how scalable and how intensive the I/O will be?
If the answer to the first one is 'very' and to the latter 'OMFG modern technology', then try to optimize your code and architecture as much as you can so that you never need to send 10 simultaneous Xhr requests. Also, for large scale apps, multi-thread your processes. The JavaScript way to accomplish that is by using workers. Or you could call the ECMA board, tell them to make this a default, and then post it here so that the rest of us JS devs can enjoy native multi-threading in JS:)(how dafuq did they not think about this?!?!)
Stefan, quick answers below:
-if it's "max. number of requests n per x seconds", what are the usual/default settings for x and n?
This sounds more like a server restriction. The browser ones usually sound like:
-"the maximum requests for the same hostname is x"
-"the maximum connections for ANY hostname is y"
-Is there any way good resource for this?
http://www.browserscope.org/?category=network (also hover over table headers to see what is measured)
http://www.stevesouders.com/blog/2008/03/20/roundup-on-parallel-connections
-Any way to detect if a request has been "delayed" or "rejected" because of a rate limit?
You could look at the http headers for "Connection: close" to detect server restrictions but I am not aware of being able in JavaScript to read settings from so many browsers in a consistent, browser-independent way. (For Firefox, you could read this http://support.mozilla.org/en-US/questions/746848)
Hope this quick answer helps?
No, browser does not in any way affect polling. I think what was meant on that page is the same origin policy - you can only access the same host and port as your original page.
Only known limitation to connections themselves is that you usually can only have from two to four simultaneous connections to the same host.
I've written some apps with long poll, some with C++ backend with my own webserver, and one with PHP backend with Apache2.
My long poll timeout is 4..10 s. When something occurs, or 4..10 s passes, my server returns an empty response. Then the client immediatelly starts another AJAX request. I found that some browsers hangs up when I start AJAX call from previous AJAX handler, so I am using setTimeout() with a small value to start the next AJAX request.
When something happens on the client side, which should be sent to server, I use another AJAX request for it, but it's a one-way thing: the server does not send any response, and the client does not process anything. The result of the operation (if any) will be received on the long poll. It requires max. 2 connection to the server, which all browsers supports.
Keep in mind, that if there's 500 client, it means 500 server-side webserver thread, which will move together, occurring load peaks, because when something happens, the server have to report it at the same time for each clients, the clients will process it near same time long, they will start the next long request in the same time, and from then, the timeout will expire also at the same time, and furthcoming ones too. You can trick with rnd timeout, say 4 rnd(0..4), but it's worthless, if anything happens, they will "sync" again, all the request have to be served at the same time, when something reportable happens.
I've tested it thru a router, and it works. I assume, routers respects 4..10 lag, it's around the speed of a slow webapge (far, far away), which no router think, that it should be canceled.
My PHP work is a collaborative spreadsheet, it looks amazing when you hit enter and the stuff is updating simultaneously in several browsers. Have fun!
No limit for no of ajax requests. However it will be on same host & port.
Server can limit no of request from a machine based on its setting.
For example. A server can set so that if there are more than few request from same machine within specified time it will reject request.
After small mistake in javascript code, neverending loop was made witch each step calling 2 ajax requests. In firebug i could see more and more requests until firefox started to slow down, dont response and finally crash.
So, yes, there is a "limit" ;)
What are the strengths of GET over POST and vice versa when creating an ajax request? How do I know which I should use at any given time? Is it a security-minded decision?
Also, what is the difference in how they are actually sent?
GETs should be used for idempotent operations, that is operations that can be safely repeated more than once without changing anything. Browsers will cache GET requests (for normal and AJAX requests)
POSTs should be generally be used for non-idenpotent operations, like saving something. Although you can use them for other operations if you want.
Data for GETs is sent over the URL query string. Data for POSTs is sent separately. Some browsers have a maximum URL length (I think Internet Explorer is 2048 characters), and if the query string becomes too long you'll get an error.
You should use GET and POST requests in AJAX calls just as you would use GET and POST requests in normal calls. Basic rule of thumb:
Will the request modify anything in your Model?
YES: The request will modify (add/update/delete) data from your data store,
or in some other way change the state of the server (cause creation of
a file, for example). Use POST.
NO: The request will not affect the state of anything (database, file system,
sessions, ...) on the server, but merely retrieve information. Use GET.
POST requests are requests that you do not want to accidentally happen. GET requests are requests you are OK with happening by a user pointing a browser to via a URL.
GET requests can be repeated quite simply since their data is based in the URL itself.
You should think about AJAX requests like you think about regular form requests (and their GET and POST)
The Yahoo! Mail team found that when using XMLHttpRequest, POST is implemented in the browsers as a two-step process: sending the headers first, then sending data. So it's best to use GET, which only takes one TCP packet to send (unless you have a lot of cookies). The maximum URL length in IE is 2K, so if you send more than 2K data you might not be able to use GET.
http://developer.yahoo.com/performance/rules.html#ajax_get
Several of my ajax applications in the past have used GET request but now I'm starting to use POST request instead. POST requests seem to be slightly more secure and definitely more url friendly/pretty. Thus, i'm wondering if there is any reason why I should use GET request at all.
I generally set up the question as thus: Does anything important change after the request? (Logging and the like notwithstanding). If it does, it should be a POST request, if it doesn't, it should be a GET request.
I'm glad that you call POST requests "slightly" more secure, because that's pretty much what they are; it's trivial to fake a POST request by a user to a page. Making it a POST request, however, prevents web accelerators or reloads from re-triggering the action accidentally.
As AJAX, there is one more consideration: if you are returning JSON with callback support, be very careful not to put any sensitive data that you don't want other websites to be able to see in there. Wikipedia had a vulnerability along these lines where the user anti-CSRF token was revealed via their JSON API.
All good points, however, in answer to the question, GET requests are more useful in certain scenarios over POST requests:
They can be bookmarked
They can be cached
They're faster
They have known consequences (assuming they don't change data), so visiting them multiple
times is not a problem.
For the sake of posterity, updating this comment with the blog notes re: point #3 here, all credit to Omar AL Zabir (the author of the referenced blog post):
"Atlas by default makes HTTP POST for all AJAX calls. Http POST is
more expensive than Http GET. It transmits more bytes over the wire,
thus taking precious network time and it also makes ASP.NET do extra
processing on the server end. So, you should use Http Get as much as
possible. However, Http Get does not allow you to pass objects as
parameters. You can pass numeric, string and date only. When you make
a Http Get call, Atlas builds an encoded url and makes a hit to that
url. So, you must not pass too much content which makes the url become
larger than 2048 chars. As far as I know, that’s what is the max
length of any url.
Another evil thing about http post is, it’s actually 2 calls. First
browser sends the http post headers and server replies with “HTTP 100
Continue”. When browser receives this, it sends the actual body."
You should use GET where you're doing a request which has no side effects, e.g. just fetching some info. This request can:
Be repeated without any problem - if the browser detects an error it can silently retry
Have its result cached by the browser
Be cached by a proxy
These things are all good. Anything which is only retrieving data (particularly public data) should really be a GET. The server should send sensible Last-Modified: and Expires: headers to allow caching if required.
There is one other difference not mentioned by anyone.
GET requests are passed in the URL string and are therefore subject to a length limit usually dependent on the browser. It seems that most are around 2000 chars.
POST requests can be much much larger - in fact not limited really. So if you're needing to request data from a web server and you're passing in lots of parameter information then a POST request might be the only option.
So, as mentioned before really a GET request is for requesting data (no side effects) while a POST request is generally used for transmitting data back to the server to be stored (with side effects). e.g. Use POST to upload a file. GET to retrieve a file.
There was a time when IE I believe had a very short GET URL string. Some applications like Lotus notes use large numbers of random characters to represent document id's. I had the displeasure of using another product that generated random strings so the page URL was unique each time. The random string was HUGE... and it didn't always work with IE6 from memory.
This might help you to decide where to use GET and where to use POST:
URIs, Addressability, and the use of HTTP GET and POST.
POST requests are just as insecure as GETs. The main difference is that POST is used to modify the state of the server application, while GET only requests data from it.
The difference matters when you use clean, "restful" URLs, where the URL itself specifies the resource, and the different methods trigger different actions on the server side.
Perhaps most importantly, GET is book-markable / viewable in url history, and searchable with Google.
POST is important where you don't want the event to be bookmarkable or able to be typed in as a URL - otherwise you (or Google crawling your URLS) could end up accidentally doing things like deleting users from your system, for example.
GET
POST
In GET method, values are visible in the URL
In POST method, values are not visible in the URL.
GET has a limitation on the length of the values, generally 255 characters.
POST has no limitation on the length of the values since they are submitted via the body of HTTP.
GET performs are better compared to POST because of the simple nature of appending the values in the URL.
It has lower performance as compared to GET method because of time spent in including POST values in the HTTP body
This method supports only string data types.
This method supports different data types, such as string, numeric, binary, etc.
GET results can be bookmarked.
POST results cannot be bookmarked.
GET request is often cacheable.
The POST request is hardly cacheable.
GET Parameters remain in web browser history.
Parameters are not saved in web browser history.
Source and more in depth analysis: https://www.guru99.com/difference-get-post-http.html
I have a page with 3 layers, one for navigation, one for database records and one for results. When I click on a database record, the results are displayed in the result layer via ajax. For navigation, the links will simply be different queries. I am wondering if it would make sense to have each different query be sent as ajax data and palced into the records layer, or rather to have the query appended to the php file each time. Which is the more efficient approach?
Well, sending a different AJAX request will be recommended as per my point of view. As
Performance wise, it will rather reduce the response times, as only the POST data is sent and databytes recieved. The page can then format it, one it receives an XMLHttpResponse
Security wise : I prefer using POST than GET as it gives at least some opaqueness as to what is being passed as a parameter and not anyone can just edit the url and play around. Plus, you don't have the URL length restriction while passing parameters in POST.
So, i'd say fire an XMLHTTPRequest
each on each link and display the
response in the Results layer
(pane/div) on the page.
I think you question is quite unspecific and confusing.
What is "appended to the php file"?
Are you really concerned about efficiency? I mean, how fast should the results be displayed? Or are you concerned about the server workload?
Have you read this tutorial? Prototype introduction to Ajax
I think it should answer most of your questions and give enough example code to continue.