Can HTTP headers be too big for browsers? - ajax

I am building an AJAX application that uses both the HTTP content (body) and HTTP headers to send and receive data. Is there a point at which data received in an HTTP header won't be read by the browser because it is too big? If yes, what is the limit, and is the behaviour the same in all browsers?
I know that theoretically there is no limit to the size of HTTP headers, but in practice, past what point could I run into problems on certain platforms, in certain browsers, or with certain software installed on the client machine? I am really looking for a guideline for the safe use of HTTP headers. In other words, up to what size can HTTP headers be used to transmit additional data before potential problems come into play?
Thanks for all the input on this question; it was much appreciated and interesting. Thomas' answer got the bounty, but Jon Hanna's answer brought up a very good point about proxies.

Short answers:
Same behaviour: No
Lowest limit found in popular browsers:
10KB per header
256 KB for all headers in one response.
Test results from MacBook running Mac OS X 10.6.4:
Biggest response successfully loaded, all data in one header:
Opera 10: 150MB
Safari 5: 20MB
IE 6 via Wine: 10MB
Chrome 5: 250KB
Firefox 3.6: 10KB
Note
Those outrageously big headers in Opera, Safari and IE took minutes to load.
Note to Chrome:
Actual limit seems to be 256KB for the whole HTTP header.
Error message appears: "Error 325 (net::ERR_RESPONSE_HEADERS_TOO_BIG): Unknown error."
Note to Firefox:
When sending data through multiple headers 100MB worked fine, just split up over 10'000 headers.
My Conclusion:
If you want to support all popular browsers 10KB per header seems to be the limit and 256KB for all headers together.
My PHP code used to generate those responses:
<?php
// Generate a single oversized HTTP response header to probe the browser's limit.
ini_set('memory_limit', '1024M');
set_time_limit(90);

$bytes  = 256000;                  // size of the header value in bytes
$header = str_repeat('1', $bytes); // filler payload

header("MyData: " . $header);

/* Firefox test with the data split over multiple headers:
for ($i = 1; $i < 1000; $i++) {
    header("MyData" . $i . ": " . $header);
}
*/

echo "Length of header: " . ($bytes / 1024) . " kilobytes";
?>

In practice, while there are rules requiring proxies to pass certain headers along (indeed, quite clear rules about which headers can be modified, and even about how to inform a proxy whether it may modify a new header added by a later standard), this only applies to "transparent" proxies, and not all proxies are transparent. In particular, some wipe headers they don't understand as a deliberate security practice.
Also, in practice some do misbehave (though things are much better than they were).
So, beyond the obvious core headers, the amount of header information you can depend on being passed from server to client is zero.
This is just one of the reasons why you should never depend on headers being used well (e.g., be prepared for the client to repeat a request for something it should have cached, or for the server to send the whole entity when you request a range), barring the obvious case of authentication headers (under the fail-to-secure principle).

Two things.
First of all, why not just run a test that gives the browser progressively larger and larger headers and wait till it hits a number that doesn't work? Just run it once in each browser. That's the most surefire way to figure this out. Even if it's not entirely comprehensive, you at least have some practical numbers to go off of, and those numbers will likely cover a huge majority of your users.
Second, I agree with everyone saying that this is a bad idea. It should not be hard to find a different solution if you are really that concerned about hitting the limit. Even if you do test on every browser, there are still firewalls, etc to worry about, and there is absolutely no way you will be able to test every combination (and I'm almost positive that no one else has done this before you). You will not be able to get a hard limit for every case.
Though in theory, this should all work out fine, there might later be that one edge case that bites you in the butt if you decide to do this.
TL;DR: This is a bad idea. Save yourself the trouble and find a real solution instead of a workaround.
Edit: Since you mention that the requests can come from several types of sources, why not just specify the source in the request header and have the data contained entirely in the body? Have some kind of Source or ClientType field in the header that specifies where the request is coming from. If it's coming from a browser, include the HTML in the body; if it's coming from a PHP application, put some PHP-specific stuff in there; etc etc. If the field is empty, don't add any extra data at all.
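A minimal sketch of that idea in PHP, assuming a hypothetical X-Client-Type request header (the header name, the sample payload and the two response formats are illustrative, not part of the original answer):
<?php
// Read the hypothetical X-Client-Type request header sent by the caller.
$clientType = isset($_SERVER['HTTP_X_CLIENT_TYPE']) ? $_SERVER['HTTP_X_CLIENT_TYPE'] : '';

$payload = array('items' => array(1, 2, 3));   // stand-in for the real data

if ($clientType === 'browser') {
    // Browsers get HTML in the body.
    header('Content-Type: text/html; charset=utf-8');
    echo '<ul><li>' . implode('</li><li>', $payload['items']) . '</li></ul>';
} else {
    // PHP applications, scripts, or an empty header get plain JSON, with no extra data added.
    header('Content-Type: application/json');
    echo json_encode($payload);
}
?>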

The RFC for HTTP/1.1 clearly does not limit the length of the headers or the body.
According to this page, modern browsers (Firefox, Safari, Opera), with the exception of IE, can handle very long URIs: https://web.archive.org/web/20191019132547/https://boutell.com/newfaq/misc/urllength.html. I know that is different from receiving headers, but it at least shows that they can create and send huge HTTP requests (possibly of unlimited length).
If there's any limit in the browsers it would be something like the size of the available memory or limit of a variable type, etc.

Theoretically, there's no limit to the amount of data that can be sent in the browser. It's almost like asking whether there's a limit to the amount of content that can be in the body of a web page.
If possible, try to transmit the data through the body of the document. To be on the safe side, consider splitting the data up, so that there are multiple passes for loading.
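A minimal sketch of that approach, assuming hypothetical offset/limit query parameters for the "multiple passes" idea:
<?php
// Send the data in the response body as JSON instead of a header,
// one slice per request so a large payload is loaded over several passes.
$allData = range(1, 100000);   // stand-in for the real data set

$offset = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;
$limit  = isset($_GET['limit'])  ? (int) $_GET['limit']  : 1000;

header('Content-Type: application/json');
echo json_encode(array(
    'total' => count($allData),
    'data'  => array_slice($allData, $offset, $limit),
));
?>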

Related

How does varnish deal with dynamic content?

I am studying up on caching and I am looking into Varnish for caching. I am wondering, though, how does Varnish deal with dynamically generated content?
All over the place people are saying you shouldn't really cache content that might change a lot, but on the other hand, when I look at the response headers for Stack Overflow, I see pages being served up via Varnish.
Content here changes by the second so how does this even work? Excuse me if it's a bit of a simple question, I will research some more while this question is up.
You need to define "dynamic":
if the content depends on the user (through cookies, for example), it should not be cached, as you'll have lots of different contents and your HIT/MISS ratio will not be high, since every user gets different content.
if the content changes over time, you can always cache the content a little, for example a few seconds (see the backend sketch after this list).
if the content changes over time, a better option is to separate the "static content" from the dynamic one. You may cache the page template and do AJAX calls to refresh the content. You may also use ESI; it's an old technology, but it lets you specify different "zones" in your pages, each having its own cache duration.
you can benefit from IMS (If-Modified-Since) requests. Telling the backend to send the response body only if it has changed since the last request can save you lots of processing time. I think Varnish does this from version 4.
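A minimal backend sketch of the "cache a little" and IMS points above (the ten-second s-maxage, the newsfeed.cache file and its mtime as the change marker are illustrative assumptions):
<?php
// Let Varnish cache the response briefly, and answer If-Modified-Since
// with 304 so the backend can skip regenerating and resending the body.
$lastModified = filemtime('newsfeed.cache');   // stand-in for the content's real timestamp

header('Cache-Control: public, s-maxage=10');  // the cache may serve this for 10 seconds
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');

if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) &&
    strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $lastModified) {
    http_response_code(304);                   // nothing changed: headers only, no body
    exit;
}

readfile('newsfeed.cache');                    // send the (stand-in) content
?>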
As for the Stack Overflow architecture, you may learn a lot by reading Nick Craver's blog post about it: http://nickcraver.com/blog/2016/02/17/stack-overflow-the-architecture-2016-edition/

What is the best way to generate a ETag based on the timestamp of the resource

So in one of my projects I have to create an HTTP cache to handle multiple API calls to the server. I read about the ETag header that can be used with a conditional GET to minimize server load and enable caching. However, I have a problem with generating the ETag. I can use the LAST_UPDATED_TIMESTAMP of the resource as the ETag, or hash it using some sort of hashing algorithm like MD5. But what would be the best way to do this? Are there any cons to using the raw timestamp as the ETag?
Any supportive answer is highly appreciated. Thanks in advance. Cheers!
If your timestamp has enough precision so that you can guarantee it will change any time the resource changes, then you can use an encoding of the timestamp (the header value needs to be ascii).
But bear in mind that ETag may not save you much. It's just a cache revalidation header, so you will still get as many requests from clients, just some will be conditional, and you may then be able to avoid sending payload back if the ETag didn't change, but you will still incur some work figuring that out (maybe a bunch less work, so could be worth it).
In fact, several versions of IIS used the file timestamp to generate an ETag. We tripped over that when building WinGate's cache module, when a whole bunch of files with the same timestamp ended up with the same ETag, and we learned that an ETag is only valid in the context of the request URI.
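A minimal sketch of a timestamp-based ETag with conditional-GET handling, assuming the resource is a file on disk (the path and the hex encoding of the mtime are illustrative choices, not from the answer above):
<?php
// Build the ETag from the resource's modification timestamp (an ASCII-safe
// encoding, as noted above) and honour If-None-Match with 304 Not Modified.
$path = '/var/data/resource.json';             // illustrative resource path
$etag = '"' . dechex(filemtime($path)) . '"';

header('ETag: ' . $etag);

if (isset($_SERVER['HTTP_IF_NONE_MATCH']) &&
    trim($_SERVER['HTTP_IF_NONE_MATCH']) === $etag) {
    http_response_code(304);                   // client's copy is still valid
    exit;
}

header('Content-Type: application/json');
readfile($path);
?>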

Should the length of a URL string be limited to increase security?

I am using ColdFusion 8 and jQuery 1.7.2.
I am using CFAJAXPROXY to pass data to a CFC. Doing so creates a JSON array (argument collection) and passes it through the URL. The string can be very long, since quite a bit of data is being passed.
The site that I am working on has existing code that limits the length of any URL query string to 250 characters. This is done in the application.cfm file by testing the length of the query string. If any query string is greater than 250 characters, the request is aborted. The purpose of this was to ensure that hackers or other malicious code wouldn't be passed through the URL string.
Now that we are using the query string to pass JSON arrays in the URL, we discovered that the Ajax request was being aborted quite frequently.
We have many other security practices in place, such as stripping any "<>" tags from code and using CFQUERYPARAM.
My question is whether limiting the length of a URL string for the sake of security is a good idea, or simply ineffective.
There is absolutely no correlation between URI length and security; it is rather a question of:
Limiting the amount of information that you provide to a user agent to a 'need to know' basis. This covers things such as the type of application server you run and associated conventions, the web server you run and associated conventions, and the operating system on the host machine. These are essentially things that can be considered vulnerabilities.
Reducing the impact of exploiting those vulnerabilities, i.e. introducing patches, ensuring correct configuration, etc.
As alluded to above, at the web tier this doesn't only cover GETs (your concern), but also POSTs, PUTs, DELETEs and just about any other operation on an HTTP resource.
Moved this into an answer for Evik -
That seems (at best) completely unnecessary if the inputs are being properly sanitized. I'm sure someone clever can quickly defeat a "security by small doorway" defense, assuming that's the only defense.
OWASP has some good, sane guidelines for web security. As far as I've read, limiting the size of the url is not on the list. For more information, see: https://www.owasp.org/index.php/Top_10_2010-Main
I would also like to echo Hereblur's comment that this makes internationalization tricky, or maybe impossible.
I'm not a ColdFusion developer, but I think it's the same with other languages.
I think it helps just a little bit. The problem of malicious code or SQL injection should be handled by your application.
I agree that limiting the length of the query string value is a bit safer and makes things more difficult for hackers, but you can't do this with POST data, and it limits some functionality. For example:
A single UTF-8 character may take up to 9 characters after URL encoding (a 3-byte character becomes three %XX escapes), which means that with a 250-character limit you can pass only about 27 non-English characters.
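A quick PHP illustration of that arithmetic (the character is just one example of a 3-byte UTF-8 character):
<?php
// A 3-byte UTF-8 character becomes three %XX escapes, i.e. 9 characters,
// so a 250-character query-string limit leaves room for roughly 27 of them.
$char = 'あ';                                  // one 3-byte UTF-8 character
echo strlen($char) . "\n";                     // 3
echo urlencode($char) . "\n";                  // %E3%81%82 (9 characters)
echo floor(250 / strlen(urlencode($char)));    // 27
?>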
The only reason to limit has to do with performance and DOS attack - not security per se (though DOS is a security threat by bringing down your server). Web servers and App servers (including CF) allow you to limit the size of POST data so that your server won't be degraded by very large file uploads. URL data if substantial can result in long running requests as the server struggles to parse or handle or write or whatever.
So there is some modest risk here related to such things. Back in the NT days IIS 3 had a number of flaws that were "locked down" by limiting the length of the URL - but those days are long gone. There are far more exploits representing low-hanging fruit that I would look at first before examining this issue too closely - unless of course you feel like you are having a specific problem with folks probing you (with long URLs I mean :).

When sending a file via AJAX, does it get read into memory first?

I'm writing an uploader that has to be able to transmit files of any size (up to 30gigs) to the server.
My original intention was to write a java applet that would break the file up into pieces, send those to the server, and then reassemble them there.
However, someone has suggested that AJAX's XMLHttpRequest can do the job in conjunction with nsIFileInputStream
(example here: https://developer.mozilla.org/en/using_xmlhttprequest#Sending_files_using_a_FormData_object )
and by using PUT instead of POST.
I'm worried about 2 things and can't seem to find the answer.
1) Will AJAX attempt to read the file into memory before sending it (that would obviously break the whole thing)?
[EDIT]
This http://www.codeproject.com/KB/ajax/AJAXFileUpload.aspx?msg=2329446 example explicitly states that they're using ActiveXObject because that DOESN'T load the file into memory... which suggests to me that XMLHttpRequest would load it into memory. I'm surprised I'm having such a hard time finding this info, to be honest.
2) How reliable is this approach? I realize that if the connection just dies the upload would have to resume from scratch, but realistically, how likely is it that, using a standard cable connection with an upload throttle of about 0.5 MB/s, a 30 gig file would arrive at the server?
I'm trying something similar using the File API and blob.slice, but it turned out to chew up memory on large files. However, you could use Google Gears, which plays much better with large sliced files. It also doesn't cause errors with the slice order, which FileReader combined with XHR does frequently and randomly.
I do, however, find (generally) that uploading files via JavaScript is very unstable.
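For the split-and-reassemble approach mentioned in the question, a minimal server-side sketch in PHP (the target path, the append-in-order assumption and the use of the raw request body are illustrative, not from the original post):
<?php
// Append each uploaded chunk to the target file so the server never has to
// hold the whole multi-gigabyte upload in memory at once.
$target = '/var/uploads/bigfile.part';   // illustrative destination
$chunk  = fopen('php://input', 'rb');    // raw request body of this PUT/POST
$out    = fopen($target, 'ab');          // append mode: assumes chunks arrive in order

while (!feof($chunk)) {
    fwrite($out, fread($chunk, 8192));   // stream 8 KB at a time
}

fclose($chunk);
fclose($out);

echo 'chunk stored';
?>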

Getting ETags right

I’ve been reading a book and I have a particular question about the ETag chapter. The author says that ETags might harm performance and that you must tune them finely or disable them completely.
I already know what ETags are and understand the risks, but is it that hard to get ETags right?
I’ve just made an application that sends an ETag whose value is the MD5 hash of the response body. This is a simple solution, easy to achieve in many languages.
Is using MD5 hash of the response body as ETag wrong? If so, why?
Why does the author (who obviously outsmarts me by many orders of magnitude) not propose such a simple solution?
This last question is hard to answer unless you are the author :), so I’m trying to find the weak points of using an MD5 hash as an ETag.
ETag is similar to the Last-Modified header. It's a mechanism to determine change by the client.
An ETag needs to be a unique value representing the state and specific format of a resource (a resource could have multiple formats that each need their own ETag). Not unique across the entire domain of resources, simply within the resource.
Now, technically, an ETag has "infinite" resolution compared to a Last-Modified header. Last-Modified only changes at a granularity of 1 second, whereas an ETag can be sub second.
You can implement both ETag and Last-Modified, or simply one or the other (or none, of course). If your Last-Modified is not sufficient, then consider an ETag.
Mind, I would not set ETag for "every" resource. Basically, I wouldn't set it for anything that has no expectation of being cached (dynamic content notably). There's no point in that case, just wasted work.
Edit: I see your edit, and clarify.
MD5 is fine. The only downside is calculating MD5 all the time. Running MD5 on, say, a 200K PDF file, is expensive. Running MD5 on a resource that has no expectation of being cached is simply wasteful (i.e. dynamic content).
The trick is simply that whatever mechanism you use, it should be as cheap as Last-Modified typically is. Last-Modified is, again, typically, a property of the resource, and usually very cheap to access.
ETags should be similarly cheap. If you are using MD5, and you can cache/store the association between the resource and the MD5 hash, then that's a fine solution. However, recalculating the MD5 each time the ETag is necessary, is basically counter to the idea of using ETags to improve overall server performance.
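A minimal sketch of the "cache the association" idea, assuming a file-backed resource whose stored hash is invalidated by the file's modification time (the cache file layout and the example path are assumptions):
<?php
// Recompute the MD5 only when the resource actually changed; otherwise
// reuse the hash stored alongside it, keyed by the modification time.
function cheap_etag($path) {
    $cacheFile = $path . '.etag';        // illustrative cache location
    $mtime     = filemtime($path);

    if (is_file($cacheFile)) {
        list($cachedMtime, $cachedHash) = explode(' ', file_get_contents($cacheFile));
        if ((int) $cachedMtime === $mtime) {
            return $cachedHash;          // still valid: no MD5 work needed
        }
    }

    $hash = md5_file($path);             // the expensive part, done only on change
    file_put_contents($cacheFile, $mtime . ' ' . $hash);
    return $hash;
}

header('ETag: "' . cheap_etag('/var/www/files/report.pdf') . '"');
?>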
We're using ETags for our dynamic content at instela.
Our strategy is, at the end of output, to generate the MD5 hash of the content to send; if an If-None-Match header exists, we compare it with the generated hash. If the two values are the same, we send a 304 code and interrupt the request without returning any content.
It's true that we consume a bit of CPU to hash the content, but in the end we save a lot of bandwidth.
We have a Facebook-newsfeed-style main page which has different content for every user. As the newsfeed content changes only 3-4 times per hour, main-page refreshes are very efficient for the client side. In the mobile era I think it's better to spend a bit more CPU time than to spend bandwidth. Bandwidth is still more expensive than CPU, and it's a better experience for the client.
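A minimal sketch of that strategy using PHP output buffering (an assumption of how such a setup might look, not instela's actual code):
<?php
// Hash the generated output at the end of the request and compare it with
// the client's If-None-Match header; on a match, send 304 and no body.
ob_start();

// ... the application generates the per-user page here ...
echo '<html><body>newsfeed for the current user</body></html>';

$body = ob_get_clean();
$etag = '"' . md5($body) . '"';

header('ETag: ' . $etag);

if (isset($_SERVER['HTTP_IF_NONE_MATCH']) &&
    trim($_SERVER['HTTP_IF_NONE_MATCH']) === $etag) {
    http_response_code(304);             // save the bandwidth, skip the body
    exit;
}

echo $body;
?>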
Having not read the book, I can't speak on the author's precise concerns.
However, the generation of ETags should be such that an ETag is only generated once when a page has changed. Generating an MD5 hash of a web page costs processing power and time on the server; if you have many clients connecting, it could start to cause performance problems.
Thus, you need a good technique for generating ETags only when necessary and caching them on the server until the related page changes.
I think the perceived problem with ETags is probably that your browser has to issue and parse a (simple and small) request/response for every resource on your page to check if the ETag value has changed server-side.
I personally find these extra small round trips to the server acceptable for often-changing images, CSS and JavaScript (the server does not need to resend the content if the browser's ETag is current), since the mechanism makes it quite easy to mark 'updated' content.
