Why does the XMLHttpRequest spec prevent setting the Accept-Encoding header?

Today, I wanted to utilize the Accept-Encoding header to request an image as base64. Come to find out, the XMLHttpRequest spec prevents setting that header!
http://www.w3.org/TR/XMLHttpRequest/#the-setrequestheader()-method
Note: The above headers are controlled by the user agent to let it control those aspects of transport. This guarantees data integrity to some extent. Header names starting with Sec- are not allowed to be set to allow new headers to be minted that are guaranteed not to come from XMLHttpRequest.
Why in the world would they write a spec like this? It'd make more sense if the browser just provided a default value (e.g. gzip,deflate,sdch) if none was specified.

The browser is responsible for accepting and processing the response. It wouldn't make much sense to manipulate your XHR to claim it accepts gzip, for example, when your script couldn't do anything with a gzipped result. Can you just set a custom header value instead?
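Note that the restriction applies to in-browser XHR only; outside the browser the header is freely settable, precisely because there the caller also owns the response processing that the browser normally does. A quick sketch with Python's requests (the URL is a placeholder):

import requests

# Outside the browser, nothing stops you from choosing Accept-Encoding yourself,
# because here *you* are responsible for decoding whatever comes back.
resp = requests.get('https://example.com/image.png',
                    headers={'Accept-Encoding': 'identity'})
print(resp.headers.get('Content-Encoding'))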

Why in the world would they write a spec like this?
In one word: laziness.
Why add extra semantics describing secure, predictable browser behavior for every possible header, which would only be used by a few people like us, when they can just declare all of those headers prohibited in one paragraph?

Related

RefreshHit from cloudfront even with cache-control: max-age=0, no-store

Cloudfront is getting a RefreshHit for a request that is not supposed to be cached at all.
It shouldn't be cached because:
It has cache-control: max-age=0, no-store;
The Minimum TTL is 0; and
I've created multiple invalidations (on /*) so this cached resource isn't from some historical deploy
Any idea why I'm getting RefreshHits?
I also tried modifying Cache-Control to be cache-control: no-store, stale-if-error=0, creating a new invalidation on /*, and now I'm seeing a cache hit (this time in Firefox):
After talking extensively with support, they explained what's going on.
So, if you have no-store and a Minimum TTL of 0, then CloudFront will indeed not store your resources. However, if your origin is taking a long time to respond (likely because it's under heavy load), and CloudFront receives another request that is identical with respect to the cache key while it is still waiting for the response to the first, it will send the one response to both requests. This is done to lighten the load on the origin. (see docs)
Support was calling these "collapse hits", although I don't see that exact term in the docs (the documented name for the behavior is request collapsing).
So, it seems you can't have a single Behavior serving some pages that must have a unique response per request while serving other pages that are cached. Support said:
I just confirmed that, with min TTL 0 and cache-control: no-store, we cannot disable collapse hit. If you do need to fully disable cloudfront cache, you can use cache policy CachingDisabled
We'll be making a behavior for every path prefix that we need caching on. It seems there was no better way than this for our use case (transitioning our website one page at a time from a non-cacheable, backend-rendered Jinja2/jQuery stack to a cacheable, client-side-rendered React/Next.js one).
It's probably too late for OP's project, but I would personally handle this with a simple origin-response Lambda@Edge function, plus a single cache behavior for /* and a single cache policy. You can write all of the filtering/caching logic in the origin-response function. That way you only manage one bit of function code in one place, instead of a bunch of individual cache behaviors (and possibly a bunch of cache policies).
For example, write an origin-response function that looks for a Cache-Control response header coming from your origin. If it exists, pass it back to the client. If it doesn't exist (or if you want to overwrite it with something else), create the response header there. The edge doesn't care whether the Cache-Control header came from your origin or from an origin-response Lambda; to the edge, it is all the same.
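A minimal sketch of that idea, assuming a Python Lambda@Edge runtime (the path prefix and header values below are made up for illustration):

def handler(event, context):
    # Origin-response trigger: the origin's response is in the CloudFront record.
    record = event['Records'][0]['cf']
    request = record['request']
    response = record['response']
    headers = response['headers']

    # If the origin already set Cache-Control, pass it through untouched.
    if 'cache-control' not in headers:
        # Otherwise decide per path, e.g. cache only the new React pages.
        if request['uri'].startswith('/app/'):  # hypothetical cacheable prefix
            value = 'public, max-age=300'
        else:
            value = 'no-store'
        headers['cache-control'] = [{'key': 'Cache-Control', 'value': value}]

    return response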
Another trick you can use to avoid caching while still using the default CloudFront behavior: add a dummy, unused query parameter that takes a unique value on each request.
Python example:
import requests
import uuid

# Each call carries a unique dummy query parameter, so the cache key never repeats.
requests.get(f'http://my-test-server-x.com/my/path?nocache={uuid.uuid4()}')
requests.get(f'http://my-test-server-x.com/my/path?nocache={uuid.uuid4()}')
Note that both calls will reach the destination and neither will be served from the cache, since uuid.uuid4() always generates a unique value.
This works because, by default (unless configured otherwise in the Behavior section), query parameters are part of the cache key.
Note: doing this bypasses the cache entirely, so your backend may end up handling every request.

Why shouldn't we allow body in a GET or HEAD request?

I'm coming to this from the InfoSec side, not the AppDev side; I just wanted to put that caveat in first. The issue is that my WAF is blocking certain images with the response "HTTP protocol compliance failed: Body in GET or HEAD requests". I need to justify keeping this rule active, so I'm asking, as a non-developer:
Is this getting blocked because those are the rules of GET and HEAD requests, or can we allow a body in GET and HEAD requests even though it's really not a good idea?
Why is it not a good idea? What are the potential problems that arise from allowing a body in a GET or HEAD request?
Thanks in advance for everyone's help.
GET and HEAD requests aren't supposed to carry bodies; only methods like POST and PUT do (see RFC 2616; its successor, RFC 7230, says a body in a GET request has no defined semantics).
You are being blocked for another reason, one I've often seen myself: one of the request headers Content-Length or Transfer-Encoding is present, which can be tolerated if there is really no request body at all (Content-Length: 0). When the WAF finds these headers, it considers the request to have a body, even one of size 0.
If you loosen the policy, you will allow legitimate traffic but also open the door to abnormal traffic on GET/HEAD. To work around this, you can add an iRule or LTM policy that removes those headers on GET/HEAD requests, until F5 releases a version of the software that doesn't block the traffic when the body is of size 0.
The potential problem arises when a buggy web server buffers the data sent in a GET/HEAD body instead of returning a 400 error and ignoring the data. That data could lead to memory exhaustion, or to an attacker's data being injected into legitimate users' requests (in the style of request smuggling), with results that are hard to predict. If you are confident in your web server, you may loosen the WAF policy.
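For what it's worth, the flagged condition is easy to reproduce when testing such a WAF rule; a quick sketch with Python's requests (the URL is a placeholder):

import requests

# A GET that explicitly declares a zero-length body; many WAFs flag this
# as "body in GET" even though no body bytes follow.
requests.get('https://example.com/image.png', headers={'Content-Length': '0'})

# A GET that actually carries a body; requests will happily send it,
# along with a non-zero Content-Length header.
requests.get('https://example.com/image.png', data=b'payload')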

How to avoid AJAX caching in Internet Explorer 11 when additional query string parameters or using POST are not an option

I realize this question has been asked before, but in modern REST practice neither the previous iterations of this question nor their answers are accurate or sufficient. A definitive answer to this question is needed.
The problem is well known: IE (even 11) caches AJAX requests, which is really, really dumb. Everyone understands this.
What is not well understood is that none of the previous answers are sufficient. Every previous instance of this question on SO is marked as sufficiently answered by either:
1) Using a unique query string parameter (such as a unix timestamp) on each request, so as to make each request URL unique, thereby preventing caching.
-- or --
2) using POST instead of GET, as IE does not cache POST requests except in certain unique circumstances.
-- or --
3) using 'cache-control' headers passed by the server.
IMO, in many situations involving modern REST API practice, none of these answers is sufficient or practical. A REST API will have completely different handlers for POST and GET requests, with completely different behavior, so POST is typically not an appropriate or correct alternative to GET. Moreover, many APIs have strict validation around them and, for numerous reasons, will generate 400 or 500 errors when fed query string parameters they aren't expecting. Lastly, we are often interfacing with third-party or otherwise inflexible REST APIs where we do not have control over the headers provided by the server response, so adding cache-control headers is not within our power.
So, the question is:
Is there really nothing that can be done on the client-side in this situation to prevent I.E. from caching the results of an AJAX GET request?
Caching is normally controlled by setting headers on the content when it is returned by the server. If you're already doing that and IE is ignoring them and caching anyway, the only way around it is to use one of the cache-busting techniques mentioned in your question. In the case of an API, though, it would be better to make sure you are sending proper cache headers before attempting any of the cache-busting techniques.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching_FAQ
Cache-Control: no-cache
Cache-Control: no-store
Pragma: no-cache
Expires: 0
If you don't control the API, you might be able to disable IE caching by adding request headers to the AJAX GETs:
'Cache-Control': 'no-cache, no-store, must-revalidate',
'Pragma': 'no-cache',
'Expires': '0'
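Outside the browser, that header set assembles like this; a sketch using Python's requests with a placeholder URL (in the browser itself you would attach the same headers to the XHR or $.ajax call):

import requests

resp = requests.get(
    'https://api.example.com/data',  # placeholder URL
    headers={
        'Cache-Control': 'no-cache, no-store, must-revalidate',
        'Pragma': 'no-cache',
        'Expires': '0',
    },
)
print(resp.status_code)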

Response Headers

"Name two headers that, if present n an http response, always have to be processed in a particular order. state the order and explain."
I have researched this online and in my textbook: Web Application Architecture by Shklar and Rosen.
My initial thought was that it was "WWW-Authenticate" and "Location"; however, with further research I have become more confused.
Any help would be greatly appreciated.
HTTP headers are independent of each other. They can appear in any order and be processed in any order. I have never seen multiple HTTP headers that were meant to work together, let alone required to be processed in a particular order.
The WWW-Authenticate and Location headers have nothing to do with each other. WWW-Authenticate is used to request/negotiate auth credentials for a single URL. Location is used in redirect responses (3xx, other than 304) to specify a different URL to send a request to.
There was a recent discussion about the processing semantics of the Accept and Accept-Charset headers, but that relationship is not officially defined, and it applies to request headers, not response headers.

What are the advantages of using a GET request over a POST request?

Several of my AJAX applications in the past have used GET requests, but now I'm starting to use POST requests instead. POST requests seem to be slightly more secure and definitely more URL-friendly/pretty. Thus, I'm wondering if there is any reason why I should use GET requests at all.
I generally frame the question like this: does anything important change after the request (logging and the like notwithstanding)? If it does, it should be a POST request; if it doesn't, it should be a GET request.
I'm glad that you call POST requests only "slightly" more secure, because that's pretty much all they are; it's trivial to forge a POST request from a user to a page. Making it a POST request, however, does prevent web accelerators or reloads from accidentally re-triggering the action.
For AJAX, there is one more consideration: if you are returning JSON with callback support (JSONP), be very careful not to include any sensitive data that you don't want other websites to be able to read. Wikipedia had a vulnerability along these lines, where the user's anti-CSRF token was revealed via their JSON API.
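To make the "does anything important change?" rule from the answer above concrete, here's a sketch using Python's requests against a made-up API:

import requests

BASE = 'https://api.example.com'  # hypothetical endpoint

# Nothing changes on the server: GET. Safe to retry, cache, and bookmark.
profile = requests.get(f'{BASE}/users/42').json()

# Server state changes: POST. Accelerators and reloads won't silently repeat it.
requests.post(f'{BASE}/users/42/messages', json={'text': 'hello'})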
All good points; however, in answer to the question, GET requests are more useful than POST requests in certain scenarios:
They can be bookmarked
They can be cached
They're faster
They have known consequences (assuming they don't change data), so visiting them multiple times is not a problem.
For the sake of posterity, updating this comment with the blog notes re: point #3 here, all credit to Omar AL Zabir (the author of the referenced blog post):
"Atlas by default makes HTTP POST for all AJAX calls. Http POST is
more expensive than Http GET. It transmits more bytes over the wire,
thus taking precious network time and it also makes ASP.NET do extra
processing on the server end. So, you should use Http Get as much as
possible. However, Http Get does not allow you to pass objects as
parameters. You can pass numeric, string and date only. When you make
a Http Get call, Atlas builds an encoded url and makes a hit to that
url. So, you must not pass too much content which makes the url become
larger than 2048 chars. As far as I know, that’s what is the max
length of any url.
Another evil thing about http post is, it’s actually 2 calls. First
browser sends the http post headers and server replies with “HTTP 100
Continue”. When browser receives this, it sends the actual body."
You should use GET where you're doing a request which has no side effects, e.g. just fetching some info. This request can:
Be repeated without any problem - if the browser detects an error it can silently retry
Have its result cached by the browser
Be cached by a proxy
These things are all good. Anything which is only retrieving data (particularly public data) should really be a GET. The server should send sensible Last-Modified: and Expires: headers to allow caching if required.
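As an illustration of that last point, here is a minimal sketch of a server emitting those headers, assuming a Python/Flask backend (not part of the original answer; the route and payload are made up):

from datetime import datetime, timedelta, timezone

from flask import Flask, make_response

app = Flask(__name__)

HTTP_DATE = '%a, %d %b %Y %H:%M:%S GMT'

@app.route('/public-data')
def public_data():
    # Pure retrieval with no side effects, so GET plus caching headers.
    resp = make_response('some public data')
    now = datetime.now(timezone.utc)
    resp.headers['Last-Modified'] = now.strftime(HTTP_DATE)
    resp.headers['Expires'] = (now + timedelta(hours=1)).strftime(HTTP_DATE)
    return resp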
There is one other difference not mentioned by anyone.
GET requests are passed in the URL string and are therefore subject to a length limit, usually dependent on the browser. It seems most allow around 2,000 characters.
POST requests can be much, much larger; in fact, they're not really limited at all. So if you need to request data from a web server and you're passing in lots of parameter information, a POST request might be the only option.
So, as mentioned before, a GET request is really for requesting data (no side effects), while a POST request is generally used for transmitting data back to the server to be stored (with side effects): e.g., use POST to upload a file, and GET to retrieve a file.
There was a time when IE, I believe, had a very short GET URL limit. Some applications, like Lotus Notes, use large numbers of random characters to represent document IDs. I had the displeasure of using another product that generated random strings so that each page URL was unique. The random string was HUGE... and, from memory, it didn't always work with IE6.
This might help you to decide where to use GET and where to use POST:
URIs, Addressability, and the use of HTTP GET and POST.
POST requests are just as insecure as GETs. The main difference is that POST is used to modify the state of the server application, while GET only requests data from it.
The difference matters when you use clean, "restful" URLs, where the URL itself specifies the resource, and the different methods trigger different actions on the server side.
Perhaps most importantly, GET is bookmarkable/viewable in URL history and searchable with Google.
POST is important where you don't want the action to be bookmarkable or able to be typed in as a URL; otherwise you (or Google, crawling your URLs) could end up accidentally doing things like deleting users from your system.
GET: Values are visible in the URL.
POST: Values are not visible in the URL.

GET: Length of values is limited, since the whole request must fit in the URL (roughly 2,000 characters in practice, as noted above).
POST: No practical limit on the length of values, since they are submitted in the body of the HTTP request.

GET: Performs slightly better, because appending values to the URL is simple.
POST: Slightly lower performance, because of the time spent packaging values into the HTTP body.

GET: Supports only string data (everything is URL-encoded text).
POST: Supports different data types, such as string, numeric, binary, etc.

GET: Results can be bookmarked.
POST: Results cannot be bookmarked.

GET: Requests are often cacheable.
POST: Requests are hardly ever cached.

GET: Parameters remain in web browser history.
POST: Parameters are not saved in web browser history.

Source and more in-depth analysis: https://www.guru99.com/difference-get-post-http.html
