How to avoid AJAX caching in Internet Explorer 11 when additional query string parameters or using POST are not an option

I realize this question has been asked, but in modern REST practice none of the previous iterations of this question nor their answers are accurate or sufficient. A definitive answer to this question is needed.
The problem is well known: IE (even 11) caches AJAX requests, which is really, really dumb. Everyone understands this.
What is not well understood is that none of the previous answers are sufficient. Every previous instance of this question on SO is marked as sufficiently answered by either:
1) Using a unique query string parameter (such as a unix timestamp) on each request, so as to make each request URL unique, thereby preventing caching.
-- or --
2) using POST instead of GET, as IE does not cache POST requests except in certain unique circumstances.
-- or --
3) using 'cache-control' headers passed by the server.
IMO in many situations involving modern REST API practice, none of these answers are sufficient or practical. A REST API will have completely different handlers for POST and GET requests, with completely different behavior, so POST is typically not an appropriate or correct alternative to GET. As well, many APIs have strict validation around them, and for numerous reasons, will generate 500 or 400 errors when fed query string parameters that they aren't expecting. Lastly, often we are interfacing with 3rd-party or otherwise inflexible REST APIs where we do not have control over the headers provided by the server response, and adding cache control headers is not within our power.
So, the question is:
Is there really nothing that can be done on the client-side in this situation to prevent I.E. from caching the results of an AJAX GET request?

Caching is normally controlled by setting headers on the content when the server returns it. If you're already doing that and IE is ignoring them and caching anyway, the only way around it is one of the cache-busting techniques mentioned in your question. In the case of an API, it's better to make sure proper cache headers are in place before resorting to any of the cache-busting techniques.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Caching_FAQ
Cache-Control: no-cache
Cache-Control: no-store
Pragma: no-cache
Expires: 0
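For instance, here's a minimal sketch of a Node.js origin that returns those response headers, assuming you control the server (the JSON payload and port are placeholders):

// Hedged sketch: a plain Node.js origin that sends the no-cache response headers.
const http = require('http');

http.createServer((req, res) => {
  res.writeHead(200, {
    'Cache-Control': 'no-cache, no-store, must-revalidate',
    'Pragma': 'no-cache',
    'Expires': '0',
    'Content-Type': 'application/json'
  });
  res.end(JSON.stringify({ now: Date.now() }));
}).listen(8080);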

If you don't control the API, you might be able to disable IE caching by adding request headers to the AJAX GETs:
'Cache-Control': 'no-cache, no-store, must-revalidate', 'Pragma': 'no-cache', 'Expires': '0'
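For example, a minimal sketch with plain XMLHttpRequest (the endpoint URL is a placeholder; jQuery users can pass the same object via the headers option of $.ajax):

// Send no-cache request headers on every GET so IE revalidates instead of
// serving from its cache. '/api/items' is a placeholder endpoint.
var xhr = new XMLHttpRequest();
xhr.open('GET', '/api/items', true);
xhr.setRequestHeader('Cache-Control', 'no-cache, no-store, must-revalidate');
xhr.setRequestHeader('Pragma', 'no-cache');
xhr.setRequestHeader('Expires', '0');
xhr.onload = function () {
  console.log(xhr.status, xhr.responseText);
};
xhr.send();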

Related

RefreshHit from cloudfront even with cache-control: max-age=0, no-store

Cloudfront is getting a RefreshHit for a request that is not supposed to be cached at all.
It shouldn't be cached because:
It has cache-control: max-age=0, no-store;
The Minimum TTL is 0; and
I've created multiple invalidations (on /*) so this cached resource isn't from some historical deploy
Any idea why I'm getting RefreshHits?
I also tried changing the header to cache-control: no-store, stale-if-error=0 and creating a new invalidation on /*, and now I'm seeing a cache hit (this time in Firefox).
After talking extensively with support, I learned what's going on.
So, if you have no-store and a Minimum TTL of 0, CloudFront will indeed not store your resources. However, if your origin is slow to respond (likely because it's under heavy load) and CloudFront receives another identical request (identical with respect to the cache key) while it is still waiting for the first response, it will send that one response to both requests. This is done to lighten the load on the origin. (see docs)
Support was calling these "collapse hits" although I don't see that in the docs.
So, it seems you can't have a single Behavior serving some pages that must have a unique response per request while serving other pages that are cached. Support said:
I just confirmed that, with min TTL 0 and cache-control: no-store, we cannot disable collapse hit. If you do need to fully disable cloudfront cache, you can use cache policy CachingDisabled
We'll be making a behavior for every path prefix that we need caching on. It seems there was no better way than this for our use-case (transitioning our website one page at a time from a non-cacheable, backend-rendered jinja2/jQuery to a cacheable, client-side rendered React/Next.js).
It's probably too late for OP's project, but I would personally handle this with a simple origin-response Lambda@Edge function, plus a single cache behavior for /* and a single cache policy. You can write all of the filtering/caching logic in the origin-response function. That way you only manage one bit of function code in one place, instead of a bunch of individual cache behaviors (and possibly a bunch of cache policies).
For example, the origin-response function can look for a cache-control response header coming from your origin. If it exists, pass it back to the client; if it doesn't exist (or if you want to overwrite it with something else), create the response header there. The edge doesn't care whether the cache-control header came from your origin or from an origin-response Lambda. To the edge, it is all the same.
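A hedged sketch of what that origin-response function might look like (Node.js runtime; the fallback Cache-Control value is an assumption, not something from the original answer):

'use strict';

// Origin-response Lambda@Edge handler: pass through any Cache-Control the
// origin sent; otherwise add a default value at the edge.
exports.handler = (event, context, callback) => {
  const response = event.Records[0].cf.response;
  const headers = response.headers;

  if (!headers['cache-control']) {
    // CloudFront treats this the same as if the origin had sent it.
    headers['cache-control'] = [
      { key: 'Cache-Control', value: 'no-store, max-age=0' }
    ];
  }

  callback(null, response);
};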
Another trick you can use to avoid caching while still using the default CloudFront behavior is to add a dummy, unused query parameter with a unique value on each request.
Python example:
import requests
import uuid
requests.get(f'http://my-test-server-x.com/my/path?nocache={uuid.uuid4()}')
requests.get(f'http://my-test-server-x.com/my/path?nocache={uuid.uuid4()}')
Note that both calls will reach the destination and will not be served from the cache, since uuid.uuid4() always generates a unique value.
This works because, by default (unless defined otherwise in the Behavior section), the query string parameters are part of the cache key.
Note: doing so bypasses the cache entirely, so your backend may be hit with more requests.

Why does the XMLHttpRequest spec prevent setting the Accept-Encoding header?

Today, I wanted to utilize the Accept-Encoding header to request an image as base64. Come to find out, the XMLHttpRequest spec prevents setting that header!
http://www.w3.org/TR/XMLHttpRequest/#the-setrequestheader()-method
Note: The above headers are controlled by the user agent to let it control those aspects of transport. This guarantees data integrity to some extent. Header names starting with Sec- are not allowed to be set to allow new headers to be minted that are guaranteed not to come from XMLHttpRequest.
Why in the world would they write a spec like this? It'd make more sense if the browser just provided a default value (eg. gzip,deflate,sdch) if none was specified.
The browser is responsible for accepting and processing the response. It wouldn't make much sense to manipulate your XHR to say it accepts gzip, for example, when you couldn't do anything with it. Can you just set a custom header value?
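To illustrate that answer, here's a quick sketch (the URL and custom header name are made up): the user agent simply drops forbidden header names such as Accept-Encoding, while an ordinary custom header goes through.

var xhr = new XMLHttpRequest();
xhr.open('GET', '/image.png', true);
xhr.setRequestHeader('Accept-Encoding', 'identity');    // ignored (some browsers log a warning)
xhr.setRequestHeader('X-Requested-Encoding', 'base64'); // custom header, sent as-is
xhr.send();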
Why in the world would they write a spec like this?
In one word: laziness.
Why add extra semantics describing secure, predictable browser behavior for every possible header that would only be used by a few people like us, when they can just declare all the headers prohibited in one paragraph?

How long should a static file be cached?

I'd like to set browser caching for some Amazon S3 files. I plan to use this meta data:
Cache-Control: max-age=86400, must-revalidate
that's equal to one day.
Many of the examples I see look like this:
Cache-Control: max-age=3600
Why only 3600 and why not use must-revalidate?
For a file that I rarely change, how long should it be cached?
What happens if I update the file and need that update to be seen immediately, but its cache doesn't expire for another 5 days?
Why only 3600?
Presumably because the author of that particular example decided that one hour was an appropriate cache timeout for that page.
Why not use must-revalidate?
If the response does not contain information that strictly has to be fresh, omitting must-revalidate can in theory allow a few more requests to be served from the cache (a stale entry may still be reused in some situations). See this answer for details; the most relevant part is from the HTTP spec:
When a cache has a stale entry that it would like to use as a response
to a client's request, it first has to check with the origin server
(or possibly an intermediate cache with a fresh response) to see if
its cached entry is still usable.
For a file that I rarely change, how long should it be cached?
Much web performance advice says to set a cache expiration very far into the future, such as a few years. This way, the client browser will only download the data once, and subsequent visits will be served from the cache. This works well for "truly static" files, such as JavaScript or CSS.
On the other hand, if the data is dynamic but does not change too often, you should set an expiration time that is reasonable for your specific scenario. Do you need to get the newest version to the customer as soon as it's available, or is it okay to serve a stale version? Do you know when the data changes? Etc. An hour or a day is often an appropriate trade-off between server load, client performance, and data freshness, but it depends on your requirements.
What happens if I update the file and need that update to be seen immediately, but its cache doesn't expire for another 5 days?
Give the file a new name, or append a value to the querystring. You will of course need to update all links. This is the general approach when static resources need to change.
Also, here is a nice overview of the cache control attributes available to you.
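Putting those pieces together, here is a hedged sketch of uploading a versioned static file to S3 with a long max-age, using the AWS SDK for JavaScript (the bucket name, key, and file are placeholders):

// A far-future max-age is safe here only because the key itself is
// versioned, so an updated file gets a brand-new URL.
const AWS = require('aws-sdk');
const fs = require('fs');

const s3 = new AWS.S3();
s3.putObject({
  Bucket: 'my-bucket',
  Key: 'css/site.20240101.css',
  Body: fs.readFileSync('site.css'),
  ContentType: 'text/css',
  CacheControl: 'max-age=31536000'   // one year
}, (err, data) => {
  if (err) throw err;
  console.log('uploaded, ETag:', data.ETag);
});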

leverage browser caching - expires or max-age, last-modified or etag

I'm having difficulty finding a clear-cut, practical explanation for what is the correct way to leverage browser caching to increase page speed.
According to this site:
It is important to specify one of Expires or Cache-Control max-age,
and one of Last-Modified or ETag, for all cacheable resources. It is
redundant to specify both Expires and Cache-Control: max-age, or to
specify both Last-Modified and ETag.
Is this correct? If so, should I use Expires or max-age? I think I have a general understanding of what both of those are but don't know which is usually best to use.
If I have to also do Last-Modified or ETag, which one of those? I think I get Last-Modified but am still very fuzzy on this ETag concept.
Also, which files should I enable browser caching for?
Is this correct?
Yes. Expires and max-age do the same thing, but in two different ways; the same goes for Last-Modified and ETag.
If so, should I use Expires or max-age?
Expires depends on the accuracy of the user's clock, so it's mostly a bad choice (and most browsers support HTTP/1.1 anyway). Use max-age to tell the browser that the file is good for that many seconds. For example, a one-day cache would be:
Cache-Control: max-age=86400
Note that when both Cache-Control and Expires are present, Cache-Control takes precedence.
If I have to also do Last-Modified or ETag, which one of those? I think I get Last-Modified
You're right, Last-Modified is usually the better choice. Although it's a time, it's generated by the server, so there's no issue with the user's clock; that's the reason Last-Modified is better than Expires.
The browser sends back the Last-Modified value the server provided the last time it asked for the file (in an If-Modified-Since request header), and if the file hasn't changed, the server answers with an empty «304 Not Modified» response.
Also note that ETag can be useful too, because Last-Modified has a resolution of one second, so you can't distinguish two versions of a resource generated within the same second. [2]
ETag needs somewhat more computing than Last-Modified, as it's a signature of the current state of the file (similar to an MD5 sum or a CRC32).
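To make the round trip concrete, here is a hedged sketch of a tiny Node.js origin (no framework; the file path and port are placeholders) that sends both validators and answers conditional requests with 304 Not Modified:

const http = require('http');
const fs = require('fs');
const crypto = require('crypto');

const FILE = './image.png';

http.createServer((req, res) => {
  const body = fs.readFileSync(FILE);
  const lastModified = fs.statSync(FILE).mtime.toUTCString();
  // A strong ETag can simply be a hash of the current content.
  const etag = '"' + crypto.createHash('md5').update(body).digest('hex') + '"';

  res.setHeader('Cache-Control', 'max-age=3600');
  res.setHeader('Last-Modified', lastModified);
  res.setHeader('ETag', etag);

  // If the client's cached copy is still current, skip the body entirely.
  if (req.headers['if-none-match'] === etag ||
      req.headers['if-modified-since'] === lastModified) {
    res.writeHead(304);
    res.end();
    return;
  }

  res.writeHead(200, { 'Content-Type': 'image/png' });
  res.end(body);
}).listen(8080);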
Also, which files should I enable browser caching for?
All files can benefit from caching. You've got two different approaches:
with max-age: useful for files that never change (images, CSS, javascript). For as long as the max-age value, the browser won't send any request to the server. So you'll see the page loading really fast on the second load. If you need to update files, just append a question mark, and the date of change (for example /image.png?20110602, or for better proxies caching, something like /20110602/image.png or /image.20110602.png). This way you can make files expire if it's urgent (remember that the browser almost never hits the server once it has a max-age file). Main use is to speed things up and limit requests sent to the server.
with Last-Modified: can be set on all files (including those with max-age). Even if you have dynamic pages, you may not change the content of the file for a while (even if it's 10 min), so that could be useful. The main use here is to tell the browser «keep asking me for this file, if it's new, I'll send you the new one». So, there's a request sent on each page load, but the answer is empty if the file is already good (304 Not Modified), so you save on bandwidth.
The more you cache, the faster your pages will show up. But it's a difficult task to flush caches, so use with care.
A good place to learn all this with many explanations: http://www.mnot.net/cache_docs/
[2]: RFC 7232, section 2.3 (ETag): https://www.rfc-editor.org/rfc/rfc7232#section-2.3

What are the advantages of using a GET request over a POST request?

Several of my AJAX applications in the past have used GET requests, but now I'm starting to use POST requests instead. POST requests seem to be slightly more secure and definitely more URL friendly/pretty. Thus, I'm wondering if there is any reason why I should use GET requests at all.
I generally frame the question thus: does anything important change after the request? (Logging and the like notwithstanding.) If it does, it should be a POST request; if it doesn't, it should be a GET request.
I'm glad that you call POST requests "slightly" more secure, because that's pretty much what they are; it's trivial to fake a POST request by a user to a page. Making it a POST request, however, prevents web accelerators or reloads from re-triggering the action accidentally.
With AJAX, there is one more consideration: if you are returning JSON with callback support, be very careful not to put any sensitive data in there that you don't want other websites to be able to see. Wikipedia had a vulnerability along these lines where the user's anti-CSRF token was revealed via their JSON API.
All good points, however, in answer to the question, GET requests are more useful in certain scenarios over POST requests:
They can be bookmarked
They can be cached
They're faster
They have known consequences (assuming they don't change data), so visiting them multiple times is not a problem.
For the sake of posterity, updating this comment with the blog notes re: point #3 here, all credit to Omar AL Zabir (the author of the referenced blog post):
"Atlas by default makes HTTP POST for all AJAX calls. Http POST is
more expensive than Http GET. It transmits more bytes over the wire,
thus taking precious network time and it also makes ASP.NET do extra
processing on the server end. So, you should use Http Get as much as
possible. However, Http Get does not allow you to pass objects as
parameters. You can pass numeric, string and date only. When you make
a Http Get call, Atlas builds an encoded url and makes a hit to that
url. So, you must not pass too much content which makes the url become
larger than 2048 chars. As far as I know, that’s what is the max
length of any url.
Another evil thing about http post is, it’s actually 2 calls. First
browser sends the http post headers and server replies with “HTTP 100
Continue”. When browser receives this, it sends the actual body."
You should use GET where you're doing a request which has no side effects, e.g. just fetching some info. This request can:
Be repeated without any problem - if the browser detects an error it can silently retry
Have its result cached by the browser
Be cached by a proxy
These things are all good. Anything which is only retrieving data (particularly public data) should really be a GET. The server should send sensible Last-Modified: and Expires: headers to allow caching if required.
There is one other difference not mentioned by anyone.
GET requests are passed in the URL string and are therefore subject to a length limit usually dependent on the browser. It seems that most are around 2000 chars.
POST requests can be much much larger - in fact not limited really. So if you're needing to request data from a web server and you're passing in lots of parameter information then a POST request might be the only option.
So, as mentioned before, a GET request is really for requesting data (no side effects), while a POST request is generally used for transmitting data back to the server to be stored (with side effects). For example, use POST to upload a file and GET to retrieve a file.
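As a small illustration of that rule of thumb, here's a sketch using the fetch API (the URLs are placeholders): GET for a side-effect-free read, POST for a state-changing upload.

fetch('/files/report.pdf')                          // GET: safe, cacheable, bookmarkable
  .then(function (res) { return res.blob(); })
  .then(function (blob) { console.log('downloaded', blob.size, 'bytes'); });

var form = new FormData();
form.append('file', new Blob(['hello'], { type: 'text/plain' }), 'hello.txt');
fetch('/files', { method: 'POST', body: form })     // POST: creates something on the server
  .then(function (res) { console.log('upload status', res.status); });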
There was a time when, I believe, IE had a very short limit on GET URL length. Some applications like Lotus Notes use large numbers of random characters to represent document IDs. I had the displeasure of using another product that generated random strings so the page URL was unique each time. The random string was HUGE... and from memory it didn't always work with IE6.
This might help you to decide where to use GET and where to use POST:
URIs, Addressability, and the use of HTTP GET and POST.
POST requests are just as insecure as GETs. The main difference is that POST is used to modify the state of the server application, while GET only requests data from it.
The difference matters when you use clean, "restful" URLs, where the URL itself specifies the resource, and the different methods trigger different actions on the server side.
Perhaps most importantly, GET is bookmarkable, viewable in URL history, and searchable with Google.
POST is important where you don't want the event to be bookmarkable or able to be typed in as a URL; otherwise you (or Google crawling your URLs) could end up accidentally doing things like deleting users from your system, for example.
GET | POST
Values are visible in the URL. | Values are not visible in the URL.
GET has a limitation on the length of the values, generally 255 characters. | POST has no limitation on the length of the values, since they are submitted via the body of the HTTP request.
GET performs better than POST because of the simple nature of appending the values to the URL. | POST has lower performance because of the time spent including the values in the HTTP body.
GET supports only string data types. | POST supports different data types, such as string, numeric, binary, etc.
GET results can be bookmarked. | POST results cannot be bookmarked.
GET requests are often cacheable. | POST requests are hardly ever cacheable.
GET parameters remain in the web browser history. | POST parameters are not saved in the web browser history.
Source and more in-depth analysis: https://www.guru99.com/difference-get-post-http.html
