Cloudflare documents a list of directives for the Cache-Control header, including stale-while-revalidate.
stale-while-revalidate=<seconds>
When present in an HTTP response, the stale-while-revalidate Cache-Control extension indicates that caches MAY serve the response in which it appears after it becomes stale, up to the indicated number of seconds since the object was originally retrieved.
I set my Cache-Control header to public, max-age=0, stale-while-revalidate=30 but I never seem to get a cache hit. Does Cloudflare actually support this?
If you set max-age=0, then effectively nothing is cached, and if nothing is cached there is nothing stale to serve. Try max-age=1 (e.g. Cache-Control: public, max-age=1, stale-while-revalidate=30); that should give some results within that 30-second window.
Ran into this question/issue today, and the Cloudflare documentation is not very clear, but there is a clue in the examples section of this page.
Cache-Control: max-age=600, stale-while-revalidate=30
This configuration indicates the asset is fresh for 600 seconds, and can be served stale for up to an additional 30 seconds to parallel requests for the same resource while the initial synchronous revalidation is attempted.
So, Cloudflare does support the directive, but its implementation is a bit different than one might expect. Essentially, the first request made once the cached content has expired results in a synchronous request to your origin, which that client must wait for (the behavior most people don't want).
However, there is some support, because any parallel requests for that same resource (e.g. made by other clients) will get the old cached content instead (with CF-Cache-Status: UPDATING). I tested this and was able to confirm the behavior.
In this way, if you have a very slow origin response, or a very popular cached resource, only a single client / user is held up, and the rest get served up the stale cached copy from Cloudflare. Better than nothing.
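A rough way to observe this yourself is to fire two overlapping requests at a just-expired URL and compare the cache-status headers. A minimal sketch, assuming a Cloudflare-proxied placeholder URL; it is timing-dependent, so it may take a few tries:

# Two near-simultaneous requests for the same just-expired resource.
# example.com is a placeholder for a Cloudflare-proxied site.
curl -s -o /dev/null -D - https://example.com/cached-page | grep -i cf-cache-status &
curl -s -o /dev/null -D - https://example.com/cached-page | grep -i cf-cache-status &
wait
# Roughly expected: one request revalidates synchronously at the origin
# (cf-cache-status: EXPIRED), the other is served stale (cf-cache-status: UPDATING).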
I have a question in regard to how CloudFront will use an S3 object's ETag to determine if it needs to send a refreshed object or not.
I know that the ETag will be part of the Request to the CloudFront distribution, in my case I'm seeing the "weak" (shortened) version:
if-none-match: W/"eabcdef4036c3b4f8fbf1e8aa81502542"
If the ETag being sent does not match the S3 object's current ETag value, then CloudFront will send the latest version.
I'm seeing this work as expected, but only after the CloudFront's cache policy has been reached. In my case it's been set to 20 mins.
CloudFront with a Cache Policy:
Minimum TTL: 1
Maximum TTL: 1200 <-- (20 mins)
Default TTL: 900
Origin Request Policy is not set
S3 Bucket:
Set to only allow access via its corresponding CloudFront distribution above.
Bucket and objects not public
The test object (index.html) in this case has only one header set:
Content-Type = text/html
While I am using CloudFront's Cache Policy, I've also tested using the S3 object header Cache-Control = max-age=6000. This had no effect on the refresh of the "index.html" object in regard to the ETag check I'm asking about.
The Scenario:
Upon first "putObject" to that S3 bucket, the "index.html" file has an ETag of:
eabcdef4036c3b4f8fbf1e8aa81502542
When I hit the URL (GET) for that "index.html" file, the cache of 20 mins is effectively started.
Subsequent hits to the "index.html" URL (GET) include a Request with the value
if-none-match: W/"eabcdef4036c3b4f8fbf1e8aa81502542"
I also see "x-cache: Hit from cloudfront" in the Response coming back.
Before the 20 mins is up, I'll make a change to the "index.html" file and re-upload via a "putObject" command in my code.
That will then change the ETag to:
exyzcde4099c3b4f8fuy1e8aa81501122
I would expect then that the next Request to CloudFront, before the 20-minute TTL and with the old "if-none-match" value, would then prompt the CloudFront to see the ETag is different and send the latest version.
But in all cases/tests it doesn't. CloudFront will seem to ignore the ETag difference and continue to send the older "index.html" version.
It's only after the 20 mins (cache TTL) is up that the CloudFront sends the latest version.
At that time the ETag in the Request changes/updates as well:
if-none-match: W/"exyzcde4099c3b4f8fuy1e8aa81501122"
Question (finally, huh?):
Is there a way to configure CloudFront to listen to the incoming ETag, and if needed, send the latest Object without having to wait for the Cache Policy TTL to expire?
UPDATE:
Kevin Henry's response explains it well:
"CloudFront doesn't know that you updated S3. You told it not to check with the origin until the TTL has expired. So it's just serving the old file until the TTL has expired and it sees the new one that you uploaded to S3. (Note that this doesn't have anything to do with ETags)."
So I decided to test how the ETag would be used if I turned the CloudFront Caching Policy to a TTL of 0 for all three CloudFront settings. I know that this defeats the purpose, and one of the strengths, of CloudFront, but I'm still wrapping my head around certain key aspects of CDN caching.
After setting the cache to 0, I'm seeing a continual "Miss from CloudFront" in the Response coming back.
I expected this, and in the first response I see an HTTP status of 200. Note the file size being returned is 128 KB for this test.
Subsequent calls to this same file return an HTTP status of 304, with a returned file size of around 400 B.
As soon as I update the "index.html" file in the S3 bucket, and call that same URL, the status code is 200 with a file size of 128KB.
Subsequent calls return a status of 304, again with an average of 400B in file size.
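For what it's worth, the same status/size pattern can be reproduced from the command line. A minimal sketch; the CloudFront hostname is a placeholder:

# Conditional GET; prints the status code and the number of body bytes downloaded.
# The distribution hostname is a placeholder.
curl -s -o /dev/null -w "%{http_code} %{size_download}\n" \
  -H 'If-None-Match: W/"eabcdef4036c3b4f8fbf1e8aa81502542"' \
  https://dxxxxxxxxxxxxx.cloudfront.net/index.html
# Expect "304 0" (no body) while the object is unchanged,
# and "200 131072" (the full 128 KB) right after an update.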
Looking again at the definition of an HTTP status of 304:
https://httpstatuses.com/304
"A conditional GET or HEAD request has been received and would have resulted in a 200 OK response if it were not for the fact that the condition evaluated to false.
In other words, there is no need for the server to transfer a representation of the target resource because the request indicates that the client, which made the request conditional, already has a valid representation; the server is therefore redirecting the client to make use of that stored representation as if it were the payload of a 200 OK response."
So am I correct in thinking that I'm using the Browser's cache at this point?
The calls to the CloudFront will now pass the requests to the Origin, where the ETag is used to verify if the resource has changed.
As it hasn't, then a 304 is returned and the Browser kicks in and returns its stored version of "index.html".
Would this be a correct assumption?
In case you're wondering, I can't use the invalidation method for clearing cache, as my site could expect several thousand invalidations a day. I'm hosting a writing journal site, where the authors could update their files daily, therefore producing new versions of their work on S3.
I would also rather not use the versioning method, with a timestamp or other string added as a query to the page URL. SEO reasons for this one mainly.
My ideal scenario would be to serve the same version of the author's work until they've updated it, at which time the next call to that same page would show its latest version.
This research/exercise is helping me to learn and weigh my options.
Thanks again for the help/input.
Jon
"I would expect then that the next Request to CloudFront, before the 20-minute TTL and with the old if-none-match value, would then prompt the CloudFront to see the ETag is different and send the latest version."
That is a mistaken assumption. CloudFront doesn't know that you updated S3. You told it not to check with the origin until the TTL has expired. So it's just serving the old file until the TTL has expired and it sees the new one that you uploaded to S3. (Note that this doesn't have anything to do with ETags).
CloudFront does offer ways to invalidate the cache, and you can read more about how to combine that with S3 updates in these answers.
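For completeness, an invalidation can also be issued from the AWS CLI. A minimal sketch; the distribution ID and path are placeholders:

# Invalidate one object so CloudFront fetches it fresh from S3 on the next request.
# EDFDVBD6EXAMPLE is a placeholder distribution ID.
aws cloudfront create-invalidation \
  --distribution-id EDFDVBD6EXAMPLE \
  --paths "/index.html"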
We can enable bucket versioning, and the object with the new ETag is then picked up by CloudFront.
Somebody commented on this question about caching:
...using a Cache-Control value of: max-age=0, s-maxage=604800 seems to get my desired behavior of instant client updates on new page contents, but still caching at the CDN level
Will I really get caching at CDN level and instant updates for my users?
Does it make sense? How does that combination work?
Yes, it makes sense.
With the configuration mentioned in that comment, responses are immediately stale for your users' browsers (max-age=0), so they'll have to revalidate on every request, while the CDN will cache a valid response for 604800 seconds (one week, via s-maxage). So repeated requests will mostly be served by the CDN instead of the origin server.
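A rough way to observe the split is to request the same URL twice and watch the cache-status header the CDN adds. A sketch with a placeholder URL; the header name varies by CDN (cf-cache-status on Cloudflare, x-cache on CloudFront):

# First request warms the CDN cache; the second should be a CDN hit.
# Meanwhile Cache-Control: max-age=0 tells the browser to revalidate every time.
# example.com is a placeholder.
curl -s -o /dev/null -D - https://example.com/app.js | grep -i -E 'cache-control|cf-cache-status|x-cache'
curl -s -o /dev/null -D - https://example.com/app.js | grep -i -E 'cache-control|cf-cache-status|x-cache'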
But what if you update your app? What happens to the stale cache on the CDN?
After a new deployment, you need to make sure all of your stale cache from the CDN will be purged / cleared.
For example, see Purging cached resources from Cloudflare: it gives you numerous options on how to do that.
Purge by single-file (by URL)
Purging by single-file through your Cloudflare dashboard
Purge everything
Purge cached resources through the API
etc
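As a sketch, a single-file purge through the Cloudflare v4 API looks roughly like this; the zone ID, API token, and URL are placeholders you must supply:

# Purge one URL from Cloudflare's cache.
# $ZONE_ID and $CF_API_TOKEN are placeholders.
curl -X POST "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/purge_cache" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"files":["https://example.com/app.js"]}'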
Firebase Hosting, for example, will clear all CDN cache after a new deployment:
Any requested static content is automatically cached on the CDN. If you redeploy your site's content, Firebase Hosting automatically clears all your cached static content across the CDN until the next request.
As for the setting suggested in the comment, I think Cache-Control: no-cache would do a better job.
From MDN - Cache Control:
no-cache
The response may be stored by any cache, even if the response is normally non-cacheable. However, the stored response MUST always go through validation with the origin server first before using it, therefore, you cannot use no-cache in conjunction with immutable. If you mean to not store the response in any cache, use no-store instead. This directive is not effective in preventing caches from storing your response.
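In other words, a minimal contrast (not tied to any particular CDN):

Cache-Control: no-cache   (may be stored, but must be revalidated before each reuse)
Cache-Control: no-store   (must not be stored by any cache at all)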
After reading many articles and some questions on here, I finally succeeded in activating the Apache mod_expires module to tell the browser it MUST cache images for 1 year.
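# Requires mod_expires and mod_headers (e.g. "a2enmod expires headers" on Debian/Ubuntu)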
<filesMatch "\.(ico|gif|jpg|png)$">
ExpiresActive On
ExpiresDefault "access plus 1 year"
Header append Cache-Control "public"
</filesMatch>
And thankfully server responses seem to be correct:
HTTP/1.1 200 OK
Date: Fri, 06 Apr 2012 19:25:30 GMT
Server: Apache
Last-Modified: Tue, 26 Jul 2011 18:50:14 GMT
Accept-Ranges: bytes
Content-Length: 24884
Cache-Control: max-age=31536000, public
Expires: Sat, 06 Apr 2013 19:25:30 GMT
Connection: close
Content-Type: image/jpeg
Well, I thought this would stop the browser from downloading, or even asking the server about, the images for 1 year. But that's only partially true: if you close and reopen the browser, the browser no longer downloads the images from the server, but it still sends the server an HTTP request for each image.
How do I force the browser to stop making HTTP requests for each image? Even if these HTTP requests are not followed by an image download, they are still requests made to the server that unnecessarily increase latency and slow down page rendering!
I already told the browser it MUST keep the images in cache for 1 year! Why does the browser still ask the server about each image (even if it does not download the image)?!
Looking at network graphs in Firebug (menu Firebug > Net > Images) I can see different caching behaviours (I obviously started with the browser cache completely empty; I forced a cache delete in the browser using "Clear All History"):
When the page is loaded for the 1st time all images are downloaded (and same thing happens if I force a page reload by clicking on the browser's reload page button). This makes sense!
When I navigate the site and get back to the same page the images are not downloaded at all and the browser does NOT even inquire the server for any of the images. This makes sense, (and I would like to see this behaviour also when browser is closed)!
When I close the browser and open it again on the same page, the silly browser makes an HTTP request to the server anyway, once per image: it does NOT download the image, but it still makes an HTTP request; it's like the browser asks the server about the image (and the server replies with 200 OK). This is the one that irritates me!
I also attach the graphs below if you are interested:
EDIT: just tested now with Firefox 11.0 as well, to make sure it wasn't an issue of my Firefox 3.6 being too old. The same thing happens!!! I also tested the Google and Stack Overflow sites; they both send Cache-Control: max-age=..., but the browser still makes an HTTP request to the server for each image once the browser is closed and opened again on the same page. After the server responds, the browser does NOT download the image (as I explained above), but it still makes the damn request that increases the time to see the page.
EDIT2: removing the Last-Modified header, as suggested here, does not solve the problem; it makes no difference.
The behavior you are seeing is the intended, specified behavior (see RFC 7234 for more details):
All modern browsers will send HTTP requests to the server for every page element displayed, regardless of cache status. This was a design decision made at the request of web services (especially advertising networks) to ensure that HTTP servers were able to maintain records of every display of every element.
If the browsers did not make these requests, the server would never be notified that an image had been displayed to the user. For advertising networks, this would be catastrophic. Early on, advertising networks 'hacked' their way around this by serving the same ad image using randomly generated names (ex: 'coke_ad_1_98719283719283.gif'). However, for ISPs this practice caused a huge increase in data transfers, because every one of their users was re-downloading these identical ad images, bypassing any caching/proxy servers their ISP was operating.
So a truce was reached: Browsers would always send HTTP requests, even for un-expired cached elements. Servers would respond with HTTP 304 status codes ("not modified"). This allows the servers to record the fact that the image was displayed to the client. As a result, advertising networks generally stopped using randomized image names to bypass network cache servers.
This gave the ad networks what they wanted - a record of every image displayed - and it gave ISPs what they wanted - cache-able images and static content.
That is why there isn't much you can do to prevent browsers from sending HTTP requests for cached page elements.
But if you look at other client-side solutions that came along with HTML5, there is scope to prevent resource loading:
Cache Manifest (in spite of its gotchas)
IndexedDB (nice asynchronous features, allows blob storage)
Local Storage (not async)
You were using the wrong tool for analysing the requests.
I'd recommend the really useful Firefox addon Live HTTP headers so you can see what is really going on on the network.
And just to be sure, you can SSH (or PuTTY) into your server and run something like
tail -f /var/log/apache2/access.log
There's a difference between "reloading" and "refreshing". Just navigating to a page with back and forward buttons usually doesn't initiate new HTTP requests, but specifically hitting F5 to "refresh" the page will cause the browser to double check its cache. This is browser dependent but seems to be the norm for FF and Chrome (i.e. the browsers that have the ability to easily watch their network traffic.) Hitting F6, enter should focus the URL address bar and then "go" to it, which should reload the page but not double check the assets on the page.
Update: clarification of back and forward navigating behavior. It's called "Back Forward Cache" or BFCache in browsers. When you navigate with back/forward buttons the intent is to show you exactly as the page was when you saw it in your own timeline. No server requests are made when using back and forward, even if a server cache header says that a particular item expired.
If you see (200 OK BFCache) in your developer network panel, then the server was never hit - even to ask if-modified-since.
http://www.softwareishard.com/blog/firebug/firebug-tip-what-the-heck-is-bfcache/
If I force a refresh using F5 or Ctrl+F5, a request is sent. However, if I close the browser and enter the URL again, NO request is sent. The way I tested whether a request was sent was by using breakpoints on begin-request on the server. Note that even when a request is not sent, it still shows up in Firebug as having taken a 7 ms wait, so beware of this.
What you are describing here does not reflect my experience. If content is served with a no-store directive, or you do an explicit refresh, then yes, I'd expect it to go back to the origin server; otherwise it should be cached across browser restarts (assuming it is allowed to, and can write a cache file).
Looking at your waterfalls in a bit more detail (which is tricky because they are a bit small & blurry) the browser appears to be doing exactly what it should - it has entries for the images - but these are just loading from the local cache not from the origin server - check the 'Date' header in the response (why do you think it's taking milliseconds instead of seconds?). That's why they are coloured differently.
After myself spending considerable time looking for a reasonable answer, I found the below link most useful and it does answer the question asked here.
https://webmasters.stackexchange.com/questions/25342/headers-to-prevent-304-if-modified-since-head-requests
If it is a matter of life or death (if you want to optimise page loading this way, or if you want to reduce the load on the server as much as possible, no matter what), then there IS a workaround.
Use HTML5 local storage to cache images after they were requested for the first time.
[+] You can prevent the browser from sending HTTP requests, which in 99% of cases would return 304 (Not Modified), no matter how hard the user tries (F5, Ctrl+F5, simply revisiting the page, etc.)
[-] You have to put some extra effort into JavaScript support for this.
[-] Images are stored in base64 (we cannot store binary data), which is why they are decoded each time on the client side. This is usually pretty fast and not a big deal, but it is still some extra CPU usage on the client side and should be kept in mind.
[-] Local storage is limited. You can aim at using ~5 MB of data per domain (note: base64 adds ~30% to the original size of an image).
[?] Supported by the majority of browsers. http://caniuse.com/#search=localstorage
What you are seeing in Chrome is not a record of the actual HTTP requests; it's a record of asset requests. Chrome does this to show you that an asset is being requested by the page. However, this view does not actually indicate whether the request was made. If an asset is cached, Chrome never creates the underlying HTTP request.
You can also confirm this by hovering over the purple segments in the timeline. Cached resources will have a (from cache) in the tooltip.
In order to see the actual HTTP requests, you need to look on a lower level. In some browsers this can be done with a plugin (like Live HTTP Headers).
In reality though, to verify the requests are not actually being made you need to check your server logs or use a debugging proxy like Charles or Fiddler. This will work on an HTTP level to make sure the requests are not actually happening.
Cache Validation and the 304 response
There are a number of situations in which Internet Explorer needs to check whether a cached entry is valid:
The cached entry has no expiration date and the content is being accessed for the first time in a browser session
The cached entry has an expiration date but it has expired
The user has requested a page update by clicking the Refresh button or pressing F5
If the cached entry has a last modification date, IE sends it in the If-Modified-Since header of a GET request message:
GET /images/logo.gif HTTP/1.1
Accept: */*
Referer: http://www.google.com/
Accept-Encoding: gzip, deflate
If-Modified-Since: Thu, 23 Sep 2004 17:42:04 GMT
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)
Host: www.google.com
The server checks the If-Modified-Since header and responds accordingly. If the content has not been changed since the date/time specified, it replies with a status code of 304 and a response message that just contains headers:
HTTP/1.1 304 Not Modified
Content-Type: text/html
Server: GWS/2.1
Content-Length: 0
Date: Thu, 04 Oct 2004 12:00:00 GMT
The response can be quickly downloaded because it contains no content and causes IE to read the data it requires from the cache. In effect, it is like a redirection to the local browser cache.
If the requested object has actually changed since the date/time in the If-Modified-Since header, the server responds with a status code of 200 and supplies the modified version of the resource.
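The same exchange is easy to reproduce with curl. A minimal sketch; the URL and the If-Modified-Since value are placeholders:

# Send a conditional GET; -i includes the response status line and headers.
# The URL and timestamp are placeholders.
curl -si -H 'If-Modified-Since: Thu, 23 Sep 2004 17:42:04 GMT' \
  https://example.com/images/logo.gif | head -n 5
# Unchanged resource: "HTTP/1.1 304 Not Modified" with an empty body.
# Changed resource:   "HTTP/1.1 200 OK" with the full image.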
This question has a better answer on the Webmasters Stack Exchange site.
More information, also cited in the link above, is available at HttpWatch.
According to the article:
There are a number of situations in which Internet Explorer needs to check whether a cached entry is valid:
The cached entry has no expiration date and the content is being accessed for the first time in a browser session
The cached entry has an expiration date but it has expired
The user has requested a page update by clicking the Refresh button or pressing F5
For some reason Firefox 4 is not updating Ajax content. It seems to keep the cache until the cache is manually removed. I am using must-revalidate headers, and each request sends an expired time, so what could be causing this issue?
Some debugging revealed that Firefox doesn't make actual requests if an expiry time is set in the future.
It only starts making requests when the expiry time is in the past. This means the usual manner of getting 304s doesn't work, because the resources never actually get requested.
Assume browser default settings, and content is sent without expires headers.
user visits website, browser caches images etc.
user does not close browser, or refresh page.
user continues to surf site normally.
assume the browser doesn't dump the cache for any reason.
The browser will cache images etc. as the user surfs, but it's unclear when it will issue a conditional GET request to ask about content freshness (apart from refreshing the page). If this is a browser-specific setting, where can I see its value (for browsers like Safari, IE, Firefox, and Chrome)?
[edit: yes - I understand that you should always send expires headers. However, this research is aimed at understanding how the browser works with content w/o expires headers.]
From the HTTP/1.1 caching spec (section 13.4): "Unless specifically constrained by a cache-control (section 14.9) directive, a caching system MAY always store a successful response (see section 13.8) as a cache entry, MAY return it without validation if it is fresh, and MAY return it after successful validation." This means that a user agent is free to do whatever it wants if no cache-control header is sent. Most browsers use a combination of user settings and heuristics to determine whether (and for how long) to cache in this situation.
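For example, a widely used heuristic (suggested in RFC 7234, section 4.2.2) is to treat a response as fresh for 10% of the time elapsed since its Last-Modified date. You can check what headers a server actually sends with a quick probe; the URL is a placeholder:

# Show the headers a heuristic cache would have to work with; the URL is a placeholder.
curl -sI https://example.com/logo.png | grep -i -E 'cache-control|expires|last-modified|etag|date'
# With no Cache-Control or Expires, but a Last-Modified of 10 days ago, a heuristic
# cache may consider the response fresh for roughly 1 day (10% of 10 days).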
HTTP/1.1 defines a selection of caching mechanisms; the expires header is merely one of them, and there is also the cache-control header.
To directly answer your question: for a resource returned with no expires header, you must consider the returned cache-control directives.
HTTP/1.1 defines no caching behaviour for a resource served with no cache-related headers. If a resource is sent with no cache-control or expires headers, you must assume the client will make a regular (non-conditional) request the next time the same resource is requested.
Any deviation from this behaviour qualifies the client as not fully conformant with HTTP, in which case the question becomes: what behaviour is to be expected from a non-conformant HTTP client? There is no way to answer that.
HTTP caching is complex, to fully understand what a conformant client should do in a given scenario, read and understand the HTTP caching spec.
Unless you send an expires header, most browsers will make a GET request for each subsequent refresh and will either get HTTP 200 OK (and download the content again) or HTTP 304 Not Modified (and use the data in the cache).