How to cache an HTML page with must-revalidate?

Caching an HTML page with must-revalidate means that the browser must check for updates as indicated by Last-Modified or ETag. The problem, however, is that before max-age is reached, the browser will not make any connection to the website to read the HTTP headers (and analyze Last-Modified and ETag).
How can I force the browser to make a brief connection to read (at least) the HTTP headers before loading the page from cache?
I do not understand the usage of must-revalidate! Isn't it its responsibility to check for updates before max-age is reached? After max-age, the browser will fetch from the website anyway and never use the local cache.

Yes, your understanding of must-revalidate is wrong: it says that the cache may not serve this content once it is stale (i.e. "expired") without revalidating it with the server first. And yes, caches (and browsers) can in theory be set to serve pages even when they are stale, though the standard says they should warn the user if they do this.
To force the browser to recheck your page with the server, the simplest solution is to add max-age=0 to the Cache-Control header. This will let your browser keep a copy of the page in its cache, but compare it with the server's version on every use by sending a conditional request (If-Modified-Since / If-None-Match, built from Last-Modified / ETag), as you wanted.
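For example, if the origin happens to be nginx (an assumption on my part; any server that lets you set response headers works the same way), a minimal sketch would be:

# Hypothetical location: keep the page cacheable, but force a conditional
# request on every use. max-age=0 makes the cached copy immediately stale,
# and must-revalidate forbids serving a stale copy without checking the server.
location = /page.html {
    add_header Cache-Control "max-age=0, must-revalidate";
}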
It used to be that you could add no-cache instead, but since users have come to expect no-cache to behave like no-store, browsers are gradually treating the two the same.
Check the HTTP/1.1 RFC (RFC 2616), section 14.9, for more information on these headers.

Related

Cache-Control Headers not respected on CloudFlare

I am trying to get some HTML pages to be cached the same way images are automatically cached via CloudFlare, but I can't get CloudFlare to actually hit its cache for HTML.
According to the documentation (Ref: https://support.cloudflare.com/hc/en-us/articles/202775670-How-Do-I-Tell-CloudFlare-What-to-Cache-), it's possible to cache anything with Cache-Control set to public and a max-age greater than 0.
I've tried various combinations of headers on my origin Nginx server, from a simple Cache-Control: public, max-age=31536000 to more complex sets including s-maxage=31536000, Pragma: public, ETag: "569ff137-6", and Expires: Thu, 31 Dec 2037 23:55:55 GMT, all without results.
Any ideas on how to force CloudFlare to serve the HTML pages from its cache?
PS: I am getting CF-Cache-Status: HIT on the images and it works fine, but on the HTML pages there's nothing, not even a CF-Cache-Status header. With a CloudFlare page rule for HTML pages it seems to work fine, but I want to avoid using one, mainly because it's too CloudFlare-specific. I am not serving cookies or anything dynamic from these pages.
It is now possible to get Cloudflare to respect your web server's headers instead of overriding them with the minimum described in the Browser Cache TTL setting.
First, navigate to the Caching tab in the Cloudflare dashboard. Scroll down to the "Browser Cache Expiration" setting and select the "Respect Existing Headers" option in the dropdown.
Further reading:
Does CloudFlare honor my Expires and Cache-Control headers for static content?
Caching Anonymous Page Views
How do I cache static HTML?
Note: if this setting isn't chosen, Cloudflare will apply a default 4-hour minimum to Cache-Control headers. Once it is set, Cloudflare will not touch your Cache-Control headers (even if they're low or not set at all).
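If you run nginx at the origin (as the question does; the location pattern below is illustrative), headers that this setting should pass through untouched could be set like so:

# Hypothetical nginx block: year-long caching hints for HTML.
# s-maxage targets shared caches such as Cloudflare's edge; max-age targets browsers.
location ~* \.html$ {
    add_header Cache-Control "public, max-age=31536000, s-maxage=31536000";
}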
I stumbled on this too. The page says:
Pro Tip: Sending cache directives from your origin for resources with extensions we don't cache by default will make absolutely no difference. To specify the caching duration at your origin for the extensions we don't cache by default, you'd have to create a Page Rule to "Cache Everything".
So it appears that you do have to set a page rule to use this for files that CloudFlare doesn't cache by default. This page describes it in more detail:
https://blog.cloudflare.com/edge-cache-expire-ttl-easiest-way-to-override/
That said, it still didn't work for me and appears not to be supported; their support confirmed this when I contacted them. "Respect Origin Header" has been removed as a setting from all plan types; instead, if you have no page rules, they will respect the origin header.
This doesn't help with hitting their edge cache for HTML pages, however. For that you have to set up a page rule. Once that is done you can, I believe, set your max-age as low as your plan allows; any lower and it gets overwritten. That is to say, with no page rule you could send Cache-Control: max-age=30 and it would pass through. With a page rule that includes edge caching, your max-age becomes subject to the minimum time your plan allows, even if the page rule doesn't specify browser cache.
The CF documentation is very unclear. Go into "Page Rules" and define a rule, based on wildcards, that turns on caching; then it will work.

How can I prevent web browsers from making "If-Modified-Since" requests for cached files?

I'm trying to set up caching in nginx so that images will not need to be fetched repeatedly. It seems to be working except that the browser is still making a request for each file with an If-Modified-Since header. The server then responds with 304 Not Modified and the actual file isn't transferred again. I can see how this is desired behavior in a lot of cases but in my particular situation it's fine for the files to be up to a week out of date and I would prefer to skip the delay introduced by the extra requests.
Is it possible to add cache headers that tell the browser to always automatically use the cached version until the expiration is reached? My current nginx config is
expires 7d;
add_header Pragma public;
add_header Cache-Control "public";
From researching this more, it appears that these requests are at the browser's discretion and have nothing to do with the nginx configuration. If you do a Ctrl+F5 refresh in Chrome, it will re-request all files, regardless of whether or not they're cached. If you do just F5, it will send the requests with the If-Modified-Since header and only update the files that have changed (this is what I was doing). When you simply click a link to a page, the cached files aren't requested at all.
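One hedged addition: browsers that support the immutable Cache-Control extension (RFC 8246) will skip the conditional request even on a plain F5 reload. A sketch extending the config above:

# Hypothetical variant: "immutable" tells supporting browsers never to
# revalidate these files before the week-long max-age expires.
location ~* \.(jpg|jpeg|gif|png)$ {
    add_header Cache-Control "public, max-age=604800, immutable";
}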

For which extensions does the browser cache a page?

We have a web application.
Until now we had no real cache handling strategy.
When we had a new version of certain JavaScript files, we instructed our users to clear their browser cache.
Now we want to change this.
Up to this date our starting page was "start_app.html".
In our effort to implement our cache busting strategy we want to ensure that the browser will NOT cache our starting page.
We will change the extension from ".html" to ".php".
It seems that the browser has a list of extensions for which it ALWAYS fetches a fresh copy from the web server, like "php", "asp", and so on.
Is this true, and which extensions are these?
Thanks a lot in advance
Please don't rely on incorrect browser behavior to not cache your page. Instead, use this header:
Cache-Control: no-cache, no-store
This page has all the details as to why that header will do what you want.
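For instance, assuming nginx serves the site (the question doesn't say which server is used), the entry page named in the question could be exempted from caching regardless of its extension:

# Hypothetical nginx block: never cache the entry page, whatever it's called.
location = /start_app.html {
    add_header Cache-Control "no-cache, no-store";
}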

Can't the browser just use its cache from prior ajax calls?

I am trying to rely upon the browser cache to hold JSON data returned from AJAX calls in jQuery.
Normal browser activity relies upon the browser cache all the time. Example: jpg and gif images are not refetched on a page reload.
But when I try using jQuery getJSON ajax calls, I cannot seem to avoid fetching the data from the server.
My returned headers look like this (confirmed with firebug):
Transfer-Encoding: chunked
Date: Wed, 05 Aug 2009 02:55:39 GMT
Content-Type: text/plain; charset=ISO-8859-1
Expires: Wed, 05 Aug 2009 03:55:39 GMT
Cache-Control: max-age=3600
Yet an immediate refresh of the page causes identical requests to hit the server.
I've seen several postings about avoiding caching behavior, which isn't what I need. I've seen several postings about utilizing caching, but those all seem to rely upon saving data in the DOM. I want something that behaves just like cached images do during a page reload.
Can't the browser just fetch it from its own cache?
--x--x--x--x UPDATE --x--x--x--
Much to my disappointment, several respectable folks agree that this just isn't possible.
Some even argue that it shouldn't be (which still baffles me).
Stubborn to a fault, I tried the following:
I set the ETag header on all outgoing pages I want to be cached (I pick a few choice URL arguments that represent the data I'm requesting and just use those for the ETag value).
At the beginning of the next request, I simply check whether the If-None-Match header is present. If so, the browser isn't serving straight from cache like I wanted, but it is at least revalidating, so I send a 304 Not Modified response.
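A rough sketch of that short-circuit expressed at the nginx level instead of inside the CGI (the location, ETag value, and backend address are all hypothetical):

# Answer revalidations with 304 before they reach the expensive backend.
location /graph-data {
    add_header ETag '"graph-v1"';
    if ($http_if_none_match = '"graph-v1"') {
        return 304;
    }
    proxy_pass http://127.0.0.1:8080;  # the CGI that builds the graph data
}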
Testing shows that Firefox won't cache my request
(but I can still avoid the 'fetch the expensive data' part of my CGI),
while IE6 will actually cache it (and won't even attempt fetching back from the server).
It's not a pretty answer, but it's working for me for now
(those pesky full-page refreshes of graph data won't be so slow or expensive now).
(What? I'm running IE6! OMG! Oh look, a squirrel!)
Ajax caching is possible and predictable (at least in IE and Firefox).
This blog post discusses Ajax caching and has a demo web page:
http://blog.httpwatch.com/2009/08/07/ajax-caching-two-important-facts/
There's also a follow-up by Steve Souders on the F5 issue:
http://stevesouders.com/tests/ajax_caching.php
The short answer is no. Unfortunately, browsers do not reliably cache AJAX requests in the same way that they do "normal" pages. (Although the data may in fact be cached, the browser often doesn't use the cache when handling AJAX requests the way you would expect.) This should change in the future, but for now you have to work around it.
You may want to check your resources using the Resource Expert Droid to make sure they're doing what you intend. You should also run a network trace to double-check the request and response headers, using something like Wireshark, in case Firebug isn't telling the full story.
It's possible that jQuery is including some request headers in a way that makes the browser decide to bypass the cache. Have you tried a plain XMLHttpRequest without a framework?
While not the "browser cache", what about session state or some other form of client-side storage? You will still have to look into an If-Modified-Since situation, as mentioned in your comment.
The browser won't natively know whether the data has changed, since the JSON is retrieved dynamically and what is in the cache is static. I think?
Found a relevant link after the author claimed Ajax browser caches are indeed reliable.
claim found here: http://davidwalsh.name/cache-ajax
linked to here: http://ajaxref.com/ch6/builtincache.html

IE6 and Caching

It seems that IE6 ignores any form of cache invalidation sent via HTTP headers. I've tried setting Pragma to No-Cache and setting the cache expiration to the current time, yet in IE6, hitting Back will always pull up a cached version of the page I am working on.
Is there a specific HTTP header that IE6 does listen to?
Cache-Control: private, max-age=0 should fix it. In classic ASP this is done with Response.Expires = -1.
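If the origin were nginx instead (purely an assumption; the path below is hypothetical), the equivalent would be roughly:

# Hypothetical nginx equivalent: an immediately-stale private response,
# so IE6 must revalidate rather than replay its cached copy on Back.
location = /work-in-progress.html {
    add_header Cache-Control "private, max-age=0";
}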
Keep in mind when testing that just because your server is now serving pages with caching turned off, that doesn't mean the browser will obey when it already has an old page it was told was okay to cache. Clear the cache, or use F5 to force the page to be reloaded.
Also, for those cases where the server itself is serving cached content, you can use Ctrl+F5 to signal the server not to serve it from cache.
You must be careful: if you are using AJAX via XMLHttpRequest (XHR), cache "recommendations" set in the header are not respected by IE6.
The fix is to append a random number to the URL queries used in AJAX requests. For example:
http://test.com?nonce=0123
A good generator for this is the UTC() function, which returns a unique timestamp based on the user's browser... that is, unless they mess with their system clock.
Have you tried setting an ETag in the header? They're a pretty reliable way to indicate that content has changed (see the W3C spec and Wikipedia).
Beyond that, a slightly cruder way is to append a random query-string parameter to the request, such as the current Unix timestamp. As I said, crude, but then IE6 is not the most subtle of beasts.
See this question about making sure a webpage is not cached across all browsers: How to control web page caching, across all browsers? I think this should help out with your problem too.
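For reference, the belt-and-braces header set usually quoted from that question would look like this in nginx terms (whether you need all three lines depends on which legacy clients and proxies you care about):

add_header Cache-Control "no-cache, no-store, must-revalidate";  # HTTP/1.1 caches
add_header Pragma "no-cache";                                    # HTTP/1.0 caches
add_header Expires "0";                                          # legacy proxies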
Content with "Content-Encoding: gzip" Is Always Cached Although You Use "Cache-Control: no-cache"
http://support.microsoft.com/kb/321722
You could also disable gzip just for IE6.
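In nginx (an assumption, since the answer doesn't name a server) that is a single directive:

# Disable gzip for old Internet Explorer user agents; the built-in
# "msie6" mask matches MSIE 4-6 without the SV1 patch.
gzip_disable "msie6";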
A little note: from experience I know that IE6 will load JavaScript from cache even when forced to reload the page via Ctrl+F5. So if you are working on JavaScript, always empty the cache.
The IE web developer toolbar can help immensely with this; there's a button for clearing the cache.
