Cache-Control Immutable Header - caching

I was reading about the immutable header and I came across this article saying:
Cache-Control: max-age=365000000, immutable
When a client supporting immutable sees this attribute it should
assume that the resource, if unexpired, is unchanged on the server and
therefore should not send a conditional revalidation for it (e.g.
If-None-Match or If-Modified-Since) to check for updates. Correcting
possible corruption (e.g. shift reload in Firefox) never uses
conditional revalidation and still makes sense to do with immutable
objects if you're concerned they are corrupted.
source
I can't understand this phrase: "if unexpired, is unchanged on the server and therefore should not send a conditional revalidation".
The client, by default, doesn't send a revalidation request until max-age has expired.
So what's the point of defining immutable in the first place?

People pressing the refresh button.
Facebook, who first proposed this immutable Cache-Control directive, has a good post about how it saved them a huge number of requests, including this quote:
The problem with reloads
The browser’s reload button exists to allow the user to get an updated
version of the current page. In order to meet this goal, when you
reload, browsers revalidate the page that you are currently on, even
if that page hasn’t expired yet. However, they also go a step further
and revalidate all sub-resources on the page — things like images and
JavaScript files.
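To make that concrete: immutable only pays off for URLs whose content can never change, typically fingerprinted asset filenames, because then even a refresh no longer needs to revalidate them in immutable-aware browsers. A minimal Node/TypeScript sketch of serving such assets with the directive (the paths, hash, durations and port here are made up for illustration):

```ts
// Sketch: hashed asset URLs never change content, so they can carry a
// far-future max-age plus immutable, while the HTML still revalidates.
import http from "node:http";
import fs from "node:fs";
import path from "node:path";

const server = http.createServer((req, res) => {
  const url = req.url ?? "/";
  if (url.startsWith("/static/")) {
    // e.g. /static/app.3f2a9c.js -- the hash changes whenever the content does.
    res.setHeader("Cache-Control", "public, max-age=31536000, immutable");
    fs.createReadStream(path.join("public", url)).pipe(res);
  } else {
    res.setHeader("Content-Type", "text/html");
    res.setHeader("Cache-Control", "no-cache"); // HTML: always revalidate
    res.end("<html><!-- page referencing /static/app.3f2a9c.js --></html>");
  }
});

server.listen(8080);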

Related

Skip SSR in NextJS when I already have the data cached and not stale in the client

I'm using NextJS with TanStack Query (formerly ReactQuery). TanStack Query acts as a cache between my NextJS app and the data stored in the backend.
I was previously doing SSR only, but I'm complementing it with TanStack Query for optimistic updates. I previously needed to fetch data on getServerSideProps for every "detail" page, but now I'm thinking that I could skip some of those fetches since I already have the data in the cache and it's still fresh.
For example, let's say we have a TODO app. When I visit /todo/id_1 for the first time, it's nice to have SSR send the page already rendered to the client. If I go somewhere else and come back to /todo/id_1, I know for a fact that the contents of that TODO haven't changed, but I still have to go through SSR.
Would there be a way to skip SSR in that case?
I was hoping I could do something like the following:
<Link href={`/todo/${id}`} skipSsr={cachedTodo[id].notStale} />
NextJS will always call getServerSideProps if the URL changes. As mentioned in the comments, shallow routing does not work if the URL actually changes.
I think there are a couple of ways around this:
set a Cache-Control: max-age header on the response with the caching time you want. That way, the request will still be made, but it will be served from the browser cache instead of hitting your server (see the sketch after this list). As an advantage, those fetches will also succeed while you're offline.
instruct next to not make the query if the request comes from a client transition. There is an open discussion about this:
Add option to disable getServerSideProps on client-side navigation
shallow routing doesn't really solve it, so it seems you have to make the request and then check whether it comes from SSR or not. This comment has the best workaround, I guess.
use Incremental Static Regeneration. It basically makes your page static, but it revalidates after a certain time if a request comes in.
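For the first option in that list, a hedged sketch of what it could look like in a getServerSideProps handler; the route, API URL and durations are placeholders, not anything from the question:

```tsx
// pages/todo/[id].tsx -- sketch only: cache the SSR response in the browser/CDN
// so repeat navigations within max-age are served from cache instead of the server.
import type { GetServerSideProps } from "next";

export const getServerSideProps: GetServerSideProps = async ({ res, params }) => {
  // Fresh for 60s; after that, serve stale while revalidating in the background.
  res.setHeader(
    "Cache-Control",
    "public, s-maxage=60, max-age=60, stale-while-revalidate=300"
  );

  const todo = await fetch(`https://api.example.com/todo/${params?.id}`)
    .then((r) => r.json());

  return { props: { todo } };
};

export default function TodoPage({ todo }: { todo: unknown }) {
  return <pre>{JSON.stringify(todo, null, 2)}</pre>;
}
```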

Send an entire web app as 1 HTTP response (html, js, css, images, ...)

Traditionally a browser will parse HTML and then send further requests to the server for all related data. This seems inefficient to me, since it might require a large number of requests, even though my server already knows that a browser that wants to use this web application will need all of its resources.
I know that JS and CSS could be inlined, but that complicates server-side code, and image data as base64 bloats the size of the data... I'm also aware that rendering can start before all assets are downloaded, which would potentially no longer work (depending on the implementation). I still feel that streaming an entire application in one go should be faster on slow connections than making tens of requests separately.
Ideally I would like the server to stream an entire directory into one HTTP response.
Does any model for this exist?
Does the reasoning make sense?
ps: If browser support for this is completely lacking, I'm wondering about a 2 step approach. Download a small JavaScript which downloads a compressed web app file, extracts it and plugs the resources into the page. Is anyone already doing something like this?
Update
I found one: http://blog.another-d-mention.ro/programming/read-load-files-from-zip-in-javascript/
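For the two-step approach in the ps, a rough sketch of how it might look, assuming the JSZip library and a hypothetical /bundle.zip containing app.css and app.js (none of these names come from the question):

```ts
// Fetch one compressed bundle, unpack it client-side,
// and plug the resources into the page.
import JSZip from "jszip";

async function loadBundle(): Promise<void> {
  const zipBytes = await fetch("/bundle.zip").then((r) => r.arrayBuffer());
  const zip = await JSZip.loadAsync(zipBytes);

  // Inline the stylesheet directly.
  const cssText = await zip.file("app.css")!.async("string");
  const style = document.createElement("style");
  style.textContent = cssText;
  document.head.appendChild(style);

  // Load the script via a blob URL.
  const jsBlob = await zip.file("app.js")!.async("blob");
  const script = document.createElement("script");
  script.src = URL.createObjectURL(jsBlob);
  document.head.appendChild(script);
}

loadBundle();
```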
I started to research related issues in order to find the way to get the best results with what seems possible without changing web standards, and I wondered about caching. If I could send the last-modified date of every subresource of a page along with the initial HTML page, a browser could avoid sending If-Modified-Since requests once it has loaded every resource at least once. This would in effect be better than sending all resources with the initial request, since that would be beneficial only on the first load and detrimental on subsequent loads, where it would be better for browsers to use their cache (as Barmar pointed out).
Now it turns out that even with a web extension you cannot get hold of the If-Modified-Since header, so you certainly can't tell the browser to use the cached version instead of contacting the server.
I then found this post from Facebook on how they tried to reduce traffic by hashing their static files and giving them a 1-year expiry date. This means that the URL guarantees the content of the file. They still saw plenty of unnecessary If-Modified-Since requests, and they managed to convince Firefox and Chrome to change the behaviour of their reload buttons to no longer reload static resources. For Firefox this requires the new Cache-Control: immutable header; for Chrome it doesn't.
I then remembered that I had seen something like that before, and it turns out there is a solution to this problem which is more convenient than hashing the contents of resources and serving them from a database for at least ten years: just put a new version number in the filename. An even more convenient solution would be to just add a version query string, but it turns out that doesn't always work.
Admittedly, changing your filenames all the time is a nuisance, because files referencing these files also need to change. However, the files don't actually need to change. If you control the server, it might be as simple as writing a redirect rule to make sure that logo.vXXXX.png is redirected to logo.png (where XXXX is the last-modified timestamp in seconds since the epoch)[1]. Now let your template system automatically generate the timestamp, like WordPress' wp_enqueue_script does. WordPress actually makes do with the query-string technique. Now you can set the expiration date far in the future and use the immutable cache header. If browsers respect the cache control, you can safely ignore ETags and If-Modified-Since headers, since they are now completely redundant.
This solution guarantees the browser shall never ask for cache validation and yet you shall never see a stale resource, without having to decide on the expiry date in advance.
It doesn't answer the original question here about how to avoid having to do multiple requests to fetch the resources on the same page on a clean cache, but ever after (as long as the browser cache doesn't get cleared), you're good! I suppose that's good enough for me.
[1] You can even avoid the server overhead of checking the timestamp on every resource every time a page references it by using the version number of your application. In debug mode, for development, one can use the timestamp to avoid having to bump the version on every modification of the file.
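To illustrate the technique above, here is a hedged Node/TypeScript sketch rather than an actual redirect rule: one helper derives the .vXXXX token from the file's mtime for the templates, and the handler strips it again and serves the real file with a far-future immutable header. All paths and names are invented:

```ts
import http from "node:http";
import fs from "node:fs";
import path from "node:path";

// Template side: turn "logo.png" into "logo.v1700000000.png" using the mtime,
// so the URL changes exactly when the file does (like wp_enqueue_script's version).
function versionedName(file: string): string {
  const mtime = Math.floor(fs.statSync(path.join("assets", file)).mtimeMs / 1000);
  return file.replace(/(\.\w+)$/, `.v${mtime}$1`);
}
// e.g. "/assets/" + versionedName("logo.png") -> "/assets/logo.v1700000000.png"

// Server side: strip the version token again and serve the real file as immutable.
const server = http.createServer((req, res) => {
  const match = (req.url ?? "").match(/^\/assets\/(.+)\.v\d+(\.\w+)$/);
  if (match) {
    res.setHeader("Cache-Control", "public, max-age=31536000, immutable");
    fs.createReadStream(path.join("assets", match[1] + match[2])).pipe(res);
    return;
  }
  res.statusCode = 404;
  res.end();
});

server.listen(8080);
```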

Firefox incorrectly caches AJAX calls in BFCache ignoring caching headers

We have a page that makes AJAX calls to get a JSON file. The JSON file has a 'max-age=60' header.
On Firefox, the JSON file is incorrectly cached by BFCache beyond the 60 seconds specified by caching header. What is worse is that force-reloading (Shift+F5) doesn't help, as the JSON file is not retrieved from the server anymore.
This is a bug in Firefox, opened one year ago and still unresolved: https://bugzilla.mozilla.org/show_bug.cgi?id=1055024
Answers to this question How to force Firefox to bypass BFCache for Angular.JS partials? mention some workarounds that involve clearing the full cache or installing extensions. We have millions of users and it's not practical for us to ask them all to go through these steps.
Also, setting an 'unload' event handler on the page seems to disable BFCache completely for the page. This is also not an optimum solution for us, because we would like the whole page to benefit from the BFCache speedup, and still see fresh JSON content honoring the max-age=60 header.
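(For context, the unload workaround referred to above is just registering a handler, even an empty one, which Firefox generally treats as making the page ineligible for the back/forward cache. A minimal sketch:)

```ts
// Even a no-op unload handler generally makes Firefox skip the
// back/forward cache for this page -- which is exactly the trade-off
// described above: fresh content, but no BFCache speedup.
window.addEventListener("unload", () => {
  /* intentionally empty */
});
```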
Does anyone know if there are specific caching headers that will hint BFCache NOT to keep this particular file and fetch it from the server once the max-age period has elapsed?

RFC2616 13.13, Browser History and Caching

I've been trying to get my head around the whole issue of browser history vs. caching and RFC 2616 13.13.
Does this section of the RFC mean that if a user goes "Back" in the browser, for example, it should always display the page from its local storage, ignoring any cache directives, unless the user has configured it otherwise?
So browsers that reload the page when navigating the history, even if caching directives instruct them to do so, are not complying with the specification? And the spec is saying this is bad because "this will tend to force service authors to avoid using HTTP expiration controls and cache controls when they would otherwise like to."
Also, even though a directive may instruct the browser not to cache, e.g. using Cache-Control: no-store, it can/should still store it in its history cache?
From what I've read, it seems that most browsers violate the standard, apart from Opera. Is this because the security concerns around the re-display of pages with sensitive data from history are seen as more important than the issue the standard talks about?
I'd be grateful if anyone is able to shed some light/clarification on this area, thanks.
History and cache are completely separate. We're trying to clarify this in httpbis; see https://svn.tools.ietf.org/svn/wg/httpbis/draft-ietf-httpbis/latest/p6-cache.html#history.lists

show "webpage has expired" on back button

What is the requirement for the browser to show the ubiquitous "this page has expired" message when the user hits the back button?
What are some user-friendly ways to prevent the user from using the back button in a webapp?
Well, by default, whenever you're dealing with a form POST and the user hits Back and then Refresh, they'll see a message indicating that the browser is resubmitting data. But if the page is set to expire immediately, they won't even have to hit Refresh; they'll see the "page has expired" message as soon as they hit Back.
To avoid both messages there are a couple things to try:
1) Use a form GET instead. It depends on what you're doing, but this isn't always a good solution, as there are still size restrictions on a GET request. And the information is passed along in the query string, which isn't the most secure of options.
-- or --
2) Perform a server-side redirect to a different page after the form POST.
Looks like a similar question was answered here:
Redirect with a 303 after POST to avoid "Webpage has expired": Will it work if there are more bytes than a GET request can handle?
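A hedged sketch of option 2, the Post/Redirect/Get pattern, as a plain Node/TypeScript handler (the routes and messages are invented for illustration):

```ts
import http from "node:http";

const server = http.createServer((req, res) => {
  if (req.method === "POST" && req.url === "/checkout") {
    // ... process the form body here ...
    res.statusCode = 303; // "See Other": the follow-up request is a GET
    res.setHeader("Location", "/receipt");
    res.end();
    return;
  }
  if (req.method === "GET" && req.url === "/receipt") {
    // Back/Refresh now land on this harmless GET instead of re-posting the form.
    res.end("Thanks, your order was received.");
    return;
  }
  res.statusCode = 404;
  res.end();
});

server.listen(8080);
```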
As a third option, one could prevent a user from going back in their browser at all. The only time I've felt a need to do this was to prevent them from doing something stupid such as paying twice, although there are better server-side methods to handle that. If your site uses sessions, then you can prevent them from paying twice by first disabling cache on the checkout page and setting it to expire immediately. And then you can utilize a flag of some sort stored in the session which will actually change the behavior of the page if you go back to it.
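And a sketch of that session-flag idea, assuming Express with express-session; the routes and the "orderPlaced" flag are made up:

```ts
import express from "express";
import session from "express-session";

// Declaration merging so TypeScript knows about our custom session flag.
declare module "express-session" {
  interface SessionData {
    orderPlaced?: boolean;
  }
}

const app = express();
app.use(session({ secret: "dev-only-secret", resave: false, saveUninitialized: true }));

app.get("/checkout", (req, res) => {
  // Make sure the checkout page itself is never cached and expires immediately.
  res.set("Cache-Control", "no-store, no-cache, must-revalidate");
  res.set("Expires", "0");

  if (req.session.orderPlaced) {
    // Going Back after paying lands here again, but the flag changes the behavior.
    return res.redirect("/receipt");
  }
  res.send("checkout form ...");
});

app.post("/pay", (req, res) => {
  req.session.orderPlaced = true; // remember that payment already happened
  res.redirect(303, "/receipt");
});

app.listen(8080);
```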
You need to set the Pragma and Cache-Control options in the HTTP headers:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9
However, from a usability point of view, this is a discouraged approach to the matter. I strongly encourage you to look for other options.
PS: as proposed by Steve, redirection via GET is the proper way (or check for page movement with JS).
Try using the following code in the Page_Load
Response.Cache.SetCacheability(HttpCacheability.Private)
Use one of the following before session_start():
session_cache_expire(60); // in minutes
ini_set('session.cache_limiter', 'private');
Note: the language is PHP.
I'm not sure if this is standard practice, but I typically solve this issue by not sending a Vary header for IE only. In Apache, you can put the following in httpd.conf:
BrowserMatch MSIE force-no-vary
According to the RFC:
The Vary field value indicates the set of request-header fields that fully determines, while the response is fresh, whether a cache is permitted to use the response to reply to a subsequent request without revalidation.
The practical effect is that when you go "back" to a POST, IE simply gets the page from the history cache. No request at all goes to the server side. I can see this clearly in HTTPWatch.
I would be interested to hear potential bad side-effects of this approach.
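A rough Node/TypeScript equivalent of that Apache directive, just to show the shape of the idea; the user-agent test and header value are illustrative, not from the answer:

```ts
import http from "node:http";

http
  .createServer((req, res) => {
    const isIE = /MSIE/.test(req.headers["user-agent"] ?? "");
    if (!isIE) {
      // Everyone else still gets the Vary header; IE is left without one so it
      // reuses the history-cached page on Back instead of revalidating.
      res.setHeader("Vary", "Accept-Encoding");
    }
    res.end("...");
  })
  .listen(8080);
```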
