Long term caching with webpack chunkhash and cache control - caching

I am using webpack to bundle all of my files. Inside webpack I use chunkhash, Md5Hash and manifest to produce unique chunkhash for each of my files that get download by browser. It would look like this.
styles.3840duiel348fdjh385idfki.css
bundle.488503289dfksdlkor93lfui.js
vendor.sdkkfuuewkf892377rfkjdle.js
image.dkkdiiue9984ujjkfld003kfpp.png
This means that browser can cache them and if the hash is not changed it will use the cached version.At the same time I can for example just change styles and only that hash will be changed so when I deploy the app browser will download only the new styles.
The problem is that on my server i use this:
Cache-Control: public max-age=31536000
This represents aggressive caching and the browser will use cached version for one year unless URL, file name is changed. When I for example change styles my hash is changed and browser should request the new styles from server. That is according to this article(pattern 1) and a few more that I found https://jakearchibald.com/2016/caching-best-practices/
My problem is that when I update something, example styles, hash for styles is changed and I deploy that. The browser will not request new styles unles I hit reload while on my page. It will server the cached files. How can I fix this?
I can use Cache-Control: no-cache but that is not a solution because than browser has to check with server every time if he can use cached version. That is 4 http requests that are not needed every time someone visits the page.

The way I solved this is by adding one more number (which is Date.now()) into my file names as below.
filename: [name].${Date.now().valueOf()}.[chunkhash].js
This works pretty reliably for foreseeable time. The only drawback I see with this method is that: With each release, this forces all the bundles to be refreshed.
However, There are cases when chunkhash is rendered to create problem is when the modules do not change but their order changes (and hence the module id). The module id and order is not part of chunkhash! Please refer: https://github.com/webpack/webpack/issues/1856
Other alternatives is to use named module (which I believe will have some performance impact) besides may probably involve naming my modules which I am not sure of.

The browser is only instructed to load your assets when your page loads... That is just the way it works. You can poll or use any kind of push to the browser to detect backend changes and then force your users to refresh their browser.
This is not a caching issue, not a browser issue... just the fact that as soon as you have info it might already be outdated. You'll have to live with it or create a workaround.
See it as a calendar next to the coffee machine at your company which has an event 'party' this saturday. You see it and run to your coworkers to tell them the great news. Meanwhile the person who wrote the event on the calendar realized he has made a mistake and changes the event to next saturday. You don't have this new information so you will provide your coworkers with wrong information as long as you do not go for another coffee. The only way to know about the changes is if someone would notify you sooner than you go for another coffee... for example the person who wrote the event on the calendar sends an email to everyone to say sorry for his mistake and that the party is scheduled next saturday.

I did it like this. I use different webpack config for my client and server bundle. For my client bundle that has bundle.js, vendor.js, styles.css... I use chunkhash and Cache-Control: public max-age=31536000 as described in my question. For my server bundle that serves html(ejs) I use Cache-Control: no-cache. This works good because browser will contact server on every reload but that will be only one http request. If nothing is changed browser uses the cached version of all the assets since chunkhash in html didn't changed.
If for example I update styles and chunkhash is changed browser will see that when in contacts the server on reload and only new styles will be downloaded.

Related

Send an entire web app as 1 HTTP response (html, js, css, images, ...)

Traditionally a browser will parse HTML and then send further requests to the server for all related data. This seems like inefficient to me, since it might require a large number of requests, even though my server already knows that a browser that wants to use this web application will need all of it's resources.
I know that js and css could be inlined, but that complicates server side code and img data as base64 bloats the size of the data... I'm aware as well that rendering can start before all assets are downloaded, which would potentially no longer work (depending on the implementation). I still feel that streaming an entire application in one go should be faster on slow connections than making tens of requests separately.
Ideally I would like the server to stream an entire directory into one HTTP response.
Does any model for this exist?
Does the reasoning make sense?
ps: If browser support for this is completely lacking, I'm wondering about a 2 step approach. Download a small JavaScript which downloads a compressed web app file, extracts it and plugs the resources into the page. Is anyone already doing something like this?
Update
I found one: http://blog.another-d-mention.ro/programming/read-load-files-from-zip-in-javascript/
I started to research related issues in order to find the way to get best results with what seems possible without changing web standards, and I wondered about caching. If I could send the last modified date of every subresource of a page along with the initial HTML page, a browser could avoid asking if modified headers once it has loaded every resource at least once. This would in effect be better than to send all resources with the initial request, since that would be beneficial only on the first load, and detrimental on subsequent loads, since it would be better for browsers to use their cache (as Barmar pointed out).
Now it turns out that even with a web extension you can not get hold of the if-modified-since header and so you surely can't tell the browser to use the cached version instead of contacting the server.
I then found this post from Facebook on how they tried to reduce traffic by hashing their static files and giving them a 1 year expiry date. This would mean that the url garantuees the content of the file. They still saw plenty of unnecessary if-modified-since requests and they managed to convince Firefox and Chrome to change the behaviour of their reload buttons to no longer reload static resources. For Firefox this requires a new cache-control: immutable header, for Chrome it doesn't.
I then remembered that I had seen something like that before and it turns out there is a solution for this problem which is more convenient than hashing the contents of resources and serving them from a database for at least ten years. It is to just a new version number in the filename. The even more convenient solution would be to just add a version query string, but it turns out that that doesn't always work.
Admittedly, changing your filenames all the time is a nuisance, because files referencing these files also need to change. However the files don't actually need to change. If you control the server it might be as simple as writing a redirect rule to make sure that logo.vXXXX.png will be redirected to logo.png (where XXXX is the last modified timestamp in seconds since epoch)[1]. Now let your template system automatically generate the timestamp, like in wordpress' wp_enqueue_script. WordPress actually satisfies itself with the query string technique. Now you can set the expiration date to a far future and use the immutable cache header. If browsers respect the cache control, you can now safely ignore etags and if-modified-since headers, since they are now completely redundant.
This solution guarantees the browser shall never ask for cache validation and yet you shall never see a stale resource, without having to decide on the expiry date in advance.
It doesn't answer the original question here about how to avoid having to do multiple requests to fetch the resources on the same page on a clean cache, but ever after (as long as the browser cache doesn't get cleared), you're good! I suppose that's good enough for me.
[1] You can even avoid the server overhead of checking the timestamp on every resource every time a page references it by using the version number of your application. In debug mode, for development, one can use the timestamp to avoid having to bump the version on every modification of the file.

Website is different when I upload on FTP

I have just started developing for a few weeks now and I bought a domain, but when I upload the files on live, the website looks different than what I have uploaded. Now, this gets fixed when I clear my cache. The problem is that my visitors enter, they see the page in a way, and after I update it they see it as the previous version!
Is there any possible solution for this? I don't want my visitors to clear cache every time I make a change on my website!
This is quite probable to be due to css cache. Your server is loading a cached version. You can specify the cached time in a few ways. Etags and htaccess (on apache) are the most common.
A very simple trick is just to add at the end of your style link url (where you load your main style in the head of the document) a get-like parameter: just like this:
main.css?v=2

How to cache bust in Dart

I'm developing a web app in Dart, packaged in tomcat 6 as a deployable .war. This app is used by a bunch of clients, all with Google Chrome.
Every time I republish a new version, every single client must clear his browser cache before seeing the updated files: this is very annoying and I can't find any solution other than broadcast a mail to everyone "Please clear the browser cache".
The desirable solution is not a complete cache disable but that the browser keeps caching all stuff to be the quicker it could, and that I can control this at my wish.
I'm not sure what your question is about exactly.
There is nothing specific to Dart. Caching is handled by the browser depending on the expires headers the server returns with a response to a request.
What you can do is something like explained here Force browser to clear cache or Forcing cache expiration from a JavaScript file, and make the client application poll the server frequently for updates and then redirect to the new URL. You could implement some kind of redirection on the server or ignore the version URL query parameter, to be able to actually keep the same names of the resources.
Another possibility could be to use AppCache and serve the manifest file with immediate expiration. When you have an updated version modify the manifest file which makes the client reload the resources listed in the manifest (https://stackoverflow.com/a/13107058/217408, https://developer.mozilla.org/en-US/docs/Web/HTML/Using_the_application_cache, http://alistapart.com/article/application-cache-is-a-douchebag#section4).

How do I set caching headers for my CSS/JS but ensure visitors always have the latest versions?

I'd like to speed up my site's loading time in part by ensuring all CSS/JS is being cached by the browser, as recommend by Google's PageSpeed tool. But I'd like to ensure that visitors have the latest CSS/JS files, if they are updated and the cache now contains old code.
From my research so far, appending something like "?459454" to the end of the CSS/JS url is popular. But wouldn't that force the visitor's browser to re-download the CSS/JS file every time?
Is there a way to set the files to be cached by the browser, but ensure the browser knows about updated versions of the cached files?
If you're using Apache, you can use mod_pagespeed (mentioned earlier by symcbean) to do this automatically.
It would work best if you also use the ModPagespeedLoadFromFile directive since that will create a new URL as soon as it detects that the resource has changed on disk, however it will work fine without that (it will use the cache expiry time returned when it fetches the resource to rewrite it).
If you're using nginx, you could use ngx_pagespeed.
If you're using IIS, you could use IISpeed, which is not a Google product and I don't know it's full feature set.
Version numbers will work, but you can also append a hash of the file to the filename with your web framework or asset build script:
<script src="script-5054a101c8b164cbfa570d97fe23cc0d.js"></script>
That way, once your HTML changes to reflect this new version, browsers will just download and cache the updated version of your script.
As you say, append a query string to the URL of the asset, but only change it if the content is different, or change it when you deploy a new version.
appending something like "?459454" to the end of the CSS/JS url is popular. But wouldn't that force the visitor's browser to re-download the CSS/JS file every time?
No it won't force them to download each time, however there are a lot of intermediate proxies out there which ignore query strings on cacheable content - hence many tools (including mod_pagespeed which does automatic url rewriting based on file conents, and content merging on the fly along with lots of other cool tricks) move the version information into the path / filename.
If you've only got .htaccess type access then you can strip the version information out to map direct to a file, or use a scripted 404 redirector (but this is probably only a good idea if you're behind a caching reverse proxy).

client-side file caching

If I understand correctly, a broswer caches images, JS files, etc. based on the file name. So there's a danger that if one such file is updated (on the server), the browser will use the cached copy instead.
A workaround for this problem is to rename all files (as part of the build), such that the file name includes an MD5 hash of it's contents, e.g.
foo.js -> foo_AS577688BC87654.js
me.png -> me_32126A88BC3456BB.png
However, in addition to renaming the files themselves, all references to these files must be changed. For exmaple a tag such as <img src="me.png"/> should be changed to <img src="me_32126A88BC3456BB.png"/>.
Obviously this can get pretty complicated, particularly when you consider that references to these files may be dynamically created within server-side code.
Of course, one solution is to completely disable caching on the browser (and any caches between the server and the browser) using HTTP headers. However, having no caching will create it's own set of problems.
Is there a better solution?
Thanks,
Don
The best solution seems to be to version filenames by appending the last-modified time.
You can do it this way: add a rewrite rule to your Apache configuration, like so:
RewriteRule ^(.+)\.(.+)\.(js|css|jpg|png|gif)$ $1.$3
This will redirect any "versioned" URL to the "normal" one. The idea is to keep your filenames the same, but to benefit from cache. The solution to append a parameter to the URL will not be optimal with some proxies that don't cache URLs with parameters.
Then, instead of writing:
<img src="image.png" />
Just call a PHP function:
<img src="<?php versionFile('image.png'); ?>" />
With versionFile() looking like this:
function versionFile($file){
$path = pathinfo($file);
$ver = '.'.filemtime($_SERVER['DOCUMENT_ROOT'].$file).'.';
echo $path['dirname'].'/'.str_replace('.', $ver, $path['basename']);
}
And that's it! The browser will ask for image.123456789.png, Apache will redirect this to image.png, so you will benefit from cache in all cases and won't have any out-of-date issue, while not having to bother with filename versioning.
You can see a detailed explanation of this technique here: http://particletree.com/notebook/automatically-version-your-css-and-javascript-files/
Why not just add a querystring "version" number and update the version each time?
foo.js -> foo.js?version=5
There still is a bit of work during the build to update the version numbers but filenames don't need to change.
Renaming your resources is the way to go, although we use a build number and embed that in to the file name instead of an MD5 hash
foo.js -> foo.123.js
as it means that all your resources can be renamed in a deterministic fashion and resolved at runtime.
We then use custom controls to generate links to resources at on page load based upon the build number which is stored in an app setting.
We followed a similar pattern to PJP, using Rails and Nginx.
We wanted user avatar images to be browser cached, but on an avatar's change we needed the cache to be invalidated ASAP.
We added a method to the avatar model to append a timestamp to the file name:
return "/images/#{sourcedir}/#{user.login}-#{self.updated_at.to_s(:flat_string)}.png"
In all places in the code where avatars were used, we referenced this method rather than an URL. In the Nginx configuration, we added this rewrite:
rewrite "^/images/avatars/(.+)-[\d]{12}.png" /images/avatars/$1.png;
rewrite "^/images/small-avatars/(.+)-[\d]{12}.png" /images/small-avatars/$1.png;
This meant if a file changed, its URL in the HTML changed, so the user's browser made a new request for the file. When the request reached Nginx, it got rewritten to the simple name of the file.
I would suggest using caching by ETags in this situation, see http://en.wikipedia.org/wiki/HTTP_ETag. You can then use the hash as the etag. A request will still be submitted for each resource, but the browser will only download items that have changed since last download.
Read up on your web server / platform docs on how to use etags properly, most decent platforms have built-in support.
Most modern browsers check the if-modified-since header whenever a cacheable resource is in a HTTP request. However, not all browsers support the if-modified-since header.
There are three ways to "force" the browser to load a cached resource.
Option 1 Create a query string with a version#. src="script.js?ver=21". The downside is many proxy servers wont cache a resource with query strings. It also requires site-wide updating for changes.
Option 2 Create a naming system for your files src="script083010.js". However the downside to option 1 is that this as well requires site-wide updates whenever a file changes.
Option 3 Perhaps the most elegant solution, simply set up the caching headers: last-modified and expires in your server. The main downside to this is users may have to recache resources because they expired yet never changed. Additionally, the last-modified header does not work well when content is being served from multiple servers.
Here a few resources to check out: Yahoo Google AskApache.com
This is really only an issue if your web server sets a far-future "Expires" header (setting something like ExpiresDefault "access plus 10 years" in your Apache config). Otherwise, a browser will make a conditional GET, based on the modified time and/or the Etag. You can verify what is happening on your site by using a web proxy or an extension like Firebug (on the Net panel). Your question doesn't mention how your web server is configured, and what headers it is sending with static files.
If you're not setting a far-future Expires header, there's nothing special you need to do. Your web server will usually handle conditional GETs for static files based on last modified time just fine. If you are setting a far-future Expires header then yes, you need to add some sort of version to the file name like your question and the other answers have mentioned already.
I have also been thinking about this for a site I support where it would be a big job to change all references. I have two ideas:
1.
Set distant cache expiry headers and apply the changes you suggest for the most commonly downloaded files. For other files set the headers so they expire after a very short time - eg. 10 minutes. Then if you have a 10 minute downtime when updating the application, caches will be refreshed by the time users go to the site. General site navigation should be improved as the files will only need downloading every 10 minutes not every click.
2.
Each time a new version of the application is deployed to a different context that contains the version number. eg. www.site.com/app_2_6_0/ I'm not really sure about this as users bookmarks would be broken on each update.
I believe that a combination of solutions works best:
Setting cache expiry dates for each type of resource (image, page, etc) appropreatly for that resource, for example:
Your static "About", "Contact" etc pages probably arn't going to change more than a few time a year, so you could easily put a cache time of a month on these pages.
Images used in these pages could have eternal cache times, as you are more likey to replace an image then to change one.
Avatar images might have an expiry time of a day.
Some resources need modified dates in their names. For example avatars, generated images, and the like.
Some things should never be caches, new pages, user content etc. In these cases you should cache on the server, but never on the client side.
In the end you need to carfully consider each type of resource to determine what cache time to instruct the browser to use, and always be conservitive if you are unsure. You can increase the time later, but it's much more pain to uncache something.
You might want to check out the approach taken by the grails "uiperformance" plugin, which you can find here. It does a lot of the things you mention, but automates them (set expiry time to a long time, then increments version numbers when files change).
So if you're using grails, you get this stuff for free. If you are not - maybe you can borrow the techniques employed.
Also - borrowed form the ui-performance page, - read the following 14 rules.
ETags seemingly provide a solution for this...
As per http://httpd.apache.org/docs/2.0/mod/core.html#fileetag, we can set the browser to generate ETags on file-size (instead of time/inode/etc). This generation should be constant across multiple server deployments.
Just enable it in (/etc/apache2/apache2.conf)
FileETag Size
& you should be good!
That way, you can simply reference your images as <img src='/path/to/foo.png' /> and still use all the goodness of HTTP caching.

Resources