Browser caching of static resources (js, css): is it a real problem?

I wonder whether browser caching of static resources (in reasonably fresh browsers: IE8, FF 3.6) is a real problem for web application development (when a fresh version of the web app goes live from time to time and development continues).
Assume that the server serves static content correctly in terms of headers (Last-Modified, ETag, etc.) and response codes (304 when not modified, 200 with a body when the content has changed).
Can there be situations where the served HTML is fresh while static resources are still taken from the browser cache?

Unless you provide an Expires header, the browser should check for a new version each time the content is loaded, so it shouldn't be a problem (assuming the server gives the correct response).
But to be absolutely sure, you can give each version of your javascript/css a different filename, and change the filename in the HTML when you update. Then when the browser loads the HTML it will have to load the correct resources.
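For illustration, here is a minimal sketch of that "correct response" behaviour in Node/TypeScript (the static directory, port, and lack of path sanitization are assumptions of the sketch, not a production setup): the server compares If-Modified-Since against the file's modification time and answers 304 for an unchanged file, or 200 with a fresh Last-Modified otherwise.

    import * as http from "http";
    import * as fs from "fs";
    import * as path from "path";

    // Directory of static files; an assumption for this sketch.
    const root = path.join(__dirname, "static");

    http.createServer((req, res) => {
      // Note: no path sanitization here; this is only a sketch.
      const filePath = path.join(root, req.url ?? "/");

      fs.stat(filePath, (err, stats) => {
        if (err) {
          res.writeHead(404);
          res.end();
          return;
        }

        const lastModified = stats.mtime.toUTCString();
        const ifModifiedSince = req.headers["if-modified-since"];

        // Unchanged since the browser's cached copy: empty 304 response.
        if (ifModifiedSince && new Date(ifModifiedSince) >= new Date(lastModified)) {
          res.writeHead(304);
          res.end();
          return;
        }

        // Changed (or first request): full 200 response with a fresh Last-Modified.
        res.writeHead(200, { "Last-Modified": lastModified });
        fs.createReadStream(filePath).pipe(res);
      });
    }).listen(8080);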

Related

Send an entire web app as 1 HTTP response (html, js, css, images, ...)

Traditionally a browser will parse HTML and then send further requests to the server for all related data. This seems inefficient to me, since it might require a large number of requests, even though my server already knows that a browser that wants to use this web application will need all of its resources.
I know that js and css could be inlined, but that complicates server-side code, and image data as base64 bloats the size of the data... I'm also aware that rendering can start before all assets are downloaded, which would potentially no longer work (depending on the implementation). I still feel that streaming an entire application in one go should be faster on slow connections than making tens of separate requests.
Ideally I would like the server to stream an entire directory into one HTTP response.
Does any model for this exist?
Does the reasoning make sense?
ps: If browser support for this is completely lacking, I'm wondering about a 2 step approach. Download a small JavaScript which downloads a compressed web app file, extracts it and plugs the resources into the page. Is anyone already doing something like this?
Update
I found one: http://blog.another-d-mention.ro/programming/read-load-files-from-zip-in-javascript/
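As a rough sketch of the two-step idea from the ps above (assuming the JSZip library; the bundle URL and the file names inside the archive are invented for illustration), the client fetches one compressed archive, unpacks it, and plugs the extracted resources into the page:

    import JSZip from "jszip";

    async function loadBundledApp(): Promise<void> {
      // One request for the whole app bundle (URL is made up for this sketch).
      const response = await fetch("/app-bundle.zip");
      const zip = await JSZip.loadAsync(await response.arrayBuffer());

      // Pull the stylesheet out of the archive and attach it via a Blob URL.
      const cssText = await zip.file("styles/app.css")!.async("string");
      const link = document.createElement("link");
      link.rel = "stylesheet";
      link.href = URL.createObjectURL(new Blob([cssText], { type: "text/css" }));
      document.head.appendChild(link);

      // Same for the script.
      const jsText = await zip.file("scripts/app.js")!.async("string");
      const script = document.createElement("script");
      script.src = URL.createObjectURL(new Blob([jsText], { type: "text/javascript" }));
      document.body.appendChild(script);
    }

    loadBundledApp().catch(console.error);

One practical wrinkle: resources injected via blob: URLs lose their original locations, so relative references inside the extracted CSS would need rewriting.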
I started to research related issues in order to find the way to get the best results with what seems possible without changing web standards, and I wondered about caching. If I could send the last-modified date of every subresource of a page along with the initial HTML page, a browser could avoid sending If-Modified-Since requests once it has loaded every resource at least once. This would in effect be better than sending all resources with the initial request, since that would only be beneficial on the first load and detrimental on subsequent loads, where it is better for browsers to use their cache (as Barmar pointed out).
Now it turns out that even with a web extension you cannot get hold of the If-Modified-Since header, so you certainly can't tell the browser to use the cached version instead of contacting the server.
I then found this post from Facebook on how they tried to reduce traffic by hashing their static files and giving them a 1-year expiry date. This means that the URL guarantees the content of the file. They still saw plenty of unnecessary If-Modified-Since requests, and they managed to convince Firefox and Chrome to change the behaviour of their reload buttons to no longer reload static resources. For Firefox this requires the new Cache-Control: immutable header; for Chrome it doesn't.
I then remembered that I had seen something like that before, and it turns out there is a solution to this problem that is more convenient than hashing the contents of resources and serving them from a database for at least ten years: just put a new version number in the filename. The even more convenient solution would be to just add a version query string, but it turns out that that doesn't always work.
Admittedly, changing your filenames all the time is a nuisance, because files referencing these files also need to change. However, the files don't actually need to change. If you control the server, it might be as simple as writing a redirect rule to make sure that logo.vXXXX.png is redirected to logo.png (where XXXX is the last-modified timestamp in seconds since the epoch)[1]. Now let your template system generate the timestamp automatically, as WordPress's wp_enqueue_script does (WordPress actually contents itself with the query-string technique). Now you can set the expiration date to the far future and use the immutable cache header. If browsers respect the cache control, you can safely ignore ETags and If-Modified-Since headers, since they are now completely redundant.
This solution guarantees the browser shall never ask for cache validation and yet you shall never see a stale resource, without having to decide on the expiry date in advance.
It doesn't answer the original question here about how to avoid having to do multiple requests to fetch the resources on the same page on a clean cache, but ever after (as long as the browser cache doesn't get cleared), you're good! I suppose that's good enough for me.
[1] You can even avoid the server overhead of checking the timestamp on every resource every time a page references it by using the version number of your application. In debug mode, for development, one can use the timestamp to avoid having to bump the version on every modification of the file.
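A minimal sketch of that idea in Node/TypeScript (the .vXXXX convention follows the answer above; the static directory, port, and the versionedUrl helper name are assumptions): the server strips the version token and serves the underlying file with a far-future, immutable cache policy, while a small helper generates the versioned URL from the file's modification time for the template layer to emit.

    import * as http from "http";
    import * as fs from "fs";
    import * as path from "path";

    const root = path.join(__dirname, "static"); // assumed static directory
    const ONE_YEAR = 31536000;                   // seconds, i.e. a "far future" expiry

    // Template helper: "/images/logo.png" -> "/images/logo.v<mtime>.png".
    // e.g. emit <img src="${versionedUrl("/images/logo.png")}"> from your templates.
    function versionedUrl(urlPath: string): string {
      const mtime = fs.statSync(path.join(root, urlPath)).mtime;
      const stamp = Math.floor(mtime.getTime() / 1000);
      return urlPath.replace(/(\.[a-z0-9]+)$/i, `.v${stamp}$1`);
    }

    http.createServer((req, res) => {
      // Internal rewrite: "/images/logo.v1718000000.png" -> "/images/logo.png".
      const cleanPath = (req.url ?? "/").replace(/\.v\d+(\.[a-z0-9]+)$/i, "$1");

      fs.readFile(path.join(root, cleanPath), (err, data) => {
        if (err) {
          res.writeHead(404);
          res.end();
          return;
        }
        // The URL changes whenever the file changes, so the content behind a
        // given URL never changes and can safely be marked immutable.
        res.writeHead(200, {
          "Cache-Control": `public, max-age=${ONE_YEAR}, immutable`,
        });
        res.end(data);
      });
    }).listen(8080);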

How do scripts and modules get cached?

I understand that for assets like images there is a src associated with them, which means that a browser will check the expiration date of the asset before making a new request to the src to download it again and render it onto the page. How does this work with a script or module such as React? If it comes from a CDN, does the browser first download the script and then run it the very first time it encounters it? And then every time it needs this script again, does it just load it from its cache (instead of downloading it again from the source) and run it? Is this the same thing that happens if you have React as a node module?
This is a very large topic, but the basic answer is that browsers will cache assets how you tell them to. You mention that images have expiration dates; these dates are set in HTTP headers sent by the server. You can set the same headers for JavaScript and any other files you request from a server, and the browser will cache them the same way.
After a javascript asset is fetched (from the server or the cache), the browser parses and runs your javascript.
Node modules live in node land. Usually, before you can use code from node_modules in the browser, you run it through a tool like webpack or Browserify. These tools bundle ALL the code (your application + React + whatever else) into one file (usually), which is then served to the browser. The browser doesn't know anything about node_modules. It just parses and runs the JavaScript you provided.
The one bundled file is cached based on the headers it was sent with. A CDN is (basically) just a special server optimized for serving assets quickly.
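For the bundling case, a common way to tie the caching to the bundle is to put a content hash in the output filename, so the URL changes whenever the bundle changes. A sketch assuming webpack 5 (the entry point and output path are invented for illustration):

    // webpack.config.ts
    import * as path from "path";
    import type { Configuration } from "webpack";

    const config: Configuration = {
      entry: "./src/index.ts",
      output: {
        path: path.resolve(__dirname, "dist"),
        // "[contenthash]" changes only when the emitted file's bytes change.
        filename: "app.[contenthash].js",
      },
      mode: "production",
    };

    export default config;

The HTML (or an asset manifest) then references the hashed filename, and everything under the output directory can be served with a very long max-age, since a given URL never changes content.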

How firefox fetches correct data from Browser Cache

Once we open a link in a new tab in Firefox, the data corresponding to that web page (static or dynamic) gets stored in the browser cache. Then, when we switch to that tab again, it extracts the data of that page from the cache (rather than requesting it from the site's server) and paints it to the frame buffer of the screen.
I want to know how Firefox fetches this data in the correct sequence.
What kind of mapping does Firefox use to extract the page data from its cache?
Firefox (like any other browser) uses heuristics to decide when and what to cache. This applies when no caching information is included in the resources: even when no caching information is provided, Firefox might still decide to cache the files for a certain period of time.
If you want to prevent Firefox from caching your resources altogether, you must include the following response header on your resources:
Cache-Control: no-cache, no-store
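For illustration, a small middleware sketch (assuming the Express library; the route and port are invented) that attaches this header to every response, so neither explicit nor heuristic caching applies:

    import express from "express";

    const app = express();

    // Attach the no-cache header from the answer above to every response.
    app.use((req, res, next) => {
      res.set("Cache-Control", "no-cache, no-store");
      next();
    });

    app.get("/", (req, res) => {
      res.send("<p>never served from the browser cache</p>");
    });

    app.listen(3000);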
As for the exact algorithm Firefox uses to fetch from its cache, I don't think it is public. Maybe somebody from Mozilla can answer this.

Force Browser Caching Across Browser Sessions

I help maintain several Wordpress-based websites that publish news and reference information.
We have been working hard to make pages at the websites load as fast as possible.
One of the things we've done is implement very long "max-age" times in the "cache-control" http headers for most of our static files, such as images and css files.
The particular cache-control setting we're using is "public, max-age=31536000". 31,536,000 seconds is 365 days.
The upside is that this setting does, in fact, cause the static files to be cached as visitors browse through different pages of our sites.
But here's the rub. This cache-control setting doesn't do much for us across browser sessions. Even though the setting is supposed to tell the browser "cache this file for an entire year", if a visitor to our site shuts down their browser, then starts it up just five minutes later and comes back to our site, the browser insists on re-loading all the static files, even though it still has them in its cache.
I've checked this carefully in Firefox, viewing the headers with Live HTTP Headers. But I can also qualitatively see the same thing happening in other browsers.
Apparently, browsers insist on re-loading all content for a website if the content hasn't been loaded once during the current browser session.
So ... Is there any way we can "politely suggest" to browsers that they always load cached content from the cache, even if the browser hasn't been to our site during the current browser session?
Check the ETag, Expires, and Last-Modified headers as well.
You need an Expires header, and sometimes ETag and Last-Modified can defeat caching.
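One way to act on that advice is to send Expires alongside a long max-age and deliberately omit the validators, so the browser has nothing to revalidate against. A small sketch of the header set (TypeScript; the one-year figure matches the max-age from the question, everything else is illustrative):

    const ONE_YEAR_SECONDS = 31536000;

    // Headers for a long-lived static response: Cache-Control plus Expires,
    // and deliberately no ETag / Last-Modified, so there is nothing for the
    // browser to revalidate against across sessions.
    function longLivedCacheHeaders(now: Date = new Date()): Record<string, string> {
      return {
        "Cache-Control": `public, max-age=${ONE_YEAR_SECONDS}`,
        // Expires mirrors max-age for older caches that ignore Cache-Control.
        "Expires": new Date(now.getTime() + ONE_YEAR_SECONDS * 1000).toUTCString(),
      };
    }

    console.log(longLivedCacheHeaders());

In practice these headers are usually configured in the web server (Apache, nginx) rather than built in application code, but the resulting header set is the same.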

Lazy HTTP caching

I have a website which is displayed to visitors via a kiosk. People can interact with it. However, since the website is not locally hosted, and uses an internet connection - the page loads are slow.
I would like to implement some kind of lazy caching mechanism such that as and when people browse the pages - the pages and the resources referenced by the pages get cached, so that subsequent loads of the same page are instant.
I considered using HTML5 offline caching - but it requires me to specify all the resources in the manifest file, and this is not feasible for me, as the website is pretty large.
Is there any other way to implement this? Perhaps using HTTP caching headers? I would also need some way to invalidate the cache at some point to "push" the new changes to the browser...
The usual approach to handling problems like this is with HTTP caching headers, combined with smart construction of URLs for resources referenced by your pages.
The general idea is this: every resource loaded by your page (images, scripts, CSS files, etc.) should have a unique, versioned URL. For example, instead of loading /images/button.png, you'd load /images/button_v123.png and when you change that file its URL changes to /images/button_v124.png. Typically this is handled by URL rewriting over static file URLs, so that, for example, the web server knows that /images/button_v124.png should really load the /images/button.png file from the web server's file system. Creating the version numbers can be done by appending a build number, using a CRC of file contents, or many other ways.
Then you need to make sure that, wherever URLs are constructed in the parent page, they refer to the versioned URL. This obviously requires dynamic code used to construct all URLs, which can be accomplished either by adjusting the code used to generate your pages or by server-wide plugins which affect all text/html requests.
Then you set the Expires header for all resource requests (images, scripts, CSS files, etc.) to a date far in the future (e.g. 10 years from now). This effectively caches them forever. It means that all resources loaded by each of your pages will always be fetched from cache; cache invalidation never happens, which is OK because when the underlying resource changes, the parent page will use a new URL to find it.
Finally, you need to figure out how you want to cache your "parent" pages. How you do this is a judgement call. You can use ETag/If-None-Match HTTP headers to check for a new version of the page every time, which will very quickly load the page from cache if the server reports that it hasn't changed. Or you can use Expires (and/or Max-Age) to reload the parent page from cache for a given period of time before checking the server.
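For the ETag/If-None-Match option, a minimal sketch in Node/TypeScript (the hash-of-the-rendered-body ETag, the placeholder page, and the port are assumptions; any stable fingerprint of the page would do):

    import * as http from "http";
    import { createHash } from "crypto";

    function renderPage(): string {
      return "<html><body>kiosk page</body></html>"; // placeholder page body
    }

    http.createServer((req, res) => {
      const body = renderPage();
      const etag = `"${createHash("sha1").update(body).digest("hex")}"`;

      if (req.headers["if-none-match"] === etag) {
        // Browser's cached copy is still current: answer without a body.
        res.writeHead(304, { ETag: etag });
        res.end();
        return;
      }

      res.writeHead(200, { "Content-Type": "text/html", ETag: etag });
      res.end(body);
    }).listen(8080);

A matching If-None-Match then costs one quick round trip and an empty 304 instead of re-sending the whole page.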
If you want to do something even more sophisticated, you can always put a custom proxy server on the kiosk; in that case you'd have total, centralized control over how caching is done.
