How do I stop Opera from caching a page?

I am trying to get Opera to re-request a page every time instead of just serving it from the cache. I'm sending the 'Cache-control: no-cache' and 'Pragma: no-cache' response headers but it seems as if Opera is just ignoring these headers. It works fine in other browsers - Chrome, IE, Firefox.
How do I stop Opera from caching pages? What I want to be able to do is have Opera re-request a page when the user clicks the Back button on the browser.

As a user, I absolutely detest pages that slow down my history navigation by forcing re-loads when I use the back button. (If the browser you use on a daily basis paid attention to the various caching directives and let them affect history navigation the way you want as a developer, you'd probably notice some sites slowing down yourself...)
If you have a very strong use case for doing this, I'd say your architecture might be "wrong" in some sense - for example, if you're switching between different "views" of constantly updating data and thus want to enforce a re-load when users go back, perhaps using Ajaxy techniques to load the constantly changing data into the current page would be better.
Opera's behaviour is intentional - "caching" is seen as conceptually different from "history navigation": the former is more about storing things on disk and between sessions, the latter is switching back to a temporarily hidden page you just visited, in the state you left it.
However, if you really, really need it, there is a loophole in this policy that enables the behaviour you want. Sending "Cache-control: must-revalidate" will force Opera to re-load every page on navigation, but only if you're sending the page over https. (This is a feature requested by and intended for paranoid banks; it slows down way too many normal sites if applied on http.)
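A minimal sketch of sending that header from Node.js (the certificate paths and port are assumptions; any server-side stack can send the same header):
const https = require('https');
const fs = require('fs');

// Assumed local certificate files; Opera only honors must-revalidate over HTTPS.
const options = {
  key: fs.readFileSync('server-key.pem'),
  cert: fs.readFileSync('server-cert.pem'),
};

https.createServer(options, (req, res) => {
  res.setHeader('Cache-Control', 'must-revalidate'); // forces a re-load on history navigation in Opera
  res.end('<!doctype html><title>demo</title>');
}).listen(8443);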

It sounds like your problem is related to this answer. After testing your header and the suggested headers, I could only reproduce your expected behavior in Internet Explorer.

SIMPLE SERVERSIDE CACHE CONTROL WITHOUT HEADERS OR FRONTEND SCRIPTS
Zero Dependency, Universal Language Edition
You can force re-caching globally without using a header by appending an md5 or sha1 checksum to your filename.
That way it will cache if it is an exact match, and otherwise treat it like a new resource.
Works in all browsers
Validates as strict HTML5 (originally did not, but this has been updated. Untested for XHTML, but probably not valid for that)
Does not require extra headers
Keeps frontend concerns and backend concerns nicely decoupled.
Does not require client side sanity checks or source validation.
Anything that can print html can do this consistently, including static content
If not static, it is easy to extend runtime control to end users (with authentication, if desired), allowing simple page flags to determine whether minified, prettified, or debug source is returned.
Entirely encapsulates client cache control in the content serving mechanism, which makes things super simple to maintain.
As a side perk, introduces versioned client-side caching automatically by deferring to the checksums the browser has cached, which can be useful if you have alternate versions and need to unit test a release package to determine its minimum stable dependency versions or something.
You never have to fiddle with your browser to keep caching from interfering with your development process again.
This approach also can be used for versioned images, video, audio, pdfs, etc. Pretty much any resource that is served as static data will operate similarly, cache on the first request for the content, and persist automatically without further consideration if the file does not change.
This is RFC-valid markup. Notice the script and link tags have a GET string:
?checksum=ba411cafee2f0f702572369da0b765e2
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Client Cache Control Example</title>
  <meta name="description" content="You're only going to cache this when the content changes, and always when the content changes.">
  <meta name="author" content="https://stackoverflow.com/users/1288121/mopsyd">
  <!-- Example Stylesheet -->
  <link rel="stylesheet" href="css/styles.css?checksum=ba411cafee2f0f702572369da0b765e2">
  <!-- Example Script -->
  <script src="js/scripts.js?checksum=ba411cafee2f0f702572369da0b765e2"></script>
</head>
<body>
</body>
</html>
The GET string ?checksum=ba411cafee2f0f702572369da0b765e2 is an MD5 or SHA1 hash of the resource. Hashing the file's contents is the most reliable choice; hashing the size (e.g. the value of the Content-Length: header) is cheaper but misses edits that leave the size unchanged. The hash can be obtained through a command line or a language construct. You then construct your href or src attribute by appending it as a GET string to the filename.
The browser will interpret these as distinct resources and cache them separately.
The server will ignore the GET parameter if it is a static resource, but if it is served dynamically, then the GET parameter will be available to the interpreting language.
This means that whenever the hash in the links changes, the browser will cache that specific version independently, once, and then keep it until Expires: passes.
Since the checksum changes whenever the file does, you can set Expires: to forever and it doesn't make much difference. You will still see your changes immediately, as soon as that file changes even a single byte.
Generate your css or js source with whatever utilities you normally do.
Run an md5 or sha1 checksum on the file at runtime if you are serving dynamically, and at compile time if you are generating static content (like ApiGen docs, for example).
Serve the normal file with the hash appended to the filename as a GET string (e.g. styles.css becomes styles.css?checksum=ba411cafee2f0f702572369da0b765e2); a sketch of this step follows the list.
Any change in the file forces a recache, which means you see the real value reflected immediately.
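A minimal sketch of that step in Node.js (checksumUrl is a hypothetical helper name, not from the answer); hashing the file's contents rather than its size also catches edits that leave the size unchanged:
const crypto = require('crypto');
const fs = require('fs');

// Hypothetical helper: build the ?checksum= URL from the file's contents.
function checksumUrl(file) {
  const hash = crypto.createHash('md5').update(fs.readFileSync(file)).digest('hex');
  return file + '?checksum=' + hash;
}

// checksumUrl('css/styles.css') -> 'css/styles.css?checksum=<32 hex digits>'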
Optional, but rad: An additional benefit of this approach is that you can easily set up a dev GET flag, which will make ALL frontend source resolve to prettified dev source with any of your own custom debug functionality enabled, or use it to interpret versioning flags. You can do a redundant check so that the flag is only honored when passed from a known development IP address, with proxy authentication, etc., and is otherwise ignored if you need it secure. I usually divide my frontend source up, whenever possible, similar to this (a sketch of such a gate follows the list):
This is what it is doing on live right now (minified production, cached, default, ?checksum=ba411cafee2f0f702572369da0b765e2).
This is what it ought to be doing on live right now, prettified enough for me to read (prettified production, never cached, ?debug_pretty_source=true).
This is what I use to figure out what isn't doing what it ought to on live if it exists in both of the previous (prettified with debug enabled, never cached, ACL/whitelist authorized, ?debug_dev_enable=true or similar).
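A sketch of such a gate in Node.js; the flag names match the examples above, but the whitelist and file names are assumptions:
const DEV_IPS = new Set(['127.0.0.1']); // assumed development whitelist

function chooseSource(req) {
  const params = new URL(req.url, 'http://localhost').searchParams;
  const fromDev = DEV_IPS.has(req.socket.remoteAddress);
  if (fromDev && params.get('debug_dev_enable') === 'true') {
    return 'js/scripts.debug.js';  // prettified with debug hooks, never cached
  }
  if (params.get('debug_pretty_source') === 'true') {
    return 'js/scripts.pretty.js'; // prettified production source, never cached
  }
  return 'js/scripts.min.js';      // minified production, cached by checksum
}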
You can apply the same principle to package releases by using version numbers instead of checksums, provided your versions don't change. Checksums are less readable but easier to automate and keep in sync with exact changes, but version suffixes are useful for testing package stability also, provided the version number reflects an immutable resource.

Found this whilst searching for a solution. No joy, so I wrote some javascript to solve the problem, which may be of use to others.
In <HEAD> above any other javascript:
<script>
if (typeof(opera) != 'undefined') {            // only do for Opera
  if (window.name == 'previously_loaded') {    // will be "" before page is loaded
    alert('Reloading Page from Server');       // for testing
    window.name = '';                          // prevent multiple reload
    window.location.reload(true);
  }
}
</script>
Now change window name so Opera detects it on subsequent load from cache:
window.name = 'previously_loaded';
Insert this line in one of your js blocks that won't be executed during “window load” (which would cause an infinite reload). For me there was no need to refresh the page unless someone has exited via a link, so I just added it to my onclick/onunload function.
Before and after demos here with a few more notes. I intend to add it to my blog. I only have a few recent versions of Opera, so I would appreciate some tries of the demo before I get egg on my face.
Edit: Just realised that if a later visited site changes the window name (it's persistent) then the back-tab reload won't happen. Just alter the above if statement to:
if (window.name != "") {
The demo worked fine when open in multiple tabs, but I vaguely recollect that window names should be unique, so I've altered the demo to generate a unique name:
window.name = new Date().getTime();
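For reference, the pieces above combined into one snippet (untested beyond the Opera versions mentioned; the final line goes wherever you set the name after load):
<script>
if (typeof(opera) != 'undefined') {
  if (window.name != '') {          // a previous load left a name behind
    window.name = '';               // prevent repeated reloads
    window.location.reload(true);   // re-request the page from the server
  }
}
// later, outside "window load" (e.g. in an onclick/onunload handler):
// window.name = new Date().getTime();
</script>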

Related

Send an entire web app as 1 HTTP response (html, js, css, images, ...)

Traditionally a browser will parse HTML and then send further requests to the server for all related data. This seems inefficient to me, since it might require a large number of requests, even though my server already knows that a browser that wants to use this web application will need all of its resources.
I know that js and css could be inlined, but that complicates server-side code, and img data as base64 bloats the size of the data... I'm aware as well that rendering can start before all assets are downloaded, which would potentially no longer work (depending on the implementation). I still feel that streaming an entire application in one go should be faster on slow connections than making tens of requests separately.
Ideally I would like the server to stream an entire directory into one HTTP response.
Does any model for this exist?
Does the reasoning make sense?
ps: If browser support for this is completely lacking, I'm wondering about a 2 step approach. Download a small JavaScript which downloads a compressed web app file, extracts it and plugs the resources into the page. Is anyone already doing something like this?
Update
I found one: http://blog.another-d-mention.ro/programming/read-load-files-from-zip-in-javascript/
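That link shows files being read out of a zip in JavaScript; here is a minimal sketch of the two-step approach using the JSZip library (the bundle name and file path are assumptions):
fetch('app-bundle.zip')                          // hypothetical single-response bundle
  .then(r => r.blob())
  .then(blob => JSZip.loadAsync(blob))           // JSZip is an assumed dependency
  .then(zip => zip.file('css/styles.css').async('blob'))
  .then(css => {
    const link = document.createElement('link');
    link.rel = 'stylesheet';
    link.href = URL.createObjectURL(css);        // plug the extracted resource into the page
    document.head.appendChild(link);
  });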
I started to research related issues in order to find the way to get the best results with what seems possible without changing web standards, and I wondered about caching. If I could send the last-modified date of every subresource of a page along with the initial HTML page, a browser could avoid sending If-Modified-Since requests once it has loaded every resource at least once. This would in effect be better than sending all resources with the initial request, since that would be beneficial only on the first load and detrimental on subsequent loads, when it would be better for browsers to use their cache (as Barmar pointed out).
Now it turns out that even with a web extension you cannot get hold of the If-Modified-Since header, so you certainly can't tell the browser to use the cached version instead of contacting the server.
I then found this post from Facebook on how they tried to reduce traffic by hashing their static files and giving them a 1-year expiry date. This would mean that the URL guarantees the content of the file. They still saw plenty of unnecessary If-Modified-Since requests, and they managed to convince Firefox and Chrome to change the behaviour of their reload buttons to no longer reload static resources. For Firefox this requires the new Cache-Control: immutable header; for Chrome it doesn't.
I then remembered that I had seen something like that before, and it turns out there is a solution for this problem which is more convenient than hashing the contents of resources and serving them from a database for at least ten years. It is to just put a new version number in the filename. The even more convenient solution would be to just add a version query string, but it turns out that that doesn't always work.
Admittedly, changing your filenames all the time is a nuisance, because files referencing them also need to change. However, the files don't actually need to change. If you control the server it might be as simple as writing a redirect rule to make sure that logo.vXXXX.png is redirected to logo.png (where XXXX is the last-modified timestamp in seconds since epoch)[1]. Now let your template system automatically generate the timestamp, like WordPress's wp_enqueue_script does. (WordPress actually satisfies itself with the query-string technique.) Now you can set the expiration date to the far future and use the immutable cache header. If browsers respect the cache control, you can now safely ignore ETags and If-Modified-Since headers, since they are completely redundant.
This solution guarantees the browser shall never ask for cache validation and yet you shall never see a stale resource, without having to decide on the expiry date in advance.
It doesn't answer the original question here about how to avoid having to do multiple requests to fetch the resources on the same page on a clean cache, but ever after (as long as the browser cache doesn't get cleared), you're good! I suppose that's good enough for me.
[1] You can even avoid the server overhead of checking the timestamp on every resource every time a page references it by using the version number of your application. In debug mode, for development, one can use the timestamp to avoid having to bump the version on every modification of the file.
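A sketch of the rewrite idea from [1] in Node.js (error handling omitted; the logo.vXXXX.png pattern is the hypothetical scheme above):
const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  // 'logo.v1361923786.png' is served from 'logo.png'; the versioned name never exists on disk.
  const clean = req.url.replace(/\.v\d+(\.\w+)$/, '$1');
  res.setHeader('Cache-Control', 'max-age=31536000, immutable'); // far-future + immutable
  fs.createReadStream('.' + clean).pipe(res);
}).listen(8080);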

Are <meta HTTP-EQUIV> cache settings supported by any modern browsers?

Yes, I know headers are better. But we've all dealt with that system where we want something to be cached (HTML only in this case as it's a tag) like so:
<meta http-equiv="Cache-Control" content="max-age=200" />
It does not appear to work when I test it casually. Is there any way to get a document to be cached for, say, 200 seconds without access to an .htaccess file or a programming language?
I know it's not ideal, but it's occasionally functional. I was hoping there would be a way to denote a particular directory with some sort of simple rules in the HTML cache manifest. No dice.
IE has partial (and buggy) support for using a META tag to prevent caching, but this doesn't allow you to specify a non-zero freshness lifetime.
Specifying the freshness lifetime using the Cache-Control response header is absolutely the right way to go.
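For reference, a minimal sketch of that header in Node.js, for when some server-side control is available (any stack can send the same header):
const http = require('http');

http.createServer((req, res) => {
  res.setHeader('Cache-Control', 'max-age=200'); // fresh for 200 seconds
  res.end('<!doctype html><title>cached for 200 seconds</title>');
}).listen(8080);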

Clear cookies and cache from site

Every time I update my website's UI/jQuery,
users complain that things are not working for them and that they have bugs.
The users are internet/computer novices, so they don't know how to clear the browser's cookies or cache, so I need to connect to each one of their computers and do it myself.
I spend lots of hours doing it and they always complain.
Some of the users use Chrome, some Firefox.
I googled and found no solution for this.
Is there any client code operation that will command the browser to clear its cache,
or even pop up a browser window which will ask the user to confirm the clear?
Regarding cache clearing: No, there isn't.
What you can do, however, is configure your web server to correctly serve expiration and cache validity headers for your content. (How to do this depends on your web server.)
You can also use "cache busting" versioned URLs. Instead of using, let's say,
<script src="script.js">
you can "version" the URL like this:
<script src="script.js?2012-12-03-13-06">
<!-- or instead of dates any other versioning scheme you like -->
and when said script is updated, also increment/change the query parameter accordingly. This should cause browsers to consider the script new, as the URL isn't found in the cache.
Note that assigning an empty string to document.cookie does not actually remove anything; each cookie has to be overwritten with an already-expired copy:
document.cookie.split(';').forEach(function (c) {
  document.cookie = c.split('=')[0].trim() + '=; expires=Thu, 01 Jan 1970 00:00:00 GMT; path=/';
});
With browsers that allow entering js code in the address bar, you can make a shortcut (say, a favourite) with the same code as a javascript: URL.
If you'd like to prevent caching you can use meta tags to not cache the site, though caching is generally worth keeping:
<META HTTP-EQUIV="CACHE-CONTROL" CONTENT="NO-CACHE">

Firefox not re-validating static files from child frames

I'm working with an application that has an iframe - both the outer html body and the frame require certain javascript and CSS files. To cut down on load times, all these static files have expiry set to a year from now and essentially should be loaded from cache for normal page hits - which is expected behavior in IE8 and FF3.6
However, once I reload/refresh (F5) the page, I expect the browsers to send 'If-Modified-Since' requests to the server for these files. IE8 sends the requests for all the files used outside as well as within the iframe. But FF3.6 only sends the requests for files used outside (not for the files used within the iframe; it just loads those from cache!).
The response headers are exactly the same for all files regardless of whether they are in the iframe or not. Is there a reason for this behavior of FF? Any way to avoid it?
Note: I can append version parameters to the source, or add a version folder in the path, etc. But, I want to know if this quirk can be avoided/has a good reason behind it?
Firefox behaves correctly - the server indicated that the scripts are good for a year so there is no reason to send pointless requests which waste time, bandwidth and server resources. For debugging purposes you can keep the Shift key pressed while clicking the Reload button, it will make sure that all data is refreshed. However, for end users adding the version information to the URL (e.g. http://example.com/.../script.js?version=1.2.3) is probably the best solution. This makes sure that the cached version can be used as long as it is valid and the new version is downloaded as soon as you update the script.

client-side file caching

If I understand correctly, a browser caches images, JS files, etc. based on the file name. So there's a danger that if one such file is updated (on the server), the browser will use the cached copy instead.
A workaround for this problem is to rename all files (as part of the build), such that the file name includes an MD5 hash of its contents, e.g.
foo.js -> foo_AS577688BC87654.js
me.png -> me_32126A88BC3456BB.png
However, in addition to renaming the files themselves, all references to these files must be changed. For example a tag such as <img src="me.png"/> should be changed to <img src="me_32126A88BC3456BB.png"/>.
Obviously this can get pretty complicated, particularly when you consider that references to these files may be dynamically created within server-side code.
Of course, one solution is to completely disable caching on the browser (and any caches between the server and the browser) using HTTP headers. However, having no caching will create its own set of problems.
Is there a better solution?
Thanks,
Don
The best solution seems to be to version filenames by appending the last-modified time.
You can do it this way: add a rewrite rule to your Apache configuration, like so:
RewriteRule ^(.+)\.(.+)\.(js|css|jpg|png|gif)$ $1.$3
This will rewrite any "versioned" URL to the "normal" one. The idea is to keep your filenames the same, but to benefit from caching. The solution of appending a parameter to the URL will not be optimal with some proxies, which don't cache URLs with parameters.
Then, instead of writing:
<img src="image.png" />
Just call a PHP function:
<img src="<?php versionFile('image.png'); ?>" />
With versionFile() looking like this:
function versionFile($file){
    $path = pathinfo($file);
    // Insert the file's mtime before the extension: image.png -> image.1234567890.png.
    // Note: assumes a single dot in the basename; a name like jquery.min.js would break.
    $ver = '.'.filemtime($_SERVER['DOCUMENT_ROOT'].$file).'.';
    echo $path['dirname'].'/'.str_replace('.', $ver, $path['basename']);
}
And that's it! The browser will ask for image.123456789.png, Apache will rewrite this to image.png, so you will benefit from the cache in all cases and won't have any out-of-date issues, while not having to bother with filename versioning.
You can see a detailed explanation of this technique here: http://particletree.com/notebook/automatically-version-your-css-and-javascript-files/
Why not just add a querystring "version" number and update the version each time?
foo.js -> foo.js?version=5
There still is a bit of work during the build to update the version numbers but filenames don't need to change.
Renaming your resources is the way to go, although we use a build number and embed that in to the file name instead of an MD5 hash
foo.js -> foo.123.js
as it means that all your resources can be renamed in a deterministic fashion and resolved at runtime.
We then use custom controls to generate links to resources on page load, based upon the build number, which is stored in an app setting.
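A sketch of that resolution step in JavaScript (the app-setting name and extension list are assumptions):
const BUILD = process.env.BUILD_NUMBER || '123'; // assumed app setting

// foo.js -> foo.123.js; a server-side rewrite maps it back to foo.js.
function versionedAsset(file) {
  return file.replace(/\.(js|css)$/, '.' + BUILD + '.$1');
}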
We followed a similar pattern to PJP, using Rails and Nginx.
We wanted user avatar images to be browser cached, but on an avatar's change we needed the cache to be invalidated ASAP.
We added a method to the avatar model to append a timestamp to the file name:
return "/images/#{sourcedir}/#{user.login}-#{self.updated_at.to_s(:flat_string)}.png"
In all places in the code where avatars were used, we referenced this method rather than a URL. In the Nginx configuration, we added this rewrite:
rewrite "^/images/avatars/(.+)-[\d]{12}.png" /images/avatars/$1.png;
rewrite "^/images/small-avatars/(.+)-[\d]{12}.png" /images/small-avatars/$1.png;
This meant if a file changed, its URL in the HTML changed, so the user's browser made a new request for the file. When the request reached Nginx, it got rewritten to the simple name of the file.
I would suggest using caching by ETags in this situation, see http://en.wikipedia.org/wiki/HTTP_ETag. You can then use the hash as the etag. A request will still be submitted for each resource, but the browser will only download items that have changed since last download.
Read up on your web server / platform docs on how to use etags properly, most decent platforms have built-in support.
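A hedged sketch of ETag revalidation in Node.js (most servers and platforms do this for you; shown manually for clarity):
const http = require('http');
const fs = require('fs');
const crypto = require('crypto');

http.createServer((req, res) => {
  const body = fs.readFileSync('.' + req.url); // sketch only: no error handling
  const etag = '"' + crypto.createHash('md5').update(body).digest('hex') + '"';
  if (req.headers['if-none-match'] === etag) {
    res.writeHead(304); // the cached copy is still valid; nothing is re-downloaded
    return res.end();
  }
  res.writeHead(200, { 'ETag': etag });
  res.end(body);
}).listen(8080);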
Most modern browsers send the If-Modified-Since header when revalidating a cacheable resource in an HTTP request. However, not all browsers support the If-Modified-Since header.
There are three ways to "force" the browser to pick up a changed resource.
Option 1 Create a query string with a version #: src="script.js?ver=21". The downside is that many proxy servers won't cache a resource with query strings. It also requires site-wide updating for changes.
Option 2 Create a naming system for your files: src="script083010.js". The downside, as with option 1, is that this also requires site-wide updates whenever a file changes.
Option 3 Perhaps the most elegant solution: simply set up the caching headers last-modified and expires on your server. The main downside to this is that users may have to re-cache resources because they expired yet never changed. Additionally, the last-modified header does not work well when content is being served from multiple servers.
Here are a few resources to check out: Yahoo, Google, AskApache.com
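A sketch of option 3 in Node.js: send Last-Modified, and answer 304 when the browser's If-Modified-Since matches the file's mtime (static file servers normally handle this automatically):
const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  const mtime = fs.statSync('.' + req.url).mtime.toUTCString(); // sketch only: no error handling
  if (req.headers['if-modified-since'] === mtime) {
    res.writeHead(304); // unchanged since the browser's copy
    return res.end();
  }
  res.writeHead(200, { 'Last-Modified': mtime });
  fs.createReadStream('.' + req.url).pipe(res);
}).listen(8080);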
This is really only an issue if your web server sets a far-future "Expires" header (setting something like ExpiresDefault "access plus 10 years" in your Apache config). Otherwise, a browser will make a conditional GET, based on the modified time and/or the Etag. You can verify what is happening on your site by using a web proxy or an extension like Firebug (on the Net panel). Your question doesn't mention how your web server is configured, and what headers it is sending with static files.
If you're not setting a far-future Expires header, there's nothing special you need to do. Your web server will usually handle conditional GETs for static files based on last modified time just fine. If you are setting a far-future Expires header then yes, you need to add some sort of version to the file name like your question and the other answers have mentioned already.
I have also been thinking about this for a site I support where it would be a big job to change all references. I have two ideas:
1.
Set distant cache expiry headers and apply the changes you suggest for the most commonly downloaded files. For other files, set the headers so they expire after a very short time, e.g. 10 minutes. Then if you have 10 minutes of downtime when updating the application, caches will be refreshed by the time users return to the site. General site navigation should be improved, as the files will only need downloading every 10 minutes, not on every click.
2.
Each time a new version of the application is deployed, deploy it to a different context that contains the version number, e.g. www.site.com/app_2_6_0/. I'm not really sure about this, as users' bookmarks would be broken on each update.
I believe that a combination of solutions works best:
Setting cache expiry dates for each type of resource (image, page, etc.) appropriately for that resource, for example:
Your static "About", "Contact" etc. pages probably aren't going to change more than a few times a year, so you could easily put a cache time of a month on these pages.
Images used in these pages could have eternal cache times, as you are more likely to replace an image than to change one.
Avatar images might have an expiry time of a day.
Some resources need modified dates in their names. For example avatars, generated images, and the like.
Some things should never be cached: new pages, user content, etc. In these cases you should cache on the server, but never on the client side.
In the end you need to carefully consider each type of resource to determine what cache time to instruct the browser to use, and always be conservative if you are unsure. You can increase the time later, but it's much more painful to un-cache something.
You might want to check out the approach taken by the grails "uiperformance" plugin, which you can find here. It does a lot of the things you mention, but automates them (set expiry time to a long time, then increments version numbers when files change).
So if you're using grails, you get this stuff for free. If you are not - maybe you can borrow the techniques employed.
Also, borrowed from the ui-performance page: read the following 14 rules.
ETags seemingly provide a solution for this...
As per http://httpd.apache.org/docs/2.0/mod/core.html#fileetag, we can set the server to generate ETags based on file size (instead of time/inode/etc). This generation should be constant across multiple server deployments.
Just enable it in (/etc/apache2/apache2.conf)
FileETag Size
& you should be good!
That way, you can simply reference your images as <img src='/path/to/foo.png' /> and still use all the goodness of HTTP caching.
