For which file extensions does the browser cache a page? - caching

We have a web application.
Until now we had no real cache handling strategy.
When we had a new version of certain JavaScript files, we instructed our users to clear their browser cache.
Now we want to change this.
Up to this date our starting page was "start_app.html".
In our effort to implement our cache busting strategy we want to ensure that the browser will NOT cache our starting page.
We will change the extension from ".html" to ".php".
It seems that the browser has a list of extensions, such as "php", "asp", and so on, for which it ALWAYS fetches a fresh copy from the web server.
Is this true, and which extensions are these?
Thanks a lot in advance.

Please don't rely on incorrect browser behavior to not cache your page. Instead, use this header:
Cache-Control: no-cache, no-store
This page has all the details as to why that header will do what you want.
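As a rough illustration of sending that header regardless of file extension, here is a minimal sketch using Python's standard library. The handler name, the placeholder page body, and the port are assumptions made for the example; the application in the question would set the same header in its own server or PHP code.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# The header recommended in the answer above.
NO_CACHE_HEADERS = {"Cache-Control": "no-cache, no-store"}


class StartPageHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that serves a placeholder start page uncached."""

    def do_GET(self):
        body = b"<html><body>start page</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        # Send the caching directives with every response for this page.
        for name, value in NO_CACHE_HEADERS.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Serve on localhost until interrupted (port is an arbitrary choice)."""
    HTTPServer(("127.0.0.1", port), StartPageHandler).serve_forever()
```

The point of the sketch is that the caching behavior comes from the response header, not from the ".html" or ".php" in the URL.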

Related

RFC2616 13.13, Browser History and Caching

I've been trying to get my head around the whole issue of browser history vs. caching and RFC2616 13.13.
Does this section of the RFC mean that if a user goes "Back" in the browser, for example, it should always display the page from its local storage, ignoring any cache directives, unless the user has configured it otherwise?
So browsers that reload the page when navigating the history, even if caching directives are instructing them to do so, are not complying with the specification? And the spec is saying this is bad because "this will tend to force service authors to avoid using HTTP expiration controls and cache controls when they would otherwise like to."
Also, even though a directive may instruct the browser not to cache, e.g. using Cache-Control: no-store, it can/should store the page in its history cache?
From what I've read, it seems that most browsers violate the standard, apart from Opera. Is this because the security concerns around the re-display of pages with sensitive data from history are seen as more important than the issue the standard talks about?
I'd be grateful if anyone is able to shed some light on this area, thanks.
History and cache are completely separate. We're trying to clarify this in httpbis; see https://svn.tools.ietf.org/svn/wg/httpbis/draft-ietf-httpbis/latest/p6-cache.html#history.lists

Force Browser Caching Across Browser Sessions

I help maintain several WordPress-based websites that publish news and reference information.
We have been working hard to make pages at the websites load as fast as possible.
One of the things we've done is implement very long "max-age" times in the "cache-control" http headers for most of our static files, such as images and css files.
The particular cache-control setting we're using is "public, max-age=31536000". 31,536,000 seconds is 365 days.
The upside is that this setting does, in fact, cause the static files to be cached as visitors browse through different pages of our sites.
But here's the rub. This cache-control setting doesn't do much for us across browser sessions. Even though the setting is supposed to tell the browser "cache this file for an entire year", if a visitor to our site shuts down their browser, then starts it up just five minutes later and comes back to our site, the browser insists on re-loading all the static files, even though it still has them in its cache.
I've checked this carefully in Firefox, viewing the headers with Live HTTP Headers. But I can also qualitatively see the same thing happening in other browsers.
Apparently, browsers insist on re-loading all content for a website if the content hasn't been loaded once during the current browser session.
So ... Is there any way we can "politely suggest" to browsers that they always load cached content from the cache, even if the browser hasn't been to our site during the current browser session?
Check the ETag, Expires, and Last-Modified headers as well.
You need an Expires header, and sometimes ETag and Last-Modified can defeat caching.
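To make the suggestion about Expires concrete, here is a small sketch that builds an Expires value matching the one-year max-age from the question. The function name and the idea of generating the headers in Python are assumptions for illustration; on a real site these headers would normally be configured in the web server.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31,536,000 seconds, as in the question


def static_file_headers(now=None, max_age=ONE_YEAR_SECONDS):
    """Return Cache-Control plus a matching Expires header for a static file."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age)
    return {
        "Cache-Control": f"public, max-age={max_age}",
        # HTTP dates are expressed in GMT; usegmt=True needs a UTC datetime.
        "Expires": format_datetime(expires, usegmt=True),
    }
```

Whether this changes the cross-session behavior described in the question depends on the browser, so it is worth checking the actual request and response headers again afterwards.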

HTML 5 Cache Manifest Vs. Etags, Expires or cache-control header

Can someone explain to me how HTML 5's cache manifest differs from using other file header techniques for telling the browser to cache the file?
I feel strange posting an answer to a question that you have asked, commented and answered yourself but I think that nearly two years of your absolute monopoly in this topic is enough. ;)
The main differences between the HTML5 cache manifest vs. the traditional HTTP headers:
for the cache manifest you need support in the browser
for the HTTP headers you also need support in the browser of course but it's more universal
you have more control over the caching with cache manifest
your website or Web app can work correctly offline with no connection at all
you can have two version of every resource - for offline and online usage
The last point is very handy and lets you easily swap parts of your website that need a connection with, e.g., placeholders containing optional comments that the user doesn't get full functionality without the connection, or whatever you want.
For the support see the Compatibility table for support of offline web applications in desktop and mobile browsers. Not surprisingly, IE has some problems as always, and currently Opera Mini doesn't support it, so I would suggest that if you use cache manifests you also use the traditional HTTP headers (both HTTP/1.1 Cache-Control and HTTP/1.0 Expires - see RFC 2616 sec. 14.9.3).
You have more control over the whole caching process in your JavaScript, e.g. you can use the window.applicationCache.swapCache() method to force an update of the cached version of your website without the need for manually reloading the page. There are some nice code examples on HTML5 Rocks (links below) explaining how to update users to the newest version of your website or Web application.
Keep in mind that you need to serve your cache manifest with correct HTTP headers, specifically the Content-Type and headers related to caching, so that your browser knows that it's a cache manifest and that it should always be checked for new versions. This is for example how GitHub serves cache manifests for GitHub Pages:
Content-Type: text/cache-manifest
Cache-Control: max-age=0
Expires: [CURRENT TIME]
where [CURRENT TIME] is the current GMT time in the correct format (see RFC 2616 sec. 3.3).
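As an illustration of what [CURRENT TIME] looks like in practice, the following sketch builds the three headers listed above with the date in the RFC 1123 format that HTTP uses. The function name is hypothetical, and generating the headers in Python is only an assumption for the example.

```python
from email.utils import formatdate


def manifest_headers(now=None):
    """Headers for a cache manifest.

    now is seconds since the epoch; None means the current time.
    """
    return {
        "Content-Type": "text/cache-manifest",
        "Cache-Control": "max-age=0",
        # usegmt=True yields e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
        "Expires": formatdate(timeval=now, usegmt=True),
    }
```

Passing a fixed timestamp such as manifest_headers(0) is handy for checking the date format; a real response would leave now unset.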
Here are some resources that will get you started:
A Beginner's Guide to Using the Application Cache on HTML5 Rocks
Using the application cache on Mozilla Developer Network
Cache manifest in HTML5 on Wikipedia
Offline Web Applications W3C Working Group Note
Offline Web applications at WHATWG
See also my recent answers to those related questions:
Force browser to clear cache
Determining a page is outdated on github pages
I 'believe' that the primary difference between regular disk cache and the new html5 offline cache is that when working offline (or without internet connection), traditional disk cache would not be used or available to render the page, whereas the offline cache will.

Clear all website cache?

Is it possible to clear all site cache? I would like to do this when the user logs out or the session expires instead of instructing the browser not to cache on each request.
As far as I know, there is no way to instruct the browser to clear all the pages it has cached for your site. The only control that you, as a website author, have over caching of a page occurs when the browser tries to access that page. You can specify that cached versions of your pages should expire at a certain time using the Expires header, but even then the browser won't actually clear the page from its cache at that time.
I certainly hope not - that would give the web site destructive powers over the client machine!
If security is your main concern here, why not use HTTPS? Browsers don't cache content received via HTTPS (or cache it only in memory).
One tricky way to mimic this would be to include the session-id as a parameter when referencing any static piece of content on the site. When the user establishes the session, the browser will recognize all the pieces of content as new due to the inclusion of this parameter. For the duration of the session the browser will use the static content in its cache. After the user logs out and logs back in again, the session-id parameter for the static content will be different, so the browser will recognize this as completely new content and will download everything again.
That being said, this is a hack and I wouldn't recommend pursuing it. For what reason do you want the user's cache to be cleared after their session expires? There's probably a better solution that fits your situation than what you are currently asking for.
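For readers who want to see what the session-id trick looks like, here is a minimal sketch of the URL rewriting involved. The helper name, the "sid" parameter name, and the example paths are all hypothetical, and the caveat above still applies: this is a hack, not a recommended design.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_key(url, key, value):
    """Return url with query parameter key set to value, replacing any old one."""
    parts = urlsplit(url)
    # Keep every existing parameter except a previous copy of the cache key.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != key]
    query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

A template would call something like with_cache_key("/static/app.js", "sid", session_id) for each static reference, so a new session produces URLs the browser has not cached yet. The old copies are not deleted; they simply stop being requested.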
If you are talking about asp.net cache objects, you can use this:
' Remove every entry from the ASP.NET cache (Cache keys are strings).
For Each elem As DictionaryEntry In Cache
    Cache.Remove(CStr(elem.Key))
Next
to remove items from the cache, but that may not be the full extent of what you are trying to accomplish.

IE6 and Caching

It seems that IE6 ignores any form of cache invalidation sent via HTTP headers. I've tried setting Pragma to no-cache and setting the cache expiration to the current time, yet in IE6, hitting Back will always pull up a cached version of a page I am working on.
Is there a specific HTTP header that IE6 does listen to?
Cache-Control: private, max-age=0 should fix it. From classic ASP this is done with Response.Expires=-1.
Keep in mind when testing that just because your server is serving pages with caching turned off doesn't mean that the browser will obey that when it has an old cached page that it was told was okay to cache. Clear the cache or use F5 to force that page to be reloaded.
Also, for those cases where the server is serving cached content, you can use Ctrl+F5 to signal the server not to serve it from cache.
You must be careful. If you are using AJAX via XMLHttpRequest (XHR), cache "recommendations" set in the header are not respected by IE6.
The fix is to append a random number to the URL queries used in AJAX requests. For example:
http://test.com?nonce=0123
A good generator for this is the UTC() function, which returns a unique timestamp for the user's browser... that is, unless they mess with their system clock.
Have you tried setting an ETag in the header? ETags are a pretty reliable way to indicate that content has changed (see the W3C spec and Wikipedia).
Beyond that, a cruder way is to append a random query string parameter to the request, such as the current Unix timestamp. As I said, crude, but then IE6 is not the most subtle of beasts.
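To show how the ETag suggestion works, here is a small sketch of the server-side logic: the server derives a tag from the content, and if the browser sends the same tag back in If-None-Match, it answers 304 Not Modified instead of resending the body. The function names and the use of a truncated SHA-256 hash are assumptions for the example; any value that changes when the content changes would do, and whether IE6 honors it in every case is not something this sketch can confirm.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted entity tag from the response body (hash choice is arbitrary)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a simple conditional GET."""
    etag = make_etag(body)
    if if_none_match == etag:
        # The browser's cached copy is still current; send no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

A first request gets status 200 with the ETag; a repeat request carrying that tag gets 304, and any change to the body produces a different tag and a full 200 response again.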
See the question "Making sure a webpage is not cached, across all browsers" (How to control web page caching, across all browsers?). I think this should help out with your problem too.
Content with "Content-Encoding: gzip" Is Always Cached Although You Use "Cache-Control: no-cache"
http://support.microsoft.com/kb/321722
You could also disable gzip just for IE6
A little note: From experience I know that IE6 will load JavaScript from cache even if forced to reload the page via Ctrl+F5. So if you are working on JavaScript, always empty the cache.
The IE web developer toolbar can help immensely with this. There's a button for clearing the cache.

Resources