Now that I've successfully cached my web page, how do I uncache it after making a change?
My user can't download the latest version, even after I've changed a comment in my cache.manifest file.
My server is an IIS server.
The thing with caching is, well, stuff gets cached. Browsers won't, in general, try to download anything you've told them to cache until the cached items expire.
If you set everything to cache for a certain time span, the browser won't try to download any of the cached items until the end of that span, and by the sound of it that includes the cache.manifest file itself.
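On IIS, a common culprit is the manifest itself being HTTP-cached or served with the wrong MIME type. As a minimal web.config sketch, assuming IIS 7+ and a manifest named cache.manifest at the site root:

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <system.webServer>
        <staticContent>
          <!-- Serve the manifest with the MIME type browsers expect.
               If ".manifest" is already mapped, the remove avoids a duplicate-entry error. -->
          <remove fileExtension=".manifest" />
          <mimeMap fileExtension=".manifest" mimeType="text/cache-manifest" />
        </staticContent>
      </system.webServer>
      <location path="cache.manifest">
        <system.webServer>
          <staticContent>
            <!-- Tell browsers never to HTTP-cache the manifest itself -->
            <clientCache cacheControlMode="DisableCache" />
          </staticContent>
        </system.webServer>
      </location>
    </configuration>

With the manifest itself uncacheable, your comment change should be picked up on the next load, and the browser will then re-download the listed resources.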
Typically, you don't want to cache the content of the website itself, because that makes it hard to change. Instead, you want to cache the various pieces, like images, CSS, and JavaScript, that the pages of your site need. If you do this right, you get a huge benefit for your users and still keep control over those resources, since you can always link to a different version of a particular resource from the content of the pages.
That said, if you do need to cache some portions of your pages, you can use server-side caching to reuse portions that are expensive to put together.
It's very important for the website I'm working on to be offline-functional. I'm using a cache manifest to store all the files in the application cache, so that takes care of that and all is well and good.
BUT, as I read and noticed myself, the browser first shows the cached version of the site before checking for an update online. Hitting refresh reloads the page from the cache, this time with the newly cached files (or whatever it had time to update, for the swift refreshers).
I'm aware of this fix: http://www.html5rocks.com/en/tutorials/appcache/beginner/, where the user is told an update is available and is asked to refresh the page. Not a bad method, but still sketchy for the user experience.
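(For reference, that fix boils down to listening for the updateready event on window.applicationCache, roughly like this:)

    // When the browser has downloaded a new version of the cached files,
    // offer the user a reload so they see the fresh content immediately.
    window.applicationCache.addEventListener('updateready', function () {
      if (window.applicationCache.status === window.applicationCache.UPDATEREADY) {
        if (confirm('A new version of this site is available. Load it?')) {
          window.location.reload();
        }
      }
    }, false);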
Is there any other way to force the browser to show the most up-to-date files when online? Would manually cache busting all files AND using a cache manifest fix this problem, or will the two conflict and cause problems for the offline functionality?
I found something that works well for me:
The URL linking to the web page contains a parameter. Whenever there is a change to the page or its related files, the URL is changed to something like this: http://www.mywebsite.com/mypage.html?v=3, where v=3 is changed depending on updates.
This is a longer fix to implement (finding every page affected by a change and changing all their cache-busting links), but the pages at least show what they're supposed to on the first load, and the cache manifest still loads the update for offline viewing.
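For illustration, the cache-busted references might look like this (the file names and version numbers are just examples):

    <!-- In every page that links to mypage.html, bump v= when it changes -->
    <a href="http://www.mywebsite.com/mypage.html?v=3">My page</a>

    <!-- The same trick works for the resources a page references -->
    <link rel="stylesheet" href="styles.css?v=3">
    <script src="app.js?v=3"></script>

Note that the manifest then has to list the versioned URLs (e.g. styles.css?v=3), since the application cache keys entries by the full URL, query string included.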
Is it possible that, if a user with a full browser cache comes to my website and some of my resources are required to be cached (via the Cache-Control header), the browser will remove some old (but still valid, i.e. non-expired) items from the cache in order to make room for my resources?
If there are no outdated resources in the user's cache, will the browser ignore my cache policy?
If the cache is full, the browser has to find room for new content somewhere.
A sane approach to this would be to:
1) first remove all expired content,
2) then remove the oldest content (based on time since last visit).
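Purely as an illustration of that heuristic (real browsers each have their own, undocumented eviction policies), a sketch:

    // Illustrative only: evict expired entries first, then the least recently
    // used ones, until a new entry of `needed` bytes fits within `capacity`.
    // `cache` maps URL -> { size, expires, lastUsed } (timestamps in ms).
    function makeRoom(cache, needed, capacity) {
      var now = Date.now();
      // 1) drop everything that has expired
      for (var url of Array.from(cache.keys())) {
        if (cache.get(url).expires <= now) cache.delete(url);
      }
      // 2) drop least-recently-used entries until the new item fits
      var entries = Array.from(cache.entries())
        .sort(function (a, b) { return a[1].lastUsed - b[1].lastUsed; });
      var used = entries.reduce(function (sum, e) { return sum + e[1].size; }, 0);
      for (var i = 0; i < entries.length && used + needed > capacity; i++) {
        cache.delete(entries[i][0]);
        used -= entries[i][1].size;
      }
    }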
But, to be honest with you, I have never looked deeply into this issue because I find it counterproductive to rely on cached content. Many browsers have a "delete all cache on exit" option, and users run programs like CCleaner to remove temporary files. The cache is basically considered temporary files.
I suppose you could try writing a server-log analyzer that determines whether specific resources are being reloaded at a time when companion resources are not. This would involve separating the header-only request used to check the "age" of a resource from the full download of the resource. Again, this is mostly speculation, since I don't rely on the cache being there and I don't use the cache as a tracking device.
For an (enterprise) web project I want to keep previous versions of the static files, so that projects can decide for themselves when they are ready to adopt design changes. My initial plan is to provide folders for static content like so:
company.com/static/1.0.0/
company.com/static/1.0.0/css/
company.com/static/1.0.0/js/
company.com/static/1.0.0/images/
company.com/static/2.0.0/
company.com/static/2.0.0/css/
company.com/static/2.0.0/js/
company.com/static/2.0.0/images/
Each file in these folders should then have a cache policy to cache "forever" (one year at least). I also plan to concatenate the CSS files and the JS files into one each, in order to minimize the number of requests.
Then I would also provide a current folder (which symlinks to the latest released version):
company.com/static/current/
company.com/static/current/css/
company.com/static/current/js/
company.com/static/current/images/
This will solve my first problem (that projects and sub websites can lock their code to a certain version and can upgrade whenever they are ready).
But then I can see a caching issue. Now I cannot "just" cache the current folder, since its contents will change with each release. What should my caching policy be on that folder?
Also, for each release most of the static files will never change anyway. Does it make sense to cache them forever and rename them when there are changes?
I am looking for advice here, since I want to know your best trade-off between caching and changing the files.
Beware of HTTP caching. I looked into this some time ago.
my blog article on HTTP caching
There are three approaches you can select from:
1. Use the resource's path as a cache key, i.e. when it changes, the browser has to download the new version of your resources. In this case you don't need the /current folder at all; you just need to avoid caching the .html page and put the appropriate path to your resources in it.
2. You can point browsers at the /current folders only and add an ETag to your resources. In this case another server request will be made from the client, but it will be a conditional request (i.e. with an If-None-Match header), so you can return a 304 response (with no resource body) until your customer decides to migrate to another version. Another drawback of such a solution (if you have several customers who use different versions) is that the /current folder can only contain a single version of the design.
3. Since you're going to concatenate resources into single files, you can specify the resource version as part of the URL: /current/js/combined.js?version=1.0.0.0. But this is not much different from the first approach.
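For the second approach, the conditional exchange looks roughly like this (the path and ETag value are made up for illustration):

    GET /static/current/css/combined.css HTTP/1.1
    If-None-Match: "2.0.0"

    HTTP/1.1 304 Not Modified
    ETag: "2.0.0"

The client keeps using its cached copy until the server's ETag changes, at which point a full 200 response with the new body is returned.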
Hope this helps.
It might be worth your while looking at how Google, Microsoft, etc. have implemented the caching policies for their jQuery CDNs.
Your policy of caching forever is OK for the versioned URLs.
For the current URLs you're obviously going to need a shorter expiry time.
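As a sketch (one year is the usual "forever"; tune the current value to how quickly releases must propagate):

    # Versioned URLs (/static/2.0.0/...): safe to cache for a year
    Cache-Control: public, max-age=31536000

    # /static/current/...: force revalidation, or use a short max-age such as 3600
    Cache-Control: no-cache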
A couple of things to consider:
How are the applications going to test against /current/? That is, if they use it, how do you know a change isn't going to break an existing application?
Caching forever is really only about reducing requests during the current session, as most browser caches aren't big enough to hold files for a long time (entries get evicted as people browse other sites).
I have a website which is displayed to visitors via a kiosk. People can interact with it. However, since the website is not locally hosted, and uses an internet connection - the page loads are slow.
I would like to implement some kind of lazy caching mechanism such that, as and when people browse the pages, the pages and the resources they reference get cached, so that subsequent loads of the same page are instant.
I considered using HTML5 offline caching - but it requires me to specify all the resources in the manifest file, and this is not feasible for me, as the website is pretty large.
Is there any other way to implement this? Perhaps using HTTP caching headers? I would also need some way to invalidate the cache at some point to "push" the new changes to the browser...
The usual approach to handling problems like this is with HTTP caching headers, combined with smart construction of URLs for resources referenced by your pages.
The general idea is this: every resource loaded by your page (images, scripts, CSS files, etc.) should have a unique, versioned URL. For example, instead of loading /images/button.png, you'd load /images/button_v123.png and when you change that file its URL changes to /images/button_v124.png. Typically this is handled by URL rewriting over static file URLs, so that, for example, the web server knows that /images/button_v124.png should really load the /images/button.png file from the web server's file system. Creating the version numbers can be done by appending a build number, using a CRC of file contents, or many other ways.
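A minimal sketch of the content-hash variant in Node, for example (the helper name and paths are hypothetical):

    var crypto = require('crypto');
    var fs = require('fs');

    // Derive /images/button.png -> /images/button_3f2a9c1b.png from the file's
    // contents, so the URL changes exactly when the file does.
    function versionedUrl(filePath, urlPath) {
      var hash = crypto.createHash('md5')
        .update(fs.readFileSync(filePath))
        .digest('hex')
        .slice(0, 8);
      return urlPath.replace(/(\.\w+)$/, '_' + hash + '$1');
    }

    // e.g. versionedUrl('./static/images/button.png', '/images/button.png')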
Then you need to make sure that, wherever URLs are constructed in the parent page, they refer to the versioned URL. This obviously requires dynamic code used to construct all URLs, which can be accomplished either by adjusting the code used to generate your pages or by server-wide plugins which affect all text/html requests.
Then you set the Expires header for all resource requests (images, scripts, CSS files, etc.) to a date far in the future (e.g. 10 years from now). This effectively caches them forever. It means that everything loaded by each of your pages will always be fetched from cache; cache invalidation never happens, which is OK because when the underlying resource changes, the parent page uses a new URL to find it.
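For example (the date and max-age here are illustrative, roughly ten years out):

    Expires: Thu, 31 Dec 2037 23:59:59 GMT
    Cache-Control: public, max-age=315360000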
Finally, you need to figure out how you want to cache your "parent" pages. How you do this is a judgement call. You can use ETag/If-None-Match HTTP headers to check for a new version of the page every time, which will very quickly load the page from cache if the server reports that it hasn't changed. Or you can use Expires (and/or Max-Age) to reload the parent page from cache for a given period of time before checking the server.
If you want to do something even more sophisticated, you can always put a custom proxy server on the kiosk itself; in that case you'd have total, centralized control over how caching is done.
Background
I'm building an app that links recent web pages you've visited together. To do this, I need to get the HTML for recent URLs using Cocoa. Right now, I'm using an invisible WebView to do this. As I understand it, if the URL isn't in the cache for my app, this is hitting web servers.
What I want
The chances are high that the URL I'm grabbing has already been cached by Safari as the page has already been visited.
I want my app to check Safari's cache for the URL first. If it's there, it should just use this data. If not, it should hit the web server and store the page in my app's cache.
I don't really want to have to parse the cache.db file from Safari using sqlite3; I've no idea whether this format will stay the same. I'm after something simpler and more high-level.
What I've tried
I know that you can set up your own NSURLCache using the method initWithMemoryCapacity:diskCapacity:diskPath: but I don't want to try pointing this to the Safari cache in case it screws up Safari by writing to it.
Is there an easy, high level way of sharing the Safari cache?
UPDATE
Aha. I've just realised there may be a way to do this I've been missing.
I could make a new instance of NSURLCache with initWithMemoryCapacity:diskCapacity:diskPath:, point it at the Safari cache, then specify a cache policy of NSURLRequestReturnCacheDataDontLoad for the URL Request when loading the page.
When this fails, I could just try and load the page as normal. I'll try this out and update the question when I know more.
To be honest, you just can't do this.
Firstly, I'm pretty certain -[NSURLCache initWithMemoryCapacity:diskCapacity:diskPath:] won't work as you expect. It will instead blow away the old cache file to create its own, potentially upsetting Safari badly.
Secondly, NSURLCache is a composite cache: it caches data first in memory, and then moves it out to disk at some point. So even if you could properly access Safari's cache file (which you can't), you'd only be able to access the older cached data, not the most recent.