How to serve cached pages to first time visitors? - caching

Is there a way to cache pages from previous visitors and then share that cache with first-time visitors?
I know this can't be done on the client side, but I'm not sure about the server side of things.
I'm hoping you can point me in the right direction, and maybe to some resources, as I can't find much on this.

Sure, that's how page caching works in general. Your site code will do something like this:
look in the cache for this page
if (it's in the cache) {
    serve it
} else {
    generate the page
    store the page in the cache
    serve it
}
So, the very first visitor to the page will cause it to be cached, and then all subsequent visitors will get the cached version. This can be done at the application level (i.e., via code written by you or perhaps some library you're using) or at the server level, like with Squid.
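As a minimal sketch of that logic in Python (the `serve_page` and `render` names and the in-memory dict are assumptions for illustration; a real site would more likely use its framework's cache, memcached, or a caching proxy):

```python
from typing import Callable, Dict, List

# Hypothetical in-memory page cache keyed by URL path.
_page_cache: Dict[str, str] = {}


def serve_page(path: str, generate: Callable[[str], str]) -> str:
    """Return the cached page for `path`, generating and storing it on a miss."""
    page = _page_cache.get(path)
    if page is None:
        page = generate(path)      # slow path: only the first visitor pays for this
        _page_cache[path] = page   # every later visitor gets the stored copy
    return page


calls: List[str] = []


def render(path: str) -> str:
    """Stand-in for expensive page generation; records each call."""
    calls.append(path)
    return f"<html><body>Rendered {path}</body></html>"


first = serve_page("/about", render)    # cache miss: render() runs
second = serve_page("/about", render)   # cache hit: render() is skipped
```

Here `render` runs once even though the page is served twice; the second visitor gets the stored copy.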

Related

Website is different when I upload on FTP

I started developing a few weeks ago and bought a domain, but when I upload my files to the live site, the website looks different from what I uploaded. This gets fixed when I clear my cache. The problem is that my visitors see the page one way, and after I update it they still see the previous version!
Is there any possible solution for this? I don't want my visitors to clear cache every time I make a change on my website!
This is most likely due to CSS caching: the browser is loading a cached copy of your stylesheet. You can control how long files are cached in a few ways; ETags and cache headers set in .htaccess (on Apache) are the most common.
A very simple trick is to add a GET-style parameter to the end of your stylesheet URL (where you load your main stylesheet in the head of the document) and change its value whenever the file changes, like this:
main.css?v=2
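If you'd rather not bump the number by hand, one option is to derive it from the file's contents, so the URL changes exactly when the file does. A rough sketch in Python, where `versioned_href` is a hypothetical helper your templates would call:

```python
import hashlib


def versioned_href(href: str, content: bytes) -> str:
    """Append a short hash of the file contents as a ?v= query parameter."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    separator = "&" if "?" in href else "?"
    return f"{href}{separator}v={digest}"


# Same file name, different contents: the two URLs differ, so the browser
# treats the updated stylesheet as a new resource and downloads it.
old = versioned_href("main.css", b"body { color: black; }")
new = versioned_href("main.css", b"body { color: navy; }")
```

Unchanged files keep the same URL, so visitors still benefit from their cached copies between deployments.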

Employing a CDN for a dynamic website

I have a website forum where users exchange photos and text with one another on the home page. The home page shows the 20 latest objects, be they photos or text. The 21st object is pushed out of view. A new photo is uploaded every 5 seconds. A new text string is posted every second. In around 20 seconds, a photo that appeared at the top has disappeared at the bottom.
My question is: would I get a performance improvement if I introduced a CDN in the mix?
Since the content is changing, it seems I shouldn't be doing it. However, when I think about it logically, it does seem I'll get a performance improvement from introducing a CDN for my photos. Here's how. Imagine a photo is posted, appearing on the page at t=1 and remaining there till t=20. The first person to access the page (closer to t=1) will cause the photo to be pulled to an edge server. Thereafter, anyone accessing the photo will receive it from the CDN; this will last till t=20, after which the photo disappears. This is a veritable performance boost.
Can anyone comment on the flaws in my reasoning, and/or what I am failing to consider? It would also be good to know what alternative performance optimizations I can make for a website like mine. Thanks in advance.
You've got it right. As long as someone accesses the photo within the 20 seconds that the image is within view it will be pulled to an edge server. Then upon subsequent requests, other visitors will receive a cached response from the nearest edge server.
As long as you're using the CDN for delivering just your static assets, there should be no issues with your setup.
Additionally, you may want to look into a CDN which supports HTTP/2. This will provide you with improved performance. Check out cdncomparison.com for a comparison between popular CDN providers.
You need to consider all requests hitting your server, which includes the primary dynamically generated HTML document, but also all static assets like CSS files, JavaScript files and, yes, image files (both static and user-uploaded content). An HTML document will reference several other assets, each of which needs to be downloaded separately and thus incurs a server hit. Assuming for the sake of argument that each visitor has an empty local cache, a single page load may incur, say, ~50 resource hits for your server.
Probably the only request which actually needs to be handled by your server is the dynamically generated HTML document, if it's specific to the user (because they're logged in). All other 49 resource requests are identical for all visitors and can easily be shunted off to a CDN. Those will just hit your server once [per region], and then be cached by the CDN and rarely bother your server again. You can even have the CDN cache public HTML documents, e.g. for non-logged in users, you can let the CDN cache HTML documents for ~5 seconds, depending on how up-to-date you want your site to appear; so the CDN can handle an entire browsing session without hitting your server at all.
If you have roughly one new upload per second, that means there are likely an order of magnitude more passive visitors per second. If you can let a CDN handle ~99% of requests, that's a dramatic reduction in actual hits to your server. If you are clever with what you cache and for how long, and depending on your particular mix of anonymous and authenticated users, you can easily reduce server load by an order of magnitude or two. On the other side, you're speeding up page load times accordingly for your visitors.
For every single HTML document and other asset, really think whether this can be cached and for how long:
For HTML documents, is the user logged in? If no, and there's no other specific cookie tracking or similar things going on, then the asset is static and public for all intents and purposes and can be cached. Decide on a maximum age for the document and let the CDN cache it. Even caching it for just a second makes a giant difference when you get 1000 hits per second.
If the user is logged in, set the Cache-Control header to private, but still let the visitor's browser cache the document for a few seconds. These headers must be decided upon by your forum software while it's generating the document.
For all other assets which aren't access restricted: let the CDN cache it for a long time and you can practically forget about ever having to serve those particular files ever again. These headers can be statically configured for entire directories in the web server.
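To make the three cases above concrete, here is a rough Python sketch of the header decision. The function name, the `kind` values, and the specific max-age numbers are assumptions for illustration, not recommendations for any particular CDN:

```python
def cache_control_for(kind: str, logged_in: bool = False) -> str:
    """Pick a Cache-Control value for a response.

    `kind` is "html" for generated documents; anything else is treated as a
    static, non-restricted asset. All durations are illustrative assumptions.
    """
    if kind == "html":
        if logged_in:
            # Personalised page: only the visitor's own browser may store it,
            # and only for a few seconds.
            return "private, max-age=5"
        # Anonymous page: identical for everyone, so the CDN may cache it briefly.
        return "public, max-age=5"
    # Static asset: cache for a year (31536000 seconds).
    return "public, max-age=31536000"


anonymous_page = cache_control_for("html")
member_page = cache_control_for("html", logged_in=True)
photo = cache_control_for("image")
```

The HTML cases would be set by the forum software per request, while the static-asset value is the kind of thing that can be configured once per directory in the web server.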

cache manifest uncache

Now that I've successfully cached my web page, how do I uncache it after making a change?
My users can't download the latest version, even after I've changed a comment in my cache.manifest file.
My server is an IIS server.
The thing with caching is, well, stuff gets cached. Browsers won't, in general, try to download anything you've told them to cache until the cached items expire.
If you set everything to cache for a certain time span, the browser won't try to download any of the cached items until the end of it, which includes the cache.manifest file itself, by the sound of it.
Typically, you don't want to cache the content of the website, because that makes it hard to change. Instead, you want to cache the various pieces, like images, CSS, and JavaScript, that the pages of your site need. If you do this right, you can get a huge benefit for your users and still have control over those resources, since you can always link to a different version of a particular resource in the content of the pages.
That said, if you do need to cache some portions of your pages, you can use server-side caching to reuse portions that are expensive to put together.
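One possible shape for that kind of server-side fragment caching, sketched in Python. `FragmentCache` and its injectable clock are hypothetical names, and the cache is a plain in-process dict:

```python
import time
from typing import Callable, Dict, Tuple


class FragmentCache:
    """Keep expensive page fragments for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_build(self, key: str, build: Callable[[], str]) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # still fresh: reuse it
        html = build()                           # missing or expired: rebuild
        self._entries[key] = (now + self._ttl, html)
        return html


fake_now = [0.0]
builds = []


def build_sidebar() -> str:
    builds.append(1)
    return f"<ul>sidebar build #{len(builds)}</ul>"


cache = FragmentCache(ttl_seconds=5, clock=lambda: fake_now[0])
a = cache.get_or_build("sidebar", build_sidebar)   # built
b = cache.get_or_build("sidebar", build_sidebar)   # reused
fake_now[0] = 6.0                                  # past the 5-second lifetime
c = cache.get_or_build("sidebar", build_sidebar)   # rebuilt
```

Because the fragment expires on its own, changes show up after at most the chosen lifetime without any explicit invalidation.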

Lazy HTTP caching

I have a website which is displayed to visitors via a kiosk. People can interact with it. However, since the website is not locally hosted, and uses an internet connection - the page loads are slow.
I would like to implement some kind of lazy caching mechanism such that as and when people browse the pages - the pages and the resources referenced by the pages get cached, so that subsequent loads of the same page are instant.
I considered using HTML5 offline caching - but it requires me to specify all the resources in the manifest file, and this is not feasible for me, as the website is pretty large.
Is there any other way to implement this? Perhaps using HTTP caching headers? I would also need some way to invalidate the cache at some point to "push" the new changes to the browser...
The usual approach to handling problems like this is with HTTP caching headers, combined with smart construction of URLs for resources referenced by your pages.
The general idea is this: every resource loaded by your page (images, scripts, CSS files, etc.) should have a unique, versioned URL. For example, instead of loading /images/button.png, you'd load /images/button_v123.png and when you change that file its URL changes to /images/button_v124.png. Typically this is handled by URL rewriting over static file URLs, so that, for example, the web server knows that /images/button_v124.png should really load the /images/button.png file from the web server's file system. Creating the version numbers can be done by appending a build number, using a CRC of file contents, or many other ways.
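For illustration, here is one way the CRC variant could look in Python. The `_v<crc>` naming scheme and both helper names are assumptions; in practice the reverse mapping is usually a web-server rewrite rule rather than application code:

```python
import re
import zlib

# Matches "<stem>_v<8 hex digits><.ext>", e.g. /images/button_v352441c2.png
_VERSIONED = re.compile(r"^(?P<stem>.+)_v[0-9a-f]{8}(?P<ext>\.[^./]+)$")


def versioned_path(path: str, content: bytes) -> str:
    """Insert a CRC32 of the file contents before the extension.

    Assumes `path` has a file extension.
    """
    stem, dot, ext = path.rpartition(".")
    return f"{stem}_v{zlib.crc32(content):08x}{dot}{ext}"


def original_path(path: str) -> str:
    """Map a versioned URL back to the real file; other paths pass through."""
    match = _VERSIONED.match(path)
    return match["stem"] + match["ext"] if match else path


url_before = versioned_path("/images/button.png", b"old pixels")
url_after = versioned_path("/images/button.png", b"new pixels")
on_disk = original_path(url_after)
```

Both URLs resolve to the same file on disk, but the browser sees them as different resources, so an edited file is fetched fresh while an unchanged one stays cached.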
Then you need to make sure that, wherever URLs are constructed in the parent page, they refer to the versioned URL. This obviously requires dynamic code used to construct all URLs, which can be accomplished either by adjusting the code used to generate your pages or by server-wide plugins which affect all text/html requests.
Then you set the Expires header for all resource requests (images, scripts, CSS files, etc.) to a date far in the future (e.g. 10 years from now). This effectively caches them forever. This means that all resources loaded by each of your pages will always be fetched from cache; cache invalidation never happens, which is OK because when the underlying resource changes, the parent page will use a new URL to find it.
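A small Python sketch of what those far-future headers might look like. The `far_future_headers` helper is hypothetical, and it also sends `Cache-Control: max-age`, which takes precedence over Expires when both are present:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Dict


def far_future_headers(now: datetime, days: int = 3650) -> Dict[str, str]:
    """Build long-lived caching headers for a versioned static resource.

    `now` must be a timezone-aware UTC datetime.
    """
    expires = now + timedelta(days=days)
    return {
        # HTTP dates use the fixed "Tue, 31 Dec 2024 00:00:00 GMT" format.
        "Expires": format_datetime(expires, usegmt=True),
        "Cache-Control": f"public, max-age={days * 86400}",
    }


headers = far_future_headers(datetime(2024, 1, 1, tzinfo=timezone.utc), days=365)
# headers["Expires"] == "Tue, 31 Dec 2024 00:00:00 GMT"
```

This only makes sense for the versioned resource URLs; the parent page needs a much shorter lifetime so it can pick up new resource URLs.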
Finally, you need to figure out how you want to cache your "parent" pages. How you do this is a judgement call. You can use ETag/If-None-Match HTTP headers to check for a new version of the page every time, which will very quickly load the page from cache if the server reports that it hasn't changed. Or you can use Expires (and/or Cache-Control: max-age) to load the parent page from cache for a given period of time before checking the server again.
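The ETag route can be sketched like this in Python. It is deliberately simplified: `respond` is a hypothetical handler, the ETag is just a hash of the body, and a real If-None-Match header may carry several tags or `*`, which this ignores:

```python
import hashlib
from typing import Dict, Optional, Tuple


def respond(body: bytes,
            if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a page, honouring If-None-Match."""
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    # "no-cache" lets the browser keep a copy but makes it revalidate each time.
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match == etag:
        return 304, headers, b""   # unchanged: browser reuses its cached copy
    return 200, headers, body      # new or changed: send the full page


page_v1 = b"<html>kiosk home, version 1</html>"
page_v2 = b"<html>kiosk home, version 2</html>"

status_first, headers_first, _ = respond(page_v1, None)
status_repeat, _, body_repeat = respond(page_v1, headers_first["ETag"])
status_changed, _, body_changed = respond(page_v2, headers_first["ETag"])
```

The repeat visit still costs a round trip to the server, but the 304 response has no body, which is what makes it fast on a slow connection.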
If you want to do something even more sophisticated, you can always put a custom proxy server on the kiosk-- in that case you'd have total, centralized control over how caching is done.

Clear all website cache?

Is it possible to clear all site cache? I would like to do this when the user logs out or the session expires instead of instructing the browser not to cache on each request.
As far as I know, there is no way to instruct the browser to clear all the pages it has cached for your site. The only control that you, as a website author, have over caching of a page occurs when the browser tries to access that page. You can specify that cached versions of your pages should expire at a certain time using the Expires header, but even then the browser won't actually clear the page from its cache at that time.
I certainly hope not - that would give the web site destructive powers over the client machine!
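If security is your main concern here, why not use HTTPS and send Cache-Control: no-store on sensitive responses? Don't rely on HTTPS alone for this: current browsers cache content received via HTTPS like any other response unless the headers tell them not to.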
One tricky way to mimic this would be to include the session-id as a parameter when referencing any static piece of content on the site. When the user establishes the session, the browser will treat all the pieces of content as new due to the inclusion of this parameter. For the duration of the session the browser will use the static content in its cache. After the user logs out and logs back in again, the session-id parameter for the static content will be different, so the browser will treat it as completely new content and download everything again.
That being said, this is a hack and I wouldn't recommend pursuing it. Why do you want the user's cache to be cleared after their session expires? There's probably a better solution for your situation than the one you are currently asking for.
If you are talking about ASP.NET cache objects, you can use this:
' Cache keys are strings; ToString() keeps this valid under Option Strict On
For Each elem As DictionaryEntry In Cache
    Cache.Remove(elem.Key.ToString())
Next
to remove items from the cache, but that may not be the full extent of what you are trying to accomplish.
