For a web app, what's the best way to ensure users get the latest code (avoiding cache issues)?

There have been cases where things are getting cached and users have to manually clear the cache in order to use the latest version of the app. What's the best strategy for dealing with this?

There are a whole bunch of HTTP headers which deal with caching behaviour.
Configuring this correctly is possibly more of an art than a science - without any caching, your site will feel very slow to end users, and you'll have to deal with a LOT of traffic.
However, with aggressive caching, you may end up serving stale content, or old versions of your JavaScript and CSS.
A common solution to this is to use ETags and conditional "If-Modified-Since" requests - this means the browser will check whether a resource in its cache is out of date by asking the server if it has a newer version, and will only download the new version if it needs to.
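To make that concrete, here's a minimal sketch of the server side of that exchange, using Node's built-in http module (the date and the served body are purely illustrative):

// Minimal sketch of conditional "If-Modified-Since" handling, using
// Node's built-in http module. Date and response body are illustrative.
const http = require('http');
const lastModified = new Date('2015-06-01T00:00:00Z');

http.createServer((req, res) => {
  const since = req.headers['if-modified-since'];
  if (since && new Date(since) >= lastModified) {
    res.writeHead(304); // the browser's cached copy is still fresh...
    return res.end();   // ...so no body is sent and nothing is re-downloaded
  }
  res.writeHead(200, {
    'Last-Modified': lastModified.toUTCString(),
    'Cache-Control': 'no-cache' // always revalidate, but reuse if unchanged
  });
  res.end('/* latest app.js */');
}).listen(8080);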

Do websites share cached files?

We were doing optimizations on our web project when our lead told us to push the use of CDNs for external libraries, as opposed to including them in a compile+compress process and shipping them off a cache-enabled nginx setup.
His assumption is that if the user visits example.com, which uses a CDN'ed version of jQuery, the jQuery is cached at that point. If the user then visits example2.com, which happens to use the same CDN'ed jQuery, the jQuery will be loaded from cache instead of over the network.
So my question is: Do domains actually share their cache?
I argued that even if the browser does share cache, the problem is that we are running on the assumption that previous sites used the exact same CDN'ed file from the exact same CDN. What are the chances of running into a user who has browsed a site using the same CDN'ed file? He said to use the largest CDN to increase the chances.
So the follow-up question would be: If the browser does share cache, is it worth the hassle to optimize based on his assumption?
I have looked up topics about CDNs and I have found nothing about this "shared domain cache" or CDNs being used this way.
Well, your lead is right; this is basic HTTP.
All you are doing is indicating to the client where it can find the file.
The client then handles sending a request to the CDN in compliance with their caching rules.
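As a sketch, the whole technique is just a tag pointing at the shared URL (the version shown is illustrative):

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js"></script>

The CDN's response carries its own caching headers, so any site referencing that exact URL can be served the same cached copy.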
But you shouldn't over-use CDNs for libraries either; keep in mind that if you need a specific version of a library, especially an older one, you're unlikely to get many cache hits, because of version fragmentation.
For widely used and heavy libraries like jQuery, using the latest version is recommended.
If you can take them all from the same CDN, all the better (e.g. Google's), especially as HTTP/2 is coming.
Additionally, they save you bandwidth, which can amount to a lot when you have high traffic, and they can reduce load times for users far from your server (Google's CDN is great for this).

Sitecore with DMS vs caching server - how do you handle it?

We're planning to introduce DMS to our customer's Sitecore installation. It's a rather popular site in our country and we have to use proxy caching server (it's Nginx in this case) to make it high-traffic-proof.
However, as far as we know, it's not possible to use all the DMS features with a caching proxy enabled - for example, personalization of content: if it gets cached, it won't be personalized.
Is there a way to make use of all the DMS features with proxy cache turned on? If not, how do you handle this problem for high-traffic sites - is it buying more Content Delivery servers to carry the load, or extending current server with better hardware (RAM, CPU, bandwidth)?
You might try moving away from your proxy caching for some pages, or even all of them. A few suggestions:
- There's no reason not to use a CDN for static assets and media library assets, so stick with that.
- Leverage Sitecore's built-in HTML cache for sublayouts/renderings - there are quite a few options for caching.
- Use Sitecore's Debug feature to track down the slowest components on your site.
- Consider using indexes instead of doing "fast" or Sitecore queries.
- Don't do descendants queries ("//*") - I often see this when calculating the selected state for navigation. Hint: go the other way and calculate the ancestors of the current page.
@jammykam wrote an excellent answer on this over here.
John West wrote a great blog post on this also, though a bit older.
Good luck!
I've been wondering about this myself.
I have been thinking of implementing an AJAX web service (see the sketch after this list) that:
- talks to the DMS and returns JSON
- allows you to render the personalised components client side
- allows you to trigger analytics events
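As a rough illustration only - the endpoint paths and payload shape here are hypothetical, not an existing Sitecore/DMS API - the client side of such a service might look like this:

// Hypothetical endpoints - not an existing Sitecore/DMS API.
// Fetch personalised markup for one component, render it into the
// cached page shell, then report an analytics event back to the service.
fetch('/api/personalisation?component=hero-banner', { credentials: 'include' })
  .then(function (res) { return res.json(); })
  .then(function (data) {
    document.getElementById('hero-banner').innerHTML = data.html;
    return fetch('/api/analytics/trigger', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ event: 'impression', component: 'hero-banner' })
    });
  });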
I have been googling around and I haven't found anyone who has done this and published the details yet. The only place I have found something similar is actually in the Mobile SDK, but I haven't had a chance to delve into it yet.
I have also not been able to use proxy server caching and DMS together successfully. For extremely high loads, I have recommended to clients to follow the standard optimization and scaling guidelines, especially architecting for proper Sitecore sublayout and layout caching for as much of the site as possible. With that caching done, follow it up by distributing across multiple Content Delivery nodes with load balancing to help support high volume with personalization at the same time.
I've heard that other CMSes with personalization use a JavaScript approach to load the personalized content client-side, but I would be worried about losing track of the analytics data that is gathered when personalized content is loaded and interacted with.

Server side caching in openrasta

I have a site that is being polled rather hard for the JSON representation of the same resources from multiple clients (browsers, other applications, unix shell scripts, python scripts, etc).
I would like to add some caching such that some of the resources are cached within the server for a configurable amount of time, to avoid the CPU hit of handling the request and serialising the resource to JSON. I could of course cache them myself within the handlers, but would then take the serialisation hit on every request, and would have to modify loads of handlers too.
I have looked at the openrasta-caching module, but I think this is only for controlling the browser cache?
So any suggestions for how I can get openrasta to cache the rendered representation of the resource, after the codec has generated it?
Thanks
openrasta-caching does have preliminary support for server-side caching, with an API you can map to the ASP.NET server-side cache using the ServerCaching attribute. That said, it's not complete - neither is openrasta-caching as a whole, for that matter. It's a 0.2 that would need a couple of days of work to become a good v1 that completely supports all the scenarios I wanted to cover that the ASP.NET caching infrastructure doesn't currently support (mainly, making caching in OpenRasta work exactly like an HTTP intermediary rather than the object- and .NET-centric model that exists in ASP.NET land, including client control of the server cache for those times you want the client to be allowed to force the server to redo the query). As I currently have no client project working on caching, it's difficult to justify any further work on that plugin, so I'm not coding anything for it right now. I do have 4 free days coming up though, so DM me if you want openrasta-caching bumped to 0.3 with whatever requirements you have that would fit 4 days of work.
You can implement something simpler using an IOperationInterceptor and plugging in the asp.net pipeline using that, or be more web-friendly and implement your caching out of process using squid and rely on openrasta-caching for generating the correct http caching instructions.
That said, for your problem you may not even need server-side caching if the cost is in the JSON serialisation. If you map a last-modified date or an ETag to what the handler returns, OpenRasta will correctly generate a 304 where appropriate and bypass the JSON rendering altogether, provided your client makes conditional requests (and it should).
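For instance, a minimal sketch of a conditional-request polling client (the resource URL and polling interval are illustrative):

// Polling client that sends conditional requests, so the server can
// answer 304 and skip JSON serialisation entirely.
let etag = null;

async function poll() {
  const headers = etag ? { 'If-None-Match': etag } : {};
  const res = await fetch('/resources/42', { headers });
  if (res.status === 304) return; // unchanged: nothing to deserialise
  etag = res.headers.get('ETag');
  console.log('resource changed:', await res.json());
}

setInterval(poll, 5000);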
There's also a proposed API to allow you to further optimize your API by doing a first query on last modified/etag to return the 304 without retrieving any data.

Is it better to use Cache or CDN?

I was studying browser performance when loading static files, and this question came up.
Some people say that using CDN-hosted static files (e.g. Google Code, jQuery latest, AJAX CDN, ...) is better for performance, because they are requested from a different domain than the web page itself.
Another way to improve performance is to set the Expires header to some months in the future, forcing the browser to cache the static files and cutting down on requests.
I'm wondering which approach is best for performance, and whether I may combine both.
Ultimately it is better to employ both techniques if you are doing web performance optimization (WPO) of a site, also known as front-end optimization (FEO). They can work amazingly well hand in hand. Although if I had to pick one over the other, I'd definitely pick caching any day. In fact, I'd say it's imperative that you set up proper resource caching for all web projects, even if you are going to use a CDN.
Caching
Setting Expires headers and caching your resources is a must and should be done 100% of the time. There really is no excuse for not caching. On Apache this is super easy to configure after enabling mod_expires and mod_headers. The HTML5 Boilerplate project has a good implementation example in its .htaccess file, and if your server is something else like nginx, lighttpd or IIS, check out these other server configs.
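As a minimal sketch of what that looks like with mod_expires enabled (the types and lifetimes here are illustrative - tune them to your assets):

<IfModule mod_expires.c>
  ExpiresActive On
  # Cache static assets for a month; HTML is revalidated on every visit.
  ExpiresByType image/png "access plus 1 month"
  ExpiresByType text/css "access plus 1 month"
  ExpiresByType application/javascript "access plus 1 month"
  ExpiresByType text/html "access plus 0 seconds"
</IfModule>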
Here's a good read if anyone is interested in learning about caching: Mark Nottingham's Caching Tutorial
Content Delivery Network
You mentioned Google Code, jQuery latest and AJAX CDN, and I want to touch on CDNs in general - including the ones you pay for and host your own resources on - but the same applies if you are simply using the jQuery-hosted files CDN or loading something from http://cdnjs.com/, for example.
I would say a CDN is less important than setting server-side caching headers, but a CDN can still provide significant performance gains. Your content delivery network performance will vary depending on the provider.
This is especially true if your traffic is a worldwide audience and the CDN provider has many worldwide edge/peer locations. It will also significantly reduce your web hosting bandwidth, and your CPU usage (a bit), since you're offloading some of the resource delivery work to the CDN.
A CDN can, in some rarer cases, have a negative impact on performance, if the latency of the CDN ends up being higher than that of your server. Also, if you over-optimize and employ too much parallelization of resources (using multiple subdomains like cdn1, cdn2, cdn3, etc.), it is possible to end up slowing down the user experience and causing overhead with extra DNS lookups. A good balance is needed here.
One other negative impact that can happen is if the CDN is down. It has happened, and will happen again - this is especially true with free CDNs. If the CDN goes down, for whatever reason, so does your site. It is yet another potential single point of failure (SPOF). For JavaScript resources you can get clever: load the resource from the CDN and, should that fail for whatever reason, detect it and load a local copy. Here's an example of loading jQuery from ajax.googleapis.com with a fallback (taken from the HTML5 Boilerplate):
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.8.2/jquery.min.js"></script>
<script>window.jQuery || document.write('<script src="js/vendor/jquery-1.8.2.min.js"><\/script>')</script>
Besides the obvious free API resources out there (jQuery, Google APIs, etc.), if you're using a CDN you may have to pay a fee for usage, so it is going to add to your hosting costs. For some CDNs you even have to pay extra to get access to certain locations; for example, Asian nodes might cost more than North American ones.
For public applications, go for CDN.
Caching helps for repeated requests, but not for the first request.
To ensure a fast load on the first page visit, use a CDN: chances are pretty good that the file is already cached from a visit to another site.
As others have mentioned already, CDN responses are of course heavily cached too.
However, if you have an intranet website, you might want to host the files yourself, as they typically load faster from an internal source than from a CDN.
You then also have the option to combine several files into one to reduce the number of requests.
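A naive sketch of such a combine step, run at build time (file names are illustrative):

// Naive bundling sketch: concatenate several JS files into one file
// to cut the number of requests. File names are illustrative.
const fs = require('fs');

const files = ['jquery.js', 'plugins.js', 'app.js'];
const bundle = files
  .map((name) => fs.readFileSync(name, 'utf8'))
  .join('\n;\n'); // the semicolon guards against a missing trailing one

fs.writeFileSync('bundle.js', bundle);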
A CDN has the benefit of providing multiple servers and automatically routing your traffic to the closest location to your client. This can result in faster delivery, optimized by location.
Also, static content doesn't require special application servers (like dynamic content does), so offloading it to a CDN removes that traffic from your servers completely. A streaming video clip may be too big to cache, or should not be cached, but you don't necessarily want to carry that bandwidth yourself - a CDN will take on that traffic for you.
It is not always about the cache. A small application web server may just want to serve the dynamic content, but needs a solution for the heavy-hitting media that rarely changes. CDNs handle that scaling issue for you.
Agree with @Anthony_Hatzopoulos (+1)
CDN complements Caching; also in some cases, it will help optimize Caching directives.
For example, a company I work for has integrated behavior-learning algorithms into its CDN, to identify and dynamically cache generated objects.
Ordinarily, these objects would be uncacheable (i.e. served with a Cache-Control: max-age=0 HTTP header). But in this case, the system is able to identify caching opportunities and override the original HTTP header directives (for example: a dynamically generated popular product page that should be cached, or a popular search results page that, while generated dynamically, is still presented time after time in the same form to thousands of users).
And yes, before you ask, the system can also identify personalized data and verify freshness, to prevent false positives... :)
Implementing such an algorithm was only possible due to a reverse-proxy CDN technology. This is an example of how CDN and Caching can complement each other, to create better and smarter acceleration solutions.
The expert answers above explain CDN technology and caching perfectly, so I'll just add my personal experience.
I worked on a Joomla VirtueMart site that unfortunately couldn't be updated to newer Joomla and VirtueMart versions, because the product pages had too many customised fields. Once traffic reached about 900 visitors a day, lots of users couldn't put items in their basket, because each order action triggered lots of JS and AJAX calls and took too much time.
After optimising the site we decided to use a CDN, and performance got really good: by GTmetrix's records, the YSlow score was 50% at first, and after optimisation + CDN it went up to 74%.
https://gtmetrix.com/reports/www.florihana.com/jWlY35im
From the CDN's dashboard you can also see which datacenters serve the most traffic and incur the most data charges, which helps you measure the improvement.
That said, you have to be careful when configuring the CDN - watch the purge times and balance the number of CDN resources, because if the CDN goes down and causes problems, you need to be able to figure out which CDN resource is the cause.
Hope this helps.

How long do you cache resources client side?

A website of mine will host the usual images, JavaScript and CSS stylesheets in the database. Since these are unlikely to change each day, I am going to use some client-side caching on them to reduce the server load.
How long do you cache these? A few days? More?
I'm probably not going to reuse the same name twice if I update the resource, so I shouldn't have outdated data concerns.
If you're changing the name when you update, then you have the luxury of caching forever - a big plus if you're okay with the name changing.
It's also a good idea for scripts and CSS in particular, as some client machines will keep serving from cache even if you had ruled otherwise.
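A minimal sketch of "cache forever" headers, assuming a Node/Express server and fingerprinted file names (renaming on every update is what makes this safe):

// Sketch (assumes Node with Express): far-future caching for fingerprinted
// static assets. Safe because every update ships under a new file name.
const express = require('express');
const app = express();

app.use('/static', express.static('public', {
  maxAge: '365d',   // effectively "cache forever"
  immutable: true   // tells browsers not to revalidate within that window
}));

app.listen(3000);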
