Azure storage and CDN and getting blob - caching

I have problem with caching in CDN for Azure Storage. I have set up storage, add CDN on it, and put custom domain.
And all works fine, until recently, when one image I uploaded, get stuck on CDN. When I'm getting to it without CDN all works fine, but over CDN it always shown old image. I tried everything, I put custom cache expiration, I deleted, I moved it... But nothing works. And I waited for one day, maybe will fix by Auzura automatically, or some caching will expiry, but nothing.
Does anybody had similar problem before? How to fix it?
All other images (blobs) in same container works fine.

You have to be really careful when planning the CDN usage. When you enable for CDN, you have to be in full control of all and any Blob files that will be served through the CDB.
This is by being explicit about setting x-ms-blob-cache-control property on a Put Blob, Put Block List, or Set Blob Properties request, or use the Azure Managed Library to set the BlobProperties.CacheControl property.
In case you forgot to set this property before the file was accessed by the CDN, the CDN assumes 7 days as TTL (Time-to-live for that file). Any consequent change of the settings (cache-control property of the blob) will not take effect until after the 7 days TTL elapses. I believe you have accidentally entered into this default 7 days TTL (hoping it is not the worst - wrongly set cache-expire header with a longer period)
You can read more on best practices for controlling CDN content here. And I warmly ask you to give your 3 votes at this feature request.

Related

Website is different when I upload on FTP

I have just started developing for a few weeks now and I bought a domain, but when I upload the files on live, the website looks different than what I have uploaded. Now, this gets fixed when I clear my cache. The problem is that my visitors enter, they see the page in a way, and after I update it they see it as the previous version!
Is there any possible solution for this? I don't want my visitors to clear cache every time I make a change on my website!
This is quite probable to be due to css cache. Your server is loading a cached version. You can specify the cached time in a few ways. Etags and htaccess (on apache) are the most common.
A very simple trick is just to add at the end of your style link url (where you load your main style in the head of the document) a get-like parameter: just like this:
main.css?v=2

Employing a CDN for a dynamic website

I have a website forum where users exchange photos and text with one another on the home page. The home page shows 20 latest objects - be they photos or text. The 21st object is pushed out out of view. A new photo is uploaded every 5 seconds. A new text string is posted every second. In around 20 seconds, a photo that appeared at the top has disappeared at the bottom.
My question is: would I get a performance improvement if I introduced a CDN in the mix?
Since the content is changing, it seems I shouldn't be doing it. However, when I think about it logically, it does seem I'll get a performance improvement from introducing a CDN for my photos. Here's how. Imagine a photo is posted, appearing on the page at t=1 and remaining there till t=20. The first person to access the page (closer to t=1) will enable to photo to be pulled to an edge server. Thereafter, anyone accessing the photo will be receiving it from the CDN; this will last till t=20, after which the photo disappears. This is a veritable performance boost.
Can anyone comment on what are the flaws in my reasoning, and/or what am I failing to consider? Also would be good to know what alternative performance optimizations I can make for a website like mine. Thanks in advance.
You've got it right. As long as someone accesses the photo within the 20 seconds that the image is within view it will be pulled to an edge server. Then upon subsequent requests, other visitors will receive a cached response from the nearest edge server.
As long as you're using the CDN for delivering just your static assets, there should be no issues with your setup.
Additionally, you may want to look into a CDN which supports HTTP/2. This will provide you with improved performance. Check out cdncomparison.com for a comparison between popular CDN providers.
You need to consider all requests hitting your server, which includes the primary dynamically generated HTML document, but also all static assets like CSS files, Javascript files and, yes, image files (both static and user uploaded content). An HTML document will reference several other assets, each of which needs to be downloaded separately and thus incurs a server hit. Assuming for the sake of argument that each visitor has an empty local cache, a single page load may incur, say, ~50 resource hits for your server.
Probably the only request which actually needs to be handled by your server is the dynamically generated HTML document, if it's specific to the user (because they're logged in). All other 49 resource requests are identical for all visitors and can easily be shunted off to a CDN. Those will just hit your server once [per region], and then be cached by the CDN and rarely bother your server again. You can even have the CDN cache public HTML documents, e.g. for non-logged in users, you can let the CDN cache HTML documents for ~5 seconds, depending on how up-to-date you want your site to appear; so the CDN can handle an entire browsing session without hitting your server at all.
If you have roughly one new upload per second, that means there is likely a magnitude more passive visitors per second. If you can let a CDN handle ~99% of requests, that's a dramatic reduction in actual hits to your server. If you are clever with what you cache and for how long and depending on your particular mix of anonymous and authenticated users, you can easily reduce server loads by a magnitude or two. On the other side, you're speeding up page load times accordingly for your visitors.
For every single HTML document and other asset, really think whether this can be cached and for how long:
For HTML documents, is the user logged in? If no, and there's no other specific cookie tracking or similar things going on, then the asset is static and public for all intents and purposes and can be cached. Decide on a maximum age for the document and let the CDN cache it. Even caching it for just a second makes a giant difference when you get 1000 hits per second.
If the user is logged in, set the cache pragma to private, but still let the visitor's browser cache it for a few seconds. These headers must be decided upon by your forum software while it's generating the document.
For all other assets which aren't access restricted: let the CDN cache it for a long time and you can practically forget about ever having to serve those particular files ever again. These headers can be statically configured for entire directories in the web server.

How to cache images and html files in PhoneGap

I need a way for cache images and html files in PhoneGap from my site. I'm planning that users will see site without internet connection like it will be with it. But I see information only about sql data storing, but how can I store images (and use later).
To cache images check out this library -of which I'm the creator-:
imgcache.js
. It's designed for the very purpose of caching images using the local filesystem. If you check out the examples you will see that it can also detect when an image fails to be loaded (because you're offline or you have a very bad connection) and then replaces it automatically with the cached image. The user of the webapp doesn't even notice it's offline.
As for html pages.. if they're html static files, they could be stored locally in the web app (file:// in phonegap).
If they're dynamically generated pages, check the localStorage API if you have a small amount of data, otherwise the filesystem API.
For my web app I retrieve only json data from my server (and process/render it using Backbone+Underscore). The json payload is stored into the localStorage. If the application gets offline, it will fetch json data from the localStorage instead of the server (home-baked fork of Backbone.dualStorage)
You then get the full offline experience: pages+images.
Caching like you might need for simple offline operation is not exactly that easy.
Your first option is the cache manifest. It has some limitations (like the size of the cache) but might work for you since it was designed to do what you want.
Another options is that you can store content on the disk of the device using the file system APIs. This has some drawbacks like security and the fact that you have to load the file from a path / url that is different than you might normally load it from on the web. Check out the hydra plugin for an example of this.
One final option might be to store stuff in localStorage (which has the benefit of being private on all platforms) and then pull it out of there when needed ... that means base64'ing all your images tho so that is a pretty big departure from just standard caching.
Caching is very much possible on Android OS. but on Apple as stated above there are limitations with the size of the images and cache size etc.
If you are willing to integrate and allow the caching on iOS you can use "cache manifest" to do so. but keep the draw backs and limitations in mind.
Also
if you want to save the file to Documents folder under my App, Apple will reject your App. The reason is the system backup all data under Documents folder to iCould after iOS6, so Apple does not allow big data like images or JSON file which could sync from your server again to keep in this folder.
So there is another work around which is good So one can use LocalFileSystem.TEMPORARY instead. It does not save the data to Library/Cache, but it save data to temp folder of App, which does not been auto backup to iCloud and not auto deleted either.
Regards
Rajeev

Azure Blob storage, CDN and cache expiring

We're using Azure CDN, but we've stumbled upon a problem. Before, content could not be updated. But we added the option for our users to crop their picture, which changes the thumbnails. See, the image is not being created as new, instead we just update the stream of the blob.
There doesn't seem to be any method to clear the cache, update any headers or anything else.
Is the only answer here to make a new blob and delete the old?
Thanks.
CDN would still cache the content, unless the cache-expiry passes, or the file name changes.
CDN is best for static content with a high cache hit ratio.
Using CDN for dynamic content is not recommended, because it causes the user to wait for a double hop from storage to cdn and from cdn to user.
You also pay twice the bandwidth on the initial load.
I guess the only workaround right now is to pass a dummy parameter in the request from the client to force download the file every time.
http://resourceurl?dummy=dummyval

client-side file caching

If I understand correctly, a broswer caches images, JS files, etc. based on the file name. So there's a danger that if one such file is updated (on the server), the browser will use the cached copy instead.
A workaround for this problem is to rename all files (as part of the build), such that the file name includes an MD5 hash of it's contents, e.g.
foo.js -> foo_AS577688BC87654.js
me.png -> me_32126A88BC3456BB.png
However, in addition to renaming the files themselves, all references to these files must be changed. For exmaple a tag such as <img src="me.png"/> should be changed to <img src="me_32126A88BC3456BB.png"/>.
Obviously this can get pretty complicated, particularly when you consider that references to these files may be dynamically created within server-side code.
Of course, one solution is to completely disable caching on the browser (and any caches between the server and the browser) using HTTP headers. However, having no caching will create it's own set of problems.
Is there a better solution?
Thanks,
Don
The best solution seems to be to version filenames by appending the last-modified time.
You can do it this way: add a rewrite rule to your Apache configuration, like so:
RewriteRule ^(.+)\.(.+)\.(js|css|jpg|png|gif)$ $1.$3
This will redirect any "versioned" URL to the "normal" one. The idea is to keep your filenames the same, but to benefit from cache. The solution to append a parameter to the URL will not be optimal with some proxies that don't cache URLs with parameters.
Then, instead of writing:
<img src="image.png" />
Just call a PHP function:
<img src="<?php versionFile('image.png'); ?>" />
With versionFile() looking like this:
function versionFile($file){
$path = pathinfo($file);
$ver = '.'.filemtime($_SERVER['DOCUMENT_ROOT'].$file).'.';
echo $path['dirname'].'/'.str_replace('.', $ver, $path['basename']);
}
And that's it! The browser will ask for image.123456789.png, Apache will redirect this to image.png, so you will benefit from cache in all cases and won't have any out-of-date issue, while not having to bother with filename versioning.
You can see a detailed explanation of this technique here: http://particletree.com/notebook/automatically-version-your-css-and-javascript-files/
Why not just add a querystring "version" number and update the version each time?
foo.js -> foo.js?version=5
There still is a bit of work during the build to update the version numbers but filenames don't need to change.
Renaming your resources is the way to go, although we use a build number and embed that in to the file name instead of an MD5 hash
foo.js -> foo.123.js
as it means that all your resources can be renamed in a deterministic fashion and resolved at runtime.
We then use custom controls to generate links to resources at on page load based upon the build number which is stored in an app setting.
We followed a similar pattern to PJP, using Rails and Nginx.
We wanted user avatar images to be browser cached, but on an avatar's change we needed the cache to be invalidated ASAP.
We added a method to the avatar model to append a timestamp to the file name:
return "/images/#{sourcedir}/#{user.login}-#{self.updated_at.to_s(:flat_string)}.png"
In all places in the code where avatars were used, we referenced this method rather than an URL. In the Nginx configuration, we added this rewrite:
rewrite "^/images/avatars/(.+)-[\d]{12}.png" /images/avatars/$1.png;
rewrite "^/images/small-avatars/(.+)-[\d]{12}.png" /images/small-avatars/$1.png;
This meant if a file changed, its URL in the HTML changed, so the user's browser made a new request for the file. When the request reached Nginx, it got rewritten to the simple name of the file.
I would suggest using caching by ETags in this situation, see http://en.wikipedia.org/wiki/HTTP_ETag. You can then use the hash as the etag. A request will still be submitted for each resource, but the browser will only download items that have changed since last download.
Read up on your web server / platform docs on how to use etags properly, most decent platforms have built-in support.
Most modern browsers check the if-modified-since header whenever a cacheable resource is in a HTTP request. However, not all browsers support the if-modified-since header.
There are three ways to "force" the browser to load a cached resource.
Option 1 Create a query string with a version#. src="script.js?ver=21". The downside is many proxy servers wont cache a resource with query strings. It also requires site-wide updating for changes.
Option 2 Create a naming system for your files src="script083010.js". However the downside to option 1 is that this as well requires site-wide updates whenever a file changes.
Option 3 Perhaps the most elegant solution, simply set up the caching headers: last-modified and expires in your server. The main downside to this is users may have to recache resources because they expired yet never changed. Additionally, the last-modified header does not work well when content is being served from multiple servers.
Here a few resources to check out: Yahoo Google AskApache.com
This is really only an issue if your web server sets a far-future "Expires" header (setting something like ExpiresDefault "access plus 10 years" in your Apache config). Otherwise, a browser will make a conditional GET, based on the modified time and/or the Etag. You can verify what is happening on your site by using a web proxy or an extension like Firebug (on the Net panel). Your question doesn't mention how your web server is configured, and what headers it is sending with static files.
If you're not setting a far-future Expires header, there's nothing special you need to do. Your web server will usually handle conditional GETs for static files based on last modified time just fine. If you are setting a far-future Expires header then yes, you need to add some sort of version to the file name like your question and the other answers have mentioned already.
I have also been thinking about this for a site I support where it would be a big job to change all references. I have two ideas:
1.
Set distant cache expiry headers and apply the changes you suggest for the most commonly downloaded files. For other files set the headers so they expire after a very short time - eg. 10 minutes. Then if you have a 10 minute downtime when updating the application, caches will be refreshed by the time users go to the site. General site navigation should be improved as the files will only need downloading every 10 minutes not every click.
2.
Each time a new version of the application is deployed to a different context that contains the version number. eg. www.site.com/app_2_6_0/ I'm not really sure about this as users bookmarks would be broken on each update.
I believe that a combination of solutions works best:
Setting cache expiry dates for each type of resource (image, page, etc) appropreatly for that resource, for example:
Your static "About", "Contact" etc pages probably arn't going to change more than a few time a year, so you could easily put a cache time of a month on these pages.
Images used in these pages could have eternal cache times, as you are more likey to replace an image then to change one.
Avatar images might have an expiry time of a day.
Some resources need modified dates in their names. For example avatars, generated images, and the like.
Some things should never be caches, new pages, user content etc. In these cases you should cache on the server, but never on the client side.
In the end you need to carfully consider each type of resource to determine what cache time to instruct the browser to use, and always be conservitive if you are unsure. You can increase the time later, but it's much more pain to uncache something.
You might want to check out the approach taken by the grails "uiperformance" plugin, which you can find here. It does a lot of the things you mention, but automates them (set expiry time to a long time, then increments version numbers when files change).
So if you're using grails, you get this stuff for free. If you are not - maybe you can borrow the techniques employed.
Also - borrowed form the ui-performance page, - read the following 14 rules.
ETags seemingly provide a solution for this...
As per http://httpd.apache.org/docs/2.0/mod/core.html#fileetag, we can set the browser to generate ETags on file-size (instead of time/inode/etc). This generation should be constant across multiple server deployments.
Just enable it in (/etc/apache2/apache2.conf)
FileETag Size
& you should be good!
That way, you can simply reference your images as <img src='/path/to/foo.png' /> and still use all the goodness of HTTP caching.

Resources