Azure CDN "loses" requests - caching

We're using Azure CDN (Verizon Standard) to serve images to ecommerce sites. However, we're seeing an unreasonable number of loads from the origin: images that should have been cached in the CDN are requested again multiple times.
Images seem to stay in the cache if they're requested very frequently (a Pingdom page speed test that runs every 30 minutes doesn't show the problem).
Additionally, if I request an image (using the browser), the scaled image is requested from the origin and delivered, but the second request doesn't return a cached file from the CDN; the origin is called again. The third request is served from the CDN.
The origin is a web app which scales and delivers the requested images. All requests for images have the following headers which might affect caching:
cache-control: max-age=31536000, s-maxage=31536000
ETag: e7bac8d5-3433-4ce3-9b09-49412ac43c12?cache=Always&maxheight=3200&maxwidth=3200&width=406&quality=85
Since we want the CDN to cache the scaled image, Azure CDN Endpoint is configured to cache every unique url and the caching behaviour is "Set if missing" (although all responses have the headers above).
Using the same origin with AWS Cloudfront works perfectly (but since we have everything else in Azure, it would be nice to make it work). I haven't been able to find if there's any limit or constraints for the ETag but since it works with AWS it seems like I'm missing something related to either Azure or Verizon.

Related

Multi-language URL cache and basic Cloudflare setup

This is about the cache strategy for serving multiple languages on the same URL, while using Cloudflare (not enterprise).
(Obviously that's not a problem when navigating with JavaScript with Ajax requests or when using an "hreflang" link)
Our server handles it correctly: the nginx cache can store a separate cached copy per language and serve it based on a cookie or Accept-Language value. The client browser can also differentiate cached copies by language using the ETag header.
But Cloudflare (not Enterprise), only holds one cache per URL, and cannot serve a cache depending on a cookie value.
My fix for this is to use Cache-Control: no-cache, so Cloudflare always checks with my server whether its cached copy is still valid. If the requested language is the same as the one Cloudflare has cached, validation succeeds; if not, Cloudflare receives a new response to cache.
I guess you can see the problem: if a page is constantly requested in different languages, Cloudflare will be replacing its cached copy all the time, on top of always validating with my server.
Do you see any better strategy with the same setup? (Not including: using Cloudflare Enterprise, using my own mini-CDN network of proxy servers, or separating the URLs completely by language.)

Does it make sense to set Cache-Control max-age=0 and s-maxage= not zero?

Somebody commented on this question about caching:
...using a Cache-Control value of: max-age=0, s-maxage=604800 seems to get my desired behavior of instant client updates on new page contents, but still caching at the CDN level
Will I really get caching at CDN level and instant updates for my users?
Does it make sense? How does that combination work?
Yes, it makes sense.
With the configuration mentioned in that comment, responses stored in your users' browsers become stale immediately, so the browser has to revalidate them the next time it makes a request. The CDN, however, will cache a valid response for 604800 seconds (7 days). So repeated requests will mostly be served by the CDN instead of the origin server.
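As a rough illustration of how the two directives are read by different caches, here is a minimal sketch. The function names are made up for this example, and the parser is deliberately simplified (it does not handle quoted values). It relies on the standard rule that s-maxage applies only to shared caches such as a CDN, while a browser cache uses max-age.

```javascript
// Parse a Cache-Control header value into a map of directive -> value.
// Simplified: no support for quoted values.
function parseCacheControl(header) {
  const directives = new Map();
  for (const part of header.split(",")) {
    const [name, value] = part.trim().split("=");
    if (name) {
      directives.set(name.toLowerCase(), value === undefined ? true : value);
    }
  }
  return directives;
}

// Freshness lifetime in seconds for a given kind of cache.
// Shared caches (CDNs, proxies) prefer s-maxage over max-age;
// private caches (browsers) ignore s-maxage.
function freshnessLifetime(header, { shared }) {
  const d = parseCacheControl(header);
  if (shared && d.has("s-maxage")) return Number(d.get("s-maxage"));
  if (d.has("max-age")) return Number(d.get("max-age"));
  return null; // no explicit lifetime given
}

const header = "max-age=0, s-maxage=604800";
console.log("browser:", freshnessLifetime(header, { shared: false })); // browser: 0
console.log("CDN:", freshnessLifetime(header, { shared: true }));      // CDN: 604800
```

The browser sees a lifetime of 0 and revalidates on every use, while the CDN keeps serving its copy for a week unless it is purged.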
But what if you update your app? What happens to the stale cache on the CDN?
After a new deployment, you need to make sure that all stale content cached on the CDN is purged.
For example, see Purging cached resources from Cloudflare: it gives you numerous options on how to do that.
Purge by single-file (by URL)
Purging by single-file through your Cloudflare dashboard
Purge everything
Purge cached resources through the API
etc
Firebase Hosting, for example, will clear all CDN cache after a new deployment:
Any requested static content is automatically cached on the CDN. If you redeploy your site's content, Firebase Hosting automatically clears all your cached static content across the CDN until the next request.
As for the setting suggested in the comment, I think Cache-Control: no-cache would do a better job.
From MDN - Cache Control:
no-cache
The response may be stored by any cache, even if the response is normally non-cacheable. However, the stored response MUST always go through validation with the origin server first before using it, therefore, you cannot use no-cache in-conjunction with immutable. If you mean to not store the response in any cache, use no-store instead. This directive is not effective in preventing caches from storing your response.

Cloudfront queues parallel requests - high and sequential time-to-first-byte (TTFB)

I have a web application that requests a lot of media assets in parallel using AJAX. All assets are coming from the same Cloudfront Origin, which is itself directly plugged into an S3 bucket.
I'm seeing requests from Cloudfront with TTFB of the order of seconds. Even more odd, it seems that those requests are basically queued until a previous request has been served:
Those two requests are initiated in parallel, and you can see that it's not Chrome queueing them, but Cloudfront not answering anything to the second (2KB) request until the first request has completed download. This is slowing down my application by a huge margin, and I cannot figure out what is going wrong... I see the same behavior when I check with Safari too.
Here are the two requests details
As you can see, they are also both Hit from cloudfront.
Finally, as it might be relevant, I'm using a lambda function in my Origin's behavior to add the proper Vary headers, to prevent Chrome from using cached responses without the CORS headers, which would make subsequent CORS requests fail (see details here).
Here is my complete Origin's behavior settings:
Any help is appreciated, and please feel free to ask more details if needed! Thanks a lot in advance.

Amazon Cloudfront: private content but maximise local browser caching

For JPEG image delivery in my web app, I am considering using Amazon S3 (or Amazon Cloudfront
if it turns out to be the better option) but have two, possibly opposing,
requirements:
The images are private content; I want to use signed URLs with short expiration times.
The images are large; I want them cached long-term by the users' browser.
The approach I'm thinking is:
User requests www.myserver.com/the_image
Logic on my server determines the user is allowed to view the image. If they are allowed...
Redirect the browser (is HTTP 307 best ?) to a signed Cloudfront URL
Signed Cloudfront URL expires in 60 seconds but its response includes "Cache-Control max-age=31536000, private"
The problem I foresee is that the next time the page loads, the browser will be looking for
www.myserver.com/the_image but its cache will be for the signed Cloudfront URL. My server
will return a different signed Cloudfront URL the second time, due to very short
expiration times, so the browser won't know it can use its cache.
Is there a way round this without having my webserver proxy the image from Cloudfront (which obviously negates all the
benefits of using Cloudfront)?
Wondering if there may be something I could do with etag and HTTP 304 but can't quite join the dots...
To summarize, you have private images you'd like to serve through Amazon Cloudfront via signed urls with a very short expiration. However, while access by a particular url may be time limited, it is desirable that the client serve the image from cache on subsequent requests even after the url expiration.
Regardless of how the client arrives at the cloudfront url (directly or via some server redirect), the client cache of the image will only be associated with the particular url that was used to request the image (and not any other url).
For example, suppose your signed url is the following (expiry timestamp shortened for example purposes):
http://[domain].cloudfront.net/image.jpg?Expires=1000&Signature=[Signature]
If you'd like the client to benefit from caching, you have to send it to the same url. You cannot, for example, direct the client to the following url and expect the client to use a cached response from the first url:
http://[domain].cloudfront.net/image.jpg?Expires=5000&Signature=[Signature]
There are currently no cache control mechanisms to get around this, including ETag, Vary, etc. The nature of client caching on the web is that a resource in cache is associated with a url, and the purpose of the other mechanisms is to help the client determine when its cached version of a resource identified by a particular url is still fresh.
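To make the point concrete, here is a toy model of a client cache keyed by the full request URL. It is only a sketch: real browser caches are more involved, and the host name, signature values and the `download` callback are placeholders invented for this example.

```javascript
// A toy client cache keyed by the full request URL (query string included).
const cache = new Map();

// `download` is a hypothetical callback that performs the real network fetch.
function fetchWithCache(url, download) {
  if (cache.has(url)) {
    return { body: cache.get(url), fromCache: true };
  }
  const body = download(url);
  cache.set(url, body);
  return { body, fromCache: false };
}

// Placeholder signed URLs for the same image with different expiry times.
const first = "http://example.cloudfront.net/image.jpg?Expires=1000&Signature=aaa";
const second = "http://example.cloudfront.net/image.jpg?Expires=5000&Signature=bbb";
const download = () => "image bytes";

console.log(fetchWithCache(first, download).fromCache);  // false: first request
console.log(fetchWithCache(first, download).fromCache);  // true: same url again
console.log(fetchWithCache(second, download).fromCache); // false: different url
```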
You're therefore stuck in a situation where, to benefit from a cached response, you have to send the client to the same url as the first request. There are potential ways to accomplish this (cookies, local storage, server scripting, etc.), and let's suppose that you have implemented one.
You next have to consider that caching is only a suggestion, not a guarantee. If you expect the client to have the image cached and send it to the original url to benefit from that caching, you run the risk of a cache miss. In the case of a cache miss after the url expiry time, the original url is no longer valid. The client is then left unable to display the image (from the cache or from the provided url).
The behavior you're looking for simply cannot be provided by conventional caching when the expiry time is in the url.
Since the desired behavior cannot be achieved, you might consider your next best options, each of which will require giving up on one aspect of your requirement. In the order I would consider them:
If you give up short expiry times, you could use longer expiry times and rotate urls. For example, you might set the url expiry to midnight and then serve that same url for all requests that day. Your client will benefit from caching for the day, which is likely better than none at all. Obvious disadvantage is that your urls are valid longer.
If you give up content delivery, you could serve the images from a server which checks for access with each request. Clients will be able to cache the resource for as long as you want, which may be better than content delivery depending on the frequency of cache hits. A variation of this is to trade Amazon CloudFront for another provider, since there may be other content delivery networks which support this behavior (although I don't know of any). The loss of the content delivery network may be a disadvantage or may not matter much depending on your specific visitors.
If you give up the simplicity of a single static HTTP request, you could use client side scripting to determine the request(s) that should be made. For example, in javascript you could attempt to retrieve the resource using the original url (to benefit from caching), and if it fails (due to a cache miss and lapsed expiry) request a new url to use for the resource. A variation of this is to use some caching mechanism other than the browser cache, such as local storage. The disadvantage here is increased complexity and compromised ability for the browser to prefetch.
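The client-side scripting option could look roughly like the sketch below. Both `fetchImage` and `requestNewSignedUrl` are hypothetical functions supplied by the caller (for example, a wrapper around the browser's fetch and a call to your own server); they are not part of any CloudFront API.

```javascript
// Try the previously used signed url first so the browser cache can answer.
// If that fails (cache miss after the url has expired), ask for a fresh url.
// `fetchImage(url)` and `requestNewSignedUrl()` are hypothetical async helpers.
async function loadImage(previousUrl, fetchImage, requestNewSignedUrl) {
  if (previousUrl) {
    try {
      const data = await fetchImage(previousUrl);
      return { url: previousUrl, data };
    } catch (err) {
      // Fall through: the old url is no longer usable.
    }
  }
  const freshUrl = await requestNewSignedUrl();
  const data = await fetchImage(freshUrl);
  return { url: freshUrl, data };
}
```

The caller would remember the returned url (for instance in local storage) and pass it as `previousUrl` on the next page load.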
Save a list of user+image+expiration time -> cloudfront links. If a user has a non-expired cloudfront link, use it for the image and don't generate a new one.
It seems you already solved the issue. You said that your server is issuing an HTTP 307 redirect to the CloudFront URL (signed URL), so the browser caches only the CloudFront URL, not your URL (www.myserver.com/the_image). So the scenario is as follows:
Client 1 checks www.myserver.com/the_image -> is redirected to the CloudFront URL -> content is cached
The CloudFront URL now expires.
Client 1 checks www.myserver.com/the_image again -> is redirected to the same CloudFront URL -> retrieves the content from cache without fetching it from CloudFront again.
Client 2 checks www.myserver.com/the_image -> is redirected to the CloudFront URL, which denies access because the signature has expired.
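A minimal in-memory sketch of that idea follows. The class name and the `signUrl` callback are assumptions made for this example; a real implementation would call whatever code you already use to sign CloudFront urls and would probably persist the links somewhere shared.

```javascript
// Remembers the signed link issued to each user for each image and reuses it
// until it expires. `signUrl(imageId, expires)` is a hypothetical callback
// that returns a signed url valid until `expires` (epoch seconds).
class SignedLinkStore {
  constructor(signUrl, ttlSeconds) {
    this.signUrl = signUrl;
    this.ttlSeconds = ttlSeconds;
    this.links = new Map();
  }

  getLink(userId, imageId, nowSeconds) {
    const key = JSON.stringify([userId, imageId]);
    const entry = this.links.get(key);
    if (entry && entry.expires > nowSeconds) {
      return entry.url; // still valid: reuse so the browser cache matches
    }
    const expires = nowSeconds + this.ttlSeconds;
    const url = this.signUrl(imageId, expires);
    this.links.set(key, { url, expires });
    return url;
  }
}
```

Because the same user is sent to the same url while the link is valid, the browser can serve the image from its cache for that period.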

Cache busting a local browser cache but ensuring response from Azure CDN and not from origin server (Web Role)

I am trying to make a HEAD request to an item on Azure CDN (production site) but want to avoid the response coming from either my local browser cache or the origin server (my web role). This is going to be a heavily trafficked web site, and the content is all static and thus cached on Azure CDN from a /cdn folder in my web role.
I have solved the problem of avoiding my local browser cache by calling:
$.ajaxSetup({cache: false});
Also my HEAD request is being used to simply retrieve the Response Date as all I want is a guaranteed current time in GMT (Azure is all set to GMT):
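// HEAD request: only the response headers are needed, not the body
$.ajax({
    type: "HEAD",
    async: true,
    url: "small.png",
    success: function (message, text, response) {
        // Pass the Date response header on to the handler
        doSomething(response.getResponseHeader("Date"));
    }
});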
Now I am guaranteeing that my response is not being fulfilled by a cached copy in my browser, but I am not sure how to verify whether the response is coming from Azure CDN or the origin server (web role). I want to guarantee that if "small.png" is on Azure CDN, my response comes from there. Basically, I need to confirm that my origin server will not be bombarded by requests and that the CDN will absorb 99.999999% of the requests, including this one. However, because of my cache-busting prior to the HEAD request ($.ajaxSetup({cache: false}); which appends a unique query string to the request), I am not sure whether Azure CDN is deciding to forward the request on to the origin server.
NOTE that via the Azure portal, I have left "Enable Querystring" unchecked on my CDN. I THINK this is sufficient to satisfy me, but I want the warm and fuzzy feeling that indeed my response is coming from Azure CDN and not origin. Is there any indicator in Fiddler that will prove to me my response is from CDN(proxy server) rather than the origin server?
Currently I have 30 minute cache expires on everything but I will adjust/optimize this when we go live.
Keeping "enable query string" unchecked does exactly what you want.
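To see why, here is a simplified model of how a CDN might derive its cache key from the request url. This is an illustration only, not Azure's actual implementation; the function name and host name are made up. jQuery's cache: false setting appends a `_=<timestamp>` parameter to the url.

```javascript
// Model of a CDN cache key. `ignoreQueryString = true` corresponds to
// leaving "Enable Query String" unchecked in the portal.
function cdnCacheKey(requestUrl, ignoreQueryString) {
  const url = new URL(requestUrl);
  if (ignoreQueryString) {
    url.search = ""; // drop everything after the "?"
  }
  return url.toString();
}

// Two cache-busted requests as jQuery would send them (placeholder host).
const a = "https://cdn.example.com/small.png?_=1700000000000";
const b = "https://cdn.example.com/small.png?_=1700000000001";

console.log(cdnCacheKey(a, true) === cdnCacheKey(b, true));   // true: one cached object
console.log(cdnCacheKey(a, false) === cdnCacheKey(b, false)); // false: separate objects
```

With the query string ignored, every cache-busted request maps to the same cached object, so the unique parameter defeats the browser cache without forcing the CDN back to the origin.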
As to determining whether the CDN went back to the origin server on a given request, you own the origin server, right? So you can see there if you were hit or not. I'm not sure there's a way to tell from just looking at the CDN's response whether it was a cache hit or a cache miss.
