Can't make CloudFront supply Cache-Control headers

I've done lots of reading on this, but none of the solutions I've seen work for IIS websites - most suggest some server-side fix, and none of that applies to me.
I'm optimising one of our sites, and PageSpeed, YSlow and Lighthouse all complain that images I'm serving from our CloudFront CDN don't have any cache headers. The CDN serves from an S3 bucket.
e.g. https://static.edie.net/webimages/new_new_new.png (expiration not specified)
This crops up as both 'There are static components without a far-future expiration date' and 'Leverage browser caching for the following cacheable resources'.
I can't for the life of me work out how to make CloudFront serve a cache header for images like this.
I have set
Cache-Control: max-age=5500000
on the S3 bucket/file itself, and if you check the file via the bucket (https://devedienet.s3.amazonaws.com/webimages/new_new.png) then the cache header is present.
But that doesn't seem to affect the CloudFront image, which only has these headers:
Age: 12153
Connection: keep-alive
Date: Mon, 22 Oct 2018 11:18:49 GMT
ETag: "940fd4d68428cf3e4f88a45aab4d7157"
Server: AmazonS3
Via: 1.1 4f95eb10423b781564e79d7c85f85795.cloudfront.net (CloudFront)
X-Amz-Cf-Id: TZAWy8U12-ohhe-dwTkCLqXHbJKI7CJqQd21I-lvq-8rloZjTew6aw==
x-amz-meta-s3b-last-modified: 20181017T105350Z
X-Cache: Hit from cloudfront
I've tried adding custom behaviours in AWS' console for the CloudFront distribution:
webimages/*.png
Minimum TTL: 5500000
But again this seems to have no effect.
Note that I invalidated all the images in the folder after adding the new rule above, but no dice.
Am I missing something or misunderstanding what is required?

Since you are serving content from S3 through CloudFront, you need to add the relevant headers to the objects in S3 when you upload them, e.g. Expires: {some future date}
Bonus: you do not need to specify this header for every object individually. You can upload a bunch of files together to S3, click next, and then on the screen that asks for the S3 storage class, scroll down and add these headers. And don't forget to click save!
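If you upload programmatically rather than through the console, the same headers can be set at upload time. A minimal sketch with boto3 (bucket name, key and filename are placeholders):
import boto3
from datetime import datetime, timedelta

s3 = boto3.client('s3')

# Set the browser-caching headers as object metadata at upload time;
# CloudFront passes them through to clients once the object is cached.
s3.put_object(
    Bucket='my-bucket',             # placeholder bucket
    Key='webimages/example.png',    # placeholder key
    Body=open('example.png', 'rb'),
    ContentType='image/png',
    CacheControl='max-age=5500000',
    Expires=datetime.utcnow() + timedelta(days=60),
)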

I was facing a similar problem. After doing some reading and experimenting, what I figured out is that CloudFront's Object Caching values (Minimum TTL, Maximum TTL, Default TTL) don't add a Cache-Control header to the response headers if the resource doesn't have one at the server level. Secondly, even if the resource has Cache-Control metadata added at S3, its max-age should fall in between the two:
Minimum TTL < S3 Cache-Control max-age < Maximum TTL
The Object Caching values only determine how long the resource will be cached at the edge location; no Cache-Control header is added to the response headers for the resource.
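You can check this behaviour directly by comparing the headers S3 and CloudFront return for the same object. A quick sketch with the Python requests library, using the two URLs from the question (assuming both are publicly readable):
import requests

# Compare the Cache-Control header returned by S3 directly
# and by the CloudFront distribution in front of it.
for url in (
    'https://devedienet.s3.amazonaws.com/webimages/new_new.png',
    'https://static.edie.net/webimages/new_new_new.png',
):
    resp = requests.head(url)
    print(url, '->', resp.headers.get('Cache-Control'))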
What I did instead was to create a Lambda function and add it under Lambda Function Associations, by updating the cache behaviour settings for the Viewer Response event. Here is my Lambda function; it adds the Cache-Control header to the response for the requested resource.
'use strict';

// Lambda@Edge handler for the Viewer Response event: adds a
// Cache-Control header to every response before it reaches the client.
exports.handler = (event, context, callback) => {
    const response = event.Records[0].cf.response;
    const headers = response.headers;
    const headerCache = 'Cache-Control';

    // CloudFront expects header names lower-cased as keys, each
    // mapping to a list of { key, value } pairs.
    headers[headerCache.toLowerCase()] = [{
        key: headerCache,
        value: 'max-age=1096000'
    }];

    callback(null, response);
};

Related

ASP.NET MVC Page with ResponseCache on action DOES return new content instead of cache

I followed this tutorial (https://learn.microsoft.com/en-us/aspnet/core/performance/caching/response?view=aspnetcore-2.1) to implement ResponseCache on my controller-action.
In short, I added
services.AddResponseCaching(); and app.UseResponseCaching(); in the startup and this tag [ResponseCache( Duration = 30)] on my controller.
Then I added <h2>@DateTime.Now</h2> in my view, and what I expected was the same DateTime.Now for 30 seconds.
But it doesn't work that way; it just shows the new time on every reload (F5).
I made sure my devtools in Chrome do not say 'disable cache'.
It happens both with and without the Chrome devtools open, on my local machine, and now also on a brand-new .NET Core MVC project.
One thing I noticed (with devtools open) is that the request has this header: Cache-Control: max-age=0. Does this influence the behaviour?
I thought it must mean something, because it looks like the request is saying 'no cache', but that strikes me as weird: I didn't put the header in, and I wouldn't expect Chrome's default behaviour to be to ignore caches.
A response header like Cache-Control: max-age=0 effectively disables all caching: resources are expired as soon as they come off the wire, so they are always re-fetched. That header originates from the server; the client has nothing to do with it.
Assuming you haven't accidentally disabled response caching manually in some way, the likeliest situation is that you're doing something that prevents the Response Caching Middleware from ever caching. The documentation lists the following conditions that must be satisfied before responses will be cached, regardless of what you do:
The request must result in a server response with a 200 (OK) status code.
The request method must be GET or HEAD.
Terminal middleware, such as Static File Middleware, must not process the response prior to the Response Caching Middleware.
The Authorization header must not be present.
Cache-Control header parameters must be valid, and the response must be marked public and not marked private.
The Pragma: no-cache header must not be present if the Cache-Control header isn't present, as the Cache-Control header overrides the Pragma header when present.
The Set-Cookie header must not be present.
Vary header parameters must be valid and not equal to *.
The Content-Length header value (if set) must match the size of the response body.
The IHttpSendFileFeature isn't used.
The response must not be stale as specified by the Expires header and the max-age and s-maxage cache directives.
Response buffering must be successful, and the size of the response must be smaller than the configured or default SizeLimit.
The response must be cacheable according to the RFC 7234 specifications. For example, the no-store directive must not exist in request or response header fields. See Section 3: Storing Responses in Caches of RFC 7234 for details.
However, in such situations, the server should be sending Cache-Control: no-cache, not max-age=0. As a result, I'm leaning towards some misconfiguration somewhere, where you have set this max-age value and either forgotten or overlooked it.
This is working for me in a .NET Core 3.1 app to stop F5/Ctrl+F5 and Developer Tools in Firefox or Chrome from bypassing the server cache for a full response.
In Startup, add this little middleware before UseResponseCaching().
// Middleware that fixes server caching on F5/Reload: strip the client's
// Cache-Control and Pragma request headers before the Response Caching
// Middleware sees them.
app.Use(async (context, next) =>
{
    const string cc = "Cache-Control";
    if (context.Request.Headers.ContainsKey(cc))
    {
        context.Request.Headers.Remove(cc);
    }

    const string pragma = "Pragma";
    if (context.Request.Headers.ContainsKey(pragma))
    {
        context.Request.Headers.Remove(pragma);
    }

    await next();
});

app.UseResponseCaching();
Haven't noticed any problems...

why is my cloudfront caching not working?

Okay, so I noticed my CloudFront isn't caching when I ran Google Page Tools and it told me that my images had no expiration set. I use Amazon S3 through CloudFront. There are no headers set on S3 because I have hundreds of folders and thousands of image files, with new ones uploaded every hour.
I went to my CloudFront console, to Behaviors, edited the only one there and set:
Minimum TTL: 86400
Maximum TTL: 31536000
Default TTL: 86400
And I checked the 'Customize' option for 'Object Caching'. I then invalidated all my objects (*). I waited until it was done, but the headers when requesting a file still show:
Age:8
Connection:keep-alive
Date:Mon, 07 Dec 2015 00:44:39 GMT
ETag:"429d87a5fd35288d207635d2a853fa0b"
Server:AmazonS3
Via:1.1 (my-ID-here).cloudfront.net (CloudFront)
X-Amz-Cf-Id:RxHlfhhnrSk9YwIqpFySnPVrscndnknZ9RKlIryXCLwh4RCK9vK6Vw==
X-Cache:Hit from cloudfront
What am I doing wrong?
Was this under "Leverage browser caching" or a similar section of Page Tools?
If it was under 'Leverage browser caching', that doesn't mean the files aren't being cached; it means the requested files aren't asking end users' browsers to cache them - for instance via the Cache-Control or Expires headers. CloudFront, unless configured otherwise, caches files from S3, so the absence of these headers doesn't affect CloudFront's caching.
You can manually add these in S3 for the individual objects.
Some more info can be found in the CloudFront documentation
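With thousands of existing files, setting the headers by hand isn't practical. One common approach is an in-place copy that replaces the object metadata; a sketch with boto3 (bucket name and prefix are placeholders):
import boto3

s3 = boto3.client('s3')
bucket = 'my-bucket'  # placeholder

# Re-copy each object onto itself with MetadataDirective='REPLACE' so
# the new Cache-Control metadata is written. The content type is read
# first, because REPLACE would otherwise reset it.
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket, Prefix='images/'):
    for obj in page.get('Contents', []):
        head = s3.head_object(Bucket=bucket, Key=obj['Key'])
        s3.copy_object(
            Bucket=bucket,
            Key=obj['Key'],
            CopySource={'Bucket': bucket, 'Key': obj['Key']},
            MetadataDirective='REPLACE',
            ContentType=head['ContentType'],
            CacheControl='max-age=31536000',
        )
Note that CloudFront will keep serving the copies it has already cached until they expire or you invalidate them.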

How to tell cloudfront to not cache 302 responses from S3 redirects, or, how else to workaround this image caching generation issue

I'm using Imagine via the LIIPImagineBundle for Symfony2 to create cached versions of images stored in S3.
Cached images are stored in an S3 web-enabled bucket served by CloudFront. However, the default LIIPImagineBundle implementation of S3 is far too slow for me (checking if the file exists on S3, then creating a URL either to the cached file or to the resolve functionality), so I've worked out my own workflow:
Pass the client the CloudFront URL where the cached image should exist.
The client requests the image via the CloudFront URL; if it does not exist, the S3 bucket has a redirect rule which 302 redirects the user to an Imagine webserver path that generates the cached version of the file and saves it to the appropriate location on S3.
The webserver then 301 redirects the user back to the CloudFront URL, where the image is now stored, and the client is served the image.
This works fine as long as I don't use CloudFront. The problem appears to be that CloudFront is caching the 302 redirect response (even though the HTTP spec says it shouldn't). Thus, if I use CloudFront, the client is sent into an endless redirect loop back and forth from webserver to CloudFront, and every subsequent request for the file still redirects to the webserver even after the file has been generated.
If I use S3 directly instead of CloudFront, there are no issues and this solution is solid.
According to Amazon's documentation, S3 redirect rules don't allow me to specify custom headers (to set cache-control headers or the like), and I don't believe CloudFront lets me control the caching of redirects (if it does, it's well hidden). CloudFront's invalidation options are so limited that I don't think they will work (you can only invalidate 3 objects at any time). I could pass an argument back to CloudFront on the first redirect (from the Imagine webserver) to fix the endless redirect (e.g. image.jpg?1), but subsequent requests to the same object will still 302 to the webserver and then 301 back to CloudFront even though the file exists. I feel like there should be an elegant solution to this problem, but it's eluding me. Any help would be appreciated!
I'm solving this same issue by setting the Default TTL in CloudFront's Cache Behavior settings to 0, while still allowing my resized images to be cached by setting the CacheControl metadata on the S3 file with max-age=12313213.
This way redirects will not be cached (Default TTL behaviour), but my resized images will be (CacheControl max-age on the S3 object).
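In practice that means attaching the header at the moment the resized image is written to S3. A minimal boto3 sketch (bucket, key and filename are placeholders):
import boto3

s3 = boto3.client('s3')

# The resized image carries its own Cache-Control, so it is cached by
# CloudFront and browsers, while the distribution's Default TTL of 0
# keeps the 302 redirects from being cached.
s3.put_object(
    Bucket='my-bucket',              # placeholder
    Key='cache/media/example.jpg',   # placeholder
    Body=open('example.jpg', 'rb'),
    ContentType='image/jpeg',
    CacheControl='max-age=12313213',
)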
If you really need to use CloudFront here, the only thing I can think of is that you don’t directly subject the user to the 302, 301 dance. Could you introduce some sort of proxy script / page to front S3 and that whole process? (or does that then defeat the point).
So a cache miss would look like this:
Visitor requests proxy page through CloudFront.
Proxy page requests image from S3.
Proxy page receives 302 from S3 and follows it to the Imagine web server.
Ideally just return the image from here (while letting it update S3), or follow the 301 back to S3.
Proxy page returns image to visitor.
Image is cached by CloudFront.
TL;DR: Make use of Lambda@Edge
We faced the same problem using LiipImagineBundle.
For development, an NGINX server serves the content from the local filesystem and resolves images that are not yet stored using a simple proxy_pass:
location ~ ^/files/cache/media/ {
    try_files $uri @public_cache_fallback;
}

location @public_cache_fallback {
    rewrite ^/files/cache/media/(.*)$ media/image-filter/$1 break;
    proxy_set_header X-Original-Host $http_host;
    proxy_set_header X-Original-Scheme $scheme;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_pass http://0.0.0.0:80/$uri;
}
As soon as you want to integrate CloudFront, things get more complicated due to caching. While you can easily add S3 (static website hosting, see below) as a distribution origin, CloudFront itself will not follow the resulting redirects but will return them to the client. In the default configuration CloudFront will then cache this redirect and NOT the desired image (see https://stackoverflow.com/a/41293603/6669161 for a workaround with S3).
The best way would be to use a proxy as described here. However, that adds another layer, which might be undesirable. Another solution is to use Lambda@Edge functions (see here). In our case, we use S3 as a normal distribution origin and make use of the 'Origin Response' event (you can edit the association in the 'Behaviors' tab of your distribution). Our Lambda function checks whether the request to S3 was successful. If it was, we can just forward the response. If it was not, we assume that the desired object has not yet been created. The Lambda function then calls our application, which generates the object and stores it in S3. For simplicity, the application replies with a redirect (to CloudFront again) too, so we can just forward that to the client. A drawback is that the client itself will see one redirect. Also make sure to set the cache headers so that CloudFront does not cache the Lambda redirect.
Here is an example Lambda function. It just redirects the client to the resolve URL (which then redirects to CloudFront again). Keep in mind that this results in more round trips for the client (which is not ideal), but it reduces the execution time of your Lambda function. Make sure to add the base Lambda@Edge policy (related tutorial).
env = {
    'Protocol': 'http',
    'HostName': 'localhost:8000',
    'HttpErrorCodeReturnedEquals': '404',
    'HttpRedirectCode': '307',
    'KeyPrefixEquals': '/cache/media/',
    'ReplaceKeyPrefixWith': '/media/resolve-image-filter/'
}

def lambda_handler(event, context):
    response = event['Records'][0]['cf']['response']

    # Only rewrite the response when S3 reported the object as missing.
    if int(response['status']) == int(env['HttpErrorCodeReturnedEquals']):
        request = event['Records'][0]['cf']['request']
        original_path = request['uri']

        # Map the cache path to the application's resolve endpoint.
        if original_path.startswith(env['KeyPrefixEquals']):
            new_path = env['ReplaceKeyPrefixWith'] + original_path[len(env['KeyPrefixEquals']):]
        else:
            new_path = original_path

        location = '{}://{}{}'.format(env['Protocol'], env['HostName'], new_path)

        # Turn the S3 error into a redirect to the resolver.
        response['status'] = env['HttpRedirectCode']
        response['statusDescription'] = 'Resolve Image'
        response['headers']['location'] = [{
            'key': 'Location',
            'value': location
        }]
        response['headers']['cache-control'] = [{
            'key': 'Cache-Control',
            'value': 'no-cache'  # also make sure your minimum TTL is set to 0 (for the distribution)
        }]

    return response
If you just want to use S3 as a cache (without CloudFront), static website hosting and a redirect rule will redirect clients to the resolve URL whenever a cache file is missing (you will need to rewrite the S3 cache resolver URLs to the website version, though):
<RoutingRules>
  <RoutingRule>
    <Condition>
      <HttpErrorCodeReturnedEquals>403</HttpErrorCodeReturnedEquals>
      <KeyPrefixEquals>cache/media/</KeyPrefixEquals>
    </Condition>
    <Redirect>
      <Protocol>http</Protocol>
      <HostName>localhost</HostName>
      <ReplaceKeyPrefixWith>media/image-filter/</ReplaceKeyPrefixWith>
      <HttpRedirectCode>307</HttpRedirectCode>
    </Redirect>
  </RoutingRule>
</RoutingRules>
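If you prefer to apply the rule programmatically instead of pasting XML into the console, the equivalent configuration can be set with boto3; a sketch, assuming the bucket already uses static website hosting (bucket name is a placeholder):
import boto3

s3 = boto3.client('s3')

# Equivalent of the XML above: on a 403 under cache/media/,
# issue a 307 redirect to the resolver endpoint.
s3.put_bucket_website(
    Bucket='my-bucket',  # placeholder
    WebsiteConfiguration={
        'IndexDocument': {'Suffix': 'index.html'},
        'RoutingRules': [{
            'Condition': {
                'HttpErrorCodeReturnedEquals': '403',
                'KeyPrefixEquals': 'cache/media/',
            },
            'Redirect': {
                'Protocol': 'http',
                'HostName': 'localhost',
                'ReplaceKeyPrefixWith': 'media/image-filter/',
                'HttpRedirectCode': '307',
            },
        }],
    },
)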

Symfony2 reverse proxy - separating the caching of the same URL based on a cookie or other setting

I'm using the default Symfony2 reverse proxy, and I need to separate caching of the same URL based on a cookie setting.
The site allows for a 'basic' site view by scaling down images and removing JavaScript. As the content is the same I have used the same URL, but of course caching is an issue.
I need to be able to cache them separately (or just ensure the cache is cleared).
I have tried changing the Vary header, which I normally have set to:
Vary: Accept-Encoding
..and have set it to either:
Vary: Accept-Encoding, basic
..or:
Vary: Accept-Encoding, normal
That actually works brilliantly in Chrome on my Mac, but Safari ignores it. I stopped checking other browsers at that point.
What's the best way to do it?
Vary: Accept-Encoding tells the client or your reverse proxy to separate caching of the URL for different encodings (i.e. with/without gzip).
This is especially useful when older browsers that don't support gzip are served the page uncompressed while newer browsers get the gzipped version: your reverse proxy will cache both variants of the same URL. Without this setting, your reverse proxy might end up serving gzipped content to browsers that don't support it, giving unwanted results.
What you are looking for is probably the ETag header which is kind of like a "cookie" for caching.
The client will send the ETag of its cached version, and you can then decide in your application whether the client's cached version is still valid or not.
$response = new Response();
$response->setETag(md5('some_identifier'));

if ($response->isNotModified($this->get('request'))) {
    // automatically returns a null-content response with an HTTP 304 (Not Modified) header
    return $response;
} else {
    // otherwise return a new response, possibly with a different ETag
    // $response->setETag(md5('another_identifier'));
    return $this->render('MyBundle:Main:index.html.twig', array(), $response);
}
inspired by this blogpost.

Not able to cache the main_frame requests

I am working on a chrome extension which modifies the http response headers.
https://chrome.google.com/webstore/detail/super-cache/fglobbnbihckpkodmeefhagijjcjnbeh/details
I am not able to cache main_frame requests. I am able to control the caching of the static requests though.
For example if I hit http://apple.com I receive the following headers for the main_frame.
Accept-Ranges:bytes
Cache-Control:max-age=276
Connection:keep-alive
Content-Encoding:gzip
Content-Length:3310
Content-Type:text/html; charset=UTF-8
Date:Tue, 12 Mar 2013 09:24:12 GMT
Expires:Tue, 12 Mar 2013 09:28:48 GMT
Server:Apache
Vary:Accept-Encoding
But every time I hit the URL, the browser still goes to the server and ultimately receives a 200 response. I have tried all the possible header combinations that should enable caching of the main_frame.
I want no request to be made at all when the user hits the URL from Chrome's address bar.
You're missing some sort of cache validation in your response headers. The ETag header can be used to control that, by giving it values that identify a unique response. You can read a bit about it in the Apache ETag documentation, but I'd simply include ETag: [filename] in your response headers in your example:
Accept-Ranges:bytes
Cache-Control:max-age=276
Connection:keep-alive
Content-Encoding:gzip
Content-Length:3310
Content-Type:text/html; charset=UTF-8
Date:Tue, 12 Mar 2013 09:24:12 GMT
Expires:Tue, 12 Mar 2013 09:28:48 GMT
Server:Apache
ETag: File:"somefile.html"
Vary:Accept-Encoding
These ETag values can include pretty much anything - file name, file size, custom values - separated by a semicolon ;. If a value includes spaces, enclose it in double quotation marks ". For example:
ETag: File:"YouTube_cd_Fdly3rX8.jpg"; Size:12169
Together with Cache-Control, Expires and some other header values that might change (when they are included and the browser knows how to interpret them), this forms the basis for the browser's cache validator.
Looking at your sample response headers, you might want to increase the max-age value in your Cache-Control to a much higher value, as your example suggests the files should be cached client-side for only 276 seconds. The Expires header value also seems a bit short.
More on how to set these values and how browsers are expected to validate cache control headers can be read in the RFC2616, Section 14.9.
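To see this validation in action, you can replay the conditional request a browser would make. A quick sketch with the Python requests library (the URL is a placeholder, and it assumes the server actually returns an ETag):
import requests

url = 'http://example.com/somefile.html'  # placeholder URL

# First request: the server returns the full body plus an ETag.
first = requests.get(url)
etag = first.headers.get('ETag')

# Revalidation: send the ETag back. A server that validates caches
# answers 304 Not Modified with an empty body.
second = requests.get(url, headers={'If-None-Match': etag})
print(second.status_code)  # 304 while the cached copy is still valid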
EDIT: After further debugging, checking and re-checking the behavior of Chrome's cache validation, it turns out it indeed doesn't respect properly set Cache-Control response headers. On request of the OP, I've reported this issue to Chrome support:
Chrome, Version 25.0.1364.172 m
Disrespecting Cache-Control on main document requests when serving static files from a web server, while respecting the same response header on linked contents.
Test setup:
Requesting a static HTML document from a web server (MIME text/html) that contains another static HTML document within an IFRAME (also MIME text/html). The document served in the IFRAME has the same response headers attached to it by the web server as the main document:
Date: Thu, 21 Mar 2013 16:29:28 GMT
Expires: Thu, 21 Mar 2013 16:33:59 GMT
Cache-Control: max-age=301, max-stale=299, only-if-cached
Expected behavior:
The main document and the document served within the IFRAME will be cached locally with the initial request for at least 301 (max-age) seconds, plus an additional 299 (max-stale) seconds for normal (non-forced) load requests. Any subsequent request within this 301-second time-frame that isn't expected to invalidate the local cache (such as a forced refresh with CTRL+F5 or the Reload context menu command) and is initiated by a normal page load (e.g. re-entering the relevant URL in the address bar) will be answered from the local cache with a status message 200 OK (from cache), provided none of the local cache-controlling information indicates otherwise (same URL, requested within the valid cache time-frame, document correctly tagged to be cached in its response headers).
Problem:
The main document is not loaded from its cached copy; an additional request is made to the web server, resulting in a status code 304 Not Modified. The document within the IFRAME, however, is loaded from the local cache correctly and results in a status message 200 OK (from cache).
Notes:
None of the cache-control tags, nor any combination of their values, seem to have any positive effect on the local caching behaviour of the main document. Including a non-unique ETag value does not resolve the problem of caching the main document either. Other major vendor browsers (tested in IE, Firefox, Opera) respect Cache-Control headers on the main document.
