How to properly disable CloudFront caching for API requests

I have AWS CloudFront set up to serve static content and an API server from the same domain. This means I have two behaviors, one serving the API gateway from /api and one serving content from S3 for all other paths.
The problem is that I don't want CF to cache anything from the API server. I was surprised to find that there does not seem to be a "master setting" to completely disable caching; instead the docs refer to using Cache-Control: no-cache on the origin, or turning on "Cache based on all headers" in the CF behavior.
However, none of these solutions fully satisfies my requirement of simply not caching and passing through all headers. If I add Cache-Control: no-cache on my origin, CF seems to respect it, but there is still the question of the CF settings. CF has a setting "Cache based on headers: All/None/Whitelist". The docs say that to disable caching, "All" should be used, which makes sense (although it's a bit vague compared to an actual setting: Disable caching: on/off). However, as soon as I set this to "All", the entire behavior stops working and my API requests never reach the API gateway at all; they fall through to the default S3 behavior used for non-API requests. I cannot find any explanation for this; it's as if the entire behavior fails or is disabled silently.
The other problem is that headers not present in the "Cache based on" whitelist are not only excluded from caching (which I don't want anyway) but also stripped out of the request before it's forwarded. This may make sense for a cache to work as intended, but since I don't want any caching, it's quite frustrating to have to whitelist every header I ever use. It would feel much safer to rely on "All" than to keep the whitelist up to date.
So:
1) Is there a better, clearer way to disable caching completely for one path of a CloudFront distribution? Ideally it shouldn't even rely on the origin setting certain headers; it should just completely disable any attempt to cache requests within the configured path.
2) Why is my entire API Gateway target disabled when I select "All" in the "Cache based on headers" box? What's happening here?

I just had the same issue and ended up contacting AWS support about it.
According to the AWS associate, the reason "Cache based on headers = All" doesn't work with an API Gateway origin is that "All" forwards the Host header, which API Gateway can't handle (it uses the Host header to resolve which API to invoke, so a forwarded viewer Host cannot be routed).
The way that worked for us was setting the TTL on the API Gateway behavior to zero, for both the max and min values.
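For anyone managing the distribution in code, here is a minimal sketch of the same setup in the AWS CDK (v2, Java). The bucket, origin domain and path pattern are illustrative placeholders, not values from the question; the AWS managed CachingDisabled policy used here is precisely "min, default and max TTL all zero":
import java.util.Map;
import software.amazon.awscdk.Stack;
import software.amazon.awscdk.services.cloudfront.BehaviorOptions;
import software.amazon.awscdk.services.cloudfront.CachePolicy;
import software.amazon.awscdk.services.cloudfront.Distribution;
import software.amazon.awscdk.services.cloudfront.origins.HttpOrigin;
import software.amazon.awscdk.services.cloudfront.origins.S3Origin;
import software.amazon.awscdk.services.s3.Bucket;
import software.constructs.Construct;

public class ApiNoCacheStack extends Stack {
    public ApiNoCacheStack(final Construct scope, final String id) {
        super(scope, id);

        Bucket bucket = new Bucket(this, "StaticContent");

        Distribution.Builder.create(this, "Dist")
            // Default behavior: serve static content from S3 with normal caching.
            .defaultBehavior(BehaviorOptions.builder()
                .origin(new S3Origin(bucket))
                .build())
            // API behavior: the managed CachingDisabled policy pins every TTL
            // to zero, so nothing from this path is ever cached.
            .additionalBehaviors(Map.of("/api/*", BehaviorOptions.builder()
                .origin(new HttpOrigin("abc123.execute-api.us-east-1.amazonaws.com"))
                .cachePolicy(CachePolicy.CACHING_DISABLED)
                .build()))
            .build();
    }
}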

Related

Problem with caching in CloudFront (AWS CDN): cache and collapse forwarding are acting the same

Users send me requests for information; sometimes this info is personalized, and sometimes it's common to all. When it's common to all, I want the CDN to cache the answer. I distinguish between the users by query params.
The problem is when I want them to stop using the cache, so that each gets their personalized content.
I thought that if I sent the response with caching off (max-age=0), the users' requests wouldn't use the cache; the requests would come to me, and I would give them their personalized answers.
But in that case the CDN does collapse forwarding, and all the users keep getting non-personalized answers.
I haven't found a way to disable the collapse forwarding, nor a way to go back to serving clients personalized content once they've started using the cache.
Any ideas?
I had the same issue and chatted with support (detailed here). It turns out that you can't stop CloudFront from sending non-personalized results just by using the Cache-Control header. Instead you have to create separate "Behaviors", namely one with the CachingDisabled cache policy for routes where you need personalized responses.
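In CDK terms (Java), assuming an existing Distribution object like the one sketched in the previous answer, and an illustrative path and origin, the extra behavior is short:
// Requests matching this route always go to the origin: the managed
// CachingDisabled policy has min, default and max TTL of zero, so
// CloudFront never serves a collapsed/cached copy for it.
distribution.addBehavior("/personalized/*",
    new HttpOrigin("origin.example.com"),
    AddBehaviorOptions.builder()
        .cachePolicy(CachePolicy.CACHING_DISABLED)
        .build());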

Why is it considered dangerous to overwrite cache control headers?

I am using Retrofit/OkHttp to consume a REST API which doesn't provide proper cache headers. To work around this, I've written a cache interceptor which adds cache-control headers to the response.
I have seen in multiple places that this is considered dangerous; for example, the OkHttp recipe for this has the following comment:
/** Dangerous interceptor that rewrites the server's cache-control header. */
(source)
Why exactly is this considered to be dangerous? I'd like to understand the risks of doing this.
You're making decisions on the client that should instead be made on the server. The risk is that the client ends up caching something it shouldn't, which will result in stale data being returned.
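For reference, the interceptor pattern being discussed looks roughly like this (a sketch; the 60-second max-age is an arbitrary value invented on the client, which is exactly the problem):
import java.io.IOException;
import okhttp3.Interceptor;
import okhttp3.Response;

/** Dangerous interceptor that rewrites the server's cache-control header. */
class ForceCacheInterceptor implements Interceptor {
    @Override
    public Response intercept(Interceptor.Chain chain) throws IOException {
        Response response = chain.proceed(chain.request());
        // Throw away whatever the server said and claim the response
        // may be served from cache for 60 seconds.
        return response.newBuilder()
            .header("Cache-Control", "max-age=60")
            .build();
    }
}
It is installed with OkHttpClient.Builder.addNetworkInterceptor(), so it runs before the cache stores the response. If the server's data changes within those 60 seconds, the client will keep returning the stale copy regardless, which is the risk described above.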

How to make clients request over HTTPS without HSTS preload?

If I request our website using HTTP (http://example.com), the response is 301 Moved Permanently with the Location header set to https://example.com - which, of course, is insecure due to the possibility of a MITM attack.
Is there not a way to just respond to the browser with something along the lines of "make the same request again, but this time over HTTPS", instead of explicitly telling the browser the URL?
I was expecting to find this kind of solution in Troy Hunt's blog post, but the only suggestion there is to use HSTS preload (i.e. register our site with Google), which we do not want to do.
HTTP Strict-Transport-Security (HSTS) allows you to send an HTTP header to say "next time you use this domain, make sure it's over HTTPS, even if the user types http:// or uses a link beginning with http://".
In Apache it is set with the following config:
Header always set Strict-Transport-Security "max-age=60"
This tells the browser to remember this header for 60 seconds. You should increase this as you confirm there are no issues. A setting of 63072000 (two years) is often recommended.
So this is more secure than a redirect as it happens automatically without needing an insecure HTTP request to be sent which could be intercepted, read and even changed on an insecure network.
For example, let's imagine you have logged on to your internet banking previously on your home WiFi, and the browser has remembered the HSTS setting. Then you visit your local coffee shop, where you try to connect to the free WiFi but actually connect to a hacker's WiFi instead. If you go to your internet banking via an HTTP link, bookmark or by typing the URL, then HSTS will kick in and you will go over HTTPS straight away, and the hacker cannot decrypt your traffic (within reason).
So. All is good. You can also add the includeSubDomains attribute:
Header always set Strict-Transport-Security "max-age=63072000; includeSubDomains"
Which adds extra security.
The one flaw with HSTS is that it requires an initial connection to load this HTTP header and protect you in future. It also times out after the max-age time. That's where preload comes in: you can submit your domain to the browsers, and they will build this domain's HSTS setting into the browser code, making it permanent so that even the first connection is secure.
However, I really don't like preload, to be honest. I just find the fact that it's out of your control dangerous. If you discover some domain is not using HTTPS (e.g. http://blog.example.com, http://intranet.example.com or http://dev.example.com), then as soon as the preload comes into effect - BANG - you've forced yourself to upgrade these, and quickly, as they are inaccessible until then. Reversing a preload in the browsers takes months at least, and few can live with that downtime. Of course you should test this, but that requires going to https://example.com (instead of https://www.example.com) and using includeSubDomains to fully replicate what preload will do, and not everyone does that. There are many, many examples of sites getting this wrong.
You've also got to ask what you are protecting against and what risks you are exposing yourself to. With an http:// link, a hacker intercepting could get access to cookies (which the site can protect against by using the secure attribute on cookies) and possibly intercept the traffic by keeping you on http:// instead of upgrading to https:// (which is mostly mitigated by HSTS and is increasingly flagged by the browser anyway). Remember that even on an attacker's WiFi network, the green padlock means the connection is secure (within reasonable limitations). So as long as you look for this (and your users do, which is more difficult, I admit), the risks are reasonably small. This is why the move to HTTPS everywhere, and then HTTPS by default, is so important. So for most sites I think HSTS without preload is sufficient, and it leaves the control with you, the site owner.

Caching with an SSL certificate

I read that if a request is authenticated or secure, it won't be cached. We previously worked on our caching, and we are now planning to purchase an SSL certificate.
If caching cannot be done over an SSL connection, does that mean our work on caching is useless?
Reference: http://www.mnot.net/cache_docs/
Your reference is wrong. Content sent over HTTPS will be cached by modern browsers, but it obviously cannot be cached by intermediate proxies. See http://arstechnica.com/business/2011/03/https-is-great-here-is-why-everyone-needs-to-use-it-so-ars-can-too/ or https://blog.httpwatch.com/2011/01/28/top-7-myths-about-https/ for example.
You can use the Cache-Control: public header to allow a representation served over HTTPS to be cached.
While the document you refer to says "If the request is authenticated or secure (i.e., HTTPS), it won’t be cached.", it's within a paragraph starting with "Generally speaking, these are the most common rules that are followed [...]".
The same document goes into more details after this:
Useful Cache-Control response headers include:
public — marks authenticated responses as cacheable; normally, if HTTP authentication is required, responses are automatically private.
(What applies to HTTP with authentication also applies to HTTPS.)
Obviously, documents that actually contain sensitive information aimed only at the authenticated user should not be served with this header, since they really shouldn't be cached. However, using this header for items that are suitable for caching (e.g. common images and scripts) should improve the performance of your website (as expected for caching over plain HTTP).
What will never happen with HTTPS is the caching of resources by intermediate proxy servers (between the client and your web-server, at least the external part, if you have a load-balancer or similar). Some CDNs will serve content over HTTPS (assuming it's suitable for your system to trust these CDNs). In general, these proxy servers wouldn't fall under the control of your cache design anyway.
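As a concrete sketch of the server side of this, using the JDK's built-in HttpServer (the path, payload and max-age here are arbitrary choices, not from the question):
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

public class PublicCacheExample {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        // A shared, non-sensitive asset: safe to mark as publicly cacheable
        // even when the site is served over HTTPS.
        server.createContext("/static/app.js", exchange -> {
            byte[] body = "console.log('hello');".getBytes();
            exchange.getResponseHeaders().set("Cache-Control", "public, max-age=86400");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
    }
}
A response carrying per-user, sensitive data would instead be sent with Cache-Control: private (or no-store), as the answer above cautions.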

When does a browser send a conditional GET?

My understanding is that a browser sends a conditional GET if it is not sure whether the component it has is up to date. The question is what defines "not sure". I presume it varies by browser, and maybe on other conditions. I also presume it's not something you can control, i.e. I can't do anything to make the browser change its "not sure" criteria; I can't set something the way I can set an Expires header to what I want on an HTTP server. Is this correct?
Note: if you can answer this question with just a really good link, that's fine. I couldn't find one.
HTTP has an expiration model. It defines how servers can specify when their responses expire, and how caches can determine the age and freshness of a response. In addition to that, there are further Cache-Control directives that can modify how responses are to be handled, dependent on or independent of their freshness.
HTTP caching is thus quite complex, and the actual behavior depends on multiple factors. The Cache-Control directives can be broken down into these general categories:
Restrictions on what are cacheable; these may only be imposed by the origin server.
Restrictions on what may be stored by a cache; these may be imposed by either the origin server or the user agent.
Modifications of the basic expiration mechanism; these may be imposed by either the origin server or the user agent.
Controls over cache revalidation and reload; these may only be imposed by a user agent.
Control over transformation of entities.
But in the end, it all depends on the user agent's adherence to these rules.
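To make the revalidation mechanics concrete, here is a small sketch of a conditional GET using java.net.http (Java 11+); the URL is a placeholder:
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConditionalGetExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        URI uri = URI.create("https://example.com/app.js");

        // First fetch: the server may include a validator such as an ETag.
        HttpResponse<String> first = client.send(
            HttpRequest.newBuilder(uri).build(),
            HttpResponse.BodyHandlers.ofString());
        String etag = first.headers().firstValue("ETag").orElse(null);

        // Revalidation: a browser that is "not sure" sends the validator
        // back; 304 Not Modified means its cached copy is still good.
        HttpRequest.Builder revalidate = HttpRequest.newBuilder(uri);
        if (etag != null) {
            revalidate.header("If-None-Match", etag);
        }
        HttpResponse<String> second = client.send(
            revalidate.build(), HttpResponse.BodyHandlers.ofString());
        System.out.println(second.statusCode()); // 304 or 200
    }
}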
