When does a browser send a conditional GET - performance

My understanding is that a browser sends a conditional GET when it is not sure whether the component it has is up to date. The question is what defines "not sure". I presume it varies by browser and maybe other conditions. I also presume it's not something you can control, i.e. I can't do anything to make the browser change its "not sure" criteria, in the way I can set an Expires header to whatever I want on an HTTP server. Is this correct?
Note: if you can answer this question with just a really good link, that's fine. I couldn't find one.

HTTP has an expiration model. It defines how servers can specify when their responses expire, and how caches can determine the age and freshness of a response. In addition, there are further Cache-Control directives that modify how responses are handled, either dependent on or independent of their freshness.
To conclude, HTTP caching is quite complex and the actual behavior depends on multiple factors:
The cache-control directives can be broken down into these general categories:
Restrictions on what are cacheable; these may only be imposed by the origin server.
Restrictions on what may be stored by a cache; these may be imposed by either the origin server or the user agent.
Modifications of the basic expiration mechanism; these may be imposed by either the origin server or the user agent.
Controls over cache revalidation and reload; these may only be imposed by a user agent.
Control over transformation of entities.
But in the end, it all depends on the user agent’s obedience of these rules.
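As a rough illustration of the expiration model described above, the following sketch shows how a cache might decide between reusing its copy, sending a conditional GET, and sending a plain GET. The `CachedResponse` type and `plan_request` function are hypothetical names invented for this example. It also assumes that a response without an explicit max-age is treated as stale, whereas real browsers may apply their own freshness heuristics in that case.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class CachedResponse:
    age_seconds: int                     # how long the copy has been in the cache
    max_age: Optional[int] = None        # from "Cache-Control: max-age=N", if present
    etag: Optional[str] = None           # validator from the ETag header
    last_modified: Optional[str] = None  # validator from the Last-Modified header
    no_cache: bool = False               # "Cache-Control: no-cache" was present


def plan_request(cached: CachedResponse) -> Tuple[str, Dict[str, str]]:
    """Return (action, extra request headers) for a cached copy."""
    fresh = cached.max_age is not None and cached.age_seconds < cached.max_age
    if fresh and not cached.no_cache:
        # Still fresh: no request is sent at all.
        return "reuse", {}

    # Stale (or must be revalidated): attach whatever validators we have.
    headers: Dict[str, str] = {}
    if cached.etag:
        headers["If-None-Match"] = cached.etag
    if cached.last_modified:
        headers["If-Modified-Since"] = cached.last_modified

    # Without a validator a conditional GET is impossible, so do a full GET.
    return ("conditional", headers) if headers else ("full", {})
```

In this sketch the server influences the outcome only through the headers it sent with the original response (max-age, no-cache, ETag, Last-Modified). The decision itself is made by the user agent.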

Related

Why is it considered dangerous to overwrite cache control headers?

I am using Retrofit/OkHttp to consume a REST API which doesn't provide proper cache headers. To work around this, I've written a cache interceptor which adds cache control headers to the response.
I have seen in multiple places that this is considered dangerous. For example, the OkHttp recipe for this has the following comment:
/** Dangerous interceptor that rewrites the server's cache-control header. */
(source)
Why exactly is this considered to be dangerous? I'd like to understand the risks of doing this.
You're making decisions on the client that should instead be made on the server. The risk is that the client ends up caching something it shouldn't, which will result in stale data being returned.
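The risk can be shown with a toy model. This is not OkHttp code: `TinyCache`, `fetch_price` and the `forced_max_age` parameter are hypothetical names made up for this sketch, and time is passed in as a plain number of seconds. It only illustrates what happens when a client replaces the server's lifetime with its own.

```python
class TinyCache:
    """Toy client cache: keeps one body per URL until its lifetime runs out."""

    def __init__(self):
        self._entries = {}

    def get(self, url, now, fetch, forced_max_age=None):
        entry = self._entries.get(url)
        if entry and now < entry["expires_at"]:
            return entry["body"]  # served from cache, server is not asked
        body, server_max_age = fetch()
        # A rewriting client ignores the lifetime chosen by the server.
        max_age = server_max_age if forced_max_age is None else forced_max_age
        if max_age > 0:
            self._entries[url] = {"body": body, "expires_at": now + max_age}
        return body


prices = {"value": "10 EUR"}


def fetch_price():
    # The server marks this response as not reusable (lifetime of 0 seconds).
    return prices["value"], 0


honest, rewriting = TinyCache(), TinyCache()
honest.get("/price", 0, fetch_price)
rewriting.get("/price", 0, fetch_price, forced_max_age=3600)

prices["value"] = "12 EUR"  # the data changes on the server

later_honest = honest.get("/price", 60, fetch_price)
later_rewritten = rewriting.get("/price", 60, fetch_price, forced_max_age=3600)
```

After the change, the cache that respects the server's lifetime fetches the new value, while the rewriting cache keeps returning the old one for up to an hour.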

How to properly disable CloudFront caching for API requests

I have AWS CloudFront set up to serve static content and an API server from the same domain. This means I have two behaviors, one serving the API gateway from /api and one serving content from S3 for all other paths.
The problem is that I don't want CF to cache anything from the API server. I was surprised to find that there does not seem to be a "master setting" to completely disable caching behavior; instead the docs refer to using Cache-control: no-cache on the origin or turning on "Cache based on all headers" in the CF behavior.
However, neither of these solutions completely satisfies my requirement of simply not caching and passing through all headers. If I add Cache-control: no-cache to my origin, CF seems to respect that, but there is still the question of CF settings. CF has a setting "Cache based on headers: All/None/Whitelist". The docs say that to disable caching, "All" should be used, which makes sense (although it is a bit vague compared to having an actual setting such as "Disable caching: on/off"). However, as soon as I set this setting to "All", the entire behavior is disabled and my API requests do not reach the API gateway at all, but fall through to the S3 behavior used for non-API requests. I cannot find any explanation for this; it's as if the entire behavior fails or is disabled without explanation.
The other problem is that headers not present in the "Cache based on" will not only be excluded from caching (which I don't want anyway), but also stripped out of the request before it's forwarded. This may make sense for a cache to work as intended, but since I don't want any caching it's quite frustrating to have to make sure to white-list all the headers I ever use. It would feel much better to rely on "All" than having to make sure the white-list is always up to date.
So:
1) Is there a better, clearer way to disable caching completely for one path of a CloudFront distribution? Ideally it shouldn't even rely on the origin setting certain headers; it should just completely disable any attempt to cache requests within the configured path.
2) Why is my entire API gateway target disabled when I select "All" in the "Cache based on headers" box? What's happening here?
I just had the same issue and ended up contacting AWS support about it.
According to the AWS associate, "Cache based on headers = All" doesn't work with API Gateway because it also forwards the "Host" header, which makes that setting unusable there.
What worked for us was setting the TTL on the API Gateway behavior to zero, for both the minimum and maximum values.
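For reference, the TTL change can be pictured as a fragment of a cache behavior definition. This is only a sketch: `MinTTL`, `DefaultTTL` and `MaxTTL` are the TTL field names of a cache behavior in the CloudFront API, but the `/api/*` path pattern, the `zero_ttl` helper, the starting values and the decision to also zero `DefaultTTL` are assumptions. A real behavior needs further required fields that are omitted here.

```python
def zero_ttl(behavior: dict) -> dict:
    """Return a copy of a cache behavior fragment with all TTLs set to 0."""
    updated = dict(behavior)
    # Min and max come from the answer above; DefaultTTL is zeroed as an assumption.
    updated.update({"MinTTL": 0, "DefaultTTL": 0, "MaxTTL": 0})
    return updated


# Hypothetical fragment for the API path; other required fields are left out.
api_behavior = {
    "PathPattern": "/api/*",
    "MinTTL": 0,
    "DefaultTTL": 86400,
    "MaxTTL": 31536000,
}

api_behavior_no_cache = zero_ttl(api_behavior)
```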

Caching with SSL certification

I read that if a request is authenticated or secure, it won't be cached. We previously worked on our caching and are now planning to purchase an SSL certificate.
If caching cannot be done over an SSL connection, does that mean our work on caching is useless?
Reference: http://www.mnot.net/cache_docs/
Your reference is wrong. Content sent over HTTPS will be cached in modern browsers, but it obviously cannot be cached in intermediate proxies. See http://arstechnica.com/business/2011/03/https-is-great-here-is-why-everyone-needs-to-use-it-so-ars-can-too/ or https://blog.httpwatch.com/2011/01/28/top-7-myths-about-https/ for example.
You can use the Cache-Control: public header to allow a representation served over HTTPS to be cached.
While the document you refer to says "If the request is authenticated or secure (i.e., HTTPS), it won’t be cached.", it's within a paragraph starting with "Generally speaking, these are the most common rules that are followed [...]".
The same document goes into more details after this:
Useful Cache-Control response headers include:
public — marks authenticated responses as cacheable; normally, if HTTP authentication is required, responses are automatically private.
(What applies to HTTP with authentication also applies to HTTPS.)
Obviously, documents that contain sensitive information intended only for the authenticated user should not be served with this header, since they really shouldn't be cached. However, using this header for items that are suitable for caching (e.g. common images and scripts) should improve the performance of your website (as expected for caching over plain HTTP).
What will never happen with HTTPS is the caching of resources by intermediate proxy servers (between the client and your web-server, at least the external part, if you have a load-balancer or similar). Some CDNs will serve content over HTTPS (assuming it's suitable for your system to trust these CDNs). In general, these proxy servers wouldn't fall under the control of your cache design anyway.
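The distinction between shared assets and per-user documents could be expressed as a small helper like the one below. It is only a sketch: the function name, the one-day default lifetime and the choice of `no-store` for sensitive responses are assumptions, not something prescribed by the answers above.

```python
def cache_control_for(is_sensitive: bool, max_age: int = 86400) -> str:
    """Pick a Cache-Control value for a response served over HTTPS."""
    if is_sensitive:
        # Per-user or confidential content: keep it out of caches (assumed policy).
        return "no-store"
    # Shared assets such as common images and scripts may be cached.
    return f"public, max-age={max_age}"


static_header = cache_control_for(is_sensitive=False)
account_header = cache_control_for(is_sensitive=True)
```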

Geo location / filtering and HTTP Caching

I'm trying to add cache support (both HTTP and server) for an ASP.NET Web API solution.
The solution is geolocated, meaning that I can get different results based on the caller's IP address.
The problem can be trivially solved for the server-side cache, using an approach similar to VaryByCustom (like this one). However, that does not solve the problem for client-side HTTP caches.
I'm considering the following options:
Enforcing must-revalidate in the cache
Keep the validation on the server side using the same algorithm as VaryByCustom, at the cost of extra cache revalidation calls to the server, using ETags or any other mechanism that keeps track of the country of origin of the originally cached value.
Creating country-specific routes with HTTP 302 redirects
In this scenario, an application invoking
http://site/UK/content
is redirected to the US version, if the request originates from a US IP address, once the cache has expired:
http://site/US/content
It might present out-of-date content that does not match the country of the originating IP. That is not a serious problem if the cache expiry is short (< 1 hour), since country changes are fairly uncommon.
What is the recommended solution?
What is the recommended solution?
I'm not sure I understand the problem.
For client caching, if you enable private caching then a user in UK will cache the UK version of http://site/content and the US user will cache the US version of http://site/content.
The only problem I can see is if a user travels from the US to the UK and accesses the content. Or if you allow public caching and some intermediary is shared by US and UK users.
After a detailed evaluation, the first approach was chosen. The actual implementation is:
Create a cache key that depends on the country of the originating IP address
Create an ETag for that cache key and store it in the server cache
Subsequent requests that include the ETag in an If-None-Match header are evaluated on the server for cache freshness:
If the country of origin is the same, the cache key will be the same and the ETag is valid, so the server returns HTTP 304 Not Modified
If the country of origin is different, the cache key will be different and the ETag is therefore not valid, so the server returns HTTP 200 with a new ETag.
I agree with Poul-Henning Kamp that geolocation should be a transport-level thing, but unfortunately it is not, so this is the only way we could come up with to ensure cache freshness for a given country.
The disadvantage is that we cannot use any infrastructure cache, i.e., all requests need to check the server for cache freshness.
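The steps above can be sketched roughly as follows. This is not the ASP.NET implementation: `make_etag`, `handle_get`, the `version` argument and the `render` callback are hypothetical. The ETag is derived by hashing the cache key instead of being stored in a server cache, which is a simplification. The Cache-Control value is an assumption chosen to force revalidation on every request.

```python
import hashlib
from typing import Callable, Dict, Optional, Tuple


def make_etag(resource: str, country: str, version: str) -> str:
    """Build an ETag from a cache key that includes the caller's country."""
    key = f"{resource}|{country}|{version}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return f'"{digest}"'


def handle_get(
    resource: str,
    country: str,          # assumed to be resolved from the caller's IP elsewhere
    version: str,          # hypothetical content version, changes when content changes
    if_none_match: Optional[str],
    render: Callable[[str], bytes],
) -> Tuple[int, Dict[str, str], bytes]:
    etag = make_etag(resource, country, version)
    headers = {"ETag": etag, "Cache-Control": "private, max-age=0, must-revalidate"}
    if if_none_match == etag:
        # Same country and same content: the client's copy is still valid.
        return 304, headers, b""
    # Different country (or changed content): send the body with a new ETag.
    return 200, headers, render(country)
```

A client that moves from a UK to a US address would present the UK ETag, fail the comparison, and receive the US representation with a fresh ETag.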

What's the difference between these different cache-control params?

cache-control: no-cache
cache-control: max-age=0
cache-control: no-store
Do they differ from browser to browser? What should then be considered the standard?
no-cache
If the no-cache directive does not specify a field-name, then a cache MUST NOT use the response to satisfy a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent caching even by caches that have been configured to return stale responses to client requests.
If the no-cache directive does specify one or more field-names, then a cache MAY use the response to satisfy a subsequent request, subject to any other restrictions on caching. However, the specified field-name(s) MUST NOT be sent in the response to a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent the re-use of certain header fields in a response, while still allowing caching of the rest of the response.
max-age
Indicates that the client is willing to accept a response whose age is no greater than the specified time in seconds. Unless max-stale directive is also included, the client is not willing to accept a stale response.
no-store
The purpose of the no-store directive is to prevent the inadvertent release or retention of sensitive information (for example, on backup tapes). The no-store directive applies to the entire message, and MAY be sent either in a response or in a request. If sent in a request, a cache MUST NOT store any part of either this request or any response to it. If sent in a response, a cache MUST NOT store any part of either this response or the request that elicited it. This directive applies to both non-shared and shared caches. "MUST NOT store" in this context means that the cache MUST NOT intentionally store the information in non-volatile storage, and MUST make a best-effort attempt to remove the information from volatile storage as promptly as possible after forwarding it.
Even when this directive is associated with a response, users might explicitly store such a response outside of the caching system (e.g., with a "Save As" dialog). History buffers MAY store such responses as part of their normal operation.
The purpose of this directive is to meet the stated requirements of certain users and service authors who are concerned about accidental releases of information via unanticipated accesses to cache data structures. While the use of this directive might improve privacy in some cases, we caution that it is NOT in any way a reliable or sufficient mechanism for ensuring privacy. In particular, malicious or compromised caches might not recognize or obey this directive, and communications networks might be vulnerable to eavesdropping.
More information: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.2
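To make the difference concrete, here is a small sketch of how a cache might classify a response by its Cache-Control header. The function names and the returned descriptions are made up for illustration. The parser splits on commas, so it does not handle a quoted list of field names such as no-cache="a, b".

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def cache_behaviour(value: str) -> str:
    """Describe, informally, what a cache may do with such a response."""
    d = parse_cache_control(value)
    if "no-store" in d:
        return "do not store"
    if "no-cache" in d:
        return "store, but revalidate before every reuse"
    if d.get("max-age") == "0":
        return "store, but treat as stale immediately"
    return "store and reuse while fresh"
```

In practice no-cache and max-age=0 both lead to revalidation with the origin server, while no-store is the only one of the three that forbids keeping a copy at all.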

Resources