How to force specific HTTP headers of cacheable resources served by Windows Azure Storage? - caching

In the document "Optimize Cache - Make the Web Faster - Google Developers", Google states that
It is important to specify ONE of Expires or Cache-Control
max-age, AND ONE of Last-Modified or ETag, for all cacheable
resources. It is redundant to specify both Expires and Cache-Control:
max-age, or to specify both Last-Modified and ETag.
I'm using the classes in Microsoft.WindowsAzure.StorageClient to upload images to a blob container, with practically the same code as in the open source project Azure Storage Explorer.
The resulting image is served with BOTH Last-Modified and ETag:
ETag: 0x8CFED5D3384112F
Last-Modified: Tue, 12 Mar 2013 17:21:43 GMT
So the browser's next request sends these HTTP headers:
If-Modified-Since: Tue, 12 Mar 2013 17:21:43 GMT
If-None-Match: 0x8CFED5D3384112F
How can I force Azure Storage to use only one of the two directives to eliminate this redundancy?

The short answer is you can't.
When thinking about this, it's important to remember that when you access blob storage you are not accessing a file on a web server; you're using a REST API that happens to return files.
Microsoft offers no way to remove headers that it deems essential to the storage API.
If you're worried about excessive headers, the response also includes several x-ms-... headers which are intended for clients of the API that aren't browsers.

Personally, I would not worry much about both validators being sent back, as this is actually recommended by RFC 2616.
13.3.4 Rules for When to Use Entity Tags and Last-Modified Dates
...
HTTP/1.1 origin servers:
...
... the preferred behavior for an HTTP/1.1 origin server is to send both a strong entity tag and a Last-Modified value.
An HTTP/1.1 client MUST use the entity tag in any cache-conditional request, and if both an entity tag and a Last-Modified value are present, it SHOULD use both.
I hope that will clarify why both tags are sent back from the Azure Storage server.
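As a rough illustration of that client-side rule, here is a minimal Python sketch of how a client could turn the validators of a cached response into a conditional request. The helper name `conditional_headers` is made up for this example and is not part of any Azure library; the header values are the ones from the question.

```python
def conditional_headers(cached_response_headers):
    """Build the validators for a conditional GET from a cached response."""
    # Header names are case-insensitive, so normalise them first.
    cached = {name.lower(): value for name, value in cached_response_headers.items()}
    headers = {}
    if "etag" in cached:
        # The entity tag is echoed back verbatim in If-None-Match.
        headers["If-None-Match"] = cached["etag"]
    if "last-modified" in cached:
        # The date is echoed back verbatim in If-Modified-Since.
        headers["If-Modified-Since"] = cached["last-modified"]
    return headers


# Header values copied from the question.
azure_response = {
    "ETag": "0x8CFED5D3384112F",
    "Last-Modified": "Tue, 12 Mar 2013 17:21:43 GMT",
}
print(conditional_headers(azure_response))
```

When both validators are available, both request headers come out, which is exactly what the browser in the question is doing.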

Related

Browser serving an obsolete Authorization header from cache

I'm experiencing my client getting logged out after an innocent request to my server. I control both ends and after a lot of debugging, I've found out that the following happens:
The client sends the request with a correct Authorization header.
The server responds with 304 Not Modified without any Authorization header.
The browser serves the full response including an obsolete Authorization header as found in its cache.
From now on, the client uses the obsolete Authorization and gets kicked out.
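The steps above can be pictured with a small Python sketch. It assumes the browser refreshes its stored entry the way RFC 7234 Section 4.3.4 describes (fields present in the 304 replace the stored ones, everything else is kept); `apply_304` is a hypothetical name, and the values come from the header dumps further down in this question.

```python
def apply_304(stored_headers, not_modified_headers):
    """Refresh a stored response with the header fields of a 304 response."""
    merged = dict(stored_headers)
    # Fields present in the 304 replace the stored ones; fields absent
    # from the 304 (here: Authorization) stay exactly as they were stored.
    merged.update(not_modified_headers)
    return merged


# Lowercased header names; values taken from the dumps in this question.
stored = {
    "authorization": "Wl6pPirDLQqWqYv",
    "etag": '"zUxy1pv3CQ3IYTFlBg3Z3vYovg3zSw2L"',
    "date": "Thu, 23 Nov 2017 23:50:16 GMT",
}
not_modified = {
    "etag": '"zUxy1pv3CQ3IYTFlBg3Z3vYovg3zSw2L"',
    "date": "Thu, 23 Nov 2017 23:58:18 GMT",
}
merged = apply_304(stored, not_modified)
print(merged["authorization"])  # the old token is still there
```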
From what I know, the browser must not cache any request containing Authorization. Nonetheless,
chrome://view-http-cache/http://localhost:10080/api/SearchHost
shows
HTTP/1.1 200 OK
Date: Thu, 23 Nov 2017 23:50:16 GMT
Vary: origin, accept-encoding, authorization, x-role
Cache-Control: must-revalidate
Server: 171123_073418-d8d7cb0 =
x-delay-seconds: 3
Authorization: Wl6pPirDLQqWqYv
Expires: Thu, 01 Jan 1970 00:00:00 GMT
ETag: "zUxy1pv3CQ3IYTFlBg3Z3vYovg3zSw2L"
Content-Encoding: gzip
Content-Type: application/json;charset=utf-8
Content-Length: 255
The funny server header replaces the Jetty server header (which shouldn't be served for security reasons) by some internal information - ignore that. This is what curl says:
< HTTP/1.1 304 Not Modified
< Date: Thu, 23 Nov 2017 23:58:18 GMT
< Vary: origin, accept-encoding, authorization, x-role
< Cache-Control: must-revalidate
< Server: 171123_073418-d8d7cb0 =
< ETag: "zUxy1pv3CQ3IYTFlBg3Z3vYovg3zSw2L"
< x-delay-seconds: 3
< Content-Encoding: gzip
This happens in Firefox, too, although I can't reproduce it at the moment.
The RFC continues, and it looks like the answer linked above is not exact:
unless a cache directive that allows such responses to be stored is present in the response
It looks like the response is cacheable. That's fine, I do want the content to be cached, but I don't want the Authorization header to be served from cache. Is this possible?
Explanation of my problem
My server used to send the Authorization header only when responding to a login request. This used to work fine, problems come with new requirements.
Our site allows users to stay logged in arbitrarily long (we do no sensitive business). We're changing the format of the authorization token and we don't want to force all users to log in again because of this. Therefore, I made the server send the updated authorization token whenever it sees an obsolete but valid one. So now any response may contain an authorization token, but most of them do not.
The browser cache, which combines the still valid response with an obsolete authorization token, gets in the way.
As a workaround, I made the server send no etag when an authorization token is present. It works, but I'd prefer some cleaner solution.
The quote in the linked answer is misleading because it omitted an important part: "if the cache is shared".
Here's the correct quote (RFC7234 Section 3):
A cache MUST NOT store a response to any request, unless: ... the Authorization header field (see Section 4.2 of [RFC7235]) does not appear in the request, if the cache is shared,
That part of the RFC is basically a summary.
This is the complete rule (RFC7234 Section 3.2) that says essentially the same thing:
A shared cache MUST NOT use a cached response to a request with an Authorization header field (Section 4.2 of [RFC7235]) to satisfy any subsequent request unless a cache directive that allows such responses to be stored is present in the response.
Is a browser cache a shared cache?
This is explained in Introduction section of the RFC:
A private cache, in contrast, is dedicated to a single user; often, they are deployed as a component of a user agent.
That means a browser cache is a private cache.
It is not a shared cache, so the above rule does not apply, which means both Chrome and Firefox do their jobs correctly.
Now the solution.
The specification suggests the possibility of a cached response containing Authorization to be reused without the Authorization header.
Unfortunately, it also says that the feature is not widely implemented.
So, the easiest and also the most future-proof solution I can think of is to make sure that any response containing an Authorization token isn't cached.
For instance, whenever the server sees an obsolete but valid Authorization token, send a new valid one along with Cache-Control: no-store to disallow caching.
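A minimal Python sketch of that idea, assuming the server assembles its response headers as a plain dictionary. The function name `with_token_refresh` and the token value in the usage lines are hypothetical.

```python
def with_token_refresh(base_headers, refreshed_token=None):
    """Return response headers, disabling caching when a new token is attached."""
    headers = dict(base_headers)  # copy so the caller's dict is not modified
    if refreshed_token is not None:
        headers["Authorization"] = refreshed_token
        # no-store asks every cache, private or shared, not to keep the response.
        headers["Cache-Control"] = "no-store"
    return headers


# Usage with made-up values: an ordinary response and a token-refreshing one.
base = {"ETag": '"abc"', "Cache-Control": "must-revalidate"}
print(with_token_refresh(base))
print(with_token_refresh(base, refreshed_token="new-token-value"))
```

Responses without a new token keep whatever caching headers they already had, so ordinary content stays cacheable.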
Also, you should never send Cache-Control: must-revalidate together with an Authorization header, because the must-revalidate directive actually allows the response to be cached, including by shared caches, which can cause even more problems in the future.
... unless a cache directive that allows such responses to be stored is present in the response.
In this specification, the following Cache-Control response directives (Section 5.2.2) have such an effect: must-revalidate, public, and s-maxage.
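To make the rule concrete, here is a small Python sketch of the check a shared cache would perform. It only models this one rule from Section 3.2 and ignores every other caching condition; the function names are invented for the example.

```python
# Response directives that allow a shared cache to reuse a response
# to a request that carried an Authorization header.
ALLOWING_DIRECTIVES = {"must-revalidate", "public", "s-maxage"}


def directive_names(cache_control):
    """Return the lowercased directive names of a Cache-Control value."""
    names = set()
    for part in cache_control.split(","):
        name = part.strip().split("=", 1)[0].lower()
        if name:
            names.add(name)
    return names


def shared_cache_may_reuse(request_headers, response_cache_control):
    """Apply only the Authorization rule; other conditions are not checked."""
    has_authorization = any(name.lower() == "authorization" for name in request_headers)
    if not has_authorization:
        return True  # this particular rule does not restrict the response
    return bool(directive_names(response_cache_control) & ALLOWING_DIRECTIVES)


request = {"Authorization": "Wl6pPirDLQqWqYv"}
print(shared_cache_may_reuse(request, "must-revalidate"))
print(shared_cache_may_reuse(request, "no-store"))
```

With the headers from the question (Cache-Control: must-revalidate), the check passes, which is why that directive is risky here.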
My current solution is to send an authorization header in every response; using a placeholder value of - when no authorization is wanted.
The placeholder value is obviously meaningless and the client knows it and happily ignores it.
This solution is ugly as it adds maybe 20 bytes to every response, but that's still better than occasionally having to resend a whole response content as with the approach mentioned in my question. Moreover, with HTTP/2 it'll be free.
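On the client side, the idea looks roughly like the Python sketch below. The helper name `token_after_response` is made up, and the real client is of course not written in Python; this only shows the logic.

```python
PLACEHOLDER = "-"  # value the server sends when it has no new token


def token_after_response(response_headers, current_token):
    """Return the token to use for the next request."""
    value = response_headers.get("Authorization", PLACEHOLDER)
    # The placeholder means "nothing new", so the current token is kept.
    return current_token if value == PLACEHOLDER else value


print(token_after_response({"Authorization": "-"}, "old-token"))
print(token_after_response({"Authorization": "new-token"}, "old-token"))
```

Because every response now carries the header, including a 304, a stale value stored in the cache gets overwritten instead of resurfacing.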

How can I force browsers to use Expires (rather than ETags/modification time)?

I have a server serving static files with an expiry of 1 year, but my browsers still request the file and receive a 304 Not Modified. I want to prevent the browser from even attempting the connection. I see this happening in several different setups (Ubuntu Linux) with Chrome and Firefox.
My test is as follows:
$ wget -S -O /dev/null http://trepalchi.it/static/img/logo-trepalchi-black.svg
--2016-03-14 19:56:14-- http://trepalchi.it/static/img/logo-trepalchi-black.svg
Resolving trepalchi.it (trepalchi.it)... 213.136.85.40
Connecting to trepalchi.it (trepalchi.it)|213.136.85.40|:80... connected.
HTTP request sent, awaiting response...
HTTP/1.1 200 OK
Server: nginx/1.2.1
Date: Mon, 14 Mar 2016 18:55:29 GMT
Content-Type: image/svg+xml
Content-Length: 25081
Last-Modified: Sun, 13 Mar 2016 23:03:53 GMT
Connection: keep-alive
Expires: Tue, 14 Mar 2017 18:55:29 GMT
Cache-Control: max-age=31536000
Cache-Control: public
Accept-Ranges: bytes
Length: 25081 (24K) [image/svg+xml]
Saving to: "/dev/null"
100%[==================================================================================================================================================================>] 25.081 --.-K/s in 0,07s
2016-03-14 19:56:14 (328 KB/s) - "/dev/null" saved [25081/25081]
This shows that the server (nginx) correctly sends the Expires and Cache-Control headers.
If I enable the cache in the browser and open the diagnostic tools, the first hit returns 200; when I then refresh the page (Ctrl-R), I see a request that returns 304 Not Modified.
Inspecting the Firefox cache (about:cache), I found the entry with the correct expiry, and by clicking the link on that page I could view the file without hitting the remote server.
I also tested pages where the images are loaded from image tags (as opposed to being requested directly, as in the example above).
All the literature I have read states that with such an expiry the browser should not even attempt a connection. What's wrong? RFC 2616 states:
HTTP caching works best when caches can entirely avoid making requests
to the origin server. The primary mechanism for avoiding requests is
for an origin server to provide an explicit expiration time in the
future, indicating that a response MAY be used to satisfy subsequent
requests. In other words, a cache can return a fresh response without
first contacting the server.
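For reference, this is how I understand the freshness lifetime to be derived from the headers above: max-age is used if present, otherwise Expires minus Date. It is only a Python sketch with a made-up function name, not what any particular browser runs.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Freshness lifetime in seconds: max-age wins, else Expires minus Date."""
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return int((expires - date).total_seconds())
    return None  # no explicit expiration information


response = {
    "Date": "Mon, 14 Mar 2016 18:55:29 GMT",
    "Expires": "Tue, 14 Mar 2017 18:55:29 GMT",
    # The two Cache-Control lines from the wget output, joined with a comma.
    "Cache-Control": "max-age=31536000, public",
}
print(freshness_lifetime(response))  # 31536000 seconds, i.e. 365 days
```

Both routes give the same one-year lifetime for this response, so the headers themselves look consistent.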
Note that another question addresses how the 304 is generated; I just want to prevent the connection from being made.
Sandro
Thanks

Will ETag work without cache-control header set by web server

My server returns the following headers for a file:
Accept-Ranges:bytes
Connection:Keep-Alive
Content-Length:155
Content-Type:text/css
Date:Thu, 06 Feb 2014 18:32:44 GMT
ETag:"99000000061b06-9b-4f1c118fdd2f1"
Keep-Alive:timeout=5, max=100
Last-Modified:Thu, 06 Feb 2014 18:32:37 GMT
As you can see, it doesn't return cache-control header, however it returns ETag and Last-Modified headers.
My question is whether the browser is going to cache the requested file. I can observe that on subsequent requests the browser sends the ETag value "99000000061b06-9b-4f1c118fdd2f1" in its request headers and the server returns status code 304.
And second question: Will browser cache resource and request it with ETag if Cache-control is set to no-cache?
For the first part of the question: it is up to your browser (its implementation and configuration) whether the response is cached and when it is revalidated. The only standardized difference between browser behaviour with and without validation headers is that with them, the browser can reduce traffic to the server by revalidating instead of downloading again.
Second question: yes. The browser will cache the resource, but every time you open the page it will ask the origin server whether the resource has been modified. If it has not, the server responds with 304 and the browser displays the cached content. Otherwise the server sends the new content.
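The exchange described above can be sketched from the server's point of view in a few lines of Python. This is a simplification under stated assumptions: a real If-None-Match header may carry a list of tags or `*`, and the `respond` function is invented for this example.

```python
def respond(request_headers, current_etag, body):
    """Answer 304 when the client's validator matches, else send the content."""
    if request_headers.get("If-None-Match") == current_etag:
        # The client's copy is still current: no body is sent.
        return 304, {"ETag": current_etag}, b""
    headers = {"ETag": current_etag, "Cache-Control": "no-cache"}
    return 200, headers, body


etag = '"99000000061b06-9b-4f1c118fdd2f1"'  # ETag value from the question
css = b"body { margin: 0 }"

first = respond({}, etag, css)                        # nothing cached yet
second = respond({"If-None-Match": etag}, etag, css)  # revalidation
print(first[0], second[0])
```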
My guess would be ETag can serve as cache-control: no-cache.

How does image caching work in the browser?

How can I make my image cached by the browser and expire after a particular period of time?
There are several HTTP headers that you can use to effect changes to the content caching policies.
This one:
Cache-control: no-cache
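instructs the browser to check with the server before reusing a cached copy; the content may still be stored. To prevent it from being stored at all, use Cache-Control: no-store.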
This one:
Expires: Wed, 20 Mar 2024 02:00:00 GMT
instructs the browser to expire its cached copy by the given time.
This one:
ETag: ab10be20
instructs the browser to treat ab10be20 as an identifier for this version of the content (often a hash); only when the value changes on a later request does the browser need to download the content again.
Note that all of these are effectively advisory only and there's no possible way to enforce the purging of caches remotely.
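As a sketch of "expire after a particular period", the following Python helper computes the headers a server could attach to an image response. The function name is made up, and how you actually set headers depends on your server. Where a client understands both headers, max-age takes precedence over Expires.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def image_cache_headers(max_age_seconds, now=None):
    """Headers asking caches to keep a response for max_age_seconds."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=max_age_seconds)
    return {
        # Relative lifetime in seconds.
        "Cache-Control": f"max-age={max_age_seconds}",
        # Absolute expiry in the HTTP date format, always in GMT.
        "Expires": format_datetime(expires, usegmt=True),
    }


# Example: cache for one day, starting from a fixed point in time.
start = datetime(2024, 3, 19, 2, 0, 0, tzinfo=timezone.utc)
print(image_cache_headers(86400, now=start))
```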

How do I know if image from my site is getting cached by proxy servers?

The following is an HTTP response header from an image on our company's website.
HTTP/1.1 200 OK
Content-Type: image/png
Last-Modified: Thu, 03 Dec 2009 15:51:57 GMT
Accept-Ranges: bytes
ETag: "1e61e38a3074ca1:0"
Date: Wed, 06 Jan 2010 22:06:23 GMT
Content-Length: 9140
Is there any way to know if this image is publicly cacheable in some proxy server? The RFC definition seems to be ambiguous: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1 and http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.4.
Run RED on your URL and it'll tell you whether the response is cacheable, among other information.
The headers you show appear to be cacheable.
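One reason they are cacheable even without Expires or Cache-Control: a cache is allowed (not required) to estimate a lifetime from Last-Modified. RFC 2616 Section 13.2.4 mentions 10% of the time since the last modification as a typical fraction. A rough Python sketch, with a made-up function name and the values from your response:

```python
from email.utils import parsedate_to_datetime


def heuristic_freshness_seconds(headers, fraction=0.1):
    """Estimate a lifetime as a fraction of the age since Last-Modified."""
    date = parsedate_to_datetime(headers["Date"])
    last_modified = parsedate_to_datetime(headers["Last-Modified"])
    return (date - last_modified).total_seconds() * fraction


image_response = {
    "Last-Modified": "Thu, 03 Dec 2009 15:51:57 GMT",
    "Date": "Wed, 06 Jan 2010 22:06:23 GMT",
}
seconds = heuristic_freshness_seconds(image_response)
print(round(seconds / 86400, 1), "days")
```

For your image that comes to a little under three and a half days, but each proxy is free to pick its own fraction or not to cache at all.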
If you would like to control the caching behavior of correctly configured proxies and web browsers, you might investigate using the Cache-Control and Expires headers to gain additional control.
Here is a webpage I had bookmarked that has one person's opinion of how to interpret the specifications you list (plus some other ones):
http://www.web-caching.com/mnot_tutorial/how.html
If you need to guarantee that someone sees a completely new image each time (even with misconfigured devices between you and them), you may want to consider using a randomized or GUID value as part of the URL.
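A minimal Python sketch of the randomized-URL idea, assuming your server ignores unknown query parameters. The function name and the parameter name `v` are arbitrary choices for this example.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted_url(url, param="v"):
    """Append a random query parameter so every URL is unique to caches."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, uuid.uuid4().hex))  # 32 random hex characters
    return urlunsplit(parts._replace(query=urlencode(query)))


print(cache_busted_url("http://example.com/images/logo.png"))
```

Keep in mind that this defeats caching entirely for that image, so it only makes sense when a fresh copy really is required every time.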
Here is a tutorial on setting headers for proxy caching. Be sure to read the part about setting cookies!
