I want to find a minimal set of headers, that work with "all" caches and browsers (also when using HTTPS!)
On my web site, I'll have three kinds of resources:
(1) Forever cacheable (public / equal for all users)
Example: 0A470E87CC58EE133616F402B5DDFE1C.cache.html (auto generated by GWT)
These files are automatically assigned a new name, when they change content (based on the MD5).
They should get cached as much as possible, even when using HTTPS (so I assume, I should set Cache-Control: public, especially for Firefox?)
They shouldn't require the client to make a round-trip to the server to validate, if the content has changed.
(2) Changing occasionally (public / equal for all users)
Examples: index.html, mymodule.nocache.js
These files change their content without changing the URL, when a new version of the site is deployed.
They can be cached, but probably need a round-trip to be revalidated every time.
(3) Individual for each request (private / user specific)
Example: JSON responses
These resources should never be cached unencrypted to disk under no circumstances. (Except maybe I'll have a few specific requests that could be cached.)
I have a general idea on which headers I would probably use for each type, but there's always something I could be missing.
I would probably use these settings:
Cache-Control: max-age=31556926 – Representations may be cached by any cache. The cached representation is to be considered fresh for 1 year:
To mark a response as "never expires," an origin server sends an
Expires date approximately one year from the time the response is
sent. HTTP/1.1 servers SHOULD NOT send Expires dates more than one
year in the future.
Cache-Control: no-cache – Representations are allowed to be cached by any cache. But caches must submit the request to the origin server for validation before releasing a cached copy.
Cache-Control: no-store – Caches must not cache the representation under any condition.
See Mark Nottingham’s Caching Tutorial for further information.
Cases one and two are actually the same scenario.
You should set Cache-Control: public and then generate a URL with includes the build number / version of the site so that you have immutable resources that could potentially last forever.
You also want to set the Expires header a year or more in the future so that the client will not need to issue a freshness check.
For case 3, you could all of the following for maximum flexibility:
"Cache-Control", "no-cache, must-revalidate"
"Expires", 0
"Pragma", "no-cache"
Related
Sometimes, when an object is not in the cache, varnish will send an object that has a real size smaller than the size declared in the content-length header. For example - only part of the picture.
Is it possible to construct such a rule...?
if (beresp.http.content-lenght <> real_object_body_size) { return(retry); }
I wrote a script that tests the same request against the varnish and the backend. It compares the downloaded size with the content-lenght header. The backend, unlike varnish, sometimes ends up with a timeout but the size is always fine. The problem is rare but annoying because the objects are set to long user cache time.
After a few days I can say that the problem was in occasional backend problems with varnish's ability to send a chunked transfer if the object is not in the cache.
Thank you #Thijs Feryn for pointing this out. I knew about that property but until I read it here, I didn't connect it to my problem at all.
It seems that "set beresp.do_stream = false;" solved the problem.
Let's say if the bundle.js is 750kb, is it true that if you set it to "never expire" or "expire in 10 years", then the browser doesn't need to fetch that file and therefore can make your website and pageload faster?
(By the way, it is said that Cache-control: max-age is HTTP/1.1 and if it and Expires: are both present, Cache-control overrides the Expires header.)
What if you have a weekly or 2-month release cycle, should you set the Expires header to 1 week or 2 months? I was thinking that with a weekly release cycle, some of your JavaScript or CSS files may stay the same, so it may be good to just set it to expire in 6 months or even 10 years? (because one week later it hasn't changed?)
But what if your file changed but the browser doesn't fetch it? Is it true that if you use the ETag mechanism, or if you use bundle.53ae823.js or bundle.2020-03-12-08.js, then you can "force" it to expire by the ETag being different, or the file name being different? What is a good way to set the Expires header length?
You could use all of those techniques at the same time. If you can include a hash in your filenames and the references to them (example of how to do it in webpack), then you can be sure that you can bust the cache whenever you change something.
Then you could set Cache-control: max-age to 31536000 (a year, the max) because you can trust the file with URI to not change (because of the hash, if the file changes ,the URI will too).
And of course you can implement ETAG in order to enable revalidation of the cache. This will still require a request to be sent to your server/CDN, but will save the user having to download the whole file if it hasn't changed.
There is a nice article on this topic, here.
I use Rack:Etag to genereate proper etag values based on the response from the server and for development I use Rack::Cache to verify that that caching I expect to happen really does
But i have a slight predicament:
I send a request and get these headers back
Age →0
Cache-Control →public, max-age=10
Connection →keep-alive
Content-Length →4895
Content-Type →application/json; charset=UTF-8
Date →Wed, 02 Oct 2013 06:55:42 GMT
ETag →"dd65de99f4ce58f9de42992c4e263e80"
Server →thin 1.5.1 codename Straight Razor
X-Content-Digest →0879e41b0d8e9b351f517dd46823095e0e99abd8
X-Rack-Cache →stale, invalid, store
If i after 11 seconds send a new request with If-None-Match=dd65de99f4ce58f9de42992c4e263e80 then i expect to get a 304 but always get 200 with the above headers.
What am I missing ?
Could it be due to max-age directive being set to 10
When the max-age cache-control directive is present in a cached response, the response is stale if its current age is greater than the age value given (in seconds) at the time of a new request for that resource.
Although, did you already know that? As you tried after 11 secs!
I think the solution was to load the rack middleware as follows for coorect chaining
use Rack::Cache
use Rack::ConditionalGet
use Rack::ETag
And also send If-None-Match with "" around hash, which i think seems pretty fragile
context:
My first project with COSM is recording datapoints from my electric meter. When I look at the graph of the feed, it's flatlined at zero even though the datapoints appear to be correctly received.
Any idea what's wrong, or things I should look for in order to debug it?
more info:
When I debug my feed, I see it receiving approximately eight API requests per minute expected.
Here's an instance of a received datapoint as viewed by COSM's 'debug feed' interface. Note in particular that the response is 200 [ok], and the request body has a sensible timestamp and a non-zero value:
200 POST /api/v2/feeds/129722/datastreams/1/datapoints 06-05-2013 | 08:16:54 +0000
Request Headers
Version HTTP/1.0
Host api.cosm.com
X-Request-Start 1367828214422267
X-Apikey <expunged>
Accept-Encoding gzip, deflate, compress
Accept */*
User-Agent python-requests/1.2.0 CPython/2.7.3 Linux/3.6.11+
Origin
Request Body
{"at": "2013-05-06T08:16:57", "value": 164.0}
Response Headers
X-Request-Id 245ee3ca6bd99efd156bff2416404c33f4bb7f0f
Cache-Control max-age=0
Content-Type application/json; charset=utf-8
Content-Length 0
Response Body
[No Body]
update
Even though the docs specify that JSON is the default, I explicitly added a ".json" to the POST URL (/api/v2/feeds/129722/datastreams/1/datapoints.json) but that didn't appear to make any difference.
update 2
I enclosed the "value" value in strings, so the request body now reads (for example):
{"at": "2013-05-06T15:37:06", "value": "187.0"}
Still behaving the same: I see updates in the debug view, but only zeros are reported in the graph view.
update 3
I tried looking at the data using the API rather than the COSM-supplied graph. My guess is that the datapoints are not being stored for some reason (despite the 200 OK return status). If I put this URL in the web browser:
http://api.cosm.com/v2/feeds/129722.json?interval=0
I get this in response:
{"id":129722,
"title":"Rainforest Automation RAVEn",
"private":"false",
"tags":["power"],
"feed":"https://api.cosm.com/v2/feeds/129722.json",
"status":"frozen",
"updated":"2013-05-06T05:07:30.169344Z",
"created":"2013-05-06T00:16:56.701456Z",
"creator":"https://cosm.com/users/fearless_fool",
"version":"1.0.0",
"datastreams":[{"id":"1",
"current_value":"0",
"at":"2013-05-06T05:07:29.982986Z",
"max_value":"0.0",
"min_value":"0.0",
"unit":{"type":"derivedSI","symbol":"W","label":"watt"}}],
"location":{"disposition":"fixed","exposure":"indoor","domain":"physical"}
}
Note that the status is listed as "frozen" (last update received > 15 minutes ago) despite the fact that the debug tool is showing seven or eight updates per minute. Where are my datapoints going?
Resolved. As #Calum at cosm.com support kindly pointed out, I wasn't sending a properly formed request. I was sending the following JSON:
{"at": "2013-05-06T08:16:57", "value": 164.0}
when I should have be sending:
{
"datapoints":[
{"at": "2013-05-06T08:16:57", "value": 164.0}
]
}
Calum also points out that I could batch up several points at a time to cut down the number of transactions. I'll get to that, but for now, suffice it to say that fixing the body of the request made everything start working.
That sounds like a bug in the graphs, I have seen something very similar a few times.
I often use Cosm Feed Viewer Chrome extension, which displays the latest values in real-time using the WebSocket endpoint.
It should be not too hard to put together custom graphs with Rickshaw and CosmJS.
On cache invalidation, the HTTP spec says:
Some HTTP methods MUST cause a cache to invalidate an entity. This is either the entity referred to by the Request-URI, or by the Location or Content-Location headers (if present).
I am trying to invalidate an entry in my cache through the use of the Location header, but it doesn't appear to be working. Here's my use case:
15:13:23.9988 | GET | folders/folder.34/contents - 200 (OK)
15:13:24.1318 | PUT | folders/folder.34/contents/test.docx - 201 (Created)
15:13:24.1548 | GET | folders/folder.34/contents - 200 (OK) (cached)
The response of (2) contains a Location header with the URI used in requests (1) and (3). I believe this should invalidate the cached entry for folders/folder.34/contents, but the response in (3) appears to be coming from cache anyway according to the HttpWebResponse.IsFromCache property.
I have tried numerous URI formats in the Location header, including:
Location: ../../../folders/folder.34/contents (and other assorted '../' counts)
Location: folders/folder.34/contents
Location: /folders/folder.34/contents
Location: http://myhostname/folders/folder.34/contents
But still (3) always seems to come from cache. What am I doing wrong here?
HTTPBis is much clearer:
https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-p6-cache-22#section-6
Because unsafe request methods (Section 4.2.1 of [Part2]) such as
PUT, POST or DELETE have the potential for changing state on the
origin server, intervening caches can use them to keep their contents
up-to-date.
A cache MUST invalidate the effective Request URI (Section 5.5 of
[Part1]) as well as the URI(s) in the Location and Content-Location
response header fields (if present) when a non-error response to a
request with an unsafe method is received.
So if this is not the behavior you're seeing, my assumption would simply be that the particular HTTP client you are using does not have the correct behavior.
I'd especially expect:
Location: /folders/folder.34/contents
To have the correct behavior.