On cache invalidation, the HTTP spec says:
Some HTTP methods MUST cause a cache to invalidate an entity. This is either the entity referred to by the Request-URI, or by the Location or Content-Location headers (if present).
I am trying to invalidate an entry in my cache through the use of the Location header, but it doesn't appear to be working. Here's my use case:
(1) 15:13:23.9988 | GET | folders/folder.34/contents - 200 (OK)
(2) 15:13:24.1318 | PUT | folders/folder.34/contents/test.docx - 201 (Created)
(3) 15:13:24.1548 | GET | folders/folder.34/contents - 200 (OK) (cached)
The response to (2) contains a Location header with the URI used in requests (1) and (3). I believe this should invalidate the cached entry for folders/folder.34/contents, but according to the HttpWebResponse.IsFromCache property, the response in (3) still comes from the cache.
I have tried numerous URI formats in the Location header, including:
Location: ../../../folders/folder.34/contents (and other assorted '../' counts)
Location: folders/folder.34/contents
Location: /folders/folder.34/contents
Location: http://myhostname/folders/folder.34/contents
But still (3) always seems to come from cache. What am I doing wrong here?
HTTPBis is much clearer:
https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-p6-cache-22#section-6
Because unsafe request methods (Section 4.2.1 of [Part2]) such as PUT, POST or DELETE have the potential for changing state on the origin server, intervening caches can use them to keep their contents up-to-date.
A cache MUST invalidate the effective Request URI (Section 5.5 of [Part1]) as well as the URI(s) in the Location and Content-Location response header fields (if present) when a non-error response to a request with an unsafe method is received.
So if this is not the behavior you're seeing, I would simply assume that the particular HTTP client you are using does not implement it correctly.
I'd especially expect:
Location: /folders/folder.34/contents
to behave correctly.
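To make the quoted rule concrete, here is a minimal Python sketch of how a cache might apply it. The SimpleCache class and its on_response method are hypothetical (they are not HttpWebRequest's or any other real client's API); the sketch assumes entries are keyed by absolute URI and resolves relative Location values against the request URI with urllib.parse.urljoin. The same-host check follows the wording later adopted in RFC 7234, section 4.4.

```python
from urllib.parse import urljoin, urlsplit

# Methods whose non-error responses trigger invalidation (assumed set of unsafe methods).
UNSAFE_METHODS = {"POST", "PUT", "DELETE", "PATCH"}


class SimpleCache:
    """Toy in-memory cache keyed by absolute URI (hypothetical, for illustration only)."""

    def __init__(self):
        self.entries = {}

    def store(self, uri, body):
        self.entries[uri] = body

    def on_response(self, method, request_uri, status, headers):
        # Only non-error responses (status below 400) to unsafe methods invalidate anything.
        if method.upper() not in UNSAFE_METHODS or status >= 400:
            return
        # Header names are case-insensitive, so normalise them before lookup.
        lowered = {name.lower(): value for name, value in headers.items()}
        request_host = urlsplit(request_uri).hostname
        targets = [request_uri]
        for name in ("location", "content-location"):
            value = lowered.get(name)
            if not value:
                continue
            # Relative references such as "/folders/..." or "../../.." are
            # resolved against the effective request URI.
            target = urljoin(request_uri, value)
            # Skip URIs on a different host (RFC 7234, section 4.4).
            if urlsplit(target).hostname == request_host:
                targets.append(target)
        for uri in targets:
            self.entries.pop(uri, None)
```

With this logic, a 201 response to the PUT in (2) that carries Location: /folders/folder.34/contents evicts http://myhostname/folders/folder.34/contents, so (3) would have to go back to the server.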
I want to find a minimal set of headers that works with "all" caches and browsers (also when using HTTPS!).
On my web site, I'll have three kinds of resources:
(1) Forever cacheable (public / equal for all users)
Example: 0A470E87CC58EE133616F402B5DDFE1C.cache.html (auto generated by GWT)
These files are automatically assigned a new name (based on the MD5) when their content changes.
They should be cached as much as possible, even when using HTTPS (so I assume I should set Cache-Control: public, especially for Firefox?).
They shouldn't require the client to make a round-trip to the server to check whether the content has changed.
(2) Changing occasionally (public / equal for all users)
Examples: index.html, mymodule.nocache.js
These files change their content without changing the URL when a new version of the site is deployed.
They can be cached, but probably need a round-trip to be revalidated every time.
(3) Individual for each request (private / user specific)
Example: JSON responses
These resources should never be cached unencrypted to disk under any circumstances. (Except maybe for a few specific requests that could be cached.)
I have a general idea on which headers I would probably use for each type, but there's always something I could be missing.
I would probably use these settings:
Cache-Control: max-age=31556926 – Representations may be cached by any cache. The cached representation is to be considered fresh for 1 year:
To mark a response as "never expires," an origin server sends an Expires date approximately one year from the time the response is sent. HTTP/1.1 servers SHOULD NOT send Expires dates more than one year in the future.
Cache-Control: no-cache – Representations are allowed to be cached by any cache. But caches must submit the request to the origin server for validation before releasing a cached copy.
Cache-Control: no-store – Caches must not cache the representation under any condition.
See Mark Nottingham’s Caching Tutorial for further information.
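As a rough illustration of the three settings above, the following Python sketch maps each resource kind to a header dictionary. The kind names ("immutable", "revalidate", "private") and the cache_headers function are hypothetical; the Expires value is produced with the standard library's email.utils.formatdate, and adding public to the long-lived case follows the question's own assumption about HTTPS caching rather than a verified requirement.

```python
import time
from email.utils import formatdate

ONE_YEAR = 31556926  # seconds; the max-age value used in the question


def cache_headers(kind, now=None):
    """Return response headers for a resource kind (hypothetical helper)."""
    now = time.time() if now is None else now
    if kind == "immutable":  # (1) fingerprinted files such as *.cache.html
        return {
            "Cache-Control": f"public, max-age={ONE_YEAR}",
            # HTTP-date one year ahead, e.g. "Fri, 01 Jan 1971 05:48:46 GMT" for now=0.
            "Expires": formatdate(now + ONE_YEAR, usegmt=True),
        }
    if kind == "revalidate":  # (2) index.html, *.nocache.js
        return {"Cache-Control": "no-cache"}
    if kind == "private":  # (3) user-specific JSON
        return {"Cache-Control": "no-store"}
    raise ValueError(f"unknown resource kind: {kind!r}")
```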
Cases one and two are actually the same scenario.
You should set Cache-Control: public and then generate a URL that includes the build number / version of the site, so that you have immutable resources that could potentially last forever.
You also want to set the Expires header a year or more in the future so that the client will not need to issue a freshness check.
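The immutable-URL approach can be sketched in a few lines of Python: derive the file name from a hash of the content, so any change produces a new URL and the old one can safely be cached forever. Using an upper-case MD5 hex digest mirrors the GWT example in the question, but GWT's exact naming scheme is not confirmed here; fingerprinted_name is a hypothetical helper.

```python
import hashlib


def fingerprinted_name(content: bytes, suffix: str = ".cache.html") -> str:
    """Name a file after the MD5 of its content (hypothetical helper)."""
    # Same content -> same name; changed content -> new name, so old URLs never go stale.
    return hashlib.md5(content).hexdigest().upper() + suffix
```

A build step would write each generated file under this name and reference it from the page that is served with revalidation (case 2), so only that small page ever needs a freshness check.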
For case 3, you could use all of the following for maximum flexibility:
"Cache-Control", "no-cache, must-revalidate"
"Expires", 0
"Pragma", "no-cache"
I'm new to JMeter and I'm working on a script for checking the cache. The scenario is to:
do a GET request
verify that it has been cached
do a DELETE request
verify that the cache has been deleted
However, since there are three instances in the environment I'm working on, my script has intermittent failures because different x-internal-service-host values are returned.
My test results look like this:
do a GET request (x-internal-service-host returned is c3c8021a51a9:8080 - PASS)
verify that it has been cached (x-internal-service-host returned is 4eb7ac9d4a76:8080 - FAILED because the call made was for c3c8021a51a9:8080)
do a DELETE request (x-internal-service-host returned is c3c8021a51a9:8080 - PASS)
verify that the cache has been deleted (request x-internal-service-host returned is c3c8021a51a9:8080 - FAILED because the delete was made for 4eb7ac9d4a76:8080)
I'm thinking of extracting the x-internal-service-host response header in step 1, but I'm not sure how to proceed once it's extracted. Is it possible to keep sending the request until the response header matches the one extracted in step 1 for steps 2 to 4, or is there a better way to do this?
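One possible approach, sketched in Python rather than as a JMeter test plan: remember the host from step 1 and repeat each later request until the response comes back from that same instance, giving up after a bounded number of attempts. In JMeter the equivalent would probably be a Regular Expression Extractor applied to the response headers plus a While Controller, but that mapping is an assumption; send and request_until_host are hypothetical names, and the approach only works if the load balancer eventually routes to the wanted instance.

```python
def request_until_host(send, expected_host, max_attempts=10,
                       header="x-internal-service-host"):
    """Call send() until the response came from expected_host.

    `send` is a hypothetical zero-argument callable returning (status, headers).
    """
    for attempt in range(1, max_attempts + 1):
        status, headers = send()
        # HTTP header names are case-insensitive, so compare lower-cased names.
        lowered = {name.lower(): value for name, value in headers.items()}
        if lowered.get(header.lower()) == expected_host:
            return status, headers, attempt
    raise RuntimeError(
        f"no response from {expected_host} after {max_attempts} attempts"
    )
```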
EDIT: It later turned out that some browsers confuse the term "response headers" with "response message (without response body)". That is where this question went wrong: the browsers were simply incorrect. In the meantime I have answered my own question.
In Firefox you can check the "raw headers" via "Firefox Developer Tools" > "Network".
Example of the raw response headers:
Date: Thu, 23 Nov 2017 12:43:21 GMT
Server: Apache/2.4.17 (Unix) OpenSSL/1.0.1e-fips PHP/5.6.16
Connection: Keep-Alive
Keep-Alive: timeout=1, max=100
Cache-Control: max-age=9, public
Vary: User-Agent
But I'm missing, for example, "HTTP/1.1 304 Not Modified". Firefox shows "304 Not Modified" elsewhere, but not in the raw view.
So it makes me think I have the response headers in raw form, but actually it's only part of the response headers, excluding the status code. This can be really confusing for people.
In my opinion it would make much more sense to show the "status code" there too. As it is, it's not really logical.
Is this a bug, or how should I interpret it?
Meanwhile I can answer my own question.
At the moment, some browsers use the terms incorrectly: they confuse "response headers" with "response message (without message body)". That's why I was confused when asking the question, and that is what the question was about.
See: https://www.rfc-editor.org/rfc/rfc7230#page-8 (2.1. Client/Server Messaging)
A server responds to a client's request by sending one or more HTTP response messages, each beginning with a status line that includes the protocol version, a success or error code, and textual reason phrase (Section 3.1.2), possibly followed by header fields containing server information, resource metadata, and representation metadata (Section 3.2), an empty line to indicate the end of the header section, and finally a message body containing the payload body (if any, Section 3.3).
In other words:
RESPONSE MESSAGE:
Status line (ending with CRLF, so 2 extra bytes)
Header field 1, if any (ending with CRLF, so 2 extra bytes)
Header field 2, if any (ending with CRLF, so 2 extra bytes)
Header field 3, et cetera (ending with CRLF, so 2 extra bytes)
Empty line to indicate the end of the header section (CRLF, so 2 extra bytes)
Message body / response body, if any
RESPONSE HEADERS / RESPONSE HEADER FIELDS:
Header field 1, if any (ending with CRLF, so 2 extra bytes)
Header field 2, if any (ending with CRLF, so 2 extra bytes)
Header field 3, et cetera (ending with CRLF, so 2 extra bytes)
So officially the status line is not part of the "response headers", only of the "response message".
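The RFC 7230 layout above can be checked with a small Python sketch that splits a raw response message and counts bytes both ways. split_response is a hypothetical helper; it assumes well-formed CRLF line endings and no obsolete line folding, and the sample headers in the test come from the Firefox example earlier in the question.

```python
def split_response(raw: bytes) -> dict:
    """Split a raw HTTP/1.1 response message into the parts named in RFC 7230."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("missing empty line that ends the header section")
    status_line, *field_lines = head.split(b"\r\n")
    headers = []
    for line in field_lines:
        name, _, value = line.partition(b":")
        headers.append((name.decode("ascii"), value.strip().decode("latin-1")))
    return {
        "status_line": status_line.decode("ascii"),
        "headers": headers,
        "body": body,
        # "Response headers" in the strict sense: each field line plus its CRLF.
        "header_fields_bytes": sum(len(line) + 2 for line in field_lines),
        # Response message minus body: status line, field lines and the empty line.
        "head_bytes": len(head) + len(sep),
    }
```

The two counts always differ by the status line plus its CRLF and the final empty line, i.e. len(status_line) + 4 bytes, which is exactly the discrepancy described here between the reported header size and the raw header view.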
Firefox, for example, shows the size of the "response headers" in: "Firefox Developer Tools" > "Network" > click on a row > Headers tab.
This size also includes, among other things, the size of the status line. The size of the raw response headers should match this size, but at the moment it doesn't. So either they need to change the size, or they must additionally include the status line and the empty line in the (raw) response headers and give it another name (for example: "response message minus message body").
Chrome also gets this wrong. See for example: https://developers.google.com/web/tools/chrome-devtools/network-performance/reference#requests
They are saying there:
Size. The combined size of the response headers plus the response body, as delivered by the server.
But they actually mean something different (as the Size value in practice also shows). They actually mean this:
Size. The combined size of the response message, without the message body (instead of response headers) plus the response body, as delivered by the server.
In other words:
Size. The combined size of the response message, as delivered by the server.
So that's where my original question came from. Apparently it's a difficult subject for browsers at the moment: I tested it in 2 browsers and both get it wrong.
Because of that, it's not surprising if people think the status line is part of the response headers.
context:
My first project with COSM is recording datapoints from my electric meter. When I look at the graph of the feed, it's flatlined at zero even though the datapoints appear to be correctly received.
Any idea what's wrong, or things I should look for in order to debug it?
more info:
When I debug my feed, I see it receiving approximately eight API requests per minute, as expected.
Here's an instance of a received datapoint as viewed by COSM's 'debug feed' interface. Note in particular that the response is 200 [ok], and the request body has a sensible timestamp and a non-zero value:
200 POST /api/v2/feeds/129722/datastreams/1/datapoints 06-05-2013 | 08:16:54 +0000
Request Headers
Version HTTP/1.0
Host api.cosm.com
X-Request-Start 1367828214422267
X-Apikey <expunged>
Accept-Encoding gzip, deflate, compress
Accept */*
User-Agent python-requests/1.2.0 CPython/2.7.3 Linux/3.6.11+
Origin
Request Body
{"at": "2013-05-06T08:16:57", "value": 164.0}
Response Headers
X-Request-Id 245ee3ca6bd99efd156bff2416404c33f4bb7f0f
Cache-Control max-age=0
Content-Type application/json; charset=utf-8
Content-Length 0
Response Body
[No Body]
update
Even though the docs specify that JSON is the default, I explicitly added a ".json" to the POST URL (/api/v2/feeds/129722/datastreams/1/datapoints.json) but that didn't appear to make any difference.
update 2
I enclosed the "value" value in quotes, so the request body now reads (for example):
{"at": "2013-05-06T15:37:06", "value": "187.0"}
Still behaving the same: I see updates in the debug view, but only zeros are reported in the graph view.
update 3
I tried looking at the data using the API rather than the COSM-supplied graph. My guess is that the datapoints are not being stored for some reason (despite the 200 OK return status). If I put this URL in the web browser:
http://api.cosm.com/v2/feeds/129722.json?interval=0
I get this in response:
{"id":129722,
"title":"Rainforest Automation RAVEn",
"private":"false",
"tags":["power"],
"feed":"https://api.cosm.com/v2/feeds/129722.json",
"status":"frozen",
"updated":"2013-05-06T05:07:30.169344Z",
"created":"2013-05-06T00:16:56.701456Z",
"creator":"https://cosm.com/users/fearless_fool",
"version":"1.0.0",
"datastreams":[{"id":"1",
"current_value":"0",
"at":"2013-05-06T05:07:29.982986Z",
"max_value":"0.0",
"min_value":"0.0",
"unit":{"type":"derivedSI","symbol":"W","label":"watt"}}],
"location":{"disposition":"fixed","exposure":"indoor","domain":"physical"}
}
Note that the status is listed as "frozen" (last update received > 15 minutes ago) despite the fact that the debug tool is showing seven or eight updates per minute. Where are my datapoints going?
Resolved. As #Calum at cosm.com support kindly pointed out, I wasn't sending a properly formed request. I was sending the following JSON:
{"at": "2013-05-06T08:16:57", "value": 164.0}
when I should have been sending:
{
"datapoints":[
{"at": "2013-05-06T08:16:57", "value": 164.0}
]
}
Calum also points out that I could batch up several points at a time to cut down the number of transactions. I'll get to that, but for now, suffice it to say that fixing the body of the request made everything start working.
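A minimal Python sketch of building the corrected, batched request with only the standard library. The path shape follows the datapoints URL from the debug output above, but base_url, the X-ApiKey header spelling and build_datapoints_request itself are assumptions; the function only constructs a urllib.request.Request and never sends it.

```python
import json
import urllib.request


def build_datapoints_request(feed_id, datastream_id, points, api_key,
                             base_url="http://api.cosm.com/v2"):
    """Build (but do not send) a POST carrying one or more datapoints.

    `points` is a list of (timestamp, value) pairs; base_url is an assumption.
    """
    # The points must be wrapped in a "datapoints" array, even for a single point.
    body = {"datapoints": [{"at": at, "value": value} for at, value in points]}
    url = f"{base_url}/feeds/{feed_id}/datastreams/{datastream_id}/datapoints"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-ApiKey": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Batching is then just a matter of collecting several (timestamp, value) pairs before building one request.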
That sounds like a bug in the graphs; I have seen something very similar a few times.
I often use the Cosm Feed Viewer Chrome extension, which displays the latest values in real time using the WebSocket endpoint.
It should not be too hard to put together custom graphs with Rickshaw and CosmJS.
Do browsers join concurrent identical HTTP GET requests? At least, for static or cache-able content?
That is, if something like this happens:
| AJAX/HTTP-GET(resourceX)
| [start download]------------------------------------------->[finish download]
|
| AJAX/HTTP-GET(resourceX)
| [start download]--------->etc...
|
+------------------------------------------------------------------> Time
Will the browser figure out "Hey you're already trying to download resourceX! Don't try downloading it twice, it won't do anything!"?
Update:
Now of course, I can go to some site and try downloading a big file (e.g., "BigFile"), and click the link twice; this will (duplicately) download both BigFile and BigFile(1). Granted, it's an error on the user's part, but still...
For cache-able resources (e.g., downloading some javascript file), it seems pretty inefficient if browsers couldn't figure out these duplicates...
The browser won't notice. It acts just like regular HTTP traffic. It might cache the request once the first one is finished (if the proper cache-control fields are set), but concurrently, no.
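If an application wants concurrent identical requests to be merged, it has to do so itself. A common technique is request coalescing (sometimes called "single flight"): the first caller for a key performs the fetch, and callers that arrive while it is in progress wait for the same result. Below is a minimal Python sketch with threads; SingleFlight is a hypothetical class, and fn stands in for whatever actually downloads resourceX. Once the fetch finishes, the key is released, so later requests fall back to normal HTTP caching.

```python
import threading
from concurrent.futures import Future


class SingleFlight:
    """Coalesce concurrent calls for the same key into one execution (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> Future shared by the leader and its followers

    def do(self, key, fn):
        with self._lock:
            fut = self._inflight.get(key)
            is_leader = fut is None
            if is_leader:
                fut = Future()
                self._inflight[key] = fut
        if not is_leader:
            # Someone is already fetching this key: wait for their result (or error).
            return fut.result()
        try:
            result = fn()
        except Exception as exc:
            fut.set_exception(exc)
            raise
        else:
            fut.set_result(result)
            return result
        finally:
            # Forget the key so later, non-overlapping calls trigger a fresh fetch.
            with self._lock:
                del self._inflight[key]
```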