Does 'cache-control: public' actually have any effect? [duplicate] - caching

This question already has an answer here:
Cache-Control public or private by default?
Is cache-control: public, max-age=60 handled any differently by any known caches than cache-control: max-age=60?
I've struggled to verify it, but I assume that if a response carries any cache-control directives, it is treated as cacheable by the browser and by any intermediate caches unless cache-control: private is set.
Does this mean that cache-control: public is redundant? Isn't this the behaviour you'd get anyway?

On more careful reading of MDN, I think I've found the answer to my own question.
TL;DR: cache-control: public will explicitly override the default rules for which sort of responses are considered cacheable, so shouldn't be used lightly. Many responses normally shouldn't be cached - e.g. POSTs or 302 redirects. See below for the full set of rules.
From the cache-control page:
public
The response may be stored by any cache, even if the response is normally non-cacheable (emphasis mine).
So what does "cacheable" mean? From the "cacheable" page on the MDN glossary:
A cacheable response is an HTTP response that can be cached, that is stored to be retrieved and used later, saving a new request to the server. Not all HTTP responses can be cached, these are the following constraints for an HTTP response to be cached:
The method used in the request is itself cacheable, that is either a GET or a HEAD method. A response to a POST or PATCH request can also be cached if freshness is indicated and the Content-Location header is set, but this is rarely implemented. (For example, Firefox does not support it per https://bugzilla.mozilla.org/show_bug.cgi?id=109553.) Other methods, like PUT or DELETE are not cacheable and their result cannot be cached.
The status code of the response is known by the application caching, and it is considered cacheable. The following status code are cacheable: 200, 203, 204, 206, 300, 301, 404, 405, 410, 414, and 501.
There are (I assume this should be aren't) specific headers in the response, like Cache-Control, that prevents caching.
So it looks like one should only use cache-control: public when they explicitly want to override these rules for cacheability, which in general is probably not a good idea.
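For comparison with the MDN summary, here is a simplified Python model of the storage rules in RFC 7234 §3 and §3.2 for a shared cache. The function and parameter names are made up for illustration, and Expires, Pragma and cache extensions are ignored. Under this model `public` changes the outcome in two cases: a response with no explicit freshness whose status code is not cacheable by default, and a response to a request carrying an Authorization header. It does not override the request-method check.

```python
from __future__ import annotations

# Status codes the MDN list above treats as cacheable by default.
CACHEABLE_BY_DEFAULT = {200, 203, 204, 206, 300, 301, 404, 405, 410, 414, 501}


def parse_cache_control(value: str) -> dict[str, str | None]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: dict[str, str | None] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def shared_cache_may_store(method: str, status: int, cache_control: str = "",
                           has_authorization: bool = False) -> bool:
    """Simplified RFC 7234 section 3 storage check for a shared cache.

    Ignores Expires, Pragma, cache extensions and the POST + Content-Location case.
    """
    cc = parse_cache_control(cache_control)
    # The request method must be cacheable; no response directive overrides this.
    if method.upper() not in ("GET", "HEAD"):
        return False
    if "no-store" in cc or "private" in cc:
        return False
    # Section 3.2: a request with Authorization needs an explicit opt-in.
    if has_authorization and not any(d in cc for d in ("public", "must-revalidate", "s-maxage")):
        return False
    # Storable if freshness is explicit, the status is cacheable by default, or public is set.
    return ("max-age" in cc or "s-maxage" in cc
            or status in CACHEABLE_BY_DEFAULT or "public" in cc)


print(shared_cache_may_store("GET", 302, "max-age=60"))           # True
print(shared_cache_may_store("GET", 302, "public"))               # True
print(shared_cache_may_store("GET", 302, ""))                     # False
print(shared_cache_may_store("POST", 200, "public, max-age=60"))  # False
```

Note that in this model `public, max-age=60` and `max-age=60` behave identically for a plain GET without credentials, which is the case the question asks about.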

Related

ResponseBodyEmitter with multiple HTTP status codes

For security reasons, we have added a common Spring filter that performs a cross-cutting check and returns a 403 when a security rule is broken.
The solution works fine for synchronous endpoints (i.e. those returning a ResponseBody<SomeDTO>).
However, we have several asynchronous endpoints which return ResponseBodyEmitter.
if (!checks) {
    // The filter should erase the response body and override the HTTP status with 403.
}
For asynchronous methods, the checks are performed while the response is being sent.
Hence the endpoint may first send a 200 with a response body and then drop the connection with a 403 and an empty body.
Question:
Design-wise, is this behavior consistent with REST? (i.e. OK OK OK and then FORBIDDEN)
An HTTP request can only return one HTTP response, so "OK OK OK and then FORBIDDEN" is not actually possible. REST doesn't care if you have some internal state that defaults to 200, eventually becomes 403, and is then sent back to the client. HTTP and REST don't know what goes on in your server leading to that 403.
However, if you have some mechanism that does permission checks after processing the entire request, and if the user doesn't have permission it erases the response body and sets a 403 response, that only seems reasonable for safe methods (e.g.: read-only methods like GET).
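To make this concrete for the streaming case: the status line leaves the server together with the first chunk, so a filter that decides on a 403 later can only cut the connection. The toy Python model below (the class and function names are hypothetical, and nothing here is Spring's API) shows why the security check has to run before the first write.

```python
from __future__ import annotations


class StreamedResponse:
    """Toy model of a streamed HTTP/1.1 response (hypothetical, not Spring's API).

    The status line is sent together with the first chunk, exactly once.
    """

    def __init__(self) -> None:
        self.status = 200
        self.committed = False
        self.wire: list[str] = []  # what the client has received so far

    def set_status(self, status: int) -> None:
        if self.committed:
            raise RuntimeError("status line already sent; it can no longer change")
        self.status = status

    def write(self, chunk: str) -> None:
        if not self.committed:
            # The first write commits the status line.
            self.wire.append(f"HTTP/1.1 {self.status}")
            self.committed = True
        if chunk:
            self.wire.append(chunk)


def stream(events: list[str], allowed: bool) -> StreamedResponse:
    """Run the security check before the first write, while 403 is still possible."""
    resp = StreamedResponse()
    if not allowed:
        resp.set_status(403)
        resp.write("")  # commit a 403 with an empty body
        return resp
    for event in events:
        resp.write(event)
    return resp
```

Calling `set_status(403)` after the first `write()` raises, which mirrors what the client sees on the wire: a 200 followed by a dropped connection, never a second status.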

How do I instruct Varnish to cache based on response header?

I have a series of videos on various URLs across the site. I would like to cache them all with Varnish, even if the user is logged in. I can use the VCL configuration in order to whitelist certain URLs for caching. But I don't know how I can whitelist all videos.
Is there a way to say that all responses that return with a content type of video/mp4 are cached?
Deciding to serve an object from cache, and deciding to store an object in cache are 2 different things in Varnish. Both situations need to be accounted for.
Built-in VCL
In order to understand what happens out-of-the-box, you need to have a look at the following VCL file: https://github.com/varnishcache/varnish-cache/blob/master/bin/varnishd/builtin.vcl
This is the built-in VCL that is executed. For each subroutine this logic is executed when you don't do an explicit return(xyz) in your VCL file for the corresponding subroutine.
This means you have some sort of safety net to protect you.
From a technical perspective, the Varnish Compiler will add the built-in VCL parts to the subroutines you extend in your VCL prior to compiling the VCL file into C code.
What do we learn from the built-in VCL
The built-in VCL teaches us the following things when it comes to cacheability:
Varnish will only serve an object from cache for GET and HEAD requests (see vcl_recv)
Varnish will not serve an object from cache if a Cookie or Authorization header is present (see vcl_recv)
Varnish will not store an object in cache if a Set-Cookie header is present (see vcl_backend_response)
Varnish will not store an object in cache if TTL is zero or less (see vcl_backend_response)
Varnish will not store an object in cache if the Cache-Control header contains no-cache, no-store or private and no Surrogate-Control header is present (see vcl_backend_response)
Varnish will not store an object in cache if the Surrogate-Control header contains no-store (see vcl_backend_response)
Varnish will not store an object in cache if the Vary header performs cache variations on all headers via a * (see vcl_backend_response)
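The checks listed above boil down to two predicates: may the request be looked up, and may the response be stored. This is a rough Python model, assuming the Varnish 6.x builtin.vcl, where Cache-Control is matched against no-cache, no-store and private only when there is no Surrogate-Control header, and Surrogate-Control is matched against no-store. The function names are made up, and piping of non-standard methods and the hit-for-miss TTL are left out.

```python
import re


def may_lookup(method: str, req_headers: dict) -> bool:
    """builtin vcl_recv: only GET/HEAD without Cookie or Authorization reach return(hash)."""
    names = {k.lower() for k in req_headers}
    return method in ("GET", "HEAD") and not (names & {"cookie", "authorization"})


def may_store(ttl: float, resp_headers: dict) -> bool:
    """builtin vcl_backend_response: False means the object is marked uncacheable."""
    h = {k.lower(): v for k, v in resp_headers.items()}
    if ttl <= 0 or "set-cookie" in h:
        return False
    if re.search(r"no-store", h.get("surrogate-control", ""), re.I):
        return False
    # Cache-Control is only consulted when no Surrogate-Control header is present.
    if "surrogate-control" not in h and re.search(
            r"no-cache|no-store|private", h.get("cache-control", ""), re.I):
        return False
    # Vary: * would make every request a separate variant.
    return h.get("vary") != "*"
```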
How to make sure video files are served from cache
In vcl_recv you have to make sure Varnish is willing to lookup video requests from cache. In practical terms, this means taking care of the cookies.
My advice would be to remove all cookies, except the ones you really need. The example below will remove all cookies, except the PHPSESSID cookie, which is required by my backend:
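vcl 4.1;

sub vcl_recv {
    if (req.http.Cookie) {
        # Prefix with ";" so every cookie, including the first, follows a separator
        set req.http.Cookie = ";" + req.http.Cookie;
        # Remove the spaces after each ";"
        set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
        # Mark the cookies to keep with "; " (semicolon + space)
        set req.http.Cookie = regsuball(req.http.Cookie, ";(PHPSESSID)=", "; \1=");
        # Delete every cookie that is not marked
        set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
        # Trim leftover separators at both ends
        set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");
        # Nothing left: drop the header so the built-in VCL can do a cache lookup
        if (req.http.Cookie ~ "^\s*$") {
            unset req.http.Cookie;
        }
    }
}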
This example will remove tracking cookies from the request, which is fine, because they are processed by JavaScript.
When the PHPSESSID cookie is not set, vcl_recv will fall back on the built-in VCL, and the request will be served from cache.
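Because chained regsuball() calls are hard to read, here is a Python replay of the same five substitutions, with Python's re module standing in for Varnish's PCRE. The function name and the `keep` parameter are hypothetical. It shows how prefixing the kept cookies with "; " protects them from the deletion step.

```python
import re
from typing import Optional


def filter_cookies(cookie: Optional[str], keep: str = "PHPSESSID") -> Optional[str]:
    """Replay the regsuball() steps from the vcl_recv example on a Cookie header.

    `keep` is a regex alternation of cookie names to retain, e.g. "PHPSESSID|lang".
    Returns None where the VCL would unset the header.
    """
    if not cookie:
        return None
    c = ";" + cookie
    c = re.sub(r"; +", ";", c)                     # drop spaces after separators
    c = re.sub(r";(" + keep + r")=", r"; \1=", c)  # mark kept cookies with "; "
    c = re.sub(r";[^ ][^;]*", "", c)               # delete every unmarked cookie
    c = re.sub(r"^[; ]+|[; ]+$", "", c)            # trim leftover separators
    return None if re.fullmatch(r"\s*", c) else c
```

For example, `filter_cookies("_ga=GA1.2.3; PHPSESSID=abc123; _gid=xyz")` keeps only `PHPSESSID=abc123`, while a header containing only tracking cookies comes back as None, which corresponds to the `unset` branch.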
But in your case, you want them to be served from cache, even when users are logged in. This is fine, because videos are static files that aren't influenced by state.
The problem is that in the request context the Content-Type of the response isn't known yet, so you can't match on it. You'll have to use the URL instead.
Here's an example:
sub vcl_recv {
    if (req.url ~ "^/video") {
        return (hash);
    }
}
This snippet will bypass the built-in VCL, and will explicitly look the object up in cache, if the URL matches the ^/video regex pattern.
How to make sure video files are stored in cache
When you do an explicit return(hash) in vcl_recv, the hash will be created and a cache lookup takes place. But if the object is not stored in cache, you'll still have a miss, which results in a backend request.
When the backend response comes back, it needs to be stored in cache for a certain amount of time. Given the built-in VCL, you have to make sure you don't specify a zero TTL, and that the Cache-Control response header contains cacheable directives.
This is how I would set the Cache-Control header if for example we want to cache video files for a day:
Cache-Control: public, max-age=86400
Varnish will respect this header, and will set the TTL to 1 day based on the max-age syntax.
Even if you don't specify a Cache-Control header, Varnish will still store it in cache, but for 2 minutes, which is the default TTL.
Here's an example where Varnish will not store the object in cache, based on the Cache-Control header:
Cache-Control: private, max-age=0, s-maxage=0, no-cache, no-store
If any of these directives appears in Cache-Control, Varnish will make the object uncacheable.
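The TTL rules described above can be approximated in a few lines. This Python sketch assumes the stock default_ttl of 120 seconds and Varnish's precedence of s-maxage over max-age; it ignores Expires, Age and the hit-for-miss handling, so treat it as an illustration rather than Varnish's exact algorithm. The function name is hypothetical.

```python
import re

DEFAULT_TTL = 120  # Varnish's default_ttl parameter, in seconds


def varnish_ttl(cache_control: str, default_ttl: int = DEFAULT_TTL) -> int:
    """Approximate the initial beresp.ttl Varnish derives from Cache-Control.

    s-maxage wins over max-age; with neither present the default TTL applies.
    Simplification: Expires and Age headers are ignored.
    """
    cc = cache_control.lower()
    for directive in ("s-maxage", "max-age"):
        match = re.search(r"(?:^|[,\s])" + directive + r"\s*=\s*(\d+)", cc)
        if match:
            return int(match.group(1))
    return default_ttl


print(varnish_ttl("public, max-age=86400"))  # 86400
print(varnish_ttl(""))                       # 120
```

A result of 0 is what trips the `beresp.ttl <= 0s` check in the built-in vcl_backend_response.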
If Set-Cookie headers are part of the response, the object becomes uncacheable as well.
In case you don't have full control over the headers returned by the backend server, you can still override them in VCL.
Here's a VCL snippet where we force objects to be stored in cache for images and videos:
sub vcl_backend_response {
    if (beresp.http.Content-Type ~ "^(image|video)/") {
        set beresp.ttl = 1d;
        unset beresp.http.Set-Cookie;
        return (deliver);
    }
}
This example will strip off Set-Cookie headers, override the TTL to one day, and explicitly store and deliver the object. This only happens when the Content-Type response header starts with either image/ or video/.

file upload with G-WAN

I'm trying to upload an image file with XMLHttpRequest and the FormData API to my G-WAN server, but I can't retrieve the file contents. Here is the output in h_entities:
-----------------------------75940917410019849751723987620
Content-Disposition: form-data; name="test_param"

test_value
-----------------------------75940917410019849751723987620
Content-Disposition: form-data; name="uploadedFile"; filename="test.png"
Content-Type: image/png

PNG
"Content-type" is multipart/form-data. Has anyone managed to upload files to G-WAN? I couldn't find an example. Thanks!
I have spent the last few hours writing a dedicated example for G-WAN v3.10+ called entity.html (a form with a [browse] button to POST a file), which calls the entity.c servlet (which reports everything about the POST entity and lists its first 1,000 bytes).
The first thing that your G-WAN version 3.3 will hit is the POST entity size limit - and you will get an HTTP error 413 (request entity too large).
I wrote a servlet example called entity_size.c to show how to modify this limit (this can be done in a handler or in a servlet, and at any time).
The second thing that you would have to do (and which is done automatically with G-WAN v3.10+) is to load any missing part of the entity that was not already loaded with the HTTP request (in v3.3 this would require a handler playing with the return codes to read more until all is loaded).
These two points were recurring questions (on the now defunct forum), so it was time to give an example.
So, unless you are very courageous (and are willing to follow the above indications), you have understood that it's probably better to wait for v3.10 which will come later this month: it will do the job for you and you will have a couple of tested examples to learn from.
Note that entity.c can also be called to analyse any kind of request, not only multipart/form-data encodings, and not only POST requests (it also demonstrates PUT and DELETE).
A last note: I have also modified the code to make sure that BOTH the URI parameters and a POST/PUT entity are listed in the servlet argc/argv main().
This allows things like: POST /?blog/user/1245/day/24 where all you need to access user=1245 and day=24 (as well as the entity) is to read argv[].
Hope that this will help you in your projects!
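Independently of G-WAN, the entity dumped in the question is an ordinary multipart/form-data body, so once the whole entity has been read, a standard MIME parser can split it into fields. Here is a minimal Python sketch using the stdlib email package; the function name is hypothetical, and it assumes the complete body and the request's Content-Type value are already in memory (which is exactly the part G-WAN v3.3 does not do for you).

```python
import email
from email import policy


def parse_form_data(content_type: str, body: bytes) -> dict:
    """Split a multipart/form-data entity into {field name: (filename, bytes)}.

    `content_type` is the full request header value, including the boundary.
    """
    # Prepend a minimal header block so the MIME parser knows the boundary.
    head = f"Content-Type: {content_type}\r\n\r\n".encode("ascii")
    msg = email.message_from_bytes(head + body, policy=policy.default)
    fields = {}
    for part in msg.iter_parts():
        name = part.get_param("name", header="content-disposition")
        # get_filename() is None for plain fields; decode=True yields raw bytes.
        fields[name] = (part.get_filename(), part.get_payload(decode=True))
    return fields
```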

YES or NO: Can a server send an HTTP response, while still uploading the file from the correlative HTTP request?

If a website user submits an HTML form with: (1) a post method; (2) a multipart/form-data enctype; and, (3) a large attached file, can the server upload a posted file, and send a server generated HTTP response before the file upload is completed, without using AJAX?
That's pretty dense. So, I wrote an example to illustrate what I mean. Let's say there is an image upload form with a caption field.
<form action="upload-with-caption/" method="post" enctype="multipart/form-data">
<input type="hidden" id="hiddenInfo" name="hiddenInfo" />
File: <input type="file" name="imgFile" id="imgFile" /><br />
Caption: <input type="text" name="caption" id="caption" />
<input type="submit" />
</form>
I want to store the caption in a database table with the definition:
[files_table]
file_id [uniqueidentifier]
file_caption [varchar(500)]
file_status [int]
Then I want to upload the file to /root/{unique-id}/filename.ext.
file_status is mapped to a C# enum with the following definition:
enum FileUploadStatus{
Error = 0,
Uploading = 1,
Uploaded = 2
}
When the form submits, if the file is too large to process in 1 second, I want to send the webpage back a response that says it is currently uploading.
Can I do this with a single synchronous HTTP post?
Note: I will obviously want to check for the status updates later using AJAX, but that is not what this question is asking. I am specifically asking if the file can continue to upload after the response is sent.
HTTP is a synchronous protocol.
You cannot send a response until you receive the entire request.
Looking at the HTTP specifications alone (RFCs 723x), the answer is yes (and the currently accepted answer is wrong). I don't think HTML specifically has anything to add.
The HTTP/1.1 protocol "relies on the order of response arrival to correspond exactly to the order in which requests are made on the same connection" (RFC 7230 §5.6). Timing has nothing to do with it.
Not only does the protocol allow for early responses, but the semantics of some 4xx (Client Error) and 5xx (Server Error) status codes actually expect the response to be sent before the request has completed.
Let's take an example. If you intend to send five trillion billion million gigabytes to a web server (let's assume this number fit whatever data types are in use for the Content-Length header), when would you expect to receive a "413 Payload Too Large" response back? As soon as possible or only after a couple of decades when the request transfer completes? Obviously the sooner the better!
2xx (Successful) responses are a bit different. This class of status code "indicates that the client's request was successfully received, understood, and accepted" (RFC 7231 §6.3). Sending back this type of response early is likely to confuse the client.
Instead, what you probably want to send back as an early response belongs to the 1xx (Informational) category. These are referred to as "interim responses", meant to precede, not replace, the final response.
RFC 7231 §6.2:
The 1xx (Informational) class of status code indicates an interim response for communicating connection status or request progress prior to completing the requested action and sending a final response.
RFC 7230 §5.6:
More than one response message per request only occurs when one or more informational responses precede a final response to the same request.
RFC 7231 §5.1.1 has a great example where a client is about to send a "presumably large" message: instead of immediately sending the body after the head, the client includes an Expect: 100-continue header and then pauses briefly, expecting the server either to reject the message or to invite the client to carry on by responding with a "100 Continue" interim response. This potentially saves the client from transmitting bytes for nothing. Smart!
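The server side of such an exchange can be sketched as a decision made from the request head alone, before any body bytes are read. In this Python sketch the function name, the header dict and the size limit are hypothetical, and bodies without a Content-Length (chunked transfers) are not handled. An oversized Content-Length is rejected with 413 immediately, and an Expect: 100-continue request receives the interim 100 response.

```python
from typing import Optional

PAYLOAD_TOO_LARGE = (b"HTTP/1.1 413 Payload Too Large\r\n"
                     b"Connection: close\r\nContent-Length: 0\r\n\r\n")
CONTINUE = b"HTTP/1.1 100 Continue\r\n\r\n"


def early_response(headers: dict, max_body: int) -> Optional[bytes]:
    """Decide, from the request head alone, whether to answer before reading the body.

    Returns the bytes to send now, or None to go on reading the body.
    """
    h = {k.lower(): v.strip() for k, v in headers.items()}
    if int(h.get("content-length", "0")) > max_body:
        # Reject as soon as the head is parsed, not after the whole upload.
        return PAYLOAD_TOO_LARGE
    if h.get("expect", "").lower() == "100-continue":
        # Interim response: the client may now send the body.
        return CONTINUE
    return None
```

Checking the size before the Expect header means a client that asked for permission gets the final 413 instead of a 100 it should not act on.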
Finally, I thought long and hard about when we would ever want to send a 2xx (Successful) response back to the client before the request has completed. I can only come up with one scenario, and it is certainly not a common case, but I'll state it anyway: if the server has consumed enough of the request to take action, and the server wishes to discard the remaining body because the residue is sufficiently large and of no further use to the server, then respond 202 Accepted and include a "Connection: close" header.
This is obviously not good for connection reuse and could also easily lead to confused clients, so the payoff of responding early should be 1) advantageous enough to outweigh the overhead of establishing a new connection, 2) advantageous enough to offset the danger of crashing clients that were not prepared for an early response, and 3) well documented.
The "Connection: close" header will explicitly instruct the client to stop sending the request (RFC 7230 §6.3). And due to message framing, the connection is dead anyway, as there is no way for the communication to resume with a new message exchange pair over the same connection. Technically speaking, the client can cleanly abort a chunked transfer (RFC 7230 §4.1) and thus save the connection, but this is a detail and not applicable in the general case.

Does an HTTP Status code of 0 have any meaning?

It appears that when you make an XMLHttpRequest from a script in a browser, if the browser is set to work offline or if the network cable is pulled out, the request completes with an error and with status = 0. 0 is not listed among permissible HTTP status codes.
What does a status code of 0 mean? Does it mean the same thing across all browsers, and for all HTTP client utilities? Is it part of the HTTP spec or is it part of some other protocol spec? It seems to mean that the HTTP request could not be made at all, perhaps because the server address could not be resolved.
What error message is appropriate to show the user? "Either you are not connected to the internet, or the website is encountering problems, or there might be a typing error in the address"?
I should add that I see this behavior in Firefox when it is set to "Work Offline", but not in Microsoft Internet Explorer when set to "Work Offline". In IE, the user gets a dialog giving the option to go online. Firefox does not notify the user before returning the error.
I am asking this in response to a request to "show a better error message". What Internet Explorer does is good. It tells the user what is causing the problem and gives them the option to fix it. In order to give an equivalent UX with Firefox I need to infer the cause of the problem and inform the user. So what in total can I infer from Status 0? Does it have a universal meaning or does it tell me nothing?
Short Answer
It's not an HTTP response code, but it is documented by WHATWG as a valid value for the status attribute of an XMLHttpRequest or a Fetch response.
Broadly speaking, it is a default value used when there is no real HTTP status code to report and/or an error occurred sending the request or receiving the response. Possible scenarios where this is the case include, but are not limited to:
The request hasn't yet been sent, or was aborted.
The browser is still waiting to receive the response status and headers.
The connection dropped during the request.
The request timed out.
The request encountered an infinite redirect loop.
The browser knows the response status, but you're not allowed to access it due to security restrictions related to the Same-origin Policy.
Long Answer
First, to reiterate: 0 is not an HTTP status code. There's a complete list of them in RFC 7231 Section 6.1, which doesn't include 0, and the intro to section 6 states clearly that
The status-code element is a three-digit integer code
which 0 is not.
However, 0 as a value of the .status attribute of an XMLHttpRequest object is documented, although it's a little tricky to track down all the relevant details. We begin at https://xhr.spec.whatwg.org/#the-status-attribute, documenting the .status attribute, which simply states:
The status attribute must return the response’s status.
That may sound vacuous and tautological, but in reality there is information here! Remember that this documentation is talking here about the .response attribute of an XMLHttpRequest, not a response, so this tells us that the definition of the status on an XHR object is deferred to the definition of a response's status in the Fetch spec.
But what response object? What if we haven't actually received a response yet? The inline link on the word "response" takes us to https://xhr.spec.whatwg.org/#response, which explains:
An XMLHttpRequest has an associated response. Unless stated otherwise it is a network error.
So the response whose status we're getting is by default a network error. And by searching for everywhere the phrase "set response to" is used in the XHR spec, we can see that it's set in five places:
To a network error, when:
the open() method is called, or
the response's body's stream is errored (see the algorithm described in the docs for the send() method)
the timed out flag is set, causing the request error steps to run
the abort() method is called, causing the request error steps to run
To the response produced by sending the request using Fetch, by way of either the Fetch process response task (if the XHR request is asynchronous) or the Fetch process response end-of-body task (if the XHR request is synchronous).
Looking in the Fetch standard, we can see that:
A network error is a response whose status is always 0
so we can immediately tell that we'll see a status of 0 on an XHR object in any of the cases where the XHR spec says the response should be set to a network error. (Interestingly, this includes the case where the body's stream gets "errored", which the Fetch spec tells us can happen during parsing the body after having received the status - so in theory I suppose it is possible for an XHR object to have its status set to 200, then encounter an out-of-memory error or something while receiving the body and so change its status back to 0.)
We also note in the Fetch standard that a couple of other response types exist whose status is defined to be 0, whose existence relates to cross-origin requests and the same-origin policy:
An opaque filtered response is a filtered response whose ... status is 0...
An opaque-redirect filtered response is a filtered response whose ... status is 0...
(various other details about these two response types omitted).
But beyond these, there are also many cases where the Fetch algorithm (rather than the XHR spec, which we've already looked at) calls for the browser to return a network error! Indeed, the phrase "return a network error" appears 40 times in the Fetch standard. I will not try to list all 40 here, but I note that they include:
The case where the request's scheme is unrecognised (e.g. trying to send a request to madeupscheme://foobar.com)
The wonderfully vague instruction "When in doubt, return a network error." in the algorithms for handling ftp:// and file:// URLs
Infinite redirects: "If request’s redirect count is twenty, return a network error."
A bunch of CORS-related issues, such as "If httpRequest’s response tainting is not "cors" and the cross-origin resource policy check with request and response returns blocked, then return a network error."
Connection failures: "If connection is failure, return a network error."
In other words: whenever something goes wrong other than getting a real HTTP error status code like a 500 or 400 from the server, you end up with a status attribute of 0 on your XHR object or Fetch response object in the browser. The number of possible specific causes enumerated in spec is vast.
Finally: if you're interested in the history of the spec for some reason, note that this answer was completely rewritten in 2020, and that you may be interested in the previous revision of this answer, which parsed essentially the same conclusions out of the older (and much simpler) W3 spec for XHR, before these were replaced by the more modern and more complicated WHATWG specs this answer refers to.
Status 0 appears when an Ajax call is cancelled before the response arrives, for example by refreshing the page, or when an unreachable URL is requested.
This status is not documented, but it exists for Ajax and makeRequest calls from gadget.io.
I know it's an old post, but these issues still exist.
Here are some of my findings on the subject, grossly explained.
"Status" 0 means one of 3 things, as per the XMLHttpRequest spec:
dns name resolution failed (that's for instance when network plug is pulled out)
server did not answer (a.k.a. unreachable or unresponding)
request was aborted because of a CORS issue (the abort is performed by the user agent and follows a failing OPTIONS preflight).
If you want to go further, dive deep into the internals of XMLHttpRequest. I suggest reading the ready-state update sequence ([0,1,2,3,4] is the normal sequence, [0,1,4] corresponds to status 0, and [0,1,2,4] means no content was sent, which may or may not be an error). You may also want to attach listeners to the xhr (onreadystatechange, onabort, onerror, ontimeout) to figure out the details.
From the spec (XHR Living spec):
const unsigned short UNSENT = 0;
const unsigned short OPENED = 1;
const unsigned short HEADERS_RECEIVED = 2;
const unsigned short LOADING = 3;
const unsigned short DONE = 4;
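The readyState rule of thumb above can be written down as a small Python function, for instance to pick an error message in a test harness that records the observed states. The enum mirrors the spec constants; the function name and messages are hypothetical, and the mapping encodes this answer's heuristic rather than anything the XHR spec guarantees.

```python
from enum import IntEnum


class ReadyState(IntEnum):
    """XMLHttpRequest readyState values, mirroring the spec constants."""
    UNSENT = 0
    OPENED = 1
    HEADERS_RECEIVED = 2
    LOADING = 3
    DONE = 4


def describe(sequence: list, status: int) -> str:
    """Interpret an observed readyState sequence plus the final status (heuristic)."""
    states = [ReadyState(s) for s in sequence]
    if not states or states[-1] != ReadyState.DONE:
        return "still in progress"
    if status == 0:
        # e.g. [0, 1, 4]: no headers ever arrived
        return "network error, abort, timeout or CORS failure"
    if ReadyState.LOADING not in states:
        # e.g. [0, 1, 2, 4]: headers but no body
        return "finished without a body"
    return "completed normally"
```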
From the documentation (http://www.w3.org/TR/XMLHttpRequest/#the-status-attribute): 0 means a request was cancelled before going anywhere.
Since iOS 9, you need to add "App Transport Security Settings" to your Info.plist file and enable "Allow Arbitrary Loads" before making requests to a non-secure HTTP web service. I had this issue in one of my apps.
Yes, somehow the Ajax call was aborted. The cause may be one of the following:
Before the Ajax request completed, the user navigated to another page.
The Ajax request timed out.
The server was not able to return any response.
