Why is the Expires response header set to a future date with max-age=0? - Firefox

I noticed an odd behaviour with the Expires header in HTTP responses sent by Google Cloud Storage.
Although Cache-Control is defined with max-age:0 for the file in its metadata (as visible in the screenshot), the Expires header is set to a date one year in the future (second screenshot). Why is this date set to the future?
The problematic part of this behaviour is that the most recent Firefox versions (v77 and v78) seem to honour the Expires header, although the documentation states that it is ignored when max-age is defined (see https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Expires). For livestream video playback with HLS, this leads to buffering after a short time, because the browser keeps serving the cached manifest. There is already a bug report on Mozilla's Bugzilla about this behaviour (see https://bugzilla.mozilla.org/show_bug.cgi?id=1648075).
Update
Further investigation showed that Firefox is not the problem in this case. Since v77 it simply follows the documentation more strictly: '=' is the required separator in Cache-Control directives, not ':'. Other browsers (and Firefox up to v76) also accept the ':' form.
Therefore, in our case the issue needs to be fixed in the service that writes the files to our GCS bucket, so that it sets Cache-Control: max-age=0 with '=' instead of ':'.
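For illustration, a minimal sketch of what that fix could look like if the service uses the Node.js client @google-cloud/storage (the bucket and object names are placeholders, not from the original question):

// Assumes: npm install @google-cloud/storage
const { Storage } = require('@google-cloud/storage');

const storage = new Storage();

async function fixCacheControl() {
  // Placeholder bucket/object names for illustration only.
  const file = storage.bucket('my-livestream-bucket').file('stream/playlist.m3u8');
  // Note the '=' separator: "max-age=0", not "max-age:0".
  // The same cacheControl value can typically also be passed as metadata at upload time.
  await file.setMetadata({ cacheControl: 'max-age=0, no-cache' });
}

fixCacheControl().catch(console.error);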

It's hard to say why Google Cloud Storage is doing this. Perhaps the Expires header is a default, while Cache-Control is used for custom user settings? More likely, it's just an oversight.
The important point is that this behavior is allowed, and would be harmless with compliant browsers due to the explicit precedence of max-age:
If a response includes a Cache-Control field with the max-age directive (Section 5.2.2.8), a recipient MUST ignore the Expires field.
So the real issue is that Firefox is not conforming to the HTTP specification. Hopefully that bug will be fixed soon.

Related

[play framework] Leverage browser caching

I ran a speed test for my page. It said: "Setting an expiry date or a maximum age in the HTTP headers for static resources instructs the browser to load previously downloaded resources from local disk rather than over the network."
My page uses the Play Framework. I came across a lot of answers involving an .htaccess file, but that is not supported in Play. How can I cache the static files at the browser level?
When running in production mode, Play already sets the ETag header, so whenever a browser requests a file matching that ETag, Play just returns 304 Not Modified. This saves you data (the browser will not download the file again if it already has the right version), but it still requires a request to the server.
If you want to specify an expiry time, you can add assets.defaultCache="max-age=3600" to your application.conf (adapt the value to your needs: 3600 is one hour in seconds).
I can't check this right now, but I think Play then also sets Cache-Control: max-age=3600, so the warning you are getting is probably because this value is too low for the tool you are using to check the caching.
You can also set the expiry time for individual assets (see https://www.playframework.com/documentation/2.5.x/AssetsOverview#Additional-Cache-Control-directive).
Note that you should only specify a long expiry time for assets that you are sure don't change often...

URL::previous() not working as expected

The URL::previous() function is always returning my base URL.
Has somebody encountered this issue as well?
The URL::previous() method uses the HTTP_REFERER Header.
However, this header isn't reliable, since it is only present if the browser chooses to send it (and some browsers don't).
More information on that topic
So either your browser doesn't send the (correct) Referer header, or you are entering the URL manually (in which case there is no previous URL at all).
This is a known problem with URL::previous(), due to inconsistent usage of HTTP_REFERER across browsers (Chrome, in particular, likes to ignore it). You can handle this behavior manually with a bit of a workaround, by storing the current URL in a Session variable before redirecting, then retrieving it (and clearing it) from Session when it's time to redirect back. You can see an implementation of that at http://gist.github.com/msurguy/5158026.
The downside of this approach is that you will get incorrect behavior if the user has multiple tabs open while viewing your site, since only one URL will be stored in the Session variable.
To make this as accurate as possible, you could combine both: use the built-in URL::previous() when it's available, and fall back to the Session variable workaround when it's not. Just check the value of Request::header('referer'); if it is empty or contains your root URL, use the fallback stored in the Session variable.

Asking Chrome to bypass local cache for XmlHttpRequest like it's possible in Firefox?

As some of you may already know, there are caching issues in Firefox/Chrome for requests initiated by the XmlHttpRequest object. These issues mean that the browser does not strictly follow the rules and does not go to the server for the new XSLT file (for example). The response does not have an Expires header (for performance reasons we can't use one).
Firefox has an additional property on the XHR object, "channel", on which you can set the flag Components.interfaces.nsIRequest.LOAD_BYPASS_CACHE to go to the server explicitly.
Does something like that exist for Chrome?
Let me immediately stop everyone who would recommend adding a timestamp or a random integer as the value of a GET parameter - I don't want the server to get different URL requests; I want it to get the original URL. The reason is that I want to protect the server from getting too many different requests for simple static files, and from sending too much data to clients when it is not needed.
Hitting a static file with a generated GET parameter (like '?forcenew=12314') would produce a 200 response the first time and a 304 for every following request with that same value of the random integer. I want to make requests that always return 304 if the target static file is identical to the client's version. This is, by the way, how web browsers should work out of the box, but XHR objects tend not to go to the server at all to ask whether the file has changed.
In my main project at work I had the exact same problem. My solution was not to append random strings or timestamps to GET requests, but a specific string.
If you have a revision number, e.g. a Subversion revision or its equivalent from Git/Mercurial or whatever you are using, append that. Static files will get 304 responses until the moment a new revision is released. When the new release happens, a single 200 response is granted and it is back to happily generating 304 responses. :-)
This has the added bonus of being browser independent.
Should you be unlucky and not have a revision number, then make one up and increment it each time you make a release.
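A rough client-side sketch of that idea (the constant and the file path are made up for illustration, not part of the original answer):

// BUILD_REVISION is assumed to be injected at build/release time,
// e.g. the Subversion or Git revision of the deployment.
var BUILD_REVISION = '4821';

var xhr = new XMLHttpRequest();
// The URL only changes when a new release goes out, so between releases the
// server keeps seeing the same URL and can keep answering 304 Not Modified.
xhr.open('GET', '/static/transform.xslt?rev=' + BUILD_REVISION);
xhr.send();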
You should look into ETags. ETags are keys that can be generated from the contents of a file, so once the file on the server changes there will be a new ETag. Obviously this is a server-side change, which is something you will need to do anyway given that you want a 200 and then subsequent 304s. Chrome and FF should respect these ETags, so you shouldn't need to do any crazy client-side hacks.
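To illustrate the server-side part, here is a minimal sketch in Node.js (my own example, not from the answer above; the file name is a placeholder) that derives an ETag from the file contents and answers 304 when the client's If-None-Match matches:

// Uses only Node's built-in modules.
const http = require('http');
const fs = require('fs');
const crypto = require('crypto');

http.createServer((req, res) => {
  const body = fs.readFileSync('./transform.xslt'); // placeholder static file
  // The ETag is derived from the contents, so it changes only when the file does.
  const etag = '"' + crypto.createHash('md5').update(body).digest('hex') + '"';

  if (req.headers['if-none-match'] === etag) {
    res.writeHead(304, { ETag: etag });
    res.end(); // client copy is still current, send nothing
  } else {
    res.writeHead(200, { ETag: etag, 'Content-Type': 'text/xml' });
    res.end(body); // full response plus the validator for next time
  }
}).listen(8080);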
Chrome now supports the Cache-Control: max-age=0 request header. You can set it after you open an XMLHttpRequest instance:
xhr.setRequestHeader( "Cache-Control", "max-age=0" );
This will instruct Chrome not to use a cached response without revalidating it with the server.
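Putting it together (the URL is just a placeholder), note that setRequestHeader() can only be called after open():

var xhr = new XMLHttpRequest();
xhr.open('GET', '/static/transform.xslt');          // placeholder URL
xhr.setRequestHeader('Cache-Control', 'max-age=0'); // must come after open()
xhr.onload = function () {
  // The browser revalidates with the server; an unchanged file costs only a 304 round trip.
  console.log('Loaded', xhr.responseText.length, 'characters');
};
xhr.send();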
For more information check The State of Browser Caching, Revisited by Mark Nottingham and RFC 7234 Hypertext Transfer Protocol (HTTP/1.1): Caching.

Can I clear a specific URL in the browser's cache (using POST, or otherwise)?

The Problem
There's an item (foo.js) that rarely changes. I'd like this item to be stored in the browser's cache (using an Expires header). However, when it does change, I'd like the browser to update to the newest version.
The Attempt
Foo.js is returned with a far-future Expires header. It's cached by the browser and requires no round-trip query to the server. Just the way I like it. Now, when it changes....
Let's assume I know that the user's version of foo.js is outdated. How can I force a fresh copy of it to be obtained? I use xhr to perform a POST to foo.js. This should, in theory, force the browser to get a newer version of foo.js.
Unfortunately, this only seems to work in Firefox. Other browsers will use their cached copy, even if other POST parameters are set.
WTF
First off, is there a way to do what I'm trying to do?
Second, why is there no sensible key/value type of cache in browsers? Why can I not simply include in the headers "Cache: some_key, some_expiration_time" and also specify "Clear-Cache: key1, key2, key3" (the keys would have to be domain-specific, of course)? Instead, we're stuck with either expensive round-trips that ask "is the content new?", or the ridiculous "guess how long it'll be before you modify something" Expires header.
Thanks
Any comments on this matter are greatly appreciated.
Edits
I realize that adding a version number to the file would solve this. However, in my case it is not possible -- the call to "foo.js" is hardcoded into a bookmarklet.
You can just add a query string to the end of the file's URL; the server can ignore it, but the browser can't, it must treat it as a new request:
http://www.site.com/foo.js?v=1.12345
Many people use this approach; SO uses a hash of some sort, and I use the build number (so users get a new version with each build). If either of these is an option, you get the benefit of long-duration cache headers, but still force a fetch of a new copy when needed.
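For instance, a sketch of loading the versioned URL from the page (the host name is the one from the example above; the BUILD value is an assumption about your build pipeline):

// BUILD is a build number or content hash produced by your build pipeline.
var BUILD = '1.12345';

var script = document.createElement('script');
// The server can serve foo.js with long-lived cache headers; the URL itself
// only changes when BUILD changes, which forces a fresh download after a release.
script.src = 'http://www.site.com/foo.js?v=' + BUILD;
document.head.appendChild(script);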
Why set your cache expiration so far in the future? If you set it to one day, for instance, the only overhead you will incur (once a day) is the browser revalidating that it is the same file. If you still have not changed it, then you will not re-download the file; the server will respond with a Not Modified response.
All caches have a set of rules that they use to determine when to serve a representation from the cache, if it's available. Some of these rules are set in the protocols (HTTP 1.0 and 1.1), and some are set by the administrator of the cache (either the user of the browser cache, or the proxy administrator).

Generally speaking, these are the most common rules that are followed (don't worry if you don't understand the details, it will be explained below):

If the response's headers tell the cache not to keep it, it won't.

If the request is authenticated or secure (i.e., HTTPS), it won't be cached.

A cached representation is considered fresh (that is, able to be sent to a client without checking with the origin server) if:
* it has an expiry time or other age-controlling header set, and is still within the fresh period, or
* the cache has seen the representation recently, and it was modified relatively long ago.
Fresh representations are served directly from the cache, without checking with the origin server.

If a representation is stale, the origin server will be asked to validate it, or tell the cache whether the copy that it has is still good.

Under certain circumstances — for example, when it's disconnected from a network — a cache can serve stale responses without checking with the origin server.

If no validator (an ETag or Last-Modified header) is present on a response, and it doesn't have any explicit freshness information, it will usually — but not always — be considered uncacheable.

Together, freshness and validation are the most important ways that a cache works with content. A fresh representation will be available instantly from the cache, while a validated representation will avoid sending the entire representation over again if it hasn't changed.
http://www.mnot.net/cache_docs/#BROWSER
There is an excellent suggestion made in this thread: How can I make the browser see CSS and Javascript changes?
See the accepted answer by the user "grom".
The idea is to use the "modified" timestamp from the server to note when the file has been modified, and to add a version parameter to the end of the URL, so that your CSS and JS files have URLs like this: my.js?version=12345678
This makes the browser think it is a new file, and so it does not refer to the cached version.
I am using a similar method in my app. It works pretty well. Of course, this would assume you are using something like PHP to process your HTML.
Here is another link with a more simple implementation for WordPress: http://markjaquith.wordpress.com/2009/05/04/force-css-changes-to-go-live-immediately/
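As a rough sketch of the same idea in Node.js (the original answer assumes PHP; the file path here is a placeholder):

const fs = require('fs');

function versionedUrl(urlPath) {
  // Use the file's last-modified time as the version parameter, so the URL
  // changes automatically whenever the file is edited.
  const mtime = Math.floor(fs.statSync('.' + urlPath).mtimeMs / 1000);
  return urlPath + '?version=' + mtime;
}

// e.g. prints something like /js/my.js?version=1589212345
console.log(versionedUrl('/js/my.js'));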
With these constraints, I guess your only option is to use window.location.reload(true) and force the browser to refresh all the cached items... it's not pretty.
You can invalidate the cache for a specific URL using Cache-Control HTTP headers.
On the desired URL you can run (with XHR/Ajax, for instance) a request with the following headers:
headers: {
  'Cache-Control': 'no-cache, no-store, must-revalidate, max-age=0',
  'Pragma': 'no-cache',
  'Expires': '0',
}
Your cache will be invalidated, and the next GET requests will return a brand-new result.
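A sketch of such a request with fetch (the URL is a placeholder; the headers are the ones from the answer above):

// Forces a fresh copy of one URL past the browser cache; later normal GET
// requests for the same URL should then pick up the new response.
fetch('/foo.js', {
  headers: {
    'Cache-Control': 'no-cache, no-store, must-revalidate, max-age=0',
    'Pragma': 'no-cache',
    'Expires': '0',
  },
});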

Check if a web page is modified / has expired with Ruby

I'm writing a crawler in Ruby, and I want to honour the headers that the server sends out in order to make the crawl more efficient. Is there a straightforward way in Ruby of determining whether a page needs to be re-downloaded by the client? I know I need to consider at least these headers:
Last-Modified
ETag
Cache-Control
Expires
What's the definitive way of determining this - is it specified anywhere?
You are right about the headers you will need to look at, but keep in mind that it is the server that sets these. If they are set correctly, then you can use them to make the decision, but none of them are required.
Personally, I would probably start by tracking the Expires value during the initial download, as well as logging the ETag. Then, on the next pass, I'd look at Last-Modified, assuming the Expires or ETag values showed some sign that I might need to re-download (or if they aren't even set). I wouldn't expect Cache-Control to be all that useful.
You'll want to read about the head method in Net::HTTP -- http://www.ruby-doc.org/stdlib/

Resources