Does Cache work for partially loaded files?

This is not a "coding question", but more of a "how does it work?" question.
Let's say I want to show a heavy pic on page 2.
If I preload this pic on page 1 (without displaying it) and click the page-2 link before it has fully loaded... What happens?
=> Does page 2 finish loading the rest of the heavy pic, or does the cache not work for partially loaded files?
Thanks for your explanations,
CH

In theory it's very possible that part of the response gets cached, either by the web browser or by a proxy server between the end user and the web server. HTTP supports range requests, where the client can ask for a specific slice of the total resource (like an image). All the big-name web servers support range requests.
I really don't know offhand whether any web browsers cache a partially downloaded resource, although it would be a simple test: clear the browser's cache, hit a web page that loads a large external object, and stop loading midway through. Make sure the web server sends the following headers along with the response:
cache-control: max-age=10000
accept-ranges: bytes
Now make the request again, but look at the HTTP headers of the request for the browser asking for partial content, e.g. Range: bytes=100000-90000000. It would obviously only ask for partial content if it had partially cached the file.
The max-age directive tells the browser the file is cacheable for a while, and the Accept-Ranges header tells the browser the web server is capable of servicing partial-content requests.
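If you want to probe the server side of that test by hand, here is a minimal sketch in TypeScript (assuming a Node 18+ runtime with a global fetch; the URL is a placeholder) that issues the same kind of partial-content request a browser would:

    async function main() {
      // Placeholder URL; any server that honors range requests will do.
      const res = await fetch("https://example.com/heavy-pic.jpg", {
        headers: { Range: "bytes=100000-" }, // everything from byte 100,000 onward
      });

      // 206 Partial Content means the server honored the range;
      // a plain 200 means it ignored the Range header and sent the whole file.
      console.log(res.status, res.headers.get("Content-Range"));
    }

    main();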

Google Cloud CDN "Force Cache All Content" does NOT cache all content

I am using Google Cloud CDN for my WordPress website https://cdn.datanumen.com. I have enabled the "Force Cache All Content" option. However, the web pages, CSS files, and JavaScript files are still not cached; only the images are cached.
For example, when I test the page at https://cdn.datanumen.com/, I have used Ctrl + F5 to refresh the webpage many times, but I always get the same results.
Below are the response headers of the web page I try to load (screenshot omitted): there is a "Cache-Control" field in the response header, but no "Age" field. Based on the Google documentation, if the cache hits and cached content is served, there will be an "Age" field, so no "Age" means the file is not cached.
I also checked the log:
In the log, cacheFillBytes is 26776 and cacheLookup is true. It seems that Google CDN is trying to look up the cache and fill it with the contents. But statusDetails shows "response_sent_by_backend", so the contents are still served from the backend. Normally this should only occur the first time I visit the website, but in my case, even if I press Ctrl + F5 to refresh the website many times, I always get the same result: statusDetails never shows "response_sent_by_cache" for a page such as https://cdn.datanumen.com/.
Why?
Update:
I notice there is a "Vary" field in the response header.
Based on https://cloud.google.com/cdn/docs/caching#non-cacheable_content, if the Vary header has a value other than Accept, Accept-Encoding, or Origin, the content will not be cached. Since in my case the "Vary" header is "Accept-Encoding,Cookie,User-Agent", the content is not cached. But my question is: how do I deal with this issue and force the content to be cached?
Update 2
I have changed the site to a real WordPress site, since that is what I need in the end. I plan to use Google Cloud CDN's paid support to see if they can help with this case.
According to Google Cloud CDN's documentation, the best way to solve your problem is actually to use the CACHE_ALL_STATIC cache mode:
CACHE_ALL_STATIC: Automatically caches static content that doesn't have the no-store or private directive. Origin responses that set valid caching directives are also cached. This is the default behavior for Cloud CDN-enabled backends created by using the gcloud command-line tool or the REST API.
USE_ORIGIN_HEADERS: Requires origin responses to set valid cache directives and valid caching headers. Responses without these directives are forwarded from the origin.
FORCE_CACHE_ALL: Unconditionally caches responses, overriding any cache directives set by the origin. This mode is not appropriate if the backend serves private, per-user content, such as dynamic HTML or API responses.
But in the case of the last cache mode, there are two warnings about its usage:
When you set the cache mode to FORCE_CACHE_ALL, the default time to live (TTL) for content caching is 3600 seconds (1 hour), unless you explicitly set a different TTL. Accepting the new default TTL of 1 hour might cause some entries that were previously considered fresh (due to having longer TTLs from origin headers) to now be considered stale.
The FORCE_CACHE_ALL mode overrides cache directives (Cache-Control and Expires) but does not override other origin response headers. In particular, a Vary header is still honored, and may suppress caching even in the presence of FORCE_CACHE_ALL. For more information, see Vary headers.
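Since FORCE_CACHE_ALL still honors Vary, the practical fix is to stop the origin from emitting Vary: Cookie,User-Agent on anonymous page loads. A minimal sketch of the idea, using a plain Node HTTP server in TypeScript as a stand-in for the WordPress origin (the real change would go in the WordPress/PHP code or the web-server configuration):

    import { createServer } from "node:http";

    // Stand-in origin: emit only a CDN-friendly Vary value.
    // Cloud CDN caches responses whose Vary lists only Accept,
    // Accept-Encoding, or Origin.
    const server = createServer((req, res) => {
      res.setHeader("Cache-Control", "public, max-age=3600");
      res.setHeader("Vary", "Accept-Encoding"); // not "Accept-Encoding,Cookie,User-Agent"
      res.end("<html>...</html>");
    });

    server.listen(8080);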

How Firefox fetches correct data from the browser cache

Once we open a link in a new tab in Firefox, the data corresponding to that web page (static or dynamic) gets stored in the browser cache. Then, when we switch to that tab again, it extracts that page's data from the cache (without requesting it from the site's server) and paints it into the screen's frame buffer.
I want to know how Firefox fetches this data in the correct sequence.
What kind of mapping does Firefox use to extract the page data from its cache?
Firefox (like any other browser) uses heuristics to decide when and what to cache. This applies when no caching information is included in the resources: even without explicit caching headers, Firefox might still decide to cache the files for a certain period of time.
If you want to keep Firefox from caching your resources altogether, you must include the following response header on your resources:
Cache-Control: no-cache, no-store
Now, I don't think the exact algorithm that Firefox uses to fetch from cache is public. Maybe somebody from Mozilla can answer this.
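For what it's worth, the heuristic suggested by RFC 7234 (and commonly implemented) is to treat a response with no explicit freshness information as fresh for 10% of the time elapsed since its Last-Modified date. A rough TypeScript illustration of that calculation (purely a sketch, not Firefox's actual code):

    // Heuristic freshness per RFC 7234 §4.2.2: a response with no explicit
    // freshness info may be treated as fresh for 10% of the time elapsed
    // since it was last modified.
    function heuristicFreshnessMs(dateHeader: string, lastModified: string): number {
      const elapsed = Date.parse(dateHeader) - Date.parse(lastModified);
      return Math.max(0, elapsed / 10);
    }

    // A file last modified 10 days before being served would be
    // considered fresh for roughly 1 day.
    const ms = heuristicFreshnessMs(
      "Wed, 11 Jan 2023 00:00:00 GMT",
      "Sun, 01 Jan 2023 00:00:00 GMT",
    );
    console.log(ms / 86_400_000); // ≈ 1 (day)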

Force Browser Caching Across Browser Sessions

I help maintain several WordPress-based websites that publish news and reference information.
We have been working hard to make pages at the websites load as fast as possible.
One of the things we've done is implement very long "max-age" times in the "Cache-Control" HTTP headers for most of our static files, such as images and CSS files.
The particular cache-control setting we're using is "public, max-age=31536000". 31,536,000 seconds is 365 days.
The upside is that this setting does, in fact, cause the static files to be cached as visitors browse through different pages of our sites.
But here's the rub: this cache-control setting doesn't do much for us across browser sessions. Even though the setting is supposed to tell the browser "cache this file for an entire year", if a visitor to our site shuts down their browser, then starts it up just five minutes later and comes back to our site, the browser insists on reloading all the static files, even though it still has them in its cache.
I've checked this carefully in Firefox, viewing the headers with Live HTTP Headers. But I can also qualitatively see the same thing happening in other browsers.
Apparently, browsers insist on reloading all content for a website if the content hasn't been loaded once during the current browser session.
So ... Is there any way we can "politely suggest" to browsers that they always load cached content from the cache, even if the browser hasn't been to our site during the current browser session?
Check the ETag, Expires, and Last-Modified headers as well.
You need an Expires header, and sometimes ETag and Last-Modified can defeat caching: when they are present, the browser may issue conditional revalidation requests for content it already has, instead of serving it silently from cache.
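For illustration, here is a minimal sketch (a plain Node HTTP server in TypeScript; the header values mirror the question) of a static-asset response intended to survive across sessions. The immutable directive is not from the question, but it exists for exactly this problem and Firefox honors it:

    import { createServer } from "node:http";

    const server = createServer((req, res) => {
      // One-year freshness, as in the question; "immutable" additionally
      // tells supporting browsers not to revalidate even on reload.
      res.setHeader("Cache-Control", "public, max-age=31536000, immutable");
      res.setHeader("Expires", new Date(Date.now() + 31_536_000_000).toUTCString());
      // Deliberately no ETag/Last-Modified, to avoid triggering conditional
      // revalidation of content that is already fresh.
      res.end("...static file bytes...");
    });

    server.listen(8080);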

Lazy HTTP caching

I have a website which is displayed to visitors via a kiosk. People can interact with it. However, since the website is not locally hosted and is fetched over an internet connection, the page loads are slow.
I would like to implement some kind of lazy caching mechanism such that as and when people browse the pages - the pages and the resources referenced by the pages get cached, so that subsequent loads of the same page are instant.
I considered using HTML5 offline caching, but it requires me to specify all the resources in the manifest file, which is not feasible for me, as the website is pretty large.
Is there any other way to implement this? Perhaps using HTTP caching headers? I would also need some way to invalidate the cache at some point to "push" the new changes to the browser...
The usual approach to handling problems like this is with HTTP caching headers, combined with smart construction of URLs for resources referenced by your pages.
The general idea is this: every resource loaded by your page (images, scripts, CSS files, etc.) should have a unique, versioned URL. For example, instead of loading /images/button.png, you'd load /images/button_v123.png and when you change that file its URL changes to /images/button_v124.png. Typically this is handled by URL rewriting over static file URLs, so that, for example, the web server knows that /images/button_v124.png should really load the /images/button.png file from the web server's file system. Creating the version numbers can be done by appending a build number, using a CRC of file contents, or many other ways.
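A minimal sketch of that version-stamping step in TypeScript (the paths and the helper name are made up; a real site would run this at build time or inside its page-generation code):

    import { createHash } from "node:crypto";
    import { readFileSync } from "node:fs";

    // Hypothetical helper: derive a versioned URL from a hash of the
    // file's contents, so the URL changes whenever the file changes.
    function versionedUrl(urlPath: string): string {
      const bytes = readFileSync(`./static${urlPath}`);
      const v = createHash("md5").update(bytes).digest("hex").slice(0, 8);
      return urlPath.replace(/(\.\w+)$/, `_v${v}$1`);
    }

    // e.g. "/images/button.png" -> "/images/button_v3f2a9c1d.png"
    console.log(versionedUrl("/images/button.png"));

The web server's rewrite rule then maps /images/button_v*.png back to /images/button.png on disk, as described above.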
Then you need to make sure that, wherever URLs are constructed in the parent page, they refer to the versioned URL. This obviously requires that URLs be constructed dynamically, which can be accomplished either by adjusting the code used to generate your pages or by server-wide plugins that affect all text/html responses.
Then you set the Expires header for all resource requests (images, scripts, CSS files, etc.) to a date far in the future (e.g. 10 years from now). This effectively caches them forever. It means that all resources loaded by each of your pages will always be fetched from cache; cache invalidation never happens, which is OK because when the underlying resource changes, the parent page will use a new URL to find it.
Finally, you need to figure out how you want to cache your "parent" pages. How you do this is a judgement call. You can use ETag/If-None-Match HTTP headers to check for a new version of the page every time, which will very quickly load the page from cache if the server reports that it hasn't changed. Or you can use Expires (and/or Max-Age) to reload the parent page from cache for a given period of time before checking the server.
If you want to do something even more sophisticated, you can always put a custom proxy server on the kiosk-- in that case you'd have total, centralized control over how caching is done.

Can't the browser just use its cache from prior ajax calls?

I am trying to rely upon the browser cache to hold JSON data returned from AJAX calls in jQuery.
Normal browser activity relies upon the browser cache all the time.
Example: jpg and gif images are not refetched on a page reload.
But when I try using jQuery getJSON ajax calls, I cannot seem to avoid fetching the data from the server.
My returned headers look like this (confirmed with Firebug):
Transfer-Encoding: chunked
Date: Wed, 05 Aug 2009 02:55:39 GMT
Content-Type: text/plain; charset=ISO-8859-1
Expires: Wed, 05 Aug 2009 03:55:39 GMT
Cache-Control: max-age=3600
Yet an immediate refresh of the page causes identical requests to hit the server.
I've seen several postings about avoiding caching behavior, which isn't what I need.
I've seen several postings about utilizing caching, but those all seem to rely upon saving data in the DOM. I want something that behaves just like cached images do during a page reload.
Can't the browser just fetch it from its own cache?
--x--x--x--x UPDATE --x--x--x--
Much to my disappointment, several respectable folks agree that this just isn't possible.
Some even argue that it shouldn't be (which still baffles me).
Stubborn to a fault, I tried the following:
I set the ETag header on all outgoing pages I want to be cached
(I pick a few choice URL arguments that represent the data I'm requesting and just use that for the ETag value).
At the beginning of the next request, I simply check whether the 'If-None-Match' header is in the request. If it is, the browser isn't serving from its cache outright like I wanted, but it is at least revalidating, so I send a 304 Not Modified response.
Testing shows that Firefox won't cache my request
(but I can still avoid the 'fetch the expensive data' part of my CGI),
while IE6 will actually cache it (and won't even attempt fetching back from the server).
It's not a pretty answer, but it's working for me for now
(those pesky full-page refreshes of graph data won't be so slow or expensive now).
(What? I'm running IE6! OMG! Oh look a squirrel!)
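A minimal sketch of the workaround described above, written as a plain Node HTTP server in TypeScript (the ETag-from-URL-arguments trick is the question's; the rest of the names are illustrative):

    import { createServer } from "node:http";

    const server = createServer((req, res) => {
      const url = new URL(req.url ?? "/", "http://localhost");
      // Derive the ETag from the query arguments that identify the data.
      const etag = `"${url.searchParams.toString()}"`;

      if (req.headers["if-none-match"] === etag) {
        // The browser is revalidating: skip the expensive data fetch.
        res.writeHead(304);
        res.end();
        return;
      }

      res.writeHead(200, { "Content-Type": "application/json", "ETag": etag });
      res.end(JSON.stringify({ data: "...expensive result..." }));
    });

    server.listen(8080);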
Ajax caching is possible and predictable (at least in IE and Firefox).
This blog post discusses Ajax caching and has a demo web page:
http://blog.httpwatch.com/2009/08/07/ajax-caching-two-important-facts/
There's also a follow up by Steve Souders on the F5 issue:
http://stevesouders.com/tests/ajax_caching.php
The short answer is no. Unfortunately, browsers do not reliably cache AJAX requests in the same way that they do "normal" pages. (Although the data may in fact be cached, the browser often doesn't use the cache when handling AJAX requests the way you would expect.) This should change in the future, but for now you have to work around it.
You may want to check your resources using the Resource Expert Droid, to make sure they’re doing what you intend. You should also run a network trace to double-check the request and response headers, using something like Wireshark, in case Firebug isn’t telling the full story.
It's possible that jQuery is including some request headers in a way that the browser decides should bypass the cache. Have you tried a plain XMLHttpRequest without a framework?
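For reference, here's what that plain-XMLHttpRequest experiment might look like (the endpoint is hypothetical). With no framework in the way, nothing appends cache-busting parameters; jQuery, for instance, adds a _=<timestamp> query parameter when its cache option is set to false:

    // Plain XMLHttpRequest: the browser is free to satisfy this
    // from its cache if the response headers allow it.
    const xhr = new XMLHttpRequest();
    xhr.open("GET", "/data.json"); // hypothetical endpoint
    xhr.onload = () => {
      const data = JSON.parse(xhr.responseText);
      console.log(data);
    };
    xhr.send();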
While not the "browser cache" what about session state or some other form of client side saving. You will still have to look into an if modified since situation as mentioned in your comment.
The browser won't natively know if the data has been changed or not since json did retrieve dynamically and what is in the cache is static. I think?
Here's a relevant link I found, where the author claims AJAX browser caches are indeed reliable.
claim found here: http://davidwalsh.name/cache-ajax
linked to here: http://ajaxref.com/ch6/builtincache.html
