Google Cloud CDN "Force Cache All Content" does not cache all content

I am using Google Cloud CDN for my WordPress website https://cdn.datanumen.com. I have enabled the "Force Cache All Content" option. However, web pages, CSS files and JavaScript files are still not cached; only the images are.
For example, I tested the page at https://cdn.datanumen.com/. I pressed Ctrl + F5 to refresh the page many times, but always got the same result.
Below is the response for the web page I tried to load:
There is a "Cache-Control" field in the response header, but no "Age" field. According to the Google documentation, when a cache hit occurs and cached content is served, the response includes an "Age" field. So a missing "Age" field means the file was not served from the cache.
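As a rough illustration of the check described above, here is a minimal Python sketch that looks for an "Age" field in a set of response headers. The function name and the sample header values are hypothetical; only the rule (an "Age" field is present on a cache hit) comes from the question.

```python
def served_from_cache(headers):
    """Return True if the response headers include an Age field."""
    # HTTP header names are case-insensitive, so normalise before comparing.
    return any(name.lower() == "age" for name in headers)


# Hypothetical header sets, for illustration only.
miss = {"Cache-Control": "public, max-age=3600", "Content-Type": "text/html"}
hit = {"Cache-Control": "public, max-age=3600", "Age": "42"}

print(served_from_cache(miss))  # False
print(served_from_cache(hit))   # True
```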
I also checked the log:
In the log, cacheFillBytes is 26776 and cacheLookup is true, so Google CDN appears to be looking up the cache and filling it with the content. But statusDetails shows "response_sent_by_backend", so the content is still served from the backend. Normally this should happen only on the first visit to the website. In my case, no matter how many times I press Ctrl + F5 to refresh, I always get the same result: statusDetails never shows "response_sent_by_cache" for pages such as https://cdn.datanumen.com/.
Why?
Update:
I noticed there is a "Vary" field in the response header:
According to https://cloud.google.com/cdn/docs/caching#non-cacheable_content, content is not cached if its Vary header has a value other than Accept, Accept-Encoding, or Origin. In my case the Vary header is "Accept-Encoding,Cookie,User-Agent", so the content is not cached. My question is: how do I deal with this and force the content to be cached?
Update 2
I have changed the site to a real WordPress site, since that is what I ultimately need. I plan to use paid Google Cloud CDN support to see whether they can help with this case.

According to the Google Cloud CDN documentation, the best way to solve your problem is actually to use the CACHE_ALL_STATIC cache mode. The documentation describes the three cache modes as follows:
CACHE_ALL_STATIC: Automatically caches static content that doesn't have the no-store or private directive. Origin responses that set valid caching directives are also cached. This is the default behavior for Cloud CDN-enabled backends created by using the gcloud command-line tool or the REST API.
USE_ORIGIN_HEADERS: Requires origin responses to set valid cache directives and valid caching headers. Responses without these directives are forwarded from the origin.
FORCE_CACHE_ALL: Unconditionally caches responses, overriding any cache directives set by the origin. This mode is not appropriate if the backend serves private, per-user content, such as dynamic HTML or API responses.
But in the case of the last cache mode, there are two warnings about its usage:
When you set the cache mode to FORCE_CACHE_ALL, the default time to live (TTL) for content caching is 3600 seconds (1 hour), unless you explicitly set a different TTL. Accepting the new default TTL of 1 hour might cause some entries that were previously considered fresh (due to having longer TTLs from origin headers) to now be considered stale.
The FORCE_CACHE_ALL mode overrides cache directives (Cache-Control and Expires) but does not override other origin response headers. In particular, a Vary header is still honored, and may suppress caching even in the presence of FORCE_CACHE_ALL. For more information, see Vary headers.
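To see which part of a Vary header would get in the way, here is a small Python sketch. It assumes the allowed set is exactly the three values quoted in the question (Accept, Accept-Encoding, Origin); the current documentation may list more, and the function name is made up for illustration.

```python
# Vary values the question quotes from the Cloud CDN documentation as still
# cacheable. Assumption: the current documentation may list more.
ALLOWED_VARY = {"accept", "accept-encoding", "origin"}


def blocking_vary_fields(vary_value):
    """Return the Vary fields that fall outside ALLOWED_VARY."""
    fields = [f.strip() for f in vary_value.split(",") if f.strip()]
    return [f for f in fields if f.lower() not in ALLOWED_VARY]


print(blocking_vary_fields("Accept-Encoding,Cookie,User-Agent"))
# ['Cookie', 'User-Agent']
print(blocking_vary_fields("Accept-Encoding"))
# []
```

If that rule applies, the origin would have to stop sending the fields this function reports (Cookie and User-Agent in the question's case) before Cloud CDN caches the response, whichever cache mode is selected.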

Related

Why is YSlow not detecting my cookie-free domains?

I have moved all my static assets to cloudfront.net, and when I view my source code, my CSS, JS and images are already hosted on cloudfront.net. But when I check GTmetrix.com, "Use cookie-free domains" is still graded F, and my main domain is still shown in the list instead of cloudfront.net.
I have already cleared my cache, Cloudflare cache, browser cache, and every other kind of cache, but YSlow in GTmetrix still doesn't detect that I'm using a CDN (cloudfront.net).
Anyone here who encountered the same problem?
Actual GTMetrix Result:
https://gtmetrix.com/reports/www.flyskyjetair.com/SgHBKXsJ
Actual Code:
view-source:https://www.flyskyjetair.com/
If you have the strip cookies and cache cookies options enabled but still receive a warning when running your site through YSlow, this is a YSlow false positive. If you set your cookies on the top-level domain (e.g. yourwebsite.com), all of your subdomains will also include those cookies. This includes your custom CDN URL if you use one (e.g. cdn.yourwebsite.com).
However, as long as you have the strip cookies option enabled, the warning is incorrect. YSlow does not take into account that the CDN actually strips the cookie, so it may continue to report the error. If you run a cURL command on the asset or check it in the Network tab of Chrome DevTools, you won’t see any Set-Cookie headers. This YSlow warning can therefore be safely ignored.
If you are using Cloudflare then you simply won’t be able to achieve 100 on YSlow. Cloudflare appends a __cfduid cookie to every request which cannot be removed due to security reasons.
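The point about cookies set on the top-level domain can be sketched in Python as a simplified version of the usual cookie domain-matching rule. This assumes the cookie carries a Domain attribute and that hosts are names rather than IP addresses; `d1234.cloudfront.net` is a made-up hostname.

```python
def cookie_sent_to(host, cookie_domain):
    """Simplified domain match: is a cookie scoped to cookie_domain sent to host?"""
    host = host.lower()
    # A leading dot in the Domain attribute is ignored.
    domain = cookie_domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)


# A cookie set on the top-level domain also reaches a CDN subdomain...
print(cookie_sent_to("cdn.yourwebsite.com", "yourwebsite.com"))   # True
# ...but not an unrelated CDN hostname (hypothetical name).
print(cookie_sent_to("d1234.cloudfront.net", "yourwebsite.com"))  # False
```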

If a JS script is cached, is there extra overhead in requesting it in a file that doesn't use it?

Basically I want to keep all the <link> and <script> requests in one php file, but some scripts only pertain to one page on the site. I don't want to create extra overhead requesting a script that isn't used on a page.
However, extra scripts are all on the home page. So for first-timers to the site, they would generally come through the home page. AFAIK, the script is then cached.
Would this offset the overhead problem I mentioned? i.e. if the file is cached, is there no noticeable overhead from requesting it on another page?
Yes, they will be cached.
However, a conditional HTTP request to check for a newer file will still be sent on every subsequent page request. The server will respond with "you already have the newest version" (304 Not Modified), so the browser will use the copy from its cache.
I find that HTTP requests are the great time killers of the web, especially for mobile devices, so at the very least you should use an Expires header so that when you give the files to the browser you can say "*Don't request this file again for 30 days", etc.
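For the "30 days" idea, here is a minimal Python sketch that builds the header values a server might send. The function name is hypothetical, and the matching Cache-Control: max-age value is an addition for illustration rather than something the answer above prescribes.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def caching_headers(days, now=None):
    """Build Expires and Cache-Control values that last `days` days."""
    now = now or datetime.now(timezone.utc)
    return {
        # HTTP-date format, always expressed in GMT.
        "Expires": format_datetime(now + timedelta(days=days), usegmt=True),
        # The same lifetime expressed in seconds.
        "Cache-Control": f"public, max-age={days * 86400}",
    }


# A fixed timestamp keeps the example reproducible.
fixed = datetime(2024, 1, 1, tzinfo=timezone.utc)
for name, value in caching_headers(30, fixed).items():
    print(f"{name}: {value}")
# Expires: Wed, 31 Jan 2024 00:00:00 GMT
# Cache-Control: public, max-age=2592000
```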
Lastly, you can "prefetch" extra assets after page load so that it doesn't affect the user's load time but still caches things while the user is on your home page.
* As a caveat, if you do use an Expires header you will then need to do versioning through the folder path or a query parameter, because web browsers will no longer check for new files. Something like this:
<script src="http://yoursite.com/js/compressed_files.js?v=1"></script>
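If the script tags are generated by code, the version bump can be automated. Below is a small Python sketch, assuming the parameter is called `v` as in the tag above; `with_version` is a hypothetical helper, not part of any framework.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_version(url, version):
    """Return `url` with its `v` query parameter set to `version`."""
    parts = urlsplit(url)
    # Drop any existing v parameter, keep everything else in order.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "v"]
    query.append(("v", str(version)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_version("http://yoursite.com/js/compressed_files.js?v=1", 2))
# http://yoursite.com/js/compressed_files.js?v=2
```

Because the URL changes with the version, the browser treats the file as a new resource and fetches it even though the old URL is still cached.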

Force Browser Caching Across Browser Sessions

I help maintain several WordPress-based websites that publish news and reference information.
We have been working hard to make pages at the websites load as fast as possible.
One of the things we've done is implement very long "max-age" times in the "cache-control" http headers for most of our static files, such as images and css files.
The particular cache-control setting we're using is "public, max-age=31536000". 31,536,000 seconds is 365 days.
The upside is that this setting does, in fact, cause the static files to be cached as visitors browse through different pages of our sites.
But here's the rub. This cache-control setting doesn't do much for us across browser sessions. Even though the setting is supposed to tell the browser "cache this file for an entire year", if a visitor to our site shuts down their browser, then starts it up just five minutes later and comes back to our site, the browser insists on re-loading all the static files, even though it still has them in its cache.
I've checked this carefully in Firefox, viewing the headers with Live HTTP Headers. But I can also qualitatively see the same thing happening in other browsers.
Apparently, browsers insist on re-loading all content for a website if the content hasn't been loaded once during the current browser session.
So ... Is there any way we can "politely suggest" to browsers that they always load cached content from the cache, even if the browser hasn't been to our site during the current browser session?
Check the ETag, Expires, and Last-Modified headers as well.
You need an Expires header, and ETag and Last-Modified can sometimes defeat caching by leading the browser to revalidate with the server instead of using its cached copy outright.
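When checking those headers, it can help to work out how long a cache is allowed to reuse a response without asking the server. The Python sketch below follows the standard HTTP rule that Cache-Control: max-age takes precedence over Expires; the function name is hypothetical, and the header dictionary is assumed to use the exact capitalisation shown.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Seconds a response may be reused without revalidation, or None."""
    # max-age wins over Expires when both are present.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    # Otherwise fall back to Expires minus Date.
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None


print(freshness_lifetime({"Cache-Control": "public, max-age=31536000"}))
# 31536000
print(freshness_lifetime({
    "Date": "Mon, 01 Jan 2024 00:00:00 GMT",
    "Expires": "Tue, 02 Jan 2024 00:00:00 GMT",
}))
# 86400
```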

Does Cache work for partially loaded files?

This is not a "coding question", but more something like "how does it work?".
Let's say I want to show a heavy pic on page 2.
If I'm preloading this pic on page 1 (without displaying it) and click on the link to page 2 before it's fully loaded... What happens?
=> Does page 2 load while only the rest of the heavy pic is downloaded, or does the cache not work for partially loaded files?
Thanks for your explanations,
CH
In theory it's very possible that part of the response gets cached, either by the web browser or by a proxy server between the end user and the web server. HTTP supports range requests, where the client can ask for a specific slice of the total resource (like an image). All the big-name web servers support range requests.
I really don't know offhand whether any web browsers cache a partially downloaded resource, although it would be a simple test: clear the web browser's cache, hit a web page that loads a large external object, and stop loading midway through. Make sure the web server sends the following headers along with the response.
cache-control: max-age=10000
accept-ranges: bytes
Now make the request again, and check the HTTP headers of the request for the browser asking for partial content, like Range: bytes=100000-90000000. It would obviously only ask for partial content if it had partially cached the file.
The max-age directive tells the browser the file is cacheable for a while, and the accept-ranges header tells the browser the web server is capable of servicing partial content requests.
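To make the header shapes concrete, here is a Python sketch of what a client resuming a partial download would send and how it could read the server's Content-Range reply. Both function names are hypothetical, and the sketch assumes the server reports a known total size rather than `*`.

```python
def resume_range(cached_bytes):
    """Range header value asking for everything after the bytes already held."""
    return f"bytes={cached_bytes}-"


def parse_content_range(value):
    """Parse 'bytes start-end/total' into (start, end, total)."""
    unit, _, rest = value.partition(" ")
    if unit != "bytes":
        raise ValueError(f"unsupported range unit: {unit!r}")
    span, _, total = rest.partition("/")
    start, _, end = span.partition("-")
    return int(start), int(end), int(total)


print(resume_range(100000))
# bytes=100000-
print(parse_content_range("bytes 100000-899999/900000"))
# (100000, 899999, 900000)
```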

Lazy HTTP caching

I have a website which is displayed to visitors via a kiosk. People can interact with it. However, since the website is not locally hosted, and uses an internet connection - the page loads are slow.
I would like to implement some kind of lazy caching mechanism such that as and when people browse the pages - the pages and the resources referenced by the pages get cached, so that subsequent loads of the same page are instant.
I considered using HTML5 offline caching - but it requires me to specify all the resources in the manifest file, and this is not feasible for me, as the website is pretty large.
Is there any other way to implement this? Perhaps using HTTP caching headers? I would also need some way to invalidate the cache at some point to "push" the new changes to the browser...
The usual approach to handling problems like this is with HTTP caching headers, combined with smart construction of URLs for resources referenced by your pages.
The general idea is this: every resource loaded by your page (images, scripts, CSS files, etc.) should have a unique, versioned URL. For example, instead of loading /images/button.png, you'd load /images/button_v123.png and when you change that file its URL changes to /images/button_v124.png. Typically this is handled by URL rewriting over static file URLs, so that, for example, the web server knows that /images/button_v124.png should really load the /images/button.png file from the web server's file system. Creating the version numbers can be done by appending a build number, using a CRC of file contents, or many other ways.
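The versioning scheme above could look roughly like this in Python, using a CRC of the file contents as the version. The helper names and the `_v<crc>` naming pattern are assumptions for illustration, and the sketch assumes every path has a file extension.

```python
import re
import zlib


def versioned_path(path, content):
    """Insert a CRC32 of `content` before the extension of `path`."""
    stem, dot, ext = path.rpartition(".")
    return f"{stem}_v{zlib.crc32(content):08x}{dot}{ext}"


def original_path(versioned):
    """Undo versioned_path, as a URL rewrite rule on the server would."""
    return re.sub(r"_v[0-9a-f]{8}(\.[^./]+)$", r"\1", versioned)


old = versioned_path("/images/button.png", b"old")
new = versioned_path("/images/button.png", b"new")
print(old != new)          # True: changed content yields a new URL
print(original_path(new))  # /images/button.png
```

The page generator would call something like `versioned_path` when writing URLs, and the web server would apply the equivalent of `original_path` before looking up the file on disk.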
Then you need to make sure that, wherever URLs are constructed in the parent page, they refer to the versioned URL. This obviously requires dynamic code used to construct all URLs, which can be accomplished either by adjusting the code used to generate your pages or by server-wide plugins which affect all text/html requests.
Then you set the Expires header for all resource requests (images, scripts, CSS files, etc.) to a date far in the future (e.g. 10 years from now). This effectively caches them forever. This means that all resources requested by each of your pages will always be fetched from cache; cache invalidation never happens, which is OK because when the underlying resource changes, the parent page will use a new URL to find it.
Finally, you need to figure out how you want to cache your "parent" pages. How you do this is a judgement call. You can use ETag/If-None-Match HTTP headers to check for a new version of the page every time, which will very quickly load the page from cache if the server reports that it hasn't changed. Or you can use Expires (and/or Cache-Control: max-age) to reload the parent page from cache for a given period of time before checking the server.
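The ETag/If-None-Match check for the parent page can be sketched server-side in Python as follows. Deriving the ETag from a hash of the page body is one possible choice, not the only one, and `make_etag` and `respond` are hypothetical names.

```python
import hashlib


def make_etag(body):
    """Derive a quoted ETag from the page content (an illustrative choice)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _strip_weak(tag):
    """Drop a weak-validator prefix so W/"x" compares equal to "x"."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def respond(body, if_none_match=None):
    """Return (status, headers, payload) for a GET of a page with this body."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = [_strip_weak(t) for t in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            # The browser's copy is current: send no body.
            return 304, headers, b""
    return 200, headers, body


status, headers, _ = respond(b"<html>v1</html>")
print(status)  # 200
print(respond(b"<html>v1</html>", headers["ETag"])[0])  # 304
print(respond(b"<html>v2</html>", headers["ETag"])[0])  # 200
```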
If you want to do something even more sophisticated, you can always put a custom proxy server on the kiosk; in that case you'd have total, centralized control over how caching is done.
