template engine and cache - caching

When using a template engine (pug, thymeleaf, etc.),
the server renders an HTML file dynamically and delivers it to the client on each page request.
Suppose there is a company proxy server or a cache server between the server and the client.
Will there ever be a cache hit?
Don't we lose all the benefits of internet caching if we send new versions of our HTML to clients all the time?

If the URL is the same for all users then yes, the CDN cache will be hit most of the time. You will need to send appropriate Cache-Control headers or set up the CDN to bypass the cache when a certain path is hit.
This is why a lot of sites use AJAX calls to fill the pages post-load. All of the HTML can be cached in the CDN and the CDN is configured to bypass cache for all /api paths.
Our site uses CDN for the public pages (which are still generated with pug), then when you sign in the CDN is instructed to never cache the "personal" pages that are rendered dynamically.
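As a rough sketch of the policy described above (public pages cacheable, /api paths and signed-in pages never cached), the server could pick a Cache-Control value per response. The function name, the `signed_in` flag, and the 300-second lifetime are assumptions for illustration, not something from the setup described here.

```python
def cache_control_for(path: str, signed_in: bool) -> str:
    """Pick a Cache-Control value for a response (hypothetical policy)."""
    if path.startswith("/api/") or signed_in:
        # Per-user or dynamic data: shared caches (CDN, proxy) must not store it.
        return "private, no-store"
    # Public pages rendered by the template engine: shared caches may keep
    # them briefly. 300 seconds is an arbitrary illustrative value.
    return "public, max-age=300"


if __name__ == "__main__":
    for path, signed_in in [("/", False), ("/api/cart", False), ("/account", True)]:
        print(path, signed_in, "->", cache_control_for(path, signed_in))
```

The CDN still has to be configured to honor these headers; the sketch covers only the origin side.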

Related

Google Cloud CDN "Force Cache All Content" NOT Cache All Content

I am using Google Cloud CDN for my WordPress website https://cdn.datanumen.com. I have enabled the "Force Cache All Content" option. However, web pages, CSS files, and JavaScript files are still not cached. Only the images are cached.
For example, I tested the page at https://cdn.datanumen.com/. I pressed Ctrl + F5 to refresh the page many times, but always got the same result.
Below is the web page I tried to load:
There is a "Cache-Control" field in the response headers, but no "Age" field. According to the Google documentation, a response served from cache includes an "Age" field, so its absence means the file was not served from cache.
I also checked the log:
In the log, cacheFillBytes is 26776 and cacheLookup is true, so Google CDN appears to be looking up the cache and filling it with the content. But statusDetails shows "response_sent_by_backend", so the content is still served from the backend. Normally this should happen only on the first visit. In my case, no matter how many times I press Ctrl + F5, I get the same result: statusDetails never shows "response_sent_by_cache" for pages such as https://cdn.datanumen.com/
Why?
Update:
I noticed there is a "Vary" field in the response header:
According to https://cloud.google.com/cdn/docs/caching#non-cacheable_content, content is not cached if the Vary header has a value other than Accept, Accept-Encoding, or Origin. In my case the "Vary" header is "Accept-Encoding,Cookie,User-Agent", so the content is not cached. My question is how to deal with this and force the content to be cached.
Update 2
I have changed the site to a real WordPress site, since that is what I ultimately need. I plan to use Google Cloud CDN paid support to see if they can help with this case.
According to the Google Cloud CDN documentation, the best way to solve your problem is actually to use the CACHE_ALL_STATIC cache mode:
CACHE_ALL_STATIC: Automatically caches static content that doesn't have the no-store or private directive. Origin responses that set valid caching directives are also cached. This is the default behavior for Cloud CDN-enabled backends created by using the gcloud command-line tool or the REST API.
USE_ORIGIN_HEADERS: Requires origin responses to set valid cache directives and valid caching headers. Responses without these directives are forwarded from the origin.
FORCE_CACHE_ALL: Unconditionally caches responses, overriding any cache directives set by the origin. This mode is not appropriate if the backend serves private, per-user content, such as dynamic HTML or API responses.
But in the case of the last cache mode, there are two warnings about its usage:
When you set the cache mode to FORCE_CACHE_ALL, the default time to live (TTL) for content caching is 3600 seconds (1 hour), unless you explicitly set a different TTL. Accepting the new default TTL of 1 hour might cause some entries that were previously considered fresh (due to having longer TTLs from origin headers) to now be considered stale.
The FORCE_CACHE_ALL mode overrides cache directives (Cache-Control and Expires) but does not override other origin response headers. In particular, a Vary header is still honored, and may suppress caching even in the presence of FORCE_CACHE_ALL. For more information, see Vary headers.
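To check whether a given Vary header would suppress caching under the rule quoted in the question (only Accept, Accept-Encoding, and Origin are tolerated), a small helper like the following may be useful. It is a sketch of that quoted rule only; the function name is hypothetical and the actual list accepted by Cloud CDN should be confirmed against the current documentation.

```python
# Header names the question quotes as compatible with caching (assumption:
# this list mirrors the documentation excerpt above and may be out of date).
ALLOWED_VARY = {"accept", "accept-encoding", "origin"}


def offending_vary_names(vary_header: str) -> list[str]:
    """Return the Vary entries that would prevent caching, in sorted order."""
    names = {name.strip().lower() for name in vary_header.split(",") if name.strip()}
    return sorted(names - ALLOWED_VARY)


if __name__ == "__main__":
    # The header reported in the question.
    print(offending_vary_names("Accept-Encoding,Cookie,User-Agent"))
```

An empty result means the Vary header alone should not block caching. Otherwise the listed names would have to be removed from the origin response (for WordPress, often by adjusting the plugin or server configuration that adds them).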

Google AMP Cache - how to force loading index.html from cache?

Is there any way to force the main homepage (index.html) to load from the AMP Cache?
I load all images from the cache, following the manual: https://developers.google.com/amp/cache/overview
But the DevTools audit still reports an error for the homepage (not being served through HTTP/2, i.e. not from the cache).
I’m not sure exactly what you mean but think you may be misunderstanding the point of the AMP cache.
The Google AMP Cache is not like a CDN (Content Delivery Network) that always sits in front of your site, though in certain instances it acts like one.
The Google AMP Cache is automatically populated by Google when it crawls your site. Any searches on Google while on mobile will then serve your AMP pages, rather than your normal pages, and will also serve them from the Google AMP cache rather than from your domain. This is done for a number of reasons, but primarily to create the “instant loading” effect that AMP gives when loaded from Google Search results (aka Search Engine Results Page or SERP). In this case the whole page including the index page is served from the Google AMP Cache.
Other sites and domains can also decide to display AMP pages instead of your HTML pages if they want, and can decide to serve them from the Google AMP cache, from their own AMP cache (though, other than Google, only Cloudflare have implemented their own AMP Cache AFAIK) or directly from your domain (in which case there is no cache used). Twitter, for example, automatically replaces links with their AMP equivalents but loads from the real domain, so it is fast (due to AMP) but not “instant” (like it is in the Google Search Results).
So you, as a site owner, don’t decide when to use the AMP Cache - the calling application (e.g. Google SERPS, Twitter) decides that. And if the calling app/page doesn’t use an AMP Cache, then it is served directly from your domain and therefore whatever technology your domain supports (e.g. HTTP/1.1 or HTTP/2). You can of course give out the AMP Cache URL instead of your real one if you want.
You seem to suggest you have altered your page to replace all images and the like with references to the AMP cache - is that so? If so that sounds like a bad idea, as the cache is loaded from your site which now depends on the cache, which is loaded from your site, which is... etc.

Should I use https for static files (images, css)

I use https protocol for my login, registration, admin pages of my web app.
If I don't write an htaccess rule, all my static files (images, css, js, etc.) are loaded through https too.
Does this decrease the performance of my app and is it better to use http for all static resources of my app?
If you attempt to include a static file over HTTP while the original dynamic page was served through HTTPS, the browser might emit a warning that this webpage is trying to serve non-secure content over a secure channel. So you should avoid doing that. There's of course a penalty for serving a resource over HTTPS, but static files are usually cached by browsers, so that shouldn't be much of a problem. Also, you might consider minifying and combining your scripts into a single one in order to reduce the number of HTTP(S) requests made to the server. That's where you will gain the most.
For your images you might also consider using a technique called CSS sprites.

Prevent certain folder from attaching cookies in request headers

I have an application built with CodeIgniter and am running tests for rendering time etc. I have noticed that certain static files have cookies attached to their requests, which adds unnecessary loading time.
I was wondering if it was possible to prevent requests to the folder from attaching cookies to the headers.
My site structure looks like this:
application
system
assets
assets/js
assets/css
assets/img
profiles
I don't want requests to the assets and profiles folders to have cookies in their headers.
If you're setting cookies at the root, you'll need a separate hostname to do this.
I wouldn't serve the CSS from a separate hostname though...
After HTML, CSS is the next most important resource on a webpage, as the browser can't start rendering the page until it has the CSS.
If you serve the CSS from a separate domain, there's the overhead of resolving that domain and, in some browsers, the overhead of TCP connection setup (Chrome, IE9 and possibly other browsers speculatively open a second TCP connection to the host the HTML came from before they know they need it).
The CSS request will still carry the cookie, but if you set a long cache time for it, the CSS should only be requested once per session.
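The separate-hostname point can be illustrated with a simplified version of cookie domain matching. This is only a sketch: it ignores the Path and Secure attributes, IP addresses, and public-suffix rules, and the function name and example hostnames are hypothetical.

```python
def cookie_is_sent(cookie_domain: str, host_only: bool, request_host: str) -> bool:
    """Simplified check of whether a browser would attach a cookie to a request.

    cookie_domain: the domain the cookie is stored under.
    host_only: True when the cookie was set without a Domain attribute.
    """
    cookie_domain = cookie_domain.lower().lstrip(".")
    request_host = request_host.lower()
    if request_host == cookie_domain:
        return True
    if host_only:
        # Without a Domain attribute the cookie goes only to the exact host.
        return False
    # With Domain=example.com the cookie also goes to every subdomain.
    return request_host.endswith("." + cookie_domain)


if __name__ == "__main__":
    print(cookie_is_sent("example.com", False, "static.example.com"))     # subdomain
    print(cookie_is_sent("www.example.com", True, "static.example.com"))  # host-only
    print(cookie_is_sent("example.com", False, "example-static.net"))     # other domain
```

Under these assumptions, a cookie set for the whole domain follows requests to any subdomain, so a cookie-free asset host has to sit outside the cookie's domain, or the cookie has to be limited to the main host.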

Lazy HTTP caching

I have a website which is displayed to visitors via a kiosk. People can interact with it. However, since the website is not locally hosted, and uses an internet connection - the page loads are slow.
I would like to implement some kind of lazy caching mechanism such that as and when people browse the pages - the pages and the resources referenced by the pages get cached, so that subsequent loads of the same page are instant.
I considered using HTML5 offline caching - but it requires me to specify all the resources in the manifest file, and this is not feasible for me, as the website is pretty large.
Is there any other way to implement this? Perhaps using HTTP caching headers? I would also need some way to invalidate the cache at some point to "push" the new changes to the browser...
The usual approach to handling problems like this is with HTTP caching headers, combined with smart construction of URLs for resources referenced by your pages.
The general idea is this: every resource loaded by your page (images, scripts, CSS files, etc.) should have a unique, versioned URL. For example, instead of loading /images/button.png, you'd load /images/button_v123.png and when you change that file its URL changes to /images/button_v124.png. Typically this is handled by URL rewriting over static file URLs, so that, for example, the web server knows that /images/button_v124.png should really load the /images/button.png file from the web server's file system. Creating the version numbers can be done by appending a build number, using a CRC of file contents, or many other ways.
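A minimal sketch of the versioning idea, using a CRC of the file contents as the version and a rewrite step that maps the versioned URL back to the real file. The function names and the eight-hex-digit `_v` suffix format are assumptions chosen for illustration; a build number would work just as well.

```python
import posixpath
import re
import zlib


def versioned_url(path: str, content: bytes) -> str:
    """Insert a content-derived version before the file extension."""
    version = zlib.crc32(content)  # unsigned 32-bit checksum of the file body
    stem, ext = posixpath.splitext(path)
    return f"{stem}_v{version:08x}{ext}"


# Matches paths produced by versioned_url, e.g. /images/button_v0123abcd.png
_VERSIONED = re.compile(r"^(?P<stem>.+)_v[0-9a-f]{8}(?P<ext>\.[^./]+)$")


def original_path(url_path: str) -> str:
    """Rewrite a versioned URL back to the file's real path; pass others through."""
    match = _VERSIONED.match(url_path)
    return f"{match['stem']}{match['ext']}" if match else url_path


if __name__ == "__main__":
    url = versioned_url("/images/button.png", b"fake image bytes")
    print(url, "->", original_path(url))
```

Because the version depends only on the file contents, the URL changes exactly when the file changes. In a real deployment the rewrite would normally live in the web server configuration rather than in application code.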
Then you need to make sure that, wherever URLs are constructed in the parent page, they refer to the versioned URL. This obviously requires dynamic code used to construct all URLs, which can be accomplished either by adjusting the code used to generate your pages or by server-wide plugins which affect all text/html requests.
Then set the Expires header for all resource requests (images, scripts, CSS files, etc.) to a date far in the future (e.g. 10 years from now). This effectively caches them forever. This means that all resources loaded by each of your pages will always be fetched from cache; cache invalidation never happens, which is OK because when the underlying resource changes, the parent page will use a new URL to find it.
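For illustration, the far-future headers could be generated as below. The helper name is hypothetical, and the accompanying Cache-Control max-age is an addition here (the text above mentions only Expires). Ten years is approximated as 3650 days.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def far_future_headers(now: datetime, years: int = 10) -> dict[str, str]:
    """Build caching headers that keep a versioned resource cached for years.

    `now` must be a timezone-aware UTC datetime, because HTTP dates are in GMT.
    """
    lifetime = timedelta(days=365 * years)
    return {
        # HTTP-date format, e.g. "Thu, 01 Jan 2015 00:00:00 GMT"
        "Expires": format_datetime(now + lifetime, usegmt=True),
        "Cache-Control": f"public, max-age={int(lifetime.total_seconds())}",
    }


if __name__ == "__main__":
    print(far_future_headers(datetime.now(timezone.utc)))
```

These headers are only safe on the versioned resource URLs. The parent pages need a shorter-lived policy, or clients would never see the new URLs.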
Finally, you need to figure out how you want to cache your "parent" pages. How you do this is a judgement call. You can use ETag/If-None-Match HTTP headers to check for a new version of the page every time, which will very quickly load the page from cache if the server reports that it hasn't changed. Or you can use Expires (and/or Max-Age) to reload the parent page from cache for a given period of time before checking the server.
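The ETag/If-None-Match option can be sketched as follows: the server derives a tag from the rendered page and answers 304 with an empty body when the client already holds that version. The function names and the choice of a truncated SHA-256 as the tag are assumptions for illustration.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Derive a quoted entity tag from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def _strip_weak(tag: str) -> str:
    # If-None-Match uses weak comparison, so W/"x" and "x" are treated as equal.
    return tag[2:] if tag.startswith("W/") else tag


def respond(body: bytes, if_none_match: Optional[str]) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, payload) for a conditional GET."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = {_strip_weak(t.strip()) for t in if_none_match.split(",")}
        if "*" in candidates or etag in candidates:
            # The client's cached copy is current: send no body.
            return 304, headers, b""
    return 200, headers, body


if __name__ == "__main__":
    page = b"<html>rendered page</html>"
    status, headers, _ = respond(page, None)
    print(status, headers)
    print(respond(page, headers["ETag"])[0])
```

Note that the server still renders the page to compute the tag, so this saves bandwidth and load time on the kiosk rather than server work.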
If you want to do something even more sophisticated, you can always put a custom proxy server on the kiosk; in that case you'd have total, centralized control over how caching is done.
