How do scripts and modules get cached?

I understand that for assets like images there is a src associated with them, which means the browser will check the expiration date of the cached asset before making a new request to the src, downloading the asset again, and rendering it onto the page. How does this work with a script or module such as React? If it comes from a CDN, does the browser download the script and then run it the very first time it encounters it? And then every time it needs the script after that, does it just load it from its cache (instead of downloading it again from the source) and run it? Is the same thing happening if you have React as a node module?

This is a very large topic, but the basic answer is that browsers will cache assets however you tell them to. You mention that images have expiration dates; those dates are set in HTTP headers sent by the server. You can set the same headers for JavaScript and any other files you request from a server, and the browser will cache them the same way.
After a JavaScript asset is fetched (from the server or from the cache), the browser parses and runs it.
Node modules live in Node land. Usually, before you can use code from node_modules in the browser, you run it through a tool like webpack or Browserify. These tools bundle ALL the code (your application + React + whatever else) into one file (usually), which is then served to the browser. The browser doesn't know anything about node_modules; it just parses and runs the JavaScript you provide.
That one bundled file is cached based on the headers it was sent with. A CDN is (basically) just a specialized server optimized for serving assets quickly.
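To make that concrete, here is a minimal sketch (plain Node, no framework) of a server sending a script with explicit caching headers; the file name and the one-year lifetime are just placeholder choices:

    // Minimal sketch: serve app.js with explicit caching headers.
    // "app.js" and the one-year max-age are placeholder choices.
    const http = require('http');
    const fs = require('fs');

    http.createServer(function (req, res) {
      if (req.url === '/app.js') {
        res.writeHead(200, {
          'Content-Type': 'application/javascript',
          // The browser may reuse its cached copy for a year
          // without asking the server again.
          'Cache-Control': 'public, max-age=31536000'
        });
        fs.createReadStream('./app.js').pipe(res);
      } else {
        res.writeHead(404);
        res.end();
      }
    }).listen(3000);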

Related

How to serve or force a file download located in a different directory in Meteor to the client browser

A user comes to my site and inputs something, and my site generates a file as an output.
Unfortunately I cannot place the generated file in the public directory - as you all know, Meteor watches this and restarts every time the public folder's content is changed.
So my generated files live in .meteor/local/build/programs/server/files.
For example, I have a document.pdf that lives in that directory, and I'd like to serve/force/trigger a file download so that my client's browser downloads this document.pdf file.
In general it's not a very good idea to do this. It makes it very hard to scale your app, and Node isn't good at serving chunky static files either.
Also, if you have two servers, there is a slight chance that the other one's data is requested (e.g. if you use a download manager).
I'm not sure, but I think Meteor's live code reload doesn't work/is switched off when in production mode (when using meteor deploy or meteor bundle).
The best thing to do would be to upload your file to S3 and then redirect the user to the file there.
You can also use Iron Router and server-side routes to create a dynamic file download.
See the Iron Router server-side route docs. There you set your content type to application/pdf and send the file back directly without saving it to the filesystem. If you need to, you can also save it in some other folder and serve it up yourself.
Then have a peek at this answer for an example of reading in and streaming out a file:
Node JS file downloads using a stream.
Since this is a server-side route using Express and Iron Router, you shouldn't have to mess with any of the fibers-related async issues.
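For example, a server-side Iron Router route that streams a PDF straight back to the browser could look roughly like this sketch. The route path and the file location are hypothetical, and in real code you'd want to sanitize the requested file name:

    // Rough sketch: a server-side Iron Router route that streams a generated
    // PDF to the client. Route path and directory are hypothetical examples.
    var fs = Npm.require('fs');
    var path = Npm.require('path');

    Router.route('/download/:name', function () {
      var filePath = path.join('/path/to/generated/files', this.params.name);
      this.response.writeHead(200, {
        'Content-Type': 'application/pdf',
        // "attachment" forces a download instead of rendering inline.
        'Content-Disposition': 'attachment; filename="' + this.params.name + '"'
      });
      fs.createReadStream(filePath).pipe(this.response);
    }, { where: 'server' });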

How do I set caching headers for my CSS/JS but ensure visitors always have the latest versions?

I'd like to speed up my site's loading time in part by ensuring all CSS/JS is being cached by the browser, as recommended by Google's PageSpeed tool. But I'd like to ensure that visitors get the latest CSS/JS files if those are updated and the cache still contains old code.
From my research so far, appending something like "?459454" to the end of the CSS/JS url is popular. But wouldn't that force the visitor's browser to re-download the CSS/JS file every time?
Is there a way to set the files to be cached by the browser, but ensure the browser knows about updated versions of the cached files?
If you're using Apache, you can use mod_pagespeed (mentioned earlier by symcbean) to do this automatically.
It would work best if you also use the ModPagespeedLoadFromFile directive, since that will create a new URL as soon as it detects that the resource has changed on disk; however, it will work fine without that (it will use the cache expiry time returned when it fetches the resource in order to rewrite it).
If you're using nginx, you could use ngx_pagespeed.
If you're using IIS, you could use IISpeed, which is not a Google product and whose full feature set I don't know.
Version numbers will work, but you can also append a hash of the file to the filename with your web framework or asset build script:
<script src="script-5054a101c8b164cbfa570d97fe23cc0d.js"></script>
That way, once your HTML changes to reflect this new version, browsers will just download and cache the updated version of your script.
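A build step along these lines could compute the fingerprint. This is just a sketch, and the file names are examples:

    // Hypothetical build-time step: hash the bundle and write a
    // fingerprinted copy, then emit the matching <script> tag.
    const crypto = require('crypto');
    const fs = require('fs');

    const source = fs.readFileSync('script.js');
    const hash = crypto.createHash('md5').update(source).digest('hex');
    const versionedName = 'script-' + hash + '.js';

    fs.writeFileSync(versionedName, source);
    // Whatever renders your HTML would then output:
    console.log('<script src="' + versionedName + '"></script>');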
As you say, append a query string to the URL of the asset, but only change it if the content is different, or change it when you deploy a new version.
appending something like "?459454" to the end of the CSS/JS url is popular. But wouldn't that force the visitor's browser to re-download the CSS/JS file every time?
No, it won't force them to download it each time. However, there are a lot of intermediate proxies out there which ignore query strings on cacheable content - hence many tools (including mod_pagespeed, which does automatic URL rewriting based on file contents, plus content merging on the fly and lots of other cool tricks) move the version information into the path/filename.
If you've only got .htaccess-type access, you can strip the version information back out to map directly to a file, or use a scripted 404 redirector (but the latter is probably only a good idea if you're behind a caching reverse proxy).

Firefox not re-validating static files from child frames

I'm working with an application that has an iframe - both the outer HTML body and the frame require certain JavaScript and CSS files. To cut down on load times, all these static files have expiry set to a year from now and should essentially be loaded from cache for normal page hits - which is the expected behavior in both IE8 and FF3.6.
However, once I reload/refresh (F5) the page, I expect the browsers to send 'If-Modified-Since' requests to the server for these files. IE8 sends the requests for all the files used outside as well as within the iframe. But FF3.6 only sends the requests for files used outside (not for the files used within the iframe; it just loads those from cache!).
The response headers are exactly the same for all files regardless of whether they are in the iframe or not. Is there a reason for this behavior of FF? Any way to avoid it?
Note: I can append version parameters to the source, or add a version folder in the path, etc. But, I want to know if this quirk can be avoided/has a good reason behind it?
Firefox behaves correctly - the server indicated that the scripts are good for a year, so there is no reason to send pointless requests which waste time, bandwidth, and server resources. For debugging purposes you can keep the Shift key pressed while clicking the Reload button; that will make sure all data is refreshed. However, for end users, adding the version information to the URL (e.g. http://example.com/.../script.js?version=1.2.3) is probably the best solution. This makes sure that the cached version can be used as long as it is valid, and the new version is downloaded as soon as you update the script.
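One simple way to generate such a URL is to tie the query string to your release version, so it only changes when you deploy. A hypothetical sketch; the path and the version source are up to you:

    // Hypothetical: reuse the app's package.json version as the cache-buster,
    // so the URL only changes when you ship a new release.
    const pkg = require('./package.json');
    const scriptTag = '<script src="/js/script.js?version=' + pkg.version + '"></script>';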

?_escaped_fragment_= - headless browser

What do I have to do to add ?_escaped_fragment_= support to my server? I want Google to be able to crawl through my AJAX site. My hashes are already in #! form.
But I have no idea how to tell my server that when I enter mywebsite.com/?_escaped_fragment_=section into my browser, it should serve the content for mywebsite.com/section, i.e. be equivalent to its mywebsite.com/#! form.
thanks
Simple answer - my method (soon to be used for a site with ca. 50,000 AJAX-generated URLs) is to have a node.js server using a headless environment (try zombie, phantomjs, or any other) load the site, making sure it's able to execute JavaScript and read the DOM. Then at runtime, if it's Google requesting the fragment, fire a request to the node.js server, which loads the site, executes the JavaScript, waits for the response, and delivers back the HTML, which is output to the browser.
If that sounds like a lot of work - I'm about 90% finished on the code that does it all for you, where you'd simply drop one line of (PHP) code at the top of your site/app and it does the rest for you, using a remote node.js server.
The code will be open source so if you want to set it up yourself on a node server, you can - or if it's a PITA to set it up yourself, I'll probably have a live server up and running which your app/website would fire ?_escaped_fragment_ requests to, and get the html snapshot back. It also implements caching so that these are only requested once every X days.
Watch this space - just got a few kinks to work out, and it'll be on my site (josscrowcroft.com) and I'll put it in a github repo too.
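The headless-render step itself can be quite small. Here is a sketch using PhantomJS (one of the environments mentioned above); the fixed two-second wait is just a crude placeholder for "let the client-side JavaScript finish rendering":

    // snapshot.js - run with: phantomjs snapshot.js http://mywebsite.com/section
    var page = require('webpage').create();
    var system = require('system');
    var url = system.args[1];

    page.open(url, function (status) {
      if (status !== 'success') {
        console.log('Failed to load ' + url);
        phantom.exit(1);
      } else {
        // Crude placeholder: give client-side JS time to render, then dump the DOM.
        setTimeout(function () {
          console.log(page.content);
          phantom.exit();
        }, 2000);
      }
    });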

Lazy HTTP caching

I have a website which is displayed to visitors via a kiosk. People can interact with it. However, since the website is not locally hosted and uses an internet connection, the page loads are slow.
I would like to implement some kind of lazy caching mechanism such that as and when people browse the pages - the pages and the resources referenced by the pages get cached, so that subsequent loads of the same page are instant.
I considered using HTML5 offline caching - but it requires me to specify all the resources in the manifest file, and this is not feasible for me, as the website is pretty large.
Is there any other way to implement this? Perhaps using HTTP caching headers? I would also need some way to invalidate the cache at some point to "push" the new changes to the browser...
The usual approach to handling problems like this is with HTTP caching headers, combined with smart construction of URLs for resources referenced by your pages.
The general idea is this: every resource loaded by your page (images, scripts, CSS files, etc.) should have a unique, versioned URL. For example, instead of loading /images/button.png, you'd load /images/button_v123.png and when you change that file its URL changes to /images/button_v124.png. Typically this is handled by URL rewriting over static file URLs, so that, for example, the web server knows that /images/button_v124.png should really load the /images/button.png file from the web server's file system. Creating the version numbers can be done by appending a build number, using a CRC of file contents, or many other ways.
Then you need to make sure that, wherever URLs are constructed in the parent page, they refer to the versioned URL. This obviously requires dynamic code used to construct all URLs, which can be accomplished either by adjusting the code used to generate your pages or by server-wide plugins which affect all text/html requests.
Then you set the Expires header for all resource requests (images, scripts, CSS files, etc.) to a date far in the future (e.g. 10 years from now). This effectively caches them forever. It means that all resources loaded by each of your pages will always be fetched from cache; cache invalidation never happens, which is OK because when the underlying resource changes, the parent page will use a new URL to find it.
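If you're not doing the rewriting in the web server itself, the same idea can live in application code. A hypothetical Express sketch, reusing the _v123 naming convention from the example above (the directory and the lifetime are placeholders):

    // Sketch: map versioned URLs back to the real files, and serve the
    // static files with a far-future lifetime. Names here are placeholders.
    const express = require('express');
    const app = express();

    app.use(function (req, res, next) {
      // /images/button_v124.png -> /images/button.png on disk
      req.url = req.url.replace(/_v\d+(\.\w+)$/, '$1');
      next();
    });

    app.use(express.static('public', {
      maxAge: 10 * 365 * 24 * 60 * 60 * 1000  // roughly ten years, in milliseconds
    }));

    app.listen(3000);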
Finally, you need to figure out how you want to cache your "parent" pages. How you do this is a judgement call. You can use ETag/If-None-Match HTTP headers to check for a new version of the page every time, which will very quickly load the page from cache if the server reports that it hasn't changed. Or you can use Expires (and/or Max-Age) to reload the parent page from cache for a given period of time before checking the server.
If you want to do something even more sophisticated, you can always put a custom proxy server on the kiosk - in that case you'd have total, centralized control over how caching is done.

Resources