In our web app, we have a page that includes many components, each rendered with data from the database; a server-side cache stores the generated components for future requests. We also maintain a global 'last-modified' timestamp for the whole page, which is the last time any of the page's data in the database changed, and we return a 304 HTTP response if the browser cache has a fresh version.
In short, we use both a server-side cache and the client-side cache to improve performance.
This all works well until we consider deploying new code. When new code (say, HTML) is deployed, not only is the client-side cache invalid, the server-side cache has to be purged too. We have to set the last-modified time to our code deployment time and purge everything in the server-side cache.
This is not desirable if we deploy code regularly. The data in the database for the page does not change often, so we expect the caches to keep working over long periods of time, but deploying new code defeats that purpose.
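For reference, here is a minimal sketch of the 304 logic described above, written as a Flask-style handler (handler and helper names are illustrative, not our actual code):

    from datetime import datetime, timezone
    from email.utils import format_datetime, parsedate_to_datetime
    from flask import Flask, request

    app = Flask(__name__)

    def page_last_modified() -> datetime:
        # Hypothetical: the global timestamp, i.e. the last time any of
        # the page's data changed in the database (or the deploy time).
        return datetime(2024, 1, 1, tzinfo=timezone.utc)

    def render_from_component_cache() -> str:
        # Hypothetical: assemble the page from server-side cached components.
        return "<html>...</html>"

    @app.route("/page")
    def page():
        last_modified = page_last_modified()
        ims = request.headers.get("If-Modified-Since")
        if ims and parsedate_to_datetime(ims) >= last_modified:
            return "", 304  # the browser's copy is still fresh
        resp = app.make_response(render_from_component_cache())
        resp.headers["Last-Modified"] = format_datetime(last_modified, usegmt=True)
        return resp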
What should we do in this case? Is there any 'industry best practice' here?
For my projects, when I change a file such as a CSS file, I add a parameter to the URL where the file is included. For example,
<link href='default.css?1' type='text/css' rel='stylesheet'>
Then change the number each time you want the file to be reloaded rather than served from the cache.
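If bumping the number by hand gets tedious, one common variation (sketched below in Python; the static root path is an assumption) is to derive the parameter from the file's contents, so it changes exactly when the file does:

    import hashlib
    from pathlib import Path

    STATIC_ROOT = Path("static")  # assumed location of the css/js files

    def busted_url(name: str) -> str:
        # Short content hash: same file -> same URL, changed file -> new URL.
        digest = hashlib.md5((STATIC_ROOT / name).read_bytes()).hexdigest()[:8]
        return f"/{name}?v={digest}"

    # busted_url("default.css") might give "/default.css?v=3a7bd3e2",
    # rendered as: <link href='/default.css?v=3a7bd3e2' rel='stylesheet'>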
I have a website with millions of pages. The content of each page is stored in a database, but the data does not change very frequently, so to improve the performance of the website and reduce the cost of deploying the web application, I want to generate static pages from the dynamic content and refresh them when the content changes. But I am very concerned about how to manage this large number of pages. How should I store them? Could it cause I/O problems when the web server handles many requests? Is there a better solution for this? Should I use Varnish to handle it?
Varnish looks like a very good fit for this use case. Basically, you wouldn't generate the full site statically up front, but incrementally: every time content is requested that Varnish hasn't cached yet, it gets generated and cached.
EDIT to cover the comments:
If all the Varnish nodes are down, you can't get your content, the same as if the database or your load balancers were down. Just run two load-balanced Varnish instances for high availability, with keepalived for example.
If Varnish is restarted, the cache gets cleared, unless you are using Varnish Plus/Enterprise with MSE. That may not be an issue if you don't restart often (configuration changes don't need restarts), since the database still has the data to repopulate the cache.
Varnish has a ton of options to invalidate content: purges for a single object, revalidation, bans to target entire sub-domains or sub-trees, and xkeys for tag-based invalidation.
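For example, a single-object purge can be issued over plain HTTP, assuming your VCL has a vcl_recv rule that accepts the PURGE method from this client (Varnish rejects it by default); the address and port below are assumptions:

    import http.client

    def purge(site_host: str, path: str,
              varnish_host: str = "127.0.0.1", port: int = 6081) -> int:
        # Sends "PURGE /path" to Varnish for the given site's hostname.
        conn = http.client.HTTPConnection(varnish_host, port)
        conn.request("PURGE", path, headers={"Host": site_host})
        status = conn.getresponse().status  # 200 if the purge was accepted
        conn.close()
        return status

    # purge("example.com", "/products/42")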
Based on the description, your architecture looks like Webpages --> Services --> Database. The pages are generated dynamically from the data in the database.
For example, when you search for employee details, the service hits the database, gets the employee's details, and renders them in the UI.
Now, if you create and store a web page for every employee up front, this solution will not scale. Also, if employee information changes in the database, you will serve stale data unless you recreate the page.
My recommendation is to add a cache server, making the architecture Webpages --> Services --> Cache server --> Database. Services should query the database, create the page, and store it in the cache. The cache key should be the page URL and the value should be the page content. Then, when a URL hits the services, they fetch the page from the cache rather than going to the database. If the key is not in the cache, the services query the database and fill the cache with the key and value; see the sketch after the quote below.
"Key is Url of the page. Value is the content of the page which has hidden updated date."
You can have a back-end job or a separate service refresh the cache when data is updated in the database. The job can compare the updated date in the database against the date stored in the cache value and evict the entry if the dates do not match. Because it runs behind the scenes, the job will not impact user-facing or UI performance; a sketch follows.
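Continuing the sketch above, the refresh job could look like this (the database lookup is hypothetical):

    def db_updated_date(url: str) -> str:
        # Hypothetical: SELECT updated_at FROM pages WHERE url = ...
        return "2024-01-01T00:00:00+00:00"

    def refresh_job() -> None:
        # Runs behind the scenes (cron or a scheduler), so it never sits
        # on the user-facing request path.
        for url, entry in list(page_cache.items()):
            if db_updated_date(url) != entry["updated"]:
                del page_cache[url]  # next request repopulates the entry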
In my company, we have a report generation team that maintains a local web application which is horribly slow. These reports are generated weekly. The data for the reports lives in a database which is queried through this report portal. I cannot ask them to change the application in any way (adding memcached, etc.); the only option I have is to somehow save these pages locally and serve them.
As these are not static pages (they fetch their data from the database), I want to know whether there is any way I can store these pages locally with a cronjob and then have super-fast access for me and my team.
PS: This application doesn't have any authentication; the pages are plain diffs of two files stored in the database.
There are a lot of options, but the following may be the easiest:
Generate the HTML pages regularly and update the cache (cache each entire generated HTML page, keyed by whatever makes its dynamic content unique), with some kind of cronjob as you have mentioned. This job populates all the modified dynamic content at regular intervals; a sketch follows the steps below.
Have a wrapper for every dynamic page that looks up the cache first. On a hit, simply return the already generated HTML page; otherwise, go through the regular flow.
You can also choose to cache the newly generated page at that point.
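A rough sketch of such a cronjob (the portal URLs and output directory are placeholders for your actual reports):

    import urllib.request
    from pathlib import Path

    REPORT_URLS = [  # placeholders for the real report pages
        "http://report-portal.internal/report?id=1",
        "http://report-portal.internal/report?id=2",
    ]

    def mirror(out_dir: str = "report_cache") -> None:
        # Fetch each slow report page once and save it as a static file.
        Path(out_dir).mkdir(exist_ok=True)
        for i, url in enumerate(REPORT_URLS, start=1):
            html = urllib.request.urlopen(url).read()
            Path(out_dir, f"report_{i}.html").write_bytes(html)

    if __name__ == "__main__":
        mirror()  # e.g. crontab: 0 6 * * 1 python mirror_reports.py

Serve the resulting directory with any static web server (or just open the files directly) for near-instant access.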
Hope it helps!
I'm developing a web app in Dart, packaged as a deployable .war for Tomcat 6. This app is used by a bunch of clients, all on Google Chrome.
Every time I publish a new version, every single client must clear their browser cache before seeing the updated files: this is very annoying, and I can't find any solution other than broadcasting a mail to everyone saying "Please clear the browser cache".
The desirable solution is not to disable caching completely, but for the browser to keep caching everything so it stays as quick as it can, while letting me control invalidation when I need to.
I'm not sure what your question is about exactly.
There is nothing specific to Dart here. Caching is handled by the browser depending on the expiration headers the server returns with a response to a request.
What you can do is something like what's explained in Force browser to clear cache or Forcing cache expiration from a JavaScript file: make the client application poll the server frequently for updates and then redirect to the new URL. You could implement some kind of redirection on the server, or ignore the version URL query parameter, so that the actual names of the resources stay the same.
Another possibility could be to use AppCache and serve the manifest file with immediate expiration. When you have an updated version, modify the manifest file, which makes the client reload the resources listed in it (https://stackoverflow.com/a/13107058/217408, https://developer.mozilla.org/en-US/docs/Web/HTML/Using_the_application_cache, http://alistapart.com/article/application-cache-is-a-douchebag#section4).
Browsers keep cached data in order to save re-fetching it from the server. However, there are situations when we need to tell the browser to re-fetch data from the server, for example when a new production release has gone live.
Many people know of Ctrl+F5, but most people don't generally use it because they don't even know that a new version of the site has gone live.
One common method is to append a version number or timestamp to the end of the file URL, e.g. http://host/jquery.js?v=2, http://host/jquery.js?v=3, etc. But this only works when the browser fetches the latest HTML where the version number is updated. If the browser still serves the HTML from cache, it will load http://host/jquery.js?v=2 instead of http://host/jquery.js?v=3.
Is there a way to force browser to invalidate all cache and reload from server (whenever a user loads the page after a new version of site has gone live)?
Please note: using a 'no-cache' meta tag is not an option here, as that would make the page non-cacheable.
I have a website which is displayed to visitors via a kiosk. People can interact with it. However, since the website is not locally hosted and relies on an internet connection, page loads are slow.
I would like to implement some kind of lazy caching mechanism such that as and when people browse the pages - the pages and the resources referenced by the pages get cached, so that subsequent loads of the same page are instant.
I considered using HTML5 offline caching, but it requires me to specify all the resources in the manifest file, and this is not feasible for me, as the website is pretty large.
Is there any other way to implement this? Perhaps using HTTP caching headers? I would also need some way to invalidate the cache at some point to "push" the new changes to the browser...
The usual approach to handling problems like this is with HTTP caching headers, combined with smart construction of URLs for resources referenced by your pages.
The general idea is this: every resource loaded by your page (images, scripts, CSS files, etc.) should have a unique, versioned URL. For example, instead of loading /images/button.png, you'd load /images/button_v123.png and when you change that file its URL changes to /images/button_v124.png. Typically this is handled by URL rewriting over static file URLs, so that, for example, the web server knows that /images/button_v124.png should really load the /images/button.png file from the web server's file system. Creating the version numbers can be done by appending a build number, using a CRC of file contents, or many other ways.
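For instance, the CRC variant could look like this in Python (paths are illustrative):

    import zlib
    from pathlib import Path

    def versioned_name(path: str) -> str:
        # Version tag derived from the file contents, so it changes
        # exactly when the file does.
        crc = zlib.crc32(Path(path).read_bytes())
        stem, _, ext = path.rpartition(".")
        return f"{stem}_v{crc:08x}.{ext}"

    # versioned_name("images/button.png") -> "images/button_v1c291ca3.png";
    # the server-side rewrite rule maps it back by stripping the "_v..." part.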
Then you need to make sure that, wherever URLs are constructed in the parent page, they refer to the versioned URL. This obviously requires dynamic code used to construct all URLs, which can be accomplished either by adjusting the code used to generate your pages or by server-wide plugins which affect all text/html requests.
Then you set the Expires header for all resource requests (images, scripts, CSS files, etc.) to a date far in the future (e.g. 10 years from now). This effectively caches them forever. It means that all resources loaded by each of your pages will always be fetched from cache; cache invalidation never happens, which is OK because when the underlying resource changes, the parent page will use a new URL to find it.
Finally, you need to figure out how you want to cache your "parent" pages. How you do this is a judgement call. You can use ETag/If-None-Match HTTP headers to check for a new version of the page every time, which will very quickly load the page from cache if the server reports that it hasn't changed. Or you can use Expires (and/or Max-Age) to reload the parent page from cache for a given period of time before checking the server.
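Putting both policies together, a Flask-style sketch might look like this (route names and the page builder are illustrative):

    import hashlib
    from datetime import datetime, timedelta, timezone
    from email.utils import format_datetime
    from flask import Flask, request, make_response, send_from_directory

    app = Flask(__name__)

    @app.route("/assets/<path:name>")
    def asset(name):
        # Versioned asset URLs change when the file changes, so the
        # response can be cached "forever": Expires ten years out
        # (Cache-Control: max-age is the modern equivalent).
        resp = send_from_directory("assets", name)
        expires = datetime.now(timezone.utc) + timedelta(days=3650)
        resp.headers["Expires"] = format_datetime(expires, usegmt=True)
        return resp

    def render_parent_page() -> str:
        return "<html>...</html>"  # hypothetical page builder

    @app.route("/")
    def page():
        # Parent page: revalidate with ETag/If-None-Match on every load.
        body = render_parent_page()
        etag = '"' + hashlib.md5(body.encode()).hexdigest() + '"'
        if request.headers.get("If-None-Match") == etag:
            return "", 304  # unchanged: the kiosk loads it from cache
        resp = make_response(body)
        resp.headers["ETag"] = etag
        return resp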
If you want to do something even more sophisticated, you can always put a custom proxy server on the kiosk; in that case you'd have total, centralized control over how caching is done.