Caching snippets (MODX)

I was simply cruising through the MODX options and I noticed the option to cache snippets. I was wondering what kind of effect this would have (downsides) on my site. I know that caching improves the loading time of the site by keeping pages 'cached' after the first request and only reloading when there are updates, but this all seems too good to be true. My question is simple: are there any downsides to caching snippets? Cheers, Marco.

Great question!
The first rule of MODX is: (almost) always cache. They've said so on their own blog.
As you said, the loading time will be lower. Let's get the basics down first. When you choose to cache a page, the page with all its output is stored as a file in your cache folder. If you have a small and simple site, you might not see much difference between cached and uncached, but if you have a complex one with lots of chunks-in-chunks, snippets parsing chunks and so on, the difference is enormous. Some of the websites I've made go down 15-30 levels to parse the content in some sections. Loading all of this fresh from the database can take up to a couple of seconds, while loading a flat file takes only a few microseconds. There is a HUGE difference (remember that).
Now: you can cache both snippets and chunks - important to remember. You can also cache one chunk while leaving the next level uncached. Using MODX's brilliant markup, you can choose what to cache and what to leave uncached, but in general you want as much as possible cached.
You ask about the downsides. There are none as such, but there are a few cases where you can't use cached snippets/chunks. As mentioned earlier, the cached output is stored per page. That means you run into trouble on a page (or URL, or whatever you want to call it) where you display different content based on, for example, GET parameters. You can't cache a search result (because the content changes), or a page with pagination (?page=1, ?page=2 etc. would produce different output for the same page). Another case is when a snippet's output is random or different every time. Say you put a random quote in your header: this needs to be uncached, or you will just see the first random result every time. In all other cases, use caching.
Also remember that every time you save a change in the manager, the cache is wiped. That means that if you, for example, display the latest news articles on your front page, this can still be cached: it will not display different content until you add or edit a resource, and at that point the cache will be cleared.
To sum it all up: caching is GREAT and you should use it as much as possible. I usually make all my snippets/chunks cached, and if I run into problems, that is the first thing I check.
Using caching makes your web server respond quicker (good for the user) and produces fewer queries to the database (good for you). All in all, caching is a gift. Use it.

There are no downsides to caching, and honestly I wonder what made you think there were?
You should always cache everything you can - there's no point in having something executed on every page load when the result is exactly the same as before. By caching the output and the source, you skip that processing time and improve performance.
Assuming MODX Revolution (2.x), all template tags you use can be called both cached and uncached.
Cached:
[[*pagetitle]]
[[snippet]]
[[$chunk]]
[[+placeholder]]
[[%lexicon]]
Uncached:
[[!*pagetitle]] - this is pointless
[[!snippet]]
[[!$chunk]]
[[!+placeholder]]
[[!%lexicon]]
In MODX Evolution (1.x) the tags are different and you don't have as much control.
Some time ago I wrote about caching in MODX Revolution on my blog and I strongly encourage you to check it out as it provides more insight into why and how to use caching effectively: https://www.markhamstra.com/modx/2011/10/caching-guidelines-for-modx-revolution/
(PS: If you have MODX specific questions, I'd suggest posting them on forums.modx.com - there's a larger MODX audience there that can help)

Related

Magnolia "Page Aware" Caching

I am using Magnolia v5.7.1 and have just configured the advanced cache module for site-aware caching. Before that, the default behavior was to flush all caches in case of any change (activation, import, edit) in a workspace. Using the advanced cache module, if any content on a specific site is changed, only the corresponding caches are flushed. So far, so good.
Now, let's say pages A and B are cached. If page A is changed, this will flush the cache for page A and B (as long as both pages are on the same site). I am wondering if there is a good reason that the default behavior isn't the following: If page A is changed, only the cache for page A gets flushed.
I know it's possible to implement my own FlushPolicy; however, this seems to be a difficult task, and perhaps I'm missing a good reason why "page aware" caching can't be done.
The good reason to flush everything is that a change in one page might affect others. It is quite common to e.g. generate the menu from the page structure, so renaming one page would affect all pages that show the menu. Or e.g. to have a teaser component in a page that takes its abstract text and image from the page it is teasing. And so on. In short, without calculating a dependency graph, there is no way for the system to know which pages' rendering might be affected by which other page's changes. And in some cases it can be nearly impossible to know. Imagine an event calendar page, with subpages for each event, and the calendar composed by a query that searches for all events in the current month. Since a dynamic query is involved, calculating the dependency graph just gets even more complicated.
That said, it would still be possible to calculate dependencies and flush only the affected pages, but in reality, for most cases, the effort (CPU time) to calculate such a graph is bigger than simply flushing everything and re-rendering the pages, since rendering is cheap (except in special cases). On top of that, it's also much faster to drop all items in the cache than to retrieve them one by one and flush only those that need to be flushed.
TL;DR: For the majority of websites, very smart page cache management isn't worth the effort; the costs outweigh the benefits without any real gain in performance.
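To make the dependency-graph point concrete, here is a minimal Java sketch (this is not Magnolia's actual FlushPolicy API; the class, method and page names are invented for illustration) contrasting flush-all with a selective flush driven by an explicit dependency map:

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative only: a toy dependency-aware flush showing why "flush only page A"
// needs a dependency graph that the system does not have by default.
public class DependencyAwareFlush {

    // Rendered HTML keyed by page path.
    private final Map<String, String> pageCache = new HashMap<>();

    // For each cached page, the set of pages whose content it embeds
    // (e.g. pages pulled in by a menu or a teaser component).
    private final Map<String, Set<String>> dependsOn = new HashMap<>();

    public void put(String page, String html, Set<String> dependencies) {
        pageCache.put(page, html);
        dependsOn.put(page, dependencies);
    }

    // Naive default: any change flushes everything on the site.
    public void flushAll() {
        pageCache.clear();
        dependsOn.clear();
    }

    // Selective flush: evict the changed page plus every page that depends on it.
    public void flushAffectedBy(String changedPage) {
        pageCache.remove(changedPage);
        pageCache.keySet().removeIf(page ->
            dependsOn.getOrDefault(page, Set.of()).contains(changedPage));
    }

    public static void main(String[] args) {
        DependencyAwareFlush cache = new DependencyAwareFlush();
        cache.put("/a", "<html>A</html>", Set.of("/b")); // /a teases content from /b
        cache.put("/b", "<html>B</html>", Set.of());
        cache.put("/c", "<html>C</html>", Set.of());
        cache.flushAffectedBy("/b");                     // evicts /b and /a, keeps /c
        System.out.println(cache.pageCache.keySet());    // prints [/c]
    }
}

Even in this toy version, the selective flush is only correct if every dependency is recorded, which is exactly what dynamic queries like the event-calendar example make hard to guarantee.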

How does Varnish deal with dynamic content?

I am studying up on caching and I am looking into Varnish. I am wondering, though: how does Varnish deal with dynamically generated content?
All over the place people are saying you shouldn't really cache content that might change a lot, but on the other hand, when I look at the response headers for Stack Overflow, I see pages being served up via Varnish.
Content here changes by the second, so how does this even work? Excuse me if it's a bit of a simple question; I will research some more while this question is up.
You need to define "dynamic":
If the content depends on the user (through cookies, for example), it should not be cached, as you'll have lots of different variants and your HIT/MISS ratio will not be high, since every user gets different content.
If the content changes over time, you can always cache the content for a little while, for example a few seconds (see the servlet sketch after this list).
If the content changes over time, a better option is to separate the "static content" from the dynamic content. You may cache the page template and make AJAX calls to refresh the content. You may also use ESI; it's an old technology, but it lets you specify different "zones" in your pages, each with its own cache duration.
You can also benefit from IMS (If-Modified-Since) requests. Telling the backend to send the response body only if it has changed since the last request can save you lots of processing time. I think Varnish does this from version 4.
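As a sketch of the "cache a little" approach, here is what the backend side might look like with a Java servlet behind Varnish (the servlet name, content and 5-second TTL are illustrative assumptions). Varnish honours s-maxage from the Cache-Control header when deciding how long to keep the object:

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Illustrative "micro-caching" backend: fast-changing content is marked as
// cacheable for a few seconds so the cache in front can absorb traffic bursts.
public class QuestionListServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // s-maxage is used by shared caches such as Varnish; max-age=0 keeps
        // browsers from holding on to a private copy for long.
        resp.setHeader("Cache-Control", "public, max-age=0, s-maxage=5");
        resp.setContentType("text/html;charset=UTF-8");
        resp.getWriter().println(renderQuestionList());
    }

    private String renderQuestionList() {
        // The expensive, frequently changing rendering work would happen here.
        return "<ul><li>newest question</li></ul>";
    }
}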
As for the Stack Overflow architecture, you may learn a lot from reading Nick Craver's blog post about it: http://nickcraver.com/blog/2016/02/17/stack-overflow-the-architecture-2016-edition/

How does scary transcluding caching work on MediaWiki?

We are running a small wiki farm (same topic; six languages and growing) and have recently updated most templates to use several layers of meta-templates in order to facilitate maintenance and readability.
We wish to standardise those templates for all languages, therefore most of them are going to contain the exact same code on each wiki. This is why, in order to further simplify maintenance, we are considering the use of scary transcluding (more specifically, substitution) so that those meta-templates are only stored on one wiki and only have to be updated on that wiki, not on every single version.
(Note: if you can think of a better idea, don't hesitate to comment on this post!)
However, scary transcluding is so called because it is scarily inefficient, therefore I need to know more about how content included that way is cached by MediaWiki.
If I understand correctly, the HTML output of a page is stored in the parser cache for a duration of $wgParserCacheExpireTime. The default is 1 day, but it's safe to increase it on a small to medium wiki because the content will get updated anyway if the page itself or an included page is updated (and in some other minor cases).
There's also a cache duration for scary transcluding: $wgTranscludeCacheExpiry. Good, because you wouldn't want to make that HTTP call every time. However, the default value of 1 hour is not suitable for smaller wikis, on which an article may only be viewed every now and then, therefore rendering that cache absolutely useless.
If a page A uses a template B that includes template C from another wiki, does page A have to be entirely regenerated after $wgTranscludeCacheExpiry has been exceeded? Or can it still make use of the parser cache of template B until $wgParserCacheExpireTime has been exceeded?
You could then increase $wgTranscludeCacheExpiry to a month, just like the parser cache, but a page wouldn't get updated automatically if the transcluded template was, would it?
If yes, would refreshing the pages using that transcluded template be the only solution to update the other wikis?
IMHO the solution to find out is simple: try it! $wgEnableScaryTranscluding is rarely used, but the few who have tried enabling it reported very few problems. There are also JavaScript-based alternatives; see the manual.
Purging is rarely a big issue: a cross-wiki template is unlikely to contain stuff you absolutely need to get out right now. If the cache doesn't feel aggressive enough for you, set it to a week or a month and see if something goes wrong. Ilmari Karonen suggests such a long cache even for HTML, after all.

Fastest ETag algorithm

We want to make use of HTTP caching on our website - in particular content validation.
Because our CMS constructs pages from smaller fragments of content, the last-modified date of the actual page is not always an accurate indicator that the page has changed. Hence we also want to make use of ETags. Because page construction is based on lots of other page fragments, we think the only reliable way to provide an accurate ETag is to perform some sort of digest on the content stream itself. This seems a little overcooked, as caching is supposed to ease the load on the servers, but a content digest is obviously CPU-intensive.
I'm looking for the fastest algorithm to create a unique ETag that is relevant to the content stream (inode etc. is just a kludge and won't work). An MD5 hash is obviously going to give the most unique result, but is anybody else using other, faster algorithms in a similar situation?
Sorry, forgot the important details... We're using Java servlets, running in WebSphere 6.1 on Windows 2003.
I forgot to mention that there are also live database feeds (we're a bank and need to make sure interest rates are up to date) that can also change the content. So figuring out when the content has changed can be tricky.
I would generate a checksum for each fragment, but compute it when the fragment is changed, not when you render the page.
This way, you pay a one-time cost, which should be relatively small, unless we're talking hundreds of changes per second, and there is no additional cost per request.
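A minimal sketch of that idea in Java (the fragment ids and the choice of CRC32 are illustrative assumptions; any stable, fast digest would do): checksums are updated when a fragment or feed changes, and the per-request cost is just combining a handful of precomputed values.

import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

// Illustrative per-fragment checksumming: each fragment's checksum is computed
// when the fragment changes, and the page ETag is derived at request time by
// combining the precomputed checksums instead of digesting the whole response.
public class FragmentEtags {

    // fragmentId -> checksum of the fragment's current content
    private final Map<String, Long> fragmentChecksums = new HashMap<>();

    // Call this whenever a fragment (or a live feed, e.g. an interest rate) changes.
    public void fragmentChanged(String fragmentId, String newContent) {
        CRC32 crc = new CRC32();
        crc.update(newContent.getBytes(StandardCharsets.UTF_8));
        fragmentChecksums.put(fragmentId, crc.getValue());
    }

    // Cheap per-request work: combine the checksums of the fragments that make
    // up the page into a single opaque ETag value.
    public String etagForPage(Iterable<String> fragmentIds) {
        CRC32 combined = new CRC32();
        for (String id : fragmentIds) {
            long checksum = fragmentChecksums.getOrDefault(id, 0L);
            combined.update(Long.toHexString(checksum).getBytes(StandardCharsets.UTF_8));
        }
        return "\"" + Long.toHexString(combined.getValue()) + "\"";
    }
}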

Strategies for Caching on the Web? [closed]

What concerns, processes, and questions do you take into account when deciding when and how to cache? Is it always a no-win situation?
This presupposes you are stuck with a code base that has been optimized.
I have been working with DotNetNuke most recently for web applications and there are a number of things that I consider each time I implement caching solutions.
Do all users need to see cached content?
How often does each bit of content change?
Can I cache the entire page?
Do I need a manual way to purge the cache?
Can I use a single cache mechanism for the entire site, or do I need multiple solutions?
What impacts occur if information is somehow out of date?
I would look at each feature of your website/application and decide for each feature:
Should it be cached?
How long should it be cached for?
When should the cache be expunged?
I would personally go against caching whole pages in favour of caching sections of the website/application.
First off, if your code is optimized as you said, you will only see noticeable performance benefits when the site is being hammered with a lot of requests.
However, it is faster to pull resources from RAM than from disk, so your web server will be able to handle more requests if you have a caching strategy in place.
As for knowing when you're going to need caching, consider that even low end modern web servers can handle hundreds of requests per second, so unless you expect a decent amount of traffic, caching is probably something you can just skip.
Also, if you are pulling content from your database (for example, StackOverflow probably does this) caching can be very helpful because database operations are relatively expensive and can be a huge bottleneck in high-volume situations.
As for a scenario when it's not appropriate to cache or when caching becomes difficult... If you try to cache a dynamic page that, say, displays the current date and time, you will constantly see an old date/time unless you get a little more involved with your caching strategy. So that's something to think about.
What language are you using? With ASP.NET you get some very easy caching just by adding an attribute above the method, and the value is cached for the configured duration.
If you want more control over the cache, you can use a popular system like Memcached and control it by time or by event.
Yahoo, for example, "versions" their JavaScript, so your browser downloads code-1.2.3.js, and when a new version appears they reference that version. By doing this they can make their JavaScript code cacheable for a very, very long time.
As for the general answer, I think it depends on your data and how often it changes. For example, images don't change very often, but HTML pages do. The "About us" page doesn't change too often, but the news section does.
You can cache by time. This is useful for data that changes fast. You can set the time to 30 seconds or 1 minute. Of course, this requires some traffic: the more traffic you have, the more you can play with the time, because if you only have 1 visit every hour, that visit will just populate the cache rather than use it...
You can cache by event: if your data changes, you update the cache. This is very useful if the data needs to be accurate for the user very quickly.
You can cache static content that you know won't change often. If you have a top 10 of the day that refreshes every day, then you can store it all in the cache and update it once a day.
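A minimal Java sketch of the "cache by time" variant described above (the class name and the 30-second TTL are illustrative assumptions):

import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Time-based caching: a value is recomputed only when the cached copy is older
// than the TTL, so frequent requests are served from memory.
public class TimedCache<V> {

    private static final long TTL_MILLIS = 30_000;  // 30 seconds

    private static final class Entry<T> {
        final T value;
        final long storedAt;
        Entry(T value, long storedAt) { this.value = value; this.storedAt = storedAt; }
    }

    private final ConcurrentHashMap<String, Entry<V>> entries = new ConcurrentHashMap<>();

    public V get(String key, Supplier<V> loader) {
        long now = System.currentTimeMillis();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.storedAt > TTL_MILLIS) {
            entry = new Entry<>(loader.get(), now);  // e.g. rebuild the "top 10" from the database
            entries.put(key, entry);
        }
        return entry.value;
    }
}

Callers would use it as, say, cache.get("top10", () -> loadTop10FromDatabase()), where loadTop10FromDatabase is whatever expensive query sits behind the feature (a hypothetical name here).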
Where available, look out for whole-object memory caching. In ASP.NET, this is a built-in feature where you can just plant your business logic objects in the IIS Application object and access them from there.
This means you can store everything you need to generate a page in memory (persisting writes to the database) and generate a page without ANY database IO.
You still need to run the page-building logic to generate the page, but you save a lot of time in getting the data.
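A rough Java analogue of the same idea, assuming a servlet container (the listener, catalog type and attribute name are invented for illustration): shared business objects are loaded once and kept in the ServletContext so page rendering never has to hit the database.

import java.util.List;
import javax.servlet.ServletContext;
import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

// Whole-object memory caching: load the business data once at startup and keep
// it in application scope; writes elsewhere must also refresh this attribute.
public class CatalogLoader implements ServletContextListener {

    static final String CATALOG_ATTRIBUTE = "productCatalog";

    // Stand-in for whatever business object the pages render from.
    public static final class Catalog {
        public final List<String> productNames;
        public Catalog(List<String> productNames) { this.productNames = productNames; }
    }

    @Override
    public void contextInitialized(ServletContextEvent event) {
        ServletContext context = event.getServletContext();
        // In a real application this would be loaded from the database once.
        context.setAttribute(CATALOG_ATTRIBUTE, new Catalog(List.of("example product")));
    }

    @Override
    public void contextDestroyed(ServletContextEvent event) {
        event.getServletContext().removeAttribute(CATALOG_ATTRIBUTE);
    }
}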
Other techniques involve localised output caching, where you capture the output before sending it and save it to a file. This is great for static sections (like navigation on certain pages, or text bodies), which can then be included when they're requested. Most implementations purge cached objects like this when a write happens or after a certain period of time.
Then there's the least "accurate" option: whole-page caching. It's the highest performer, but it's pretty useless unless you have very simple pages.
What kind of caching? Server side caching? Client side caching?
Client-side caching is a no-brainer with certain things, like static HTML, SWFs and images. Figure out how often the assets are likely to change, and set up "Expires" headers as appropriate (2 days? 2 weeks? 2 months?).
Dynamic pages, by definition, are a little harder to cache. There have been some explorations into caching certain chunks using JavaScript (and degrading to iframes if JS is not available). This, however, might be a little more difficult to retrofit into an existing site.
DB- and application-level caching may or may not work, depending on your situation. That really depends on where your bottlenecks are. Figuring out where your application spends the most time on page rendering is probably priority 1; then you can start looking at where and how to cache.

Resources