Magnolia "Page Aware" Caching - caching

I am using Magnolia v5.7.1 and have just configured the Advanced Cache module for site-aware caching. Before that, the default behavior was to flush all caches on any change (activation, import, edit) in a workspace. With the Advanced Cache module, if any content on a specific site is changed, only the corresponding caches are flushed. So far, so good.
Now, let's say pages A and B are cached. If page A is changed, this flushes the caches for both page A and page B (as long as both pages are on the same site). I am wondering whether there is a good reason that the default behavior isn't the following: if page A is changed, only the cache for page A gets flushed.
I know it's possible to implement my own FlushPolicy; however, this seems to be a difficult task, and perhaps I'm missing a good reason why "page aware" caching can't be done.

The good reason to flush everything is that a change in one page might affect others. It is quite common to, e.g., generate a menu from the page structure, so renaming one page would affect all pages that show the menu. Or to have a teaser component in a page that takes its abstract text and image from the page it is teasing. And so on. In short, without calculating a dependency graph, there is no way for the system to know which pages' rendering might be affected by which other page's changes. And in some cases it can be near impossible to know. Imagine an event calendar page with subpages for each event, where the calendar is composed by a query that searches for all events in the current month. Since a dynamic query is involved, calculating the dependency graph gets even more complicated.
That said, it would still be possible to calculate dependencies and flush only the affected pages, but in reality, for most cases, the effort (CPU time) to calculate such a graph is bigger than simply flushing everything and re-rendering the pages, since rendering is cheap (except in special cases). On top of that, it's also much faster to drop all items in the cache than to retrieve them one by one and flush only those that need to be flushed.
TL;DR: For the majority of websites, it's not worth the effort to do very smart page cache management; the costs outweigh the benefits without really improving performance.
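To make that concrete, here is a minimal sketch in TypeScript (not Magnolia's actual API; all names are made up) of what a dependency-tracking flush has to do that the flush-all default does not:

type PageId = string;

const cache = new Map<PageId, string>();            // rendered HTML per page
const dependents = new Map<PageId, Set<PageId>>();  // source page -> pages rendered from it

// Called during rendering whenever page `user` reads content from page `source`
// (a menu built from the page tree, a teaser pulling text from another page, ...).
function recordDependency(user: PageId, source: PageId): void {
  if (!dependents.has(source)) dependents.set(source, new Set());
  dependents.get(source)!.add(user);
}

// "Page aware" flush: only correct if every dependency was recorded, and
// query-driven pages (like the event calendar above) make that bookkeeping very hard.
function flushPage(changed: PageId): void {
  cache.delete(changed);
  for (const page of dependents.get(changed) ?? []) cache.delete(page);
}

// What the default does instead: cheap, simple, and always correct.
function flushAll(): void {
  cache.clear();
}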

Conditional Incremental builds in Nextjs

Context
I am learning Nextjs, a framework for developing React applications quickly by providing many features out of the box, such as Server Side Rendering and Fast Refresh, without any configuration. It also provides the option to generate some web pages statically, so they are pre-rendered at build time instead of being rendered on demand. It achieves this by querying the data required for the page at build time. Nextjs also accepts an optional argument, expressed in seconds, after which the data is re-queried and the page re-rendered. All of this happens at the page level rather than rebuilding the entire website.
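For reference, this is the mechanism the question describes (Incremental Static Regeneration). A minimal sketch, assuming the pages router; the endpoint and Post shape are hypothetical:

// pages/news.tsx
import type { GetStaticProps } from 'next';

type Post = { id: string; title: string; body: string };

export const getStaticProps: GetStaticProps<{ posts: Post[] }> = async () => {
  const res = await fetch('https://api.example.com/posts'); // hypothetical data source
  const posts: Post[] = await res.json();
  return {
    props: { posts },
    revalidate: 60, // the optional argument in seconds: re-query and re-render at most once a minute
  };
};

export default function News({ posts }: { posts: Post[] }) {
  return <ul>{posts.map((p) => <li key={p.id}>{p.title}</li>)}</ul>;
}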
Problems
(1) We cannot know in advance how frequently the data will change; it may change after 1 second or after 10 minutes, and it is extremely hard to predict. It is most certainly not a constant number of seconds, though. With this approach, I might show outdated information because the time limit is too high, or I might end up querying the database unnecessarily even though the data hasn't changed.
(2) Suppose I have implemented some sort of pagination and I want to exploit the fact that most users will only visit the first few pages before going to a different link. I could statically pre-render the first 1000 pages, so the most visited pages are served statically without going to the database, whereas the rest are server-side rendered. Now, if my data changes frequently, I would have to re-render the first 1000 pages at regular intervals, and each page would issue a separate query against the same database or external API, which would cause too many round trips. I am not aware of the internals of Nextjs, but I suspect this is true because Nextjs does not assume anything about the function that pulls the data, and a generic implementation would necessitate it.
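For context, a minimal sketch of the pre-render-the-first-1000-pages setup, again assuming the pages router with a made-up endpoint:

// pages/list/[page].tsx
import type { GetStaticPaths, GetStaticProps } from 'next';

export const getStaticPaths: GetStaticPaths = async () => ({
  // Pre-build pages 1..1000 at build time; anything beyond is rendered
  // on first request ('blocking' fallback) and then cached.
  paths: Array.from({ length: 1000 }, (_, i) => ({ params: { page: String(i + 1) } })),
  fallback: 'blocking',
});

export const getStaticProps: GetStaticProps<{ items: string[] }> = async ({ params }) => {
  // One query per page: exactly the per-page round trip the question worries about.
  const res = await fetch(`https://api.example.com/items?page=${params?.page}`);
  return { props: { items: await res.json() }, revalidate: 60 };
};

export default function ListPage({ items }: { items: string[] }) {
  return <ul>{items.map((it) => <li key={it}>{it}</li>)}</ul>;
}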
Attempted Solution
Both problems can be solved by client-side or server-side rendering, because the data would be fetched on demand, but then we lose the benefit of static generation, namely serving static assets instead of querying the database. I believe static generation is useful when mutations to my data happen infrequently most of the time, but we still want to show the updated information as fast as we can when it becomes available.
If I forget about Nextjs for a while, both problems can be solved by spawning a new process which listens for mutations to the relevant data and rebuilds only those static assets which need to be updated, kind of like how React updates components, but on the server side. However, Nextjs offers a lot of functionality which would be difficult to replicate, so I cannot use this approach.
If I want to use Nextjs, problem (1) seems impossible to solve due to a (perceived?) limitation of Nextjs, which offers only one way to rebuild static pages: periodically re-rendering them after a predetermined time. However, (2) can be solved by using some sort of in-memory cache which pulls all the required data from the data store in one round trip and structures it for every page. Every page will then pull data from this cache instead of the database. However, it looks like a hack to me.
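A minimal sketch of that in-memory cache idea, with a hypothetical bulk endpoint and row shape; each page's getStaticProps would call getPageData instead of hitting the database:

type Row = { page: number; items: string[] };

let cache: Map<number, Row> | null = null;
let fetchedAt = 0;
const TTL_MS = 60_000; // assumption: refill the cache at most once a minute

// Hypothetical bulk query: one round trip for everything the pre-rendered pages need.
async function fetchAllRows(): Promise<Row[]> {
  const res = await fetch('https://api.example.com/items?limit=1000');
  return res.json();
}

export async function getPageData(page: number): Promise<Row | undefined> {
  if (!cache || Date.now() - fetchedAt > TTL_MS) {
    const rows = await fetchAllRows();
    cache = new Map(rows.map((r) => [r.page, r]));
    fetchedAt = Date.now();
  }
  return cache.get(page);
}

(Note that build workers don't necessarily share module state, so whether this actually saves round trips depends on how the build is executed.)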
Questions
Are there other ways to deal with the problem that I might have missed?
Is there a built-in way to deal with problems (1) and (2) in Nextjs?
Is my assessment of attempted solutions and their viability correct?

what is the best practice for coding user page?

I just need a quick answer: I would like to know the best practice for coding a user page. Should I create two different pages, one for myself and one for other users, or create only one page and modify it programmatically (e.g., if it's my personal page, do this; else do that)?
A quick answer: from a network/browser performance perspective, it is better to make pages as concise as they can be. I.e., separate functionality (view-only vs. view-modify) should get a separate page, since you would load different resources for each one, and functionality you don't need would affect the loading time (downloading, parsing, compiling, rendering).
Though it really depends on the environment of your website (network speed, users' device hardware) and the size of your resources: splitting pages just to save 5 KB between a view-only and a view-modify template would be overkill.
So the best practice would be the following: understand your website's requirements (page loading time, page navigation time), understand your website's limitations (page size, network speed, users' device hardware), understand your time constraints (when your website needs to go live), and based on that, choose your architecture.

caching snippets (modX)

I was simply cruising through the modX options and I noticed the option to cache snippets. I was wondering what kind of effect (downsides) this would have on my site. I know that caching improves the loading time of the site by keeping pages 'cached' after the first time and then only reloading the updates, but this all seems too good to be true. My question is simple: are there any downsides to caching snippets? Cheers, Marco.
Great question!
The first rule of Modx is (almost) always cache. They've said so in their own blog.
As you said, the loading time will be lower. Let's just get the basics on the floor first. When you choose to cache a page, the page with all its output is stored as a file in your cache folder. If you have a small and simple site, you might not see a big difference between caching and not caching, but if you have a complex one with lots of chunks-in-chunks, snippets parsing chunks, etc., the difference is enormous. Some of the websites I've made go down 15-30 levels to parse the content in some sections. Loading all of this fresh from the database can take up to a couple of seconds, while loading a flat file takes only a few microseconds. There is a HUGE difference (remember that).
Now, you can cache both snippets and chunks; important to remember. You can also cache one chunk while leaving the next level uncached. Using Modx's brilliant markup, you can choose what to cache and what not to cache, but in general you want as much as possible cached.
You ask about the downsides. There are none, but there are a few cases where you can't use cached snippets/chunks. As mentioned earlier, the cached response is stored per page. That means you can't cache a page (or URL, or whatever you want to call it) where you display different content based on, for example, GET parameters: you can't cache a search result (because the content changes), or a page with pagination (?page=1, ?page=2, etc. would produce different output for the same page). Another case is when a snippet's output is random or different every time. Say you put a random quote in your header; this needs to be uncached, or you will just see the first random result every time. In all other cases, use caching.
Also remember that every time you save a change in the manager, the cache will be wiped. That means that if you, for example, display the latest news articles on your front page, they can still be cached, because the content will not change until you add or edit a resource, at which point the cache will be cleared.
To sum it all up: caching is GREAT and you should use it as much as possible. I usually make all my snippets/chunks cached, and if I run into problems, that is the first thing I check.
Using caching makes your webserver respond quicker (good for the user) and produces fewer queries to the database (good for you). All in all. Caching is a gift. Use it.
There are no downsides to caching, and honestly I wonder what made you think there were.
You should always cache everything you can - there's no point in having something be executed on every page load when it's exactly the same as before. By caching the output and the source, you bypass the need for processing time and improve performance.
Assuming MODX Revolution (2.x), all template tags you use can be called both cached and uncached.
Cached:
[[*pagetitle]]
[[snippet]]
[[$chunk]]
[[+placeholder]]
[[%lexicon]]
Uncached:
[[!*pagetitle]] - this is pointless
[[!snippet]]
[[!$chunk]]
[[!+placeholder]]
[[!%lexicon]]
In MODX Evolution (1.x) the tags are different and you don't have as much control.
Some time ago I wrote about caching in MODX Revolution on my blog and I strongly encourage you to check it out as it provides more insight into why and how to use caching effectively: https://www.markhamstra.com/modx/2011/10/caching-guidelines-for-modx-revolution/
(PS: If you have MODX specific questions, I'd suggest posting them on forums.modx.com - there's a larger MODX audience there that can help)

When is it better to generate a static page or dynamically generate?

The title pretty much sums up my question.
When is it more efficient to generate a static page that a user can access, as opposed to using dynamically generated pages that query a database? That is, in what situations would one be better than the other?
To serve up a static page, your web server just needs to read the page off the disk and send it. Virtually no processing will be required. If the page is frequently accessed, it will probably be cached in memory, so even the disk access will not be needed.
Generating pages dynamically obviously has more overhead. There is a cost for every DB access you make, no matter how simple the query is. (On a project I worked on recently, I measured a minimum overhead of 0.7ms for each query, even for SELECT 1;) So if you can just generate a static page and save it to disk, page accesses will be faster. How much faster? It just depends on how much work is being done to generate the page dynamically. We don't know what you are doing, so we can't comment on that.
Now, if you generate a static page and save it to disk, that means you need to re-generate it every time the data which went into generating that page changes. If the data changes more often than the page is actually accessed, you could be doing more work rather than less! But in most cases, that's a very unlikely situation.
More likely, the biggest problem you will experience from generating static pages and saving them to disk is coding (and maintaining) the logic for re-generating the pages whenever necessary. You will need to keep track of exactly what data goes into each page, and in every place in the code where data can be changed, you will need to invoke re-generation of all the relevant pages. If you forget just one, then your users may be looking at stale data some of the time.
If you mix dynamic generation per-request and caching generated pages on disk, then your code will be harder to read and maintain, because of mixing the two styles.
And you can't really cache generated pages on disk in certain situations -- like responding to POST requests which come from a form submission. Or imagine that when your users invoke certain actions, you have to send a request to a 3rd party API, and the data which comes back from that API will be used in the page. What comes back from the API may be different each time, so in this case, you need to generate the page dynamically each time.
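To illustrate the bookkeeping burden described above, here is a minimal sketch in TypeScript; db, renderArticleHtml, and renderIndexHtml are hypothetical stand-ins for your storage and templating:

import { writeFile } from 'node:fs/promises';

type Article = { id: string; title: string; body: string };

declare const db: {
  update(id: string, changes: Partial<Article>): Promise<void>;
  get(id: string): Promise<Article>;
  list(): Promise<Article[]>;
};
declare function renderArticleHtml(a: Article): string;
declare function renderIndexHtml(articles: Article[]): string;

// Every code path that changes the data must also re-generate every page
// that data feeds -- forget one call site and users see stale pages.
export async function updateArticle(id: string, changes: Partial<Article>) {
  await db.update(id, changes);
  const article = await db.get(id);
  await writeFile(`public/articles/${id}.html`, renderArticleHtml(article));
  // The listing page shows this article too, so it must be re-generated as well.
  await writeFile('public/articles/index.html', renderIndexHtml(await db.list()));
}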
Static pages (or better, resources) are filled with content that does not change, or at least not often, and that does not allow further queries on it: an About page, Contact, ...
In this case it doesn't make any sense to generate these pages dynamically. On the other side, we have data (e.g., in a database) and want to query it, or give the user the opportunity to query it. In that case, you give the user a page with the possibility to specify the query and return a rendered page with the dynamically generated data.
In my opinion it depends on the result you want to present to the user: either it is plain information, or it is the possibility to query a data source. The first result is known before you do anything; the second (queried data) is only known once you have the query parameters, which means you don't know the result beforehand (it could be empty or invalid).
It depends on your architecture, but considering that GET requests should be idempotent, it should also be easy to cache dynamic pages with a proxy and invalidate the cache when something happens to the data displayed on the cached path. In this case you can save a lot of time, because the system behaves as if the cached pages were static, except that instead of coming from the filesystem, they come from memory, which is really fast.
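As a minimal sketch of making a dynamic page proxy-cacheable, assuming an Express-style server and a hypothetical renderer:

import express from 'express';

declare function renderArticle(id: string): Promise<string>; // hypothetical templating call

const app = express();

app.get('/articles/:id', async (req, res) => {
  // Idempotent GET: safe for a proxy/CDN to cache for a minute and to serve
  // a stale copy while it refreshes in the background.
  res.set('Cache-Control', 'public, max-age=60, stale-while-revalidate=30');
  res.send(await renderArticle(req.params.id));
});

app.listen(3000);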
Cheers
Laidback

Strategies for Caching on the Web? [closed]

What concerns, processes, and questions do you take into account when deciding when and how to cache? Is it always a no-win situation?
This presupposes you are stuck with a code base that has been optimized.
I have been working with DotNetNuke most recently for web applications and there are a number of things that I consider each time I implement caching solutions.
Do all users need to see cached content?
How often does each bit of content change?
Can I cache the entire page?
Do I need a manual way to purge the cache?
Can I use a single cache mechanism for the entire site, or do I need multiple solutions?
What impacts occur if information is somehow out of date?
I would look at each feature of your website/application and decide for each one:
Should it be cached?
How long should it be cached for?
When should the cache be expunged?
I would personally go against caching whole pages in favour of caching sections of the website/application.
First off, if your code is optimized, as you said, you will only see noticeable performance benefits when the site is being hammered with a lot of requests.
However, it is faster to pull resources from RAM than from disk, so your web server will be able to handle more requests if you have a caching strategy in place.
As for knowing when you're going to need caching, consider that even low end modern web servers can handle hundreds of requests per second, so unless you expect a decent amount of traffic, caching is probably something you can just skip.
Also, if you are pulling content from your database (for example, StackOverflow probably does this) caching can be very helpful because database operations are relatively expensive and can be a huge bottleneck in high-volume situations.
As for a scenario when it's not appropriate to cache or when caching becomes difficult... If you try to cache a dynamic page that, say, displays the current date and time, you will constantly see an old date/time unless you get a little more involved with your caching strategy. So that's something to think about.
What language are you using? With ASP.NET you get very easy caching by just adding an attribute above the method, and the value is cached for a configurable amount of time.
If you want more control over the cache, you can use a popular system like Memcached and control it by time or by event.
Yahoo, for example, "versions" their JavaScript, so your browser downloads code-1.2.3.js, and when a new version appears, they reference that version. By doing this, they can make their JavaScript code cacheable for a very long time.
As for the general answer, I think it depends on your data and on how often it changes. For example, images don't change very often, but HTML pages do. The "About us" page doesn't change too often, but the news section does.
You can cache by time. This is useful for data that changes fast; you can set the time to 30 seconds or 1 minute. Of course, this requires some traffic: the more traffic you have, the more you can play with the time, because if you only get one visit every hour, that visit will populate the cache rather than use it.
You can cache by event: if your data changes, you update the cache. This is very useful if the data needs to be accurate for the user very quickly.
You can cache static content that you know won't change often. If you have a top 10 of the day that refreshes every day, you can store it all in the cache and update it once a day.
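A minimal sketch of the two invalidation styles above (by time and by event), in TypeScript with made-up names:

type Entry = { value: unknown; expiresAt: number };

const cache = new Map<string, Entry>();

// Cache by time: good for fast-changing data that tolerates ~30s of staleness.
export function getByTime<T>(key: string, ttlMs: number, load: () => T): T {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value as T;
  const value = load();
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}

// Cache by event: whoever writes the data invalidates the key,
// so readers get a fresh value on the next request.
export function invalidateOnEvent(key: string): void {
  cache.delete(key);
}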
Where available, look out for whole-object memory caching. In ASP.NET, this is a built-in feature where you can just plant your business logic objects in the IIS application state and access them from there.
This means you can store everything you need to generate a page in memory (while persisting writes to the database) and generate the page without ANY database IO.
You still need to use the page-building logic to generate the page, but you save a lot of time in getting the data.
Other techniques involve localised output caching, where you capture the output before sending it and save it to a file. This is great for static sections (like navigation on certain pages, or text bodies), which can then be included when they're requested. Most implementations purge cached objects like this when a write happens or after a certain period of time.
Then there's the least "accurate": whole page caching. It's the highest performer but it's pretty useless unless you have very simple pages.
What kind of caching? Server side caching? Client side caching?
Client side caching is a no-brainer with certain things, like Static HTML, SWFs and images. Figure out how often the assets are likely to change, and set up "Expires" headers as appropriate. (2 days? 2 weeks? 2 months?)
Dynamic pages, by definition, are a little harder to cache. There have been some explorations into caching certain chunks using JavaScript (degrading to iframes if JS is not available). This, however, might be a little more difficult to retrofit into an existing site.
DB and application level caching may, or may not work, depending on your situation. That really depends on where your bottlenecks are. Figuring out where your application spends the most time on page-rendering is probably priority 1, then you can start looking at where and how to cache.
