How does scary transcluding caching work on MediaWiki? - caching

We are running a small wiki farm (same topic; six languages and growing) and have recently updated most templates to use several layers of meta-templates in order to facilitate maintenance and readability.
We wish to standardise those templates for all languages, therefore most of them are going to contain the exact same code on each wiki. This is why, in order to further simplify maintenance, we are considering the use of scary transcluding (more specifically, substitution) so that those meta-templates are only stored on one wiki and only have to be updated on that wiki, not on every single version.
(Note: if you can think of a better idea, don't hesitate to comment on this post!)
However, scary transcluding is called so for being scarily inefficient, therefore I need to know more about the way content included that way is cached by MediaWiki.
If I understand correctly, the HTML output of a page is stored in the parser cache for a duration of $wgParserCacheExpireTime. The default is 1 day, but it's safe to increase it on a small to medium wiki because the content will get updated anyway if the page itself or an included page is updated (and in some other minor cases).
There's also a cache duration for scary transcluding: $wgTranscludeCacheExpiry. Good, because you wouldn't want to make that HTTP call every time. However, the default value of 1 hour is not suitable for smaller wikis, on which an article may only be viewed every now and then, therefore rendering that cache absolutely useless.
If a page A uses a template B that includes template C from another wiki, does page A have to be entirely regenerated after $wgTranscludeCacheExpiry has been exceeded? Or can it still make use of the parser cache of template B until $wgParserCacheExpireTime has been exceeded?
You could then increase $wgTranscludeCacheExpiry to a month, just like the parser cache, but a page wouldn't get updated automatically if the transcluded template was, would it?
If yes, would refreshing the pages using that transcluded template be the only solution to update the other wikis?

IMHO the solution to find out is simple: try it! $wgScaryTranscluding is rarely used, but the few who tried enabling it reported having very few problems. There are also JavaScript-based alternatives, see the manual.
Purging is rarely a big issue: a crosswiki template is unlikely to contain stuff you absolutely want to get out right now. If the cache doesn't feel aggressive enough for you, set it to a week or month and see if something goes wrong. Ilmari Karonen suggests such a long cache even for HTML after all.

Related

caching snippets (modX)

I was simply cruising through the MODX options and I noticed the option to cache snippets. I was wondering what kind of effect this would have (downsides) on my site. I know that caching would improve the loading time of the site by keeping them 'cached' after the first time and then only reloading the updates, but this all seems too good to be true. My question is simple: are there any downsides to caching snippets? Cheers, Marco.
Great question!
The first rule of Modx is (almost) always cache. They've said so in their own blog.
As you said, the loading time will be lower. Let's just get the basics on the floor first. When you choose to cache a page, the page with all the output is stored as a file in your cache folder. If you have a small and simple site, you might not see much difference between caching and not caching, but if you have a complex one with lots of chunks-in-chunks, snippets parsing chunks etc., the difference is enormous. Some of the websites I've made go down 15-30 levels to parse the content in some sections. Loading all this fresh from the database can take up to a couple of seconds, while loading a flat file would take only a few microseconds. There is a HUGE difference (remember that).
Now. You can cache both snippets and chunks. Important to remember. You can also cache one chunk while leaving the next level uncached. Using Modx's brilliant markup, you can choose what to cache and what to leave uncached, but in general you want as much as possible cached.
You ask about the downside. There are none, but there are a few cases where you can't use cached snippets/chunks. As mentioned earlier, the cached response is stored per page. That means caching breaks on a page (or URL or whatever you want to call it) where you display different content based on, for example, GET parameters. You can't cache a search result (because the content changes) or a page with pagination (?page=1, ?page=2 etc. would produce different output on the same page). Another case is when a snippet's output is random/different every time. Say you put a random quote in your header: this needs to be uncached, or you will just see the first random result every time. In all other cases, use caching.
Also remember that every time you save a change in the manager, the cache will be wiped. That means that if you for example display the latest news-articles on your frontpage, this can still be cached because it will not display different content until you add/edit a resource, and then the cache will be cleared.
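To make the per-page behaviour above concrete, here is a minimal sketch in Python. It is not MODX's actual implementation: the PageCache class, the page id 7 and the news_list function are all hypothetical, and the only assumption is that output is cached per page and wiped on save, as described above.

```python
# Hypothetical sketch of a per-page output cache; not how MODX is implemented.
class PageCache:
    def __init__(self):
        self._store = {}

    def render(self, page_id, params, build):
        # The key is the page only. Query parameters are ignored, which is
        # why parameter-dependent output has to be called uncached.
        if page_id not in self._store:
            self._store[page_id] = build(params)
        return self._store[page_id]

    def clear(self):
        # Stands in for "saving a change in the manager wipes the cache".
        self._store.clear()


def news_list(params):
    # Hypothetical snippet whose output depends on a GET parameter.
    return "news page " + str(params.get("page", 1))


cache = PageCache()
first = cache.render(7, {"page": 1}, news_list)
second = cache.render(7, {"page": 2}, news_list)  # stale: still the page 1 output
cache.clear()
third = cache.render(7, {"page": 2}, news_list)   # rebuilt after the wipe
```

The second call shows the pagination problem: the cached page 1 output is returned even though page 2 was requested, until the cache is cleared.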
To sum it all up. Caching is GREAT and you should use it as much as possible. I usually make all my snippets/chunks cached, and if I crash into problems, that is the first thing I check.
Using caching makes your webserver respond quicker (good for the user) and produces fewer queries to the database (good for you). All in all. Caching is a gift. Use it.
There are no downsides to caching, and honestly I wonder what made you think there were?
You should always cache everything you can - there's no point in having something be executed on every page load when it's exactly the same as before. By caching the output and the source, you bypass the need for processing time and improve performance.
Assuming MODX Revolution (2.x), all template tags you use can be called both cached and uncached.
Cached:
[[*pagetitle]]
[[snippet]]
[[$chunk]]
[[+placeholder]]
[[%lexicon]]
Uncached:
[[!*pagetitle]] - this is pointless
[[!snippet]]
[[!$chunk]]
[[!+placeholder]]
[[!%lexicon]]
In MODX Evolution (1.x) the tags are different and you don't have as much control.
Some time ago I wrote about caching in MODX Revolution on my blog and I strongly encourage you to check it out as it provides more insight into why and how to use caching effectively: https://www.markhamstra.com/modx/2011/10/caching-guidelines-for-modx-revolution/
(PS: If you have MODX specific questions, I'd suggest posting them on forums.modx.com - there's a larger MODX audience there that can help)

What's the optimal amount of queries an ExpressionEngine page should load?

I saw #parscale tweet: How many queries are you happy with for a home page? When do you say this is Optimized?
I saw responses that < 50 is good, 30 or less is best, and 100+ is danger zone. Is there really any proper number? And if say you do have > 50 queries running on your pages, what are some ways to bring it down?
I generally have sites that run the gamut that are under 50 queries and some more, though the "more" don't seem to be too slow, I'm always interested in making it faster. How?
How to reduce queries will vary from site to site and template to template, but there have been a few articles on EE optimisation and performance:
http://expressionengine.com/wiki/Reduce_Queries/
http://expressionengine.com/blog/entry/troubleshooting_site_performance_issues/
http://www.netmagazine.com/tutorials/optimise-your-expressionengine-site
http://www.leezilla.net/post/12377053779/ab-seeing-your-sites-performance
http://eeinsider.com/articles/using-cache-wisely-with-expressionengine/
But if you've done all that and still need to speed things up, then your next step is to look at add-ons like CE Cache.
Thing to remember is not all queries are created equal. You can have 1,000 queries that do very little in the way of impacting performance, or a single query that can slow everything way down.
In EE it's actually better to look at the template debug output and identify key slowdown spots in the template build than to always focus on just the query count.
As others have pointed out, products like CE Cache, Solspace's Template Morsels, or even adding a Varnish caching server in front of an intensive EE web site can do wonders, though given the added work required to get a full Varnish setup in front of EE, I would currently stick to the other solutions/directions first.
There is not a magic query number. In my opinion, your server environment dictates what can be supported. The more resources you have, the more complex your code can be.
With that said, there are lots of options you can use if issues do arise on an EE website. The links in the answer above give you a solid list but here are some first things to check:
Remove search:field_name="" parameters
Reduce use of channel tags, combine if you can
Add disable="" parameter to channel tags to disable what you don't need
Reduce use of embeds
Turn off all EE tracking code
Stop using advanced conditionals if you have a channel tag inside
Following on from Nevin's point: I find that JB Graphite is a huge help. It turns the debug output into a pretty graph, so you can easily spot bottleneck queries.
http://devot-ee.com/add-ons/jb-graphite
I'll expand on MediaGirl's point number 6 - you can often greatly simplify conditionals by using Croxton's Ifelse and/or Switchee add-ons. Definitely worth a look.
I used CE Cache on a really intensive build and it reduced page load from 6 seconds to 0.7 seconds. Awesome add-on, with incredible documentation and the best support you can get anywhere.

Web site performance - what is it about? Readability and ignorance Vs. Performance

Let me cut to the chase...
On one hand, much of the programming advice given (here and in other places) emphasizes the notion that code should always be as readable and as clear as possible, at (almost?!) any performance cost.
On the other hand there are SO many slow web sites (at least one of which I know from personal experience).
Obviously round trips and DB access are issues a web developer should always keep in mind. But the trade-off between readability and what not to do because it slows things down is very unclear to me.
Questions are: 1. What else? 2. Is there a rule (preferably simple, but probably quite general) one should adhere to in order to make sure one's code does not slow things down too much?
General best practices as well as specific advice would be much appreciated. Advice based on experience would be especially appreciated.
Thanks.
Edit: A little clarification: General performance advice isn't hard to find. That's not what I'm looking for. I'm asking about two things: 1. While trying to make my code as readable as possible, when should I stop and say: "Now I'm hurting performance too much"? 2. Little, less-known things like: is selecting just one column faster than selecting all (thanks Otávio)... Thanks again!
See the Stack Overflow discussion here:
What is the most important effect on performance in a database-backed web application?
The top voted answer was, "write it clean, and use a profiler to identify real problems and address them."
In my experience, the biggest mistake (using C#/asp.net/linq) is over-querying due to LINQ's ease-of-use. One huge query is usually much faster than 10000 small ones.
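As a rough illustration of the over-querying pattern, here is a sketch using Python and an in-memory SQLite database instead of C#/LINQ. The orders table, its columns and the two function names are made up for the example; the point is only the shape of the two approaches.

```python
import sqlite3

# Hypothetical schema and data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)


def totals_many_queries(conn, customer_ids):
    # One query per customer: the "many small queries" pattern.
    result = {}
    for cid in customer_ids:
        row = conn.execute(
            "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?",
            (cid,),
        ).fetchone()
        result[cid] = row[0]
    return result


def totals_one_query(conn):
    # A single grouped query returns the same data in one round trip.
    rows = conn.execute(
        "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id"
    )
    return dict(rows.fetchall())


slow = totals_many_queries(conn, range(100))
fast = totals_one_query(conn)
```

Both functions return the same dictionary; against a real database server the first one pays the network and parsing cost 100 times, which is where a profiler would show the difference.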
The other ASP.NET gotcha I see a lot is when the viewstate gets extremely fat and bloated. EnableViewState=false is your best friend, start every new project with it!
For web applications that have a database back end, it is extremely important that:
indexing is done properly
retrieval is done for what is needed (avoid select * when selecting specific fields will do - even more so if they are part of a covered index)
Also, whenever possible an appropriate caching strategy can help performance
Optimizing your code.
While making your code as readable as possible is very important, optimizing it is equally important. I've listed some items that will hopefully point you in the right direction.
For example in regards to Databases:
When you define the schema of your database, you should make sure that it is normalized and the indexes of fields are defined properly.
When running a query, specifically SELECT, only select the fields you need.
You should only make one connection to the database per page load.
Re-factor. This is probably the most important factor in producing clean, optimized code. Always go back and look at your code and see what can be done to improve it.
PHP Code:
Always test your work with a tool like PHPUnit.
echo is faster than print.
Wrapping your string in single quotes (') instead of double quotes (") is faster because PHP searches for variables inside "..." and not in '...'. Use this when your string contains no variables that need evaluating.
Use echo's multiple parameters (or stacked) instead of string concatenation.
Unset or null your variables to free memory, especially large arrays.
Use strict code and avoid suppressing errors, notices and warnings; this results in cleaner code and less overhead. Consider having error_reporting(E_ALL) always on.
Incrementing an undefined local variable is 9-10 times slower than a pre-initialized one.
Methods in derived classes run faster than ones defined in the base class.
Error suppression with @ is very slow.
Website Optimization
A good place to start is here (http://developer.yahoo.com/performance/rules.html)
Performance is a huge topic and there are a lot of things that you can do to help improve the performance of your website. It's something that takes time and experience.
Best,
Richard Castera
Scott and Rcastera did a good job covering DB and querying optimization. To address your question from a HTML / CSS / JavaScript standpoint:
CSS:
Readability is key. CSS is rendered so fast that you should never feel it is necessary to sacrifice readability for performance. As such, focus on adding in as many comments as necessary to document the code, why certain rules (like hacks) are there, and whatever else floats your comment boat. In CSS there are a few obvious rules to follow: 1) Use external stylesheets. 2) Limit the number of external stylesheets to limit GET requests.
HTML: Like CSS, HTML is read so fast by the browser you should really only focus on writing clean code. Use whitespace, indentation, and comments to properly document your work. The only major things in HTML to remember are: 1) Declare the <meta charset /> early within the head section. 2) Follow this guy's advice to minimize browser reflows (this rule actually applies to CSS as well).
JavaScript: Most optimizations for JavaScript are really well known by now so these'll seem obvious, like initializing variables outside of loops, pushing JavaScript to the bottom of the body so the DOM loads before scripts start tying up all of the resources, and avoiding costly statements like eval() or with(). Not to sound like a broken record, but keeping a well commented and easily readable script should still be a priority when developing JavaScript code. Especially since you can just minimize and compress away all the excess when you deploy it.

fastest etag algorithm

We want to make use of http caching on our website - in particular content validation.
Because our CMS constructs pages from smaller fragments of content, the last modified date of the actual page is not always an accurate indicator that the page has changed. Hence we also want to make use of etags. Because page construction is based on lots of other page fragments we think the only real way to provide an accurate etag is by performing some sort of digest on the content stream itself. This seems a little over cooked as caching is supposed to ease the load off the servers but a content digest is obviously CPU intensive.
I'm looking for the fastest algorithm to create a unique etag that is relevant to the content stream (inode etc. is just a kludge and won't work). An MD5 hash is obviously going to get the best unique result, but is anybody else making use of other algorithms that are faster in a similar situation?
Sorry forgot the important details... Using Java Servlets - running in websphere 6.1 on windows 2003.
I forgot to mention that there are also live database feeds (we're a bank and need to make sure interest rates are up to date) that can also change the content. So figuring out when content has changed can be tricky to determine.
I would generate a checksum for each fragment, but compute it when the fragment is changed, not when you render the page.
This way, you pay a one-time cost, which should be relatively small, unless we're talking hundreds of changes per second, and there is no additional cost per request.
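A minimal sketch of that idea, in Python rather than a Java servlet to keep it short. The FragmentStore class and the fragment names are hypothetical, and combining the stored digests into a page ETag by hashing them together is my assumption about how the per-fragment checksums would be used, not something the question or answer specifies.

```python
import hashlib


class FragmentStore:
    """Hypothetical fragment store that hashes content at write time."""

    def __init__(self):
        self._content = {}
        self._digest = {}

    def save(self, name, text):
        # Hash once, when the fragment changes, not on every request.
        self._content[name] = text
        self._digest[name] = hashlib.md5(text.encode("utf-8")).hexdigest()

    def etag(self, names):
        # Hashing a handful of short hex digests is cheap compared with
        # digesting the whole rendered page body.
        joined = "".join(self._digest[n] for n in names)
        combined = hashlib.md5(joined.encode("ascii")).hexdigest()
        return '"' + combined + '"'


store = FragmentStore()
store.save("header", "<h1>Rates</h1>")
store.save("rates", "3.5%")
before = store.etag(["header", "rates"])
store.save("rates", "3.6%")  # e.g. a database feed pushed a new value
after = store.etag(["header", "rates"])
```

For the live database feeds, the same approach would need the feed update to go through something like save(), so that the digest changes whenever the value does.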

Strategies for Caching on the Web? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
What concerns, processes, and questions do you take into account when deciding when and how to cache? Is it always a no-win situation?
This presupposes you are stuck with a code base that has been optimized.
I have been working with DotNetNuke most recently for web applications and there are a number of things that I consider each time I implement caching solutions.
Do all users need to see cached content?
How often does each bit of content change?
Can I cache the entire page?
Do I need a manual way to purge the cache?
Can I use a single cache mechanism for the entire site, or do I need multiple solutions?
What impacts occur if information is somehow out of date?
I would look at each feature of your website/application and decide for each feature:
Should it be cached?
How long should it be cached for?
When should the cache be expunged?
I would personally go against caching whole pages in favour of caching sections of the website/application.
First off, if your code is optimized as you said, you will only see noticeable performance benefits when the site is being hammered with a lot of requests.
However, it is faster to pull resources from RAM than from the disk, so your web server will be able to handle more requests if you have a caching strategy in place.
As for knowing when you're going to need caching, consider that even low end modern web servers can handle hundreds of requests per second, so unless you expect a decent amount of traffic, caching is probably something you can just skip.
Also, if you are pulling content from your database (for example, StackOverflow probably does this) caching can be very helpful because database operations are relatively expensive and can be a huge bottleneck in high-volume situations.
As for a scenario when it's not appropriate to cache or when caching becomes difficult... If you try to cache a dynamic page that, say, displays the current date and time, you will constantly see an old date/time unless you get a little more involved with your caching strategy. So that's something to think about.
What language are you using? With ASP.NET you get some very easy caching just by adding an attribute over the method, and the value is cached for a set time.
If you want more control over the cache, you can use a popular system like Memcached and control expiry by time or by event.
Yahoo for example "versions" their JavaScript, so your browser downloads code-1.2.3.js and when a new version appears they reference that version. By doing this they can make their Javascript code cacheable for a very-very long time.
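A small sketch of the file naming side of this, in Python. The helper names and the /static/ path are assumptions for the example; the only thing taken from the answer is the code-1.2.3.js naming pattern.

```python
from pathlib import PurePosixPath


def versioned_name(filename, version):
    # "code.js" + "1.2.3" -> "code-1.2.3.js"
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}-{version}{path.suffix}"))


def script_tag(filename, version):
    # "/static/" is a hypothetical location for the versioned files.
    return f'<script src="/static/{versioned_name(filename, version)}"></script>'


tag = script_tag("code.js", "1.2.3")
```

Because a new release gets a new URL, the old file can be served with a very long cache lifetime without anyone ever seeing a stale script.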
As for the general answer I think it depends on your data, on how often does it change. For example, images don't change very often, but html pages do. The "About us" page doesn't change too often, but the news section does.
You can cache by time. This is useful for data that changes fast. You can set the time to 30 seconds or 1 minute. Of course, this requires some traffic. The more traffic you have, the more you can play with the time, because if you have 1 visit every hour, that visit will populate the cache without ever using it...
You can cache by event... if your data changes, you update the cache... this one is very useful if the data needs to be accurate for the user very quickly.
You can cache static content that you know won't change often. If you have a top 10 of the day that refreshes every day, then you can store it all in the cache and update it every day.
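The time-based and event-based approaches above can be sketched together in a few lines of Python. TimedCache and its method names are hypothetical, and this is an in-process illustration rather than a replacement for something like Memcached.

```python
import time


class TimedCache:
    """Hypothetical cache with time-based expiry and event-based invalidation."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, load):
        # Cache by time: reuse the stored value until it expires.
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and entry[0] > now:
            return entry[1]
        value = load()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        # Cache by event: call this when the underlying data changes.
        self._entries.pop(key, None)


top10 = TimedCache(ttl_seconds=30)
value = top10.get("top10", lambda: ["item %d" % i for i in range(1, 11)])
```

With one visit per hour and a 30 second lifetime, every get() ends up calling load(), which is the "populate the cache without ever using it" case described above.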
Where available, look out for whole object memory caching. In ASP.NET, this is a built-in feature where you can just plant your business logic objects in the IIS Application and access them from there.
This means you can store everything you need to generate a page in memory (persisting writes to database) and generate a page without ANY database IO.
You still need to use the page-building logic to generate the page, but you save a lot of time in getting the data.
Other techniques involve localised output caching, where you capture the output before sending and save it to file. This is great for static sections (like navigation on certain pages, or text bodies) and include them out when they're requested. Most implementations purge cached objects like this when a write happens or after a certain period of time.
Then there's the least "accurate": whole page caching. It's the highest performer but it's pretty useless unless you have very simple pages.
What kind of caching? Server side caching? Client side caching?
Client side caching is a no-brainer with certain things, like Static HTML, SWFs and images. Figure out how often the assets are likely to change, and set up "Expires" headers as appropriate. (2 days? 2 weeks? 2 months?)
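As a sketch of what setting those headers can look like, here is a small Python helper. The function name is made up, and sending Cache-Control: max-age alongside Expires is my addition; the answer itself only mentions Expires.

```python
import time
from email.utils import formatdate


def cache_headers(max_age_days, now=None):
    # "now" is a Unix timestamp; it defaults to the current time.
    if now is None:
        now = time.time()
    seconds = max_age_days * 24 * 60 * 60
    return {
        "Cache-Control": f"public, max-age={seconds}",
        # formatdate(..., usegmt=True) produces the HTTP date format.
        "Expires": formatdate(now + seconds, usegmt=True),
    }


headers = cache_headers(14)  # e.g. two weeks for images
```

How the headers are attached to the response depends on the web server or framework in use.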
Dynamic pages, by definition, are a little harder to cache. There have been some explorations in caching of certain chunks using Javascript (and degrading to IFrames if JS is not available.) This however, might be a little more difficult to retrofit into an existing site.
DB and application level caching may, or may not work, depending on your situation. That really depends on where your bottlenecks are. Figuring out where your application spends the most time on page-rendering is probably priority 1, then you can start looking at where and how to cache.
