mod_rewrite vs PHP parsing

I'm working on a little project of mine and need your help deciding whether mod_rewrite or parsing each URL in PHP is the more performance-friendly option.
URLs will almost always have a fixed pattern; very few will differ.
For instance, most URLs would look like:
dot.com/resource
Some others would be:
dot.com/other/resource
I expect around 1,000 visitors a day to the site. Will server load be an issue?
Intuitively, I think mod_rewrite would work better, but just for peace of mind I'd like input from you guys. If anyone has carried out any tests, or can point me towards some, I'd be obliged.
Thanks.
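For concreteness, the PHP alternative being weighed here is a small front controller that inspects the request path itself; the sketch below is purely illustrative (the output lines are placeholders, not part of the asker's project):

<?php
// Front controller sketch: every request is routed here and the path is parsed in PHP.
$path  = trim(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), '/');
$parts = explode('/', $path);

if (count($parts) === 1) {
    // dot.com/resource
    echo 'single resource: ' . htmlspecialchars($parts[0]);
} elseif (count($parts) === 2) {
    // dot.com/other/resource
    echo 'nested resource: ' . htmlspecialchars($parts[1]) . ' under ' . htmlspecialchars($parts[0]);
} else {
    http_response_code(404);
}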

You may want to check out the following Stack Overflow post:
Any negative impacts when using Mod-Rewrite?
Quoting the accepted answer:
I've used mod_rewrite on sites that get millions of hits per month without any significant performance issues. You do have to know which rewrites get applied first, which depends on the order of your rules.
Using mod_rewrite is most likely faster than parsing the URL with your current language.
If you are really worried about performance, don't use .htaccess files; they are slow. Put all your rewrite rules in your Apache config, which is read only once, at startup. .htaccess files get re-parsed on every request, along with every .htaccess file in parent folders.
To add my own, mod_rewrite is definitely capable of handling 1,000 visitors per day.
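For a site like the one described, a handful of rules in the virtual host is plenty. Here is a minimal sketch assuming a PHP front controller at /index.php; the host name matches the question, but the paths and file names are illustrative, not the asker's actual setup:

# In the virtual host config rather than .htaccess, so it is parsed once at startup.
<VirtualHost *:80>
    ServerName dot.com
    DocumentRoot /var/www/dot.com/public
    RewriteEngine On
    # Leave requests for real files and directories (CSS, images) untouched.
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-d
    # dot.com/resource and dot.com/other/resource both land on index.php,
    # which can read the original path from REQUEST_URI.
    RewriteRule ^ /index.php [L]
</VirtualHost>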

Related

Efficiency for a High Quantity of htaccess Redirects

When setting up a list of links using .htaccess redirects for possible future affiliate link "beautification", would it be better to create a separate .htaccess file within each sub-folder that contains the specific links that get redirected?
The thinking here is: say there are 1,000 links to be redirected. If these are all placed in the root folder's .htaccess file, the server would have to scan through all 1,000 redirects for every link clicked.
If 10 sub-folders each contained 100 of the 1,000 redirects in their own .htaccess file, would this be more efficient, since there are fewer redirects for the server to filter through?
Of these two ideas, how does the Apache server handle this, and which would be the more efficient way to do it?
I realize this could also be handled in code, but there may be dozens of different affiliate links from different companies using different coding methods.
Can someone please help me better understand? And no, there are not really 1,000 redirects; I just chose a nice round large number for the purpose of this question.
Thank you.
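For concreteness, the two layouts being compared look something like this (the slugs and destination URLs are made up for illustration):

# Option 1: everything in the root /.htaccess
Redirect 302 /go/widget-a https://affiliate.example.com/?id=123
Redirect 302 /go/widget-b https://affiliate.example.com/?id=456
# ...and so on for the rest of the list

# Option 2: /go/.htaccess holds only the redirects that live under /go/
Redirect 302 /go/widget-a https://affiliate.example.com/?id=123
# ...only this folder's links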

How does scary transcluding caching work on MediaWiki?

We are running a small wiki farm (same topic; six languages and growing) and have recently updated most templates to use several layers of meta-templates in order to facilitate maintenance and readability.
We wish to standardise those templates for all languages, therefore most of them are going to contain the exact same code on each wiki. This is why, in order to further simplify maintenance, we are considering the use of scary transcluding (more specifically, substitution) so that those meta-templates are only stored on one wiki and only have to be updated on that wiki, not on every single version.
(Note: if you can think of a better idea, don't hesitate to comment on this post!)
However, scary transcluding is so called because it is scarily inefficient, so I need to know more about the way content included that way is cached by MediaWiki.
If I understand correctly, the HTML output of a page is stored in the parser cache for a duration of $wgParserCacheExpireTime. The default is 1 day, but it's safe to increase it on a small to medium wiki because the content will get updated anyway if the page itself or an included page is updated (and in some other minor cases).
There's also a cache duration for scary transcluding: $wgTranscludeCacheExpiry. Good, because you wouldn't want to make that HTTP call every time. However, the default value of 1 hour is not suitable for smaller wikis, on which an article may only be viewed every now and then, therefore rendering that cache absolutely useless.
If a page A uses a template B that includes template C from another wiki, does page A have to be entirely regenerated after $wgTranscludeCacheExpiry has been exceeded? Or can it still make use of the parser cache of template B until $wgParserCacheExpireTime has been exceeded?
You could then increase $wgTranscludeCacheExpiry to a month, just like the parser cache, but a page wouldn't get updated automatically if the transcluded template was, would it?
If yes, would refreshing the pages using that transcluded template be the only solution to update the other wikis?
IMHO the way to find out is simple: try it! $wgEnableScaryTranscluding is rarely used, but the few who have tried enabling it reported very few problems. There are also JavaScript-based alternatives; see the manual.
Purging is rarely a big issue: a cross-wiki template is unlikely to contain something you absolutely need to push out right now. If the cache doesn't feel aggressive enough for you, set it to a week or a month and see if anything goes wrong. Ilmari Karonen suggests a cache that long even for the HTML output, after all.
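If you do try it, the relevant LocalSettings.php lines on the consuming wikis would look something like the following; the month-long values simply mirror the figures discussed in the question rather than any recommendation from the manual:

# LocalSettings.php on the wikis that pull in the shared meta-templates
$wgEnableScaryTranscluding = true;            // allow transclusion of templates from another wiki
$wgTranscludeCacheExpiry   = 30 * 24 * 3600;  // keep fetched cross-wiki template text for about a month
$wgParserCacheExpireTime   = 30 * 24 * 3600;  // keep rendered pages in the parser cache for about a month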

Caching snippets (MODX)

I was simply cruising through the MODX options and I noticed the option to cache snippets. I was wondering what kind of effect (downsides) this would have on my site. I know that caching improves the loading time of the site by keeping pages 'cached' after the first time and then only reloading the updates, but this all seems too good to be true. My question is simple: are there any downsides to caching snippets? Cheers, Marco.
Great question!
The first rule of MODX is: (almost) always cache. They've said so on their own blog.
As you said, the loading time will be lower. Let's get the basics down first. When you choose to cache a page, the page with all its output is stored as a file in your cache folder. If you have a small and simple site, you might not see much difference between caching and not caching, but if you have a complex one with lots of chunks-in-chunks, snippets parsing chunks and so on, the difference is enormous. Some of the websites I've made go down 15-30 levels to parse the content in some sections. Loading all this fresh from the database can take up to a couple of seconds, while loading a flat file takes only a few microseconds. There is a HUGE difference (remember that).
Now, you can cache both snippets and chunks; that's important to remember. You can also cache one chunk while leaving the next level uncached. Using MODX's brilliant markup, you can choose what to cache and what not to, but in general you want as much as possible cached.
You ask about the downsides. There are none as such, but there are a few cases where you can't use cached snippets/chunks. As mentioned earlier, the cached response is stored per page. That means that if you have a page (or URL, or whatever you want to call it) where you display different content based on, for example, GET parameters, you can't cache it: you can't cache a search result (because the content changes), or a page with pagination (?page=1, ?page=2 and so on would produce different output on the same page). Another case is when a snippet's output is random/different every time. Say you put a random quote in your header; this needs to be uncached, or you will just see the first random result every time. In all other cases, use caching.
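For instance (the snippet and chunk names here are placeholders, not MODX built-ins), a search results snippet would be called uncached while the surrounding layout chunk stays cached:

[[!SearchResults]]
[[$siteHeader]]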
Also remember that every time you save a change in the manager, the cache is wiped. That means that if you, for example, display the latest news articles on your front page, this can still be cached, because it will not display different content until you add or edit a resource, and at that point the cache is cleared anyway.
To sum it all up: caching is GREAT and you should use it as much as possible. I usually make all my snippets/chunks cached, and if I run into problems, that is the first thing I check.
Using caching makes your web server respond more quickly (good for the user) and produces fewer queries to the database (good for you). All in all, caching is a gift. Use it.
There are no downsides to caching, and honestly I wonder what made you think there were?
You should always cache everything you can - there's no point in having something be executed on every page load when it's exactly the same as before. By caching the output and the source, you bypass the need for processing time and improve performance.
Assuming MODX Revolution (2.x), all template tags you use can be called both cached and uncached.
Cached:
[[*pagetitle]]
[[snippet]]
[[$chunk]]
[[+placeholder]]
[[%lexicon]]
Uncached:
[[!*pagetitle]] - this is pointless
[[!snippet]]
[[!$chunk]]
[[!+placeholder]]
[[!%lexicon]]
In MODX Evolution (1.x) the tags are different and you don't have as much control.
Some time ago I wrote about caching in MODX Revolution on my blog and I strongly encourage you to check it out as it provides more insight into why and how to use caching effectively: https://www.markhamstra.com/modx/2011/10/caching-guidelines-for-modx-revolution/
(PS: If you have MODX specific questions, I'd suggest posting them on forums.modx.com - there's a larger MODX audience there that can help)

Apache Redirects/Rewrite Maximum

I have a migration project from a legacy system to a new system. The move to the new system will create new unique IDs for the objects being migrated; however, my users and search indexes will have URLs with the old IDs. I would like to set up an Apache redirect or rewrite to handle this, but I am concerned about performance with that large a number of objects (I expect to have approximately 500K old-ID-to-new-ID mappings).
Has anyone implemented this at this scale? Or does anyone know whether Apache can stand up to a redirect mapping this big?
If you have a fixed set of mappings, you should give a mod_rewrite rewrite map of the Hash File type a try.
I had the very same question recently. As I found no practical answer, we implemented a .htaccess file with 6 rules, of which 3 had 200,000 conditions.
That means a .htaccess file with a size of 150 MB. It was actually fine for half a day, while no one was using this particular website, even though page load times were in the seconds. The next day, however, our whole server got hammered, with load averages well above 400 (the machine is 8 cores, 16 GB RAM, SAS RAID5, so usually no problem with resources).
If you need to implement anything like this, I suggest you design your rules so they don't need conditions, and put them in a DBM rewrite map. This easily solved the performance issues for us.
http://httpd.apache.org/docs/current/rewrite/rewritemap.html#dbm
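To make that concrete, here is a sketch of a DBM rewrite map; the map name, file paths, and URL pattern are assumptions for illustration, not the original setup. The plain-text source is one old-ID/new-URL pair per line and is converted once with httxt2dbm, which ships with Apache:

# old_to_new.txt contains one "old-id new-url" pair per line, e.g.
#   12345 /items/9876
# Convert it once to a DBM hash file:
#   httxt2dbm -i old_to_new.txt -o old_to_new.map

# In the server or virtual host config (RewriteMap is not allowed in .htaccess):
RewriteEngine On
RewriteMap legacyids "dbm:/etc/apache2/old_to_new.map"
# Look the old id up in the map and fall back to /not-found if it is missing.
RewriteRule ^/legacy/(\d+)$ ${legacyids:$1|/not-found} [R=301,L]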
Can you phrase the rewrites using a smaller number of rules? Is there a pattern which links the old URLs to the new ones?
If not, I'd be concerned about Apache with 500K+ rewrite mappings; that's just way past its comfort zone. Still, it might surprise you.
It sounds to me like you need to write a database-backed application just to handle the redirects, with the mapping itself stored in the database. That would scale much better.
I see this is an old topic, but did you ever find a solution?
I have a case where the developers are redirecting more than 30,000 URLs using RedirectMatch rules in a .htaccess file.
I am concerned about performance and management errors given the size of this file.
What I recommended is that, since all of the old URLs have the form:
/sub/####
they move the mapping into the database and create
/sub/index.php
then redirect all requests for:
www.domain.com/sub/###
to
www.domain.com/sub/index.php
and have index.php send the redirect, since the new URL for each old ID can be looked up in the database.
This way, only HTTP requests for the old URLs hit the rewrite process, instead of every single HTTP request.
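A minimal sketch of what that index.php could look like follows; the table name, column names, database credentials, and the old_id query parameter are all placeholders, and the rewrite that feeds the script is assumed, not taken from the original site:

<?php
// /sub/index.php: look up the new URL for an old id and redirect to it.
// Assumes a rewrite such as: RewriteRule ^/sub/(\d+)$ /sub/index.php?old_id=$1 [L]
$oldId = isset($_GET['old_id']) ? (int) $_GET['old_id'] : 0;

$pdo  = new PDO('mysql:host=localhost;dbname=site', 'dbuser', 'dbpass');
$stmt = $pdo->prepare('SELECT new_url FROM redirect_map WHERE old_id = ?');
$stmt->execute([$oldId]);
$newUrl = $stmt->fetchColumn();

if ($newUrl) {
    // Permanent redirect so users and search engines pick up the new URL.
    header('Location: ' . $newUrl, true, 301);
} else {
    http_response_code(404);
}
exit;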

Any negative impacts when using Mod-Rewrite?

I know there are a lot of positive things mod_rewrite accomplishes. But are there any negatives? Obviously, if you have poorly written rules you're going to have problems. But what if you have a high-volume site and you're constantly using mod_rewrite: is it going to have a significant impact on performance? I did a quick search for benchmarks on Google and didn't find much.
I've used mod_rewrite on sites that get millions of hits per month without any significant performance issues. You do have to know which rewrites get applied first, which depends on the order of your rules.
Using mod_rewrite is most likely faster than parsing the URL with your current language.
If you are really worried about performance, don't use .htaccess files; they are slow. Put all your rewrite rules in your Apache config, which is read only once, at startup. .htaccess files get re-parsed on every request, along with every .htaccess file in parent folders.
To echo what Ryan says above, rules in a .htaccess file can really hurt your load times on a busy site compared to having the rules in your config file. We initially tried this (~60 million pages/month) but it didn't last very long before our servers started smoking :)
The obvious downside to having the rules in your config is that you have to reload the config whenever you modify your rules.
The last flag ("L") is useful for speeding up execution of your rules, provided your more frequently accessed rules are towards the top and assessed first. It can make maintenance much trickier if you have a long set of rules, though: I wasted a couple of very frustrating hours one morning because I was editing mid-way down my list of rules and had one up the top that was trapping more than intended!
We had difficulty finding relevant benchmarks also, and ended up working out our own internal suite of tests. Once we got our rules sorted out, properly ordered and into our Apache conf, we didn't find much of a negative performance impact.
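As an illustration of that ordering (these rules are made up, not the poster's), putting the busiest pattern first and ending each rule with [L] means the common case stops after a single match:

RewriteEngine On
# Homepage gets the most traffic, so it is checked first and stops with [L].
RewriteRule ^/$ /index.php [L]
# Next most common pattern.
RewriteRule ^/blog/([^/]+)$ /blog.php?slug=$1 [L]
# Catch-all, checked last; the condition keeps already-rewritten .php URLs from looping.
RewriteCond %{REQUEST_URI} !\.php$
RewriteRule ^/(.+)$ /index.php?path=$1 [L]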
If you're worried about Apache's performance, one thing to consider if you have a lot of rewrite rules is the "skip" flag. It is a way to skip matching on a block of rules, so whatever overhead would have been spent on matching them is saved.
Be careful though: I was on a project which used the "skip" flag a lot, and it made maintenance painful, since it depends on the order in which things are written in the file.
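For anyone unfamiliar with it, the S flag tells mod_rewrite to jump over the next N rules when the current pattern matches; a made-up illustration:

# If the request is not under /legacy/, skip the three legacy-only rules below.
RewriteRule !^/legacy/ - [S=3]
RewriteRule ^/legacy/a$ /new/a [R=301,L]
RewriteRule ^/legacy/b$ /new/b [R=301,L]
RewriteRule ^/legacy/c$ /new/c [R=301,L]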
