Regular Expression for rewrite rule - mod-rewrite

I'm trying to integrate an open-source forum into my WordPress installation. I can figure out the next steps if I can just get a rewrite rule to work. I have the following so far:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^forum/qa\-theme/(.*) forum-embed/qa-theme/$1 [QSA,L]
RewriteRule ^forum/qa\-content/(.*) forum-embed/qa-content/$1 [QSA,L]
RewriteRule ^forum/([\w]+)$ forum/?url=$1 [QSA,L]
</IfModule>
The first two rules work, but the last one doesn't. I've tried all sorts of changes to this regular expression: I want to take whatever comes after forum/ and put it into a query string as the url parameter. I'm sure I'm tiptoeing around the expression; what am I missing?
Thanks in advance!
EDIT
It's also not clear how you are avoiding conflicts with the WordPress front-controller. Presumably you are placing these directives at the top of the .htaccess file, before the # BEGIN WordPress section? However, it may be simpler to create another .htaccess file inside the /forum subdirectory instead, and this will (by default) override the WordPress directives.
A sound point. Yes, I was putting them above the # BEGIN WordPress section, but I will create a .htaccess file in the forum directory.
You say you've "tried all sorts of changes to this regular expression", but this regex certainly won't match your first example. The \w shorthand character class excludes slashes and hyphens.
True, that was a poor example of how far I'd got, but I've also tried:
^forum/(.+)$
^forum/([a-z-A-Z-0-9-/]+)$
/forum/ is presumably a filesystem directory - this itself can't handle the request, it requires further rewriting to an actual file
I don't understand -- the first two rules work, and I can navigate to all pages, including forum/ -- index.php is the default file in the config, why must this rule be an exception?

RewriteRule ^forum/([\w]+)$ forum/?url=$1 [QSA,L]
Example 1: forum/2/test-question => forum/?url=2/test-question
You say you've "tried all sorts of changes to this regular expression", but this regex certainly won't match your first example. The \w shorthand character class excludes slashes and hyphens. If you want to match "whatever comes after forum/", you could just use (.+) (as in your previous examples, but with + instead of * to avoid a rewrite loop, i.e. to avoid matching /forum/ itself). For example:
RewriteRule ^forum/(.+) forum/?url=$1 [QSA,L]
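To see why the original pattern never fires while (.+) does, here is a small sketch that runs the patterns through Python's re module, which behaves like Apache's PCRE for these simple expressions. It assumes a root .htaccess, where RewriteRule sees the URL-path without its leading slash; the helper name capture() is just for illustration.

```python
import re
from typing import Optional

# Patterns as they would appear in a root .htaccess (no leading slash).
PATTERNS = {
    "original": r"^forum/([\w]+)$",
    "suggested": r"^forum/(.+)",
    "star": r"^forum/(.*)",
}


def capture(pattern: str, url_path: str) -> Optional[str]:
    """Return what $1 would hold, or None if the rule would not fire."""
    m = re.search(pattern, url_path)
    return m.group(1) if m else None


for label, pattern in PATTERNS.items():
    for path in ("forum/2/test-question", "forum/"):
        print(f"{label:10} {path:24} -> {capture(pattern, path)!r}")
```

The \w class stops at the first slash or hyphen, so the anchored original pattern fails on forum/2/test-question. (.*) also matches the bare forum/ with an empty capture, which is the loop risk mentioned above, whereas (.+) does not.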
However, forum/?url=whatever is still not a valid end-point (as @RavinderSingh13 has pointed out in comments). /forum/ is presumably a filesystem directory; this itself can't handle the request, and it requires further rewriting to an actual file (perhaps you are expecting mod_dir to issue a subrequest for the DirectoryIndex?). For example, should it be /forum/index.php?url=whatever?
It's also not clear how you are avoiding conflicts with the WordPress front-controller. Presumably you are placing these directives at the top of the .htaccess file, before the # BEGIN WordPress section? However, it may be simpler to create another .htaccess file inside the /forum subdirectory instead, and this will (by default) override the WordPress directives.
You should remove the <IfModule> wrapper since it's not required here.
UPDATE:
/forum/ is presumably a filesystem directory - this itself can't handle the request, it requires further rewriting to an actual file
I don't understand -- the first two rules work, and I can navigate to all pages, including forum/ -- index.php is the default file in the config, why must this rule be an exception?
We don't know what requests the first two rules are expected to handle, but I assume they are just rewriting static files?
When you request the directory /forum/, mod_dir must later issue a subrequest for the DirectoryIndex document. When you rewrite the request to /forum/, mod_dir must still perform this additional processing later. In the meantime, rewrite processing loops in .htaccess and /forum/ is passed back through the rewrite engine. This may or may not work; it can result in other conflicts and, at the very least, it is additional, unnecessary processing. You should rewrite directly to the file that handles the request to cut out this extra step, in the same way that the WordPress code block rewrites the request to /index.php, not /.
To clarify, when you request only /forum/, the above directive is not triggered and mod_dir issues a subrequest for /forum/index.php, so there is no url parameter.
Updated directives
However, if rewriting to /forum/index.php, you'll need additional checks to avoid /forum/index.php being caught by the same rule and resulting in a rewrite loop (500 error).
For example, try the following instead:
RewriteRule ^forum/index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^forum/(.+) forum/index.php?url=$1 [QSA,L]
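Why the first rule is needed can be sketched by applying the rule set once to the original request and once more to the result, since .htaccess processing starts over after an internal rewrite. This is a simplified Python model, not mod_rewrite itself: FILES is a hypothetical list of files on disk standing in for the !-f test, every rule is treated as having [L], and the query string is dropped between passes.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical files that exist on disk (stand-in for the !-f test).
FILES = {"forum/index.php", "forum/qa-theme/style.css"}

# (pattern, substitution or None for "-", only_if_not_a_file); all rules have [L].
Rule = Tuple[str, Optional[str], bool]

WITHOUT_EXCEPTION: List[Rule] = [
    (r"^forum/(.+)", "forum/index.php?url=$1", False),
]
WITH_EXCEPTION: List[Rule] = [
    (r"^forum/index\.php$", None, False),              # RewriteRule ^forum/index\.php$ - [L]
    (r"^forum/(.+)", "forum/index.php?url=$1", True),  # plus RewriteCond ... !-f
]


def one_pass(rules: List[Rule], path: str) -> str:
    """Apply the rules once; the first matching rule ends the pass because of [L]."""
    for pattern, substitution, only_if_not_a_file in rules:
        if only_if_not_a_file and path in FILES:
            continue  # %{REQUEST_FILENAME} !-f fails for an existing file
        m = re.search(pattern, path)
        if m is None:
            continue
        if substitution is None:
            return path  # "-" means no substitution
        return substitution.replace("$1", m.group(1))
    return path


for name, rules in (("without exception", WITHOUT_EXCEPTION), ("with exception", WITH_EXCEPTION)):
    first = one_pass(rules, "forum/2/test-question")
    # Processing restarts with the rewritten path (query string ignored in this model).
    second = one_pass(rules, first.split("?", 1)[0])
    print(f"{name}: {first}  then  {second}")
```

Without the exception, the second pass captures index.php and rewrites the target again; with it, forum/index.php passes through untouched. Since forum/index.php is itself a real file, the !-f condition alone would also protect it in this model; the explicit rule simply avoids relying on that.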
The condition that checks against REQUEST_FILENAME may be optional, depending on whether any static resources are served from this directory tree.
Alternatively, if your URLs never contain dots, you may get away with a more restrictive regex that simply doesn't match URLs containing dots (such as /forum/index.php itself). For example:
RewriteRule ^forum/([^.]+)$ forum/index.php?url=$1 [QSA,L]
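A quick check of the more restrictive pattern, again using Python's re as a stand-in for PCRE and assuming forum URLs never contain a dot. Because the rewrite target index.php contains a dot, it cannot be matched a second time, and static files with an extension are left alone too.

```python
import re

NO_DOTS = re.compile(r"^forum/([^.]+)$")  # inside [...] the dot is literal

for path in ("forum/2/test-question", "forum/index.php", "forum/qa-theme/style.css"):
    m = NO_DOTS.search(path)
    result = f"forum/index.php?url={m.group(1)}" if m else "left alone"
    print(f"{path} -> {result}")
```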
/forum/.htaccess
If moving these directives to the /forum/.htaccess file you would rewrite them as follows (and remove the RewriteBase directive entirely):
RewriteEngine On
RewriteRule ^qa-theme/(.*) /forum-embed/qa-theme/$1 [L]
RewriteRule ^qa-content/(.*) /forum-embed/qa-content/$1 [L]
RewriteRule ^([^.]+)$ index.php?url=$1 [QSA,L]
The QSA flag is not required on the first two directives since the query string is passed through by default. (Although if these are rewriting static resources then you wouldn't expect a query string to be passed anyway?)
No need to backslash-escape the hyphen in the regex, since it carries no special meaning when used outside of a character class. Likewise, the dot carries no special meaning when used inside a character class so does not need to be backslash-escaped in the last rule above.
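The sketch below models how the /forum/.htaccess version sees requests: in a per-directory context mod_rewrite strips the directory prefix (here forum/) before matching, which is why the patterns no longer mention forum/. It is a simplified Python model under stated assumptions: the helper name is hypothetical, only the first matching rule is applied (all three have [L]), and a relative substitution is shown with /forum/ put back in front, which holds when the URL-path maps directly onto the forum directory. It also exercises the unescaped hyphen and the unescaped dot inside [^.].

```python
import re
from typing import Optional

PREFIX = "/forum/"

# The three rules from /forum/.htaccess; each has [L], so the first match wins.
FORUM_RULES = [
    (r"^qa-theme/(.*)", "/forum-embed/qa-theme/$1"),    # "-" needs no escaping here
    (r"^qa-content/(.*)", "/forum-embed/qa-content/$1"),
    (r"^([^.]+)$", "index.php?url=$1"),                  # "." is literal inside [...]
]


def rewrite_in_forum_dir(url_path: str) -> Optional[str]:
    """Return the rewritten URL for a request under /forum/, or None if no rule fires."""
    if not url_path.startswith(PREFIX):
        return None  # this .htaccess is never consulted for such requests
    relative = url_path[len(PREFIX):]  # what RewriteRule actually sees
    for pattern, substitution in FORUM_RULES:
        m = re.search(pattern, relative)
        if m:
            target = substitution.replace("$1", m.group(1))
            # A relative substitution stays inside the /forum/ directory.
            return target if target.startswith("/") else PREFIX + target
    return None


for request in ("/forum/qa-theme/style.css", "/forum/2/test-question",
                "/forum/index.php", "/forum/"):
    print(request, "->", rewrite_in_forum_dir(request))
```

A bare /forum/ matches nothing (the relative path is empty), so mod_dir serves index.php without a url parameter, as described in the update above.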

Related

Is mod_rewrite rewrite rule match depending on position in substitution?

I'm trying to echo the current state of a URL being rewritten in .htaccess if the query string contains the phrase DEBUG:
RewriteCond %{QUERY_STRING} DEBUG
RewriteRule .+ echo.php?ip=%{REMOTE_ADDR}&url=$0&query=%{QUERY_STRING} [L]
and my echo.php script echoes the expected URL.
But strangely, when I change the order of the parameters in the substitution to:
RewriteRule .+ echo.php?url=$0&ip=%{REMOTE_ADDR}&query=%{QUERY_STRING} [L]
the echoed url is "echo.php" itself.
Is this expected behavior, and if so why?
When I test, I don't see the difference you describe: with the rule either way round, I get echo.php as the URL when visiting example?DEBUG. This is because rules in .htaccess keep looping (after a rewrite, the whole per-directory processing starts again, including the rewrite rules) until the URL no longer changes. Since you are matching .+, you will always end up with echo.php unless some earlier rule gets in the way and ends processing for that iteration before reaching your rule.
You can prove this to yourself by adding the [END] flag to the rule which stops any further mod_rewrite processing, and then you will get the behaviour you are expecting. The difference you are seeing must be due to your prior rules in some way.
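To make the looping argument concrete, here is a simplified Python model (an assumption-laden sketch, not mod_rewrite's actual code): the rule is re-run until the path stops changing, end_flag stands in for [END], and the ip parameter is omitted because it plays no part in the argument. The function records the $0 value written into url= on each pass.

```python
import re
from typing import List


def captured_dollar_zero(path: str, query: str, end_flag: bool = False) -> List[str]:
    """Return the $0 written into url= on each pass that rewrote the request."""
    captured: List[str] = []
    for _ in range(10):  # stand-in for the internal recursion limit
        m = re.search(r".+", path)
        if "DEBUG" not in query or m is None:
            break  # RewriteCond or RewriteRule pattern fails
        captured.append(m.group(0))
        # Substitution echo.php?url=$0&query=%{QUERY_STRING}; DEBUG survives in the query
        new_path, query = "echo.php", f"url={m.group(0)}&query={query}"
        if end_flag or new_path == path:
            break  # [END], or the path is unchanged so the looping settles
        path = new_path
    return captured


print("looping:", captured_dollar_zero("example", "DEBUG"))
print("[END]:  ", captured_dollar_zero("example", "DEBUG", end_flag=True))
```

In the looping case the last pass matches echo.php itself, so the final substitution writes url=echo.php; with [END] only the first pass runs and url=example survives.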
As mentioned in the comments, a better way to debug mod_rewrite is to use the LogLevel rewrite:traceX option, with X between 1 and 8. Level 3 is a good place to start; increase it if you don't get enough information, though the amount of output grows considerably towards 8. It can only be enabled in the server config, not in .htaccess. See the documentation. Make sure to switch it off again afterwards, as all that logging affects performance. It does not interfere with any earlier LogLevel directives.

Rewriting a URL to become a query string

I'm trying to rewrite URLs such as
/product/16/var1/value1/var2/value2...
to this
index.php?page=product&id=16&var1=value1&var2=value2...
In other words, I would like to have a "main parameter" translated to an id (and I can do this), but I would also like to have, from that point on, pairs of "directories" translated recursively into key-value pairs.
Is this possible with Apache mod_rewrite?
In the absence of the [L] flag, any mod_rewrite rule will apply repeatedly to any URI which corresponds to the rule's rewrite conditions and pattern.
Knowing this, we can build a mod_rewrite rule which looks for any URIs with query strings beginning in a certain way and then repeatedly harvests the folder-names of that URI (two at a time) to build the rest of the query string.
See example below:
In the root folder of http://example.com/, save an .htaccess file with the following mod_rewrite directives:
RewriteEngine On
RewriteRule ^(product)/([0-9]{2})/(.*) http://%{HTTP_HOST}/$3/index.php?page=$1&id=$2
RewriteCond %{QUERY_STRING} ^(page=product&id=[0-9]{2}.*)
RewriteRule ^([^/]+)/([^/]+)/(.*/)?index.php$ http://%{HTTP_HOST}/$3index.php?%1&$1=$2
Using the above:
http://example.com/product/16/var1/value1/var2/value2/
becomes
http://example.com/index.php?page=product&id=16&var1=value1&var2=value2
and
http://example.com/product/16/var1/value1/var2/value2/var3/value3/var4/value4/
becomes
http://example.com/index.php?page=product&id=16&var1=value1&var2=value2&var3=value3&var4=value4
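The two-at-a-time harvesting can be traced with a small Python model. It applies whichever rule fires, feeds the result back in, and stops when neither rule matches; it deliberately ignores whether a given step happens as an external redirect or an internal pass. It also assumes the doubled slash that /$3/index.php produces for a URL ending in a slash gets collapsed. Note that [0-9]{2} only accepts two-digit ids.

```python
import re
from typing import Optional, Tuple

RULE_1 = re.compile(r"^(product)/([0-9]{2})/(.*)")
COND_2 = re.compile(r"^(page=product&id=[0-9]{2}.*)")
RULE_2 = re.compile(r"^([^/]+)/([^/]+)/(.*/)?index.php$")


def next_step(path: str, query: str) -> Optional[Tuple[str, str]]:
    """Return the next (path, query), or None when neither rule fires.

    `path` is the URL-path as RewriteRule sees it in a root .htaccess (no leading slash).
    """
    m = RULE_1.search(path)
    if m:
        # http://%{HTTP_HOST}/$3/index.php?page=$1&id=$2
        target = f"{m.group(3)}/index.php?page={m.group(1)}&id={m.group(2)}"
    else:
        c = COND_2.search(query)
        m = RULE_2.search(path) if c else None
        if m is None:
            return None
        # http://%{HTTP_HOST}/$3index.php?%1&$1=$2 (an unmatched $3 is empty)
        target = f"{m.group(3) or ''}index.php?{c.group(1)}&{m.group(1)}={m.group(2)}"
    new_path, _, new_query = target.partition("?")
    # Assumption: the "//" produced when $3 already ends in "/" is collapsed.
    return re.sub(r"/{2,}", "/", new_path), new_query


def follow(path: str, query: str = "") -> Tuple[str, str]:
    """Feed each result back through the rules until neither rule fires."""
    for _ in range(50):  # safety limit for the sketch
        step = next_step(path, query)
        if step is None:
            return path, query
        path, query = step
    raise RuntimeError("no stable URL reached")


print(follow("product/16/var1/value1/var2/value2/"))
```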

mod-rewrite rules executing out of order

Okay, so I have two separate mod-rewrite rules in my vhost block. The first rule redirects a customer offsite if they come in through an affiliate URL such as example.com/1234.html, and the second rule forces the URL to always contain www, as in www.example.com.
# Affiliate Links
RewriteRule ^([0-9]+)\.html$ http://affiliates.example.com/log.php?id=$1 [R=302,L]
# Ensure we are always on www dot
RewriteCond %{HTTP_HOST} ^example\.loc [NC]
RewriteRule (.*) http://www.example.com$1 [R=301,L]
The rules themselves work great. The problem is that if the first rule applies I want it to immediately redirect, however it seems as if the second rule is hoisted to the top because it always takes precedence. What do I need to change so that these execute in order?
You've stated that this is in a vhost block. In that context (as opposed to, for example, an .htaccess file), the URL-path that RewriteRule matches always starts with '/'.
Thus
RewriteRule ^([0-9]+)\.html$ http://affiliates.example.com/log.php?id=$1 [R=302,L]
should instead be
RewriteRule ^/([0-9]+)\.html$ http://affiliates.example.com/log.php?id=$1 [R=302,L]
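(i.e. with the leading slash), otherwise it will never match anything. Because the affiliate rule never matches, the www rule is the only one that ever fires, which is why it appears to take precedence.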
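A tiny Python check of the leading-slash point, assuming (as the answer states) that in a vhost block the URL-path passed to RewriteRule begins with /, while in .htaccess it does not. Python's re stands in for PCRE, and affiliate_target() is a hypothetical helper.

```python
import re
from typing import Optional

WITHOUT_SLASH = re.compile(r"^([0-9]+)\.html$")
WITH_SLASH = re.compile(r"^/([0-9]+)\.html$")


def affiliate_target(pattern: re.Pattern, url_path: str) -> Optional[str]:
    """Return the affiliate redirect URL, or None if the rule does not match."""
    m = pattern.search(url_path)
    return f"http://affiliates.example.com/log.php?id={m.group(1)}" if m else None


print(affiliate_target(WITHOUT_SLASH, "/1234.html"))  # vhost-style path: no match
print(affiliate_target(WITH_SLASH, "/1234.html"))     # vhost-style path: redirect URL
print(affiliate_target(WITHOUT_SLASH, "1234.html"))   # .htaccess-style path: matches

# The www rule's (.*) captures the slash too, so no "/" is needed after the host.
print("http://www.example.com" + re.search(r"(.*)", "/1234.html").group(1))
```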

mod_rewrite is being ignored

I'm trying to transform "domain.com/index.php?site=food&category=beef" into "domain.com/food/beef", but it doesn't work, no matter what I try. The URL always stays as it was and I get no errors.
I think it's my fault: I've tried this for 3 different URLs on 3 different servers (and 3 different projects), and it just seems like I don't understand how mod_rewrite really works, even though I've read all the documentation on this topic I could find. I've even spent days here on SO without finding a solution.
Mod_rewrite is enabled on the server:
RewriteEngine On
RewriteRule ^ http://www.google.com [R,L]
gives me "http://www.google.com/?site=food&category=beef". It looks like mod_rewrite does not recognise the query string... So I tried several solutions with RewriteCond %{QUERY_STRING}... but nothing works :/
Hopefully you guys can help me! I'm going insane on this!
Thanks in advance!
Try:
RewriteEngine on
RewriteRule ^food/beef$ index.php?site=food&category=beef [L]
Or more generally:
RewriteEngine on
RewriteRule ^([a-zA-Z0-9_-]+)/([a-zA-Z0-9_-]+)$ index.php?site=$1&category=$2 [L]
Are you trying to do something like this?
RewriteRule ^([^/]+)/([^/]+)/? /index.php?site=$1&category=$2 [L]
This will make it so when you go to http://domain.com/food/beef the request gets rewritten to "/index.php?site=food&category=beef" internally and index.php is used to serve the original request. The browser's location bar will still say "http://domain.com/food/beef".
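A sketch of what the rule above captures, with Python's re standing in for PCRE and the path given as RewriteRule sees it in a root .htaccess (no leading slash). The helper name internal_target() is hypothetical.

```python
import re
from typing import Optional

CLEAN_URL = re.compile(r"^([^/]+)/([^/]+)/?")


def internal_target(path: str) -> Optional[str]:
    """What RewriteRule ^([^/]+)/([^/]+)/? /index.php?site=$1&category=$2 produces."""
    m = CLEAN_URL.search(path)
    return f"/index.php?site={m.group(1)}&category={m.group(2)}" if m else None


for path in ("food/beef", "food/beef/", "food/beef/extra", "index.php"):
    print(f"{path:16} -> {internal_target(path)}")
```

Because the pattern is not anchored with $, a deeper path such as food/beef/extra also matches, with the remainder ignored; index.php has no slash, so the rule does not touch it.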
If you want the location bar to say http://domain.com/index.php?site=food&category=beef, then add "R," to the "[L]". If it's the other way round, and you want a request for http://domain.com/index.php?site=food&category=beef entered in the location bar to be rewritten internally to "/food/beef" on the server, then you need to parse out the query string using RewriteCond:
RewriteCond %{QUERY_STRING} ^site=([^&]+)&category=([^&]+)
RewriteRule ^index.php /%1/%2? [L]
The same applies to "R" causing a browser redirect, as in the first example: if you want the location bar to change to http://domain.com/food/beef, the flags should look like [L,R]. Note that you need a ? at the end of the target in the rule, so that the original query string isn't appended. That is why the query string is appended in your Google example.
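The effect of the trailing ? can be modelled in a few lines of Python. apply_substitution() is a hypothetical, simplified sketch of mod_rewrite's default query-string handling without the QSA flag: a substitution with no ? passes the original query string through, while any ? in the substitution, even with nothing after it, replaces it.

```python
import re

QUERY_COND = re.compile(r"^site=([^&]+)&category=([^&]+)")


def apply_substitution(substitution: str, original_query: str) -> str:
    """Simplified query-string handling for a rule without [QSA]."""
    if "?" in substitution:
        path, _, new_query = substitution.partition("?")
        return f"{path}?{new_query}" if new_query else path  # original dropped
    return f"{substitution}?{original_query}" if original_query else substitution


query = "site=food&category=beef"
c = QUERY_COND.search(query)  # %1 = food, %2 = beef
print(apply_substitution(f"/{c.group(1)}/{c.group(2)}?", query))  # trailing "?"
print(apply_substitution(f"/{c.group(1)}/{c.group(2)}", query))   # no "?"
print(apply_substitution("http://www.google.com", query))        # the Google test
```

The last line mirrors the Google test in the question, where the original query string reappeared on the redirect target.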
EDIT:
Seeing as you just wanted to change what's in the browser's location bar and not where the content is:
You need to re-rewrite what the 2nd rule above has rewritten BACK to index.php, but without a redirect. To keep the two rules from looping indefinitely (each one rewrites to what the other matches), you need to add a marker somewhere that stops the 2nd rule above from redirecting you over and over again.
So combining the two, you'll have this:
RewriteRule ^([^/]+)/([^/]+)/? /index.php?site=$1&category=$2&redirected [L]
RewriteCond %{QUERY_STRING} !redirected
RewriteCond %{QUERY_STRING} ^site=([^&]+)&category=([^&]+)
RewriteRule ^index.php /%1/%2? [L,R=301]
Note the redirected parameter in the query string. It gets inserted when someone accesses the clean version of the URL, e.g. "/food/beef". Internally, the request is routed to index.php, but since the rule doesn't have an "R", the browser's location bar doesn't change.
The second rule now checks if the request contains the redirected param in the query string. If it doesn't, that means someone entered in their browser's location bar the index.php url, so redirect the browser to the clean version.
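The role of the redirected marker can be checked with a simplified Python model of the two rules plus a browser that follows redirects. Paths are written as RewriteRule sees them in a root .htaccess; use_marker=False shows what happens if the marker is left out of the first rule (so the !redirected check always passes). The function names and the redirect limit are arbitrary choices for the sketch.

```python
import re
from typing import Tuple

CLEAN = re.compile(r"^([^/]+)/([^/]+)/?")
PARAMS = re.compile(r"^site=([^&]+)&category=([^&]+)")


def handle(path: str, query: str, use_marker: bool = True) -> Tuple[str, str, str]:
    """Return ("redirect", location, "") or ("serve", file, query) for one request."""
    for _ in range(10):  # internal passes within a single request
        m = CLEAN.search(path)
        if m:
            # Rule 1: internal rewrite to /index.php, optionally adding the marker
            query = f"site={m.group(1)}&category={m.group(2)}"
            if use_marker:
                query += "&redirected"
            path = "index.php"
            continue  # .htaccess processing restarts with the new URL
        c = PARAMS.search(query)
        if c and "redirected" not in query and re.search(r"^index.php", path):
            # Rule 2: external 301 to the clean URL; the trailing "?" drops the query
            return "redirect", f"/{c.group(1)}/{c.group(2)}", ""
        return "serve", path, query
    raise RuntimeError("internal rewrite loop")


def browse(path: str, query: str = "", use_marker: bool = True,
           max_redirects: int = 5) -> Tuple[str, str]:
    """Follow redirects like a browser would, up to an arbitrary limit."""
    for _ in range(max_redirects + 1):
        kind, where, new_query = handle(path, query, use_marker)
        if kind == "serve":
            return where, new_query
        path, query = where.lstrip("/"), new_query
    raise RuntimeError("too many redirects")


print(browse("index.php", "site=food&category=beef"))
try:
    browse("index.php", "site=food&category=beef", use_marker=False)
except RuntimeError as err:
    print("without marker:", err)
```

With the marker, the browser is sent once to /food/beef and index.php then serves the request; without it, every internal rewrite back to index.php triggers the redirect again.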

Isapi Rewrite: from www to non-www and at the same time from http to https

I have tried everything!
this:
RewriteCond %{HTTP_HOST} ^www\.(.*)
RewriteRule ^.*$ https://%1/$1 [R=301,L]
works only if I don't put http at the beginning.
How do I make this work:
if the request is http, redirect to https
if there is www, redirect to non-www
and of course both at the same time
http://www.domain.com -> https://domain.com
www.domain.com --> https://domain.com
http://domain.com --> https://domain.com
including any subfolders and the query string!
I assume you also want
https://www.domain.com -> https://domain.com
Did you ever get this working? I'm having trouble getting a test https site going to double-check this.
In the meantime, I do see a couple things, so try this instead (this assumes isapi_rewrite v3, which it looks like you're using):
RewriteCond %{HTTP_HOST} ^www\.(.*)
RewriteRule ^(.*)$ https://%1$1 [NC,R=301]
This adds parentheses to the RewriteRule to capture the url for the $1.
Then the slash between the %1$1 isn't needed, since there's one at the start of the $1 capture.
I like to use NC for not case-sensitive, and the R rule is a final rule, so you don't need the L for last.
EDIT
I revisited this answer, to update/clarify a couple of secondary issues in my original answer above. These don't change the main solution above, which was to add the parentheses to do the capture.
Original:
Then the slash between the %1$1 isn't needed, since there's one at the start of the $1 capture.
This actually depends where the rules are, and whether there's a RewriteBase statement. Common shared-host configurations don't have the leading slash here in the rules, so the slash would be needed. If you're not sure, you can try with and without.
Original:
I like to use NC for not case-sensitive, and the R rule is a final rule, so you don't need the L for last.
It turns out it is useful to have L with R for performance, as in [NC,R=301,L]. I had actually asked about this at Helicon Tech a year before this question, but had forgotten it:
http://www.helicontech.com/forum/14826-Does_Redirect_need_Last.html
From their response:
... the reason for us to use [L] in 301 redirect rules is that redirect occurs not that immediately. Even though the rule is matched, further rules will be processed (run through), unless you have [L]....
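The slash question from the edit can be illustrated with a rough Python sketch. It assumes ISAPI_Rewrite v3 treats these rules like Apache mod_rewrite: the RewriteCond capture %1 holds the host without www., and $1 holds the path with or without a leading slash depending on where the rules live. Query-string handling is not modelled, and redirect_target() is a hypothetical helper.

```python
import re
from typing import Optional

HOST_COND = re.compile(r"^www\.(.*)")        # RewriteCond %{HTTP_HOST} ^www\.(.*)
RULE = re.compile(r"^(.*)$", re.IGNORECASE)  # RewriteRule ^(.*)$ ... [NC]


def redirect_target(host: str, path: str, add_slash: bool) -> Optional[str]:
    """Build https://%1$1 (or https://%1/$1 when add_slash is True)."""
    c = HOST_COND.search(host)
    m = RULE.search(path)
    if c is None or m is None:
        return None  # the condition fails, so the rule does not redirect
    return f"https://{c.group(1)}{'/' if add_slash else ''}{m.group(1)}"


# $1 starts with a slash: %1$1 is correct.
print(redirect_target("www.domain.com", "/forum/page", add_slash=False))
# $1 has no leading slash (the common shared-host case): %1/$1 is needed.
print(redirect_target("www.domain.com", "forum/page", add_slash=True))
# Without the slash, host and path run together.
print(redirect_target("www.domain.com", "forum/page", add_slash=False))
# A host without www. is not matched by this condition at all.
print(redirect_target("domain.com", "/forum/page", add_slash=False))
```

The last case shows that this condition on its own only covers www hosts; a plain http://domain.com request is not redirected by it.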

Resources