Rewriting a URL to become a query string - mod-rewrite

I'm trying to rewrite URLs such as
/product/16/var1/value1/var2/value2...
to this
index.php?page=product&id=16&var1=value1&var2=value2...
In other words, I would like to have a "main parameter" translated to an id (and I can do this), but I would also like every pair of "directories" from that point on to be translated recursively into a key-value pair.
Is this possible with Apache mod_rewrite?

In an .htaccess file, mod_rewrite passes each rewritten URL back through the rule set, so a rule keeps applying for as long as the URI matches its rewrite conditions and pattern.
Knowing this, we can build a mod_rewrite rule which looks for any URIs with query strings beginning in a certain way and then repeatedly harvests the folder-names of that URI (two at a time) to build the rest of the query string.
See example below:
In the root folder of
http://example.com/
save an .htaccess file with the following mod_rewrite directives:
RewriteEngine On
RewriteRule ^(product)/([0-9]{2})/(.*) http://%{HTTP_HOST}/$3/index.php?page=$1&id=$2
RewriteCond %{QUERY_STRING} ^(page=product&id=[0-9]{2}.*)
RewriteRule ^([^/]+)/([^/]+)/(.*/)?index.php$ http://%{HTTP_HOST}/$3index.php?%1&$1=$2
Using the above:
http://example.com/product/16/var1/value1/var2/value2/
becomes
http://example.com/index.php?page=product&id=16&var1=value1&var2=value2
and
http://example.com/product/16/var1/value1/var2/value2/var3/value3/var4/value4/
becomes
http://example.com/index.php?page=product&id=16&var1=value1&var2=value2&var3=value3&var4=value4
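The transformation these two rules are meant to perform can be modelled outside Apache. The following is only a rough Python sketch of the intended result, not of the mod_rewrite mechanism itself; the function name path_to_query is made up for illustration, and it assumes a two-digit id followed by complete name/value pairs, as in the rules above.

```python
import re

def path_to_query(path):
    """Model the rewrite: /product/16/k1/v1/k2/v2/ -> index.php?page=product&id=16&k1=v1&k2=v2."""
    # First rule: peel off the page name and the two-digit id.
    match = re.match(r"^/(product)/([0-9]{2})/(.*)$", path)
    if match is None:
        return None  # not a product URL; leave it alone
    page, item_id, rest = match.groups()
    query = f"page={page}&id={item_id}"
    # Second rule, applied repeatedly: take two path segments at a time.
    while True:
        pair = re.match(r"^([^/]+)/([^/]+)/?(.*)$", rest)
        if pair is None:
            break
        key, value, rest = pair.groups()
        query += f"&{key}={value}"
    return f"index.php?{query}"

print(path_to_query("/product/16/var1/value1/var2/value2/"))
```

Each pass of the loop plays the role of one more trip through the second RewriteRule.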


Regular Expression for rewrite rule

I'm trying to integrate an open source forum into my WordPress installation. I can figure out the next steps if I can just get a rewrite rule to work. I have the following so far:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^forum/qa\-theme/(.*) forum-embed/qa-theme/$1 [QSA,L]
RewriteRule ^forum/qa\-content/(.*) forum-embed/qa-content/$1 [QSA,L]
RewriteRule ^forum/([\w]+)$ forum/?url=$1 [QSA,L]
</IfModule>
The first two rules work, but the last one does not. I've tried all sorts of changes to this regular expression. I want to take whatever comes after forum/ and put it into a query string as the url parameter. I'm sure I'm tip-toeing around the expression. What am I missing?
Thanks in advance!
EDIT
> It's also not clear how you are avoiding conflicts with the WordPress front-controller? Presumably you are placing these directives at the top of the .htaccess file, before the # BEGIN WordPress section? However, it may be simpler to create another .htaccess file inside the /forum subdirectory instead and this will (by default) override the WordPress directives.
A sound point, yes I was putting it above the # BEGIN WordPress, but I will make a .htaccess in the forum directory.
> You say you've "tried all sorts of changes to this regular expression", but this regex certainly won't match your first example. The \w shorthand character class excludes slashes and hyphens.
True, this was a bad example to show where I was up to on my question, but I've also tried:
^forum/(.+)$
^forum/([a-z-A-Z-0-9-/]+)$
> /forum/ is presumably a filesystem directory - this itself can't handle the request, it requires further rewriting to an actual file
I don't understand -- the first two rules work, and I can navigate to all pages, including forum/ -- index.php is the default file in the config, why must this rule be an exception?
RewriteRule ^forum/([\w]+)$ forum/?url=$1 [QSA,L]
Example 1: forum/2/test-question => forum/?url=2/test-question
You say you've "tried all sorts of changes to this regular expression", but this regex certainly won't match your first example. The \w shorthand character class excludes slashes and hyphens. If you want to match "whatever comes after forum/", then you could just use (.+) (like your previous examples, except with + instead of * to avoid a rewrite loop, i.e. to avoid matching /forum/). For example:
RewriteRule ^forum/(.+) forum/?url=$1 [QSA,L]
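The difference between the two patterns can be checked quickly outside Apache. This is a small Python sketch; for these sample paths Python's re module treats \w, . and + the same way as Apache's regex engine. The first path comes from the example in the question, and forum/questions is a made-up path for comparison.

```python
import re

narrow = re.compile(r"^forum/([\w]+)$")   # pattern from the question
broad = re.compile(r"^forum/(.+)")        # pattern suggested above

for path in ["forum/2/test-question", "forum/questions", "forum/"]:
    n = narrow.match(path)
    b = broad.match(path)
    # Show what each pattern would capture as $1, or None if it does not match.
    print(path, "| narrow:", n.group(1) if n else None, "| broad:", b.group(1) if b else None)
```

Only the broad pattern captures 2/test-question, and neither pattern matches a bare forum/.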
However, forum/?url=whatever is still not a valid end-point (as @RavinderSingh13 has pointed out in comments). /forum/ is presumably a filesystem directory - this itself can't handle the request, it requires further rewriting to an actual file (perhaps you are expecting mod_dir to issue a subrequest for the DirectoryIndex?). For example, should it be /forum/index.php?url=whatever?
It's also not clear how you are avoiding conflicts with the WordPress front-controller? Presumably you are placing these directives at the top of the .htaccess file, before the # BEGIN WordPress section? However, it may be simpler to create another .htaccess file inside the /forum subdirectory instead and this will (by default) override the WordPress directives.
You should remove the <IfModule> wrapper since it's not required here.
UPDATE:
> /forum/ is presumably a filesystem directory - this itself can't handle the request, it requires further rewriting to an actual file
> I don't understand -- the first two rules work, and I can navigate to all pages, including forum/ -- index.php is the default file in the config, why must this rule be an exception?
We don't know what requests the first two rules are expected to handle, but I assume they are just rewriting static files?
When you request the directory /forum/, mod_dir must later issue a subrequest for the DirectoryIndex document. When you rewrite the request to /forum/, mod_dir must still perform this additional processing later. In the meantime, rewrite processing loops in .htaccess and /forum/ is passed back through the rewrite engine. This may or may not work; it can result in other conflicts, and at the very least it is additional, unnecessary processing. You should rewrite directly to the file that handles the request to cut out this additional processing, in the same way that the WordPress code block rewrites the request to /index.php, not /.
To clarify, when you request only /forum/, the above directive is not triggered and mod_dir issues a subrequest for /forum/index.php. There is no url parameter.
Updated directives
However, if rewriting to /forum/index.php, you'll need additional checks to avoid /forum/index.php being caught by the same rule and resulting in a rewrite loop (500 error).
For example, try the following instead:
RewriteRule ^forum/index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^forum/(.+) forum/index.php?url=$1 [QSA,L]
The condition that checks against REQUEST_FILENAME may be optional, depending on whether there are any static resources served from this directory tree?
Alternatively, if your URLs do not contain dots then you may get away with a more restrictive regex instead to avoid matching URLs containing dots. For example:
RewriteRule ^forum/([^.]+)$ forum/index.php?url=$1 [QSA,L]
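Why the dot-excluding pattern avoids the loop can be illustrated with a rough Python model. It only mimics the fact that, in .htaccess, the rewritten path is fed back through the same rule; the helper rewrite_until_stable and its pass limit are inventions for this sketch, not a description of Apache internals.

```python
import re

def rewrite_until_stable(path, pattern, max_passes=5):
    """Re-apply one 'forum' rule until it stops matching; return (final path, passes used)."""
    regex = re.compile(pattern)
    for passes in range(max_passes):
        if regex.match(path) is None:
            return path, passes
        # The substitution always targets the same file; only the query string differs.
        path = "forum/index.php"
    return path, max_passes  # still matching: the runaway case warned about above

print(rewrite_until_stable("forum/2/test-question", r"^forum/(.+)"))
print(rewrite_until_stable("forum/2/test-question", r"^forum/([^.]+)$"))
```

The unguarded pattern is still matching when the pass limit is reached, whereas the pattern that excludes dots stops after a single rewrite because index.php contains a dot.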
/forum/.htaccess
If moving these directives to the /forum/.htaccess file you would rewrite them as follows (and remove the RewriteBase directive entirely):
RewriteEngine On
RewriteRule ^qa-theme/(.*) /forum-embed/qa-theme/$1 [L]
RewriteRule ^qa-content/(.*) /forum-embed/qa-content/$1 [L]
RewriteRule ^([^.]+)$ index.php?url=$1 [QSA,L]
The QSA flag is not required on the first two directives since the query string is passed through by default. (Although if these are rewriting static resources then you wouldn't expect a query string to be passed anyway?)
No need to backslash-escape the hyphen in the regex, since it carries no special meaning when used outside of a character class. Likewise, the dot carries no special meaning when used inside a character class so does not need to be backslash-escaped in the last rule above.

mod-rewrite rules executing out of order

Okay, so I have two separate mod_rewrite rules in my vhost block. The first rule redirects a customer offsite if they come in through an affiliate URL such as example.com/1234.html, and the second rule forces the URL to always contain the www prefix, as in www.example.com.
# Affiliate Links
RewriteRule ^([0-9]+)\.html$ http://affiliates.example.com/log.php?id=$1 [R=302,L]
# Ensure we are always on www dot
RewriteCond %{HTTP_HOST} ^example\.loc [NC]
RewriteRule (.*) http://www.example.com$1 [R=301,L]
The rules themselves work great. The problem is that if the first rule applies I want it to immediately redirect, however it seems as if the second rule is hoisted to the top because it always takes precedence. What do I need to change so that these execute in order?
You've stated that this is in a vhost block. In that context (as opposed to, for example, an .htaccess file) the URL-path that the pattern is matched against always starts with '/'.
Thus
RewriteRule ^([0-9]+)\.html$ http://affiliates.example.com/log.php?id=$1 [R=302,L]
should instead be
RewriteRule ^/([0-9]+)\.html$ http://affiliates.example.com/log.php?id=$1 [R=302,L]
(i.e., with the leading slash); otherwise it will never match anything.
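The effect of the leading slash can be checked with a short Python sketch. The two sample strings are assumptions standing in for what the pattern is matched against in each context: the full URL-path in a vhost block, and the path with the directory prefix stripped in a root .htaccess file.

```python
import re

without_slash = re.compile(r"^([0-9]+)\.html$")   # original pattern
with_slash = re.compile(r"^/([0-9]+)\.html$")     # corrected pattern

vhost_path = "/1234.html"      # what a rule in a vhost block is matched against
htaccess_path = "1234.html"    # what a rule in a root .htaccess is matched against

for label, path in [("vhost", vhost_path), ("htaccess", htaccess_path)]:
    # Print whether each pattern matches in the given context.
    print(label, bool(without_slash.match(path)), bool(with_slash.match(path)))
```

In the vhost case only the pattern with the leading slash matches, so the affiliate rule never fired and the www rule was the first one to apply.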

How to unescape QUERY_STRING in mod_rewrite?

Hi all,
I want to use the mod_rewrite module in Apache 2 to redirect a URL.
The rewrite rule looks like:
RewriteCond %{QUERY_STRING} ^url=(.+)$
RewriteRule ^/redir$ %1 [R=301,L]
However, when http://website.com/redir?url=http%3A%2F%2Fwww.google.com is requested, the mod_rewrite module does not unescape the url parameter http%3A%2F%2Fwww.google.com. Is there any way to resolve this problem?
RewriteMap unescape int:unescape
RewriteCond %{QUERY_STRING} ^url=(.+)$
RewriteRule ^/redir$ ${unescape:%1} [R=301,L]
Apache lets you define custom rewrite mappings from different types of external sources. For example, if you wanted to rewrite /users/<some alias> to /users/<full name>, you could have a text file that specified alias/name pairs, and a rewrite rule that translated the "alias" part of the URL using that mapping.
Mappings can come from multiple types of sources. The alias/name example is the standard plain text (txt) type.
RewriteMap also lets you map to a handful of special internal sources (int). They just pass the value to an internal Apache function and return the result. They are:
toupper: Converts the key to all upper case.
tolower: Converts the key to all lower case.
escape: Translates special characters in the key to hex-encodings.
unescape: Translates hex-encodings in the key back to special characters.
unescape is what you're looking for.
More information can be found in the mod_rewrite documentation.
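What unescaping does to the captured value can be previewed in Python. Here urllib.parse.unquote is Python's percent-decoder, used only as a stand-in for Apache's internal unescape function; the input is the url value from the question.

```python
from urllib.parse import unquote

encoded = "http%3A%2F%2Fwww.google.com"   # value captured by ^url=(.+)$
decoded = unquote(encoded)                # turn %3A into ':' and %2F into '/'
print(decoded)
```

Without this step the redirect target would still contain the literal %3A and %2F sequences.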
Yep, there is one method: pass it to a PHP file, then perform the redirection in PHP with the appropriate header() call.
Something like:
RewriteCond %{QUERY_STRING} ^url=(.+)$
RewriteRule ^/redir$ /myredir.php?redir=%1 [R=301,L]
And in PHP, in the file myredir.php, something like:
<?php
if (isset($_GET['redir'])) {
    // Send the browser to the decoded target URL.
    header("Location: " . urldecode($_GET['redir']));
}
exit;
?>

mod_rewrite is being ignored

I'm trying to transform "domain.com/index.php?site=food&category=beef" into "domain.com/food/beef" but it does not work, no matter what I try. It always leaves the original domain and I get no errors.
I think it's my fault, I tried this for 3 different URLs on 3 different servers (and 3 different projects)... it just seems like I don't get how mod_rewrite really works, though I read every documentation on this topic I found. I even spent days here on SO without finding any solution.
Mod_rewrite is enabled on the server:
RewriteEngine On
RewriteRule ^ http://www.google.com [R,L]
gives me "http://www.google.com/?site=food&category=beef". It looks like mod_rewrite does not recognise the query string... So I tried several solutions with RewriteCond %{QUERY_STRING}... but nothing works :/
Hopefully you guys can help me! I'm going insane on this!
Thanks in advance!
Try:
RewriteEngine on
RewriteRule ^food/beef$ index.php?site=food&category=beef [L]
Or more generally:
RewriteEngine on
RewriteRule ^([a-zA-Z0-9_-]+)/([a-zA-Z0-9_-]+)$ index.php?site=$1&category=$2 [L]
Are you trying to do something like this?
RewriteRule ^([^/]+)/([^/]+)/? /index.php?site=$1&category=$2 [L]
This will make it so when you go to http://domain.com/food/beef the request gets rewritten to "/index.php?site=food&category=beef" internally and index.php is used to serve the original request. The browser's location bar will still say "http://domain.com/food/beef".
If you want the location bar to say http://domain.com/index.php?site=food&category=beef then add an "R," to the "[L]". If this is backwards and you want it so when someone enters http://domain.com/index.php?site=food&category=beef in the location bar, and the request gets rewritten to "/food/beef" internally on the server, then you need to parse out the query string using RewriteCond:
RewriteCond %{QUERY_STRING} ^site=([^&]+)&category=([^&]+)
RewriteRule ^index.php /%1/%2? [L]
The same thing applies with the "R" causing a browser redirect like the first example. If you want the location bar to change to http://domain.com/food/beef then the brackets should look like: [L,R]. Note that you need a ? at the end of the target in the rule, so that query strings don't get thrown in. That is why in your google example, the query string is being appended.
EDIT:
Seeing as you just wanted to change what's in the browser's location bar and not where the content is:
You need to re-rewrite what the 2nd rule above has rewritten BACK to index.php, but without a redirect. In order to keep the 2 rules from looping indefinitely because one rule rewrites to the other rule and vice versa, you need to add a flag somewhere to keep the 2nd rule above from redirecting you over and over again.
So combining the two, you'll have this:
RewriteRule ^([^/]+)/([^/]+)/? /index.php?site=$1&category=$2&redirected [L]
RewriteCond %{QUERY_STRING} !redirected
RewriteCond %{QUERY_STRING} ^site=([^&]+)&category=([^&]+)
RewriteRule ^index.php /%1/%2? [L,R=301]
Note the redirected parameter in the query string. This gets inserted when someone tries to access the clean version of the URL, e.g. "/food/beef". Internally, it gets rerouted to index.php, but since the rule doesn't have an "R", the browser's location bar doesn't change.
The second rule now checks if the request contains the redirected param in the query string. If it doesn't, that means someone entered in their browser's location bar the index.php url, so redirect the browser to the clean version.
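The interplay of the two rules can be traced with a small Python model. This is only a sketch of the decision logic under the assumption that the rules live in a root .htaccess file (so paths carry no leading slash); the function name handle and the action labels are made up for illustration.

```python
import re

def handle(path, query):
    """Return (action, path, query) for one pass through the two rules."""
    # Rule 1: clean URL -> internal rewrite to index.php, tagged with 'redirected'.
    clean = re.match(r"^([^/]+)/([^/]+)/?", path)
    if clean:
        return "rewrite", "index.php", f"site={clean.group(1)}&category={clean.group(2)}&redirected"
    # Rule 2: index.php requested directly (no marker) -> external redirect to the clean URL.
    params = re.match(r"^site=([^&]+)&category=([^&]+)", query)
    if re.match(r"^index.php", path) and "redirected" not in query and params:
        return "redirect", f"/{params.group(1)}/{params.group(2)}", ""
    return "serve", path, query

print(handle("food/beef", ""))
print(handle("index.php", "site=food&category=beef&redirected"))
print(handle("index.php", "site=food&category=beef"))
```

The second call shows the marker at work: the internally rewritten request falls through both rules and is served instead of being redirected again.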

Mod_rewrite help

I'm trying to remove query strings from my calendar, but my mod_rewrite is not appending the query string.
The website is http://cacrochester.com/Calendar
and if you click the link to go to a different month, the query string is usually http://cacrochester.com/Calendar?currentmonth=2010-11
With my rule below, it just doesn't append the query string so when you click the next month link, it just stays on the month October. What's wrong with my rule?
Here is my rule
RewriteCond %{QUERY_STRING} !^$
RewriteRule ^.*$ http://cacrochester.com/Calendar? [NC,R=301,L]
EDIT:
What I want is to take a URL like http://cacrochester.com/Calendar?currentmonth=2010-11 and turn it into something like http://cacrochester.com/Calendar/2010-11
You probably need your app to output relative urls like "/Calendar/2010-11". That's a simple code change.
Then in Apache you'd want to rewrite those urls, using:
RewriteRule ^/Calendar/([0-9]+-[0-9]{2})$ /Calendar.php?currentmonth=$1 [NC,QSA,L]
(You don't want a RewriteCond for this rule.)
Forcing a redirect with R=301 will only expose the internal url scheme. I don't think that's what you want.
To maintain query strings when rewriting, use the QSA (query string append) flag.
[NC,R=301,QSA,L]
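What QSA changes can be sketched in a few lines of Python. This models only the documented behaviour for a rule whose target carries its own query string; the function name rewrite_query and the extra view=list parameter are hypothetical.

```python
def rewrite_query(substitution_query, original_query, qsa):
    """Return the query string after a rewrite whose target carries its own query string."""
    if not qsa or not original_query:
        return substitution_query  # without QSA the original query string is discarded
    return f"{substitution_query}&{original_query}"  # QSA: the original is appended

print(rewrite_query("currentmonth=2010-11", "view=list", qsa=False))
print(rewrite_query("currentmonth=2010-11", "view=list", qsa=True))
```

With the flag set, a request carrying ?view=list keeps that parameter after the rewrite instead of losing it.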
