Rewriting URLs with mod_rewrite for languages

I need some URL rewriting for my website using mod_rewrite but I can't figure out the regular expressions.
Here is what the current URLs may look like:
http://mydomain.com/zenphoto/pages/xyz?locale=en_US
http://mydomain.com/zenphoto/pages/xyz?locale=de_DE
http://mydomain.com/zenphoto/gallery_1?locale=de_DE
http://mydomain.com/zenphoto/gallery_n?locale=de_DE
xyz may contain different strings, e.g. legal, about, etc.
And that's how I'd like the URLs to be used:
http://mydomain.com/zenphoto/de/pages/xyz
http://mydomain.com/zenphoto/en/pages/xyz
http://mydomain.com/zenphoto/de/gallery_1
http://mydomain.com/zenphoto/en/gallery_n
I should mention that only de and en should be accepted. Any other string should be rerouted to de.
Could somebody help me please? :-)
Thanks,
Robert

RewriteEngine on
RewriteRule ^zenphoto/en/(pages/[a-z]+|gallery_[0-9]+)$ /zenphoto/$1?locale=en_US [L]
RewriteRule ^zenphoto/de/(pages/[a-z]+|gallery_[0-9]+)$ /zenphoto/$1?locale=de_DE [L]
For the first rule, the pattern says: "The URL path starts (^) with "zenphoto/en/", followed by either "pages/" and a sequence of lowercase letters (+ means "one or more", and [a-z] means "a letter in [a, b, ..., y, z]") or (| means "or") "gallery_" and one or more digits. The parentheses make this my first group, and there is nothing after it ($ means it's the end of the URL path)".
After a space comes the new URL I want, where $n inserts the n-th group.
The pattern matches the 'pretty' URL that visitors request, and the substitution is the real one.
Note that the pattern is matched against the URL path only, never against the query string, so "?locale=..." cannot be part of the pattern. To test the query string, use a RewriteCond on %{QUERY_STRING}.
You have to put a backslash before special characters such as ?, +, {, }, (, ), [, ], *, . and | if you want to match one literally in your pattern.
Edit:
The flag [L] (L = Last) at the end of each rule stops any later rule from being applied to the already rewritten URL in the same pass.
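To sanity-check the mapping outside Apache, the same idea can be sketched in Ruby. This is only an illustration and not part of mod_rewrite: the method name real_url and the locale table are hypothetical, and the fallback to de_DE for unknown language codes is an assumption based on the question's requirement.

```ruby
# Hypothetical helper: turns a pretty path into the real path plus locale.
def real_url(pretty_path)
  locales = { "en" => "en_US", "de" => "de_DE" } # assumed mapping
  m = pretty_path.match(%r{\Azenphoto/([^/]+)/(pages/[a-z]+|gallery_[0-9]+)\z})
  return nil unless m                             # not a pretty URL

  locale = locales.fetch(m[1], "de_DE")           # unknown codes fall back to de
  "/zenphoto/#{m[2]}?locale=#{locale}"
end

puts real_url("zenphoto/en/pages/legal") # /zenphoto/pages/legal?locale=en_US
puts real_url("zenphoto/fr/gallery_1")   # /zenphoto/gallery_1?locale=de_DE
```

A path without a language segment, such as zenphoto/pages/legal, returns nil here, which corresponds to a URL the rewrite rules would leave alone.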

Related

Rewrite Word with capitals

I'm trying to write RewriteRule rules that select a word written in capital letters and rewrite it into a query. The capitalized word could be in different positions. There are other single capital letters that are to be ignored.
An example would be finding the word KELPIE - note it is the only word in full capitals
http://www.atestdomain.com.au/DogsBigBlackKELPIE.htm
needs to become
http://www.atestdomain.com.au/animals/search.php?keyword=&category=2&dogtype=KELPIE&location_id=2&submit=Search
Something like this is what you're after.
RewriteRule ([A-Z]{2,})\.htm$ animals/search.php?keyword=&category=2&dogtype=$1&location_id=2&submit=Search [NS,NE,B,DPI,L]
But, like I said, that still won't be able to differentiate between an uppercase keyword, and one preceded by a single letter word (which looks like it would be uppercased in your scheme).
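The limitation is easy to see by trying the same pattern in Ruby. This is a rough sketch; the helper name keyword and the second file name are made up for illustration.

```ruby
pattern = /([A-Z]{2,})\.htm$/

# Hypothetical helper: returns the run of capitals right before ".htm", or nil.
def keyword(path, pattern)
  m = path.match(pattern)
  m && m[1]
end

puts keyword("DogsBigBlackKELPIE.htm", pattern) # KELPIE
puts keyword("DogsAKELPIE.htm", pattern)        # AKELPIE
```

In the second case the single-letter word "A" is swallowed into the keyword, because the pattern cannot tell where one capitalized word ends and the next begins.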

How to understand gsub(/^.*\//, '') or the regex

I'm breaking down the code below to check my understanding of the regex and of gsub:
str = "abc/def/ghi.rb"
str = str.gsub(/^.*\//, '')
# str => "ghi.rb"
^ : beginning of the string
\/ : escape character for /
^.*\/ : everything from beginning to the last occurrence of / in the string
Is my understanding of the expression right?
How does .* work exactly?
Your general understanding is correct. The entire regex will match abc/def/ and String#gsub will replace it with empty string.
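However, note that String#gsub doesn't change the string in place; it returns a new string. Your code works only because you assign the result back to str. Without that assignment, str would still contain the original value ("abc/def/ghi.rb") after the substitution. To change the string in place, you can use String#gsub!.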
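A short sketch of the difference between the two methods (the variable names result and copy are just for illustration):

```ruby
str = "abc/def/ghi.rb"

result = str.gsub(/^.*\//, "") # returns a new string
puts str                       # abc/def/ghi.rb (unchanged)
puts result                    # ghi.rb

copy = str.dup
copy.gsub!(/^.*\//, "")        # modifies copy in place
puts copy                      # ghi.rb
```

One thing to watch for: gsub! returns nil when nothing was replaced, so its return value is not a safe thing to chain further calls on.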
As to how .* works - the algorithm the regex engine uses is called backtracking. Since .* is greedy (will try to match as many characters as possible), you can think that something like this will happen:
Step 1: .* matches the entire string abc/def/ghi.rb. Afterwards \/ tries to match a forward slash, but fails (nothing is left to match). .* has to backtrack.
Step 2: .* matches the entire string except the last character - abc/def/ghi.r. Afterwards \/ tries to match a forward slash, but fails (/ != b). .* has to backtrack.
Step 3: .* matches the entire string except the last two characters - abc/def/ghi.. Afterwards \/ tries to match a forward slash, but fails (/ != r). .* has to backtrack.
...
Step n: .* matches abc/def. Afterwards \/ tries to match a forward slash and succeeds. The matching ends here.
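The steps above can be imitated with a small loop. This is a simplified simulation of the idea, assuming a single-line string; real regex engines are more sophisticated, and the method name greedy_prefix is made up.

```ruby
# Simulates how a greedy .* followed by \/ backtracks; returns the matched prefix.
def greedy_prefix(str)
  consumed = str.length # .* first grabs every character
  while consumed >= 0
    # After .* has taken `consumed` characters, \/ must match the next one.
    return str[0..consumed] if str[consumed] == "/"
    consumed -= 1       # backtrack: .* gives back one character
  end
  nil                   # no slash anywhere, so no match
end

str = "abc/def/ghi.rb"
puts greedy_prefix(str) # abc/def/
puts str[/^.*\//]       # abc/def/ (what the real regex matches)
```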
No, not quite.
^: beginning of a line
\/: escaped slash (escape character is \ alone)
^.*\/ : everything from the beginning of a line to the last occurrence of / on that line
.* depends on the options of the regex. Without the m option, it means the longest possible sequence of zero or more non-newline characters. With the m option (which Ruby calls multiline mode), . also matches newlines, so it means the longest possible sequence of zero or more characters of any kind.
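A two-line string makes the difference visible. The sample text below is made up for illustration:

```ruby
text = "a/b\nc/d"

# Without m: . stops at the newline and ^ matches at the start of each line.
puts text.gsub(/^.*\//, "").inspect  # "b\nd"

# With m: . also matches the newline, so one match runs to the last slash.
puts text.gsub(/^.*\//m, "").inspect # "d"
```

If the intent is to anchor at the start of the whole string rather than of each line, Ruby offers \A for that.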
Your understanding is correct, but you should also note that the last statement is true because:
Repetition is greedy by default: as many occurrences as possible
are matched while still allowing the overall match to succeed.
Quoted from the Regexp documentation.
Yes. In short, it matches any number of any characters (.*) ending with a literal / (\/).
gsub replaces the match with the second argument (empty string '').
Nothing wrong with your regex, but File.basename(str) might be more appropriate.
To expound on what @Stefen said: it really looks like you're dealing with a file path, which makes your question an XY problem, where you're asking about Y when you should ask about X. Rather than how to use and understand a regex, the question should be which tool to use to manage paths.
Instead of rolling your own code, use code already written that comes with the language:
str = "abc/def/ghi.rb"
File.basename(str) # => "ghi.rb"
File.dirname(str) # => "abc/def"
File.split(str) # => ["abc/def", "ghi.rb"]
The reason you want to take advantage of File's built-in methods is that they handle the directory delimiters of both *nix-style OSes and Windows. File::SEPARATOR is "/" on every platform, and on Windows Ruby additionally accepts the backslash, which it exposes as File::ALT_SEPARATOR:
File::SEPARATOR # => "/"
File::ALT_SEPARATOR # => "\\" on Windows, nil elsewhere
If your code moves from one system to another it will continue working if you use the built-in methods, whereas a regex hard-coded to "/" will miss paths written with backslashes.

Creating a simple rule to rewrite a friendly URL to the physical URL

I cannot wrap my head around URL rewriting. What I want to do seems very simple but I am having problems getting the results I want.
I would like to allow users to type www.mysite.com/search/real with an optional / at the end. This would take them to www.mysite.com/content/search_real_property.asp
That's it. Here is the rule I have right now. The problem with this is it will keep stacking.
RewriteRule ^(search) content/search_real_property.asp
So /search/real would work, but so would search/real/search/real/search/real/ and others.
Assuming there are no other issues, that is, you've turned the rewrite engine on (RewriteEngine On) and you're adding the rule either in httpd-vhosts.conf or in an .htaccess file in the root of the web tree (so that any path issues are resolved), the problem is merely one of regular expression pattern matching. Though I'm a bit perplexed by ASP running on what appears to be an Apache server (assuming this IS mod_rewrite we're talking about).
So, all you really want is to terminate the match - something like:
RewriteEngine On
RewriteRule ^search/real/?$ /content/search_real_property.asp
That rewrites /search/real (with or without a trailing slash; the ? means "match the preceding character 0 or 1 times") to /content/search_real_property.asp. Because $ anchors the end of the URL path, there must be nothing after "real" (except perhaps that one forward slash).
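The effect of the end anchor can be checked quickly in Ruby, whose regex syntax is close enough for this purpose. The variable names loose and strict are just labels for the two patterns:

```ruby
loose  = /^(search)/          # the original pattern: only anchored at the start
strict = %r{^search/real/?$}  # anchored at both ends

paths = ["search/real", "search/real/", "search/real/search/real/"]
paths.each do |path|
  puts "#{path}: loose=#{loose.match?(path)} strict=#{strict.match?(path)}"
end
```

The loose pattern accepts all three paths, while the strict one accepts only the first two.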
For greater flexibility you might want to look at what you can actually do with regular expressions, for instance...
RewriteEngine On
RewriteRule ^search/([^/]*)/?$ /content/search_real_property.asp?query=$1
This would let you take any string after /search/ and pass it to the page as a query-string variable called query (Request.QueryString("query"), IIRC).
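As a rough illustration of what the captured group ends up holding, here is the same pattern in Ruby. The helper name query_for is hypothetical:

```ruby
rule = %r{^search/([^/]*)/?$}

# Hypothetical helper: builds the target URL the way the rule's substitution would.
def query_for(path, rule)
  m = path.match(rule)
  m && "/content/search_real_property.asp?query=#{m[1]}"
end

puts query_for("search/real/", rule) # /content/search_real_property.asp?query=real
puts query_for("search/", rule)      # /content/search_real_property.asp?query=
```

Because the group uses *, a bare search/ still matches with an empty query; using + instead would require at least one character.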
Try: http://www.regular-expressions.info/ for more info.

ReWrite RegEx, URL having at least one character

I have the following RewriteRule:
RewriteRule ^/people/([A-Za-z0-9\-\_]*)/?$ /people/people_details.cfm?person=$1 [I,L]
...it works great for forwarding my rule, but I want to make sure that the regex only picks it up if it has at least one character. So really, I need to have my regex...
[A-Za-z0-9\-\_]+
...have an additional rule to say that there has to be at least one character. Right now if I go to...
/people/
...it should go to the default document index.cfm, but because of the rule, it still tries to forward to my people_details.cfm
Any help?
Thanks,
George
The second regular expression in your question already ensures that there must be at least one character. The + means "one or more", as opposed to *, which means "zero or more". Just change the * to a + in your rule.
...it should go to the default document index.cfm, but because of the rule, it still tries to forward to my people_details.cfm
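That's because the * lets the group match zero characters, so /people/ on its own still matches the rule. The optional "/" at the end is not the cause; with + in place of *, /people/ no longer matches.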
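The difference between * and + can be confirmed in Ruby with the same character class. The variable names star and plus are just labels, and the sample name george is made up:

```ruby
star = %r{^/people/([A-Za-z0-9_-]*)/?$} # zero or more characters
plus = %r{^/people/([A-Za-z0-9_-]+)/?$} # one or more characters

["/people/", "/people/george", "/people/george/"].each do |path|
  puts "#{path}: star=#{star.match?(path)} plus=#{plus.match?(path)}"
end
```

Only the * version matches /people/, which is why the original rule intercepted the default document.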

Parsing Mod-Rewrite

I'm trying to get two parameters with mod_rewrite. I tried to split them on "-", but unfortunately it returns only the last word as the second parameter.
/ders/ilkogretim-matematik
/ders/ilkogretim-fen-ve-teknoloji
should be the URLs; "ilkogretim" will be the first parameter and the rest (everything after the first "-") will be the second parameter.
My rule is as follows:
RewriteRule ^ders/(.*)-(.*)/?$ /ogretmenler.php?sinif=$1&ders=$2 [QSA,L]
I hope I have explained the problem clearly.
Thanks in advance...
Your first (.*) is greedy: it captures as much as it can, so it swallows everything up to the last -.
I've made the first group capture any character except -, so it stops at the first one:
RewriteRule ^ders/([^-]+)-(.*)/?$ /ogretmenler.php?sinif=$1&ders=$2 [QSA,L]
The problem is that the first (.*) is greedy, so it matches up to the last - instead of the first. You probably want something like
^ders/([^-]*)-(.*)/?$
The first group will match zero or more non - characters, followed by the single - and then the 2nd group will match zero or more of any character (you could restrict this more if desired).
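To see how the split changes, both variants can be run against one of the URLs from the question in Ruby. The variable names greedy and negated are just labels:

```ruby
url = "ders/ilkogretim-fen-ve-teknoloji"

greedy  = url.match(%r{^ders/(.*)-(.*)/?$})     # first group takes as much as it can
negated = url.match(%r{^ders/([^-]+)-(.*)/?$})  # first group stops at the first "-"

puts greedy[1]  # ilkogretim-fen-ve
puts greedy[2]  # teknoloji
puts negated[1] # ilkogretim
puts negated[2] # fen-ve-teknoloji
```

Note that the second group is still greedy, so with a trailing slash in the URL the slash ends up inside that group rather than being consumed by the optional /? at the end.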
