htaccess rewrite - mod-rewrite

I would like to rewrite /anything.anyextension into /?post=anything.
eg:
/this-is-a-post.php into /?post=this-is-a-post or
/this-is-a-post.html into /?post=this-is-a-post or even
/this-is-a-post/ into /?post=this-is-a-post
I tried
RewriteRule ^([a-zA-Z0-9_-]+)(|/|\.[a-z]{3,4})$ ?$1 [L]
but it doesn't work.
Any help appreciated.

If you have access to the main server configuration, use this:
RewriteRule ^/(.+)\.\w+$ /?post=$1 [L]
If not, and you have to put this in a .htaccess file, you could try
RewriteRule ^(.+)\.\w+$ /?post=$1 [L]
In either case, this assumes you will only be rewriting URLs with a single path component. A request like /path/anything.anyextension might not work as you expect; the rewrite rule would need to be modified to handle that.
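As a rough sketch of a .htaccess variant that covers all three URL shapes from the question (with an extension, with a trailing slash, or bare), something like the following might work. It assumes the site is served from the document root and that the slug only contains letters, digits, underscores and hyphens; the two conditions are an addition, not part of the answer above, and are there so that real files such as index.php or a stylesheet are not rewritten.

```apache
RewriteEngine On

# Assumption: leave anything that exists on disk alone (index.php, CSS, JS, images)
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# Slug, followed by nothing, a trailing slash, or a single extension
#   this-is-a-post.php  -> /?post=this-is-a-post
#   this-is-a-post.html -> /?post=this-is-a-post
#   this-is-a-post/     -> /?post=this-is-a-post
RewriteRule ^([A-Za-z0-9_-]+)(?:/|\.[A-Za-z0-9]+)?$ /?post=$1 [L]
```

In a substitution, captured groups are referenced as $1, $2 and so on; the character class here deliberately excludes slashes, so multi-segment paths are left untouched.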

You need a better way to determine when to apply the rewrite rule, otherwise your page won't be able to display external JS or CSS, unless you define an exception.
SilverStripe (or the core, Sapphire) offers a good approach to this, something like:
RewriteEngine On
RewriteCond %{REQUEST_URI} !\.(css|js|swf)$ [NC]
RewriteCond %{REQUEST_URI} .+
RewriteRule ^([^\.]+) /?post=$1 [L,R=301]
This requires the URI not to be empty, not to be JS, CSS or SWF, and redirects back to your root directory:
http://localhost/this-is-a-post.php
http://localhost/?post=this-is-a-post
If you don't want a redirect, only the internal processing, remove the R=301 flag.

Related

Can't use parentheses in RewriteCond QUERY_STRING

Moved from https://serverfault.com/questions/1013461/cant-use-parentheses-in-rewritecond-query-string because it's on topic here.
I need to capture a UID from an old url and redirect it to a new format.
example.com/?uid=123 should redirect to example.com/user/123
What should work:
RewriteCond %{QUERY_STRING} ^uid=(\d+)$
RewriteRule ^$ /user/%1? [L]
This does not redirect at all.
However, this does:
RewriteCond %{QUERY_STRING} ^uid=\d+$
RewriteRule ^$ /user/%1? [L]
It goes to example.com/user. The UID is left out, but it DOES redirect.
Notice: All I did was remove the parentheses in the second example.
Why is this?? How can I match the query AND capture the value of UID?
Updates
This is a laravel app. I've discovered that the redirects I did see may have been coming from the app, not Apache.
Self-answer coming soon...
Temporarily adding R=302 gives the desired result:
RewriteCond %{QUERY_STRING} ^uid=(\d+)$
RewriteRule ^$ /user/%1? [L,R=302]
This, of course, sends a 302 redirect to /user/123. I'd like to see if this can be done with an internal rewrite though...
Here are some rules in laravel's default .htaccess:
# Handle Front Controller...
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ index.php [L]
This catches paths that do not point to real files, and it points them to the laravel app. When this is removed, Apache responds with a 404 for /user/1234.
https://httpd.apache.org/docs/2.4/rewrite/flags.html#flag_l
Such a rewrite goes back to Apache's URL parser. Then the .htaccess is processed again (since it's still applicable to this new URL). At this point, I'd expect the above rules to pick up the non-existent path and point it to the laravel app...
Found it. Writing an answer now.
The Answer
MrWhite was right. You have to add R=302 or R=301 to perform a redirect. A plain ol' rewrite won't work.
RewriteCond %{QUERY_STRING} ^uid=(\d+)$
RewriteRule ^$ /user/%1? [L,R=302]
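For what it's worth, on Apache 2.4 or later the QSD flag can be used instead of the trailing ? to drop the original query string. This is only a sketch under that version assumption; it should behave the same as the rule above, and it has to be placed before Laravel's front controller rules so that it runs first.

```apache
RewriteEngine On

# Assumption: Apache 2.4+, where the QSD (query string discard) flag exists
# example.com/?uid=123  ->  302 redirect to example.com/user/123
RewriteCond %{QUERY_STRING} ^uid=(\d+)$
RewriteRule ^$ /user/%1 [R=302,L,QSD]
```

Here %1 refers to the first captured group of the last matched RewriteCond, which is why the parentheses around \d+ are needed.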
The Reason
So, the way Laravel works is:
you request /some/file
.htaccess tells apache, "hey apache, if you have a request for a file that doesn't exist just pretend it's for index.php"
apache says, "hey php, I have a request to run index.php and the url is /some/file"
php runs the script which --whoah-- is a huge laravel application
whatever, "hey laravel, the server said /some/file is the url"
laravel does all its fancy stuff, and it tries to match the url to one of your routes
Now, I added a rule to rewrite a certain URL to a virtual URL that Laravel should handle. I was matching against query parameters, but that was irrelevant. (see below for details)
When Apache's Rewrite Module hits a RewriteRule without an [R] flag, it rewrites the URL and sends it back to the URL Handler. Apache's URL Handler then processes the new URL against all the rules, including those in any applicable .htaccess files.
So all the proper rules did get applied.
Here's the key revelation:
The originally requested URL never changed. So while Apache was able to pass the request to PHP with the correct file, it was also sending along the old URL.
Therefore, we have to tell Apache to send a 301 or 302 Redirect response, instead of just rewriting the request. The user will send another request with the URL that Laravel needs to resolve the route.
But what about the different behavior with/without the parentheses?
The answer lies within Laravel's default .htaccess. Let's take a look at my old rules without the parentheses:
RewriteCond %{QUERY_STRING} ^uid=\d+$
RewriteRule ^$ /user/%1? [L]
Without the parentheses to capture the uid value, %1 is empty. So we end up rewriting the URL to just /user/.
Now, we have to look at another set of Laravel rules:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [L,R=301]
This normalizes urls so that virtual paths/routes don't contain trailing slashes. Doing this makes route parsing easier.
This returns a 301 Redirect to /user. This is very different from the 200 we were getting with the parentheses, but it does not mean the parentheses were behaving differently. As MrWhite said in the comments, surely something else was doing it.
I hope you enjoyed the ride. And I hope even more that this will save some poor, confused soul from hours of torment. :)

Apache rewrite condition for ajax crawling to pages with anchors

I have an AngularJS app on an Apache webserver that I would like to have indexed by search engines (i.e. Google/Bing bots etc.). I have a PhantomJS script to crawl and take snapshots of pages on my site, and I have followed the instructions from Google on how to redirect any http://mysite.com/?_escaped_fragment_=* requests to the appropriate pages.
The problem I'm facing is that I have a few routes in the app that change content based on the anchor, e.g. http://mysite.com/#!/about is different from http://mysite.com/#!/about#overview. I would like these changes to be indexed, but the hash character '#' is used for commenting and even escaping it with a backslash doesn't work. I have consulted other SO answers (e.g. Apache rewrite condition for ajax crawling and mod_rewrite page anchor), but I have not found instructions on how to deal with anchors.
I have two questions.
1. Is there a way to redirect URLs using mod_rewrite to snapshots that include anchors? For example, using the escaped version of '#' ('%23'):
http://mysite.com/?_escaped_fragment_=about%23overview => http://mysite.com/snapshots/about#overview.html
Here's what I currently have in my .htaccess file, though it does not work for pages with anchors:
RewriteEngine On
Options +FollowSymLinks
# Route for the index page
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/$
RewriteRule ^(.*)$ snapshots/index.html [NC,L]
# All other routes
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?(.*)$
RewriteRule ^(.*)$ snapshots/%1.html [NC,L]
2. If (1) is not allowed, my idea on how to solve this problem is to replace all '#' with '.' in the file names of the snapshots. Then I would need a mod_rewrite rule that would replace '#' with '.' in the escaped_fragment query parameter. Going back to my example, I currently have a rule that would take /?_escaped_fragment_=about#overview and reroute it to /snapshots/about.overview.html.
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/about%23overview$
RewriteRule ^(.*)$ snapshots/about.overview.html [NE,NC,L]
Is there a simple general rule I could use to implement this type of routing?
Any other ideas for how to solve this problem with general rewrite conditions would be appreciated, thanks!
I believe the following rule should work for you:
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=([^&]+) [NC]
RewriteRule ^$ /snapshots/%1.html? [R,NE,L]
It redirects /?_escaped_fragment_=about%23overview to /snapshots/about%23overview.html
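If the snapshots are instead stored with '.' in place of '#', as proposed in the second question, a general rule could look roughly like the sketch below. It assumes at most one encoded '#' per fragment and the about.overview.html naming scheme from the question; it is an internal rewrite rather than a redirect, and it would need to sit before the catch-all "All other routes" rule.

```apache
RewriteEngine On

# Assumption: exactly one %23 in the fragment, snapshots named <page>.<anchor>.html
#   /?_escaped_fragment_=/about%23overview  ->  snapshots/about.overview.html
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?([^&%]+)%23([^&%]+)$ [NC]
RewriteRule ^$ snapshots/%1.%2.html? [L]
```

The query string is matched in its raw, still-encoded form, so the pattern looks for the literal text %23 rather than '#'. The trailing ? on the substitution drops the query string so the rule does not match again on the rewritten request.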

code igniter & mod_rewrite - one rewrite rule breaking another

I have a site built in codeigniter. We use short urls from our database & rewrite rules to redirect them to their full path.
For example,
RewriteRule ^secure-form$ form/contract/secure-form [L]
This works fine by itself. But I would like to use SSL on certain pages. I have edited the code so that if you go to one of these pages, all instances of http:// within the page are replaced with https:// but I need to rewrite the url to use it as well.
The pages all use the same template and all the content comes from the database so I can't just specify ssl on a particular directory.
The URLs for the secure pages all start with 'secure' so I wrote the following rules and placed them above the other rewrites.
RewriteCond %{HTTPS} off
RewriteCond %{REQUEST_URI} ^/secure/?.*$
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]
RewriteCond %{HTTPS} on
RewriteCond %{REQUEST_URI} !^/secure/?.*$
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1 [R=301,L]
RewriteRule ^secure-form$ form/contract/secure-form [L]
RewriteRule ^secure-different-form$ form/contract/secure-different-form [L]
all other rewrite rules for specific pages follow
then the default rewrite further down...
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php/$1 [L]
The problem is that when I add the rules to change the protocol, it ends up displaying 'form/contract/secure-form' in the url instead of 'secure-form'.
This renders the actual form on the page broken since it uses that url to build itself.
If I take out the rules that change the protocol, it displays secure-form in the url as it should, but the page is not secure.
What am I doing wrong?
----UPDATE----
Ooh, after over 20 hrs of searching, I think I finally have an answer. So, first time through, https is off & gets turned on. Then, because of the 301, it's run again & the page gets sent to form/contract/secure... But this time, https is on. Since the uri no longer STARTS with secure, it turns https off.
Hopefully, this will help someone else.
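One possible way around this, sketched here and not taken from the post itself, is to test the original request line instead of the current URI. %{THE_REQUEST} holds the raw request line (for example "GET /secure-form HTTP/1.1") and does not change when a later rule rewrites the path internally, so the protocol checks keep seeing /secure-form even after it has been mapped to form/contract/secure-form.

```apache
RewriteEngine On

# Force HTTPS when the URL the client actually asked for starts with /secure
RewriteCond %{HTTPS} off
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/secure
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

# Force HTTP for every other originally requested URL
RewriteCond %{HTTPS} on
RewriteCond %{THE_REQUEST} !^[A-Z]+\s/secure
RewriteRule ^ http://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

# Short URL rewrites and the default index.php rule follow unchanged
RewriteRule ^secure-form$ form/contract/secure-form [L]
```

With this arrangement the second block should no longer fire on the rewritten path, because the condition is evaluated against what the browser sent.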

when to use which mod_rewrite rule for self routing?

There are several ways to write a mod_rewrite rule for self routing. At the moment I am using this one:
RewriteCond %{REQUEST_URI} !\.(js|ico|gif|jpg|png|css)$
RewriteRule ^.*$ index.php [NC,L]
But I could also use
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*) index.php
OR
ErrorDocument 404 /index.php
There may be many more.
Are there any drawbacks for using one of these examples?
Are there any use cases where one rule makes more sense than the other?
Could you explain the difference between these rules in detail?
Thx for your time and help.
When your condition is:
RewriteCond %{REQUEST_URI} !\.(js|ico|gif|jpg|png|css)$
Then only images, icons, styles, and javascript are excluded from routing. This means you can't access static html, directories, or directory indexes. So if you just want to plop down a static html page somewhere and serve it without it getting routed through index.php, you can't. It also means that if you accidentally put an image, script, or style in the wrong place and try to access it, it won't get routed through index.php and will yield the default 404 error page instead.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
These conditions will exclude any URI that points to an existing resource. So if you put an image, a script, a directory, static html, etc. anywhere in your document root, you'll be able to go there without it being routed through index.php. Sometimes the condition RewriteCond %{REQUEST_FILENAME} !-l is also included, which excludes URIs that point to a symlink. This is usually what you'd see when doing routing; WordPress uses this.
ErrorDocument 404 /index.php
This does essentially the same thing as the previous conditions, except it does it outside of mod_rewrite and there's no way to impose additional conditions in the future or as needed. The downside of doing routing outside of mod_rewrite is that mod_rewrite and the core directives (ErrorDocument in this case) do processing on the URI at different times in the URI-file mapping pipeline. So if you have rules that do other things, they could get applied, and then ultimately still get routed through index.php because the 2 directives are conflicting with each other. Simply because rewrite rules are applied at one point in the pipeline doesn't mean other directives won't get applied later down in the pipeline. This is a bad way to do routing.
There's also stuff like:
RewriteCond %{REQUEST_URI} !^/index.php
RewriteRule ^.*$ index.php [L]
Which will blindly route everything. Even javascript, even images, even static html, everything. Sometimes this is what people want. Ultimately, this is going to be dependent on what you want and what your index.php script does. Is it going to handle 404's? (like what you'd want in the first routing rule), is it just going to handle non-static resources? (like what the second rule does), or is it a literal catch all and will do everything (what the rule above does)?
Also note that your rewrite flags are different between the first and second rules. Those are significant if you have other rules.
The biggest drawback to the first example (which is the one you say that you use) is that this method hard codes the file extensions (.js .ico .gif .png) that are excluded from being rewritten to index.php. The problem with this is that if you need to add new static content that uses a file extension that is not in your exclusion list, you must modify your rewrite rule accordingly. For example, if you were to start hosting flash content and needed to host .swf and .flv files, you would need to update your existing RewriteCond.
The middle solution is best (IMHO) because it does exactly what it says it does, namely if the requested path is not an existing file (!-f condition) AND not an existing directory (!-d condition), then rewrite the request to index.php.
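Consecutive RewriteCond lines are combined with AND unless the [OR] flag is given. As an illustration only, the same routing could be written the other way around: pass existing files and directories through untouched, then send everything else to index.php. This is a sketch of an equivalent form, not something either answer proposes.

```apache
RewriteEngine On

# If the request maps to an existing file OR an existing directory, do nothing
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [L]

# Everything else goes to the front controller
RewriteRule ^ index.php [L]
```

The "-" substitution means "leave the URL unchanged", so the first rule acts as an early exit for static resources, including index.php itself once the second rule has run.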

dynamic subdomains with htaccess: URL shouldnt change in the browser

Trying to implement subdomains with htaccess.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^www.
RewriteCond %{HTTP_HOST} ^([a-z0-9]+)\.domain.com(.*)$
RewriteRule ^(.*)$ http://domain.com/index.php?/public_site/main/%1/$1 [L]
</IfModule>
When I enter ahser.domain.com, the URL in the browser changes. Is there an htaccess option to prevent this when absolute URLs are used in a RewriteRule?
Don't rewrite to a full URL with domain in it. That generates a redirect since it's going to a different website! You could put microsoft.com there; so how would it work without redirecting?
What you have to do is make sure that the web pages work under the original domain. So when the client asks for myname.domain.com/... how about rewriting that to myname.domain.com/index.php?public_site/main/myname/.... Keep the domain the same. The index.php? can be made to work in any of those domains. For instance, even this could work:
http://OTHER.domain.com/index.php?public_site/main/MYNAME/...
I.e. set it up so it doesn't matter which virtual host accesses that path.
Once you have that, the rewrite can then just do:
# will not trigger redirect
RewriteRule ^(.*)$ /index.php?/public_site/main/%1/$1 [L]
You have to be careful not to introduce a loop, since you're now rewriting a URL to a longer URL on the same domain, which matches the same rewrite rule. You need an additional RewriteCond so that this rewrite is not applied if the URL already starts with /index.php?public_site/.
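Putting those pieces together, a minimal sketch might look like this. It assumes every subdomain is served from the same document root, that domain.com stands in for the real domain as in the question, and that checking the path for /index.php is a good enough loop guard (RewriteCond on REQUEST_URI sees only the path, not the query string).

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

# Loop guard: do not touch requests already aimed at the front controller
RewriteCond %{REQUEST_URI} !^/index\.php
# Skip the www host
RewriteCond %{HTTP_HOST} !^www\. [NC]
# Capture the subdomain label as %1 (an optional port is allowed but not captured)
RewriteCond %{HTTP_HOST} ^([a-z0-9]+)\.domain\.com(?::\d+)?$ [NC]
# Relative target on the same host, so this stays an internal rewrite
RewriteRule ^(.*)$ /index.php?/public_site/main/%1/$1 [L]
</IfModule>
```

The capturing condition is kept last because %1 refers to groups from the last matched RewriteCond.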

Resources