Using mod_rewrite to view cached version from usual URL

My PHP site generates static HTML versions of its dynamic, database-driven pages and stores them in a folder called cache.
This means that when you visit, say, /about-us/, the request is routed through index.php?page=about-us and produces a file called /cache/about-us.html.
If that file exists, the PHP includes it, then exits. This seems like a waste of time: why not just get Apache to serve /cache/about-us.html directly when /about-us/ is requested, but only if it exists?
My current mod_rewrite section just includes this so far:
RewriteRule ^([A-Za-z0-9-_/\.]+)\/$ /?page=$1 [L]
This rewrites any /foo_bar/ request to index.php?page=foo_bar. What can I put before it to serve my cached version if it exists?

First send everything to the /cache/ folder:
RewriteRule ^([A-Za-z0-9-_/\.]+)/$ /cache/$1.html
Then check whether the cached file is missing, and if so reroute to the dynamic page:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^/cache/([A-Za-z0-9-_/\.]+)\.html$ /?page=$1 [L]
Depending on how your server is set up, you might have to change the paths a bit (I'd recommend using absolute ones). Note that this will also apply to images that match the RewriteRule.
I'm not sure whether you'll see any significant performance increase with this; just including a static file doesn't take that long (if you don't hit a database). Also, if you have APC, you could cache the files in memory.
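For what it's worth, here is a minimal sketch of a variant that tests for the cached file before rewriting, so nothing is sent to /cache/ unless the file is really there. It assumes the rules live in an .htaccess file in the document root and that cache files are named /cache/<page>.html as described in the question; it has not been tested against your setup.

```apache
RewriteEngine On

# Serve the cached copy if /cache/<page>.html exists.
# $1 refers to the group captured by the RewriteRule pattern below.
RewriteCond %{DOCUMENT_ROOT}/cache/$1.html -f
RewriteRule ^([A-Za-z0-9_/.-]+)/$ /cache/$1.html [L]

# Otherwise fall back to the dynamic page.
RewriteRule ^([A-Za-z0-9_/.-]+)/$ /?page=$1 [L]
```

Because both patterns require a trailing slash, the rewritten /cache/about-us.html is not matched again on the next pass.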

Can't use parentheses in RewriteCond QUERY_STRING

Moved from https://serverfault.com/questions/1013461/cant-use-parentheses-in-rewritecond-query-string because it's on topic here.
I need to capture a UID from an old url and redirect it to a new format.
example.com/?uid=123 should redirect to example.com/user/123
What should work:
RewriteCond %{QUERY_STRING} ^uid=(\d+)$
RewriteRule ^$ /user/%1? [L]
This does not redirect at all.
However, this does:
RewriteCond %{QUERY_STRING} ^uid=\d+$
RewriteRule ^$ /user/%1? [L]
It goes to example.com/user. The UID is left out, but it DOES redirect.
Notice: All I did was remove the parentheses in the second example.
Why is this?? How can I match the query AND capture the value of UID?
Updates
This is a Laravel app. I've discovered that the redirects I did see may have been coming from the app, not Apache.
Self-answer coming soon...
Temporarily adding R=302 gives the desired result:
RewriteCond %{QUERY_STRING} ^uid=(\d+)$
RewriteRule ^$ /user/%1? [L,R=302]
This, of course, sends a 302 redirect to /user/123. I'd like to see if this can be done with an internal rewrite though...
Here are some rules in Laravel's default .htaccess:
# Handle Front Controller...
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ index.php [L]
This catches paths that do not point to real files and hands them to the Laravel app. When this is removed, Apache responds with a 404 for /user/1234.
https://httpd.apache.org/docs/2.4/rewrite/flags.html#flag_l
Such a rewrite goes back to Apache's URL parser. Then the .htaccess is processed again (since it's still applicable to this new URL). At this point, I'd expect the above rules to pick up the non-existent path and point it to the Laravel app...
Found it. Writing an answer now.
The Answer
MrWhite was right. You have to add R=302 or R=301 to perform a redirect. A plain ol' rewrite won't work.
RewriteCond %{QUERY_STRING} ^uid=(\d+)$
RewriteRule ^$ /user/%1? [L,R=302]
The Reason
So, the way Laravel works is:
1. You request /some/file.
2. .htaccess tells Apache, "hey Apache, if you have a request for a file that doesn't exist, just pretend it's for index.php."
3. Apache says, "hey PHP, I have a request to run index.php and the URL is /some/file."
4. PHP runs the script which --whoah-- is a huge Laravel application.
5. Whatever: "hey Laravel, the server said /some/file is the URL."
6. Laravel does all its fancy stuff and tries to match the URL to one of your routes.
Now, I added a rule to rewrite a certain URL to a virtual URL that Laravel should handle. I was matching against query parameters, but that was irrelevant. (see below for details)
When Apache's Rewrite Module hits a RewriteRule without an [R] flag, it rewrites the URL and sends it back to the URL Handler. Apache's URL Handler then processes the new URL against all the rules, including those in any applicable .htaccess files.
So all the proper rules did get applied.
Here's the key revelation:
The originally requested URL never changed. So while Apache was able to pass the request to PHP with the correct file, it was also sending along the old URL.
Therefore, we have to tell Apache to send a 301 or 302 Redirect response, instead of just rewriting the request. The user will send another request with the URL that Laravel needs to resolve the route.
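To make the ordering concrete, here is a sketch of how the two rule sets might sit together in one .htaccess. The front-controller block is the one quoted earlier from Laravel's default file; placing the redirect above it is an assumption about the layout, not something taken from Laravel itself.

```apache
RewriteEngine On

# External redirect: the browser re-requests /user/123, so the
# application sees /user/123 as the requested URL.
# The trailing "?" drops the original uid=... query string.
RewriteCond %{QUERY_STRING} ^uid=(\d+)$
RewriteRule ^$ /user/%1? [R=302,L]

# Front controller, as quoted from Laravel's default .htaccess.
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ index.php [L]
```

On Apache 2.4 the QSD flag is another way to discard the query string instead of the trailing "?".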
But what about the different behavior with/without the parentheses?
The answer lies within Laravel's default .htaccess. Let's take a look at my old rules without the parentheses:
RewriteCond %{QUERY_STRING} ^uid=\d+$
RewriteRule ^$ /user/%1? [L]
Without the parentheses to grab the uid value, %1 is empty. So we end up rewriting the URL to just /user/.
Now, we have to look at another set of Laravel rules:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [L,R=301]
This normalizes urls so that virtual paths/routes don't contain trailing slashes. Doing this makes route parsing easier.
This returns a 301 redirect to /user. This is very different from the 200 we were getting with the parentheses, but it does not mean the parentheses were behaving differently. As MrWhite said in the comments, surely something else was doing it.
I hope you enjoyed the ride. And I hope even more that this will save some poor, confused soul from hours of torment. :)

rewrite url only and stay on the same page

I have read a lot on Stack Overflow about RewriteRule and how it applies, and I've tried reading some good articles online, but I still cannot wrap my head around a few things.
I have blogs set up where all folders are in
https://domain.ca/posts/post-tree/*
So I've set up .htaccess like this
RewriteRule ^posts/post-tree/(.*)$ /index.php?$1 [R=301]
As I'm sure you can guess, this brings me to the root index.php, where I read the query string with $_GET to find the name of the blog folder that was requested.
This is fine: I can hit index.php and with $_GET I know the blog page they requested.
What I do not get, and I've tried a lot of things, is this: once I have the request in index.php, how do I rewrite the URL to show something like https://domain.ca/blogpage/ instead of https://domain.ca/index.php? (https://domain.ca/blogpage/ does not really exist, of course; I just want to hide the http://domain.ca/posts/post-tree/ path.)
It's a little like when WordPress processes a blog page by ID and then rewrites the URL to whatever slug is set for that page. At least that's my understanding, as they don't have individual folders for blogs, but I do.
I finally got this working with the following in the .htaccess file:
RewriteEngine on
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
# the two conditions above skip requests for files or folders that exist
# everything else is routed to the base index file, which reads the requested path via $_GET
RewriteRule ^(.+)$ /posts/post-tree/index.php?$1 [L]
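If each blog really does live in its own folder under /posts/post-tree/, another option is to map the short URL straight onto that folder with an internal rewrite, so the address bar keeps showing https://domain.ca/blogpage/. This is only a sketch under that assumption; the folder check and the single-segment pattern are guesses about the layout.

```apache
RewriteEngine On
RewriteBase /

# Leave real files and folders alone.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Only rewrite when a blog folder with that name exists (assumed layout).
RewriteCond %{DOCUMENT_ROOT}/posts/post-tree/$1 -d
# No R flag: this is an internal rewrite, so the visible URL does not change.
RewriteRule ^([^/]+)/?$ /posts/post-tree/$1/ [L]
```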

How to keep mod_rewrite from recognizing directories

So, lately I've been dealing with an issue relating to mod_rewrite and it seems nobody is trying to do anything like it. Every question people have is about trying to exclude directories from the rewrite, when I want them to be included like any other.
For instance, assume my root directory, with the .htaccess file in it, is www.example.com/root/.
When I type in a made-up directory, such as www.example.com/root/asdfasdf, my .htaccess file is set to send me to www.example.com/root/index.php?url=asdfasdf without changing what's in the address bar of my browser.
However, when I try the same with a real directory, such as www.example.com/root/admin, it not only changes the URL in the address bar but changes it to www.example.com/root/admin/?url=admin.
Can anyone explain to me what's going on? I've tried all kinds of different regular expressions and flags, and the ones that redirect anything still cause this same issue. Can I go to www.example.com/root/admin and still get redirected to the root folder while hiding the query string ?url=admin?
[UPDATE: additional information 11-30-2012]
Like I said, I've tried it with multiple different lines of code and come out with the exact same redirect issue, assuming the redirect doesn't just fail altogether and produce a 500 error. Here's one of my latest iterations, though, which has produced the issue of not ignoring directories.
RewriteEngine On
RewriteBase /root/
RewriteCond %{REQUEST_FILENAME} !\.(png|jpg|gif)$ [NC]
RewriteRule (.*?) index.php?url=$1 [QSA]
The rewrite condition is to keep the engine from rewriting if a picture is being requested (for css and img tags). I didn't mention it previously only because I have tried removing that line and it has made no difference.
I'm not exactly a master of mod_rewrite, though, so if you see any errors in anything I've written, please feel free to let me know.
It's not entirely clear from your question what you are trying to do and it would have been helpful to see what your .htaccess file actually looked like. However the following lines in an .htaccess file in the root folder:
RewriteCond %{REQUEST_URI} !^/root/index\.php
RewriteRule (.*) /root/index.php?url=$1 [L]
Will silently redirect requests made to http://www.example.com/root/madeupfolder/madeupfile.php to http://www.example.com/root/index.php?url=madeupfolder/madeupfile.php and will also do the same for real folders. So if the folder admin exists under root, then requests to http://www.example.com/root/admin will be silently redirected to http://www.example.com/root/index.php?url=admin
If, however, you wanted to serve up folders and files that actually exist, but rewrite requests for folders and files that do not exist, then you would need to adjust the rewrite like so:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/root/index\.php
RewriteRule (.*) /root/index.php?url=$1 [L]
This would still rewrite requests made to http://www.example.com/root/madeupfolder/madeupfile.php to http://www.example.com/root/index.php?url=madeupfolder/madeupfile.php, but for real folders and files, such as requests made to http://www.example.com/root/admin, the admin folder would be served up.
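On the address bar changing for real directories: this is most likely mod_dir rather than mod_rewrite. When a request maps to an existing directory and has no trailing slash, mod_dir (with DirectorySlash On, the default) answers with a redirect to the slash-terminated URL, and the already rewritten query string comes along for the ride. A sketch of one way around it, assuming the .htaccess sits in /root/ and that you are fine with switching that behaviour off for this folder:

```apache
# Stop mod_dir from redirecting /root/admin to /root/admin/
DirectorySlash Off

RewriteEngine On
RewriteBase /root/

# Do not rewrite index.php itself, or the rule would apply twice.
RewriteCond %{REQUEST_URI} !^/root/index\.php
RewriteRule ^(.*)$ index.php?url=$1 [L,QSA]
```

The Apache documentation warns that turning DirectorySlash off can expose directory listings in some configurations, so check that before relying on it.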
Hope this helps, but if you can clarify your question a bit then I can try and help again.

when to use which mod_rewrite rule for self routing?

There are several ways to write a mod_rewrite rule for self routing. At the moment I am using this one:
RewriteCond %{REQUEST_URI} !\.(js|ico|gif|jpg|png|css)$
RewriteRule ^.*$ index.php [NC,L]
But I could also use
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*) index.php
OR
ErrorDocument 404 /index.php
There may be many more.
Are there any drawbacks for using one of these examples?
Are there any use cases where one rule makes more sense than the other?
Could you explain the difference between these rules in detail?
Thx for your time and help.
When your condition is:
RewriteCond %{REQUEST_URI} !\.(js|ico|gif|jpg|png|css)$
Then only images, icons, styles, and JavaScript are excluded from routing. This means you can't access static HTML, directories, or directory indexes, so you can't just plop down a static HTML page somewhere and serve it without it getting routed through index.php. It also means that if you accidentally put an image, script, or style in the wrong place and try to access it, the request won't be routed through index.php; you get the default 404 error page instead.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
These conditions exclude any URI that points to an existing resource. So if you plop an image, a script, a directory, static HTML, etc. anywhere in your document root, you'll be able to go there without it being routed through index.php. Sometimes the condition RewriteCond %{REQUEST_FILENAME} !-l is also included, which excludes URIs that point to a symlink. This is usually what you'd see when doing routing; WordPress uses this.
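Put together, a front-controller block built from these tests might look like the sketch below. The three checks are -f (regular file), -d (directory) and -l (symbolic link); the use of index.php as the entry point is taken from the question, and the rest is just one common arrangement.

```apache
RewriteEngine On

# Skip anything that already exists on disk.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-l

# Everything else goes to the router script.
RewriteRule ^ index.php [L]
```

Consecutive RewriteCond lines are combined with AND unless the [OR] flag is given, so the rule only fires when all three tests pass.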
ErrorDocument 404 /index.php
This does essentially the same thing as the previous conditions, except it does it outside of mod_rewrite and there's no way to impose additional conditions in the future or as needed. The downside of doing routing outside of mod_rewrite is that mod_rewrite and the core directives (ErrorDocument in this case) do processing on the URI at different times in the URI-file mapping pipeline. So if you have rules that do other things, they could get applied, and then ultimately still get routed through index.php because the 2 directives are conflicting with each other. Simply because rewrite rules are applied at one point in the pipeline doesn't mean other directives won't get applied later down in the pipeline. This is a bad way to do routing.
There's also stuff like:
RewriteCond %{REQUEST_URI} !^/index.php
RewriteRule ^.*$ index.php [L]
Which will blindly route everything. Even javascript, even images, even static html, everything. Sometimes this is what people want. Ultimately, this is going to be dependent on what you want and what your index.php script does. Is it going to handle 404's? (like what you'd want in the first routing rule), is it just going to handle non-static resources? (like what the second rule does), or is it a literal catch all and will do everything (what the rule above does)?
Also note that your rewrite flags are different between the first and second rules. Those are significant if you have other rules.
The biggest drawback to the first example (which is the one you say that you use) is that it hard-codes the file extensions (.js, .ico, .gif, .png and so on) that are excluded from being rewritten to index.php. If you need to add new static content with an extension that is not in your exclusion list, you must modify the rule accordingly. For example, if you were to start hosting Flash content and needed to serve .swf and .flv files, you would have to update your existing RewriteCond.
The middle solution is best (IMHO) because it does exactly what it says it does: if the request does not match an existing file (the !-f condition) AND does not match an existing directory (the !-d condition), it is rewritten to index.php.

mod_rewrite expression for multi client application

I have a PHP application that serves multiple customers. The code is placed in the domain's root and is shared by all customers. Each customer can access its page by using the query string parameter "id".
I need advice and sample code on how to achieve this routing via mod_rewrite, or is it better to do it through a PHP routing script?
Home page:
www.example.com/customerA --> www.example.com/customerA/main?id=1
www.example.com/customerB --> www.example.com/customerB/main?id=4
Note: "main" is the file main.php, shown without its file extension.
The customer subfolders are virtual; they do not exist on disk.
Inner pages are using additional parameters like:
www.example.com/customerA/page1?id=1&par1=5
On SERVER SIDE all rewrites should be interpreted as www.example.com/main?id=4
without virtual subfolder.
Thanks.
Here's what should work:
RewriteRule ^/customerA/?$ /customerA/main?id=1 [QSA,NC,R=301,L]
RewriteRule ^/customerB/?$ /customerB/main?id=4 [QSA,NC,R=301,L]
RewriteRule ^/(customer(A|B))/main /main [QSA,NC]
Now that I've answered your question precisely, I'm pretty sure it's not what you want.
If you have a lot of customers: I wrote a long answer to a question that was about films rather than customers, but the principle is exactly the same.
If you want to be more generic:
# if URL is not a real file...
RewriteCond %{SCRIPT_FILENAME} !-f
# if URL is not a real folder...
RewriteCond %{SCRIPT_FILENAME} !-d
# ...and if adding ".php" points to a real file...
RewriteCond %{SCRIPT_FILENAME}.php -f
# ...then rewrite internally with the ".php" extension:
RewriteRule (.*) $1.php [QSA,NC]
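For many customers, one way to avoid a rule per customer is a lookup table via RewriteMap. This is a different technique from the rules above and only a sketch: the map name, the file path and its contents are made up for illustration, RewriteMap can only be declared in the server or virtual-host configuration (not in .htaccess), and it assumes the id is not already in the incoming query string.

```apache
# Server or virtual-host context only.
RewriteEngine On

# Hypothetical map file; each line is "<customer-name> <id>", for example:
#   customerA 1
#   customerB 4
RewriteMap customers "txt:/etc/apache2/customers.map"

# Home page: /customerA -> /main.php?id=1
RewriteCond ${customers:$1|NOTFOUND} !=NOTFOUND
RewriteRule ^/([^/]+)/?$ /main.php?id=${customers:$1} [QSA,L]

# Inner pages: /customerA/page1?par1=5 -> /page1.php?id=1&par1=5
RewriteCond ${customers:$1|NOTFOUND} !=NOTFOUND
RewriteRule ^/([^/]+)/([^/.]+)$ /$2.php?id=${customers:$1} [QSA,L]
```

Unknown names fall through untouched because the lookup returns the NOTFOUND default and the condition fails.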
Hope this helps.
