Apache rewrite condition for ajax crawling to pages with anchors - ajax

I have an AngularJS app on an Apache webserver that I would like to have indexed by search engines (i.e. Google/Bing bots etc.). I have a PhantomJS script to crawl and take snapshots of pages on my site, and I have followed the instructions from Google on how to redirect any http://mysite.com/?_escaped_fragment_=* requests to the appropriate pages.
The problem I'm facing is that I have a few routes in the app that change content based on the anchor, e.g. http://mysite.com/#!/about is different from http://mysite.com/#!/about#overview. I would like these changes to be indexed, but the hash character '#' is used for commenting and even escaping it with a backslash doesn't work. I have consulted other SO answers (e.g. Apache rewrite condition for ajax crawling and mod_rewrite page anchor), but I have not found instructions on how to deal with anchors.
I have two questions.
1. Is there a way to redirect URLs using mod_rewrite to snapshots that include anchors? For example, using the escaped version of '#' ('%23'):
http://mysite.com/?_escaped_fragment_=about%23overview => http://mysite.com/snapshots/about#overview.html
Here's what I currently have in my .htaccess file, though it does not work for pages with anchors:
RewriteEngine On
Options +FollowSymLinks
# Route for the index page
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/$
RewriteRule ^(.*)$ snapshots/index.html [NC,L]
# All other routes
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?(.*)$
RewriteRule ^(.*)$ snapshots/%1.html [NC,L]
2. If (1) is not allowed, my idea for solving this problem is to replace every '#' with '.' in the file names of the snapshots. Then I would need a mod_rewrite rule that replaces '#' with '.' in the _escaped_fragment_ query parameter. Going back to my example, I currently have a rule that takes /?_escaped_fragment_=about#overview and reroutes it to /snapshots/about.overview.html.
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/about%23overview$
RewriteRule ^(.*)$ snapshots/about.overview.html [NE,NC,L]
Is there a simple general rule I could use to implement this type of routing?
Any other ideas for how to solve this problem with general rewrite conditions would be appreciated, thanks!

I believe the following rule should work for you:
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=([^&]+) [NC]
RewriteRule ^$ /snapshots/%1.html? [R,NE,L]
It redirects /?_escaped_fragment_=about%23overview to /snapshots/about%23overview.html
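For the second idea in the question (snapshot files named with '.' in place of '#'), the mapping can be checked outside Apache. This is only a minimal Python sketch of the string logic, not mod_rewrite itself. The function name snapshot_path and the /snapshots/ layout are assumptions taken from the question, and the '.' naming scheme is the asker's proposal.

```python
import re

# Same pattern as the RewriteCond above; [NC] corresponds to re.IGNORECASE.
FRAGMENT_RE = re.compile(r"^_escaped_fragment_=([^&]+)", re.IGNORECASE)


def snapshot_path(query_string):
    """Map a raw (still percent-encoded) query string to a snapshot path.

    Returns None when the query string carries no _escaped_fragment_ value.
    """
    match = FRAGMENT_RE.match(query_string)
    if match is None:
        return None
    route = match.group(1)
    # Drop one leading slash, as the optional '/?' in the question's rules does.
    if route.startswith("/"):
        route = route[1:]
    # An empty route is the index page in the question's setup.
    if route == "":
        return "/snapshots/index.html"
    # %23 is the encoded '#'; swap it for '.' (the asker's proposed naming).
    route = route.replace("%23", ".")
    return "/snapshots/" + route + ".html"
```

In mod_rewrite terms this would probably mean a RewriteCond that captures the text on either side of %23 and a substitution that joins the two captures with a dot. Treat that as an untested direction rather than a finished rule.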

Related

Can't use parentheses in RewriteCond QUERY_STRING

Moved from https://serverfault.com/questions/1013461/cant-use-parentheses-in-rewritecond-query-string because it's on topic here.
I need to capture a UID from an old url and redirect it to a new format.
example.com/?uid=123 should redirect to example.com/user/123
What should work:
RewriteCond %{QUERY_STRING} ^uid=(\d+)$
RewriteRule ^$ /user/%1? [L]
This does not redirect at all.
However, this does:
RewriteCond %{QUERY_STRING} ^uid=\d+$
RewriteRule ^$ /user/%1? [L]
It goes to example.com/user. The UID is left out, but it DOES redirect.
Notice: All I did was remove the parentheses in the second example.
Why is this?? How can I match the query AND capture the value of UID?
Updates
This is a laravel app. I've discovered that the redirects I did see may have been coming from the app, not Apache.
Self-answer coming soon...
Temporarily adding R=302 gives the desired result:
RewriteCond %{QUERY_STRING} ^uid=(\d+)$
RewriteRule ^$ /user/%1? [L,R=302]
This, of course, sends a 302 redirect to /user/123. I'd like to see if this can be done with an internal rewrite though...
Here are some rules in laravel's default .htaccess:
# Handle Front Controller...
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ index.php [L]
This catches paths that do not point to real files and points them to the Laravel app. When this is removed, Apache responds with a 404 for /user/1234.
https://httpd.apache.org/docs/2.4/rewrite/flags.html#flag_l
Such a rewrite goes back to Apache's URL parser. Then the .htaccess is processed again (since it's still applicable to this new URL). At this point, I'd expect the above rules to pick up the non-existent path and point it to the laravel app...
Found it. Writing an answer now.
The Answer
MrWhite was right. You have to add R=302 or R=301 to perform a redirect. A plain ol' rewrite won't work.
RewriteCond %{QUERY_STRING} ^uid=(\d+)$
RewriteRule ^$ /user/%1? [L,R=302]
The Reason
So, the way Laravel works is:
1. You request /some/file.
2. .htaccess tells Apache, "hey Apache, if you have a request for a file that doesn't exist, just pretend it's for index.php".
3. Apache says, "hey PHP, I have a request to run index.php and the URL is /some/file".
4. PHP runs the script which --whoah-- is a huge Laravel application.
5. Whatever, "hey Laravel, the server said /some/file is the URL".
6. Laravel does all its fancy stuff, and it tries to match the URL to one of your routes.
Now, I added a rule to rewrite a certain URL to a virtual URL that Laravel should handle. I was matching against query parameters, but that was irrelevant. (see below for details)
When Apache's Rewrite Module hits a RewriteRule without an [R] flag, it rewrites the URL and sends it back to the URL Handler. Apache's URL Handler then processes the new URL against all the rules, including those in any applicable .htaccess files.
So all the proper rules did get applied.
Here's the key revelation:
The originally requested URL never changed. So while Apache was able to pass the request to PHP with the correct file, it was also sending along the old URL.
Therefore, we have to tell Apache to send a 301 or 302 Redirect response, instead of just rewriting the request. The user will send another request with the URL that Laravel needs to resolve the route.
But what about the different behavior with/without the parentheses?
The answer lies within Laravel's default .htaccess. Let's take a look at my old rules without the parentheses:
RewriteCond %{QUERY_STRING} ^uid=\d+$
RewriteRule ^$ /user/%1? [L]
Without the parentheses to grab the uid value, %1 is empty. So we end up rewriting the URL to just /user/.
Now, we have to look at another set of Laravel rules:
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)/$ /$1 [L,R=301]
This normalizes urls so that virtual paths/routes don't contain trailing slashes. Doing this makes route parsing easier.
This returns a 301 redirect to /user. This is very different from the 200 we were getting with the parentheses, but it does not mean the parentheses were behaving differently. As MrWhite said in the comments, surely something else was doing it.
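The chain described here can be traced with plain regular expressions. This is a rough Python sketch, not Apache: the helper names (percent_backref, strip_trailing_slash, trace) are made up for illustration, and it assumes /user/ is not a real directory, so the !-d condition passes.

```python
import re


def percent_backref(pattern, query_string, n):
    """Return what %n would expand to after a RewriteCond-style match.

    None means the condition did not match at all; an empty string means
    it matched but there is no capture group n.
    """
    match = re.search(pattern, query_string)
    if match is None:
        return None
    if n > match.re.groups:
        return ""
    return match.group(n) or ""


def strip_trailing_slash(path):
    """Apply ^(.*)/$ -> /$1 to a per-directory path (no leading slash)."""
    match = re.match(r"^(.*)/$", path)
    if match is None:
        return None  # the trailing-slash rule does not apply
    return "/" + match.group(1)


def trace(cond_pattern, query_string):
    """Follow a request for /?<query_string> through both rules."""
    uid = percent_backref(cond_pattern, query_string, 1)
    if uid is None:
        return None
    rewritten = "/user/" + uid
    redirected = strip_trailing_slash(rewritten.lstrip("/"))
    return redirected if redirected is not None else rewritten
```

With the capture group, trace(r"^uid=(\d+)$", "uid=123") ends at /user/123. Without it, the same request ends at /user, which is the visible redirect that made the two rule sets look so different.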
I hope you enjoyed the ride. And I hope even more that this will save some poor, confused soul from hours of torment. :)

Create two different rules on .htaccess: redirect some urls to my old site and create 301 redirects for news from my old to new site

I'm trying to work out how to redirect most URLs, which are not yet active on my new site, to my old site, while redirecting a few others to pages already imported into the new site.
Example:
I have a Joomla website, and I'd like to redirect
www.mywebsite.it/it to old.mywebsite.it/it
But I also have some specific 301 redirects for news items and a few pages:
it/notizie/2036-slugnews.html to notizie/year/notizia/slugnews
I'm working on Apache2, with MySQL 5.7, PHP 7.2 and October CMS,
with mod_rewrite enabled, and Laravel.
I have read some examples at:
https://www.danielmorell.com/guides/htaccess-seo/redirects/introduction-to-redirects
I imagine I have to write a condition that catches all URLs under /it/* but makes an exception for the ones that have their own 301 redirects.
Honestly, I can't figure out how to write the condition.
Do you have any suggestions?
You should place your special URL redirects before the general rule in your .htaccess file.
For example:
RewriteEngine on
RewriteRule ^it/notizie/2036\-slugnews\.html$ http://old.mywebsite.it/notizie/year/notizia/slugnews? [R=301,L]
RewriteCond %{HTTP_HOST} ^www\.mywebsite\.it$
RewriteRule ^(.*)$ http://old.mywebsite.it/$1 [R=301,L]
If you are new to redirects, I can suggest using the https://www.301-redirect.online/ 301 redirect generator for your special page-to-page rules. This generator is the best one I know of at the moment.
Also, change http:// to https:// if your site uses a secure connection.
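Why the order matters can be seen with a small first-match-wins simulation. This is a Python sketch under simplifying assumptions: it ignores the host condition (every request is treated as arriving on www.mywebsite.it), leaves out the trailing '?' that strips the query string, and the name first_redirect is invented for the example.

```python
import re

# (pattern, target) pairs in file order. Both rules carry [L], so the
# first one that matches ends processing for the request.
RULES = [
    (r"^it/notizie/2036-slugnews\.html$",
     "http://old.mywebsite.it/notizie/year/notizia/slugnews"),
    (r"^(.*)$", r"http://old.mywebsite.it/\1"),
]


def first_redirect(path, rules=RULES):
    """Return the redirect target chosen for a per-directory path."""
    for pattern, target in rules:
        match = re.match(pattern, path)
        if match:
            # expand() fills in \1 the way mod_rewrite fills in $1.
            return match.expand(target)
    return None
```

If the list is reversed, the catch-all pattern matches first and the specific news redirect is never reached, which is the situation the advice above avoids.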
After a series of trials I found an intermediate solution:
RedirectMatch 301 ^/it(.*)$ https://www.mysite.it$1
RewriteRule ^it/genericpage.html https://www.mywebiste.it/genericapagetoredirect? [R=301,L]
RewriteRule ^it/notizie/2002-slug.html https://www.mywebiste.it/notizie/2019/notizia/slugtoberedirected? [R=301,L]
This way I redirect my old https://www.mysite.it/it home page to https://www.mysite.it
and some URLs with one-to-one redirects.
But that is not my goal.
Thinking it through, I worked out what the definitive solution should do.
My goal is to redirect:
1) a set of URLs with specific 301 rewrite rules
2) all the links that are not in that list, generically, to https://www.mysite.it
3) all the bad links the Joomla website created under www.mysite.it/it/tags/* and www.mysite.it/it/components/* to www.mysite.it
In general, the goal is to keep the linked URLs that have high authority in the Google Search Console.
With this in mind I tried
RewriteRule ^it/genericpage\.html$ https://www.mywebiste.it/genericapage? [R=301,L]
RewriteCond %{HTTPS_HOST} ^www\.mysite\.it$
RewriteRule ^it/$ https://www.mysite.it/? [R=301,L]
and
RewriteRule ^it/genericpage\.html$ https://www.mywebiste.it/genericapage?[R=301,L]
RewriteCond %{HTTPS_HOST} ^www\.mysite\.it$
RewriteRule ^it(/.*) $1 [R=301,L]
But without success.
What do you think?

301 redirect in .htaccess doesn't work: redesigned a site but can't get the old URLs to link to the new ones

I redesigned this website and tried to create 301 redirects from the old links to the new ones in order to keep the existing Google PageRank. I am using Joomla for my site.
Example of a link I tried:
Old link: frutaplantaonline.nl/cgi-bin/index.pl?n=823&txt=privacy_policy
New link: frutaplantaonline.nl/privacy-policy
I tried:
Redirect 301 /cgi-bin/index.pl?n=823&txt=privacy_policy http://frutaplantaonline.nl/privacy-policy
RewriteRule ^/?cgi-bin/index.pl?n=823&txt=privacy_policy/?$ http://frutaplantaonline.nl/privacy-policy [L,R=301]
All the links have cgi-bin/index.pl in them; if I remove the dot in index.pl I can redirect the site.
I have been searching for hours and hours but found no solution. I'd appreciate it if someone could help me out!
The Redirect directive is useless in your case, as it only matches on paths (i.e. /cgi-bin/index.pl).
You need to use RewriteRule directive and accompany it with RewriteCond to match against query string. From the official documentation:
If you wish to match against the hostname, port, or query string, use a RewriteCond with the %{HTTP_HOST}, %{SERVER_PORT}, or %{QUERY_STRING} variables respectively.
So in your case something like this might do:
RewriteCond %{QUERY_STRING} n=823&txt=privacy_policy
RewriteRule ^/?cgi-bin/index\.pl$ http://frutaplantaonline.nl/privacy-policy? [R=301,L]
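The split between path and query string can be checked with Python's standard library. This is an illustrative sketch only; the variable names are made up, and stripping the leading slash imitates what a RewriteRule sees in a .htaccess file.

```python
import re
from urllib.parse import urlsplit

OLD_URL = "http://frutaplantaonline.nl/cgi-bin/index.pl?n=823&txt=privacy_policy"

parts = urlsplit(OLD_URL)
path = parts.path.lstrip("/")  # what a RewriteRule in .htaccess is matched against
query = parts.query            # what %{QUERY_STRING} holds

# The question's pattern: the '?' acts as a regex quantifier, and the query
# string is never part of the path, so this cannot match.
question_pattern = r"^/?cgi-bin/index.pl?n=823&txt=privacy_policy/?$"
path_matches_question = re.search(question_pattern, path) is not None

# Splitting the check: path in the rule, query string in the condition.
path_matches = re.search(r"^/?cgi-bin/index\.pl$", path) is not None
query_matches = re.search(r"n=823&txt=privacy_policy", query) is not None
```

Here path_matches_question is False while path_matches and query_matches are both True, which is why the rule needs a RewriteCond next to it.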

Forwarding only the root of a subdomain to a folder in the main domain (mod_rewrite)

I think I'm misunderstanding some fundamental thing here, as this is my first time using mod_rewrite.
I would like the following:
blog.example.com
blog.example.com/
blog.example.com/index.php
to redirect to:
example.com/blog
However, I would like all other cases to do nothing, e.g.
blog.example.com/foobar
blog.example.com/wp-admin
blog.example.com/wp-admin.php
This is my current .htaccess:
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} =blog.example.com
RewriteCond %{REQUEST_URI} ^|/|index\.php$
RewriteRule ^(.*)$ http://www\.example\.com/blog [QSA,L]
I have tried many variants but each time either all requests redirect, or none redirect, so mod_rewrite is definitely doing something, just not what I want.
I have skimmed through this post for anything relevant, but I think the issue is more that I'm missing some subtle fact due to inexperience. Would someone kindly point out my error? I know we don't need yet another mod_rewrite question on SO but I'm really struggling with this one. Thanks in advance.
Try this:
RewriteEngine On
RewriteCond %{HTTP_HOST} =blog.example.com [NC]
RewriteRule ^/?(index\.php)?$ http://www.example.com/blog [QSA,R,L]
Some basic points:
The second part of the rewrite rule is not a regular expression, so periods do not need to be escaped there.
I was not familiar with the =String option for RewriteCond. Thank you for teaching me something today (I was worried when I did not see a regex there but that should be fine)! I would add [NC] here since it will help match both lowercase and uppercase versions of your domain.
None of this will work unless your virtual host includes the blog.example.com subdomain as a proper alias, and DNS is set up for that subdomain.
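The "all requests redirect" symptom from the question can be reproduced with the two patterns alone. A small Python sketch, assuming Python's re module behaves like Apache's PCRE for these simple patterns; the helper name matching is hypothetical.

```python
import re

QUESTION_PATTERN = r"^|/|index\.php$"   # RewriteCond pattern from the question
ANSWER_PATTERN = r"^/?(index\.php)?$"   # RewriteRule pattern from the answer


def matching(pattern, candidates):
    """Return the candidates in which the pattern matches somewhere."""
    return [c for c in candidates if re.search(pattern, c)]


# %{REQUEST_URI} values, with the leading slash.
uris = ["/", "/index.php", "/foobar", "/wp-admin", "/wp-admin.php"]
# The same requests as a RewriteRule in .htaccess sees them (slash stripped).
paths = ["", "index.php", "foobar", "wp-admin", "wp-admin.php"]

question_hits = matching(QUESTION_PATTERN, uris)
answer_hits = matching(ANSWER_PATTERN, paths)
```

The first alternative in the question's pattern is a bare ^, which matches at the start of any string, so question_hits contains every URI. The answer's pattern accepts only the empty path and index.php.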

htaccess rewrite

I would like to rewrite /anything.anyextension into /?post=anything.
eg:
/this-is-a-post.php into /?post=this-is-a-post or
/this-is-a-post.html into /?post=this-is-a-post or even
/this-is-a-post/ into /?post=this-is-a-post
I tried
RewriteRule ^([a-zA-Z0-9_-]+)(|/|\.[a-z]{3,4})$ ?$1 [L]
but it doesn't work.
Any help appreciated.
If you have access to the main server configuration, use this:
RewriteRule ^/(.+)\.\w+$ /?post=$1 [L]
If not, and you are forced to put this in a .htaccess file, you could try:
RewriteRule ^(.+)\.\w+$ /?post=$1 [L]
In either case, this assumes you will only be rewriting URLs with a single path component (i.e. a request like /path/anything.anyextension might not work as you expect; the rewrite rule would need to be modified to handle that).
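The question also asks for the trailing-slash form (/this-is-a-post/), which a pattern that requires an extension does not cover. One possible pattern for all three forms is sketched below in Python. The pattern and the name rewrite_post are assumptions for illustration, not something taken from either answer, and they are tested here only against Python's re module.

```python
import re

# Slug, then optionally either a trailing slash or a dot plus an extension.
SLUG_RE = re.compile(r"^([A-Za-z0-9_-]+)(?:/|\.[A-Za-z0-9]+)?$")


def rewrite_post(path):
    """Map a per-directory path to /?post=<slug>, or None if it does not fit."""
    match = SLUG_RE.match(path)
    if match is None:
        return None
    return "/?post=" + match.group(1)
```

Note that a pattern this broad also matches real files such as index.php or style.css in the same directory, so in practice it would need a guard against existing files.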
You need a better way to determine when to apply the rewrite rule, otherwise your page won't be able to display external JS or CSS, unless you define an exception.
SilverStripe (or the core, Sapphire) offers a good approach to this, something like:
RewriteEngine On
RewriteCond %{REQUEST_URI} !(\.css)|(\.js)|(\.swf)$ [NC]
RewriteCond %{REQUEST_URI} .+
RewriteRule ^([^\.]+) /?post=$1 [L,R=301]
This requires the URI not to be empty, not to be JS, CSS or SWF, and redirects back to your root directory:
http://localhost/this-is-a-post.php
http://localhost/?post=this-is-a-post
If you don't want a redirect but an internal rewrite, remove the R=301 flag.
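One detail of the exclusion condition is worth checking: in (\.css)|(\.js)|(\.swf)$ the $ anchors only the last alternative, so .css and .js are matched anywhere in the URI. The Python sketch below compares it with a tighter variant. The TIGHT pattern and the function name is_excluded are suggestions for illustration, not part of the SilverStripe rules.

```python
import re

LOOSE = r"(\.css)|(\.js)|(\.swf)$"  # alternation as written in the condition
TIGHT = r"\.(css|js|swf)$"          # extension anchored at the end for all three


def is_excluded(pattern, uri):
    """True if the URI would be skipped by a negated (!) condition."""
    return re.search(pattern, uri, re.IGNORECASE) is not None
```

With LOOSE, a URI such as /about.json-feed is excluded because it contains ".js"; with TIGHT it is not, while /site.css is excluded by both.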

Resources