Google AJAX crawling is failing

I have a PHP AJAX website that serves pages to my users like:
http://www.example.com/ => has all the individual page contents, like the listing
http://www.example.com/#!page1-uid => has page1 contents; uid is the unique MongoDB identifier for that page
http://www.example.com/#!page2-uid => has page2 contents; uid is the unique MongoDB identifier for that page
I want Google to crawl my website and index all 200+ pages, but none of them are getting indexed.
I pretty much followed and understood Google's AJAX crawling scheme, but I'm not sure where/what I am still missing.
Here is the setup:
.htaccess
RewriteCond %{HTTP_USER_AGENT} (googlebot|yahoo|bingbot|baiduspider) [NC]
RewriteCond %{QUERY_STRING} _escaped_fragment_=(.*)$
RewriteRule ^(.*)$ botIndex.php?QSA=%1 [QSA,L]
botIndex.php
$var1 = $_REQUEST['QSA'];
It checks whether QSA is set; if so, it serves the individual page1/page2 content;
otherwise it returns the default home page that lists all the page links.
When I tested using GWT ("Fetch as Google"), here is the pattern I observe:
a) www.example.com/ => it gets rewritten to botIndex.php and returns all the links (default view), just as expected
b) www.example.com/#!page1-uid => rewrites to botIndex.php but returns all the links, when ideally it should return the actual page content instead of the home page contents (I'm not sure GWT is able to request the _escaped_fragment_ form to mimic Googlebot)
c) www.example.com/?_escaped_fragment_= => GWT returns a "Not found" error
By adding a few echo statements to botIndex.php, I can see that none of the above requests shows the "_escaped_fragment_" parameter being caught;
hence my script botIndex.php never gets the value of the QUERY_STRING (QSA) needed to serve the individual page1/page2 pages, and instead always
defaults to the home page showing the full page listing.
I tested the URLs for botIndex.php directly, like:
a) http://www.example.com/botIndex.php?_escaped_fragment_=QSA= (returns all the links)
b) http://www.example.com/botIndex.php?_escaped_fragment_=QSA=page1-uid (returns the actual page details)
What am I still missing?
I strongly suspect the .htaccess is the issue, and that it is not passing the QSA value to my script.
Please suggest.
UPDATE: I am still stuck. Can anyone help me with some pointers?

Evidently, you have a problem with preserving GET parameters during rewriting. Try to debug your .htaccess directives.
Another option is to create a single entry point for your PHP app, just like all modern frameworks do, and implement all of the logic (including serving HTML content for bots) inside your PHP app.
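As a minimal sketch of the first suggestion, assuming botIndex.php sits in the document root: the user-agent condition can be dropped (Googlebot itself converts #! URLs into the ?_escaped_fragment_= form), and a guard condition keeps the rule from matching its own rewritten request:

```apache
RewriteEngine On
# Googlebot requests /#!page1-uid as /?_escaped_fragment_=page1-uid,
# so matching on the query string alone is enough
RewriteCond %{QUERY_STRING} (?:^|&)_escaped_fragment_=(.*)$
# Do not rewrite the request we have already rewritten
RewriteCond %{REQUEST_URI} !botIndex\.php
RewriteRule ^ botIndex.php?QSA=%1 [L]
```

Dropping the [QSA] flag avoids appending the original _escaped_fragment_ parameter a second time; %1 refers to the group captured by the last matched RewriteCond.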

Related

How can I redirect non www http to https www in web.config

I want to redirect all of the following types of requests to https://www.example.com
http://example.com
http://www.example.com
https://example.com
This depends on how you want to redirect. You could...
Redirect from the client side. This is easy to do: simply add a script tag in the <head> of the HTML document containing the following.
const excluded = ['http://www.example.com', 'https://example.com', 'http://example.com'];
if (excluded.indexOf(location.origin) !== -1) {
  // preserve the path and query string when redirecting
  location.href = 'https://www.example.com' + location.pathname + location.search;
}
There is one problem with this approach: the redirect only happens client-side, after the page has already started loading, so every visit to a non-canonical URL costs an extra round trip. The second option is safer.
If you know how, you could use .htaccess. You will find this file in the root directory of your web server. Add the following lines:
RewriteEngine On
RewriteCond %{HTTPS} !=on [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule (.*) https://www.example.com%{REQUEST_URI} [R=301,L]
(Check out this question for more on this.)
If the page is not strictly static, you could redirect from the backend. There are a number of ways to do this, the most likely being to just use an npm module such as forcedomain (my personal favorite) to redirect when the request is made, rather than when the page loads. This is more efficient and the browser will not get cranky on you.
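To make the server-side decision concrete, here is a hand-rolled sketch (this is not the forcedomain API; redirectTarget is a hypothetical helper) that computes the canonical redirect for a request, or null when the request is already canonical:

```javascript
// Hypothetical helper: decide whether a request needs a canonical redirect.
// Returns the absolute URL to 301 to, or null if already on https://www.example.com.
function redirectTarget(protocol, host, url) {
  const canonicalHost = 'www.example.com';
  if (protocol === 'https' && host === canonicalHost) {
    return null; // already canonical, just serve the page
  }
  return 'https://' + canonicalHost + url;
}

// In a Node.js server you would reply with:
//   res.writeHead(301, { Location: redirectTarget(proto, req.headers.host, req.url) });
console.log(redirectTarget('http', 'example.com', '/support')); // https://www.example.com/support
console.log(redirectTarget('https', 'www.example.com', '/'));   // null
```

Doing the check on the server means the browser receives a real 301 before any HTML is sent, which is what search engines expect.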
Use DNS-level forwarding. If you aren't familiar with DNS, I would disregard this part of the answer. A true CNAME is not allowed at the apex (example.com), but many DNS providers offer an ALIAS/ANAME record or a URL-forwarding feature that sends apex requests on to the subdomain (www.example.com). When it comes to the protocol issue, there are a few ways of handling it, the best option being wildcard redirects (if your registrar provides this).
As a general rule of thumb, it's best to use DNS records, the .htaccess file, or some sort of backend plugin if possible for redirects if you know how. This all depends on your hosting, the nature of your website (static or dynamic), and your level of knowledge on these sorts of things.
If you would rather not manage any of this yourself, Firebase Hosting handles it for you: https://firebase.google.com/docs/hosting/quickstart

Google does not index startpage (index.html) of AJAX application correctly but all subpages containing a hashbang (#!)

I followed the Google guideline "Making AJAX Applications Crawlable" to make my AngularJS application crawlable for SEO purposes, so I am using #! (hashbang) in my route configuration:
$locationProvider.hashPrefix('!');
So my URLs look like this:
http://www.example.com/#!/page1.html
http://www.example.com/#!/page2.html
...
As Google replaces the hashbang (#!) with ?_escaped_fragment_=, I redirect the Google bots via my .htaccess file to a snapshot of the page:
DirectoryIndex index.html
RewriteEngine On
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=/?(.*)$
RewriteRule ^(.*)$ /snapshot/%1? [NC,L]
So far everything works like a charm. When a bot requests the URL http://www.example.com/#!/page1.html, it replaces the hashbang and actually requests http://www.example.com/?_escaped_fragment_=/page1.html, which I rewrite to the static/prerendered version of the requested page.
So I submitted my sitemap.xml via the Search Console in Google Webmaster Tools. All URLs in my sitemap are indexed correctly by Google, but not the domain itself. That means a page like:
http://www.example.com/#!/page1.html
is indexed correctly, and by googling for specific content from any of my subpages, Google finds the correct page. The problem is the start/home page itself, which "naturally" has no hashbang:
http://www.example.com/
The hashbang here is appended (via JavaScript in my router configuration) when a user visits the site, but apparently this does not happen for the Google bot.
So the crawler does not "see" the hashbang and hence does not use the static version here, which is a big issue because this page in particular provides the most important content.
I already tried to rewrite and redirect / via .htaccess to /#!/, but this ends up in too many redirects and breaks everything. I also tried to use
<meta name="fragment" content="!">
in the header of the index.html. But this did not help at all.
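One thing worth checking (a sketch, assuming a prerendered home page exists at /snapshot/index.html): with the meta fragment tag in place, the bot fetches the start page as /?_escaped_fragment_= with an empty value, so that case can be mapped to its own snapshot instead of relying on the generic rule:

```apache
# The bot requests the home page as /?_escaped_fragment_= (empty value)
RewriteCond %{QUERY_STRING} ^_escaped_fragment_=$
RewriteRule ^$ /snapshot/index.html? [L]
```

The trailing ? on the substitution strips the query string so the rule does not fire again on the rewritten request.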
Does anybody else faced that problem before?

How to redirect a link on my website to an external link?

I have provided a link from my website in my Android app:
www.mysite.com/support
And I want this link to redirect to:
www.anothersite.com
I have already tried com_redirect, entering support as the source and http://www.anothersite.com as the destination, but I don't get any redirect; I get a 404 error instead.
I am running Joomla 3.x, and I want to know how I can do this with URL rewrites and no external components.
It does not seem possible within the Joomla backend (I tried many combinations of menu items and the redirect module).
For sure you can write your redirect directly in your .htaccess (if you are using it) like this:
# Turn on the rewriting engine
RewriteEngine On
# Permanent move (301)
RewriteRule ^support/?$ http://www.anothersite.com/ [R=301,NC,L]
or
# Turn on the rewriting engine
RewriteEngine On
# Temporary move (302)
RewriteRule ^support/?$ http://www.anothersite.com [R,NC,L]
Note that Apache does not allow trailing comments on directive lines, so the comments must sit on their own lines.
More info about URL rewriting and .htaccess is available here: http://www.addedbytes.com/articles/for-beginners/url-rewriting-for-beginners/

.htaccess redirect for AJAX search engine indexing

Ok, so I made a pure HTML/JavaScript AJAX website, but I want my pages to be indexable by Google.
I have my content files, with meta information, in plain HTML but without the menu bar etc., and I have my index.htm with all the menu bars, JavaScript AJAX stuff, etc.
To make AJAX indexable by Google, my URLs should look like "<something>#!<somethingelse>", which the Google index bot will change to "<something>?_escaped_fragment_=<somethingelse>", so that my server knows it should return the content directly instead of the page that loads it via AJAX.
However, since my server is stupid and does no server-side processing, I need to perform a trick via .htaccess (which is where I fail :( ).
The idea is as follows:
I have my fancy URLs http://mysite.com/page1#!1, http://mysite.com/page2#!1, etc.
Normally, .htaccess should rewrite those to /index.htm?page=page1 so that my AJAX code reads the URL parameter and automagically loads the page1.htm content file.
For the Google indexer, it should skip this rewrite for any URL containing "?_escaped_fragment_=1", so that the URL points to the content page directly.
This way I have to make a small compromise by putting #!1 in every fancy URL, but as far as I can tell it is the only way to do this without server-side processing (except for .htaccess, of course).
I just can't seem to get the rewrite rules to do this.
Here's what I've come up with so far:
RewriteEngine on
RewriteCond %{QUERY_STRING} (^|.*&)_escaped_fragment_=1(&.*|$)
RewriteRule ^(.*)$ %1 [L,R=301]
RewriteRule ^(.*)$ /index.htm?page=%1 [L,R=301]
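For comparison, a sketch of rules that would implement the logic described above (assuming each content page is stored as /<page>.htm next to index.htm; note that %1 in the rules above captures the text before _escaped_fragment_, not the page name, which is likely why they fail):

```apache
RewriteEngine On
# Crawler request: /page1?_escaped_fragment_=1 -> serve the content file /page1.htm directly
RewriteCond %{QUERY_STRING} (^|&)_escaped_fragment_=1(&|$)
RewriteRule ^([^/]+)$ /$1.htm? [L]
# Everyone else: route /page1 through index.htm, which loads page1.htm via AJAX
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)$ /index.htm?page=$1 [L]
```

The !-f condition keeps real files (including index.htm and the .htm snapshots) from being rewritten again, and the trailing ? strips the query string so the first rule cannot loop.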

htaccess redirect raw image viewing

I'd like to redirect any image that is viewed directly to a handler page:
if http://mysite.com/pics/filename.jpg (or www.) is in the URL,
then redirect to http://mysite.com/pics/index.php?img=filename.jpg (no www).
But if the image is being requested by a webpage or a mobile HTML5 app anywhere, then it should be served as normal.
So if mypage.html contains an img tag with the direct photo in it, it will be shown in that page. But if http://mysite.com/pics/filename.jpg is the URL, then it should redirect. In other words, if the file is being viewed directly it should redirect to the wrapper page, but if it's already in a wrapper page (anywhere) it shouldn't redirect.
I've seen various redirect snippets, but none that reference the visible URL in their conditions, so I don't know how to do this. And the ones I've found and tried don't work, either redirecting all requests or doing nothing.
Thanks!
This will take care of people hotlinking to images on your site:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?mysite\.com [NC]
RewriteRule ([^/]+)\.(jpg|jpeg|png|gif)$ pics/index.php?img=$1.$2 [NC,L,R]
As for your mobile HTML5 app, you are going to need some other way to identify it to your web server. Start by checking what Referer header it sends when requesting images; it should be easy to write a small PHP script that echoes that back, and then add a matching condition to the code above.
Simon
