Redirect AJAX page requests to canonical links with .htaccess - ajax

I'm coding a site that makes heavy use of AJAX to load pages for users with JavaScript, but I also want it to be friendly for users with JavaScript disabled or unavailable. I've covered all the basics; for example, all my links point to canonical links, and JavaScript loads them via AJAX. My "about" page, therefore, is located at /about/, but will load on the main page and will, once finished, utilize hash/hashbang links to enable back-button functionality.
Here's the problem I have: a hash/hashbang link can be used to link to a specific page via AJAX for users with JavaScript, but if a user with JavaScript sends such a link to someone without it, the page cannot be loaded for that person using AJAX.
As such, I'd like to be able, if possible, to use .htaccess to redirect hash/hashbang-specified pages to the canonical link. In other words, the exact opposite of what this contributor was trying to achieve.
http://example.com/#!about --> http://example.com/about/
Is it possible with .htaccess, or otherwise without JavaScript? If so, how?
Thanks!

I don't think it's possible to do this on the server side, because the part of the URL after # is not included in the request sent to the server.
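To illustrate the point about the fragment, here is a minimal Python sketch (Python is used only for illustration; `request_target` is a hypothetical helper name). It splits a URL the way a client would and rebuilds only the part that goes on the HTTP request line, which never includes anything after #.

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path and query that a client puts on the HTTP request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is deliberately left out: it stays in the browser.
    return target


print(request_target("http://example.com/#!about"))     # /
print(urlsplit("http://example.com/#!about").fragment)  # !about
```

Since the server only ever sees `/` for this URL, a rewrite rule in .htaccess has nothing to match against.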

I might be a bit late to the party on this one, but I'm looking into this too. Since your URL already contains #!, as opposed to #, you can actually do this. Google will fetch
http://example.com/#!about
as
http://example.com/?_escaped_fragment_=about
Therefore, if you put a 301 redirect on that URL, and use JavaScript to redirect regular users from the hashbang URL, you have practically reached your desired result.
I realise you asked for a no-JavaScript solution, but I figure that was for reasons of SEO. For more information, please see this page by Google.
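As a rough sketch of the server-side half of this idea, the Python function below maps a crawler request carrying `_escaped_fragment_` to a canonical path that could be used as the `Location` of a 301 response. The function name `canonical_redirect` and the mapping from a state `about` to `/about/` are assumptions for illustration; a real site would use its own URL scheme.

```python
from urllib.parse import parse_qs, urlsplit


def canonical_redirect(url: str):
    """Return the canonical path for a crawler request, or None if no redirect applies."""
    parts = urlsplit(url)
    values = parse_qs(parts.query, keep_blank_values=True).get("_escaped_fragment_")
    if not values:
        return None  # an ordinary request, nothing to redirect
    state = values[0].strip("/")
    base = parts.path if parts.path.endswith("/") else parts.path + "/"
    # Assumed mapping: the page state becomes one trailing path segment.
    return base + state + "/" if state else base


print(canonical_redirect("http://example.com/?_escaped_fragment_=about"))  # /about/
```

The same mapping could be expressed as a rewrite rule on the query string in .htaccess, because the query string, unlike the fragment, does reach the server.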
EDIT:
<meta http-equiv="refresh" content="5; url=http://example.com/">
Some more on meta refresh here.
It:
1) Does not require JavaScript!
2) Can be SEO friendly!
3) Works with bookmarks and history (etc.)
I hope this helps!

Related

Can we identify googlebot like search engines hit on particular URL

My Problem:
My client's site displays a large number of products, which adds to the page weight and load time. So I decided to load more products via AJAX, and that works well. But it affects SEO: no products or deals have been indexed. (I suggested that the client submit products via Google Base, but the client does not like that idea; he wants Google to crawl the site directly, and he also wants a shorter page load time.)
Question:
Can we distinguish, on the server, a Googlebot crawl request from a request made by a regular browser (a Mozilla-like user agent)?
Suggestion I have
I tried to identify the user agent from the requests, but that does not work (or I might be missing something). Does anyone have a correct solution that reduces the page load time using AJAX and still lets Googlebot crawl the website?
You should just search Stack Overflow for "Google AJAX SEO". There are a number of questions around this.
In short, Google has a specification to make AJAX sites crawlable: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started?hl=sv-SE
You can also look into pushState as an SEO option.
One tactic that is used to solve this is to harness the pagination function of whatever framework or CMS you are using. You load one page of content and display pagination links in your view then use JavaScript to hide the pagination links and fetch the content of the linked pagination page via Ajax and append it to the current page. Take a look at how infinite-scroll works for inspiration:
http://www.infinite-scroll.com/
Basically, you need to at least load links to the pages that contain the other content so that search engines can crawl it, but you can hide these links for users who have JavaScript enabled.
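The pagination tactic described above could be sketched on the server side roughly as follows. This is only an illustration in Python: the helper `page_of` and the `/products/?page=N` URL pattern are hypothetical, and a framework's or CMS's built-in paginator would normally do this work.

```python
def page_of(items, page, per_page=10):
    """Return the items for a 1-based page and the href of the next page (or None)."""
    start = (page - 1) * per_page
    chunk = items[start:start + per_page]
    has_more = start + per_page < len(items)
    # Hypothetical URL pattern; this is the plain link that crawlers follow.
    next_href = f"/products/?page={page + 1}" if has_more else None
    return chunk, next_href


products = [f"product-{n}" for n in range(25)]
first_page, next_href = page_of(products, 1)
print(len(first_page), next_href)  # 10 /products/?page=2
```

The view renders `next_href` as an ordinary link. With JavaScript enabled, a script can hide that link, fetch its target via Ajax, and append the returned items to the current page.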
But to better answer your question, it is possible to redirect robots using htaccess:
redirect all bots using htaccess apache
But it is better SEO, as far as I understand it, to have the content, or links to it, actually available on the page.
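On the user-agent question itself, a minimal Python sketch of such a check might look like the following. The token list and the function name `looks_like_crawler` are illustrative assumptions, and User-Agent headers can be spoofed, so this is a hint rather than proof that a request comes from a search engine.

```python
# Illustrative tokens only; extend the tuple for other crawlers as needed.
CRAWLER_TOKENS = ("googlebot", "bingbot")


def looks_like_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a known crawler token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
firefox = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
print(looks_like_crawler(googlebot))  # True
print(looks_like_crawler(firefox))    # False
```

Both strings start with "Mozilla/5.0", which is why matching on "Mozilla" alone cannot tell a crawler from a browser.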

Is my AJAX content already crawlable?

I have built a site based on Ajax navigation.
I have built it so that whenever someone without JavaScript visits my site, the nav links, which usually load content via Ajax, act like normal links and the user can browse through the pages as usual.
Since Googlebot doesn't run JavaScript, it should theoretically be able to go through all links and corresponding pages as usual, right? They are valid links with the href attribute pointing to the corresponding page.
Now I was wondering whether that's sufficient, or whether I also need to implement this method from Google to make sure Google sees all my content.
Thanks for your insights and excuse my poor English!
If you can navigate your site by viewing the source (Ctrl+U in Chrome), Google can also crawl your site. Yes, it's that simple.
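The "view source" check can be approximated in code. The Python sketch below (class and function names are hypothetical) parses raw HTML without running any JavaScript and lists the link targets a crawler could follow, skipping hrefs that only make sense with scripting.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags from raw, unexecuted HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip fragment-only and javascript: links; they lead nowhere without JS.
            if name == "href" and value and not value.startswith(("#", "javascript:")):
                self.hrefs.append(value)


def crawlable_links(html: str):
    collector = LinkCollector()
    collector.feed(html)
    return collector.hrefs


page = '<nav><a href="/about/">About</a> <a href="#" onclick="load()">More</a></nav>'
print(crawlable_links(page))  # ['/about/']
```

If every page of the site shows up in lists like this one, the plain links are doing their job.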

when to use AJAX and when not to use AJAX in web application

We have web applications elgifto.com, roadbrake.com in which we used AJAX at many places, especially to update major portions of a page. All the important functionality of elgifto.com was implemented using AJAX. Now we realize a few issues due to AJAX implementation.
1) All the content implemented using AJAX is not available to the SEO bots, and it is hurting the page rank of our site.
2) Users will not be able to bookmark some of the pages, as they are always available through AJAX.
3) When we want to direct the user from one page through an anchor link to another page having AJAX, we find it difficult.
So now we are thinking of removing AJAX from these pages and using it only for small pieces of functionality, such as something similar to marking a question as a favorite on SO. Before going ahead and removing it, we want to know the experts' opinion on this. Thanks.
The problem is not "AJAX" per se, but your implementation of it. Just as a for instance, you can fix the 'bookmark' problem the way Google Maps does it: provide a generated link for each state of your webapp.
SEO can be fixed by supplying a number of these state links to the crawlers, either organically through links in your site, or by supplying a list (sitemap).
If you implement 2, you can fix 1 and 3 with those links.
In the end you must figure out whether the effort is worth it, and whether you are overusing AJAX, of course, but the statements you've made are not set in stone at all.
I'm constantly developing Ajax-based websites, with no problems for SEO at all. You just have to use it in the best possible way.
For example, I have a website with normal links pointing to normal webpages (PHP pages); this is for normal navigation if a user doesn't have JS enabled. But if a user has JS enabled, a script will change the links' behavior, fetching only the content of the page needed.
This way you still have physically separate webpages with all their content, which will be indexed as normal.
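One possible way to serve both kinds of request from the same URL is sketched below in Python rather than the PHP the answer mentions. It assumes the script marks its Ajax requests with the conventional `X-Requested-With: XMLHttpRequest` header; the `PAGES` table and the `render` function are hypothetical, and the answer does not say this is how its site works.

```python
# Hypothetical page bodies keyed by canonical path.
PAGES = {"/about/": "<h1>About</h1><p>Who we are.</p>"}

LAYOUT = (
    '<html><body><nav><a href="/about/">About</a></nav>'
    '<div id="content">{body}</div></body></html>'
)


def render(path, headers):
    """Return (status, html): a bare fragment for Ajax calls, a full page otherwise."""
    body = PAGES.get(path)
    if body is None:
        return 404, "Not found"
    if headers.get("X-Requested-With") == "XMLHttpRequest":
        return 200, body  # the script injects this into the current page
    return 200, LAYOUT.format(body=body)  # crawlers and no-JS users get everything


print(render("/about/", {"X-Requested-With": "XMLHttpRequest"})[1])
```

Every link keeps a real, indexable URL, and the Ajax behavior is layered on top of it.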

Log in form in a lightbox

We've been trying to implement a site with a http home page, but https everywhere else. In order to do this we hit the rather big snag that our login form, in a lightbox, would need to fetch a https form using ajax, embed it in a http page and then (possibly) handle the form errors, still within the lightbox.
In the end we gave up and just made the whole site https, but I'm sure I've seen a login-in-a-lightbox implementation on other sites, though can't find any examples now I want to.
Can anyone give any examples of sites that have achieved this functionality, or explain how/why this functionality can/can't be achieved?
The Same Origin Policy prevents this. The page is either 100% HTTPS or it's not. The Same Origin Policy sees this as a "different" site if the protocol is not the same.
A "lightbox" is not different than any other HTML - it's just laid out differently. The same rules apply.
One option would be to use an iFrame. It's messy, but if having the whole shebang in https isn't an option, it can get the job done.
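To make the Same Origin Policy point concrete: an origin is the combination of scheme, host, and port, so an http page and an https page on the same host are different origins. A small Python sketch of that comparison (the helper names are hypothetical, and browsers implement this check themselves):

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, (parts.hostname or "").lower(), port


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


print(same_origin("http://example.com/", "https://example.com/login"))       # False
print(same_origin("https://example.com/", "https://example.com:443/login"))  # True
```

This is why a script on the http home page cannot simply fetch the https login form with a plain Ajax request.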
You might be able to put the login form into an iframe so that users can log in through https while it seems they are on an http page, but I'm not sure why you would want to do this.

With Google's #! mess, what effect would a redirect on the converted URL have?

So Google takes:
http://www.mysite.com/mypage/#!pageState
and converts it to:
http://www.mysite.com/mypage/?_escaped_fragment_=pageState
...So... Would it be fair game to redirect that with a 301 status to something like:
http://www.mysite.com/mypage/pagestate/
and then return an HTML snapshot?
My thought is if you have an existing html structure, and you just want to add ajax as a progressive enhancement, this would be a fair way to do it, if Google just skipped over _escaped_fragment_ and indexed the redirected URL. Then your ajax links are configured by javascript, and underneath them are the regular links that go to your regular site structure.
So then when a user comes in on a static url (ie http://www.mysite.com/mypage/pagestate/ ), the first link he clicks takes him to the ajax interface if he has javascript, then it's all ajax.
On a side note, does anyone know if Yahoo/MSN are on board with this 'spec' (loosely used)? I can't seem to find anything that says for sure.
If you redirect the "?_escaped_fragment_" URL, it will likely result in the final URL being indexed (which might result in a suboptimal user experience, depending on how you have your site set up). There might be a reason to do it like that, but it's hard to say in general.
As far as I know, other search engines are not yet following the AJAX-crawling proposal.
You've pretty much got it. I recently did some tests and experimented with sites like Twitter (which uses #!) to see how they handle this. From what I can tell they handle it like you're describing.
If this is your primary URL
http://www.mysite.com/mypage/#!pageState
Google/Facebook will go to
http://www.mysite.com/mypage/?_escaped_fragment_=pageState
You can set up a server-side 301 redirect to a prettier URL, perhaps something like
http://www.mysite.com/mypage/pagestate/
On these HTML snapshot pages you can add a client-side redirect to send most people back to the dynamic version of the page. This ensures most people share the dynamic URL. For example, if you try to go to http://twitter.com/brettdewoody it'll redirect you to the dynamic (https://twitter.com/#!/brettdewoody) version of the page.
To answer your last question, both Google and Facebook use the _escaped_fragment_ method right now.
