Yandex AJAX crawling - ajax

For AJAX crawling of Googlebot I use "_escaped_fragment_" argument in my website.
Now I checked Yandex's search results for my site.
I saw that AJAX reponses don't exist in search results.
Is there an option for Yandex like "_escaped_fragment_" ?
Else, should I check user agent and if user agent includes "YandexBot" then serve non-AJAX page?
Thank you

I found out that Yandex also supports Google's proposition for AJAX crawling.
No need to change any code if you optimized your site for Googlebot crawling.

Related

Google indexing of AJAX site: how to transition from _escaped_fragment_ method?

My site currently uses hashbang URLs with the Google deprecated recommendation of having a static page served when requesting with _escaped_fragment_ query parameter.
Example static pre-generate page using deprecate method:
https://tweepi.com/app/#!/help is statically served when requesting https://tweepi.com/app/?_escaped_fragment_=/help
I am building a dynamic page and do not want to keep regenerating a static HTML file. I read Google's new recommendation and it says simply do not disallow Googlebot from crawling your site's CSS or JS files.
Assuming a new dynamic page with the URL https://tweepi.com/app/#!/reviews, what status code should the following URL return to ensure best SEO results when Google/Bing crawls my site? 404? 500? 301 redirect?
https://tweepi.com/app/?_escaped_fragment_=/reviews
For SEO bots: html snapshot for url: https://tweepi.com/app/?_escaped_fragment_=/reviews, (status code 200) that's mean nothing special for https://tweepi.com/app/#!/reviews
Users: 404 for https://tweepi.com/app/?_escaped_fragment_=/reviews

Ajax content indexing, Google

I've followed the instructions from the Google website to enable Ajax crawling on my AngularJS site by adding the following meta tag:
<meta name="fragment" content="!">
The rendered content has some links like:
User 1
User 2
User 3
Also some Ajax tabs which render dynamic content like:
Popular
Recent
Looking at the server logs, GoogleBot did came and passed in correctly the _escaped_fragement in the Uri, which is correct:
_escaped_fragment_=%2fpopular
_escaped_fragment_=%2frecent
Problem is that looking at actual indexed content using site:www.somesite.com and logs on server, I see that GoogleBot attempted to index pages like:
/user/1/#!/popular
/user/1/#!/recent
Why would something like this happen considering those urls are relative and don't have #! on them to indicate ajax content and is there a way to prevent this?
If those URLs are available on all pages, it will simply add them.
So, if I would go to: User 1 and there are again Popular there pages, then it's logical that Google loads: /user/1#!/popular
You might want to know that I've solved this puzzle with a script that's on Github: https://github.com/kubrickology/Logical-escaped_fragment
Simply build your AJAX pages with: __init()

Why is my ajax content not being indexed by google

I have tried to set my site up ( http://www.diablo3values.com )according to the guidelines set out here : https://developers.google.com/webmasters/ajax-crawling/ However, it appears that Google has updated their indexes (because I see the revisions to the meta description tags) but the ajax content does not show up in the index.
I am trying to use the “Handle pages without hash fragments” option.
If you view either of the following:
http://www.diablo3values.com/?_escaped_fragment_=
http://www.diablo3values.com/about?_escaped_fragment_=
you will correctly see the HTML snap shot with my content. (those are the two pages I an most concerned about).
Any Ideas? Am I doing something wrong? How do you get google to correclty recognize the tag.
I'm typing this as an answer, since it got a little to long to be a comment.
First of all, your links seems to point to localhost:8080/about, and not /about, which probably is why google doesn't index it in the first place.
Second, here's my experience with pushstate urls and Google AJAX crawling:
My experience is that ajax crawling with pushstate urls is handled a little differently by google than with hashbang urls. Since google won't know that your url is a pushstate url (since it looks just like a regular url), you need to add <meta name="fragment" content="!"> to all your pages, not only the "root" page. And google doesn't seem to know that the pages are part of the same application, so it treats every page as a separate Ajax application. So the Google bot will never actually create a navigation structure inside _escaped_fragment_, like _escaped_fragment_=/about, as it would with a hashbang url (#!/about). Instead, it will request /about?_escaped_fragment_= (which you aparently already have set up). This goes for all your "deep links". Instead of /?_escaped_fragment_=/thelink, google will always request /thelink?_escaped_fragment_=.
But as said initially, the reason it doesn't work for you is probably because you have localhost:8080 urls in your _escaped_fragment_ generated html.
Googlebot only knows to crawl the escaped fragment if your urls conform to the hash bang standard. As users navigate your site, your urls need to be:
http://www.diablo3values.com/
http://www.diablo3values.com/#!contact
http://www.diablo3values.com/#!about
Googlebot actually needs to see these urls in the source code so that it can follow them. Then it knows to download the following urls:
http://www.diablo3values.com/?_escaped_fragment=contact
http://www.diablo3values.com/?_escaped_fragment=about
On your site you appear to be loading a new page on each click, and then loading the content of each page via AJAX too. This is not how I would expect an AJAX site to work. Usually the purpose of using AJAX is so that the user never has to load a whole new page. When the user clicks, the new content section is loaded and inserted into the page. You serve the navigation once and then you only serve escaped fragments of the content.

Google crawling, AJAX and HTML5

HTML5 allows us to update the current URL without refreshing the browser. I've created a small framework on top of HTML5 which allows me to leverage this transparently, so I can do all requests using AJAX while still having bookmarkable URLs without hashtags. So e.g. my navigation looks like this:
<ul>
<li>Home</li>
<li>News</li>
<li>...</li>
</ul>
When a user clicks on the News link, my framework in fact issues an AJAX GET request (jQuery) for the page and replaces the current content with the retrieved content. After that, the current URL is updated using HTML5's pushState(). However, it is still equally possible to just type http://www.example.com/news in the browser, in which case the content will be provided synchronously of course.
The question now is, will Google crawl the pages for this site? I know that Google provides a guide for crawling Ajax applications, but the article supposes that hashtags are used for bookmarkability, and I don't (want to) use hashtags.
Since you have actual hard links to the pages and they load the same content, Google will crawl your site just fine.

Ajax generated content, crawling and black listing

My website uses ajax.
I've got a user list page which list users in an ajax table (with paging and more information stuff...).
The url of this page is :
/user-list
User list is created by ajax. When the user click on one user, he is redirected to a page which url is : /member/memberName
So we can see here that ajax is used to generate content and not to manage navigation (with the # character).
I want to detect bot to index all pages.
So, in ajax I want to display an ajax table with paging and cool ajax effetcs (more info...) and when I detect a bot I want to display all users (without paging) with a link to the member page like this :
JohnBob...
Do you think I can be black listed with this technique ? If you think so, could you please provide an alternative solution by keeping these clean urls and without redeveloping the user-list (without ajax) ?
Google support a specification to make AJAX crawlable:
http://code.google.com/web/ajaxcrawling/docs/specification.html
I did an experiment and it works:
http://seo-website-designer.com/SEO-Ajax-Google-Solution
As this is a Google specification, you won't get penalised (unless you abuse it).
Saying that, only Google support it at the moment (AFAIK).
Also, I believe following the concept of Progressive Enhancement is a better approach. That is, create a working html website then make the JavaScript enhance it
Maybe use the urls with an onclick to trigger your AJAX scripting? Like
Some URL
I don't think Google would punish you for this, you primarily use JScript, but you do provide a fall back for their bot, so your site doesn't get any less accessible.
EDIT
Ok, I misunderstood. Then my guess would be you basically have two options:
1. Write a different part of your site where bots end up, or,
2. Rewrite your current site to for example always give a 'full' page, with an option to only get, say, the content div. Then you can get only the content with JavaScript, but bots will always get a nice page.

Resources