Ajax content indexing, Google

I've followed the instructions from the Google website to enable Ajax crawling on my AngularJS site by adding the following meta tag:
<meta name="fragment" content="!">
The rendered content has some links like:
User 1
User 2
User 3
Also some Ajax tabs which render dynamic content like:
Popular
Recent
Looking at the server logs, Googlebot did come and correctly passed the _escaped_fragment_ parameter in the URI, as expected:
_escaped_fragment_=%2fpopular
_escaped_fragment_=%2frecent
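On the server, the decoded parameter is what selects which Ajax view goes into the snapshot. A minimal sketch in JavaScript (Node.js, standard WHATWG URL API). The function name and the returned object are illustrative, not part of Google's scheme:

```javascript
// Recover the Ajax route from a crawler request that carries _escaped_fragment_.
// parseCrawlerRequest and its return shape are hypothetical names for illustration.
function parseCrawlerRequest(requestUrl) {
  const url = new URL(requestUrl);
  // searchParams.get() percent-decodes the value, so "%2fpopular" becomes "/popular".
  const fragment = url.searchParams.get("_escaped_fragment_");
  if (fragment === null) {
    // Ordinary browser request: serve the normal Ajax page.
    return { isSnapshotRequest: false, page: url.pathname, route: null };
  }
  // Crawler request: render the HTML snapshot for this route.
  return { isSnapshotRequest: true, page: url.pathname, route: fragment };
}

console.log(parseCrawlerRequest("http://www.somesite.com/?_escaped_fragment_=%2fpopular"));
```

Because the value is decoded before use, the server does not need to care whether Google sent "/" or "%2f".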
The problem is that, looking at the actual indexed content (using site:www.somesite.com) and at the server logs, I see that Googlebot attempted to index pages like:
/user/1/#!/popular
/user/1/#!/recent
Why would this happen, considering those URLs are relative and don't have #! in them to indicate Ajax content? And is there a way to prevent it?

If those URLs are present on all pages, Google will simply add them.
So if I go to User 1 and the Popular tab is there again, it's logical that Google loads /user/1/#!/popular.
You might want to know that I've solved this puzzle with a script that's on GitHub: https://github.com/kubrickology/Logical-escaped_fragment
Simply build your AJAX pages with: __init()
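The behaviour described in this answer follows from ordinary relative-URL resolution: every link is resolved against the URL of the page it appears on. Assuming the rendered tab links are fragment-only hrefs such as #!/popular (the original markup was not preserved), this sketch shows why the same tab produces a different crawlable URL on every user page, and what a root-relative href would resolve to instead:

```javascript
// Resolve a link the way a browser (or crawler) does for <a href="...">.
// The link values used below are hypothetical examples.
function resolveLink(href, pageUrl) {
  return new URL(href, pageUrl).href;
}

const page = "http://www.somesite.com/user/1/";
console.log(resolveLink("#!/popular", page));  // http://www.somesite.com/user/1/#!/popular
console.log(resolveLink("/#!/popular", page)); // http://www.somesite.com/#!/popular
```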

Related

"Fetch as Google" renders all pages to look like my homepage

I am trying to figure out why my website's posts and pages, such as my resume, get a "Complete" status with a green check mark (seemingly no errors or redirects) when I fetch and render them as Google, yet all of them "render" looking like my homepage. The PageSpeed Insights tool seems to use the same rendering engine, as it shows the same issue.
Notes:
The HTML served from my website on initial page load is the correct HTML and content. No redirects occur. The initial page load does not fetch content via JS. I mention this because, although my website is not a single-page application (I'm using WordPress), I do use Ajax in combination with a post variable flag to fetch new page content when the user navigates to the next page (after the initial page load).
I have verified that all of my pages have been indexed using the "site:" trick in Google search. They are indexed properly, but they aren't "rendering" properly.
Should I be worried? Should I just ignore that the pages aren't rendering properly? It doesn't make any sense. Is anyone else having this issue?
Your resume page is served with a Content-Type of image/gif, so Google thinks the page is an image.
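A quick way to catch this kind of problem is to check the media type in the Content-Type header before looking at rendering at all. A minimal sketch; the function name is illustrative:

```javascript
// Decide whether a Content-Type header value describes an HTML document.
// isHtmlContentType is a hypothetical helper name.
function isHtmlContentType(headerValue) {
  if (!headerValue) return false;
  // Drop parameters such as "; charset=UTF-8" and normalise case.
  const mediaType = headerValue.split(";")[0].trim().toLowerCase();
  return mediaType === "text/html" || mediaType === "application/xhtml+xml";
}

console.log(isHtmlContentType("text/html; charset=UTF-8")); // true
console.log(isHtmlContentType("image/gif"));                // false
```

In Node.js 18 or later, which has a global fetch, the header can be read with `(await fetch(url)).headers.get("content-type")` and passed straight to this function.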

Prevent users from landing on non-Ajax #! pages on my site without loop and SEO _escaped_fragment_ trouble

My site is AJAX-based, but it pulls its content from .html files. Some of those files have been indexed without the #!, so they just function as a basic HTML site. I want to redirect users who land on a non-Ajax page to the #! version. I tried a redirect (without thinking it through) and it created an endless loop with the dynamic content.
If you look at the code, you will see that it uses JS to place the static pages into a content wrapper.
I am also having trouble with an SEO issue: Google does not appear to be requesting the _escaped_fragment_ version... that, or I need some help. I thought that since the site pulls its content from HTML files, I could just copy those pages and name them _escaped_fragment_=page.html, but it is not working. I tried a redirect, but Fetch as Google just showed the redirect request and not the content.
It was a template that I purchased... I figured out how to modify the theme and content, but this is beyond me.
Closed
I decided to scrap the hashbang method. I have real pages, and I decided to let them be crawled and indexed. I am waiting on a solution to pull only the body into the Ajax content wrapper; however, I was able to apply basic CSS to the pages without messing anything up when they are loaded into the main page via Ajax.
I used
$("a").attr("href", function (i, href) { /* some JS stuff to add a hash: hostname + "#" + href */ });
to add a hash to the clean internal URLs in the main menu. This created a loop when added to the pages themselves, so instead I used a clean URL with an onclick redirect to the Ajax version, with "/" before the link:
onclick="window.location = '/#link.html';return false;"
I had a JS redirect that detected whether there was a hash before the page link and, if not, added it; however, Google did not like it! Sure, the pages are not as nice, but I have content for browsers without JS. As soon as I get main.js modified so that it ignores head elements, I can dress them up even more. Each page has links that will get a user to the Ajax version, including the home button "/#".
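The href rewrite mentioned in this answer (putting a hash in front of internal clean URLs) can be sketched as a pure function. The name toAjaxHref, the "/#page.html" format (borrowed from the onclick handler above) and the #main-menu selector are assumptions; limiting the rewrite to the menu is one way to keep it from being reapplied to pages that are themselves injected into the content wrapper.

```javascript
// Turn an internal clean URL such as "/about.html" into the Ajax form "/#about.html".
// External links, links that already carry a hash, and the shell page "/" are left alone.
// Query strings are ignored in this sketch.
function toAjaxHref(href, siteOrigin) {
  if (!href) return href;                                      // <a> without an href
  const url = new URL(href, siteOrigin);
  if (url.origin !== new URL(siteOrigin).origin) return href; // external or mailto: link
  if (url.hash) return href;                                   // already an Ajax link
  if (url.pathname === "/") return href;                       // the Ajax shell itself
  return "/#" + url.pathname.replace(/^\//, "");
}

// In the page (jQuery; "#main-menu" is a hypothetical selector):
// $("#main-menu a").attr("href", function (i, href) { return toAjaxHref(href, location.origin); });
```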

Why is my ajax content not being indexed by google

I have tried to set up my site (http://www.diablo3values.com) according to the guidelines set out here: https://developers.google.com/webmasters/ajax-crawling/ However, although Google appears to have updated its index (I can see the revisions to the meta description tags), the Ajax content does not show up in the index.
I am trying to use the “Handle pages without hash fragments” option.
If you view either of the following:
http://www.diablo3values.com/?_escaped_fragment_=
http://www.diablo3values.com/about?_escaped_fragment_=
you will correctly see the HTML snapshot with my content (those are the two pages I am most concerned about).
Any ideas? Am I doing something wrong? How do you get Google to correctly recognize the tag?
I'm typing this as an answer, since it got a little too long to be a comment.
First of all, your links seem to point to localhost:8080/about rather than /about, which is probably why Google doesn't index them in the first place.
Second, here's my experience with pushstate urls and Google AJAX crawling:
My experience is that Google handles Ajax crawling with pushState URLs a little differently from hashbang URLs. Since Google can't know that your URL is a pushState URL (it looks just like a regular URL), you need to add <meta name="fragment" content="!"> to all your pages, not only the "root" page. Google also doesn't seem to know that the pages are part of the same application, so it treats every page as a separate Ajax application. As a result, Googlebot will never build a navigation structure inside _escaped_fragment_ (like _escaped_fragment_=/about) as it would with a hashbang URL (#!/about). Instead, it will request /about?_escaped_fragment_= (which you apparently already have set up). This goes for all your "deep links": instead of /?_escaped_fragment_=/thelink, Google will always request /thelink?_escaped_fragment_=.
But as I said initially, the reason it doesn't work for you is probably that you have localhost:8080 URLs in your _escaped_fragment_ generated HTML.
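If the snapshots are generated on a development server, one way to keep localhost:8080 out of them is to rewrite absolute development links to root-relative ones before the HTML is served. This is a regex-based sketch that only handles double-quoted href attributes; the function name and the default origin are assumptions, and configuring the correct base URL in whatever renders the snapshot may be the cleaner fix.

```javascript
// Rewrite href="http://localhost:8080/about" to href="/about" in snapshot HTML.
// stripDevOrigin is a hypothetical helper; other attributes (src, action) are not touched.
function stripDevOrigin(html, devOrigin = "http://localhost:8080") {
  // Escape regex metacharacters in the origin before building the pattern.
  const escaped = devOrigin.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`href="${escaped}(/[^"]*)?"`, "g");
  // A bare origin (no path) becomes the site root "/".
  return html.replace(pattern, (_match, path) => `href="${path || "/"}"`);
}
```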
Googlebot only knows to crawl the escaped fragment if your URLs conform to the hashbang standard. As users navigate your site, your URLs need to be:
http://www.diablo3values.com/
http://www.diablo3values.com/#!contact
http://www.diablo3values.com/#!about
Googlebot actually needs to see these URLs in the source code so that it can follow them. It then knows to download the following URLs:
http://www.diablo3values.com/?_escaped_fragment_=contact
http://www.diablo3values.com/?_escaped_fragment_=about
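Both mappings described in these answers (hashbang URLs, and pretty URLs on pages that carry <meta name="fragment" content="!">) can be sketched as one function. It follows the escaping rule of Google's AJAX crawling specification as I understand it: characters 0x00-0x20, "#", "%", "&", "+" and everything from 0x7F upward are percent-encoded when the state moves into the query string. Function names are illustrative.

```javascript
// Percent-encode the characters the AJAX crawling scheme requires to be escaped.
// Other characters, including "/" and "=", are kept as-is.
function escapeFragment(value) {
  return value.replace(/[\x00-\x20#%&+\x7f-\u{10ffff}]/gu, (ch) => encodeURIComponent(ch));
}

// Map the URL a user sees to the URL the crawler requests.
function crawlerUrlFor(prettyUrl) {
  const bang = prettyUrl.indexOf("#!");
  if (bang !== -1) {
    // Hashbang URL: move the state after "#!" into _escaped_fragment_.
    const base = prettyUrl.slice(0, bang);
    const sep = base.includes("?") ? "&" : "?";
    return base + sep + "_escaped_fragment_=" + escapeFragment(prettyUrl.slice(bang + 2));
  }
  // Pretty (pushState) URL on a page with the fragment meta tag:
  // drop any ordinary fragment and append an empty _escaped_fragment_.
  const hash = prettyUrl.indexOf("#");
  const base = hash === -1 ? prettyUrl : prettyUrl.slice(0, hash);
  const sep = base.includes("?") ? "&" : "?";
  return base + sep + "_escaped_fragment_=";
}

console.log(crawlerUrlFor("http://www.diablo3values.com/#!contact"));
// http://www.diablo3values.com/?_escaped_fragment_=contact
console.log(crawlerUrlFor("http://www.diablo3values.com/about"));
// http://www.diablo3values.com/about?_escaped_fragment_=
```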
On your site you appear to be loading a new page on each click, and then loading the content of each page via AJAX too. This is not how I would expect an AJAX site to work. Usually the purpose of using AJAX is so that the user never has to load a whole new page. When the user clicks, the new content section is loaded and inserted into the page. You serve the navigation once and then you only serve escaped fragments of the content.

What to put in HTML snapshot for hash-bang URL for SEO?

I am using hash-bang URLs in my AJAX application, and I am implementing the server side to:
handle ?_escaped_fragment_=key1=value1%26key2=value2
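On the server, the escaped value decodes back into the original hash-bang state, which can then be parsed like an ordinary query string. A minimal sketch; the function name is illustrative:

```javascript
// Turn "?_escaped_fragment_=key1=value1%26key2=value2" back into { key1, key2 }.
// Returns null for requests that are not snapshot requests.
function hashbangStateFromQuery(requestUrl) {
  const raw = new URL(requestUrl).searchParams.get("_escaped_fragment_");
  if (raw === null) return null;
  // raw is already percent-decoded here: "key1=value1&key2=value2".
  return Object.fromEntries(new URLSearchParams(raw));
}
```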
So when I look at Google's FAQ, it says that this URL has an equivalent snapshot
It is easy to see that the snapshot content is not the same as the corresponding hash-bang URL. This Google example does not help, hence my question:
My HTML page has three components/panels/sections that are updated by AJAX. I use the onclick event on the hash-bang URLs to fetch the content from the server and then update the relevant section of the HTML page. My panels are updated independently of each other, and each panel has its own hash-bang URL.
My question is:
Should the HTML snapshot contain the entire page with all 3 sections or only the updated section?
If I return the entire page, it is almost impossible to get the state of the other 2 sections right. Would Googlebot reject my site if the other 2 sections are returned in their default state?
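One hypothetical way to make a full-page snapshot reproducible is to keep the state of all three panels in a single hash-bang fragment, so the server can rebuild every panel from one _escaped_fragment_ value and fall back to defaults for anything missing. The panel names and default values below are made up for illustration:

```javascript
// Hypothetical panel names and their default states.
const PANEL_DEFAULTS = { left: "home", main: "overview", right: "latest" };

// Client side: build one hash-bang fragment that records every panel's state.
function panelsToHashbang(state) {
  const params = new URLSearchParams();
  for (const name of Object.keys(PANEL_DEFAULTS)) {
    params.set(name, state[name] ?? PANEL_DEFAULTS[name]);
  }
  return "#!" + params.toString();
}

// Server side: rebuild all panel states from the decoded _escaped_fragment_ value.
function panelsFromEscapedFragment(value) {
  const params = new URLSearchParams(value);
  const state = {};
  for (const [name, fallback] of Object.entries(PANEL_DEFAULTS)) {
    state[name] = params.get(name) ?? fallback;
  }
  return state;
}
```

The trade-off is longer URLs, in exchange for every hash-bang URL describing the whole page rather than a single panel.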
This is a good question; sadly there is no answer for this one :( I'm looking for the same thing. My problem is that EVERYTHING is news loaded with Ajax, and each news item is actually a little piece of text, so I'm asking myself whether my snapshots should contain only the current news item or a full page with all the info I have on my home page plus the current item's content.
Do you have any news on that topic?

Google crawling, AJAX and HTML5

HTML5 allows us to update the current URL without refreshing the browser. I've created a small framework on top of HTML5 which allows me to leverage this transparently, so I can do all requests using AJAX while still having bookmarkable URLs without hashtags. So e.g. my navigation looks like this:
<ul>
<li><a href="/">Home</a></li>
<li><a href="/news">News</a></li>
<li>...</li>
</ul>
When a user clicks on the News link, my framework in fact issues an AJAX GET request (jQuery) for the page and replaces the current content with the retrieved content. After that, the current URL is updated using HTML5's pushState(). However, it is still equally possible to just type http://www.example.com/news in the browser, in which case the content will be provided synchronously of course.
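A framework like this has to decide, for every clicked link, whether to handle it with an AJAX GET plus pushState() or to let the browser do a normal page load. A minimal sketch of that decision; the rules and the function name are assumptions, not the asker's actual framework:

```javascript
// Return true if a click on href should be handled with AJAX + history.pushState().
function shouldHandleWithAjax(href, currentHref) {
  if (!href) return false;
  const target = new URL(href, currentHref);
  const current = new URL(currentHref);
  if (target.origin !== current.origin) return false; // other site: normal navigation
  // Same document, only the fragment differs: let the browser scroll.
  if (target.pathname === current.pathname && target.search === current.search && target.hash) return false;
  return true;
}

// Browser wiring (jQuery, as in the question; "#content" is a hypothetical container):
// $(document).on("click", "a", function (e) {
//   if (!shouldHandleWithAjax(this.getAttribute("href"), location.href)) return;
//   e.preventDefault();
//   const url = this.href;
//   $.get(url, function (html) {
//     $("#content").html(html);          // extracting the content area is framework-specific
//     history.pushState(null, "", url);
//   });
// });
```

Because every intercepted URL is also a real URL the server can render directly, a crawler that ignores the JavaScript still sees plain links to follow.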
The question now is, will Google crawl the pages for this site? I know that Google provides a guide for crawling Ajax applications, but the article supposes that hashtags are used for bookmarkability, and I don't (want to) use hashtags.
Since you have actual hard links to the pages and they load the same content, Google will crawl your site just fine.
