Google AdSense bot's algorithm and behavior

I am interested in the Google AdSense bot's algorithm and how it behaves with a web site. I have not worked with AdSense and I do not have an account, so I need your help to understand:
1) Googlebot downloads all pages from a web site from time to time. Am I right?
2) Googlebot does not understand dynamic content (loaded by AJAX), so I must generate static content and return it within the HTML page, and these pages must show identical content to all users and to Googlebot?
3) Because of (1) and (2), I cannot use only the root path http://example.com with some "main" widget. I must generate unique pages, for example http://example.com/thread?id=101 ?
4) Googlebot downloads pages (1) to grab (index) keywords from them and then stores this information on its servers, for example as key/value pairs (where the key is the page path and the value is a tag cloud). Am I right?
5) When a user opens the web site in a browser, the AdSense code integrated into the HTML loads some JavaScript. As I understand from googling, this JavaScript does not index the page but makes a call (with some parameter, key == page_path) to Google's server and gets appropriate ad links, then shows these ad links in its frame. Is that the right behavior? Or does the JavaScript do some local indexing of the page's content?
6) How do Googlebot and AdSense's JavaScript work with cookies? As I understand it, AdSense can use cookies to show appropriate ad links. If that is right, please give me some use cases ;)
I know that the "true" algorithm is known only to engineers at Google, but some of you have experience with AdSense and its HTML/JavaScript. Please correct my picture of it ;)
Thank you very much for any advice!
P.S. This question is very important to me. It is not just a question for fun, so please do not close it ;)

1) Yes, if Googlebot can access the pages and if it knows about the pages through a link, XML Sitemaps, Google +1, etc.
2) Googlebot will now make AJAX / XHR requests to understand AJAX content (http://googlewebmastercentral.blogspot.com/2011/11/get-post-and-safely-surfacing-more-of.html).
Yes, you should show the same content to Googlebot as you would to users; otherwise this would be considered cloaking, which is against their guidelines.
3) This question isn't clear, but basically it's preferable to have the URL change, because Google will then know how to index the content separately. If you're using AJAX you might want to consider permalinks like you suggested, or you can use HTML5 pushState (the History API).
4) Yes, Google will index the words on the page. I'm not certain they store it as a key/value pair. I'm not even sure whether they're still using Bigtable (http://labs.google.com/papers/bigtable.html), but it's likely they use Bigtable or a similar system to store the inverted index.
5) The AdSense code is embedded JavaScript (a rough example of the snippet is sketched after this list). For new webpages that Google hasn't seen before, it tries to deliver the most relevant ads based on the information it has found on the web about the site, or possibly through the anchor text of links pointing to that page. However, to get a more accurate understanding of the content of the page, Google will send an AdSense-specific bot to crawl your page; sometimes you'll see it come very fast, even as soon as you load the page for the first time. It uses a different user agent than the traditional Googlebot; you can find all the user agents from Google here (http://www.google.com/support/webmasters/bin/answer.py?answer=1061943).
6) Google's crawlers don't accept cookies and won't pass cookies back to your server. It has to do with the massively distributed nature of Google's crawlers, which makes maintaining cookies or sessions extremely difficult.
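For reference on point 5: a typical AdSense ad unit of that era is embedded roughly like the sketch below (the publisher ID and slot ID are placeholders). The show_ads.js script writes an iframe into the page via document.write, and the request that iframe makes to Google's servers carries the current page URL, which is how the ads get matched to the content; the snippet itself does not analyse the page.

    <!-- Illustrative ad unit: the client and slot IDs are placeholders -->
    <script type="text/javascript">
      google_ad_client = "ca-pub-XXXXXXXXXXXXXXXX"; /* your publisher ID */
      google_ad_slot   = "1234567890";              /* the ad unit's slot ID */
      google_ad_width  = 728;
      google_ad_height = 90;
    </script>
    <!-- show_ads.js writes an iframe that requests ads for the current URL -->
    <script type="text/javascript"
            src="http://pagead2.googlesyndication.com/pagead/show_ads.js"></script>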

Related

SEO with AngularJS and ASP.NET RESTful service

I have developed a website using AngularJS and Web API.
The problem is that the AJAX-rendered content is not crawlable by Google, and no one can find the website using Google search.
After reading many articles regarding this issue, including:
this one, with all its explanatory links,
the Google AJAX crawling protocol, and also a Stack Overflow question, I couldn't find a proper solution. Those that mention ASP.NET solutions are talking about MVC, while I only need simple REST via Web API; other articles don't cover ASP.NET at all.
Is there any simple explanation?
I'm the one who asked this same question long ago, so I will answer from my experience:
Firstly, if all your content is accessible via unique URIs (including the hashbang if you use it), modern search engines should index it just fine. In fact, Google can now index JavaScript-generated content. You can try that via the Google Webmaster Tools and see how your site is indexed.
Secondly, there are libraries that help you serve pre-rendered content to search engines if you need to, but in my case I didn't bother much with that since Google is indexing the JS content nicely.
I've seen others ask this question, and maybe I'm missing something or this is outdated, but I don't see why AngularJS needs to be an issue with SEO.
Say you have a landing page and it has a bunch of links. Assuming you're using HTML5 mode in AngularJS (and I'm not sure that's 100% necessary) and something like ngRoute, the links on the landing page can work both as "Angular" (JavaScript) links and as "old school" (full page load) links (a minimal route configuration is sketched at the end of this answer).
If you're a human user, you can click a link and it will do Angular magic and adjust the content without loading the full page. OK, all fine.
But if you instead copy the link and paste it into a new tab or another browser, it will still work, assuming you've set up your routes correctly.
I'm not an SEO expert by any stretch of the imagination, but as I understand it, having links that load pages, and having those pages contain real and useful content, is the core of SEO, and done this way AngularJS should work fine. The key thing to check is that a link still works when you copy and paste it, not just when you click it.
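To make that concrete, a minimal ngRoute configuration with HTML5 mode could look like the sketch below (the module, template, and controller names are made up). The important part is that /products/42 is then both a client-side route and a URL your server can answer with a full page.

    // Minimal sketch: module, template and controller names are hypothetical
    angular.module('app', ['ngRoute'])
      .config(function ($routeProvider, $locationProvider) {
        // Real URLs (/products/42) instead of hash URLs (#/products/42);
        // typically needs a <base href="/"> tag and server-side support for the same URLs
        $locationProvider.html5Mode(true);

        $routeProvider
          .when('/products/:id', {
            templateUrl: '/partials/product.html',
            controller: 'ProductCtrl'
          })
          .otherwise({ redirectTo: '/' });
      });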

AdSense on history.pushState enabled page

First off, I know this has been discussed over and over again. But let's take this as a "late 2012 edition" since things tend to change rapidly on the internet.
I have this web page which is a "classical" web page with full page refreshes. Every internal click produces new content. We can show AdSense ads this way without a problem.
Now I have started looking into "ajaxifying" (PJAX) the whole page for performance reasons (I've actually made a prototype version and it works superbly). The whole thing works only in browsers that support history.pushState, and whenever a user clicks an internal link, an AJAX request is triggered that fetches only the content part of the page (everything between the header and the footer) and replaces the old content with it.
The end result is that the user is presented with a brand new page (including the changed URL and so on); only the mechanism for delivering the page has changed (full reload vs. AJAX). As far as Google (and older browsers) are concerned, this is still a regular page with regular links (progressive enhancement and all that).
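For the curious, the prototype boils down to something like this sketch (the #content selector and the X-PJAX header are just illustrative conventions, and modern DOM/fetch APIs are used for brevity):

    // Progressive enhancement: only intercept clicks if pushState is available
    if (window.history && history.pushState) {
      document.addEventListener('click', function (e) {
        var link = e.target.closest('a[href^="/"]'); // internal links only
        if (!link) return;
        e.preventDefault();

        // Fetch only the content part of the target page
        fetch(link.href, { headers: { 'X-PJAX': 'true' } })
          .then(function (res) { return res.text(); })
          .then(function (html) {
            document.querySelector('#content').innerHTML = html;
            history.pushState({}, '', link.href); // the URL changes, no reload
          });
      });

      // Back/forward buttons: simplest possible handling
      window.addEventListener('popstate', function () {
        location.reload();
      });
    }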
And yet there isn't a way to display AdSense, what with the document.write calls and AdSense's TOS ruining the party.
My question: is there a Google-approved way (I'm not interested in hacks that will get us banned) to display AdSense ads on a page like this? I haven't found one. Or, if there isn't, does Google have any plans to support this in the future? (Again, I haven't found anything about it.)
Update:
After some more digging around I came across Google DFP, which seems to support asynchronous loading of ads. But I'm not sure I can load AdSense ads through it dynamically without breaking the TOS. I'm 100% sure I can load other ads this way, but not sure about AdSense. Could somebody clear this up for me?
According to this page, when loading AdSense ads through DFP you are subject to both the DFP and the AdSense terms. So I guess that if you are following the current AdSense terms, you are not allowed to do what you are talking about... At the same time, Google provides a rather easy method to do exactly what you want with DFP...
It's still a grey area...
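For anyone comparing: the DFP/GPT tag does expose an asynchronous API roughly like the sketch below (the ad unit path and div ID are placeholders). Whether a slot backed by AdSense may be refreshed on a pushState navigation is exactly the terms-of-service question discussed above, so check with Google before relying on it.

    <script async src="https://securepubads.g.doubleclick.net/tag/js/gpt.js"></script>
    <div id="div-gpt-ad-content"></div>
    <script>
      window.googletag = window.googletag || { cmd: [] };
      googletag.cmd.push(function () {
        // '/1234567/content_slot' is a placeholder ad unit path
        googletag.defineSlot('/1234567/content_slot', [728, 90], 'div-gpt-ad-content')
                 .addService(googletag.pubads());
        googletag.enableServices();
        googletag.display('div-gpt-ad-content');
      });

      // After a PJAX/pushState navigation, a slot could be refreshed without a reload
      function refreshAds() {
        googletag.cmd.push(function () {
          googletag.pubads().refresh();
        });
      }
    </script>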

Can we identify when search engine bots like Googlebot hit a particular URL?

My Problem:
My client's site displays many products, which adds a lot of page weight and load time, so I decided to load more products via AJAX, and it works well. But it hurts the SEO: no products or deals have been indexed. (I even suggested that the client submit products via Google Base, but the client doesn't like that idea; he wants Google to crawl the site directly, and he also wants a shorter page load time.)
Question:
Can we distinguish a Googlebot crawling request to the server from a Mozilla-like browser (user agent) request to the site (server)?
What I have tried:
I tried to identify the user agent from the requests, but that is not working (or I might be missing something?). Does anyone have a correct solution to this problem: reducing the page load time using AJAX while still getting Googlebot to crawl the website?
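For context, identifying Googlebot on the server usually starts with the User-Agent request header; the sketch below assumes a Node/Express backend and a hypothetical renderFullProductPage helper purely for illustration. Note that the User-Agent string can be spoofed, so Google suggests confirming suspicious hits with a reverse DNS lookup of the requesting IP, and whatever you serve to bots should be equivalent to what users see, to avoid cloaking.

    // Hypothetical Express handler: full HTML for crawlers, AJAX shell for browsers
    const express = require('express');
    const app = express();

    const BOT_PATTERN = /Googlebot|bingbot|Slurp|DuckDuckBot/i;

    app.get('/products', function (req, res) {
      const ua = req.get('User-Agent') || '';
      if (BOT_PATTERN.test(ua)) {
        // Crawler: send the complete, server-rendered product list
        res.send(renderFullProductPage()); // hypothetical helper you would implement
      } else {
        // Browser: send the lightweight shell that loads products via AJAX
        res.sendFile(__dirname + '/public/products.html');
      }
    });

    app.listen(3000);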
You should just search Stack Overflow for "Google AJAX SEO". There are a number of questions around this.
In short, Google has a specification to make AJAX sites crawlable: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started?hl=sv-SE
You can also look into pushState as an SEO option.
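In short, that scheme maps AJAX URLs to crawlable ones: a hashbang URL like http://example.com/#!product=42 is fetched by the crawler as http://example.com/?_escaped_fragment_=product=42, and a page without hashbang URLs can opt in with a meta tag, roughly as below. Your server then has to answer those _escaped_fragment_ requests with an HTML snapshot of what the AJAX code would normally render.

    <!-- In the <head> of an AJAX page that does not use #! URLs -->
    <meta name="fragment" content="!">

    <!-- The crawler will then request this page as
         http://example.com/page?_escaped_fragment_=
         and expects a server-rendered HTML snapshot in response -->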
One tactic used to solve this is to harness the pagination function of whatever framework or CMS you are using. You load one page of content and display pagination links in your view, then use JavaScript to hide the pagination links, fetch the content of the linked pagination pages via AJAX, and append it to the current page. Take a look at how infinite scroll works for inspiration:
http://www.infinite-scroll.com/
Basically, you need to be at least rendering links to the pages that contain the rest of the content so that search engines can crawl it, but you can hide those links from users who have JavaScript enabled.
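A bare-bones version of that idea might look like the sketch below (the .pagination and #product-list selectors are made up, and a real implementation would also handle loading state and further pages):

    // The server renders a normal <a> link to page 2 so crawlers can follow it.
    // For JS users, hide the link and append the next page's items in place instead.
    var nextLink = document.querySelector('.pagination a.next');
    if (nextLink) {
      nextLink.style.display = 'none'; // bots still see the link in the HTML

      window.addEventListener('scroll', function onScroll() {
        var nearBottom = window.innerHeight + window.scrollY
                         >= document.body.offsetHeight - 200;
        if (!nearBottom) return;
        window.removeEventListener('scroll', onScroll);

        fetch(nextLink.href)
          .then(function (res) { return res.text(); })
          .then(function (html) {
            var doc = new DOMParser().parseFromString(html, 'text/html');
            document.querySelector('#product-list').insertAdjacentHTML(
              'beforeend', doc.querySelector('#product-list').innerHTML);
          });
      });
    }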
But to better answer your question, it is possible to redirect robots using htaccess:
redirect all bots using htaccess apache
But as far as I understand it, it is better for SEO to have the content, or links to it, actually available on the page.

Is my AJAX content already crawlable?

I have built a site based on AJAX navigation.
I have built it in such a way that whenever someone without JavaScript visits my site, the nav links, which usually load content via AJAX, act like normal links, and the user can browse through the pages as usual.
Since Googlebot doesn't run JavaScript, it should theoretically be able to go through all the links and the corresponding pages as usual, right? After all, they are valid links with the href attribute pointing to the corresponding page.
Now I was wondering whether that is sufficient, or whether I need to implement this method from Google as well, to make sure Google sees all my content.
Thanks for your insights and excuse my poor English!
If you can navigate your site through its page source (Ctrl-U in Chrome), Google can also crawl your site. Yes, it's that simple.

When to use AJAX and when not to use AJAX in a web application

We have web applications, elgifto.com and roadbrake.com, in which we used AJAX in many places, especially to update major portions of a page. All the important functionality of elgifto.com was implemented using AJAX. Now we have realized there are a few issues due to the AJAX implementation:
1. All the content implemented using AJAX is not available to the SEO bots, and this is hurting the page rank of our site.
2. Users are not able to bookmark some of the pages, as they are only reachable through AJAX.
3. When we want to direct the user from one page, through an anchor link, to another page that uses AJAX, we find it difficult.
So now we are thinking of removing AJAX from these pages and using it only for small pieces of functionality, such as something similar to marking a question as a favorite on SO. Before going ahead and removing it, we want to hear experts' opinions on this. Thanks.
The problem is not AJAX per se, but your implementation of it. For instance, you can fix the bookmarking problem (2) the way Google Maps does: provide a generated link for each state of your web app.
SEO (1) can be fixed by supplying these state links to the crawlers, either organically through links in your site, or by supplying a list (a sitemap).
If you implement 2, you can fix 1 and 3 with those links.
In the end you must figure out whether the effort is worth it, and whether you are overusing AJAX, of course, but the statements you've made are not set in stone at all.
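For the sitemap variant, the list of state links is just an ordinary XML sitemap; reusing the thread URLs from the first question above as an example, it can be as small as this:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>http://example.com/thread?id=101</loc></url>
      <url><loc>http://example.com/thread?id=102</loc></url>
    </urlset>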
I'm constantly developing AJAX-based websites, with no problems for SEO at all. You just have to use it in the best possible way.
For example, I have a website with normal links pointing to normal web pages (PHP pages) for normal navigation when a user doesn't have JS enabled. But if a user has JS enabled, a script changes the links' behavior, fetching only the content of the page that is needed.
This way you still have physically separate web pages with all their content, which will be indexed as normal.
