How does Facebook's Ajax navigation work? 2-3 months ago they were using # but now the whole address bar is changing.
The first approach used is called "Ajax Crawling" (also refer to this answer).
But I think the new approach you are talking about is just the HTML5 History API. GitHub is using this approach for their tree browsing, and you can learn more about it here. (I recommend all readers read the article and watch the video, as it's very informative.)
EDIT:
Just to point out that Facebook is definitely using the HTML5 History API (direct link from the previous GitHub article).
They still use # as far as I can tell (but maybe we are on different versions?). For me, their links point to different pages, but they intercept my click and turn it into an Ajax request instead. Maybe this is to keep copied URLs clean and/or to make the site work without JS?
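For anyone curious what that interception looks like in practice, here is a minimal sketch (purely illustrative: the data-ajax selector, the #content container and the use of fetch() are my own assumptions, not Facebook's actual code):
document.addEventListener('click', function (e) {
  var link = e.target.closest('a[data-ajax]');   // hypothetical marker attribute
  if (!link) return;
  e.preventDefault();                            // stop the full page navigation
  fetch(link.href)                               // load just the new content via Ajax
    .then(function (res) { return res.text(); })
    .then(function (html) {
      document.querySelector('#content').innerHTML = html;   // swap the page body
      history.pushState({ url: link.href }, '', link.href);  // update the address bar
    });
});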
Related
Where I'm at: I've read Google's documentation regarding its AJAX crawling, and I've searched around a bit on this website and others, but I'm quite confused, as it seems that all the answers address the same issue: AJAX crawling with hashbangs.
I've developed an app which, among other purposes, lets the user search for locations worldwide, using an AJAX searcher quite similar to Google's, but my app uses exclusively the question mark in its AJAX URLs instead of the hashbang. Due to compatibility issues, changing it to the hashbang is not an option.
Not only am I largely confused by the fact that I could not find anyone else using the question mark instead of the hashbang, I'm also wondering if there is any documentation regarding my issue: how to let Googlebot crawl all my AJAX content when I'm using the question mark instead of a hashbang in my AJAX app.
The AJAX crawling scheme was created explicitly for applications and websites using the hashbang (#!) in their URL structure, because the fragment part of a URL only exists on the client side; the URL rewriting in the spec, i.e. from #! to ?_escaped_fragment_=, is meant to solve that.
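To make the rewriting concrete (example.com is just a placeholder):
What the user sees:        http://example.com/#!/profile/123
What the crawler requests: http://example.com/?_escaped_fragment_=/profile/123
The server is expected to answer the second form with an HTML snapshot of what the client-side JavaScript would have rendered for the first.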
Since most of the web is already making use of JavaScript in one way or another, we (Google) needed a better solution, so we started executing JavaScript in the pages we crawl and effectively render every page, just like a normal browser would. To quote our blog post, Understanding web pages better:
In order to solve this problem, we decided to try to understand pages by executing JavaScript. It’s hard to do that at the scale of the current web, but we decided that it’s worth it. We have been gradually improving how we do this for some time. In the past few months, our indexing system has been rendering a substantial number of web pages more like an average user’s browser with JavaScript turned on.
You can also see what we "see" using Fetch as Google in Search Console (formerly Webmaster Tools); read more about the feature in our post titled Rendering pages with Fetch as Google.
Before you do anything else, please try to fetch a few pages from your site with Fetch as Google. You might not have to do anything at all; it might actually work out of the box. And the good news is that it's not only Google that's rendering pages!
I have a project which is heavily JavaScript-based (e.g. node.js, backbone.js, etc.). I'm using hashbang URLs like /#!/about and have read Google's AJAX crawling spec. I've done a wee bit of headless UI testing with zombie and can easily conceive of how this could be done by setting a slight delay and returning static content back to the Googlebot. But I don't really want to implement this from scratch and was hoping there was a pre-existing library that fits in with my stack. Know of one?
EDIT: At the time of writing I don't think this exists. However, rendering using Backbone (or similar) on the server and client is a plausible approach (even if not a direct answer). So I'm going to mark that as the answer, although there may be better solutions in the future.
Just to chime in, I ran into this issue too (I have a very Ajax/JS-heavy site), and I found this, which may be of interest:
crawlme
I have yet to try it, but it sounds like it will make the whole process a piece of cake if it works as advertised! It's a piece of Connect/Express middleware that is simply inserted before any calls to pages, and it apparently takes care of the rest.
Edit:
Having tried crawlme, I had some success, but the backend headless browser it uses (zombie.js) was failing with some of my JavaScript content, likely because it works by emulating the DOM and thus won't be perfect.
Sooo, instead I got hold of a full WebKit-based headless browser, PhantomJS, and a set of Node bindings for it, like this:
npm install phantomjs node-phantom
I then created my own script, similar to crawlme but using PhantomJS instead of zombie.js. This approach seems to work well and renders every single one of my Ajax-based pages correctly. The script I wrote to pull this off can be found here. To use it, simply:
var googlebot = require("./path-to-file");
and then, before any other calls to your app (this is using Express, but it should work with just Connect too):
app.use(googlebot());
The source is relatively simple, minus a couple of regexps, so have a gander :)
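Roughly, the idea boils down to something like the sketch below. This is not the actual script, just the shape of it: node-phantom's error-first callback API is assumed (it may differ between versions), and the one-second delay is an arbitrary guess at how long the client-side rendering needs:
var phantom = require('node-phantom');

function googlebot() {
  return function (req, res, next) {
    // Only intercept the crawler's escaped-fragment requests; normal users pass through.
    if (req.query._escaped_fragment_ === undefined) return next();
    var target = 'http://' + req.headers.host + '/#!' + req.query._escaped_fragment_;
    phantom.create(function (err, ph) {
      if (err) return next(err);
      ph.createPage(function (err, page) {
        if (err) return next(err);
        page.open(target, function (err, status) {
          if (err) return next(err);
          // Give the client-side JavaScript a moment to render, then grab the DOM.
          setTimeout(function () {
            page.evaluate(function () {
              return document.documentElement.outerHTML;
            }, function (err, html) {
              ph.exit();
              if (err) return next(err);
              res.send(html);   // serve the rendered snapshot to the bot
            });
          }, 1000);
        });
      });
    });
  };
}

module.exports = googlebot;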
Result: an AJAX-heavy node.js/connect/express based website can be crawled by the Googlebot.
There is one implementation using Node.js and Backbone.js on both the server and the browser:
https://github.com/Morriz/backbone-everywhere
The crawlable Node.js module seems to fit this purpose: https://npmjs.org/package/crawlable, and there is an example of such a SPA that can be rendered server-side in Node: https://github.com/trupin/crawlable-todos
Backbone looks interesting:
http://documentcloud.github.com/backbone/
http://lostechies.com/derickbailey/2011/09/26/seo-and-accessibility-with-html5-pushstate-part-1-introducing-pushstate/
How does Google Suggest work? How does it manage to update the web page on the client so quickly, based on information in a distant Google database? Why does the web page not look ‘jumpy’ if it is being frequently updated?
It uses AJAX.
As you type your query, it searches for the 10 most requested words matching yours and writes the results, as minified JSON, into an invisible DIV element. Fast, but still resource intensive.
Try installing Firebug on Firefox or use the Developer Console in Chrome, open the console, and start typing "Youtube" or whatever you want. You will see the minified JSON responses.
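For a feel of the mechanics, here is a bare-bones sketch of the same idea (the /suggest endpoint and the element IDs are made up for illustration, not Google's actual code):
var input = document.getElementById('search-box');
var list = document.getElementById('suggestions');

input.addEventListener('input', function () {
  var q = input.value;
  if (!q) { list.innerHTML = ''; return; }
  // Fire an asynchronous request on every keystroke; only the suggestion
  // list is re-rendered, so the page never reloads or "jumps".
  fetch('/suggest?q=' + encodeURIComponent(q))
    .then(function (res) { return res.json(); })
    .then(function (words) {
      list.innerHTML = words.map(function (w) {
        return '<li>' + w + '</li>';
      }).join('');
    });
});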
Good luck :D
In addition to the front-end handling others have talked about, which jQuery is a great example of, you might also be interested in how they approach the idea on the backend. Dr. Peter Norvig has written about how to create a spelling corrector, where similar approaches could be used to find close matches.
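For a taste of that Norvig-style idea, a tiny sketch: generate every string one edit away from the typed word and keep the ones that appear in a dictionary of known queries (the dictionary below is just a stand-in):
var known = new Set(['youtube', 'facebook', 'weather', 'news']);

function edits1(word) {
  var letters = 'abcdefghijklmnopqrstuvwxyz';
  var results = new Set();
  for (var i = 0; i <= word.length; i++) {
    var left = word.slice(0, i);
    var right = word.slice(i);
    if (right) results.add(left + right.slice(1));               // deletion
    if (right.length > 1)                                        // transposition
      results.add(left + right[1] + right[0] + right.slice(2));
    for (var c of letters) {
      if (right) results.add(left + c + right.slice(1));         // substitution
      results.add(left + c + right);                             // insertion
    }
  }
  return results;
}

function closeMatches(word) {
  return Array.from(edits1(word)).filter(function (w) { return known.has(w); });
}

// closeMatches('youtubr') -> ['youtube']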
The whole page is not being updated. Only parts of it are, using AJAX (Asynchronous JavaScript and XML). Ajax requests can be made in JavaScript, and the page is updated when the response comes back.
A far more interesting question is how Google actually searches 10bn+ documents in a teeny tiny fraction of a second :)
Facebook does Ajax history (back and forward buttons) and bookmarking using #! instead of just # in the URL. Is it always a good idea to do that? I was thinking that an ordinary anchor could otherwise interfere with the Ajax history mechanism, triggering it to process a normal anchor.
So, the Ajax history function will only process the hash portion when it sees #! instead of just #.
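A sketch of how that check might look (loadPageViaAjax() is a hypothetical loader, not any particular library's API):
window.addEventListener('hashchange', function () {
  var hash = window.location.hash;
  if (hash.indexOf('#!') !== 0) return;  // plain "#section" anchor: leave it alone
  var route = hash.slice(2);             // e.g. "#!/photos/42" -> "/photos/42"
  loadPageViaAjax(route);                // hypothetical: fetch and render that state
});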
And is using ! compatible with the major browsers? If Facebook is using !, my guess is that it's fairly well supported.
See Google's Making AJAX Applications Crawlable for a possible use case (I don't know if this is why Facebook used this fragment).
Update: This answer has been superseded by this article. It discusses the issues with the hashbang (#!), hashes (#) and the HTML5 History API (pushState, popstate), and the solutions.
In regard to usability on your website, it doesn't matter, and you can use anything you like.
In regard to search engine optimization, using the hashbang and not using it lead you down different paths.
For instance, Facebook uses the ! according to the Google proposal for Making AJAX Applications Crawlable. Adding the ! tells Google that it should fetch a crawlable version of that Ajax URL and add that URL to the search engine results. This is great for websites which have already implemented Ajax, as all you need to do is add the !.
The downside of this is that it only solves the problem of making your Ajax content crawlable. It does not solve the following problems:
Keeping the URLs clean and consistent for Ajax and non-Ajax users. E.g. you could end up with www.facebook.com/profile.php?pid=123#!profile.php?pid=123
Keeping the website accessible to non-Ajax users.
Keeping the URLs the same for both Ajax and non-Ajax users.
It also requires some severely complicated server-side changes for escaping and translation of state in query strings.
And it is not compatible with the new HTML5 PopState functionality, which is designed to truly solve these problems.
For websites which don't currently use Ajax for everything, due to the problems above it is far better NOT to use the Google proposal, as it is only a workaround for sites like Facebook which went Ajax-crazy and needed a desperate solution for SEO. There are alternatives which solve more of these problems (and with HTML5 PopState now available, they can solve all of them). One such alternative is jQuery Ajaxy (as seen on balupton.com), which works by simply upgrading your website into an Ajax application, while keeping the experience rich and interactive for Ajax-enabled users and continuing to work perfectly for Ajax-disabled users.
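For comparison, a minimal sketch of the pushState/popstate route, which keeps one and the same URL for Ajax and non-Ajax users (loadPage() is a hypothetical helper that fetches and renders a page fragment via Ajax):
function navigate(url) {
  loadPage(url);                              // update the content via Ajax
  history.pushState({ url: url }, '', url);   // the address bar shows the real URL
}

window.addEventListener('popstate', function (e) {
  // Back/forward restore the previous state without a full reload.
  loadPage((e.state && e.state.url) || location.pathname);
});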
I am planning to start a full-Ajax site project, and I was wondering about SEO.
The site will have URLs like www.mysite.gr/#/category1, etc.
Can Google crawl the site?
Is there anything I have to be aware of with full Ajax and SEO?
Any reading suggestions are welcome
Thanks
https://stackoverflow.com/questions/768233/do-hashes-in-urls-affect-seo
You might want to read about so-called progressive enhancement.
Google supports indexing of AJAX sites, but unfortunately it involves extra work for the developer. See http://code.google.com/web/ajaxcrawling/docs/getting-started.html
I don't think Google is capable of doing so (yet)
http://googlewebmastercentral.blogspot.com/2009/10/proposal-for-making-ajax-crawlable.html
However, you can of course make your site usable with or without JavaScript. That way, browsers will get the full eye candy, and Google (and text browsers) can still navigate your site.
In addition to SEO, you also need to think about usability standards here. A site that reliant on AJAX isn't going to work for things like screen readers, just as it won't for spiders. You need a system for graceful degradation. A website that can't function without JavaScript isn't really a functioning website.
The search engines will spider the initial page load - what happens to the page (with ajax) after that is irrelevant to listings.
Google itself doesn't crawl Ajax content, but it advises a mechanism for it. For this, you first need to change # to #!.
The whole process of making AJAX content SEO-friendly is explained here, along with simple ASP.NET code to get started with it.
Imagine having to hit the “refresh” button in your browser to update your Twitter feed rather than just hitting the button on the page itself and having it instantly update? These are the types of problems that AJAX solves, although it does come with its pitfalls. Google might claim it’s able to crawl and parse AJAX websites, yet it’s risky to just take its word for it and leave your website’s organic traffic up to chance. Even though Google can usually index dynamic AJAX content, it’s not always that simple. This guide covers some of the things that can go wrong and how you can make sure your AJAX website is crawlable: https://prerender.io/ajax-seo/