How to ensure ajax content hosted on S3 gets indexed? - ajax

I have a number of sites that are completely hosted on Amazon S3 but the page is generated via JavaScript. I would like to make sure these sites are indexed by Google, but since they are hosted on S3 I don't seem to have a mechanism to serve up _escaped_fragment_ versions of the page. Does anyone have an idea on how I could get the Ajax content indexed? I would prefer to not have to replicate my templating server side.
Here is an example of one of my sites:
http://www.web608.org/

There is no real easy way to do this. I'd suggest you read Google's recommendations on this topic:
https://developers.google.com/webmasters/ajax-crawling/

The best practice I have seen is progressive enhancement. Create a working page in plain html and then use JavaScript to make it cool.

Related

SEO with angularjs and asp.net restfull service

I have developed a website using angularjs and web api.
The problem is that the ajax rendered content is not crawable by google. And no one can find the website using google search.
After reading many articles regarding this issue, including:
This one with all links of explanation going out,
Google ajax crawling protocol, and also stack over flow question, I couldn't find the proper solution. Those that mention asp.net solutions, are talking about mvc, and I need only the simple REST by web api, other articles are not talking about asp.net.
Is there any simple explanation?
I'm the one who asked this same question long ago, so I will answer from my experience:
Firstly, if all your content are accessible via unique URIs (including the hashbang if you use it), modern search engines should index it just fine. In fact Google can index javascript generated content now. You can try that via the Google Webmaster tool and see how your site is indexed.
Secondly, there are libraries that help you to serve parsed content to search engines if you need to, but in my case I didn't bother much with it since Google is indexing js nicely.
I've seen others ask this question, and maybe I'm missing something or this is outdated, but I don't see why AngularJS needs to be an issue with SEO.
Say you have a landing page and it has a bunch of links. Assuming you're using html5 mode in AngularJS (and I'm not sure that's 100% necessary) and something like ng-route then the links on the landing page can work both as "angular" (JavaScript) links and "old school" (full page load) links.
If you're a human user you can click a link and it will do angular magic and adjust the content without loading the full page. Ok, all fine.
But if you instead copy the link and paste it in a new tab or new browser, it will still work - assuming you've set up routes correctly.
I'm not an SEO expert by any stretch of the imagination, but as I understand it, having links that load pages and having those pages have real and useful content is the core of SEO, and done this way, AngularJS should work fine. The key thing to check is if you copy and paste the link (not just click it) that it works.

Angular JS *non-SPA* SEO

Non-SPA AJAX Partials for SEO
Sadly, 101% of the Angular SEO examples assume the use of a singe-page-application (SPA). My app is not a SPA. Currently, my stack is:
Node/Express - for routing and rendering Jade templates. The URLs are real, and don't use HTML pushstate, hash-bang or anything similar. for this reason, url-escaped-fragment won't work for me (I don't think)
Angular for communicating with my RESTful API(s)
My problem is that my page itself only includes pieces that are loaded via AJAX—the rest of page is rendered server side. Node/Express is not responsible for any of this logic, Angular pulls in the data that will be in my first h1.
Google Bot and similar see: <h1>{{this_unrendered_string}}</h1> which is no good.
Has anyone come up with any clever solutions for working around this scenario?
FWIW I found a service called SEO.js that will host a rendered version of any page I pass to it. If I could just tell GoogleBot and similar "Hey, don't use this page, use this page instead" But I'm not entirely sure how SEO feels about a different host serving content. Maybe some trickery could work here..
Google have documented an approach to "Making AJAX Applications Crawlable" here. https://developers.google.com/webmasters/ajax-crawling/
Implementing this isn't completely simple (basically you have to run a headless browser and return the HTML snapshots in response to specially formatted requests by Google).
It's not a simple as just returning a snapshot when you detect GoogleBot, but doing it this way probably eliminates any risk of being penalized.
There are a few companies that offer this a service - I'm getting on well with this one: https://ajaxsnapshots.com - they say that Bing and Yandex (Russian search engine) support it too.
AjaxSnapshots have an API you can use to tell them when your page is ready to snapshot - you could call that after all of your client side rendering is done.

Google crawl ajax / dynamically generated content - SEO

I've got a very unique situation that I don't believe any of the other topics here can relate.
I have a ecommerce module that is dynamically loaded / embedded into third party sites, no iframe straight JSON to web client into content. I have no access to these third part sites at all, other then my javascript file being loaded from their page and dynamically generating the content.
I'm aware of the #! method, but that's no good here, my JS does generate "urls" within the embedded platform, but they're fake and for the address bar only, and I don't believe google crawlers can reach this far.
So my question is, is there a meta that we can set to point outside the url to i.e. back to my server with static crawlable content. I.e. pointing the canonical to my server... but again I don't think that would work.
If you implement #! then you have to make sure the url your embedded in supports the fragment parameter versions, which you probably can't. It's server side stuff.
You probably can't influence the canonical tag of the page either. It again has to be done server side. Any meta tag you set via JavaScript will not be seen by a bot.
Disqus solved the problem by providing an API so the embedding websites could get there comments server side and render then in plain html. WordPress has a plugin to do this. Disqus are also one of the few systems that Google has worked out how to crawl their AJAX pages.
Some plugins request people to also include a plain link with the JavaScript. Be careful with this as you may break Google Guidelines if you do it wrong. But you may be able to integrate the plain link with your plugin so that it directs bots and users to a crawlable version of the content.
Look into Google's crawlable ajax standard (and why it's a bad idea) and canonical URLs.
Now you can actually do this. A complete guide and examples can be found here: https://github.com/kubrickology/Logical-escaped_fragment

AngularJS / AJAX app and search engine crawlers

I've got a web app which heavily uses AngularJS / AJAX and I'd like it to be crawlable by Google and other search engines. My understanding is that I need to do something special to make it work, as described here: https://developers.google.com/webmasters/ajax-crawling
Unfortunately, that looks quite nasty and I'd rather not introduce the hash tags. What I'd like to do is to serve a static page to Googlebot (based on the User-Agent), either directly or by sending it a 302 redirect. That way, the web app can be the same, and the whole Googlebot workaround is nicely isolated until it is no longer necessary.
My worry is that Google may mistakenly assume that I'm trying to trick Googlebot, while my goal is to help it. What do you guys think about this approach, and what would you recommend?
Recently I come upon this excellent post from yearofmoo, explaining in details how to make your Angular app SEO friendly. In essence, when bots see an uri with a hash tag they will know it's an ajaxed page and will try to reach the same uri by replacing '#!' in your uri with '?_escaped_fragment_='. This alternative uri instructs bots that they should expect to find a definitive static version of the page they were accessing.
Of course, to achieve this you'd have to introduce hash tags into your uris. I don't see why are you trying to avoid them. Isn't gmail using hash tags?
Yeah unfortunately, if you want to be indexed - you have to adhere to the scheme :( If your running a ruby app - there's a gem that implements the crawling scheme for any rack app....
gem install google_ajax_crawler
writeup of how to use it is at http://thecodeabode.blogspot.com.au/2013/03/backbonejs-and-seo-google-ajax-crawling.html, source code at https://github.com/benkitzelman/google-ajax-crawler
Have a look at these links and it will give you a good direction:
Set up your own Prerender service using Prerender.io open source code:
https://prerender.io/
Use a different existing service such as BromBone, Seo.js or SEO4AJAX:
http://www.brombone.com/
http://getseojs.com/
http://www.seo4ajax.com/
Create your own service for rendering and serving snapshots to search engines. Read this article. It will give you the big picture:
http://scotch.io/tutorials/javascript/angularjs-seo-with-prerender-io
As of May 2014 GoogleBot now executes JavaScript. Check WebmasterTools to see how Google sees your site.
http://googlewebmastercentral.blogspot.no/2014/05/understanding-web-pages-better.html
Edit: Note that this does not mean other crawlers (Bing, Facebook, etc.) will execute Javascript. You may still need to take additional steps to ensure that these crawlers can see your site.

full ajax site and SEO

i am planing to start a full ajax site project, and i was wondering about SEO.
The site will have urls like www.mysite.gr/#/category1 etc
Can Google crawl the site.
Is something that i have to noticed about full ajax and SEO
Any reading suggestions are welcome
Thanks
https://stackoverflow.com/questions/768233/do-hashes-in-urls-affect-seo
You might want to read about so called progressive enhancement.
Google supports indexing of AJAX sites, but unfortunately it involves extra work for the developer. See http://code.google.com/web/ajaxcrawling/docs/getting-started.html
I don't think Google is capable of doing so (yet)
http://googlewebmastercentral.blogspot.com/2009/10/proposal-for-making-ajax-crawlable.html
However you can of course make your site usable with or without JavaScript. That way, browsers will have the full candy stuff and Google (and text browsers) still can navigation your site.
In addition to SEO, you also need to think about usability standards here. A site that is that reliant on AJAX isn't going to work for things like screen-readers as well as spiders. You need a system for graceful degreadation. A website that can't function without JavaScript isn't really a functioning website.
The search engines will spider the initial page load - what happens to the page (with ajax) after that is irrelevant to listings.
Google itself doesn't crawl ajax content but advice a mechanism for it. For this you first need to change # to #!
Whole process to SEO AJAX content is explained here along with simple asp.net code to start working on it.
Imagine having to hit the “refresh” button in your browser to update your Twitter feed rather than just hitting the button on the page itself and having it instantly update? These are the types of problems that AJAX solves, although it does come with its pitfalls. Google might claim it’s able to crawl and parse AJAX websites, yet it’s risky to just take its word for it and leave your website’s organic traffic up to chance. Even though Google can usually index dynamic AJAX content, it’s not always that simple. This guide covers some of the things that can go wrong and how you can make sure your AJAX website is crawlable: https://prerender.io/ajax-seo/

Resources