MEAN-SEO not working as expected - ajax

I have a project in meanjs.
It has html5mode disabled so my URLS are like that:
http://localhost:3000/#!/products
I am trying to implement AJAX snapshoots in order to allow Google Crawlers to see content generated by javascript on client side.
I installed a module called MEAN-SEO:
http://blog.meanjs.org/post/78474995741/mean-seo
Now when I access the following URL:
http://localhost:3000/?_escaped_fragment_=
I am redirected to:
http://localhost:3000/?_escaped_fragment_=/#!/
And when I click on "products" or when I access directly, I am redirected to:
http://localhost:3000/?_escaped_fragment_=/#!/products
After reading the Google specification detailed here https://developers.google.com/webmasters/ajax-crawling/docs/getting-started , what I need is to get is something without hashbangs, like the following:
http://localhost:3000/?_escaped_fragment_=/products
What I am doing wrong?
Kind Regards.

Any specific reasons why you want html5mode off?
Here is something a lot of people have missed: Search engines (both Google and Bing) can now handle AJAX based content.
Their crawlers now understands pushstates, so if you just turn html5mode on you don't need any special handling to get your SEO working. You can load your content via AJAX, you can set title tags and meta tags with javascript and so on and so forth, and the crawlers will understand your content the same as if you had rendered things server-side. There is no need to do html-snapshotting or escaped_fragment handling for SEO anymore.
This has been announced on their developer blogs but unfortunately most of the documentation hasn't been updated with this information, so it's gone under the radar for a lot of people.
One word of warning though, Facebook does not handle pushstates, so if you want to support the Facebook crawler you still need to handle that separately.

Related

SEO with angularjs and asp.net restfull service

I have developed a website using angularjs and web api.
The problem is that the ajax rendered content is not crawable by google. And no one can find the website using google search.
After reading many articles regarding this issue, including:
This one with all links of explanation going out,
Google ajax crawling protocol, and also stack over flow question, I couldn't find the proper solution. Those that mention asp.net solutions, are talking about mvc, and I need only the simple REST by web api, other articles are not talking about asp.net.
Is there any simple explanation?
I'm the one who asked this same question long ago, so I will answer from my experience:
Firstly, if all your content are accessible via unique URIs (including the hashbang if you use it), modern search engines should index it just fine. In fact Google can index javascript generated content now. You can try that via the Google Webmaster tool and see how your site is indexed.
Secondly, there are libraries that help you to serve parsed content to search engines if you need to, but in my case I didn't bother much with it since Google is indexing js nicely.
I've seen others ask this question, and maybe I'm missing something or this is outdated, but I don't see why AngularJS needs to be an issue with SEO.
Say you have a landing page and it has a bunch of links. Assuming you're using html5 mode in AngularJS (and I'm not sure that's 100% necessary) and something like ng-route then the links on the landing page can work both as "angular" (JavaScript) links and "old school" (full page load) links.
If you're a human user you can click a link and it will do angular magic and adjust the content without loading the full page. Ok, all fine.
But if you instead copy the link and paste it in a new tab or new browser, it will still work - assuming you've set up routes correctly.
I'm not an SEO expert by any stretch of the imagination, but as I understand it, having links that load pages and having those pages have real and useful content is the core of SEO, and done this way, AngularJS should work fine. The key thing to check is if you copy and paste the link (not just click it) that it works.

Google crawl ajax / dynamically generated content - SEO

I've got a very unique situation that I don't believe any of the other topics here can relate.
I have a ecommerce module that is dynamically loaded / embedded into third party sites, no iframe straight JSON to web client into content. I have no access to these third part sites at all, other then my javascript file being loaded from their page and dynamically generating the content.
I'm aware of the #! method, but that's no good here, my JS does generate "urls" within the embedded platform, but they're fake and for the address bar only, and I don't believe google crawlers can reach this far.
So my question is, is there a meta that we can set to point outside the url to i.e. back to my server with static crawlable content. I.e. pointing the canonical to my server... but again I don't think that would work.
If you implement #! then you have to make sure the url your embedded in supports the fragment parameter versions, which you probably can't. It's server side stuff.
You probably can't influence the canonical tag of the page either. It again has to be done server side. Any meta tag you set via JavaScript will not be seen by a bot.
Disqus solved the problem by providing an API so the embedding websites could get there comments server side and render then in plain html. WordPress has a plugin to do this. Disqus are also one of the few systems that Google has worked out how to crawl their AJAX pages.
Some plugins request people to also include a plain link with the JavaScript. Be careful with this as you may break Google Guidelines if you do it wrong. But you may be able to integrate the plain link with your plugin so that it directs bots and users to a crawlable version of the content.
Look into Google's crawlable ajax standard (and why it's a bad idea) and canonical URLs.
Now you can actually do this. A complete guide and examples can be found here: https://github.com/kubrickology/Logical-escaped_fragment

AngularJS / AJAX app and search engine crawlers

I've got a web app which heavily uses AngularJS / AJAX and I'd like it to be crawlable by Google and other search engines. My understanding is that I need to do something special to make it work, as described here: https://developers.google.com/webmasters/ajax-crawling
Unfortunately, that looks quite nasty and I'd rather not introduce the hash tags. What I'd like to do is to serve a static page to Googlebot (based on the User-Agent), either directly or by sending it a 302 redirect. That way, the web app can be the same, and the whole Googlebot workaround is nicely isolated until it is no longer necessary.
My worry is that Google may mistakenly assume that I'm trying to trick Googlebot, while my goal is to help it. What do you guys think about this approach, and what would you recommend?
Recently I come upon this excellent post from yearofmoo, explaining in details how to make your Angular app SEO friendly. In essence, when bots see an uri with a hash tag they will know it's an ajaxed page and will try to reach the same uri by replacing '#!' in your uri with '?_escaped_fragment_='. This alternative uri instructs bots that they should expect to find a definitive static version of the page they were accessing.
Of course, to achieve this you'd have to introduce hash tags into your uris. I don't see why are you trying to avoid them. Isn't gmail using hash tags?
Yeah unfortunately, if you want to be indexed - you have to adhere to the scheme :( If your running a ruby app - there's a gem that implements the crawling scheme for any rack app....
gem install google_ajax_crawler
writeup of how to use it is at http://thecodeabode.blogspot.com.au/2013/03/backbonejs-and-seo-google-ajax-crawling.html, source code at https://github.com/benkitzelman/google-ajax-crawler
Have a look at these links and it will give you a good direction:
Set up your own Prerender service using Prerender.io open source code:
https://prerender.io/
Use a different existing service such as BromBone, Seo.js or SEO4AJAX:
http://www.brombone.com/
http://getseojs.com/
http://www.seo4ajax.com/
Create your own service for rendering and serving snapshots to search engines. Read this article. It will give you the big picture:
http://scotch.io/tutorials/javascript/angularjs-seo-with-prerender-io
As of May 2014 GoogleBot now executes JavaScript. Check WebmasterTools to see how Google sees your site.
http://googlewebmastercentral.blogspot.no/2014/05/understanding-web-pages-better.html
Edit: Note that this does not mean other crawlers (Bing, Facebook, etc.) will execute Javascript. You may still need to take additional steps to ensure that these crawlers can see your site.

Google Search optimisation for ajax calls

I have a page on my site which has a list of things which gets updated frequently. This list is created by calling the server via jsonp, getting json back and transforming it into html. Fast and slick.
Unfortunately, Google isn't able to index it. After reading up on how to get this done according to Google's AJAX crawling guide, I am bit confused and need some clarification and confirmation:
The ajax pages need to be implement the rules only, right?
I currently have a rest url like
[site]/base/junkets/browse.aspx?page=1&rows=18&sidx=ScoreAll&sord=desc&callback=jsonp1295964163067
this would need to become something like:
[site]/base/junkets/browse.aspx#page=1&rows=18&sidx=ScoreAll&sord=desc&callback=jsonp1295964163067
And when google calls it like this
[site]/base/junkets/browse.aspx#!page=1&rows=18&sidx=ScoreAll&sord=desc&callback=jsonp1295964163067
I would have to deliver the html snapshot.
Why replace the ? with # ?
Creating html snapshots seems very cumbersome. Would it suffice to just serve simple links? In my case I would be happy if google would only index the things pages.
It looks like you've misunderstood the AJAX crawling guide. The #! notation is to be used on links to the page your AJAX application lives within, not on the URL of the service your appliction makes calls to. For example, if I access your app by going to example.com/app/, then you'd make page crawlable by instead linking to example.com/app/#!page=1.
Now when Googlebot sees that URL in a link, instead of going to example.com/app/#!page=1 – which means issuing a request for example.com/app/ (recall that the hash is never sent to the server) – it will request example.com/app/?_escaped_fragment_=page=1. If _escaped_fragment_ is present in a request, you know to return the static HTML version of your content.
Why is all of this necessary? Googlebot does not execute script (nor does it know how to index your JSON objects), so it has no way of knowing what ends up in front of your users after your scripts run and content is loaded. So, your server has to do the heavy lifting of producing a HTML version of what your users ultimately see in the AJAXy version.
So what are your next steps?
First, either change the links pointing to your application to include #!page=1 (or whatever), or add <meta name="fragment" content="!"> to your app's HTML. (See item 3 of the AJAX crawling guide.)
When the user changes pages (if this is applicable), you should also update the hash to reflect the current page. You could simply set location.hash='#!page=n';, but I'd recommend using the excellent jQuery BBQ plugin to help you manage the page's hash. (This way, you can listen to changes to the hash if the user manually changes it in the address bar.) Caveat: the currently released version of BBQ (1.2.1) does not support AJAX crawlable URLs, but the most recent version in the Git master (1.3pre) does, so you'll need to grab it here. Then, just set the AJAX crawlable option:
$.param.fragment.ajaxCrawlable(true);
Second, you'll have to add some server-side logic to example.com/app/ to detect the presence of _escaped_fragment_ in the query string, and return a static HTML version of the page if it's there. This is where Google's guidance on creating HTML snapshots might be helpful. It sounds like you might want to pursue option 3. You could also modify your service to output HTML in addition to JSON.
I've more or less given up on this. There really seems no alternative to generating the html on the server and delivering it in the html bdoy if you want goolge to index your directory.
I even tried adding a section wraped a .net user control which implemented a simple html version of the directory. But google also managed to ignore ..
So in the end my directory has been de-ajaxified. :(

With Google's #! mess, what effect would a redirect on the converted URL have?

So Google takes:
http://www.mysite.com/mypage/#!pageState
and converts it to:
http://www.mysite.com/mypage/?_escaped_fragment_=pageState
...So... Would be it fair game to redirect that with a 301 status to something like:
http://www.mysite.com/mypage/pagestate/
and then return an HTML snapshot?
My thought is if you have an existing html structure, and you just want to add ajax as a progressive enhancement, this would be a fair way to do it, if Google just skipped over _escaped_fragment_ and indexed the redirected URL. Then your ajax links are configured by javascript, and underneath them are the regular links that go to your regular site structure.
So then when a user comes in on a static url (ie http://www.mysite.com/mypage/pagestate/ ), the first link he clicks takes him to the ajax interface if he has javascript, then it's all ajax.
On a side note does anyone know if Yahoo/MSN onboard with this 'spec' (loosely used)? I can't seem to find anything that says for sure.
If you redirect the "?_escaped_fragment_" URL it will likely result in the final URL being indexed (which might result in a suboptimal user experience, depending on how you have your site setup). There might be a reason to do it like that, but it's hard to say in general.
As far as I know, other search engines are not yet following the AJAX-crawling proposal.
You've pretty much got it. I recently did some tests and experimented with sites like Twitter (which uses #!) to see how they handle this. From what I can tell they handle it like you're describing.
If this is your primary URL
http://www.mysite.com/mypage/#!pageState
Google/Facebook will go to
http://www.mysite.com/mypage/?_escaped_fragment_=pageState
You can setup a server-side 301 redirect to a prettier URL, perhaps something like
http://www.mysite.com/mypage/pagestate/
On these HTML snapshot pages you can add a client-side redirect to send most people back to the dynamic version of the page. This ensures most people share the dynamic URL. For example, if you try to go to http://twitter.com/brettdewoody it'll redirect you to the dynamic (https://twitter.com/#!/brettdewoody) version of the page.
To answer your last question, both Google and Facebook use the _escaped_fragment_ method right now.

Resources