We make use of Ajax in our Django web app pretty extensively. It's not a single-page design; in most cases we serve an HTML skeleton with Django's built-in template engine and load most of the content asynchronously with JavaScript.
I read this article by Google, "Making AJAX Applications Crawlable". They suggest creating HTML snapshots for better search engine visibility.
We are using the django-rest-framework and Mustache for templates.
Is there a straightforward way to generate static HTML pages (HTML snapshots) with this setup?
I'd suggest looking into Selenium. You can use it to render your page in the context of a real browser (including the Ajax-loaded parts). It's designed as a testing tool, but that shouldn't matter here.
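For illustration, here's a minimal sketch of that idea in Python, assuming the selenium package and a locally installed Firefox/geckodriver; the URL and output file name are placeholders:

    # Sketch: render a page in a real browser and save the resulting HTML snapshot.
    from selenium import webdriver

    def save_snapshot(url, outfile):
        driver = webdriver.Firefox()   # any WebDriver-backed browser would do
        try:
            driver.get(url)            # loads the page and executes its JavaScript
            # page_source is the DOM after scripts have run, not the raw server response;
            # you may need an explicit wait here so Ajax-loaded content has arrived
            with open(outfile, "w", encoding="utf-8") as f:
                f.write(driver.page_source)
        finally:
            driver.quit()

    save_snapshot("http://localhost:8000/some/page/", "some_page.snapshot.html")

Depending on how your Ajax calls behave, you may want Selenium's WebDriverWait (or a similar explicit wait) before reading page_source, so the snapshot contains the loaded content rather than the empty skeleton.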
We ended up writing a custom jQuery library called pjaxr, inspired by pjax, along with a Django-specific implementation. Find more about it here:
https://github.com/iekadou/django-pjaxr
pjaxr uses pushState, which is SEO-friendly. In browsers that don't support pushState, pjaxr falls back entirely to normal static HTML.
I am displaying internationalized strings within a Polymer element as follows:
<div>
  <span class="content">{{myContent}}</span>
</div>
... and have the following Dart code:
@observable String myContent;
//...
void onUpdateLocale(_locale) {
  myContent = getMyContent();
}
//...
getMyContent() => Intl.message('All my content ...',
    name: 'myContent',
    desc: 'This is my content',
    args: [],
    examples: {'None': 0});
However, when Google crawls the app, it only pulls "{{myContent}}" and not its interpolated value, the actual internationalized content. Is there a way to work around this and make an internationalized Polymer.dart app that is also SEO-friendly?
It's not really clear. Although Google recently announced that they are evaluating JavaScript when indexing pages, I've not seen any deep evaluation of how this compares to the server-rendered-pages approach.
And then there is the issue of non-Google search engines like Bing.
Polymer as it stands today doesn't really do server-side rendering, and as far as I can tell the team doesn't have plans to offer that in the near future.
If your project/business depends on SEO, I would not risk using Polymer.
You have two options to address this issue:
Serve HTML snapshots to crawlers: render the page on the server side with PhantomJS whenever a crawler requests it, or use a third-party service like ajaxsnapshots to do that for you.
Forget Polymer and use the React.js component framework. React has a way to render its virtual DOM on the server side. This works seamlessly if you are using Node.js frameworks, and it should be possible with JVM frameworks as well, since Java 6+ ships with a JavaScript engine (vastly improved in Java 8; Google "Nashorn").
Google has a spec that lets you serve search engines snapshots of your page's HTML after all the necessary JavaScript (or Dart) has run: https://developers.google.com/webmasters/ajax-crawling/
The basic idea is to render the pages on the server side and then follow a set of URL conventions that lets you serve search engines the pre-generated HTML in a way they won't confuse with cloaking.
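To illustrate the convention (this is my own sketch in Python, not code from the spec): crawlers replace "#!" in your pretty URLs with an _escaped_fragment_ query parameter, and your server answers that request with the pre-rendered snapshot. Roughly:

    # Sketch of the AJAX-crawling URL convention.
    from urllib.parse import quote, unquote

    def crawler_url(pretty_url):
        """http://example.com/page#!state=1 -> http://example.com/page?_escaped_fragment_=state%3D1"""
        base, _, fragment = pretty_url.partition("#!")
        sep = "&" if "?" in base else "?"
        return base + sep + "_escaped_fragment_=" + quote(fragment, safe="")

    def pretty_url(crawler_request_url):
        """The reverse mapping, used server-side to decide which snapshot to serve."""
        base, _, fragment = crawler_request_url.partition("_escaped_fragment_=")
        return base.rstrip("?&") + "#!" + unquote(fragment)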
Google, Bing, Yandex and some social bots support this spec.
You can implement this spec yourself or use a service that does it for you (I work for one of these: https://ajaxsnapshots.com). The solution is typically plugged in at the web server level, so you don't need to make any changes to your app.
So, I don't know much about Polymer, aside from the documentation on data binding I just viewed. It seems fairly similar to Google's AngularJS, in that it uses JavaScript declaratively to render data into an HTML document. That means a crawler that doesn't execute JavaScript still sees the underlying {{something}} bindings as raw strings; the JS libraries are what turn that data into text on the screen.
That being the case, you might consider handling SEO like Angular developers do. Here is the definitive resource on the subject: http://www.yearofmoo.com/2012/11/angularjs-and-seo.html
Non-SPA AJAX Partials for SEO
Sadly, 101% of the Angular SEO examples assume the use of a single-page application (SPA). My app is not a SPA. Currently, my stack is:
Node/Express - for routing and rendering Jade templates. The URLs are real and don't use HTML5 pushState, hash-bangs or anything similar. For this reason, _escaped_fragment_ won't work for me (I don't think).
Angular - for communicating with my RESTful API(s)
My problem is that my page only includes pieces that are loaded via AJAX; the rest of the page is rendered server-side. Node/Express is not responsible for any of this logic; Angular pulls in the data that ends up in my first h1.
Googlebot and similar crawlers see: <h1>{{this_unrendered_string}}</h1>, which is no good.
Has anyone come up with any clever solutions for working around this scenario?
FWIW, I found a service called SEO.js that will host a rendered version of any page I pass to it. If only I could tell Googlebot and similar crawlers, "Hey, don't use this page, use this page instead." But I'm not entirely sure how search engines feel about a different host serving the content. Maybe some trickery could work here.
Google has documented an approach to "Making AJAX Applications Crawlable" here: https://developers.google.com/webmasters/ajax-crawling/
Implementing this isn't completely simple (basically you have to run a headless browser and return the HTML snapshots in response to specially formatted requests by Google).
It's not as simple as just returning a snapshot when you detect Googlebot, but doing it this way probably eliminates any risk of being penalized.
There are a few companies that offer this as a service - I'm getting on well with this one: https://ajaxsnapshots.com - they say that Bing and Yandex (the Russian search engine) support it too.
AjaxSnapshots has an API you can use to tell them when your page is ready to snapshot - you could call that after all of your client-side rendering is done.
I've got a very unique situation that I don't believe any of the other topics here relate to.
I have an e-commerce module that is dynamically loaded / embedded into third-party sites - no iframe, just straight JSON rendered into content on the client. I have no access to these third-party sites at all, other than my JavaScript file being loaded from their page and dynamically generating the content.
I'm aware of the #! method, but that's no good here. My JS does generate "URLs" within the embedded platform, but they're fake and for the address bar only, and I don't believe Google's crawlers can reach that far.
So my question is: is there a meta tag we can set to point outside the URL, i.e. back to my server with static, crawlable content? For example, pointing the canonical at my server... but again, I don't think that would work.
If you implement #!, you have to make sure the URL you're embedded in supports the _escaped_fragment_ parameter versions, which you probably can't - it's server-side stuff.
You probably can't influence the canonical tag of the page either; it again has to be done server-side. Any meta tag you set via JavaScript will not be seen by a bot.
Disqus solved the problem by providing an API so the embedding websites can fetch their comments server-side and render them as plain HTML; WordPress has a plugin to do this. Disqus is also one of the few systems whose AJAX pages Google has worked out how to crawl.
Some plugins ask people to also include a plain link alongside the JavaScript. Be careful with this, as you may break Google's guidelines if you do it wrong. But you may be able to integrate the plain link into your plugin so that it directs bots and users to a crawlable version of the content.
Look into Google's crawlable AJAX standard (and why it's a bad idea), and into canonical URLs.
Now you can actually do this. A complete guide and examples can be found here: https://github.com/kubrickology/Logical-escaped_fragment
I am currently developing a single-page web application that is focused on functionality. It doesn't really have or need long paragraphs of text, and the text it does have is loaded dynamically via JavaScript and AJAX.
Normally search engine optimization tips revolve around getting the right word count percentages, etc. But what are the best practices for SEO when your application is heavily reliant on AJAX? A landing page with descriptive text is not an option - it's important that users can immediately start using the application, and it's rather obvious what it does once it's loaded.
With meta tags fading in importance in modern search engines, is link-building the only solution or are there tricks to help search engines know what an AJAX-based web application is about?
Google has a written specification suggesting how you can make an AJAX web application more crawlable by their robots.
The fundamental principle is that you make a static HTML version of key pages, and let the crawler know these pages exist, and the relationships between them, using the #! URL fragment syntax.
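As a rough sketch of the server side of that, assuming a Django backend and a snapshots/ directory of pre-generated HTML files (both of which are my assumptions; the question doesn't name a stack):

    # views.py - sketch: serve a pre-generated HTML snapshot to crawlers that send
    # the _escaped_fragment_ parameter, and the normal JS-driven page to everyone else.
    import os
    from django.http import HttpResponse
    from django.shortcuts import render

    SNAPSHOT_DIR = "snapshots"  # hypothetical directory of pre-rendered pages

    def app_page(request, page_slug):
        if request.GET.get("_escaped_fragment_") is not None:
            # Crawler request: return the static snapshot generated earlier
            # (real code would validate page_slug before building the path).
            path = os.path.join(SNAPSHOT_DIR, page_slug + ".html")
            with open(path, encoding="utf-8") as f:
                return HttpResponse(f.read())
        # Normal users get the JS application shell as usual.
        return render(request, "app_shell.html", {"page_slug": page_slug})

Pages whose URLs don't contain #! can opt in by adding <meta name="fragment" content="!"> to their head, which tells supporting crawlers to request the ?_escaped_fragment_= version instead.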
Somewhere you'll have to explain:
What's so great about your app
How your app works ("for dummies" style)
Who you are and why you built it
etc.
You can use all this content to do SEO (no ajax is needed for that).
Forget about making ajax crawlable if you don't have any text inside your app anyway.
We're coming from GWT projects, and because of SEO problems with GWT we're going to move away from it for our next project (mainly because SEO is a high priority for that project). In choosing a new framework, I'm looking at Wicket and liking what I've seen so far. I've only done a few tutorials, but judging by the WAR layout from those tutorials, it looks like most of the HTML pages are in the WEB-INF folder.
Is this going to cause problems for SEO and for search engines crawling the site's files?
Ideally, I'd like to use Wicket with some AJAX and deploy to Google App Engine.
It does not matter if your .jsps (or whatever) are stored in /WEB-INF. It just means they cannot be accessed directly by going to http://webapp/path/to/jsp.
For SEO think about:
Meaningful URLs and link text (i.e. URLs should be similar to expected search engine queries)
Crawlable pages (make sure all your content can be reached by a non-JS-enabled bot, i.e. don't make content available only through AJAX). A sitemap might help
Look into Wicket's bookmarkable page links and UrlCodingStrategies for a very powerful combination to use for SEO. Basically, all your links and parameters can be encoded as /a/static/url, regardless of the (changing) implementation on the backend.
If SEO is really important for your project, then you might reconsider using a lot of AJAX, since crawlers won't execute JavaScript and so won't read the responses to your AJAX calls. That being said, the SEO quality of your site isn't really determined by the framework you use. Just always think about img alt attributes, links, meta tags, the title, and h1 headings on every page and you should be fine. Also, try to post links to your site on other websites to gain visibility and importance for crawlers.