Is there any way to allow search engines to list JSON or XML Ajax data?
I don't think there is a way to directly allow crawlers to index XML and JSON.
I would recommend trying to design your site using progressive enhancement. First, make all of the JSON and XML available in HTML form for users who don't use javascript. These users include some people with disabilities and the crawlers used by search engines. That will ensure your content is searchable.
Once you have that working and tested, add your ajax functionality. You might do this by serving HTML, XML and JSON from a single URL using content negotiation, or you might have separate URLs.
Another graceful solution is to implement your ajax calls as requests to full HTML pages and have your javascript only use the bit that it's interested in, e.g. a div with id "content". The suitability of this solution would depend on your exact requirements.
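A rough sketch of the content negotiation option, assuming the server supports exactly these three media types and falls back to HTML when nothing matches. The function name pickRepresentation and the supported list are made up for illustration, and the Accept parsing is deliberately simplified:

```javascript
// Media types this hypothetical endpoint can produce, in order of preference.
const SUPPORTED = ["text/html", "application/json", "application/xml"];

// Pick a response type from the request's Accept header.
// Simplified: only the q parameter is honoured, other parameters are ignored.
function pickRepresentation(acceptHeader) {
  if (!acceptHeader) return "text/html";

  const ranges = acceptHeader.split(",").map((part) => {
    const [type, ...params] = part.trim().split(";");
    const qParam = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
    const q = qParam ? parseFloat(qParam.slice(2)) : 1;
    return { type: type.trim().toLowerCase(), q: Number.isNaN(q) ? 0 : q };
  });

  let best = null;
  for (const candidate of SUPPORTED) {
    // An exact match beats "type/*", which beats "*/*".
    const match =
      ranges.find((r) => r.type === candidate) ||
      ranges.find((r) => r.type === candidate.split("/")[0] + "/*") ||
      ranges.find((r) => r.type === "*/*");
    // On equal q-values the earlier entry in SUPPORTED wins.
    if (match && match.q > 0 && (best === null || match.q > best.q)) {
      best = { type: candidate, q: match.q };
    }
  }
  // Assumption: fall back to HTML so crawlers and plain browsers always get content.
  return best ? best.type : "text/html";
}
```

A crawler or ordinary browser request ends up with HTML, while an Ajax call that sends "Accept: application/json" gets JSON from the same URL. If you go this route, the response should also carry a "Vary: Accept" header so caches keep the variants apart.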
Hmm, no, not really. Search engines crawl your HTML and they don't really bother clicking around or even just loading your page into a browser and having the AJAX magic happen. Flash and JSON objects are by themselves invisible to search engines, and to make them visible, you have to transform them into HTML.
The newest technique for getting AJAX requests to be listed in search engines is to ensure they have their own URL. This technique stems from the same one utilized by flash applications where each page has a unique identifier, preceded by a pound (#) sign.
There are currently a few plugins and libraries which will allow you to manage this:
SWFAddress - Deep Linking for Flash & AJAX
jQuery History Plugin
What would be a good general approach to caching a web page where most of the content lives in a database and almost never changes (e.g. description), but a little of it changes frequently (e.g. stock items)?
I want to keep the web page cached as long as possible. Would it be an option to get the dynamic content via AJAX request? Do better approaches exist?
You could request the stock data from a separate URL and use JavaScript to insert it into the document. That way, the HTML/CSS/JS remains the same and can be cached. The stock information is loaded using JavaScript and it's not inserted into the HTML by the server.
You could create a URL that returns JSON for this purpose (and similarly for other information that you wish to include using JavaScript).
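As a minimal sketch of that split, assuming a hypothetical /stock.json endpoint and a JSON shape of { items: [{ name, quantity }] } (neither is from the question), the page shell gets a long cache lifetime and the stock data is revalidated on every use:

```javascript
// Hypothetical routing: "/stock.json" is the dynamic endpoint,
// every other path is treated as the rarely changing page shell.
function cacheHeadersFor(path) {
  if (path === "/stock.json") {
    // no-cache: caches must revalidate with the server before reusing the response.
    return { "Content-Type": "application/json", "Cache-Control": "no-cache" };
  }
  // The shell may be cached for a day (the lifetime is an arbitrary example value).
  return { "Content-Type": "text/html; charset=utf-8", "Cache-Control": "public, max-age=86400" };
}

// Escape text before inserting it into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Turn the JSON payload into list markup for the stock section.
function renderStock(stock) {
  return stock.items
    .map((item) => `<li>${escapeHtml(item.name)}: ${Number(item.quantity)} in stock</li>`)
    .join("");
}

// Browser wiring (not run here); the element id "stock" is an assumption:
// fetch("/stock.json")
//   .then((response) => response.json())
//   .then((stock) => {
//     document.getElementById("stock").innerHTML = renderStock(stock);
//   });
```

The cached HTML stays identical for every visitor, and only the small JSON response is fetched fresh.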
I am injecting some text into my pages, but I need to prevent search engines from indexing it. I read that some engines are able to read this content now. How can one prevent them from doing so?
Search engines cannot read Ajax content yet. The closest they come is Google, which supports it if you follow their specification. Unless you implement that specification, Google can't crawl Ajax content.
I have a page on my site which has a list of things which gets updated frequently. This list is created by calling the server via jsonp, getting json back and transforming it into html. Fast and slick.
Unfortunately, Google isn't able to index it. After reading up on how to get this done according to Google's AJAX crawling guide, I am a bit confused and need some clarification and confirmation:
Only the ajax pages need to implement the rules, right?
I currently have a rest url like
[site]/base/junkets/browse.aspx?page=1&rows=18&sidx=ScoreAll&sord=desc&callback=jsonp1295964163067
this would need to become something like:
[site]/base/junkets/browse.aspx#page=1&rows=18&sidx=ScoreAll&sord=desc&callback=jsonp1295964163067
And when google calls it like this
[site]/base/junkets/browse.aspx#!page=1&rows=18&sidx=ScoreAll&sord=desc&callback=jsonp1295964163067
I would have to deliver the html snapshot.
Why replace the ? with # ?
Creating html snapshots seems very cumbersome. Would it suffice to just serve simple links? In my case I would be happy if google would only index the pages for the things in the list.
It looks like you've misunderstood the AJAX crawling guide. The #! notation is to be used on links to the page your AJAX application lives within, not on the URL of the service your application makes calls to. For example, if I access your app by going to example.com/app/, then you'd make the page crawlable by instead linking to example.com/app/#!page=1.
Now when Googlebot sees that URL in a link, instead of going to example.com/app/#!page=1 – which means issuing a request for example.com/app/ (recall that the hash is never sent to the server) – it will request example.com/app/?_escaped_fragment_=page=1. If _escaped_fragment_ is present in a request, you know to return the static HTML version of your content.
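To make the mapping concrete, here is a rough sketch of the rewrite the crawler performs. The function name is hypothetical, the set of escaped characters is a simplified subset, and appending with "&" when the URL already has a query string is my reading of the scheme, so treat this as an illustration rather than the exact rules:

```javascript
// Convert a "#!" URL into the "_escaped_fragment_" form that the crawler requests.
function toEscapedFragmentUrl(prettyUrl) {
  const hashIndex = prettyUrl.indexOf("#!");
  if (hashIndex === -1) return prettyUrl; // nothing to rewrite

  const base = prettyUrl.slice(0, hashIndex);
  const fragment = prettyUrl.slice(hashIndex + 2);

  // Escape characters that would otherwise change the meaning of the query string
  // (simplified: only %, #, &, + and space are handled here).
  const escaped = fragment.replace(
    /[%#&+ ]/g,
    (ch) => "%" + ch.charCodeAt(0).toString(16).toUpperCase()
  );

  const separator = base.includes("?") ? "&" : "?";
  return base + separator + "_escaped_fragment_=" + escaped;
}

console.log(toEscapedFragmentUrl("http://example.com/app/#!page=1"));
// http://example.com/app/?_escaped_fragment_=page=1
```

Note that an "&" inside the fragment comes out as %26, so the whole application state arrives at the server as a single query parameter.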
Why is all of this necessary? Googlebot does not execute script (nor does it know how to index your JSON objects), so it has no way of knowing what ends up in front of your users after your scripts run and content is loaded. So, your server has to do the heavy lifting of producing an HTML version of what your users ultimately see in the AJAXy version.
So what are your next steps?
First, either change the links pointing to your application to include #!page=1 (or whatever), or add <meta name="fragment" content="!"> to your app's HTML. (See item 3 of the AJAX crawling guide.)
When the user changes pages (if this is applicable), you should also update the hash to reflect the current page. You could simply set location.hash='#!page=n';, but I'd recommend using the excellent jQuery BBQ plugin to help you manage the page's hash. (This way, you can listen to changes to the hash if the user manually changes it in the address bar.) Caveat: the currently released version of BBQ (1.2.1) does not support AJAX crawlable URLs, but the most recent version in the Git master (1.3pre) does, so you'll need to grab it from the Git repository. Then, just set the AJAX crawlable option:
$.param.fragment.ajaxCrawlable(true);
Second, you'll have to add some server-side logic to example.com/app/ to detect the presence of _escaped_fragment_ in the query string, and return a static HTML version of the page if it's there. This is where Google's guidance on creating HTML snapshots might be helpful. It sounds like you might want to pursue option 3. You could also modify your service to output HTML in addition to JSON.
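The server-side check could look roughly like this. It is only a sketch: handleRequest, renderSnapshot and renderAppShell are hypothetical names standing in for your own request handling and templates, and the markup is a placeholder:

```javascript
// Hypothetical snapshot renderer: builds static HTML for the requested state.
function renderSnapshot(state) {
  const parsed = Number.parseInt(state.get("page") || "1", 10);
  const page = Number.isNaN(parsed) ? 1 : parsed;
  return `<html><body><h1>Page ${page}</h1><!-- static list markup goes here --></body></html>`;
}

// Hypothetical shell for regular visitors: the script fills in #content via Ajax.
function renderAppShell() {
  return '<html><body><div id="content"></div><script src="/app.js"></script></body></html>';
}

// Decide which version to serve based on the query string.
function handleRequest(requestUrl) {
  // The base is only needed so that relative request paths can be parsed.
  const url = new URL(requestUrl, "http://example.com");
  if (url.searchParams.has("_escaped_fragment_")) {
    // searchParams.get() already percent-decodes the value, e.g. "page=2&rows=18".
    const fragment = url.searchParams.get("_escaped_fragment_");
    return renderSnapshot(new URLSearchParams(fragment));
  }
  return renderAppShell();
}
```

An empty _escaped_fragment_ value (which is what you get with the meta tag approach) falls through to the default state, page 1 in this sketch.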
I've more or less given up on this. There really seems to be no alternative to generating the html on the server and delivering it in the html body if you want google to index your directory.
I even tried adding a section wrapped in a .net user control which implemented a simple html version of the directory. But google managed to ignore that too.
So in the end my directory has been de-ajaxified. :(
I've created an AJAX-enabled web application. In my application, all the content that I want to appear in search results is loaded using AJAX. However, I observed that despite a valid sitemap submitted to google, my page ranking is very poor.
What do I need to do, and what should I avoid, in order to improve page ranking?
Thanks in advance.
You probably want to make it work with bookmarks and the browser history. There are many ways. One of them is jQuery's history plugin: https://github.com/tkyk/jquery-history-plugin
You probably also want to give search engines a page to crawl that links into your website with URLs like http://www.mysite.com/foobar.php#!fetch_content=xyz. The #! is a notation recognized by Google for crawling and indexing AJAX content.
reference: http://googlewebmastercentral.blogspot.com/2007/11/spiders-view-of-web-20.html
Don'ts would be interesting. But here's a do, for all of JS as well.
Make sure that all links degrade gracefully. This can be easily achieved by giving the links real URLs that lead to the same content that would otherwise be loaded via AJAX, so they still work when JS is not enabled. This makes crawling your website possible.
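You would also have to prevent the default action (e.g. by calling event.preventDefault() in the click handler) for all the affected links, so the browser does not navigate away when JS handles the click.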
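A minimal sketch of such a handler, assuming a hypothetical loadContent function that fetches the link's URL via AJAX and updates the page, and an assumed class name "ajax" on the enhanced links:

```javascript
// Build a click handler that loads the link target via Ajax instead of navigating.
// loadContent is a hypothetical callback supplied by your application.
function makeClickHandler(loadContent) {
  return function onClick(event) {
    // Leave modified clicks (new tab, new window, middle button) to the browser.
    if (event.ctrlKey || event.metaKey || event.shiftKey || event.button > 0) return;
    event.preventDefault(); // stop the normal page navigation
    loadContent(event.currentTarget.href); // same URL a crawler would follow
  };
}

// Browser wiring (not run here):
// document.querySelectorAll("a.ajax").forEach((link) => {
//   link.addEventListener("click", makeClickHandler(loadContent));
// });
```

Because the href stays a real URL, crawlers and users without JS simply follow the link, while everyone else gets the AJAX behaviour.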
My question is: Why don't more web pages use AJAX to load their content?
Is it because users can switch off JS, or are there security concerns?
Probably for two reasons:
Users with Javascript disabled won't see anything.
Pages loaded through AJAX aren't crawl-able by search engines. You want your content to be as accessible as possible so people searching the Web will find your application.
Because in most cases it doesn't make the site any more comfortable to use (often the effect would be the opposite). "Ajax" shouldn't be used to load entire pages unless you have a very good reason for it.
One word: SEO. Search engines execute no javascript -> do not see the content -> do not index the page.