Ajax Crawling: old way vs new way (#!) - ajax

Old way
When I used to load page asynchronously in projects that required the content to be indexed by search engines I used a really simple technique, that is
Page
<script type="text/javascript">
$('#example').click(function(){
$.ajax({
url: 'ajax/page.html',
success: function(data){
$('#content').html(data);
}
})
});
</script>
edit: I used to implement the haschange event to support bookmarking for javascript users.
New way
Recently Google came up with the idea of ajax crawling, read about it here:
http://code.google.com/web/ajaxcrawling/
http://www.asual.com/jquery/address/samples/crawling/
Basically they suggest to change "website.com/#page" to "website.com/#!page" and add a page that contains the fragment, like "website.com/?_escaped_fragment_=page"
What's the benefit of using the new way?
To me it seems that the new way adds a lot more work and complexity to something that before I did in a simple way: I designed the website to work without ajax and then I added ajax and hashchange event (to support back button and bookmarking) at a final stage.
From an SEO perspective, what are the benefits of using the new way?

The idea is to make the AJAX applications crawlable. According to the HTTP specifications, URLs refer to the same document regardless of the fragment identifier (the part after the hash mark). Therefore search engines ignore the fragment identifier: if you have a link to www.example.com/page#content, the crawler will simply request www.example.com/page.
With the new schemes, when you use the #! notation the crawler knows that the link refers to additional content. The crawler transforms the URL into another (ugly) URL and requests it from your web server. The web server is supposed to respond with static HTML representing the AJAX content.
EDIT Regarding the original question: if you already had regular links to static pages, then this scheme doesn't help you.

The advantage is not really applicable for you, because you are using progressive enhancement. The new Google feature is for applications written entirely in Javascript, which therefore can't be read by the crawler. I don't think you need to do anything here.

The idea behind it is that Javascript users can bookmark pages too, I think. If you take a look at your 'old' method, it's just replacing content on the page; there is no way to copy the URL to show the page in current state to other people.
So, if you've implemented the new #! method, you have to make sure that these URLs point to the correct pages, through Javascript.

i think it's just easier for google to be sure that you're not working with duplicate content. i'm including the hash like foo/#/bar.html in the urls and pass it to the permalink structure but i'm not quite sure if google likes or not.
interesting question though. +1

Related

How do Big sites prevent the loading circle on tabs from showing?

Okay I do not know how to explain this to you, It may be just my internet, or maybe my site is slower, or they really have a technique for doing this.
If you visit Facebook, Reddit, Youtube, Twitter, and if you click on links or any actions on those websites, the url changes but the browser tab doesn't show any loading circle.
How do they do that?
I am pretty sure my website is fast enought and at times it loads even faster than the bigger sites, but mine shows the loading circle on the browser tab.
Okay so I found the answer. Here is the technique for changing the url without reloading the page.
Updating address bar with new URL without hash or reloading the page
How do I modify the URL without reloading the page?
I am still trying to figure out though how to redirect the actual page without reloading the entire page. I am guessing they are loading it via ajax or something similar upon url change. I'll update this once I figure it out.
Edit: I am currently working on this feature for my site. The technique is to use ajax to load the content based on the url. I'll update this thread more as I update my site with this feature.
Edit 2: Damn, you will probably face the same problem I had trying to detect the url change without using onhashchange. If so, here you go:
How to detect URL change in JavaScript
This literally took me 4 hours just to figure that one out.....
Edit 3: I have now integrated this feature on my site. You can check it at
Grandweb
It is quite simple, but lots of work in appending the content once retrieved via ajax. So here is the process:
I am using pushState(); to change the url without reloading the page.
var url = $(this).attr('href');
var split_url = url.split('/');
var new_url = url.replace('https://grandweb.net/','');
window.history.pushState("object or string", "Title", "/write");
Using 'mouseup' was a bad idea, I changed my mind.
I then have to trigger the first function using 'mouseup' to retreieve the content via ajax, and then listen to succeeding onpopstate() for the next ones, because some mouse actions such as Mouse 4 or Mouse 5 are bound to the browser's Back and Forward button, and does not trigger via 'mouseup'.
$(window).on('mouseup', function(evt) {
get_content();
}
window.onpopstate = function(event) {
get_content();
}
The first one is responsible for triggering the function on first try because onpopstate only listens only when the browser's history API is populated.
Using mouseup was a bad idea, basically, don't use it unless you really want to detect mouse action from anywhere on the document.
I instead use the anchor tags/links to trigger the first function for retrieveng content.
example:
<a class="dynamic_btn" href="website.com/post">Home</a>
then
$(document).on('click','.dynamic_btn',function(e){
e.preventDefault();
get_content();
});
Using onhashchange is possible IF you have hashes on your url. I do not use hashes on my url so basically onhashchange is useless in my use case, unless I do not know something.
After retrieving the contents, I append them via creating DOM elements to existing containers from the page.
This is much easier to do if you are planning to change few elements or containers in your pages. If you plan on doing this to change a full page layout, goodluck. It's doable, but it's a really pain in tha *ss.
Upon observing Facebook, I learned that they do not implement this technique in all of their links/features. It makes sense because this is harder to maintain most especially because most of the work here is being done client side. It is very nice though because the page doesn't load.
I have implemented it on a few 'essential' functions of my website such as the viewing of posts and returning to the homepage. I can implement it on the whole site, but I am still deciding on that. That is all, thank you very much for reading internet stranger.

Should I load an entire html page with AJAX?

My designer thought it was a good idea to create a transition between different pages. Essentially only the content part will reload (header and footer stay intact), and only the content div should have a transitional effect (fade or some sort). To create this sort of effect isn't really the problem, to make google (analytics) happy is...
Solutions I didn't like and why;
Load only the content div with ajax: google won't see any content, meaning the site will never be found, or only the parts which are retrieved by ajax, which arent't full pages at all
show the transitional effect, then after that 'redirect' the user to the designated page (capture the click event of a elements): effect is pretty much the same as just linking to another page, eg. user will still see a page being reloaded
I thought of one possible solution:
When a visitor clicks a link, capture the event, load the target with ajax, show the transitional effect in the meantime, then just rewrite the entire document with the content fetched with the ajax request.
At least this will work and has some advantages; the page reload will look seamless, no matter how slow your internet connection is, google won't really mind because the ajax content is a full html page itself, and can be crawled as is, even non-javascript browsers (mobile phones et al.) won't mind, they just reload the page.
My hesitation to implement this method is that i would reload an entire page using ajax. I'm wondering if this is what ajax is meant to do, if it would slow things down. Most of all, is there a better solution, eg. my first 'bad' solution but slightly different so google would like it (analytics too)?
Thanks for your thoughts on this!
Short answer: I would not recommend loading an entire page in this manner.
Long answer: Not recommended. whilst possible, this is not really the intent of XHR/Ajax. Essentially what you're doing is replicating the native behaviour of the browser. Some of the problems you'll encounter:
Support for the Back/Forward
button. You'll need a URI # scheme
to solve.
The Browser must parse
the entire page through AJAX.
This'll slow things down. E.g. if
you load a block of HTML into the
browser, then replace the DOM with
it, only then will any scripts, CSS
or images contained therein begin
downloading.
Memory - the
browser's not changing pages. Over
time (depending on the browser), I'd
expect the memory usage to increase.
Accessibility. Screen readers
will need to be notified whenever
the page content is updated. Might
not be a concern for you but worth
mentioning.
Caching. Browser
would not know which page to cache
(beyond the initial load).
Separation of concerns - your View
is essentially broken into
server-side pieces to render the
page's content along with the static
HTML for the page framework and
lastly the JS to combined the server
piece with the browser piece.
This'll make maintenance over time
problematic and complex.
Integration with other components -
you're already seeing problems with
Google Analytics. You may encounter
issues with other components related
to the timing of when the DOM is
constructed.
Whether it's worth it for the page transition effect is your call but I hope I've answered your question.
you can have AJAX and SEO: Google's proposal .
i think you can learn something from Gmail's design.
This may be a bit strange, but I have an idea for this.
Prepare your pages to load with an 'ifarme' GET parameter.
When there is 'iframe' load it with some javascript to trigger the parent show_iframe_content()
When there is no 'iframe' just load the page, with a hidden iframe element called 'preloader'
Your goal is to make sure every of your links are opened in the 'preloader' with an additional 'iframe' get parameter, and when the loading of the iframe finishes, and it calls the show_iframe_content() you copy the content to your parent page.
Like this: Link clicked -> transition to loading phase -> iframe loaded -> show_iframe_content() called -> iframe content copied back to parent -> transition back to normal phase
The whole thing is good since, if a crawler visit ary of your pages, it will do it without the 'iframe' get parameter, so it can go through all your pages like normal, but when you use it in a browser, you make your links do the magic above.
This is just a sketch of it, but I'm sure it can be made right.
EDIT: Actually you can do it with simple ajax, without iframe, the thing is you have to modify the page after it has been loaded in the browser, to load the linked content with ajax. Also crawlers should see the links.
Example script:
$.fn.initLinks = function() {
$("a",this).click(function() {
var url = $(this).attr("href");
// transition to loading phase ...
// Ajax post parameter tells the site lo load only the content without header/footer
$.post(href,"ajax=1",function(data) {
$("#content").html(data).initLinks();
// transition to normal phase ...
});
return false;
});
};
$(function() {
$("body").initLinks();
});
Google analytics can track javascript events as if they are pageviews- check here for implementation:
http://www.google.com/support/googleanalytics/bin/answer.py?hl=en-GB&answer=55521

Getting content: AJAX vs. "Regular" HTTP call

I like that, these days, we have an option for how we get our web content from the server: we can make an old-style HTTP request (with its own URL in the browser) or we can make an AJAX call and replace parts of the DOM on the fly.
My question is this: how do you decide which method to use when there's an option to use either?
In the "old days" we'd have to redraw the entire page (including the parts that didn't change) if we wanted to show updated content. Now that AJAX has matured we don't need to do that any more; we could, conceivably, render a "page" once and just update the changing parts as needed. But what would be the consequences of doing so? Is there a good rule of thumb for doing a full-page reload vs a partial-page reload via AJAX?
If you want people to be able to bookmark individual pages, use HTTP requests.
If you are changing context, use HTTP requests.
If you are dividing functionality between different pages for better maintainability, use HTTP requests.
If you want to maximize your page views, use HTTP requests.
Lots of good reasons to still use HTTP requests - Stack overflow is a wonderful example of those divisions between AJAX and HTTP requests. Figure out why each function is HTTP or AJAX and I'm sure you will derive lots more reasons when to use each.
My simple rule:
Do everything ajax, especially if this is an application not just pages of content. Unless people are likely to want to link to direct content, like in a blog. Then it is easier to do regular full pages.
Of course there are many blended combinations, it doesn't have to be one or the other completely.
A fair amount of development overhead goes into partial-page reloads with AJAX. You need to create additional javascript handlers for all returned data. If you were to return full HTML blocks, you would still need to specify where the content should go and whether or not it's replacing other content. You would potentially have to re-render header tags to reflect content changes and you would have to implement a history solution to make sure search engines can index each page (using SWFAddress jQuery plugin, for example). If you return JSON encoded data you have an additional processing step.
The trade-off for reduced bandwidth usage by not using a page refresh is offset by an increase in JS code and event bindings which could affect page rendering speed as well as visual effects.
It all really depends on your target audience and the overall feel you are trying to go for on your page. AJAX and preloaders are flashy, and people love flashy things. If you believe the end-user experience will improve by adding partial page loads by all means implement them.

load a new page using ajax

I am new to ajax and i wanted to know if we can load a complete new page and not just a part of it using ajax. Please give a small example script for understanding if this is possible. Here i am trying to display only one url to user while i change from one page to another when he clicks on any of the links in the page.
You can of course request for a new page and load it via body.innerHTML = ajax.responseText;
I would strongly recommend against this though for reasons outlined in this post: Why not just using ajax for Page Requests to load the page content?
The whole premise really is that with
AJAX you don't need to reload the
whole page to update a small
percentage of that webpage. This saves
bandwidth and is usually much quicker
than reloading the whole page.
But if you are using AJAX to load the
whole page this is in fact
counterproductive. You have to write
customised routines to deal with the
callback of the AJAX data. Its a whole
lot of extra work for little to no
increase in performance.
General rule for where to use AJAX: If
your updating >50% of your page, just
reload, else use AJAX.
You will not only need to request for the new page, but then also take care of ensuring the old styles on the current page are removed and don't interfere with the new page. Theres all sorts of problems associated with what your trying to do. It's possible, but I recommend not to do it.
edit: actually you might be able to just do document.write(ajax.responseText) which should take care of overwriting everything in the document, including css styles etc. Though still don't recommend it.
When you're finished with processing the AJAX request simply use this JS line:
window.location.replace('<URL of the new page>');
This has exactly the effect of loading the new page via
....
When you make an AJAX request, a request goes off and brings you the contents of the URL that you have requested. Now technically you can do whatever you like with the contents (which could be HTML), you can replace any element within the DOM with it. Be careful however of replacing EVERYTHING on the page, you are more likely just going to want to replace what is within the tags.
If what you want to do is show one URL for multiple pages, AJAX is overkill. Why not just use an IFRAME?
This could be useful if your page was unsure if it was expecting back errors to be inserted onto the page or a "new" submission confirmation page. This can be used when you want to put a validation servlet (or whatever) in front of the submission servlet (or whatever). If the page always hits the validation servlet, you hide the submission servlet which actually performs the data update. In the case where the validation passes, forward to the submission servlet. The user never knows what happened in the background.
When the page gets a response back you could just look at the first portion of the response text and determine if it had a keyword set by the server, which means this is a new page. Remove the keyword from the text, and do document.write(ajax.responseText); as described previously. Otherwise insert the response text into your errorBox div and let the user retry submission.

What are the "best practices" for AJAX with Django (or any web framework)

I am developing an issue tracking application in Django, mostly for a learning exercise but also for my own projects - and I am looking into using some AJAX for "enhanced" usability. For example, allowing users to "star" particular issues, which would add them to their watch list. This is implemented in a lot of sites, and is often AJAX - as the URL that the user is viewing doesn't need to change when they click the star.
Now, I am wondering what kind of response to return from my star_unstar view - that detects whether the request is being made via AJAX or not.
At present, if the request is an AJAX request, it returns just the section of HTML that is needed for the star, so I can replace the HTML in the star's parent DIV, so as the star appears "on" or "off", depending on the user's action.
However, I would much rather return some kind of JSON object, as it just seems more "proper", I think. The problem with this method is that the javascript would have to modify the star image's src attribute, the href on it, and the link title also, which seems a lot of work for such a simple feature. I am also looking into in-line commenting in the future, but I want to get a feel for how things "should" be done before I start coding lots of JS.
What is the general consensus when implementing features such as this, not just with Django, but all frameworks that operate in a similar way?
When I work with Ajax my main concern is usually to limit the amount of data I have to send. Ajax applications of this type should be very responsive (invisible if possible).
In the case of toggling a star, I would create the actual on/off states as CSS classes, StarOn and StarOff. The client will download both the off and on star when they first visit the page, which is acceptable considering that the star is a small image. When you want to change the star appearance in the future, you'll only be editing CSS, and won't have to touch the javascript at all.
As for the Ajax, I'd send back and forth one thing -- a JSON variable true/false that says whether or not the request was successful. As soon as the user clicks on the star, I'd change it to the StarOn state and send out the request. 99% of the time Ajax will return true and the user will not even realize that there was some sort of delay in the web request. In the rare case where you get a false back, you'll have to revert the star to StarOff and display an error message to the user.
I don't think your question relates particularly to Django or Python, as you point out at the end.
There's a lot of personal preference in whether you return a blob of HTML to write into the DOM or some serialized data as JSON. There are some practical factors you might want to take into account though.
Advantages of HTML:
- Easy and fast to write straight into the page.
Advantages of JSON:
- Not coupled to the front-end of your application. If you need that functionality anywhere else in the application, it is there ready to go.
My call on it. It's only a relatively trivial amount of HTML to update, and I'd probably go for returning JSON in this case and giving myself the extra flexibility that might be useful down the road.

Resources