How effective is AJAX crawling for SEO compared to server-side generated websites?

I'm looking for real-world experiences with AJAX crawling:
http://code.google.com/web/ajaxcrawling/index.html
I'm particularly concerned about the infamous Gizmodo failure of late. I know I can find them via Google now, but it's not clear to me how effective this AJAX crawling method is in comparison to server-side generated sites.
I would like to make a wiki that lives mostly on the client side and is populated via AJAX/JSON. It just feels more fluid, and I think it would be a plus point over my competition (Wikipedia, Wikimedia).
Obviously, for a wiki it's incredibly important to have working SEO.
I would be very happy for any experiences you have had dealing with clientside development.
My research shows that the general consensus on the web right now is that you should absolutely avoid building AJAX-only sites unless you don't care about SEO (for example, a portfolio site, a corporate site, etc.).
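For reference, my understanding of how the scheme in that link is supposed to work: every AJAX state gets a "#!" URL, Googlebot rewrites it to an _escaped_fragment_ query, and the server has to return an HTML snapshot for that query. A rough client-side sketch of what I have in mind (the /api/article endpoint and the "content" element are just placeholders I made up):

    // Rough sketch of the client half of the AJAX crawling scheme.
    // URLs look like /wiki#!Article_Name; Googlebot rewrites them to
    // /wiki?_escaped_fragment_=Article_Name, and the server is expected
    // to return a plain-HTML snapshot for that request.
    function loadFromHash() {
        var hash = window.location.hash;           // e.g. "#!Article_Name"
        if (hash.indexOf('#!') !== 0) return;      // not a crawlable AJAX state
        var article = hash.substring(2);

        var xhr = new XMLHttpRequest();
        xhr.open('GET', '/api/article?name=' + encodeURIComponent(article), true);
        xhr.onreadystatechange = function () {
            if (xhr.readyState === 4 && xhr.status === 200) {
                document.getElementById('content').innerHTML = xhr.responseText;
            }
        };
        xhr.send();
    }

    window.onhashchange = loadFromHash;  // react when the #! fragment changes
    loadFromHash();                      // handle the initial page load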

Well, these SEO problems arise when you have a single page that loads content dynamically based on sophisticated client-side behavior. Spiders aren't always smart enough to know when JavaScript is being injected, so if they can't follow links to get to your content, most of them won't understand what's going on in a predictable way, and thus won't be able to fully index your site.
If you have the option of unique URLs that lead to static content, even if they all route back to a single page via a URL-rewriting scheme, that could solve the problem. It will also yield huge benefits down the road when you've got a lot of traffic: the whole page can be cached at the web server/proxy level, leading to less load on your servers.
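For what it's worth, one way to get those unique URLs without giving up the AJAX feel is the HTML5 history API: give each view a real path, fetch the fragment with XMLHttpRequest when JavaScript is available, and have the server render the same content for a direct request. A rough sketch under those assumptions (the URL scheme, the ?fragment=1 convention and the element IDs are all invented):

    // Rough sketch: real per-article URLs that still load content via AJAX.
    // Assumes the server can render /wiki/Article_Name as full HTML too,
    // so a direct hit (or a crawler) always gets static content.
    function showArticle(path) {
        var xhr = new XMLHttpRequest();
        xhr.open('GET', path + '?fragment=1', true);  // server returns only the article body
        xhr.onreadystatechange = function () {
            if (xhr.readyState === 4 && xhr.status === 200) {
                document.getElementById('content').innerHTML = xhr.responseText;
            }
        };
        xhr.send();
    }

    document.getElementById('wiki-nav').onclick = function (e) {
        e = e || window.event;
        var target = e.target || e.srcElement;
        // Only intercept links, and only if pushState is available;
        // otherwise fall back to a normal, crawlable page load.
        if (!target || target.tagName !== 'A' || !window.history.pushState) return;
        history.pushState(null, '', target.href);       // unique, bookmarkable URL
        showArticle(target.getAttribute('href'));
        return false;                                   // cancel the full page load
    };

    window.onpopstate = function () {                   // back/forward buttons
        showArticle(window.location.pathname);
    };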
Hope that helps.

Related

What's the best SEO practice when you do an AJAX driven website?

I've encountered several websites built entirely with AJAX, and it seems like their SEO is pretty bad. Does Google really crawl websites like that?
Optimization guides for different search engines say that bots are unable to crawl such sites. Google's bots might use Chrome's engine for some purposes (I remember they generated site screenshots at one point), but nevertheless it's the static HTML that matters. Therefore, the usual practice is to generate valid HTML on the server to provide a functional site for user agents like, for example, Lynx, and then enhance it with AJAX, the history API, and all other imaginable bells and whistles.

Crawlers/SEO Friendly/Mod Rewrite/It doesn't make any sense

So I am attached to this rather annoying project where a client's client is all nit-picky about the little things, and he's giving my guy hell, who is gladly returning the favor by following the good old rule of shoving shi* down the chain of command.
Now, my question. The application basically consists of three different mini-projects: the back-end interface for the administrator, the back-end interface for the client, and the front-end for everyone.
I was specifically asked to apply MOD_REWRITE rules to make things SEO-friendly. That was the ultimate aim, so this was basically an exercise in making things more search-friendly rather than making the links aesthetically better looking.
So I worked on the front-end, which is basically the landing page for everyone. It looks beautiful; the links at worst contain a single slash.
My client's issue: he wants to know why the back-end interfaces for the admin and the user are still displaying those gigantic ugly links. And these are very, very ugly links; I'm talking three to four slashes followed by various GET sequences and whatnot, so you can probably understand the complexity of MOD_REWRITING something like this.
In the spur of the moment I said that I had left it the way it was to make sure the back-end interface wouldn't be sniffed out by any crawlers.
But I am not sure that's necessarily true. Where do crawlers stop? When do they give up on trying to parse links? I know I can use a robots.txt file to specify rules. But, as indigenous creatures, what are their instincts?
I know this is more of a rant than anything and I am running a very high risk of having my first question rejected :| But hey, it feels good to have this off my chest.
Cheers!
Where do crawlers stop? When do they give up on trying to parse links?
Robots.txt does not work for all bots.
You can use basic authentication or IP-restricted access to hide the back-end, if no back-end files are needed by the front-end.
If that isn't practicable, try sending 404 or 401 headers for back-end files. But this is just an idea, no guarantee.
But, as indigenous creatures, what are their instincts?
Hyperlinks, toolbars, and browser-side, pre-activated functions for malware, spam and fraud warnings...

Specific limitations of AJAX?

I'm still pretty new to AJAX and JavaScript, but I'm getting there slowly.
I have a web-based application that relies heavily on MySQL; there are individual user accounts, and the UI is populated with user-specific data.
I'm working on getting rid of a tabbed navigation bar that currently loads new pages because all that changes from page to page is information within one box.
The thing is, that box needs to reload info from the database, etc.
I have had great help from users here showing that I need to query the database within the PHP page that AJAX is calling.
OK, so pardon the lengthy intro. What I'm wondering is: are there any specific limitations to what AJAX can call that I need to know about? For example, someone mentioned that it's best not to call script files, and that I should remove scripts from the PHP page being called and keep those in the 'parent' page. Any other things like this I need to keep in mind?
To clarify: I'm not looking to discuss the merits/drawbacks of the technology. I'm wondering about specific coding implementation details that I need to be aware of (for example, I didn't realize until yesterday that even if I had established a MySQL connection on the page, I would need to re-establish that connection in my called page as well... makes perfect sense now).
XMLHttpRequest, which powers AJAX, has a number of limitations. I recommend brushing up on the same-origin policy; it's a pivotal rule because it limits where AJAX calls can be made.
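Roughly, the same-origin policy means XMLHttpRequest can only talk to the same protocol, host and port that served the page. A small illustration (the PHP file names are made up):

    // The same-origin policy in practice: a relative URL on your own site
    // is fine, but plain XMLHttpRequest cannot call another domain -- the
    // browser will block it (workarounds such as JSONP or a server-side
    // proxy exist, but that is a separate topic).
    var xhr = new XMLHttpRequest();

    // OK: same protocol, host and port as the page running this script.
    xhr.open('GET', '/load_box.php?user_id=42', true);

    // NOT OK: a different origin; the request would be refused.
    // xhr.open('GET', 'http://other-site.example.com/data.php', true);

    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
            document.getElementById('box').innerHTML = xhr.responseText;
        }
    };
    xhr.send();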
First, you can't have Javascript embedded in the HTTP response to an AJAX call. That's a security issue.
You didn't mention how dynamic the database is, but if the data to be displayed in the tabs doesn't have to be real-time, why not cache it server-side?
I find that, like any other technique, AJAX works best in tightly controlled conditions. It wouldn't make much sense for updating nearly the whole page, unless you find that the user experience is improved with an on-page 'loader'. Without going into workarounds, the disadvantages include losing the browser back button / history, issues such as the one your friend mentioned, embedded resources and other rich content suffering as well, and simply having an extra layer of complexity to deal with in your app. Don't treat it as magic sauce for your app -- make sure every use delivers specific results that benefit your client / audience.
IMHO, it's best to put your client-side JavaScript in a separate file and then import it; it's a neater container. One thing I've faced before is getting XML back that contains code to run, such as more JavaScript. It's worth checking early on whether this is likely and avoiding it, rather than having to resort to evals.
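One way to sidestep the eval issue is to have the called page return plain data (JSON) and let the script you've already imported decide what to do with it. A hedged sketch (the endpoint and field names are invented):

    // Instead of returning script to execute, return data and handle it
    // with code that already lives in your external .js file.
    var xhr = new XMLHttpRequest();
    xhr.open('GET', '/get_box.php?user_id=42', true);
    xhr.onreadystatechange = function () {
        if (xhr.readyState !== 4 || xhr.status !== 200) return;

        // JSON.parse where available; older browsers need a JSON library
        // (falling back to eval is exactly what we are trying to avoid).
        var data = window.JSON ? JSON.parse(xhr.responseText)
                               : eval('(' + xhr.responseText + ')');

        document.getElementById('box-title').innerHTML = data.title;
        document.getElementById('box-body').innerHTML = data.body;
    };
    xhr.send();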

Are there any disadvantages to using AJAX?

No integration with the browser's history.
If you build a site that requires Ajax to see content and perform tasks, you have several major problems. Ajax-only content/functions are invisible/unavailable to:
search bots
many mobiles
people with JavaScript turned off
etc.
However, if you build a site using the progressive enhancement principle, those problems are solved, and you still get to serve nice-to-use Ajax to most users.
Progressive enhancement involves first creating your site using bare-bones (X)HTML, on REST-like principles (at least to the extent of requiring POST requests for state changes). Simple semantic markup; forget about CSS and JavaScript.
Step one is to get that right, and have your entire site (or as much of it as makes sense) working nicely this way for search bots and Lynx-like user agents.
Then add a visual layer: CSS/graphics/media for visual polish, but don't significantly change your original (X)HTML markup; allow the original text-only site to stay intact and functioning. Keep your markup clean!
Third is to add a behavioural layer: JavaScript (AJAX). Offer things that make the experience faster, smoother and nicer for users/browsers with AJAX-capable JS... but only for those users.
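To make that behavioural layer concrete, here is a hedged sketch of the pattern: the markup works as a normal page load for bots, Lynx and JavaScript-less browsers, and is only upgraded to an AJAX request when the script actually runs (the class names, IDs and fragment convention are illustrative, not a prescription):

    // Behavioural layer only. The markup below already works without JavaScript:
    //
    //   <a href="/articles/42" class="ajax-tab">Article 42</a>
    //   <div id="content">...server-rendered article...</div>
    //
    // The script merely upgrades the link; if anything fails, the plain link wins.
    var links = document.getElementsByTagName('a');
    for (var i = 0; i < links.length; i++) {
        if (links[i].className.indexOf('ajax-tab') === -1) continue;
        links[i].onclick = function () {
            var xhr = new XMLHttpRequest();
            xhr.open('GET', this.href, true);
            xhr.setRequestHeader('X-Requested-With', 'XMLHttpRequest');  // hint to return a fragment
            xhr.onreadystatechange = function () {
                if (xhr.readyState === 4 && xhr.status === 200) {
                    document.getElementById('content').innerHTML = xhr.responseText;
                }
            };
            xhr.send();
            return false;  // cancel the normal navigation only when the script runs
        };
    }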
Browser compatibility.
Asynchronous access to data means it's harder to make things work correctly in every combination of actions.
Dependency on JavaScript makes the site unusable for some. JavaScript performance can also be a bottleneck in resource-limited environments.
The user may not be able to tell that an AJAX operation was made, or that it failed. It can be difficult to recover from client-side errors caused by a failed AJAX call.
Makes it really hard to do functional testing.
Inability to update the client without "polling", which means querying the server every X seconds.
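The usual workaround is polling with a timer, along these lines (the endpoint, the interval and the target element are arbitrary examples):

    // Naive polling: ask the server for fresh data every 30 seconds.
    // Real code would also want error handling and some form of back-off.
    setInterval(function () {
        var xhr = new XMLHttpRequest();
        xhr.open('GET', '/updates.php', true);
        xhr.onreadystatechange = function () {
            if (xhr.readyState === 4 && xhr.status === 200) {
                document.getElementById('updates').innerHTML = xhr.responseText;
            }
        };
        xhr.send();
    }, 30000);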
It requires JavaScript. And you have to admit to your friends how "Web 2.0" you are, instead of being hard-core old school: it's all tables for layout and frames for navigation for me.
Yes, AJAX is not supported by old browsers or browsers that don't have JavaScript enabled. Nowadays, most browsers do support AJAX -- even mobile browsers like the one on the iPhone.
The biggest issue for me is that Ajax adds complexity to the project.
There are many AJAX libraries out there that are supposed to make life easier. In most cases, these libraries make it easy to create a "Hello World" application. One of the main issues that is most often set aside by AJAX libraries is (client-side) error handling/logging.
For larger projects, the developer has to understand the internals of the library, which adds another learning curve to the project.
Some of our big clients, for security reasons, made a corporate decision to have JavaScript switched off. Therefore no AJAX is possible.
If you are going to develop something using AJAX for a given client, be sure that your client is allowed to use JavaScript.
Restrict your application to a reasonable number of browsers and browser versions.
Cross-browser compatibility can make your life miserable.
Ultimately, the problem is that it introduces complexity. Most problems inherent to AJAX sites (bookmarking, browser history, graceful degradation, etc.) can be overcome with good design, so there aren't really any disadvantages to a well-designed AJAX-enabled site. The problem is that creating such a site requires a lot of design work and very good developers who can manage the complexity.

Mixing Secure and Non-Secure Content on Web Pages - Is it a good idea?

I'm trying to come up with ways to speed up my secure web site. Because there are a lot of CSS images that need to be loaded, it can slow down the site, since secure resources are not cached to disk by the browser and must be retrieved more often than they really need to be.
One thing I was considering is perhaps moving style-based images and javascript libraries to a non-secure sub-domain so that the browser could cache these resources that don't pose a security risk (a gradient isn't exactly sensitive material).
I wanted to see what other people thought about doing something like this. Is this a feasible idea or should I go about optimizing my site in other ways like using CSS sprite-maps, etc. to reduce requests and bandwidth?
Browsers (especially IE) get jumpy about this and alert users that there's mixed content on the page. We tried it and had a couple of users call in to question the security of our site. I wouldn't recommend it. Having users lose their sense of security when using your site is not worth the added speed.
Do not mix content; there is nothing more annoying than having to go and click the yes button on that dialog. I wish IE would let me always select "show mixed content" for sites. As Chris said, don't do it.
If you want to optimize your site, there are plenty of ways; if SSL overhead is the only thing left, buy a hardware accelerator. Hmm, if you load an image using HTTP, will it be cached when you then load it with HTTPS? Just a side question that I need to go find out.
Be aware that in IE 7 there are issues with mixing secure and non-secure items on the same page, so this may result in some users not being able to view all the content of your pages properly. Not that I endorse IE 7, but recently I had to look into this issue, and it's a pain to deal with.
This is not advisable at all. The reason browsers give you such trouble about insecure content on secure pages is that it exposes information about the current session and leaves you vulnerable to man-in-the-middle attacks. I'll grant there probably isn't much a third party could do to sniff sensitive info if the only insecure content is images, but CSS can contain references to JavaScript/VBScript via behavior files (IE). If your JavaScript is served insecurely, there isn't much that can be done to prevent a rogue script from scraping your web page at an inopportune time.
At best, you might be able to get away with iframing secure content to keep the look and feel. As a consumer I really don't like it, but as a web developer I've had to do that before due to no other pragmatic options. Frankly, though, there are just as many if not more problems with that approach; after all, you're hoping that nothing violates the integrity of the insecure page so that it hosts the secure content and not some alternate content.
It's just not a great idea from a security perspective.
