This week Adobe released its Spry toolkit, which finally lets web designers join the hitherto programmer-led world of Ajax development. At the same time, Backbase introduced an Ajax development tool for Java applications. But what use is an Ajax-powered website when search engines such as Google can’t see it?

Ajax is the future of the web, declare the usual suspects, and with good cause. It is, undoubtedly, an exciting technology that is already driving forward the idea of Rich Internet Applications. Yet dynamic client-server interaction, and the client-side display trickery that accompanies it, cannot and will not be spidered, indexed or cached by search engines that don’t understand Ajax. In fact it’s even simpler than that: search engines not only don’t understand it, they don’t even see it. The irony is that Google is something of an Ajax pioneer, and GMail one of the best-known Ajax applications.

Think about it. A still-typical static HTML website with a couple of dozen short pages will see them all indexed by Google, driving traffic precisely where it wants to go. An Ajaxified ‘fat client application’ site, consisting of a single page (one <body> element) with XML-based content loaded by JavaScript under user control (an onload event, say), and with interaction and navigation handled locally, makes for a very rich user experience. It also makes for a very invisible site as far as the Google spiders are concerned. For all intents and purposes it is no more, it has ceased to be, it is an ex-website (with apologies to Monty Python).
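To make that concrete, here is a bare-bones sketch of the kind of single-page, script-fed site being described; the file name, element id and function name are invented for illustration, and the older-IE ActiveX fallback is omitted.

    <!-- A spider fetching this page sees an almost empty document;
         the content a visitor reads only exists after the script runs. -->
    <html>
      <body onload="loadContent()">
        <div id="content"></div>
        <script type="text/javascript">
          function loadContent() {
            // Older IE would need the ActiveXObject fallback, omitted here.
            var xhr = new XMLHttpRequest();
            xhr.open("GET", "articles.xml", true); // placeholder content feed
            xhr.onreadystatechange = function () {
              if (xhr.readyState === 4 && xhr.status === 200) {
                // Everything the visitor sees is injected client-side,
                // which is precisely what Googlebot never executes.
                document.getElementById("content").innerHTML = xhr.responseText;
              }
            };
            xhr.send(null);
          }
        </script>
      </body>
    </html>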

So what are the workarounds? Well, you could have all the content in the page but invisible to the user. Do that, however, and Google will almost certainly treat it as cloaking: serving up one page for web spiders and another for real people. The end result is much the same: your site gets blacklisted and so not listed. You could design two distinct versions of the content, the Ajax one and a ‘traditional’ one that is accessible to Google. While this may well work in as far as ensuring your site shows up in the land of search, it also means your Ajax development is a waste of time, because Google will send everyone to the non-Ajaxified version anyway. Then there’s the database-driven solution of exposing querystring-parameter URLs through a Google Sitemap. The trouble is that an Ajax client doing its searching on the client side produces no querystring-appended URLs in the first place, so you would have to create them all artificially, which is no mean feat for anything but the least complex of sites (and why bother with Ajax if all you have is a one-page vanity web anyway?).
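One common way of keeping a ‘traditional’ version and an Ajax version from drifting apart is to serve ordinary, crawlable links and let JavaScript take them over when it is available. A rough sketch, with made-up URLs and a hypothetical loadViaAjax() helper standing in for whatever XMLHttpRequest wrapper a site actually uses:

    <!-- Ordinary links: spiders and script-less browsers follow them normally. -->
    <ul id="nav">
      <li><a href="/products.html">Products</a></li>
      <li><a href="/support.html">Support</a></li>
    </ul>

    <script type="text/javascript">
      // With JavaScript available, intercept the clicks and pull the same
      // content into the current page via Ajax instead of a full page load.
      var links = document.getElementById("nav").getElementsByTagName("a");
      for (var i = 0; i < links.length; i++) {
        links[i].onclick = function () {
          loadViaAjax(this.href); // hypothetical helper wrapping XMLHttpRequest
          return false;           // cancel the normal navigation
        };
      }
    </script>

Because the hrefs point at real server-generated pages, Google indexes those, while script-enabled visitors get the richer in-page experience.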

It seems to me that in order to successfully circumvent Google’s search engine optimization, cloaking and spider-fodder rules in our Ajax-driven vision of the future, you would have to make changes that would also allow the SEO gangsters to ply their evil trade. There is some glimmer of hope, in as far as Google has implemented Sitemaps, and it does enable pseudo content to flag text-block priority to AdSense spiders. Until and unless a big change does come, some kind of content duplication would appear to be the only way forward. I know of one site with a nifty Ajax navigation menu system that is fully duplicated within a static footer menu, purely to bring server-generated content to the party and allow Google to see the blasted thing. Or am I, and much of the Ajax-obsessed development community, missing the point here? Is Ajax really only suitable for what has been cruelly described as client-side interface candy and nothing more?

I might run the risk of being called a heretic for saying it, but say it I will: in your headlong rush towards Web 2.0, be careful you don’t get caught by the Ajax tripwire.

This leaves me to ask two things of you, dear reader:

1. Please don’t tell me that it’s AJAX not Ajax. According to the man himself, Jesse James Garrett, Ajax is not an acronym. Yes, most people will tell you it is short for Asynchronous JavaScript and XML, but most people are wrong. It’s actually shorthand for Asynchronous JavaScript+CSS+DOM+XMLHttpRequest.

2. How do you build an Ajaxified website without it disappearing from the face of the web?

All 6 Replies

Ajax is nothing new; it's just another scripting technique for web applications.
And like all of them, it's way overhyped.

I think it's important to realize that any RIA (Rich Internet Application), whether designed with Ajax, Adobe Flex, or Java, is a different animal than a typical website.

Some typical uses of an RIA are product ordering interfaces, complex user-driven reporting, custom configuration interfaces, and so on. It really isn't important for search engines to see such interfaces, because they are a sub-component of a much larger site/system. While a search engine may get users in the door, they will be directed to the RIA itself through the normal operation of the system.

Also, if such applications become the norm, it is the search engine companies' job to learn how to index them. I'm certainly not going to dumb-down my development projects because of a fault in a search engine. I'll trust that with their better resources and core focus on search technology, they'll catch up.

tgreer, you say you're "certainly not going to dumb-down my development projects because of a fault in a search engine." What if you are running a corporate site where your search engine rankings, and how well indexed you are in Google, can mean having your head served on a silver platter to your investors? That is the position most businesses in corporate America are in, given all the hype the Internet has gained lately.

Also take into consideration the difference between the search engines not knowing how to spider and not wanting to spider. People are still using mod_rewrite to rewrite dynamic URLs to have .html endings in order to get more pages indexed by the search engines (DaniWeb included). Yet search engines have been capable of spidering dynamic URLs with multiple query parameters for nearly a decade. Why, then, is the technique still so heavily used? Because, even today, it makes all the difference.
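For anyone who hasn't seen the technique, a typical rule of this kind looks something like the following Apache .htaccess sketch; the paths and parameter names are invented for illustration rather than taken from DaniWeb's actual configuration.

    # Present a crawl-friendly .html URL to spiders while quietly serving
    # the same dynamic script underneath.
    RewriteEngine On
    # /threads/12345.html is internally mapped to /showthread.php?t=12345
    RewriteRule ^threads/([0-9]+)\.html$ showthread.php?t=$1 [L]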

Essentially, Google is fully capable of indexing dynamic pages yet hesitant to do so, afraid that its spiders will end up in an infinite loop, since query strings can generate a virtually unlimited number of distinct pages from just one line of code.

I think it's all just a matter of playing catch-up. Perhaps five years from now, Ajax sites will be capable of being indexed. And in a decade, we'll start seeing them actually indexed.

Think about forms, which have been around since the modern WWW's beginnings. Spiders are still incapable of filling out or submitting forms. Why? Because they can be filled out with an infinite variety of data, each combination producing different results. What would Googlebot type into a search box, for example? It would be ridiculous to spider the search results.
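The form itself is trivial, which is rather the point; a spider following links has nothing sensible to submit here, and every different value would yield a different results page (search.php and the field name are placeholders):

    <!-- Googlebot can fetch the page containing this form, but it has no
         meaningful value to type into q, so the results never get crawled. -->
    <form action="/search.php" method="get">
      <input type="text" name="q">
      <input type="submit" value="Search">
    </form>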

In many cases, Ajax works the same way, with most of its uses dealing with user interaction. Googlebot is meant to be a spider, crawling around the web following hyperlinks. We've not yet reached the day when a spider is an artificial intelligence bot whose job is to mimic human interactions with web pages.

My point was that an RIA is usually a sub-component of a much larger system. It's "Step 4" of an ordering process, for example, and does not need to be indexed. The main site/page, of course, does.

By "dumbing-down", I meant that I would never avoid creating an RIA if it provided the best customer service, just because Google can't see it, nor would I craft elaborate work-arounds as described in the original article.

There is a paranoia in the web development community about never doing anything that might adversely affect search engine rank, often based on erroneous, misunderstood, and ever-changing criteria. I was recently asked about how to disable "QUOTE" tags in vBulletin because the webmaster was afraid of "duplicate content penalties", for example.

I have no problem with a site following SEO best practices, but to avoid developing RIAs based on whether or not Google can crawl your order form or custom car configurator is a ridiculous extreme.

If you read my posts in the SEO forum, you'll see I am a strong advocate against doing anything for SEO's sake. The search engines are designed to best crawl the most accessibility-friendly websites ... sites that correctly implement heading tags, CSS, lists, etc. These are the most usability-friendly sites from an end-user's perspective, not just a spider's. I am also a huge advocate for XML-based and CSS-only designs (like crazy!).

However, you have to wonder ... are the search engines keeping us stuck in the past with their refusal to fully index dynamic pages (with the operative word being fully), to crawl any type of form, or even to execute a simple JavaScript echo statement, which requires no computation? (One could argue that search engines are capable of literally reading between the quotes of a JS echo statement without actually parsing it, although that obviously only applies to inline code and not to external .js files.)
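To illustrate the kind of statement being talked about, with placeholder text:

    // Sitting inline in the page, a spider could in principle lift the
    // string literal straight out of the markup without running the script.
    document.write("Welcome to our product catalogue");

The same line in an external .js file is another HTTP fetch away, which is the distinction the parenthetical above is drawing.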

This was the first fully Ajaxed site that we created: http://3hmp3.com (based on osCommerce, so please excuse the markup). Take a look at it on Google... absolutely no problem.

You just gotta know how to do it.
