With all the recent news and attention to the latest Google products, I've found myself becoming increasingly paranoid about Google. Have things started to take a creepy turn, or is it just my imagination?

Take the Google Library Project and the Google Print Project, for example. Who gave Google the right to scan anything they wanted? Since when did an "opt-out" option trump copyright and express consent?

As an author, I'm sensitive to issues of copyright and express consent. This made me wonder, why are we so complacent about Google?

What do I mean? Well, do a Google search. See all of the "cached" content? When did I ever give Google permission to cache my entire site? That's MY content! I own the copyright. Yet Google has cached it, so that anyone can see my content without actually visiting my site! Moreover, THEY derive ad revenue from that content, not me.

I would think forum operators in particular would be outraged at this hijacking of their content, yet no one seems to mind.

Once you start down the road to paranoia, it only gets worse. What information about me and my searches are they keeping? How long are they keeping it? With whom are they sharing it? I was an AdSense publisher before they unilaterally terminated my account, with no justification or explanation... that means they also have my Social Security number. I find that a bit alarming.

Google: I've decided that I will charge US$1 million per page for the right to cache my pages. This is retroactive. You can't opt in; you have to opt out. If my pages are still cached 10 days from this notice, I will generate an invoice. Thanks for your business!


Greetings Thomas,

Yes, with all the press coverage, it does seem like Google is trying to take over the world. But it's not just Google. Yahoo claims to have an index more than twice the size of Google's. Here's something else to be paranoid about: included in these indices are snapshots of the rooftop of your home and your front and back yard (if applicable). If you live in a major city (e.g., NYC), Amazon's A9 search engine may have a snapshot of the exterior of your home!

Fortunately, if you wish to keep the spiders from indexing your copyrighted works, the search engines will respect your wishes and not cache your content. All the major engines follow the same standard, referred to as the robots.txt protocol. Simply upload a robots.txt file into your root directory. You can use the Disallow directive to prevent the bots from crawling your entire site, or particular sections of your site.
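
For example, a robots.txt like this one asks all well-behaved bots to stay out of a couple of directories (the directory names here are just placeholders; substitute whatever sections of your own site you want kept private):

User-agent: *
Disallow: /private/
Disallow: /drafts/

To block your entire site instead, a single Disallow: / line does the job.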

Alternatively, you can use a simple meta tag to prevent the bots from indexing a page. Simply use the noindex value.
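
Placed in the <head> of a page, it looks like this:

<META NAME="ROBOTS" CONTENT="NOINDEX">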

And don't forget, the search engine found you because someone linked to your site. If you would rather not be indexed, make sure any link to your site carries rel="nofollow", which tells bots not to follow that particular link (though it won't stop indexing if someone else links to you without it).
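
A nofollow link looks like this (the URL is just a placeholder):

<A HREF="http://www.example.com/" REL="NOFOLLOW">Example Site</A>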

As storage space becomes cheaper, and engines compete over who has the biggest index, chances are your cached content will remain for a very long time.

...and to prevent caching, you'd use:

<META NAME="ROBOTS" CONTENT="NOARCHIVE">

which I've done. However, that doesn't force them to throw away anything they've already cached. In fact, Google's FAQ says you can request they remove items from their cache, which they will respond to on a case-by-case basis. Excuse me? Shouldn't I decide who can and cannot "cache" my content?

Now, my site is just a tiny little site promoting my consulting business. I want it to be widely indexed, and I host technical articles in the hopes of drawing traffic. So I have to admit, I want search engines to find me. I would think that forum operators, who derive substantial ad income from the content of their sites, would be in an uproar about Google caching their content. Each time someone views a cached copy of a page, that's a hit stolen from your site.

I use Google. But somewhere in the recent past they crossed the line from being a search engine company to being an advertising broker. In order to continue to drive ad revenue, they need content to piggyback on. If that means scanning entire libraries of copyrighted material, and caching every site they crawl, so be it. I think that's going too far.
