I am working on a site for a Manhattan real estate brokerage firm. The site is content-rich - thousands of properties for rent and for sale. What the site is lacking is informational resources about NYC neighborhoods. Rather than writing content from scratch about Greenwich Village (for example), I'd like to use content from Wikipedia. Wikipedia says I can use their content, as long as I don't modify it and I acknowledge the source. I'd like to put thousands of NYC-related pages into my site.

Is there a quick and efficient way to grab the Wikipedia data and store it on my server?

Has anyone used Wikipedia data to supplement their own website?

Any thoughts, suggestions, ideas? Any negative side effects?


Be aware that the content will become out of date, and that Google may treat it as duplicate content. Some of my clients used it after misunderstanding my suggestion; they did get PageRank out of it, but little to no traffic.

Oh, and hi Avi. It's Eli

Just like directories that duplicated DMOZ got banned, one day in the future sites with duplicate Wikipedia content will be banned. Be wary!

If however, your intent is simply to provide information for your users, then by all means go right ahead! Don't worry about PageRank and all that other nonsense: I see nothing whatsoever wrong with providing helpful, specific, relevant content for your users from a willing, ready-made source.

Also, I remember reading somewhere that Google was identifying Wikipedia content and not giving it any weight.

But you know, this should not be your guiding principle. The question you need to ask yourself is whether this content will add value to your visitors' experience. If yes, by all means go for it.

If you answered NO and are doing it just for SEO purposes, then chuck the plans.

But to answer your original question -- the content at Wikipedia is licensed under the GNU Free Documentation License (GFDL), not the GPL. You may read the full text here - http://en.wikipedia.org/wiki/Wikipedia:Text_of_the_GNU_Free_Documentation_License.

BTW, you can grab the latest dump of their database from their site (the dumps are published as XML, which MediaWiki's import tools can load into MySQL) and use it on your site in accordance with the above license.
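If you only need specific pages like the NYC neighborhoods rather than the whole dump, another option is to pull articles one at a time through MediaWiki's api.php and store the rendered HTML yourself. Here's a rough Python sketch along those lines -- it assumes the requests library, and the page list is just a placeholder you'd replace with your own titles:

import time
import requests

API_URL = "https://en.wikipedia.org/w/api.php"

# Hypothetical starter list of neighborhood articles -- swap in your own titles.
PAGES = ["Greenwich Village", "SoHo, Manhattan", "Upper West Side"]

def fetch_article_html(title):
    """Fetch one article's rendered HTML via the MediaWiki parse API."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "text",
        "format": "json",
        "formatversion": 2,
    }
    resp = requests.get(API_URL, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["parse"]["text"]

for title in PAGES:
    html = fetch_article_html(title)
    # Save the page along with the source acknowledgement the license calls for.
    slug = title.replace(" ", "_")
    with open(slug.replace(",", "") + ".html", "w", encoding="utf-8") as f:
        f.write(html)
        f.write('<p>Source: <a href="https://en.wikipedia.org/wiki/%s">'
                'Wikipedia</a></p>' % slug)
    time.sleep(1)  # throttle requests out of courtesy to Wikipedia's servers

Either way, keep the acknowledgement attached to every page you store, and don't hammer their servers when fetching.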

All the best.

Not necessarily related, but Webaroo is a program with which you can download the net. They have a package to download the entire Wikipedia database, and you can set it up to control whether and when it updates. Not really related, but it's interesting.

I've never bought into the whole duplicate content thing. If Google detects mirror pages, it might filter one of them out, but I've never witnessed firsthand anything more than that.

That's right -- you don't get a penalty for having some duplicate content on your site... it just slides into the supplemental results!
