Using Wikipedia Content

I am working on a site for a Manhattan real estate brokerage firm. The site is content rich: thousands of properties for rent and for sale. What the site is lacking is informational resources about NYC neighborhoods. Rather than writing content from scratch about Greenwich Village (for example), I'd like to use the content from Wikipedia. Wikipedia says I can use their content, as long as I don't modify it and I acknowledge the source. I'd like to put thousands of NYC-related pages into my site.

Is there a quick and efficient way to grab the Wikipedia data and store it on my server?

Has anyone used Wikipedia data to supplement their own website?

Any thoughts, suggestions, ideas? Any negative side effects?

jewboy
Posting Whiz in Training
269 posts since Feb 2005
Reputation Points: 11
Solved Threads: 1
 

Wikimedia's API is used specifically for this purpose.

http://en.wikipedia.org/wiki/API
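
For anyone wanting to try this, here is a minimal sketch of pulling the plain-text intro of one article through the MediaWiki API endpoint at en.wikipedia.org/w/api.php. It assumes the `prop=extracts` query module is available on the wiki you query (it is on English Wikipedia). The function names, the User-Agent string, and the contact address are placeholders I made up for illustration, not anything from this thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"
# Placeholder User-Agent: replace with something that identifies your site.
USER_AGENT = "neighborhood-guide-sketch/0.1 (contact: you@example.com)"


def build_extract_url(title):
    """Build a query URL asking for the plain-text intro of one article."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,    # plain text instead of HTML
        "exintro": 1,        # only the section before the first heading
        "format": "json",
        "formatversion": 2,  # 'pages' comes back as a list
        "titles": title,
    }
    return API_ENDPOINT + "?" + urlencode(params)


def parse_extract(payload):
    """Return the extract from a decoded API response, or None if absent."""
    pages = payload.get("query", {}).get("pages", [])
    if not pages or pages[0].get("missing"):
        return None
    return pages[0].get("extract")


def fetch_intro(title):
    """Download and return the intro text for one article (needs network)."""
    request = Request(build_extract_url(title), headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=10) as response:
        return parse_extract(json.load(response))
```

Calling `fetch_intro("Greenwich Village")` once per neighborhood and caching the result on your own server keeps the load on Wikipedia low. You would still need to show the source and license next to the text.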

cscgal
The Queen of DaniWeb
Administrator
19,421 posts since Feb 2002
Reputation Points: 1,474
Solved Threads: 229
 

Be aware that the content will become out of date, and that Google may treat it as duplicate content. Some of my clients used it, misunderstanding my suggestion; they did get PageRank (PR) for it, but little to no traffic.

Oh, and hi Avi. It's Eli

FeldBum
Newbie Poster
11 posts since Apr 2006
Reputation Points: 10
Solved Threads: 0
 

Just as directories built on DMOZ data were banned, sites with duplicate Wikipedia content may one day be banned too. Be wary!

olddocks
Junior Poster in Training
70 posts since Jul 2005
Reputation Points: 10
Solved Threads: 0
 

If, however, your intent is simply to provide information for your users, then by all means go right ahead! Don't worry about PageRank and all that other nonsense: I see nothing whatsoever wrong with providing helpful, specific, relevant content for your users from a willing, ready-made source.

tgreer
Made Her Cry
Team Colleague
2,118 posts since Dec 2004
Reputation Points: 227
Solved Threads: 37
 

Also, I remember reading somewhere that Google was identifying Wikipedia content and not giving it any weight.

But you know, this should not be your guiding principle. The question you need to ask yourself is: will this content add value to your visitors' experience? If yes, by all means go for it.

If the answer is no and you are doing it just for SEO purposes, then drop the plan.

But to answer your original question: the content at Wikipedia is licensed under the GNU Free Documentation License (GFDL). You may read the full text here: http://en.wikipedia.org/wiki/Wikipedia:Text_of_the_GNU_Free_Documentation_License

BTW, you can grab the latest dump of their database in MySQL format from their site and use it on your site in accordance with the above license.
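
If you go the dump route, a rough sketch of filtering it down to just the pages you care about might look like the following. Note the assumption: this reads the XML form of the article dump (a `<mediawiki>` root containing `<page>` elements), not the MySQL format mentioned above. `iter_wanted_pages` and `SAMPLE_DUMP` are names invented for this example.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_wanted_pages(xml_stream, wanted_titles):
    """Stream through a dump and yield (title, wikitext) for wanted pages."""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        if title in wanted_titles:
            yield title, text
        elem.clear()  # release this page's subtree before reading the next one


# Tiny hand-written stand-in for a real dump, for illustration only.
SAMPLE_DUMP = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Greenwich Village</title>
    <revision><text>Greenwich Village is a neighborhood.</text></revision></page>
  <page><title>Paris</title>
    <revision><text>Paris is a city.</text></revision></page>
</mediawiki>"""
```

Real dumps are large and usually compressed, so you would pass an open file object such as `bz2.open(path, "rb")` in place of the sample bytes. Streaming page by page keeps memory use low compared with loading the whole file.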

All the best.

pulse
Posting Whiz in Training
227 posts since Aug 2004
Reputation Points: 11
Solved Threads: 0
 

Not necessarily related, but Webaroo is a program with which you can download the web. They have a package to download the entire wiki database. You can set it up to choose whether and when it updates. Not really related, but it's interesting.

ardentsunshine
Junior Poster in Training
68 posts since Apr 2006
Reputation Points: 10
Solved Threads: 2
 

I've never bought into the whole duplicate content thing. If Google detects mirror pages, it might filter one of them out, but I've never witnessed firsthand anything more than that.

cscgal
The Queen of DaniWeb
Administrator
19,421 posts since Feb 2002
Reputation Points: 1,474
Solved Threads: 229
 
Quote (cscgal): "I've never bought into the whole duplicate content thing. If Google detects mirror pages, it might filter one of them out, but I've never witnessed firsthand anything more than that."

That's right -- you don't get a penalty for having some duplicate content on your site... it just slides into the supplementary results!

pulse
Posting Whiz in Training
227 posts since Aug 2004
Reputation Points: 11
Solved Threads: 0
 

Yay, an advocate :mrgreen:

cscgal
The Queen of DaniWeb
Administrator
19,421 posts since Feb 2002
Reputation Points: 1,474
Solved Threads: 229
 

Another way to copy content is through the Special:Import and Special:Export pages. There is an explanation here: http://meta.wikimedia.org/wiki/Help:Export
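
As a rough illustration of the export side, the sketch below builds the Special:Export address for a page title and saves the returned XML to disk. The helper names and the User-Agent value are placeholders of mine, and I am assuming the `Special:Export/Page_title` URL form on English Wikipedia.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import Request, urlopen

EXPORT_BASE = "https://en.wikipedia.org/wiki/Special:Export/"
# Placeholder User-Agent: replace with something that identifies your site.
USER_AGENT = "neighborhood-guide-sketch/0.1 (contact: you@example.com)"


def export_url(title):
    """Return the Special:Export URL for one page title."""
    # Wiki URLs use underscores for spaces; quote() escapes the rest.
    return EXPORT_BASE + quote(title.replace(" ", "_"))


def save_export(title, directory="."):
    """Download one page's export XML into directory (needs network)."""
    request = Request(export_url(title), headers={"User-Agent": USER_AGENT})
    target = Path(directory) / (title.replace(" ", "_") + ".xml")
    with urlopen(request, timeout=10) as response:
        target.write_bytes(response.read())
    return target
```

For example, `export_url("Greenwich Village")` gives the address you could also paste into a browser to check what the XML looks like before scripting anything.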

IpbWiki
Newbie Poster
3 posts since Jun 2006
Reputation Points: 10
Solved Threads: 0
 
