Web Scraping Help


stupidenator (#1)
Oct 5th, 2007
Hello,
I am working on a project where I need to visit multiple news web sites and collect articles pertaining to stock numbers. My idea so far is to download an RSS file from somewhere like Google Finance, extract the links from it, follow each link, pull out just the article section, and store it in a database. The problem is that the sites are structured too differently for me to write one piece of code that handles them all. I am looking for help with more advanced scraping, and I am wondering if someone could recommend some Perl modules that might make this a little easier.

Thanks for the help!
--
Nick
KevinADC (#2)
Oct 5th, 2007
Search CPAN for RSS modules. I have no specific recommendations.
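
For anyone landing here later, this is a minimal sketch of the "extract the links from the feed" step. It assumes XML::RSS, which is just one of the RSS modules on CPAN and not one recommended in this thread. The helper name extract_links and the inline sample feed are made up for illustration; a real feed would be downloaded first.

```perl
use strict;
use warnings;
use XML::RSS;    # assumption: one of several RSS parsers on CPAN

# Hypothetical helper: return the <link> of every item in an RSS
# document that is passed in as a string.
sub extract_links {
    my ($xml) = @_;
    my $rss = XML::RSS->new;
    $rss->parse($xml);

    # Each parsed item is a hash reference; skip items without a link.
    return grep { defined($_) && length($_) }
           map  { $_->{link} } @{ $rss->{items} };
}

# Made-up sample feed so the sketch runs without network access.
my $sample = <<'XML';
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Sample feed for illustration</description>
    <item>
      <title>First story</title>
      <link>http://example.com/story1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/story2</link>
    </item>
  </channel>
</rss>
XML

print "$_\n" for extract_links($sample);
```

Each returned URL can then be handed to whatever fetches and stores the article itself.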
trudge (#3)
Oct 9th, 2007
WWW::Mechanize is the de facto standard module for scraping and other web automation tasks. Beware, though, if the target site relies on JavaScript: Mechanize will not execute it.

Also see http://www.research.att.com/sw/tools/wsp/

And FEAR::API on CPAN.
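
As a rough illustration of the WWW::Mechanize suggestion above, the sketch below fetches a page and lists the links whose URL matches a pattern. The helper name matching_links and the article|story pattern are assumptions for the example. Every site will need its own pattern, and pulling out the article body itself would still need per-site handling.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch $url and return the absolute URLs of all
# links on the page whose URL matches $pattern.
sub matching_links {
    my ($mech, $url, $pattern) = @_;
    $mech->get($url);
    return map { $_->url_abs->as_string }
           $mech->find_all_links(url_regex => $pattern);
}

# Only go to the network when a start URL is given on the command line.
if (@ARGV) {
    # autocheck => 1 makes failed requests die instead of passing silently.
    my $mech = WWW::Mechanize->new(autocheck => 1);

    # The pattern is a guess at what article URLs look like; adjust per site.
    print "$_\n" for matching_links($mech, $ARGV[0], qr/article|story/i);
}
```

Run it with a start page as the only argument. Links that a page builds with JavaScript will not show up, for the reason given above.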
Amer Neely - Web Mechanic
"Others make web sites. We make web sites work!"
©2003 - 2009 DaniWeb® LLC