Oct 5th, 2007

Web Scraping Help

Hello,
I am working on a project in which I need to visit multiple news web sites and collect articles pertaining to stock numbers. My idea so far is to download an RSS file from somewhere like Google Finance, extract the links from it, follow them, pull out just the article section of each page, and store it in a database. The problem I see is that the sites are structured too differently for me to write one thing that handles them all. I would like to get a little more advanced with scraping, and I am wondering if someone could recommend some Perl modules that might make this easier.

Thanks for the help!
--
Nick
stupidenator
Oct 5th, 2007

Re: Web Scraping Help

Search CPAN for RSS modules. I have no specific recommendations.
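
As one illustration of what such a module looks like in use, here is a minimal sketch with XML::RSS. The module choice is only an example, not a recommendation from this thread, and the feed text, the example.com URLs, and the `article_links` helper are all made up for the sketch. A real script would download the feed first instead of using an inline string.

```perl
use strict;
use warnings;
use XML::RSS;

# Hypothetical feed content, inlined so the sketch runs without a network.
my $feed_xml = <<'XML';
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example finance feed</title>
    <link>http://example.com/</link>
    <description>Sample items</description>
    <item>
      <title>First article</title>
      <link>http://example.com/a1</link>
    </item>
    <item>
      <title>Second article</title>
      <link>http://example.com/a2</link>
    </item>
  </channel>
</rss>
XML

# Return the <link> of every item in an RSS document (hypothetical helper).
sub article_links {
    my ($xml) = @_;
    my $rss = XML::RSS->new;
    $rss->parse($xml);
    return map { $_->{link} } @{ $rss->{items} };
}

print "$_\n" for article_links($feed_xml);
```

The returned links are what you would then follow to fetch each article page.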
KevinADC
Oct 9th, 2007

Re: Web Scraping Help

WWW::Mechanize is the de facto module for scraping and related tasks. Beware, though, if the target site relies on JavaScript: Mechanize will not execute it.
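
A rough sketch of how Mechanize might be used to fetch a page and pick out candidate article links. The `matching_links` helper and the `article|news` pattern are assumptions for illustration only; each site will need its own pattern, and the page URL is taken from the command line.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Fetch a page and return [absolute URL, link text] pairs for every link
# whose URL matches $pattern (hypothetical helper, not part of Mechanize).
sub matching_links {
    my ($url, $pattern) = @_;
    my $mech = WWW::Mechanize->new( autocheck => 1 );  # die on fetch errors
    $mech->get($url);
    return map { [ $_->url_abs->as_string, defined $_->text ? $_->text : '' ] }
           $mech->find_all_links( url_regex => $pattern );
}

# Only runs when a URL is given, e.g.: perl links.pl http://example.com/
if (@ARGV) {
    my ($url) = @ARGV;
    # The pattern below is a placeholder assumption; adjust it per site.
    for my $link ( matching_links( $url, qr/article|news/i ) ) {
        printf "%s\t%s\n", @$link;
    }
}
```

Because the pages are structured differently from site to site, the pattern (and any later extraction of the article body) would most likely have to be kept per site rather than shared.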

Also see http://www.research.att.com/sw/tools/wsp/

Also look at FEAR::API on CPAN.
trudge



© 2011 DaniWeb® LLC