Views: 1201 | Replies: 2
Join Date: Mar 2005
Location: Nebraska, U.S.
Posts: 191
Rep Power: 4
Solved Threads: 4
Hello,
I am working on a project where I need to visit multiple news web sites and collect articles pertaining to stock numbers. My idea so far is to download an RSS file from somewhere like Google Finance, extract the links from it, follow them, pull out just the article section of each page, and store it in a database. The problem is that the sites are structured too differently for me to write one piece of code that handles them all. I am looking for help with more advanced scraping, and I am wondering if someone could recommend some Perl modules that might make this a little easier.
Thanks for the help!
--
Nick
Join Date: Sep 2007
Location: North Bay Ontario
Posts: 176
Rep Power: 2
Solved Threads: 20
WWW::Mechanize is the de facto module for scraping and similar tasks. Beware, though: if the target site relies on JavaScript, Mechanize will not execute it.
Also see http://www.research.att.com/sw/tools/wsp/
And FEAR::API at CPAN.
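As a rough illustration of the workflow described in the question (fetch a feed, pull out the item links, follow each one), here is a minimal sketch using WWW::Mechanize. It assumes the feed is plain RSS 2.0 with one <link> per <item>; the feed URL is taken from the command line, and the sub names feed_links and run are made up for this example. The regex-based link extraction is a shortcut, and a real XML or feed parser from CPAN would be more robust. It does not solve the per-site "article section" problem; it only gets each page's title and plain text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: return the <link> URL of every <item> in an
# RSS 2.0 document. Assumes one <link> per item; not a full XML parser.
sub feed_links {
    my ($xml) = @_;
    my @links;
    for my $item ( $xml =~ m{<item\b[^>]*>(.*?)</item>}gs ) {
        next unless $item =~ m{<link>\s*(?:<!\[CDATA\[)?\s*(\S+?)\s*(?:\]\]>)?\s*</link>}s;
        my $url = $1;
        $url =~ s/&amp;/&/g;    # undo the most common XML escape in URLs
        push @links, $url;
    }
    return @links;
}

# Hypothetical driver: fetch the feed, then visit every linked article.
sub run {
    my ($feed_url) = @_;

    # autocheck => 0 so a failed request does not abort the whole run
    my $mech = WWW::Mechanize->new( autocheck => 0 );

    $mech->get($feed_url);
    die "Could not fetch $feed_url: " . $mech->status . "\n"
        unless $mech->success;

    for my $url ( feed_links( $mech->content ) ) {
        $mech->get($url);
        next unless $mech->success && $mech->is_html;

        my $title = $mech->title;
        $title = '(no title)' unless defined $title;

        # format => 'text' strips the markup; it needs HTML::TreeBuilder
        my $text = $mech->content( format => 'text' );

        # Site-specific article extraction and the database insert
        # would go here; this sketch only reports what it found.
        printf "%s\n  %s (%d characters of text)\n",
            $url, $title, length $text;
    }
    return;
}

# Run only when a feed URL is given on the command line.
run( $ARGV[0] ) if @ARGV;
```

Saved as, say, fetch_articles.pl, it would be run as `perl fetch_articles.pl FEED_URL`. Keeping the link extraction in its own sub means it can be swapped for a proper feed parser later without touching the fetch loop.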
Amer Neely - Web Mechanic
"Others make web sites. We make web sites work!"