Web Scraping Help

Question

stupidenator 0 Junior Poster

17 Years Ago

Hello,
I am working on a project that needs to visit multiple news websites and collect articles pertaining to stock numbers. My idea so far is to download an RSS feed from somewhere like Google Finance, extract the links from it, follow them, pull out just the article section of each page, and store it in a database. The problem is that the sites are structured too differently for me to write one extractor that handles them all. I would like to get a little more advanced with the scraping, and I am wondering whether someone could recommend some Perl modules that might make this easier.

Thanks for the help!
--
Nick

finance perl

All 2 Replies

KevinADC 192 Practically a Posting Shark

17 Years Ago

Search CPAN for RSS modules. I have no specific recommendations.
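
For illustration only: XML::RSS is one RSS module on CPAN (picked here as an example, since the reply above names none). Below is a rough sketch, assuming the feed has already been saved to a local file, of pulling the item links out of it. The sub name feed_links and the file name in the usage comment are made up for this example.

```perl
use strict;
use warnings;
use XML::RSS;    # CPAN module, not part of core Perl

# Return the <link> value of every item in an RSS document given as a string.
# feed_links is a hypothetical helper name used only in this sketch.
sub feed_links {
    my ($xml) = @_;
    my $rss = XML::RSS->new;
    $rss->parse($xml);    # dies if the XML is malformed
    return grep { defined($_) && length($_) }
           map  { $_->{link} } @{ $rss->{items} };
}

# Usage (assumed file name): perl feed_links.pl feed.xml
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $xml = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    print "$_\n" for feed_links($xml);
}
```

Each printed URL can then be handed to whatever fetches the article pages.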

trudge 8 Junior Poster · Answer 1 · 2007-10-10T00:11:01+00:00

WWW::Mechanize is the de facto standard module for scraping and similar tasks. Beware, though: if the target site relies on JavaScript, Mechanize will not execute it.

Also see http://www.research.att.com/sw/tools/wsp/

FEAR::API on CPAN is another option.
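
To give a feel for WWW::Mechanize, here is a minimal sketch that fetches one page and lists the links whose URL matches a pattern. The sub name matching_links and the article|story pattern are assumptions for the example; every site will need its own pattern, and pages that build their links with JavaScript will come back empty.

```perl
use strict;
use warnings;
use WWW::Mechanize;    # CPAN module, not part of core Perl

# Fetch $page_url and return [absolute URL, link text] pairs for every
# link whose URL matches $pattern. matching_links is a hypothetical name.
sub matching_links {
    my ( $mech, $page_url, $pattern ) = @_;
    $mech->get($page_url);    # with autocheck on, this dies on HTTP errors
    return map { [ $_->url_abs->as_string, $_->text // '' ] }
           $mech->find_all_links( url_regex => $pattern );
}

# Usage: perl list_links.pl http://some.news.site/
if (@ARGV) {
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    # The pattern is a guess at what article URLs look like; adjust per site.
    for my $pair ( matching_links( $mech, $ARGV[0], qr/article|story/i ) ) {
        print "$pair->[0]\t$pair->[1]\n";
    }
}
```

After following a link with $mech->get, $mech->content holds the raw HTML of the page. Cutting that down to just the article body is still site-specific work.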
