Generalized Web Scraping

Question

hopewemakeit 0 Newbie Poster

14 Years Ago

I am an undergraduate student in the Computer Science and Engineering department.

I can construct a crawler in Perl for one particular website to fetch the useful information, in my case the job ads on that company's webpage.

Now I want to construct a crawler, also in Perl, that is generalized across, say, around 100 companies.

How can I do it? I need some ideas, code, or resources. Do I need to study the HTML of all 100 sites?

Regards,
Kunal

perl


All 3 Replies

mitchems 12 Posting Whiz in Training

14 Years Ago

Look into HTML::Parse. It is event driven and tag driven. You write functions that are called when a tag opens or closes and that decide how to deal with it. Play around with it and see if you can parse HTML more effectively with it.


mitchems 12 Posting Whiz in Training · Answer 1 · 2009-06-05T23:03:13+00:00

I apologize in advance for back-posting. I meant HTML::Parser. HTML::Parse is deprecated.
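For illustration, here is a minimal sketch of the event-driven style with HTML::Parser. It assumes the job ads are plain links inside the page; the sample HTML, the @links array, and the $current variable are made up for this example and would differ for a real site.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Collect the text and href of every <a> element.
my @links;      # hypothetical result list
my $current;    # link being collected, undef while outside an <a>

my $parser = HTML::Parser->new(
    api_version => 3,
    # Called for every opening tag.
    start_h => [ sub {
        my ($tag, $attr) = @_;
        $current = { href => $attr->{href}, text => '' } if $tag eq 'a';
    }, 'tagname, attr' ],
    # Called for every run of text between tags.
    text_h => [ sub {
        my ($text) = @_;
        $current->{text} .= $text if $current;
    }, 'dtext' ],
    # Called for every closing tag.
    end_h => [ sub {
        my ($tag) = @_;
        if ($tag eq 'a' && $current) {
            push @links, $current;
            undef $current;
        }
    }, 'tagname' ],
);

# Made-up sample input; a real crawler would fetch this from the site.
my $html = '<ul><li><a href="/jobs/1">Perl Developer</a></li>'
         . '<li><a href="/jobs/2">QA Engineer</a></li></ul>';
$parser->parse($html);
$parser->eof;

printf "%s -> %s\n", $_->{text}, $_->{href} for @links;
```

The handlers only know about tag names and attributes, so what changes from one site to the next is which tags you react to, not the parsing loop itself.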

hopewemakeit 0 Newbie Poster · Answer 2 · 2009-06-12T16:27:43+00:00

Hi,
thanks for the post.

I have made a crawler for one website, and it is tied to the job portal on that site and its HTML: I look at which HTML tags open and close and retrieve the data I need from between them.

But I really can't figure this out: there are 100 web pages in front of me, I need to create one common scraper, and the HTML tags are different on every one.
