
May 28th, 2009

Generalized Web Scraping

I am an undergraduate student in the Computer Science and Engineering department.

I can build a crawler in Perl that fetches the useful information from one particular website; in my case, the job ads on that company's web page.

Now I want to build a generalized crawler in Perl that works for around 100 companies.

How can I do it? I need some ideas, code, or resources. And do I need to study the HTML of all 100 sites?

Regards,
Kunal
hopewemakeit
Jun 5th, 2009

Re: Generalized Web Scraping

Look into HTML::Parse. It is event-driven and tag-driven: you write handler functions that are called when a tag opens or closes, and they decide how to deal with it. Play around with it and see whether it lets you parse HTML more effectively.
mitchems
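A minimal sketch of that event-driven style, written against HTML::Parser (the non-deprecated module named in the next reply). It assumes the job titles on a page sit inside h2 elements with the class job-title; that tag and class are hypothetical placeholders that would have to be read off each real page. Each handler receives the arguments named in its argspec string ('tagname, attr', 'dtext', 'tagname').

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Return the text inside every <$want_tag> whose class list contains
# $want_class. Both values are hypothetical: read them off the real page.
sub extract_text {
    my ($html, $want_tag, $want_class) = @_;
    my @found;        # finished text chunks, in document order
    my $depth = 0;    # > 0 while inside a matching element
    my $buf   = '';   # text collected for the current element

    my $p = HTML::Parser->new(
        api_version => 3,
        # Opening tag: tag names arrive lower-cased, attributes as a hashref.
        start_h => [ sub {
            my ($tag, $attr) = @_;
            return unless $tag eq $want_tag;
            if ($depth) { $depth++; return }    # nested tag with the same name
            my $class = defined $attr->{class} ? $attr->{class} : '';
            if (grep { $_ eq $want_class } split ' ', $class) {
                ($depth, $buf) = (1, '');
            }
        }, 'tagname, attr' ],
        # Text: 'dtext' hands over text with entities such as &amp; decoded.
        text_h => [ sub { $buf .= $_[0] if $depth }, 'dtext' ],
        # Closing tag: when the outermost match closes, store its text.
        end_h => [ sub {
            my ($tag) = @_;
            return unless $depth && $tag eq $want_tag;
            return if --$depth;
            (my $t = $buf) =~ s/^\s+|\s+$//g;
            $t =~ s/\s+/ /g;
            push @found, $t;
        }, 'tagname' ],
    );
    $p->parse($html);
    $p->eof;    # signal end of document so buffered text is flushed
    return @found;
}

# Usage with a made-up job-listing fragment.
# Prints "Perl Developer" and then "QA & Test Engineer", one per line.
my $html = '<ul><li><h2 class="job-title">Perl Developer</h2></li>'
         . '<li><h2 class="job-title urgent">QA &amp; Test Engineer</h2></li></ul>';
print "$_\n" for extract_text($html, 'h2', 'job-title');
```

Counting nested tags of the same name keeps an inner h2 from ending the capture too early, and asking for dtext rather than text means entity references arrive already decoded.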
Jun 5th, 2009

Re: Generalized Web Scraping

I apologize for posting twice in a row. I meant HTML::Parser; HTML::Parse is deprecated.
mitchems
Jun 12th, 2009

Re: Generalized Web Scraping

Hi,
thanks for the post.

I have built a crawler for one website, but it is tied to the job portal on that site and to its HTML: it relies on where particular HTML tags open and close, and retrieves the data I need from between them.

But I really can't figure out the general case: there are 100 web pages in front of me, I need to create one common scraper, and the HTML tags are different on every site.
hopewemakeit
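The thread leaves the generalization question open. One common approach (an assumption here, not something the posters proposed) is to keep a single parsing engine and move everything site-specific into a table of rules. Each of the 100 sites still has to be inspected once, but only to write one line of configuration rather than a whole scraper. The site keys, tags, attributes, and values below are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical per-site rules: the only part written by hand per company.
# Each rule names the element that holds a job title on that site.
my %SITE_RULES = (
    'example-corp' => { tag => 'h2',   attr => 'class',    value => 'job-title' },
    'acme-inc'     => { tag => 'span', attr => 'itemprop', value => 'title' },
    'globex'       => { tag => 'td',   attr => 'class',    value => 'position' },
);

# One generic engine shared by every site: it just applies that site's rule.
# Simplification: assumes a matching element is not nested inside another
# element with the same tag name.
sub scrape_titles {
    my ($site, $html) = @_;
    my $rule = $SITE_RULES{$site} or die "no rule for site '$site'\n";
    my (@titles, $inside, $buf);

    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $attr) = @_;
            return if $inside || $tag ne $rule->{tag};
            my $v = $attr->{ $rule->{attr} };
            # Match the value as a whitespace-separated token, so that
            # class="big job-title" still matches "job-title".
            if (defined $v && grep { $_ eq $rule->{value} } split ' ', $v) {
                ($inside, $buf) = (1, '');
            }
        }, 'tagname, attr' ],
        text_h => [ sub { $buf .= $_[0] if $inside }, 'dtext' ],
        end_h  => [ sub {
            return unless $inside && $_[0] eq $rule->{tag};
            $inside = 0;
            $buf =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
            push @titles, $buf;
        }, 'tagname' ],
    );
    $p->parse($html);
    $p->eof;
    return @titles;
}

# Usage: the same call works for any site that has a rule.
print "$_\n" for scrape_titles('acme-inc',
    '<div itemscope><span itemprop="title">Perl Developer</span></div>');
```

Adding a company then means adding one entry to %SITE_RULES. Sites whose markup cannot be described by a single tag-plus-attribute rule would need a richer rule format (for example, several fields per job) or a small per-site callback.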

© 2011 DaniWeb® LLC