Generalized Web Scraping

hopewemakeit, May 28th, 2009
I am an undergraduate student in the Computer Science and Engineering department.

I can construct a crawler in Perl for one particular website to fetch the useful information, in my case the job ads on that company's web page.

Now I want to construct a generalized crawler in Perl that works for around 100 companies.

How can I do it? I need some ideas, code, or resources. Do I need to study the HTML of all 100 sites?

Regards,
Kunal

mitchems, Jun 5th, 2009
Look into HTML::Parse. It is event-driven and tag-driven: you write functions that are called when a tag opens or closes, and they decide how to deal with it. Play around with it and see if you can parse HTML more effectively with it.
And don't tell me there isn't one bit of difference between null and space, because that's exactly how much difference there is.

Larry Wall

mitchems, Jun 5th, 2009
I apologize in advance for back-posting. I meant HTML::Parser. HTML::Parse is deprecated.
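
To illustrate the event-driven style, here is a minimal sketch using the version 3 handler API of HTML::Parser. The helper name extract_links and the sample HTML are made up for the example, and pulling out <a> elements is only an assumption about what a job listing might look like.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: collect the href and text of every <a> element.
sub extract_links {
    my ($html) = @_;
    my @links;      # finished results
    my $current;    # link being built while we are inside <a>...</a>

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Called whenever a tag opens.
        start_h => [ sub {
            my ($tag, $attr) = @_;
            $current = { href => $attr->{href}, text => '' } if $tag eq 'a';
        }, 'tagname, attr' ],
        # Called for text between tags ("dtext" means entities are decoded).
        text_h => [ sub {
            my ($text) = @_;
            $current->{text} .= $text if $current;
        }, 'dtext' ],
        # Called whenever a tag closes.
        end_h => [ sub {
            my ($tag) = @_;
            if ($tag eq 'a' && $current) {
                push @links, $current;
                undef $current;
            }
        }, 'tagname' ],
    );

    $parser->parse($html);
    $parser->eof;
    return @links;
}

# Made-up sample input, not taken from any real site.
my $html = '<ul><li><a href="/jobs/1">Perl Developer</a></li>'
         . '<li><a href="/jobs/2">QA &amp; Test Engineer</a></li></ul>';

for my $link (extract_links($html)) {
    print "$link->{text} -> $link->{href}\n";
}
```

The start handler notes that an <a> has opened, the text handler accumulates what is between the tags, and the end handler stores the finished record. Only the conditions inside the handlers would change for a different page layout.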

hopewemakeit, Jun 12th, 2009
Hi,
thanks for the post.

I have made a crawler for one website, and it really is based on the job portal on that site and its HTML: I look at which HTML tags open and close, and retrieve the data I need from between them.

But I really can't figure this out. There are 100 web pages in front of me, all with different HTML tags, and I need to create one common scraper.
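
One possible way to approach this, offered only as a sketch and not something anyone in this thread has tested: keep a single HTML::Parser-based extractor and move the site-specific part (which tag and class wrap the data) into a small table of rules. The site names, rules, sample pages and the extract_titles helper below are all hypothetical.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical per-site rules: which tag and class wrap a job title.
my %rules = (
    'example-a' => { tag => 'h2', class => 'job-title' },
    'example-b' => { tag => 'td', class => 'position' },
);

# One generic extractor driven by a rule. For simplicity it assumes the
# matching tag is not nested inside another tag of the same name.
sub extract_titles {
    my ($rule, $html) = @_;
    my @titles;
    my $inside = 0;     # true while we are inside a matching element
    my $buffer = '';    # text collected for the current element

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $attr) = @_;
            return unless $tag eq $rule->{tag};
            my $class = defined $attr->{class} ? $attr->{class} : '';
            ($inside, $buffer) = (1, '') if $class eq $rule->{class};
        }, 'tagname, attr' ],
        text_h => [ sub { $buffer .= $_[0] if $inside }, 'dtext' ],
        end_h => [ sub {
            my ($tag) = @_;
            return unless $inside && $tag eq $rule->{tag};
            $buffer =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
            push @titles, $buffer;
            $inside = 0;
        }, 'tagname' ],
    );

    $parser->parse($html);
    $parser->eof;
    return @titles;
}

# Made-up pages standing in for two differently structured sites.
my %pages = (
    'example-a' => '<div><h2 class="job-title">Perl Developer</h2><h2>News</h2></div>',
    'example-b' => '<table><tr><td class="position"> Tester </td><td>Pune</td></tr></table>',
);

for my $site (sort keys %rules) {
    print "$site: $_\n" for extract_titles($rules{$site}, $pages{$site});
}
```

With this layout each new site still needs its HTML looked at once to write its rule, but the parsing code itself stays the same for all of them.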

©2003 - 2009 DaniWeb® LLC