Generalized Web Scraping

I am an undergraduate student in the Computer Science and Engineering department.

I can build a crawler in Perl for one particular website to fetch the information I need, in my case the job ads on that company's web page.

Now I want to build a generalized crawler in Perl that works for around 100 companies.

How can I do it? I need some ideas, code, or resources. Do I need to study the HTML of all 100 sites?

Regards,
Kunal

Look into HTML::Parse. It is event-driven and tag-driven: you write callback functions that run when a tag opens or closes and decide how to handle it. Play around with it and see whether it lets you parse HTML more effectively.


Sorry for posting twice in a row. I meant HTML::Parser; HTML::Parse is deprecated.
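For reference, here is a minimal sketch of the event-driven style described above, using HTML::Parser's version 3 API. The sample page, and the assumption that each job title sits in an h2 element with class "job-title", are hypothetical and made up for illustration; a real site's markup will differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();   # CPAN module, not part of core Perl

# Hypothetical sample page: assumes each job title sits in <h2 class="job-title">.
my $html = <<'HTML';
<html><body>
  <h2 class="job-title">Junior Perl Developer</h2>
  <p>Some description</p>
  <h2 class="job-title">QA Engineer</h2>
  <h2>Not a job</h2>
</body></html>
HTML

my @titles;
my $in_title = 0;   # true while we are inside a matching <h2>

my $p = HTML::Parser->new(
    api_version => 3,
    # Called for every opening tag; 'tagname, attr' passes the
    # lowercased tag name and a hashref of its attributes.
    start_h => [ sub {
        my ($tag, $attr) = @_;
        $in_title = 1 if $tag eq 'h2' && ($attr->{class} // '') eq 'job-title';
    }, 'tagname, attr' ],
    # Called for every closing tag.
    end_h => [ sub {
        my ($tag) = @_;
        $in_title = 0 if $tag eq 'h2';
    }, 'tagname' ],
    # Called for text between tags; 'dtext' has entities decoded.
    text_h => [ sub {
        my ($text) = @_;
        return unless $in_title;
        $text =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
        push @titles, $text if length $text;
    }, 'dtext' ],
);
$p->unbroken_text(1);   # deliver each run of text in one piece
$p->parse($html);
$p->eof;                # flush any buffered text

print "$_\n" for @titles;
```

The start handler sets a flag when the matching tag opens, the end handler clears it, and the text handler keeps only the text seen while the flag is set. Calling unbroken_text(1) asks the parser not to split one run of text across several text events.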

Hi, thanks for the reply.

I have made a crawler for one website, and it depends entirely on that site's job portal and its HTML: which tags open and close, and then retrieving the data I need from between them.

But I really can't figure this out: I have 100 web pages in front of me, I need to create one common scraper, and the HTML tags are different on every page.
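One possible way to generalize, sketched under assumptions: keep a single extraction routine and move the site-specific part (which tag and class wrap a job title) into a per-site rule table. The site names, markup, and rules below are hypothetical. You would still need to look at each site's HTML once to write its rule, but adding a company then means adding one table entry rather than writing a new crawler.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser ();   # CPAN module, not part of core Perl

# Hypothetical per-site rules: which tag/class pair wraps a job title.
my %rules = (
    acme   => { tag => 'h2', class => 'job-title' },
    globex => { tag => 'a',  class => 'vacancy'   },
);

# Hypothetical sample pages; a real crawler would fetch these,
# e.g. with LWP::UserAgent.
my %pages = (
    acme   => '<h2 class="job-title">Perl Developer</h2><h2>About us</h2>',
    globex => '<ul><li><a class="vacancy featured" href="/j/1">Sysadmin</a></li>'
            . '<li><a href="/about">About</a></li></ul>',
);

# Generic extractor: returns the trimmed text inside every element
# that matches one rule's tag and has the rule's class among its classes.
sub extract_jobs {
    my ($html, $rule) = @_;
    my @jobs;
    my $inside = 0;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h => [ sub {
            my ($tag, $attr) = @_;
            my @classes = split ' ', ($attr->{class} // '');
            $inside = 1 if $tag eq $rule->{tag}
                        && grep { $_ eq $rule->{class} } @classes;
        }, 'tagname, attr' ],
        end_h  => [ sub { $inside = 0 if $_[0] eq $rule->{tag} }, 'tagname' ],
        text_h => [ sub {
            return unless $inside;
            (my $t = $_[0]) =~ s/^\s+|\s+$//g;
            push @jobs, $t if length $t;
        }, 'dtext' ],
    );
    $p->unbroken_text(1);
    $p->parse($html);
    $p->eof;
    return @jobs;
}

my %jobs;
for my $site (sort keys %rules) {
    $jobs{$site} = [ extract_jobs($pages{$site}, $rules{$site}) ];
    print "$site: $_\n" for @{ $jobs{$site} };
}
```

Sites whose listings cannot be described by a single tag/class pair would need richer rules or a handler of their own, so this sketch only covers the simple case.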
