DaniWeb IT Discussion Community
adadadad

parsing a webpage

  #1  
Jun 8th, 2008
Hi,
I have an HTML file and I have to extract information from a table in it. I am new to Perl, so I will describe my approach below in the hope of getting some help.

For example, the table in the HTML file looks like this:

a 1
b 2
c 3
d 4

So there are two columns and four rows.
That means the HTML file will have one <table> ... </table> pair,
four <tr> ... </tr> pairs, and two <td> ... </td> pairs in each row, if I am right.
Now I have to extract the whole second column of the table and write its elements to a text file, separated by newlines.

To start with, I will open the file like this:

open(HTMLFILE, '<', 'webpage.html') or die "Could not open the file: $!";

Then I will start reading the file:

while (my $line = <HTMLFILE>)
{
    # Set the current line to a variable. Assigning in the loop condition
    # avoids a second read inside the body, which would skip every other line.

    if ($line =~ m/<table>/)
    # if the current line contains a particular expression like <table>
    {
        # then send the element to the text file
    }
}

I am very confused about how to get the second column. Also, how do I write to the text file each time without overwriting it? I cannot use sed or awk.

Please help me!
Thanks in advance!
katharnakh

Re: parsing a webpage

  #2  
Jun 9th, 2008
Hi,
You can use the HTML::Parser module to parse HTML files. You can download it here.
You can also go through this link to learn how to use it, or run perldoc HTML::Parser after you install the module.
Originally posted by adadadad:
... what is the way to write to the text file each time without overwriting. I cannot use sed or awk.
If you want to output to a file, then you can open a file in append mode and write.
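
As a minimal sketch of append mode: opening with '>>' adds to the end of the file, whereas '>' would empty it first. The helper name append_line and the file name column2.txt are assumptions made up for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: append one line of text to a file.
# '>>' opens for appending; '>' would truncate the file first.
sub append_line {
    my ($path, $text) = @_;
    open(my $out, '>>', $path) or die "Could not open $path: $!";
    print {$out} "$text\n";
    close($out) or die "Could not close $path: $!";
    return;
}

# Assumed output file name, for illustration only.
my $file = 'column2.txt';
append_line($file, 1);
append_line($file, 2);   # earlier contents are kept; "1" and "2" are added at the end
```

If many values are written in one run, it is usually simpler to open the file once before the loop and close it afterwards rather than reopening it for every line.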

katharnakh.
Last edited by katharnakh : Jun 9th, 2008 at 3:50 am.
challenge the limits
adadadad

Re: parsing a webpage

  #3  
Jun 9th, 2008
Hi,
Thanks for the quick reply. As you suggested, I read about the HTML parser module. However, we are supposed to do it with Perl regular expressions. I believe we would use cat or something similar to extract the data. Would it be possible for you to give me a small example of extracting some element of the HTML and outputting it to a text file? Thanks!
katharnakh

Re: parsing a webpage

  #4  
Jun 9th, 2008
OK, what have you tried so far?
Go through perldoc perlrequick for a quick, hands-on introduction to regular expressions, and perldoc perl to find help on other topics.
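
For reference, here is a rough sketch of the regular-expression approach discussed in this thread. It assumes simple, well-formed markup with no nested tables, which is the case for the sample table in the question; regular expressions are fragile on general HTML. The function name second_column and the sample string are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of the second <td> of every <tr>.
# Assumes simple, well-formed markup with no nested tables.
sub second_column {
    my ($html) = @_;
    my @values;
    # /g walks over every row, /s lets . match newlines, /i ignores tag case.
    while ($html =~ m{<tr\b[^>]*>(.*?)</tr>}gis) {
        my $row = $1;
        # In list context, /g returns the captured contents of every cell.
        my @cells = $row =~ m{<td\b[^>]*>(.*?)</td>}gis;
        next if @cells < 2;          # skip rows without a second column
        my $value = $cells[1];
        $value =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @values, $value;
    }
    return @values;
}

# Sample input shaped like the table in the question.
my $html = <<'HTML';
<table>
<tr><td>a</td><td>1</td></tr>
<tr><td>b</td><td>2</td></tr>
<tr><td>c</td><td>3</td></tr>
<tr><td>d</td><td>4</td></tr>
</table>
HTML

# Prints 1, 2, 3 and 4, one per line.
print "$_\n" for second_column($html);
```

To run this on a real file, read the whole file into one string first, because a single row may span several lines.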

BTW, this is not a place to get your homework done.

katharnakh
Last edited by katharnakh : Jun 9th, 2008 at 5:44 am.