Hi,
I have an HTML file and I have to extract information from its table. I am new to Perl, but I will describe my approach below so that I can get some help.

In the HTML file, the table looks like this, for example:

a    1
b    2
c    3
d    4

So there are two columns and four rows.
That means the HTML file will have one <table> and </table> pair,
four <tr> and </tr> pairs, and two <td> and </td> pairs in each row, if I am right.
Now I have to extract the whole second column of the table and write its elements to a text file, separated by newlines.

To start with, I will open the file like this:

open(HTMLFILE, '<', 'webpage.html') or die "Could not open the file: $!";

Then I will start reading the file:

while (my $line = <HTMLFILE>)
{
    # Here I am trying to set the current line to a variable
    # (I don't know if this is right)

    # if the current line contains a particular expression like <table>
    if ($line =~ m/<table>/)
    {
        # then send the element to the text file
    }
}

I am very confused about how to get the second column. Also, how do I write to the text file each time without overwriting it? I cannot use sed or awk.

Please help me!
thanks in advance!!!

All 3 Replies

Hi,
You can use the HTML::Parser module to parse HTML files. You can download it here.
You can also go through this link to learn how to use it, or run perldoc HTML::Parser after you install the module.
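
As a rough illustration of the HTML::Parser suggestion, here is a minimal sketch that collects the text of the second <td> in every row. The sample HTML string and the variable names ($cell_index, $in_cell, @second_column) are assumptions made up for this example; a real page may need extra handling for <th> cells or nested tables.

```perl
use strict;
use warnings;
use HTML::Parser;    # CPAN module, not part of core Perl

# Hypothetical sample input mirroring the table in the question.
my $html = <<'HTML';
<table>
<tr><td>a</td><td>1</td></tr>
<tr><td>b</td><td>2</td></tr>
<tr><td>c</td><td>3</td></tr>
<tr><td>d</td><td>4</td></tr>
</table>
HTML

my @second_column;     # text of the second cell of each row
my $cell_index = 0;    # 1-based position of the current <td> within its row
my $in_cell    = 0;    # true while the parser is between <td> and </td>

my $parser = HTML::Parser->new(
    api_version => 3,
    # Called for every opening tag; 'tagname' is passed in lower case.
    start_h => [ sub {
        my ($tag) = @_;
        if ($tag eq 'tr') {
            $cell_index = 0;              # a new row starts counting again
        }
        elsif ($tag eq 'td') {
            $cell_index++;
            $in_cell = 1;
            push @second_column, '' if $cell_index == 2;
        }
    }, 'tagname' ],
    # Called for every closing tag.
    end_h => [ sub {
        my ($tag) = @_;
        $in_cell = 0 if $tag eq 'td';
    }, 'tagname' ],
    # Called for text; it may arrive in several chunks, so append.
    text_h => [ sub {
        my ($text) = @_;
        $second_column[-1] .= $text if $in_cell && $cell_index == 2;
    }, 'dtext' ],
);

$parser->parse($html);
$parser->eof;

print "$_\n" for @second_column;
```

To read from a file instead of a string, HTML::Parser also provides parse_file, which takes a file name or an open filehandle.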

... what is the way to write to the text file each time without overwriting. I cannot use sed or awk.

If you want to output to a file, then you can open a file in append mode and write.
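
A minimal sketch of append mode, assuming a hypothetical output file named output.txt and a made-up helper called append_line: opening with '>>' adds to the end of the file, whereas '>' would truncate it on every open.

```perl
use strict;
use warnings;

# 'output.txt' is a hypothetical file name chosen for this example.
my $outfile = 'output.txt';

# Append one line of text to the file at $path without overwriting
# what is already there. The file is created if it does not exist.
sub append_line {
    my ($path, $text) = @_;
    open(my $out, '>>', $path) or die "Could not open $path: $!";
    print {$out} "$text\n";
    close($out) or die "Could not close $path: $!";
}

append_line($outfile, 'first');
append_line($outfile, 'second');    # lands after 'first', not on top of it
```

If all the values are written in one run of the script, it is usually simpler to open the output file once before the read loop and print to the same filehandle inside it.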

katharnakh.

Hi,
thanks for the quick reply. As you suggested, I read about the HTML parser module. However, we are supposed to do it with Perl regular expressions. I believe we would use cat or something to extract. Would it be possible for you to give me a small example of extracting some element of HTML and outputting it to a text file? Thanks!!

OK, what have you tried so far?
Go through perldoc perlrequick for a quick, hands-on introduction to regular expressions, and perldoc perl to find help on other topics.
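
As a starting point in the spirit of perlrequick, here is a small sketch of pulling table cells out of a line with a regular expression. It assumes each <tr> row sits on a single line and that there are no nested tables; the sample lines and the @second_column name are invented for the example. Regular expressions are fragile on real-world HTML, so treat this as an exercise and not a general parser.

```perl
use strict;
use warnings;

# Hypothetical sample input: one table row per line.
my @lines = (
    '<table>',
    '<tr><td>a</td><td>1</td></tr>',
    '<tr><td>b</td><td>2</td></tr>',
    '<tr><td>c</td><td>3</td></tr>',
    '<tr><td>d</td><td>4</td></tr>',
    '</table>',
);

my @second_column;
for my $line (@lines) {
    # With /g in list context, the match returns every captured group,
    # i.e. the contents of each <td>...</td> on this line.
    # .*? is non-greedy so it stops at the first </td>; /i ignores tag case.
    my @cells = $line =~ m{<td\b[^>]*>(.*?)</td>}gi;

    # Keep the second cell only when the row actually has one.
    push @second_column, $cells[1] if @cells >= 2;
}

print "$_\n" for @second_column;
```

When reading from a file, the same match can go inside a while loop that reads one line at a time from the filehandle.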

BTW, this is not a place to get your homework done.

katharnakh
