
Hi,
I have an HTML file and I have to extract information from its table. I am new to Perl, so I will describe my approach below in the hope of getting some help.

In the HTML file the table looks like this, for example:

a    1
b    2
c    3
d    4

So there are two columns and four rows.
That means the HTML file will have one <table> ... </table> pair,
four <tr> ... </tr> pairs and, inside each row, two <td> ... </td> pairs (eight cells in all), if I am right.
Now I have to extract the whole second column of the table and write its elements to a text file, separated by newlines.

To start with, I will open the file like this:

open(HTMLFILE, '<', 'webpage.html') or die "Could not open webpage.html: $!";

Then I will start reading the file:

while (my $line = <HTMLFILE>)    # read one line at a time into $line
{
    # I don't know if this is right; I am trying to test whether the
    # current line matches a particular expression such as <table>
    if ($line =~ m/<table>/)
    {
        # then send the element to the text file
    }
}

I am very confused about how to get the second column. Also, how do I write to the text file each time without overwriting it? I cannot use sed or awk.

Please help me!
Thanks in advance!


Hi,
You can use the HTML::Parser module to parse HTML files. You can download it here.
You can also go through this link to learn how to use it, or run perldoc HTML::Parser after you install the module.
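
As a rough sketch of that approach, assuming HTML::Parser is installed from CPAN and the table uses plain <tr> and <td> tags with no nesting. The sample markup and the variable names are made up for illustration:

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample markup standing in for webpage.html.
my $html = <<'HTML';
<table>
<tr><td>a</td><td>1</td></tr>
<tr><td>b</td><td>2</td></tr>
<tr><td>c</td><td>3</td></tr>
<tr><td>d</td><td>4</td></tr>
</table>
HTML

my @second_column;    # text of every second cell, in row order
my $col     = 0;      # number of the current cell within its row
my $in_cell = 0;      # true while inside <td>...</td>

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h => [ sub {
        my ($tag) = @_;
        $col = 0 if $tag eq 'tr';                  # new row: reset the counter
        if ($tag eq 'td') { $col++; $in_cell = 1 }
    }, 'tagname' ],
    end_h => [ sub {
        my ($tag) = @_;
        $in_cell = 0 if $tag eq 'td';
    }, 'tagname' ],
    text_h => [ sub {
        my ($text) = @_;
        push @second_column, $text if $in_cell && $col == 2;
    }, 'dtext' ],
);

$parser->parse($html);
$parser->eof;

print "$_\n" for @second_column;
```

The handlers only count cells per row, so a table with nested tables or cells spanning columns would need more care.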

... what is the way to write to the text file each time without overwriting. I cannot use sed or awk.

If you want to output to a file, you can open the file in append mode (>>) and write to it.
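
A minimal sketch of append mode, assuming an output file called column.txt (a made-up name) and placeholder values in place of the real table cells:

```perl
use strict;
use warnings;

# 'column.txt' is a hypothetical output file name.
my $out_file = 'column.txt';

# '>>' opens for appending: existing contents are kept and new
# output is added at the end. ('>' would truncate the file instead.)
open(my $out, '>>', $out_file)
    or die "Could not open $out_file for appending: $!";

print {$out} "$_\n" for (1, 2, 3, 4);    # placeholder values

close($out) or die "Could not close $out_file: $!";
```

Running the script twice leaves eight lines in the file, since nothing is overwritten.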

katharnakh.


Hi,
Thanks for the quick reply. As you suggested, I read about the HTML parser module. However, we are supposed to do it with Perl regular expressions. I believe we would use cat or something similar to extract. Would it be possible for you to give me a small example of extracting some element of HTML and writing it to a text file? Thanks!


OK, what have you tried?
Go through perldoc perlrequick for a quick hands-on introduction to regular expressions, and perldoc perl to find help on other topics.
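
To give an idea of the kind of thing perlrequick covers, here is a minimal sketch of a capture group used with the /g modifier. It assumes a whole row sits on one line and that the cells are plain <td>...</td> with no attributes or nesting. The sample row and the variable names are made up:

```perl
use strict;
use warnings;

# Hypothetical row of the kind found in the table.
my $row = '<tr><td>b</td><td>2</td></tr>';

# In list context, /g returns every capture: one entry per cell.
# .*? is non-greedy, so each match stops at the nearest </td>.
my @cells = $row =~ m{<td>(.*?)</td>}g;

print "second column: $cells[1]\n" if @cells >= 2;
```

This is only a starting point; real pages with attributes or line breaks inside a row would need a more forgiving pattern.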

BTW, this is not a place to get your homework done.

katharnakh
