Parse HTML?

Question

terabyte 0 Junior Poster in Training

13 Years Ago

I'm trying to convert HTML to plain text (that is, remove all HTML tags).
I don't want to use a regex, so I tried the HTML::Parser module

and called its parse() function, but I got this error:

Undefined subroutine &main::start called at getwords.pl line 27.

As a matter of fact, I checked the module's source, HTML/Parser.pm, and there is no parse function in it.

Then I downloaded WWW::Mechanize and got the same error, since WWW::Mechanize uses HTML::Parser's parse function.

Really weird.

I already downloaded the source from CPAN (http://search.cpan.org/dist/HTML-Parser/Parser.pm), and the parse function is missing.

What can I do? And why is the function missing?

html-css perl regex


All 2 Replies

d5e5 109 Master Poster

13 Years Ago

Undefined subroutine &main::start called at getwords.pl line 27.

The error message says Perl can't find your start function. It doesn't mention a function named 'parse'. You need to define a function (subroutine) named 'start' in your main package.

For example, if you have an HTML file named 'VerySimpleFile.html' in your current working directory, the following script should run OK:

#!/usr/bin/perl
use strict;
use warnings;

use HTML::Parser ();

# Create parser object
my $p = HTML::Parser->new( api_version => 3,
                        start_h => [\&start, "tagname, text"],
                        text_h  => [\&text,  "text"],
                        end_h   => [\&end,   "tagname, text"],
                        marked_sections => 1,
                      );

# Parse directly from file

$p->parse_file('VerySimpleFile.html');

sub start{
    my ($tagname, $text) = @_;
    print "<!-- $tagname starts here................-->\n";
    print $text;
}

sub text{
    my $text = shift;
    print $text;
}

sub end{
    my ($tagname, $text) = @_;
    print "\n<!-- $tagname ends here................-->\n";
    print $text;
}


d5e5 109 Master Poster · Answer 1 · 2011-06-28T20:58:35+00:00

The previous script uses the parse_file() method to open and parse the entire file automatically. If you want to parse a file that is already open (such as STDIN), or you want to parse only some records in a large file, you can use the parse() method instead. (I can't find it in the source Parser.pm file either, but that doesn't seem to matter -- it works for me anyway.) The following script loops through its input (STDIN, or any files named on the command line), parses each line, and prints some output.

#!/usr/bin/perl
use strict;
use warnings;

use HTML::Parser ();

# Create parser object
my $p = HTML::Parser->new( api_version => 3,
                        start_h => [\&start, "tagname, text"],
                        text_h  => [\&text,  "text"],
                        end_h   => [\&end,   "tagname, text"],
                        marked_sections => 1,
                      );

# Loop through the input (STDIN, or files named on the command line) and parse each line.
while (<>){
    $p->parse($_);
}

$p->eof; # Tell the parser object we're finished, so it flushes any buffered text
print "\n";

sub start{
    my ($tagname, $text) = @_;
    print "<!-- $tagname starts here................-->\n";
    print $text;
}

sub text{
    my $text = shift;
    print $text;
}

sub end{
    my ($tagname, $text) = @_;
    print "\n<!-- $tagname ends here................-->\n";
    print $text;
}
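
Since the original goal was plain text with the tags removed, here is a minimal sketch along the same lines that registers only a text handler, so tag events produce no output. The helper name html_to_text and the sample input are my own, for illustration only. It assumes that decoding entities (the "dtext" argspec) and skipping script and style contents are what you want.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::Parser ();

# Hypothetical helper (not part of HTML::Parser): returns the text
# content of an HTML string with the tags removed.
sub html_to_text {
    my ($html) = @_;
    my $out = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # Only a text handler is registered, so start and end tags are dropped.
        # "dtext" passes the text with entities such as &amp; decoded.
        text_h => [ sub { $out .= shift }, "dtext" ],
    );

    # Skip everything inside <script> and <style> elements.
    $p->ignore_elements(qw(script style));

    $p->parse($html);
    $p->eof;    # flush any text the parser is still buffering
    return $out;
}

print html_to_text('<p>Hello, <b>world</b> &amp; friends!</p>'), "\n";
```

To collect the text from a file instead, the same handler setup should work with parse_file(), as in the first script.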
