I'm trying to learn the very basics of HTML parsing in Python. Through these forums I learned what a parser is:
"Parsing often means 'perform syntax analysis' on a program or a text. It means checking whether a text obeys given grammar rules and extracting the corresponding information. For example, suppose that you define the rule that the structure of a question in English is
auxiliary verb + subject + main verb + rest. Then the output of the statement
parse("Are they playing football?") could be a hierarchy of tuples, or other objects, like this:
("question", ("auxiliary verb", "are"), ("subject", "they"), ("verb", "playing"), ("rest", "football"))
Programs and compilers handle such trees more easily than raw text." (Thanks for that explanation, Gribouillis.)
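To make that explanation concrete for myself, here is a minimal sketch of what such a parse() function might look like. It is only a toy: the AUXILIARY_VERBS set and the rule "the first three words are auxiliary verb, subject and main verb" are my own assumptions for illustration, not a real grammar of English.

```python
# Toy parser for the rule: auxiliary verb + subject + main verb + rest.
# AUXILIARY_VERBS is a small, hypothetical word list used only for this sketch.
AUXILIARY_VERBS = {"are", "is", "am", "do", "does", "did", "can", "will"}


def parse(sentence):
    """Return a tuple tree for a question, or raise ValueError if the rule fails."""
    words = sentence.rstrip("?").lower().split()
    if len(words) < 3:
        raise ValueError("expected at least an auxiliary verb, a subject and a main verb")

    # Split by position: the first three words fill the first three slots,
    # everything that remains is collected as "rest".
    auxiliary, subject, verb, *rest = words
    if auxiliary not in AUXILIARY_VERBS:
        raise ValueError(f"{auxiliary!r} is not a known auxiliary verb")

    return (
        "question",
        ("auxiliary verb", auxiliary),
        ("subject", subject),
        ("verb", verb),
        ("rest", " ".join(rest)),
    )


print(parse("Are they playing football?"))
```

With the example sentence this prints the same tuple hierarchy as in the quote. A sentence that does not start with one of the listed auxiliary verbs raises a ValueError, which is the "check if the text obeys the rules" half of parsing.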
So what would the output be if I fed this data:
<html> <body> <h1>My First Heading</h1> <p>My first paragraph.</p> </body> </html>
to the Python HTML parser?
(i.e. from html.parser import HTMLParser)
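From what I can tell so far, HTMLParser does not hand back a tree like the tuple example above. It calls handler methods (handle_starttag, handle_endtag, handle_data) as it reads the text, and the base class handlers do nothing, so you have to subclass it to see anything. Below is a minimal sketch of that; the EventCollector class name, the events list and the choice to skip whitespace-only text are my own, not part of the library.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical subclass that records each parser callback as a tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called for every opening tag, e.g. <h1>.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        # Called for every closing tag, e.g. </h1>.
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Called for text between tags; whitespace-only text is skipped here.
        if data.strip():
            self.events.append(("data", data))


parser = EventCollector()
parser.feed("<html> <body> <h1>My First Heading</h1> <p>My first paragraph.</p> </body> </html>")
parser.close()

for event in parser.events:
    print(event)
```

Running this on the sample page prints one line per event, in document order: the start tags for html, body and h1, then the heading text, the h1 end tag, the same pattern for the paragraph, and finally the end tags for body and html. Building a nested structure out of these events would be up to the subclass.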