i/o, stack and queue help

Question

MRWIGGLES 0 Newbie Poster

15 Years Ago

If I had a data file with HTML/XHTML tags:

Code:
<html>
<head>
<title> data file </title>
</head>

<body>
<center><h1>
heading 1
</h1></center>

<b>bolded</b>
<P>paragraph</P>
<P>
<br />

How would I get a Python program to read ONLY the start and end tags and enqueue them in a queue?

for example, the queue for this would look:

Thanks!

python queue


All 4 Replies

vegaseat 1,735 DaniWeb's Hypocrite

15 Years Ago

Is there a simpler way to do this without having to import things? I'm a beginner at Python.

You really don't need a queue; a list will do fine. Here is a simpler version of Gribouillis' code that actually works in both Python2 and Python3 ...

# extract tag names in an html code file
# works with Python2 and Python3

try:
    # Python2
    import HTMLParser as hp
except ImportError:
    # Python3
    import html.parser as hp

class MyHTMLParser(hp.HTMLParser):
    def __init__(self):
        hp.HTMLParser.__init__(self)
        self.tag_list = list()

    def handle_starttag(self, tag, attrs):
        self.tag_list.append("<%s>" % tag)

    def handle_endtag(self, tag):
        self.tag_list.append("</%s>" % tag)


parser = MyHTMLParser()
# pick an HTML file you have in the working directory
# or give the full file path
filename = "test1.htm"
# the with statement closes the file once it has been read
with open(filename) as fin:
    parser.feed(fin.read())
parser.close()
for tag in parser.tag_list:
    print(tag)

"""typical result -->
<html>
<head>
<title>
</title>
</head>
<body>
<table>
<tr>
<td>
<img>
<a>
</a>
</td>
</tr>
</table>
</body>
</html>
"""

Note that Python is a modular language and comes with many thoroughly tested and optimized modules. To code in Python means using those modules to your advantage. Python syntax may be easy, but remembering all those modules may use all the power of your brain!
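
One detail that matters for the sample file in the question: an XHTML-style empty tag such as `<br />` is passed to `handle_startendtag`, which by default calls `handle_starttag` and then `handle_endtag`, so the list above would record it as `<br>` followed by `</br>`. The sketch below (Python 3 only) overrides that method to record such tags once. The class name `TagCollector` and the `"<%s />"` output format are illustrative choices, not part of the original code.

```python
from html.parser import HTMLParser  # Python 3 module name


class TagCollector(HTMLParser):
    """Hypothetical helper: records start, end and self-closing tags in order."""

    def __init__(self):
        super().__init__()
        self.tag_list = []

    def handle_starttag(self, tag, attrs):
        self.tag_list.append("<%s>" % tag)

    def handle_endtag(self, tag):
        self.tag_list.append("</%s>" % tag)

    def handle_startendtag(self, tag, attrs):
        # Called for XHTML-style empty tags such as <br />;
        # record them once instead of as a start/end pair.
        self.tag_list.append("<%s />" % tag)


# The last few lines of the data file from the question
sample = "<b>bolded</b>\n<P>paragraph</P>\n<P>\n<br />\n"

collector = TagCollector()
collector.feed(sample)
collector.close()
print(collector.tag_list)
# ['<b>', '</b>', '<p>', '</p>', '<p>', '<br />']
```

The parser lowercases tag names, which is why `<P>` shows up as `<p>`.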



Gribouillis 1,391 Programming Explorer Team Colleague · Answer 1 · 2009-11-24T22:55:20+00:00

You can use the modules HTMLParser and collections.deque to implement the queue:

# Python 2 module name; in Python 3 use: from html.parser import HTMLParser
from HTMLParser import HTMLParser
# deque is a double-ended queue; it can be used as a FIFO queue or a LIFO stack
from collections import deque

class MyHTMLParser(HTMLParser):

    def __init__(self):
        HTMLParser.__init__(self)
        self.tag_deque = deque()

    def handle_starttag(self, tag, attrs):
        self.tag_deque.append("<{t}>".format(t=tag))

    def handle_endtag(self, tag):
        self.tag_deque.append("</{t}>".format(t=tag))


def main():
    parser = MyHTMLParser()
    filename = "mydatafile.html"
    # the with statement closes the file once it has been read
    with open(filename) as fin:
        parser.feed(fin.read())
    parser.close()
    print(parser.tag_deque)

if __name__ == "__main__":
    main()
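
Since the thread title asks about both stacks and queues, here is a small sketch of how a deque serves as either one. The tag strings are made-up sample data; `append`, `popleft` and `pop` are standard `collections.deque` methods.

```python
from collections import deque

tags = deque(["<html>", "<body>", "</body>", "</html>"])

# Queue (FIFO): append() adds on the right, popleft() removes from the left.
first_in = tags.popleft()

# Stack (LIFO): append() adds on the right, pop() removes from the right.
last_in = tags.pop()

print(first_in)    # <html>
print(last_in)     # </html>
print(list(tags))  # ['<body>', '</body>']
```

Using `popleft()` consistently gives queue behaviour; using `pop()` consistently gives stack behaviour.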

MRWIGGLES 0 Newbie Poster · Answer 2 · 2009-11-25T03:27:50+00:00

Is there a simpler way to do this without having to import things? I'm a beginner at Python.

pythopian 10 Junior Poster in Training · Answer 3 · 2009-11-25T06:17:21+00:00

Is there a simpler way to do this without having to import things? I'm a beginner at Python.

You will always have to import stuff for anything but the most trivial tasks.
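
That said, for a file as simple as the one in the question, a scan with plain string methods can get by without imports. The following is only a rough sketch: `extract_tags` is a hypothetical helper, and it assumes every tag is closed with `>` and that no `>` appears inside attribute values or comments. It is far less robust than HTMLParser on real-world HTML.

```python
def extract_tags(text):
    """Return the tags found in text, in order, using only str methods."""
    tags = []
    pos = 0
    while True:
        start = text.find("<", pos)
        if start == -1:
            break
        end = text.find(">", start + 1)
        if end == -1:
            break
        inner = text[start + 1:end].strip()
        pos = end + 1
        if not inner or inner[0] in "!?":
            continue  # skip empty brackets, comments, doctype, <? ... ?>
        closing = inner.startswith("/")
        words = inner.strip("/").split()
        if not words:
            continue
        name = words[0].lower()  # drop attributes, normalise case
        tags.append(("</%s>" if closing else "<%s>") % name)
    return tags


sample = "<title> data file </title>\n<P>paragraph</P>\n<br />"
print(extract_tags(sample))
# ['<title>', '</title>', '<p>', '</p>', '<br>']
```

The returned list can be consumed front to back like a queue, in line with the earlier suggestion that a list will do.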
