Your task is to write a program that checks HTML files (web pages) to determine whether the embedded XHTML tags are balanced. Balanced tags are necessary for the file to be a valid XHTML file, as explained above.
The first phase takes care of reading the file and finding all of the tags from the file. The first phase will print out the tags as it finds them and also save them into a queue of tags. If the input file ends in the middle of a tag, the program will report the error and end.
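The first phase could be sketched as one scan over the whole file contents rather than one line at a time, so a tag that happens to span two lines is still found, and "the file ends in the middle of a tag" becomes simply "a `<` was seen with no `>` after it". This is a minimal sketch under that assumption, not the assignment's required design; the function name `find_tags` and its return value are hypothetical.

```python
def find_tags(text):
    """Hypothetical phase-1 helper.

    Returns (tags, complete): every '<...>' substring of text in order,
    and complete=False if the text ends while a tag is still open.
    """
    tags = []
    start = None                  # index of the '<' of the tag being read
    for i, ch in enumerate(text):
        if ch == '<':
            start = i             # a second '<' before any '>' restarts the tag
        elif ch == '>' and start is not None:
            tags.append(text[start:i + 1])
            start = None
    return tags, start is None


tags, complete = find_tags('<html>\n<br />\n</html>\n')
print(tags)       # ['<html>', '<br />', '</html>']
print(complete)   # True
```

In phase 1 you would pass it the result of reading the whole file, print each tag, add it to the queue, and report the error and stop when `complete` is False.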
The second phase of the program takes the queue of tags generated in phase one and analyzes the sequence of tags to make sure that they are properly balanced. This phase will make use of the queue and a stack, as described on the previous slide. As the tags are matched, you should print the matching tags. If the tag is self-closing, you should print it and also that it is self-closing.
This is what I have so far. My phase 2 is not running; can someone please help me?

def printGreeting():
    print 'Welcome bla bla bla bla\n\n'

def phase1(file, queue):
    line_num = 0
    start = 0
    end = 0
    ok_tag = False
    for line in file:
        line_num += 1
        if line.count('>') != line.count('<'):
            print 'You have an error in line ', line_num
            return False
        for i in range (len(line)):
            if line[i] == '<':
                start = i
                ok_tag = True
            if line[i] == '>':
                if ok_tag:
                    end = i
                    tag = line[start:end + 1]
                    queue.append(tag)
                    print tag
                    ok_tag = False
    return True


def openTag(tag):
    if tag[1] == '/':
        return False
    return True

def isMatching(tag1, tag2):
    if openTag(tag1):
        if openTag(tag2) == False:
def selfClosing(tag):
    if tag[-2] == '/':
        return True
    return False


def main():
    printGreeting()

    filename = raw_input('Please enter the file name: ')
    file = open(filename, 'r')

    queue = []

    #Phase 1
    if phase1(file, queue) == False:
        print 'File ends in the middle of a tag\n\n'
        file.close()
        sys.exit()
    else:
        print 'Phase 1: End of file was reached for', filename, 'with no errors\n'

    #Phase 2
    print '\PHASE 2'
    stack = []
    for tag in queue:
        if selfClosing(tag):
            print tag, 'is self-closing'
        else:
            if openTag(tag):
                push(stack, tag)
            else:
                tag1 = pop(stack)
                if tag1 == None:
                    print tag, 'has no match'
                    sys.exit()
                if isMatching(tag, tag1): file.close()
        sys.exit()
    else:
        print 'Phase 1: End of file was reached for', filename, 'with no errors\n'

    #Phase 2
    print '\PHASE 2'
    stack = []
    for tag in queue:
        if selfClosing(tag):
            print tag, 'is self-closing'
        else:
            if openTag(tag):
                push(stack, tag)
            else:
                tag1 = pop(stack)
                if tag1 == None:
                    print tag, 'has no match'
                    sys.exit()
                if isMatching(tag, tag1):
                    print tag1, 'is matching', tag
                else:
                    print tag1, 'does not match', tag
                    sys.exit()
    print 'Phase 2: The tags match in this document.\n'


main()

This is supposed to be the output:

<html>
<head>
<title>
</title>
</head>
<body>
<center>
<h1>
</h1>
</center>
<b>
</b>
<P>
</P>
<P>
<br />
<br />
<br />
</P>
<hr />
<P>
<foo>
</foo>
</P>
<P>
<br />
<tag1>
<br />
<tag2>
<br />
<tag3>
<br />
</tag3>
<br />
</tag2>
<br />
</tag1>
<br />
</P>
<P>
</P>
</body>
</html>

Phase 1: End of file was reached for xhtml.dat with no errors

<title> matches </title>
<head> matches </head>
<h1> matches </h1>
<center> matches </center>
<b> matches </b>
<P> matches </P>
<br /> is self-closing
<br /> is self-closing
<br /> is self-closing
<P> matches </P>
<hr /> is self-closing
<foo> matches </foo>
<P> matches </P>
<br /> is self-closing
<br /> is self-closing
<br /> is self-closing
<br /> is self-closing
<tag3> matches </tag3>
<br /> is self-closing
<tag2> matches </tag2>
<br /> is self-closing
<tag1> matches </tag1>
<br /> is self-closing
<P> matches </P>
<P> matches </P>
<body> matches </body>
<html> matches </html>

Phase 2: The tags match in this document.

But I am not getting anything after phase 1.

AFAIK Python stack/queue operations are done as:

l = list()  # [] also works
l.append(8) # Equiv. to push
value = l.pop()
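The original phase-2 code calls `push(stack, tag)` and `pop(stack)` and compares the result with `None`, but never defines those functions. Here is a minimal sketch of what they might look like as plain functions over a list; the names `push`/`pop` mirror those calls, while `enqueue`/`dequeue` are hypothetical. The queue side uses `collections.deque`, because `list.pop(0)` has to shift every remaining element.

```python
from collections import deque


def push(stack, item):
    """Put item on top of the stack (the end of the list)."""
    stack.append(item)


def pop(stack):
    """Remove and return the top item, or None if the stack is empty."""
    if not stack:
        return None
    return stack.pop()


def enqueue(queue, item):
    """Add item at the back of the queue."""
    queue.append(item)


def dequeue(queue):
    """Remove and return the front item (raises IndexError if empty)."""
    return queue.popleft()


stack = []
push(stack, '<html>')
push(stack, '<head>')
print(pop(stack))        # <head>  (last in, first out)

queue = deque()
enqueue(queue, '<html>')
enqueue(queue, '<head>')
print(dequeue(queue))    # <html>  (first in, first out)
```

With these, the original `if tag1 == None:` check works as intended, although `if tag1 is None:` is the idiomatic spelling.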

Also your isMatching() function matches <head> with </body>.
What you need is

def isMatching(tag1, tag2):
    return openTag(tag1) and not openTag(tag2) and tag1 == tag2.replace('/','')

It's still giving me the same thing; it still doesn't output anything after phase 1, that is, phase 2 is not running.

What is the sample data & exact output?

That's the sample output I posted above.

I was supposed to have a stack.py and a queue.py and import them in my main(). Maybe that's where my problem is. I can post them if you want to check.

No, I mean the sample data (input) and the exact output your script is producing.
P.S. Check your code, I think you copied it wrong. It has an indentation error at line 71+.

EDIT:
Make the following change to your checking function:

def isMatching(tag1, tag2):    
    return openTag(tag1) != openTag(tag2) and tag1.replace('/','') == tag2.replace('/','')

Breakdown:

openTag(tag1) != openTag(tag2) # Exactly one is an opening tag, i.e. they are not both opening or both closing
tag1.replace('/','') == tag2.replace('/','') # With the forward slash removed, both are the same tag
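A quick way to check that the revised function works regardless of argument order (the phase-2 loop passes the closing tag first) is to run it on a few pairs. The `tag_name`/`isMatchingByName` variant at the end is a hypothetical refinement, not something from this thread: it compares only the element name, so a tag with attributes such as `<p class="x">` still matches `</p>`, which the `replace('/','')` comparison does not do.

```python
def openTag(tag):
    # '</name>' is a closing tag; everything else is treated as opening
    return tag[1] != '/'


def isMatching(tag1, tag2):
    # exactly one of the two is an opening tag, and they are equal once '/' is removed
    return openTag(tag1) != openTag(tag2) and tag1.replace('/', '') == tag2.replace('/', '')


def tag_name(tag):
    # hypothetical helper: '<p class="x">' -> 'p', '</p>' -> 'p', '<br />' -> 'br'
    parts = tag.strip('</>').split()
    return parts[0] if parts else ''


def isMatchingByName(tag1, tag2):
    # hypothetical variant: compare element names only, ignoring attributes
    return openTag(tag1) != openTag(tag2) and tag_name(tag1) == tag_name(tag2)


print(isMatching('</head>', '<head>'))               # True
print(isMatching('<head>', '</body>'))               # False
print(isMatching('<p class="x">', '</p>'))           # False: attributes are compared too
print(isMatchingByName('<p class="x">', '</p>'))     # True
```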

Sample Data:

<html>
<head>
<title>A</title>
</head>
<body>
<span>ANC</span>
	<p>scs</p>
	<br/>
</body>
</html>

Sample Output:

Welcome bla bla bla bla


<html>
<head>
<title>
</title>
</head>
<body>
<span>
</span>
<p>
</p>
<br/>
</body>
</html>
Phase 1: End of file was reached for in.txt with no errors

\PHASE 2
<title> is matching </title>
<head> is matching </head>
<span> is matching </span>
<p> is matching </p>
<br/> is self-closing
<body> is matching </body>
<html> is matching </html>
Phase 2: The tags match in this document.

As a side note, avoid using `file` as your variable name. It's a Python built-in.

I think the problem may be the stack.py and the queue.py, because we had to make a stack.py and a queue.py and import them inside main. I did what you said, but it's not giving me anything different.

I got the same output as nbaztec with his input file (except for the filename prompt and the filename in the report). This version is missing the handling of a closing tag without an opening tag.

class Stack(list):
    def push(self,x):
        self.append(x)

    def __iter__(self):
        while self:
            yield self.pop()

    @property
    def top(self):
        if self:
            return self[-1]

class Queue(Stack):
    def push(self, x):
        self.insert(0, x)

def printGreeting():
    print 'Welcome bla bla bla bla\n\n'

def phase1(filein, queue):
    line_num = 0
    start = 0
    end = 0
    ok_tag = False
    with filein as filein:
        for line in filein:
            line_num += 1
            if line.count('>') != line.count('<'):
                print 'You have an error in line ', line_num
                return False
            for i in range (len(line)):
                if line[i] == '<':
                    start = i
                    ok_tag = True
                if line[i] == '>':
                    if ok_tag:
                        end = i
                        tag = line[start:end + 1]
                        queue.push(tag)
                        print tag
                        ok_tag = False
        return True


def openTag(tag):
    if tag[1] == '/':
        return False
    return True

def isMatching(tag1, tag2):    
    return openTag(tag1) != openTag(tag2) and tag1.replace('/','') == tag2.replace('/','')

def selfClosing(tag):
    if tag[-2] == '/':
        return True
    return False


def main():
    printGreeting()

    filename = raw_input('Please enter the file name: ')
    filein = open(filename, 'r')

    queue = Queue()

    #Phase 1
    if phase1(filein, queue) == False:
        print 'File ends in the middle of a tag\n\n'        
    else:
        print 'Phase 1: End of file was reached for', filename, 'with no errors\n'
        # phase1 will have closed the file
        assert filein.closed
        #Phase 2
        print '\nPHASE 2'
        stack = Stack()
        for tag in queue:
            if selfClosing(tag):
                print tag,'is self-closing'
            else:
                if openTag(tag):
                    stack.push(tag)
                else:
                    tag1 = stack.pop()
                    if isMatching(tag, tag1):
                        print tag1, 'is matching', tag
                    else:
                        print tag1, 'does not match', tag
                        break
                        
        else:
            if not stack:
                print 'Phase 2: End of file was reached for', filename, 'with no errors\n'
            else:
                print 'unmatched', stack, queue


main()
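As noted above, this version does not handle a closing tag that has no opening tag: `stack.pop()` on an empty `Stack` raises `IndexError`. The sketch below is one way to structure phase 2 so it covers that case and also reports opening tags still left on the stack at the end. It works on a plain list of tags instead of the `Queue`/`Stack` classes, and the function name `phase2` and its return value are assumptions, not the assignment's required interface.

```python
def phase2(tags):
    """Hypothetical phase 2: return (messages, balanced) for tags in document order."""
    stack = []
    messages = []
    for tag in tags:
        if tag.endswith('/>'):                  # e.g. '<br />' or '<br/>'
            messages.append('%s is self-closing' % tag)
        elif not tag.startswith('</'):          # opening tag: remember it
            stack.append(tag)
        elif not stack:                         # closing tag with nothing open
            messages.append('%s has no matching opening tag' % tag)
            return messages, False
        else:
            opening = stack.pop()
            if opening.replace('/', '') == tag.replace('/', ''):
                messages.append('%s matches %s' % (opening, tag))
            else:
                messages.append('%s does not match %s' % (opening, tag))
                return messages, False
    if stack:                                   # opening tags that were never closed
        messages.append('unmatched: %s' % ', '.join(stack))
        return messages, False
    messages.append('Phase 2: The tags match in this document.')
    return messages, True


messages, balanced = phase2(['<p>', '<br />', '</p>'])
for message in messages:
    print(message)
# <br /> is self-closing
# <p> matches </p>
# Phase 2: The tags match in this document.
```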

Yeah, I overrode the unnecessary code. The OP obviously made a mistake while copying his code: it shows #Phase 2 twice. I just removed it, because it wasn't even valid Python:

for bar in baz:
   ...
   ...
else:
   ...
   ...

This is valid Python, just normal for with else.

Didn't know that. What would it be used for semantically?
EDIT: Googled. Found it. More syntactic sugar. Nice.

`else` is executed when the loop finishes without a `break`, in both `for` and `while` loops.

>>> a = 1
>>> while a > 0:
	a = int(raw_input('Give number: '))
	if a>=100:
		print('Quitting with number >= 100: %i' % a)
		break
else:
	print('Negative number %i entered' % a)

	
Give number: 4
Give number: 2
Give number: -2
Negative number -2 entered
>>> a = 1
>>> while a > 0:
	a = int(raw_input('Give number: '))
	if a>=100:
		print('Quitting with number >= 100: %i' % a)
		break
else:
	print('Negative number %i entered' % a)

	
Give number: 234
Quitting with number >= 100: 234
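The same rule applies to a `for` loop, which is how the phase-2 loop in pyTony's version uses it: the `else` block runs only if the loop finishes without `break`, and that includes the case where there is nothing to iterate over. A small sketch (the function name `first_mismatch` is made up for illustration):

```python
def first_mismatch(pairs):
    """Return the first (opening, closing) pair that does not match, or None."""
    for opening, closing in pairs:
        if closing != '</' + opening[1:]:
            result = (opening, closing)
            break                 # skips the else block below
    else:
        result = None             # loop ran to completion: every pair matched
    return result


print(first_mismatch([('<p>', '</p>'), ('<b>', '</i>')]))   # ('<b>', '</i>')
print(first_mismatch([]))                                    # None
```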

They want the program to use command-line arguments. At the command line the user must enter, in this order:
the name of the executable file,
the name of the HTML file to be checked.
I'm a little bit confused now. I have attached the web page (URL), so maybe someone can explain it to me better.
http://www.csee.umbc.edu/courses/201/fall11/assignments/proj_adts.shtml#(1)

Check optparse or, if that's a bit of overkill, sys.argv:

def main(filename):
    printGreeting()

    filein = open(filename, 'r')
## snipeti, snip

if __name__ == '__main__':
    import sys
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print('Usage: %s <name of html file>' % sys.argv[0])
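If you go the module route, note that `argparse` was added to the standard library in Python 2.7 as the successor to `optparse`. A minimal sketch of reading the single required argument with it; the description and help strings, and the script name in the comment, are placeholders:

```python
import argparse


def parse_args(argv=None):
    """Read the HTML file name from the command line (argv=None means sys.argv[1:])."""
    parser = argparse.ArgumentParser(
        description='Check that the XHTML tags in a file are balanced.')
    parser.add_argument('filename', help='name of the HTML file to check')
    return parser.parse_args(argv)


# Simulates running:  python check_tags.py xhtml.dat
args = parse_args(['xhtml.dat'])
print(args.filename)   # xhtml.dat
```

In the real script you would call `parse_args()` with no argument and pass `args.filename` to `main`; if the file name is missing, argparse prints a usage message and exits with status 2, much like the hand-written usage check above.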

Thanks a lot, guys, I pray it works. Going to run it now.

But pyTony, they want us to have a stack.py and a queue.py and then import them. So are the Queue and Stack classes what I'm going to import? Because they're similar to what I have.

We do not want to give a ready-made solution, so you should adapt it to your own form by yourself. Otherwise it would be cheating, wouldn't it?

No, that's not what I'm saying. I'm talking about understanding what the question wants, not solving it.

The question seems to address 5 things:

1) Understanding basic abstract data structures of queue & stack.
2) Uses of stack & queue.
3) Simple HTML parsing.
4) Use of Python Modules.
5) Python command-line args.

So they want you to create 2 classes Stack & Queue (in a separate module/file), having the basic operations of append/push/pop/peek.
Then you need to import these classes with an import statement into your __main__ module and use these data structures to solve the problem.
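As a rough sketch of what those two modules might contain, assuming class-based modules: the method names `push`, `pop`, `peek`, `isEmpty`, `enqueue` and `dequeue` are illustrative, so use whatever names your assignment specifies. Both classes are shown in one block here; in the project each would go in its own file.

```python
from collections import deque   # needed by Queue (would go at the top of queue.py)


# --- stack.py (hypothetical contents) ---
class Stack(object):
    """Last-in, first-out collection backed by a list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()        # raises IndexError when empty

    def peek(self):
        return self._items[-1]          # look at the top without removing it

    def isEmpty(self):
        return len(self._items) == 0

    def __len__(self):
        return len(self._items)


# --- queue.py (hypothetical contents) ---
class Queue(object):
    """First-in, first-out collection backed by collections.deque."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, item):
        self._items.append(item)        # join at the back

    def dequeue(self):
        return self._items.popleft()    # leave from the front; IndexError when empty

    def isEmpty(self):
        return len(self._items) == 0

    def __len__(self):
        return len(self._items)
```

In `main` you would then write `from stack import Stack` and `from queue import Queue`. Be aware that under Python 3 a `queue.py` sitting next to your script is imported instead of the standard-library `queue` module.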


My 2 cents:

Point #1 would be better learnt in a language that has arrays (a composite data structure).

Uses of stack can be better learnt using Reverse-Polish notation for solving arithmetic expressions.

+1 for not using Regular Expressions for HTML parsing.

@nbaztec: I quite agree with you, but maybe they do not expect you to use classes yet, just a plain module with functions. I also agree that this problem domain is not the place to do it yourself; you should use off-the-shelf modules made by others, like BeautifulSoup or lxml.
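For comparison, here is a sketch of the off-the-shelf route using the standard library's HTML parser rather than BeautifulSoup or lxml (a swapped-in choice, since it needs no install). The parser does the tokenizing and calls `handle_starttag`, `handle_endtag` and `handle_startendtag`; the class name `TagBalanceChecker`, the function `check_balance` and the error messages are made up for this example. The parser lowercases tag names, so `<P>` and `</p>` count as a match.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2


class TagBalanceChecker(HTMLParser):
    """Collects tag-balance problems while the standard parser tokenizes."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.stack = []      # names of currently open elements
        self.errors = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # <br/> and <br /> arrive here; they need no closing tag
        pass

    def handle_endtag(self, tag):
        if not self.stack:
            self.errors.append('</%s> has no opening tag' % tag)
        else:
            opening = self.stack.pop()
            if opening != tag:
                self.errors.append('<%s> does not match </%s>' % (opening, tag))


def check_balance(text):
    """Return a list of problems; an empty list means the tags balance."""
    checker = TagBalanceChecker()
    checker.feed(text)
    checker.close()
    return checker.errors + ['<%s> is never closed' % t for t in checker.stack]


print(check_balance('<html><body><p>scs</p><br/></body></html>'))   # []
```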

For the heck of it, here is a 30-minute version of an RPN calculator with the basic operations:

import operator

class Stack(list):
    def __init__(self, *args):
        list.__init__(self, args[::-1])
        
    def push(self,x):
        self.append(x)

    def __iter__(self):
        while self:
            yield self.pop()

    @property
    def top(self):
        if self:
            return self[-1]
        
    def __str__(self):
        return str(list(self[::-1]))

    def __repr__(self):
        return 'Stack(%s)' % self

if __name__ == '__main__':
    op = dict((sym, oper) for sym, oper in zip('+-*/', (operator.add, operator.sub,
                                                        operator.mul, operator.truediv)))
    stack = Stack()

    while True:
        print stack
        data = raw_input()
        try:
            data = float(data)
        except ValueError:
            try:
                # iteration reverses stack so we must reverse it for backup
                # must make copy for backup as it is mutable
                restore = Stack(*stack[::-1])
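                func = op[data]           # KeyError here for unknown input, before any pop
                b = stack.pop()           # top of stack is the right-hand operand
                a = stack.pop()
                data = func(a, b)         # so "10 2 /" gives 5.0, not 0.2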
            except KeyError as e:
                if data == 'quit':
                    break
                print 'No operator',e
                stack = restore
                continue
            except IndexError:
                print 'Not enough numbers in stack'
                stack = restore
                continue
            except ZeroDivisionError:
                print 'Can not divide with zero!'
                stack = restore
                continue
            
        stack.push(data)
        
    print 'Bye, bye'

First of all, I just want to say thank you, guys. You have given me the best explanation of this problem and I really appreciate it. My prof is not that good; I am a Python beginner and he just comes and assigns labs and projects, and I have put a lot of hours into understanding it on my own. You guys have been amazing and patient with me, and I want to say thank you.

I just have to work on the stack and queue modules to import; I believe I can handle that. Thank you all once again.
