How would you count the opening HTML tags, and then the closing tags, in a line?
E.g.:

import re

opentags = closingtags = 0
for line in something:
    # re.finditer yields every match in the line; re.match returns at most one
    for match in re.finditer("opening_regex_here", line):
        # will match <H1> or <html> or whatever
        opentags += 1
    for match in re.finditer("closing_regex_here", line):
        # will match </H1> or </html> or </whatever>
        closingtags += 1

You could do it as simply as this:

html_str = """\
<html>
<head>
<title>Bring up an image with sound</title>
</head>
<body>
<IMG SRC="train.bmp" WIDTH=320 HEIGHT=210 BORDER=5>
<bgsound src="train.wav" loop=2>
</body>
</html>
"""

tag_close = html_str.count('</')
# every closing tag also begins with '<', so subtract those to get the opening tags
tag_open = html_str.count('<') - tag_close

print( "open = %d  close = %d" % (tag_open, tag_close) )
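
One caveat with plain string counting: any '<' inside a comment or text, and things like a doctype, get counted as tags too. If that matters, here is a rough sketch of a different technique, using the standard library's html.parser instead of str.count. The names TagCounter and count_tags are made up for this example, and counting self-closing tags such as <br/> as opening tags only is my own assumption about what you would want.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start and end tags as the parser reports them (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.open_count = 0
        self.close_count = 0

    def handle_starttag(self, tag, attrs):
        self.open_count += 1

    def handle_endtag(self, tag):
        self.close_count += 1

    def handle_startendtag(self, tag, attrs):
        # Assumption: a self-closing tag like <br/> counts as an opening tag only.
        self.open_count += 1


def count_tags(html_text):
    """Return (opening, closing) tag counts for a string of HTML."""
    counter = TagCounter()
    counter.feed(html_text)
    counter.close()
    return counter.open_count, counter.close_count


sample = "<html><!-- a < b --><body><p>hi</p><br/></body></html>"
tag_open, tag_close = count_tags(sample)
print("open = %d  close = %d" % (tag_open, tag_close))
```

The comment in the sample is skipped by the parser, whereas str.count('<') would count both the '<!--' and the '<' inside it.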

Thanks, I ended up with:

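import re

# raw string, so the backslashes reach the regex engine unchanged
ultimate_regexp = r"(?i)<\/?\w+((\s+\w+(\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>"
etcount = otcount = 0
# line is one line of HTML text
for match in re.finditer(ultimate_regexp, line):
    if match.group().startswith("</"):
        etcount += 1  # closing tag
    else:
        otcount += 1  # opening (or self-closing) tag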

I found the regex on a website. Thanks,
matio
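
For anyone who wants to try the regex approach above on its own, here is a minimal self-contained sketch that wraps it in a function. The function name count_tags_regex and the sample lines are made up for illustration; the pattern is the one from the post, written as a raw string.

```python
import re

# Pattern from the post above, as a raw string.
TAG_RE = re.compile(
    r"(?i)<\/?\w+((\s+\w+(\s*=\s*(?:\".*?\"|'.*?'|[^'\">\s]+))?)+\s*|\s*)\/?>"
)


def count_tags_regex(line):
    """Return (opening, closing) tag counts for one line of HTML text."""
    opening = closing = 0
    for match in TAG_RE.finditer(line):
        if match.group().startswith("</"):
            closing += 1
        else:
            # Self-closing tags such as <br /> land here as well.
            opening += 1
    return opening, closing


# Hypothetical sample input, one tuple printed per line.
for line in ['<p class="x">text</p><br />', "no tags here"]:
    print(count_tags_regex(line))
```

Compiling the pattern once avoids re-parsing it for every line when looping over a whole file.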
