Hello,

I have a group of strings like these:

<Element Generation at d66238>
<Element Vitals at d662b0>
<Element Network at d66670>
<Element Hardware at d66eb8>
<Element Memory at d6ac88>
<Element Swap at d6e0a8>
<Element Swapdevices at d6e238>
<Element FileSystem at d6e5d0>
<Element Vitals at d662b0>

I need to get the word after "Element", i.e. Vitals, Network, etc.

How would I do it?

Here is a way

import re

data = """
<Element Generation at d66238>
<Element Vitals at d662b0>
<Element Network at d66670>
<Element Hardware at d66eb8>
<Element Memory at d6ac88>
<Element Swap at d6e0a8>
<Element Swapdevices at d6e238>
<Element FileSystem at d6e5d0>
<Element Vitals at d662b0>
"""
expr = re.compile(r"<Element ([^\s]+)")

if __name__ == "__main__":
  for match in expr.finditer(data):
    print match.group(1)

Well, in the regular expression language, the string "<Element ([^\s]+)" means the character < followed by the character E ... followed by t, followed by a single space, followed by a group (...) which will be referred to later as group(1). The \s means a whitespace character, [^\s] means any non-whitespace character, and the + means one or more such non-whitespace characters. Finally, the r"..." syntax means don't interpret the backslashes in the string.

Now the statement expr = re.compile(r"...") creates a regular expression object from my string, which has methods to search for occurrences of the expression in a string. One of these methods is expr.finditer, which iterates over all the matches found in the string data. For each match, a match object is created, which has methods to access the occurrence found in the string. match.group(1) retrieves the part of the string that corresponds to the group ([^\s]+).
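To make the difference between the whole match and the group concrete, here is a small sketch. It assumes Python 3 style print calls (the thread itself uses Python 2) and uses \S, which is shorthand for [^\s]; the sample strings are copied from the question.

```python
import re

# Same pattern as in the post above; \S is shorthand for [^\s].
expr = re.compile(r"<Element (\S+)")

line = "<Element Swapdevices at d6e238>"  # one sample line from the question
match = expr.search(line)

whole = match.group(0)  # the entire matched text, including "<Element "
name = match.group(1)   # only the part captured by the parentheses
print(whole)
print(name)

# With exactly one group in the pattern, findall returns the group contents.
names = expr.findall("<Element Vitals at d662b0>\n<Element Network at d66670>")
print(names)
```

group(0) gives everything the pattern matched, while group(1) gives only the name, which is why the loop in the post prints match.group(1).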

Thanks guys really helpful.

The method won't work in the following script, and I get this error:

Traceback (most recent call last):
File "/Users/adamplowman/Desktop/getting_xml_info copy.py", line 34, in <module>
for match in expr.finditer(txt):
TypeError: expected string or buffer

#!/usr/bin/env python

from xml.etree import ElementTree as ET
import os
import urllib
import re
info={}
test={}
def find_text(element):
    if element.text is None:
        for subelement in element:
            for txt in find_text(subelement):
                yield txt
                
    else:
        info[element] = element.text
        
        
data = " "
      

feed = urllib.urlopen("http://server-up.theatticnetwork.net/demo/")
try:
    tree = ET.parse(feed)
		
except Exception, inst:
    print "Unexpected error opening %s: %s" % (tree, inst)
    
root= tree.getroot()
text = root.getchildren()
for txt in text:
    expr = re.compile(r"<Element ([^\s]+)")
    if __name__ == "__main__":
        for match in expr.finditer(txt):
            print match.group(1)

I am not sure why, though.

It's because the items returned by root.getchildren() are not strings but Element objects, and finditer only accepts a string. You could replace the end of your program with

root= tree.getroot()
text = root.getchildren()
expr = re.compile(r"<Element ([^\s]+)")
for element in text:
    txt = str(element)
    for match in expr.finditer(txt):
        print match.group(1)

However, this is not very useful, because the name is already available as the tag attribute of the Element object, so you could simply write

root= tree.getroot()
text = root.getchildren()
for element in text:
    print element.tag
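
For anyone trying this on a current interpreter, here is a self-contained sketch of the same idea. The XML below is a hypothetical stand-in for what the server in the thread returns, and the code assumes Python 3, where getchildren() was removed in 3.9 and you iterate over the element directly instead.

```python
import re
from xml.etree import ElementTree as ET

# Hypothetical stand-in for the XML returned by the server in the thread.
SAMPLE_XML = """<root>
  <Generation/>
  <Vitals/>
  <Network/>
</root>"""

root = ET.fromstring(SAMPLE_XML)

# Iterating over an Element yields its direct children.
tags = [child.tag for child in root]
print(tags)

# Passing an Element (not a string) to a regex method raises TypeError,
# which is the error shown in the traceback above.
expr = re.compile(r"<Element (\S+)")
try:
    expr.findall(root[0])
    raised = False
except TypeError:
    raised = True
print(raised)
```

Reading element.tag also avoids depending on the text produced by str(element), which is only a debugging representation; in Python 3 it puts quotes around the tag name, so the regular expression above would capture the quotes as well.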