
Regular expressions

Hello,

I have a group of regular expressions

I need to get the word after Element, i.e. Vitals, Network, etc.

How would I do it?

adam291086
Junior Poster in Training
61 posts since Nov 2008
Reputation Points: 10
Solved Threads: 0
 

Here is a way

import re

data = """
<Element Generation at d66238>
<Element Vitals at d662b0>
<Element Network at d66670>
<Element Hardware at d66eb8>
<Element Memory at d6ac88>
<Element Swap at d6e0a8>
<Element Swapdevices at d6e238>
<Element FileSystem at d6e5d0>
<Element Vitals at d662b0>
"""
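# Capture the run of non-whitespace characters that follows "<Element ".
expr = re.compile(r"<Element ([^\s]+)")

if __name__ == "__main__":
  for match in expr.finditer(data):
    print(match.group(1))  # group(1) is the captured element name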
Gribouillis
Posting Maven
Moderator
2,786 posts since Jul 2008
Reputation Points: 1,044
Solved Threads: 691
 

Can you explain that to me? I want to understand it.

adam291086
 

Well, in the regular expression language, the string "<Element ([^\s]+)" means the character < followed by the character E, and so on up to t, followed by a single space, followed by a group (...), which can be referred to later as group(1). The \s means a whitespace character, [^\s] means any non-whitespace character, and the + means one or more such non-whitespace characters. Finally, the r"..." syntax makes a raw string, in which Python does not interpret the backslashes as escape sequences, so \s reaches the regular expression engine unchanged.

Now the statement expr = re.compile(r"...") creates a regular expression object from my string, which has methods to search for occurrences of the expression in a string. One of these methods is expr.finditer, which iterates over all the matches found in the string data. For each such match, a match object match is created, which has methods to access the occurrence found in the string. match.group(1) retrieves the part of the string that corresponds to the group ([^\s]+).
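
To make the pieces above concrete, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). The sample strings and variable names are made up for illustration, and \S+ is shown only as an equivalent spelling of [^\s]+.

```python
import re

# Hypothetical sample line in the same shape as the data above.
sample = "<Element Vitals at d662b0>"

expr = re.compile(r"<Element ([^\s]+)")
match = expr.search(sample)

whole = match.group(0)  # the entire text matched by the pattern
name = match.group(1)   # only the part captured by ([^\s]+)
print(whole)
print(name)

# \S is shorthand for [^\s], so this pattern captures the same text.
same = re.search(r"<Element (\S+)", sample).group(1)

# With exactly one group, findall returns the captured strings directly.
two_lines = sample + "\n<Element Network at d66670>"
names = re.findall(r"<Element (\S+)", two_lines)

# In a raw string the backslash stays a literal character.
plain_len = len("\n")   # one character: a newline
raw_len = len(r"\n")    # two characters: backslash and n
```

Using group(0) next to group(1) shows the difference between the whole match and the captured group, which is the part the original question asks for.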

Gribouillis
Solved Threads: 691
 

Here's some good reading on regular expressions.

jlm699
Veteran Poster
1,112 posts since Jul 2008
Reputation Points: 355
Solved Threads: 292
 

Thanks guys, really helpful.

The method won't work on the following script, and I get this error:

Traceback (most recent call last):
  File "/Users/adamplowman/Desktop/getting_xml_info copy.py", line 34, in <module>
    for match in expr.finditer(txt):
TypeError: expected string or buffer

#!/usr/bin/env python

from xml.etree import ElementTree as ET
import os
import urllib
import re
info={}
test={}
def find_text(element):
    if element.text is None:
        for subelement in element:
            for txt in find_text(subelement):
                yield txt
                
    else:
        info[element] = element.text
        
        
data = " "
      

feed = urllib.urlopen("http://server-up.theatticnetwork.net/demo/")
try:
    tree = ET.parse(feed)
		
except Exception, inst:
    print "Unexpected error opening %s: %s" % (tree, inst)
    
root= tree.getroot()
text = root.getchildren()
for txt in text:
    expr = re.compile(r"<Element ([^\s]+)")
    if __name__ == "__main__":
        for match in expr.finditer(txt):
            print match.group(1)


I am not sure why, though.

adam291086
 

It's because the items returned by root.getchildren() are not strings but Element objects, and expr.finditer expects a string. You could replace the end of your program with

root= tree.getroot()
text = root.getchildren()
expr = re.compile(r"<Element ([^\s]+)")
for element in text:
    txt = str(element)
    for match in expr.finditer(txt):
        print(match.group(1))
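
As a rough illustration of the point above, the following Python 3 sketch reproduces the TypeError and then the str() workaround. The inline XML is a hypothetical stand-in for whatever the demo URL returns. Note that the exact text produced by str(element) depends on the Python version: recent versions put quotes around the tag name, so the captured group includes them.

```python
import re
from xml.etree import ElementTree as ET

# Hypothetical stand-in for the XML served by the demo URL.
root = ET.fromstring("<root><Vitals/><Network/></root>")
expr = re.compile(r"<Element ([^\s]+)")

first = list(root)[0]

# Passing the Element itself fails, as in the traceback.
try:
    expr.search(first)
    raised = False
except TypeError:
    raised = True

# str() gives the repr-style text, which the pattern can search.
text = str(first)
captured = expr.search(text).group(1)
print(text)
print(captured)

# Strip the quotes that newer Python versions add around the tag name.
cleaned = captured.strip("'")
```

Because the result depends on how Element objects happen to be printed, matching on str(element) is fragile.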

However, this is not very useful, because the name is readily available as the tag attribute of the Element object, so you could simply write

root= tree.getroot()
text = root.getchildren()
for element in text:
    print(element.tag)
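
For readers on Python 3, a minimal sketch of the same idea follows. The inline XML is a made-up stand-in for the real feed. It iterates over the root element directly, because getchildren() was removed in Python 3.9.

```python
from xml.etree import ElementTree as ET

# Hypothetical document with the same kind of child elements as the feed.
xml_text = "<root><Generation/><Vitals/><Network/></root>"
root = ET.fromstring(xml_text)

# Iterating over an Element yields its direct children in document order.
tags = [child.tag for child in root]
print(tags)
```

No regular expression is needed here, since each child already carries its name in the tag attribute.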
Gribouillis
 
