Hello,
I have a group of strings that I want to parse with a regular expression:
<Element Generation at d66238>
<Element Vitals at d662b0>
<Element Network at d66670>
<Element Hardware at d66eb8>
<Element Memory at d6ac88>
<Element Swap at d6e0a8>
<Element Swapdevices at d6e238>
<Element FileSystem at d6e5d0>
<Element Vitals at d662b0>
I need to get the word after Element, i.e. Vitals, Network, etc.
How would I do it?
Here is a way
import re

data = """
<Element Generation at d66238>
<Element Vitals at d662b0>
<Element Network at d66670>
<Element Hardware at d66eb8>
<Element Memory at d6ac88>
<Element Swap at d6e0a8>
<Element Swapdevices at d6e238>
<Element FileSystem at d6e5d0>
<Element Vitals at d662b0>
"""

expr = re.compile(r"<Element ([^\s]+)")

if __name__ == "__main__":
    for match in expr.finditer(data):
        print match.group(1)
Well, in the regular expression language, the string "<Element ([^\s]+)" means the character < followed by the character E ... followed by t, followed by a single space, followed by a group (...) which will be referred to later as group(1). The \s means a whitespace character, [^\s] means any non-whitespace character, and the + means one or more such non-whitespace characters. Finally, the r"..." syntax means don't interpret the backslashes in the string.
Now the statement expr = re.compile(r"...") creates a regular expression object from my string, which has methods to search for occurrences of the expression in a string. One of these methods is expr.finditer, which iterates over all the matches found in the string data. For each such match, a match object match is created, which contains methods to access the occurrence found in the string. match.group(1) retrieves the part of the string which corresponds to the group ([^\s]+).
Last edited by Gribouillis : Dec 1st, 2008 at 11:03 am.
Here's some good reading on regular expressions.
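To try out the pattern explained above, here is a minimal sketch. It assumes Python 3 (the thread's code is Python 2), uses only a few of the sample lines from the question, and the name same_expr is made up for the illustration. It shows that [^\s] and \S match the same characters and that findall returns the captured group directly.

```python
import re

# A few sample lines copied from the question.
data = """
<Element Generation at d66238>
<Element Vitals at d662b0>
<Element Network at d66670>
"""

# The raw string r"..." passes the backslash in \s through to the regex engine.
# ([^\s]+) captures one or more non-whitespace characters.
expr = re.compile(r"<Element ([^\s]+)")

# \S is shorthand for [^\s], so this pattern is equivalent (illustrative name).
same_expr = re.compile(r"<Element (\S+)")

# finditer yields one match object per occurrence; group(1) is the captured word.
names = [match.group(1) for match in expr.finditer(data)]
print(names)  # ['Generation', 'Vitals', 'Network']

# With exactly one group in the pattern, findall returns the captured strings.
same_names = same_expr.findall(data)
print(same_names == names)  # True
```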
Thanks guys, really helpful.
The method won't work on the following script, though, and I get this error:
Traceback (most recent call last):
File "/Users/adamplowman/Desktop/getting_xml_info copy.py", line 34, in <module>
for match in expr.finditer(txt):
TypeError: expected string or buffer
#!/usr/bin/env python
from xml.etree import ElementTree as ET
import os
import urllib
import re

info = {}
test = {}

def find_text(element):
    if element.text is None:
        for subelement in element:
            for txt in find_text(subelement):
                yield txt
    else:
        info[element] = element.text

data = " "
feed = urllib.urlopen("http://server-up.theatticnetwork.net/demo/")
try:
    tree = ET.parse(feed)
except Exception, inst:
    print "Unexpected error opening %s: %s" % (tree, inst)

root = tree.getroot()
text = root.getchildren()
for txt in text:
    expr = re.compile(r"<Element ([^\s]+)")
    if __name__ == "__main__":
        for match in expr.finditer(txt):
            print match.group(1)
I am not sure why, though.
It's because the items returned by root.getchildren() are not strings but Element objects. You could replace the end of your program with

root = tree.getroot()
text = root.getchildren()
expr = re.compile(r"<Element ([^\s]+)")
for element in text:
    txt = str(element)
    for match in expr.finditer(txt):
        print match.group(1)

However, this is not very useful, because the name can readily be obtained as an attribute of the Element object, so you could simply write

root = tree.getroot()
text = root.getchildren()
for element in text:
    print element.tag
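For anyone reading this on Python 3, here is a rough sketch of the same idea. The inline XML and its root element name are hypothetical stand-ins for whatever the URL in the script returns; only the child names are taken from the question. Note that getchildren() was removed in Python 3.9, so the sketch iterates over the root element directly.

```python
import re
from xml.etree import ElementTree as ET

# Hypothetical stand-in for the XML feed; the real document is not shown in the thread.
xml_text = """
<server>
  <Generation/>
  <Vitals><Hostname>demo</Hostname></Vitals>
  <Network/>
</server>
"""

root = ET.fromstring(xml_text)

# Iterating over an Element yields its direct children; .tag holds each name.
tags = [child.tag for child in root]
print(tags)  # ['Generation', 'Vitals', 'Network']

# Passing an Element instead of a string to finditer raises TypeError,
# which is the error reported in the traceback above.
expr = re.compile(r"<Element (\S+)")
try:
    list(expr.finditer(root[0]))
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```

Reading element.tag also avoids depending on the text produced by str(element), which is only a debugging representation; in Python 3 it puts quotes around the tag name, so the regular expression would capture those quotes too.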