Getting some particular elements from an XML file

Question

YasaminKh 0 Light Poster

14 Years Ago

Hi,

I'm new to programming and especially to Python. I have this xml file that has parts like this:

<VECTOR_AVERAGE name="Density Correlations" nvalues="15625">
<SCALAR_AVERAGE indexvalue="( 1,2,0 ) -- ( 3,4,5)">
<COUNT>204160</COUNT>
<MEAN method="jackknife">6.368e-05</MEAN>
<ERROR converged="yes" method="jackknife">2.89e-05</ERROR>
<VARIANCE method="simple">6.37e-05</VARIANCE>
<AUTOCORR method="jackknife">0.843</AUTOCORR>

I need to save MEAN and ERROR. I wrote the following code which works well for me.

import sys
from xml.dom.minidom import parse

# Load XML into tree structure                                                                                                                         
tree = parse(sys.stdin)

#Find all VECTOR_AVERAGE nodes                                                                                                                         
va_list = tree.getElementsByTagName('SIMULATION')[0].getElementsByTagName('AVERAGES')[0].getElementsByTagName('VECTOR_AVERAGE')

#Find the 'Density Correlations' node in the list of VECTOR_AVERAGEs                                                                                   
for va in va_list:
    if va.attributes.getNamedItem('name').nodeValue == 'Density Correlations':
        density_correlations = va
        break

#Get a list of all the SCALAR_AVERAGES                                                                                                                 
sa_list = density_correlations.getElementsByTagName('SCALAR_AVERAGE')

#Initialize lists for holding the data                                                                                                                 
indexvalue_list = []
mean_list = []
error_list = []
variance_list = []
autocorr_list = []


#Iterate over all SCALAR_AVERAGEs and put their values into the lists                                                                                  
#above.                                                                                                                                                
for sa in sa_list:
    indexvalue_list.append(sa.attributes.getNamedItem('indexvalue').nodeValue)
    mean_list.append(float(sa.getElementsByTagName('MEAN')[0].childNodes[0].data))
    error_list.append(float(sa.getElementsByTagName('ERROR')[0].childNodes[0].data))
    variance_list.append(float(sa.getElementsByTagName('VARIANCE')[0].childNodes[0].data))
    autocorr_list.append(float(sa.getElementsByTagName('AUTOCORR')[0].childNodes[0].data))

#Output the data in a 5 column format                                                                                                                  
print "#indexvalue MEAN\tERROR\tVARIANCE\tAUTOCORR"
for i in range(len(mean_list)):
    print "%s\t%g\t%g\t%g\t%g" % (indexvalue_list[i],mean_list[i],error_list[i],variance_list[i],autocorr_list[i])

My question is: how can I get these values only for certain indexvalues rather than for all of them?

I'm only interested in entries whose indexvalue looks like this:
( 1,4,3 ) -- ( 1,4,3 ), or in general ( x,y,z ) -- ( x,y,z ), where both halves are the same.

Does anyone have an idea how to do this?
I really appreciate it.

python xml


TrustyTony 888 ex-Moderator

14 Years Ago

Would this change to the final loop (lines 30-34 of your script) help?

#Iterate over all SCALAR_AVERAGEs and put their values into the lists                                                                                  
#above.                                                                                                                                                
for sa in sa_list:
    iv1, _, iv2 = sa.attributes.getNamedItem('indexvalue').nodeValue.partition('--')
    iv1,iv2 = eval(iv1), eval(iv2)
    print '%s != %s indexvalues not same. Not interesting' % (iv1,iv2) if iv1 != iv2 else 'indexvalues same %s, interesting' % iv1
    if iv1 == iv2:
        indexvalue_list.append(sa.attributes.getNamedItem('indexvalue').nodeValue)
        mean_list.append(float(sa.getElementsByTagName('MEAN')[0].childNodes[0].data))
        error_list.append(float(sa.getElementsByTagName('ERROR')[0].childNodes[0].data))
        variance_list.append(float(sa.getElementsByTagName('VARIANCE')[0].childNodes[0].data))
        autocorr_list.append(float(sa.getElementsByTagName('AUTOCORR')[0].childNodes[0].data))

TrustyTony 888 ex-Moderator

14 Years Ago

I do not know what your setup is, but the code I posted was slightly wrong, left over from earlier tests. Here is your whole program again, changed so that it runs with the input you posted:

import sys
from xml.dom.minidom import parseString

# Load XML into tree structure                                                                                                                         
tree = parseString("""<VECTOR_AVERAGE name="Density Correlations" nvalues="15625">
<SCALAR_AVERAGE indexvalue="( 1,2,0 ) -- ( 3,4,5)">
<COUNT>204160</COUNT>
<MEAN method="jackknife">6.368e-05</MEAN>
<ERROR converged="yes" method="jackknife">2.89e-05</ERROR>
<VARIANCE method="simple">6.37e-05</VARIANCE>
<AUTOCORR method="jackknife">0.843</AUTOCORR>
</SCALAR_AVERAGE> 
</VECTOR_AVERAGE> 
""") # Added two closing tags for testing, made indexvalue same for second test

#Find all VECTOR_AVERAGE nodes                                                                                                                         
va_list = tree.getElementsByTagName('VECTOR_AVERAGE') ## SIMULATION, AVERAGES REMOVED as they are not in input

#Find the 'Density Correlations' node in the list of VECTOR_AVERAGEs                                                                                   
for va in va_list:
    if va.attributes.getNamedItem('name').nodeValue == 'Density Correlations':
        density_correlations = va
        break

#Get a list of all the SCALAR_AVERAGES                                                                                                                 
sa_list = density_correlations.getElementsByTagName('SCALAR_AVERAGE')

#Initialize lists for holding the data                                                                                                                 
indexvalue_list = []
mean_list = []
error_list = []
variance_list = []
autocorr_list = []


#Iterate over all SCALAR_AVERAGEs and put their values into the lists                                                                                  
#above.                                                                                                                                                
for sa in sa_list:
    x = sa.attributes.getNamedItem('indexvalue').nodeValue
    iv1, _, iv2 = x.partition('--')
    iv1,iv2 = eval(iv1), eval(iv2)
    if iv1 == iv2:
        print 'indexvalues same %s, interesting' % (iv1,)
        indexvalue_list.append(x)
        mean_list.append(float(sa.getElementsByTagName('MEAN')[0].childNodes[0].data))
        error_list.append(float(sa.getElementsByTagName('ERROR')[0].childNodes[0].data))
        variance_list.append(float(sa.getElementsByTagName('VARIANCE')[0].childNodes[0].data))
        autocorr_list.append(float(sa.getElementsByTagName('AUTOCORR')[0].childNodes[0].data))
    else:
        print ('%s != %s indexvalues not same. Not interesting' % (iv1,iv2))

#Output the data in a 5 column format
print "#indexvalue MEAN\tERROR\tVARIANCE\tAUTOCORR"
for i in range(len(mean_list)):
    print "%s\t%g\t%g\t%g\t%g" % (indexvalue_list[i],mean_list[i],error_list[i],variance_list[i],autocorr_list[i])
""" Output before and after changing indexvalues same:
(1, 2, 0) != (3, 4, 5) indexvalues not same. Not interesting
#indexvalue MEAN	ERROR	VARIANCE	AUTOCORR
indexvalues same (1, 2, 0), interesting
#indexvalue MEAN	ERROR	VARIANCE	AUTOCORR
( 1,2,0 ) -- (1,2,0)	6.368e-05	2.89e-05	6.37e-05	0.843
"""
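
For comparison, here is a minimal sketch of the same filtering with xml.etree.ElementTree instead of minidom. It is an alternative, not what the thread used. The function name same_index_rows is made up for illustration, and the second SCALAR_AVERAGE entry in the sample is invented test data so that one entry passes the filter. It assumes every indexvalue contains exactly one "--" separator. ElementTree ships with Python 2.5 and later; Element.iter needs 2.7 or later.

```python
import xml.etree.ElementTree as ET

# Sample input: the first entry is from the thread, the second is
# made-up test data with matching index tuples.
SAMPLE = """<VECTOR_AVERAGE name="Density Correlations" nvalues="15625">
<SCALAR_AVERAGE indexvalue="( 1,2,0 ) -- ( 3,4,5)">
<COUNT>204160</COUNT>
<MEAN method="jackknife">6.368e-05</MEAN>
<ERROR converged="yes" method="jackknife">2.89e-05</ERROR>
</SCALAR_AVERAGE>
<SCALAR_AVERAGE indexvalue="( 1,4,3 ) -- ( 1,4,3 )">
<COUNT>204160</COUNT>
<MEAN method="jackknife">1.5e-03</MEAN>
<ERROR converged="yes" method="jackknife">2.0e-04</ERROR>
</SCALAR_AVERAGE>
</VECTOR_AVERAGE>
"""


def same_index_rows(xml_text, name="Density Correlations"):
    """Return (indexvalue, mean, error) for entries whose two halves match."""
    root = ET.fromstring(xml_text)
    rows = []
    # root.iter() also yields the root itself, so this works whether
    # VECTOR_AVERAGE is the root element or nested deeper in the file.
    for va in root.iter("VECTOR_AVERAGE"):
        if va.get("name") != name:
            continue
        for sa in va.iter("SCALAR_AVERAGE"):
            indexvalue = sa.get("indexvalue")
            left, right = indexvalue.split("--")
            # Compare with whitespace removed so "( 1,2,0 )" equals "(1,2,0)".
            if "".join(left.split()) == "".join(right.split()):
                rows.append((indexvalue,
                             float(sa.findtext("MEAN")),
                             float(sa.findtext("ERROR"))))
    return rows


rows = same_index_rows(SAMPLE)
for indexvalue, mean, error in rows:
    print("%s\t%g\t%g" % (indexvalue, mean, error))
```

Comparing the two halves as whitespace-stripped strings avoids eval entirely; it only works because both halves are written in the same "(x,y,z)" form.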


YasaminKh 0 Light Poster · Answer 1 · 2010-07-01T07:02:20+00:00

Thanks for your quick response. I did what you said, but I'm getting this error:

line 32
iv1,iv2 = eval(iv1), eval(iv2)
^
SyntaxError: invalid syntax

Do you know how I can fix that?

YasaminKh 0 Light Poster · Answer 2 · 2010-07-01T21:08:25+00:00

Thanks for your help, but using the exact code that you posted here I'm getting this error:

File "First.py", line 28, in ?
iv1, _, iv2 = x.partition('--')
AttributeError: 'unicode' object has no attribute 'partition'

I can't think of any reason.
BTW the eval function is not bold in my code.

YasaminKh 0 Light Poster · Answer 3 · 2010-07-01T21:32:10+00:00

Thanks again. I solved the problem by using:

for sa in sa_list:
    x = sa.attributes.getNamedItem('indexvalue').nodeValue
    tuples = [eval(item) for item in x.split("--")]
    if tuples[0] == tuples[1]:
        indexvalue_list.append(x)
        #indexvalue_list.append(sa.attributes.getNamedItem('indexvalue').nodeValue)                                                                                                                                      
        mean_list.append(float(sa.getElementsByTagName('MEAN')[0].childNodes[0].data))
        error_list.append(float(sa.getElementsByTagName('ERROR')[0].childNodes[0].data))
        variance_list.append(float(sa.getElementsByTagName('VARIANCE')[0].childNodes[0].data))
        autocorr_list.append(float(sa.getElementsByTagName('AUTOCORR')[0].childNodes[0].data))

....
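
As a side note, eval will execute whatever text is in the attribute, so it is only reasonable on input you trust. Below is a minimal sketch of the same comparison without eval. It assumes the index components are integers and that each indexvalue has exactly one "--" separator; the helper names parse_indexvalue and is_diagonal are made up for illustration.

```python
import re


def to_int_tuple(part):
    """Collect every (optionally signed) integer found in part."""
    return tuple(int(n) for n in re.findall(r"-?\d+", part))


def parse_indexvalue(text):
    """Turn '( 1,2,0 ) -- ( 3,4,5)' into ((1, 2, 0), (3, 4, 5))."""
    left, right = text.split("--")
    return to_int_tuple(left), to_int_tuple(right)


def is_diagonal(text):
    """True when both halves of the indexvalue hold the same integers."""
    first, second = parse_indexvalue(text)
    return first == second


for sample in ("( 1,2,0 ) -- ( 3,4,5)", "( 1,4,3 ) -- ( 1,4,3 )"):
    print("%s -> %s" % (sample, is_diagonal(sample)))
```

In the loop above, the test tuples[0] == tuples[1] could then be replaced by is_diagonal(x).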

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 4 · 2010-07-01T22:07:05+00:00

That looks like an elegant alternative, though I could not reproduce your errors in Python 2.6.5 or 3.1.2.

I even got the code running in Python 3.1.2 just by changing the print statements to function calls.

I like partition very much, but I like list comprehensions too. Just out of curiosity, does my "between" code snippet (Picking piece of string between separators) work on your machine?

YasaminKh 0 Light Poster · Answer 5 · 2010-07-01T23:36:12+00:00

I tried the other code you have, and I got this error:

Traceback (most recent call last):
File "testonline.py", line 9, in ?
print between('<a>','</a>',s)
File "testonline.py", line 4, in between
before,_,a = s.partition(left)
AttributeError: 'str' object has no attribute 'partition'

Does it mean something to you?

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 6 · 2010-07-01T23:43:25+00:00

Are you redefining str somewhere? Do you get this:

>>> str
<type 'str'>
>>> dir(str)
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_formatter_field_name_split', '_formatter_parser', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

And what is the version of your Python?

>>> import sys
>>> sys.version_info
(2, 6, 5, 'final', 0)
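
One possible explanation, not confirmed in the thread: str.partition and unicode.partition were added in Python 2.5, and tracebacks that say "in ?" for module-level code are typically printed by interpreters older than 2.5. If that is the case here, a small stand-in built on str.find gives the same three-part result. The name partition_compat is made up for this sketch.

```python
def partition_compat(s, sep):
    """Mimic s.partition(sep) using only find() and slicing."""
    if not sep:
        # The built-in partition also rejects an empty separator.
        raise ValueError("empty separator")
    pos = s.find(sep)
    if pos == -1:
        # Separator not found: everything goes in the first part.
        return s, "", ""
    return s[:pos], sep, s[pos + len(sep):]


iv1, _, iv2 = partition_compat("( 1,2,0 ) -- ( 3,4,5)", "--")
print(repr(iv1))
print(repr(iv2))
```

The split("--") approach used earlier in the thread sidesteps the issue as well, since split is available in old versions too.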