Hi,

I've had a bit of experience with Perl and shell scripting. I've been reading the O'Reilly Python book and have been having a go at replicating some of our Perl scripts in Python (v3.1.2).

I'm looking to put something together that will parse large XML files and manipulate the results. I've been messing with "xml.parsers.expat", but I'm having problems getting the arguments out of the StartElementHandler: although the list prints the correct results inside the handler, all I get back is the integer '1'.

Could someone have a look and see what I'm doing wrong?

$ cat xmlparse.py
#!/usr/bin/python
import xml.parsers.expat

# handler functions
def start_element(L, D):
    print(L,type(L),D,type(D))
    test = ([L,D])
    print('The list "test:"')
    print(test)
    return test

# test xml string.
xmlstring = '<xml attr="1" ><text>hello</text></xml>'

#create the parser
parsed = []
p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = start_element
parsed = p.Parse(xmlstring)

print ("\nThis is the returned info:")
print(parsed,type(parsed))


$ ./python31.exe xmlparse.py
xml <class 'str'> {'attr': '1'} <class 'dict'>
The list "test:"
['xml', {'attr': '1'}]
text <class 'str'> {} <class 'dict'>
The list "test:"
['text', {}]

This is the returned info:
1 <class 'int'>
$

I admit, I'm sort of ahead of my reading, so if I'm doing something fundamentally wrong, I'd appreciate a shove in the right direction :)

The return value of the Parse method has nothing to do with what you return from the start_element handler; the handler's return value is, in fact, ignored. To keep the parsed info, the handler should create an object to hold it and store that object in a persistent collection. It is a fun project to do and helps you understand a lot of XML's inner workings; however, you'd end up with some incarnation of a DOM model.
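
As a rough illustration of the "persistent collection" idea, here is a minimal sketch based on the script from the question. The names parse_elements and elements are made up for this example; only ParserCreate, StartElementHandler and Parse come from xml.parsers.expat.

```python
import xml.parsers.expat

def parse_elements(xml_text):
    """Return a list of (name, attrs) tuples, one per start tag.

    parse_elements and elements are hypothetical names used for illustration.
    """
    elements = []  # persistent collection that the handler appends to

    def start_element(name, attrs):
        # expat ignores whatever the handler returns, so record the data here
        elements.append((name, attrs))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)  # True marks this as the final chunk of input
    return elements

parsed = parse_elements('<xml attr="1" ><text>hello</text></xml>')
print(parsed)
```

With the test string from the question, parsed should hold one (name, attributes) pair per start tag, in document order, instead of the integer that Parse itself returns.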
If you are just interested in getting results, switch to a DOM-style parser right away.
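
For comparison, a DOM-style version of the same test might look something like the sketch below. It assumes xml.dom.minidom from the standard library as the DOM parser, which is only one of several options; the variable names are made up for the example.

```python
import xml.dom.minidom

# Same test string as in the question
xmlstring = '<xml attr="1" ><text>hello</text></xml>'

# parseString builds the whole document tree in memory
doc = xml.dom.minidom.parseString(xmlstring)

root = doc.documentElement               # the <xml> element
root_name = root.tagName
root_attr = root.getAttribute('attr')

# First <text> element under the root, then its text node
text_node = root.getElementsByTagName('text')[0].firstChild
text_value = text_node.data

print(root_name, root_attr, text_value)
```

Keep in mind that a DOM parser holds the entire document in memory, so for the large XML files mentioned in the question it may be worth measuring memory use before committing to this approach.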

I'll do a bit more reading on what xml parsers are available and how best to proceed.

Cheers for the help.
