I need to convert .docx files to .doc (on Linux and Windows).
I'm planning to use the zipfile module to access all of the internal XML documents.
Then I'll read /word/document.xml, and I need to parse it so that it reads all of the text in the tags, places the text strings in a list, and then prints the list.
Very simple stuff, except how do you actually parse an XML file?
    import os

    # os.path.join picks the right separator on both Linux and Windows
    path = os.path.join(os.getcwd(), 'word', 'document.xml')

    # A file object has no len() and cannot be indexed, so iterate over it
    # line by line and show each raw line instead
    with open(path, encoding='utf-8') as xml:
        for text in xml:
            print(repr(text))
Reading the file by hand like this is a pain. Does it even work?
So how do you parse an XML file?
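
For reference, here is a minimal sketch of the plan described above, assuming the standard library's zipfile and xml.etree.ElementTree modules are acceptable. The helper name docx_paragraphs is made up for illustration, and the sketch assumes the body text sits in w:t elements grouped under w:p paragraph elements in word/document.xml. It is one possible approach, not the only way to do it.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, written in ElementTree's {uri}tag form
W_NS = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'


def docx_paragraphs(source):
    """Return a list holding the text of each paragraph in a .docx file.

    Hypothetical helper: `source` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        # Member names inside a zip archive use '/', on Windows too
        xml_bytes = archive.read('word/document.xml')

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + 'p'):
        # A paragraph's text can be split across several <w:t> elements
        pieces = [node.text or '' for node in para.iter(W_NS + 't')]
        paragraphs.append(''.join(pieces))
    return paragraphs
```

Calling docx_paragraphs('example.docx') (a placeholder file name) should give back a plain list of strings that can then be printed. Reading the member straight from the archive also avoids unzipping to disk and building OS-specific paths.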