I am working on an app that converts documents (Open Document Text only, at least for now) to EPUB format. I am using etree ElementTree to parse the XML files extracted from the .odt file. Right now I'm working on content.xml (the file with all the text), and I'm having trouble reading each element's tag (element.tag) and attributes (element.attrib).
I understand that attrib is a dictionary. When I try print(element.attrib.keys()), it prints this:
dict_keys(['{urn:oasis:names:tc:opendocument:xmlns:text:1.0}style-name'])
If I do print(element.attrib), it prints this:
{'{urn:oasis:names:tc:opendocument:xmlns:text:1.0}style-name': 'T1'}
So it looks to me like the key is {urn:oasis:names:tc:opendocument:xmlns:text:1.0}style-name and the value should be 'T1'. However, if I try print(element.attrib['{urn:oasis:names:tc:opendocument:xmlns:text:1.0}style-name']), it fails and says it's not a key.

For a little more info, this is the tag I'm trying to get the info from: <text:span text:style-name="T1">. I've tried 'text' and 'text:style-name' as keys for the attribute dictionary, and both fail. I also noticed that the first tag is this:
<office:document-content xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0" xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"... and it goes on.
You'll notice it has xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0". I believe this is a namespace declaration that defines the prefix text as that long string, but I'm not sure. I'm not exactly an XML expert, though I've been trying to digest this XML for a while. Any help is greatly appreciated. I hope my question hasn't been too confusing.
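For reference, here is a minimal sketch of how ElementTree handles such a declaration: it replaces the prefix with the full namespace URI in braces, so text:style-name is stored under the key {urn:oasis:names:tc:opendocument:xmlns:text:1.0}style-name. The SAMPLE string and the qname helper are made up for illustration and are not part of the original document or of ElementTree.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the "text" prefix in content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Hypothetical stand-in for a fragment of content.xml.
SAMPLE = (
    '<text:p xmlns:text="%s" text:style-name="P5">'
    '<text:span text:style-name="T1">hello</text:span>'
    '</text:p>' % TEXT_NS
)


def qname(namespace, local):
    """Build the '{uri}local' name that ElementTree uses internally."""
    return "{%s}%s" % (namespace, local)


root = ET.fromstring(SAMPLE)
span = root.find(qname(TEXT_NS, "span"))
style_key = qname(TEXT_NS, "style-name")

print(span.tag)                       # the tag also carries the {uri} form
print(span.get(style_key))            # T1
print(span.get("text:style-name"))    # None: the prefixed form is not a key
```

Using get() rather than indexing attrib directly returns None for a missing key instead of raising KeyError, which makes it easier to see which spelling of the key matches.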

To clarify, what I need is a way to read the attribute so that I can take the style-name and convert it to a class for the HTML in the EPUB. I know that the attributes are a dictionary, but I can't find the key... :[
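As a rough sketch of that goal, the code below reads content.xml out of a zip archive, walks every element, and pairs each local tag name with its style-name so it could later be used as an HTML class. The in-memory archive, the CONTENT_XML sample, and the function names make_fake_odt and style_classes are assumptions made up for this example; a real .odt file contains much more than this.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE_NAME = "{%s}style-name" % TEXT_NS

# Hypothetical, heavily simplified stand-in for a real content.xml.
CONTENT_XML = (
    '<text:p xmlns:text="%s" text:style-name="P5">'
    'Some <text:span text:style-name="T1">styled</text:span>'
    ' and <text:span>plain</text:span> text'
    '</text:p>' % TEXT_NS
)


def make_fake_odt():
    """Build an in-memory zip that holds only content.xml (illustration only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as odt:
        odt.writestr("content.xml", CONTENT_XML)
    buf.seek(0)
    return buf


def style_classes(odt_file):
    """Return (local tag name, style-name or None) for every element."""
    with zipfile.ZipFile(odt_file) as odt:
        root = ET.fromstring(odt.read("content.xml"))
    result = []
    for element in root.iter():
        # Drop the '{uri}' part of the tag to get the local name.
        local_name = element.tag.split("}", 1)[-1]
        # get() returns None when the element has no style-name attribute.
        result.append((local_name, element.get(STYLE_NAME)))
    return result


pairs = style_classes(make_fake_odt())
for local_name, css_class in pairs:
    print(local_name, css_class)
```

Elements without a style-name come back with None here, so a loop over the whole document does not stop on the first element that lacks the attribute.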

All 6 Replies

Perhaps you could upload the element using this function

import pickle
import zipfile

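def save(obj):
    """Pickle obj and store it as the entry "elt" inside spam.zip."""
    s = pickle.dumps(obj, 0)  # protocol 0: the original text-based pickle format
    with zipfile.ZipFile('spam.zip', 'w') as myzip:
        myzip.writestr("elt", s)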

save(element)

then attach the file spam.zip to a post so that we can read it and study the element. Please also tell us which version of Python you are using.

We could read the element with

import pickle
import zipfile

def load():
    """Read the entry "elt" from spam.zip and unpickle it."""
    with zipfile.ZipFile('spam.zip', 'r') as myzip:
        return pickle.loads(myzip.read("elt"))

element = load()

OK, I pickled all the elements, and I'm going to attach the last two so you can look at them. I'm using the latest version of Python 3 on 64-bit Ubuntu Linux.

OK, so I'm looking and I can't figure out how to upload attachments... -_- The Files button just brings me down to the Upload Attachments section, which says "Files will automatically be attached to the post upon upload. Optionally, you can embed uploaded images within your post.", but there are no buttons to upload.

I had the same issue a few minutes ago, but it seems to be solved. Perhaps you could try again?

I received the files, but I can't reproduce the issue. Everything works fine with this code:

import pickle
import zipfile
from xml.etree.ElementTree import ElementTree

def load(zipname):
    with zipfile.ZipFile(zipname, 'r') as myzip:
        return pickle.loads(myzip.read("elt"))

zipnames = "PickledElement160.zip PickledElement161.zip".split()
a, b = (load(zipname) for zipname in zipnames)

print(a, b)
print(type(a.attrib), type(b.attrib))
print(a.attrib)
print(a.attrib['{urn:oasis:names:tc:opendocument:xmlns:text:1.0}style-name'])
print(b.attrib)
print(b.attrib['{urn:oasis:names:tc:opendocument:xmlns:text:1.0}style-name'])

""" my output -->
<Element '{urn:oasis:names:tc:opendocument:xmlns:text:1.0}p' at 0x7fddfb8be490> <Element '{urn:oasis:names:tc:opendocument:xmlns:text:1.0}span' at 0x7fddfb8be0d0>
<class 'dict'> <class 'dict'>
{'{urn:oasis:names:tc:opendocument:xmlns:text:1.0}style-name': 'P5'}
P5
{'{urn:oasis:names:tc:opendocument:xmlns:text:1.0}style-name': 'T1'}
T1
"""

I'm using Python 3.2.1. What is your output for the same code?

Strange, it actually seems to be working now o_O I'm not sure why it didn't work before. Thanks, and at least I know about pickle now :D
