Hey guys,

I am looking into XBRL files and need to extract certain data from each of them. However, I can't find much information on the existing python-xbrl library. Perhaps someone here has experience with it?
Here's an xbrl file example
Click Here
Any ideas/solutions on how to parse a certain field and get its value?

Or maybe I should implement my own parser using "re"?

For now I did this, just to test it out:

import re
import requests

# .text gives the decoded document, so the str regexes below work on it
xmlContent = requests.get("http://regnskaber.virk.dk/14502803/eGJybHN0b3JlOi8vWC1GMDk4RkNDNi0yMDE0MTIzMV8wOTE2MjFfMDk1L3hicmw.xml").text

print("Date: " + re.findall(r">(.+)<", re.findall(r"gsd:ReportingPeriodStartDate.+", xmlContent)[0])[0])

It works, though I am not sure how efficient it is, because I need to parse thousands of documents.

Thanks in advance =]

All 3 Replies

Please put the shovel down before the hole gets too big and you can't climb out :) Regular expressions are not the way to go for something that is XML-based. The simplest way is to grab libraries and play with them at a Python interactive prompt. Besides python-xbrl there is also Arelle (http://arelle.org/documentation/api/), which seems popular. Give them a go and if you run into problems please get back to us with more detailed questions.


Hey,
that's great, thanks for the link!
I seem to be unable to read the docs, though; they won't load/open. Is that the case for you too?

Or do you by any chance have an example of how you would parse an XBRL document and extract a field such as "startDate"?

The docs open fine for me and no, I don't have an example. I found the reference to it by using a tool called a "search engine". Why don't you try them? They're very good. The targeted ones are even better, e.g. http://nullege.com/ or http://code.openhub.net/
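
For the "startDate" question above, here is a minimal sketch of the XML-parser route recommended earlier, using only the standard library's xml.etree.ElementTree rather than python-xbrl or Arelle. The sample document, its namespace URIs, and the helper names local_name and find_fact are made up for illustration; only the element name ReportingPeriodStartDate comes from the original post. It matches on the local tag name, so it does not need to know which namespace the "gsd" prefix is bound to in a real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a real XBRL instance document.
# The gsd namespace URI is a placeholder, not taken from the linked file.
SAMPLE_XBRL = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:gsd="http://example.com/gsd">
  <gsd:ReportingPeriodStartDate contextRef="ctx1">2014-01-01</gsd:ReportingPeriodStartDate>
  <gsd:ReportingPeriodEndDate contextRef="ctx1">2014-12-31</gsd:ReportingPeriodEndDate>
</xbrli:xbrl>
"""


def local_name(tag):
    """Strip the '{namespace}' part ElementTree puts in front of qualified tags."""
    return tag.rsplit("}", 1)[-1]


def find_fact(xml_data, name):
    """Return the text of the first element whose local name is `name`, or None."""
    root = ET.fromstring(xml_data)  # accepts str or bytes
    for elem in root.iter():
        if isinstance(elem.tag, str) and local_name(elem.tag) == name:
            return (elem.text or "").strip()
    return None


print("Date: " + find_fact(SAMPLE_XBRL, "ReportingPeriodStartDate"))
```

For a downloaded file, something like find_fact(requests.get(url).content, "ReportingPeriodStartDate") should work the same way, since ET.fromstring also accepts bytes. Note that this only pulls out the raw element text; it ignores XBRL contexts and units, which is where a dedicated library such as python-xbrl or Arelle would earn its keep.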
