
parsing xml

I feel kind of dumb. I have been reading the Python docs all day and feel like I have not absorbed anything. I would like to parse information out of an XML document. Here is the URL of a sample document:
http://freetalklive.com/netcast.xml

Let's say I want to parse out the URLs of the mp3s, and also the descriptions of the shows. I have been reading about xml.sax and xml.dom.minidom. They both seem like they may have the tools to do what I want, but I am kind of lost. I have also been trying to use regular string methods to do it.

Any suggestions, or a good tutorial written with the beginner in mind, would be appreciated.
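For the question above, a minimal sketch with the standard-library xml.dom.minidom might look like this. It assumes the feed is ordinary RSS 2.0, where each <item> carries a <description> and an <enclosure url="..."> element holding the mp3 address; I have not checked that against netcast.xml itself. SAMPLE_RSS, text_of and shows_from_rss are made-up names for illustration, and the sample feed is invented.

```python
# A minimal sketch using xml.dom.minidom.
# ASSUMPTION: RSS 2.0 layout, one <enclosure url="..."> and one <description>
# per <item>. SAMPLE_RSS is a made-up feed, not the real netcast.xml.
from xml.dom import minidom

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <description>First show.</description>
      <enclosure url="http://example.com/ep1.mp3" length="123" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 2</title>
      <description>Second show.</description>
      <enclosure url="http://example.com/ep2.mp3" length="456" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""


def text_of(node):
    """Join the text and CDATA children of an element into one string."""
    return "".join(child.data for child in node.childNodes
                   if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE))


def shows_from_rss(xml_text):
    """Return a list of (mp3_url, description) tuples, one per <item>."""
    doc = minidom.parseString(xml_text)
    shows = []
    for item in doc.getElementsByTagName("item"):
        enclosures = item.getElementsByTagName("enclosure")
        descriptions = item.getElementsByTagName("description")
        # the mp3 address lives in an attribute, the description in text
        url = enclosures[0].getAttribute("url") if enclosures else ""
        desc = text_of(descriptions[0]).strip() if descriptions else ""
        shows.append((url, desc))
    return shows


for url, desc in shows_from_rss(SAMPLE_RSS):
    print(url, "-", desc)
```

To read a real feed instead of the sample string, minidom.parse() also accepts a file object, such as the one returned by urllib.request.urlopen().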

shanenin
Posting Whiz in Training
217 posts since May 2005

I have found some ways to do this, with both feedparser.py and cElementTree.
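As a rough idea of the ElementTree route: cElementTree has since been folded into the standard library, and in Python 3 xml.etree.ElementTree uses the C accelerator automatically. The sketch below assumes a standard RSS 2.0 layout with an <enclosure url="..."> per <item>; SAMPLE_RSS and shows_from_rss are hypothetical names, and the feed is invented.

```python
# A minimal sketch with xml.etree.ElementTree (the modern home of cElementTree).
# ASSUMPTION: RSS 2.0 layout with <enclosure url="..."> and <description>
# inside each <item>. SAMPLE_RSS is a made-up feed.
import xml.etree.ElementTree as ET

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <description>First show.</description>
      <enclosure url="http://example.com/ep1.mp3" length="123" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 2</title>
      <description>Second show.</description>
      <enclosure url="http://example.com/ep2.mp3" length="456" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""


def shows_from_rss(xml_text):
    """Return a list of (mp3_url, description) tuples, one per <item>."""
    root = ET.fromstring(xml_text)
    shows = []
    # iter("item") walks the whole tree, so it finds items under <channel>
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        url = enclosure.get("url", "") if enclosure is not None else ""
        # findtext returns None when <description> is missing
        desc = (item.findtext("description") or "").strip()
        shows.append((url, desc))
    return shows


for url, desc in shows_from_rss(SAMPLE_RSS):
    print(url, "-", desc)
```

For a live feed, ET.parse() accepts a file object too, so ET.parse(urllib.request.urlopen(url)).getroot() gives the same kind of root element.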

I would greatly appreciate any examples of how to do this with xml.sax. The xml.sax docs just plain dumbfounded me; it would be nice to make some sense out of them.
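Here is one way xml.sax could be used for this, offered as a sketch rather than the definitive approach. SAX does not build a tree; it calls methods on a ContentHandler subclass as it reads (startElement, characters, endElement), so the handler has to remember where it is. The code assumes an RSS 2.0 layout with an <enclosure url="..."> per <item>; RSSHandler, shows_from_rss and SAMPLE_RSS are made-up names, and the sample feed is invented.

```python
# A minimal xml.sax sketch.
# ASSUMPTION: RSS 2.0 layout with <enclosure url="..."> and <description>
# inside each <item>. SAMPLE_RSS is a made-up feed.
import xml.sax

SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <description>First show.</description>
      <enclosure url="http://example.com/ep1.mp3" length="123" type="audio/mpeg"/>
    </item>
    <item>
      <title>Episode 2</title>
      <description>Second show.</description>
      <enclosure url="http://example.com/ep2.mp3" length="456" type="audio/mpeg"/>
    </item>
  </channel>
</rss>"""


class RSSHandler(xml.sax.ContentHandler):
    """Collects (mp3_url, description) pairs while the parser streams events."""

    def __init__(self):
        super().__init__()
        self.shows = []
        self._in_item = False
        self._in_description = False
        self._url = ""
        self._desc_parts = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._url = ""
            self._desc_parts = []
        elif self._in_item and name == "enclosure":
            self._url = attrs.get("url", "")
        elif self._in_item and name == "description":
            self._in_description = True

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so collect them all
        if self._in_description:
            self._desc_parts.append(content)

    def endElement(self, name):
        if name == "description":
            self._in_description = False
        elif name == "item":
            self.shows.append((self._url, "".join(self._desc_parts).strip()))
            self._in_item = False


def shows_from_rss(xml_text):
    handler = RSSHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.shows


for url, desc in shows_from_rss(SAMPLE_RSS):
    print(url, "-", desc)
```

To parse a file on disk or an open file object instead of a string, xml.sax.parse(source, handler) takes the same handler.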

shanenin

The feedparser module is very nice and easy to use. It is specifically designed to parse RSS (XML) feeds. Below is the script I wrote that downloads the mp3 files and makes a text file containing each mp3's description. It stores each mp3 and its description file together in their own directory.

Here is the link to feedparser:
http://feedparser.org/
[php]
#!/usr/bin/env python3
#
# this program downloads mp3 files and their descriptions based on an RSS feed file (XML)
#
import os
import sys
import urllib.request

import feedparser

# this function parses the RSS feed and returns its info as a list of tuples,
# one per item: (title, link, description)
def parse_rss(feed_obj):
    rss_info = []
    for entry in feed_obj.entries:
        title = name_cleaner(entry.title)
        link = entry.link
        description = entry.description
        rss_info.append((title, link, description))
    return rss_info

# this function replaces "/" with "-" and spaces with underscores in directory
# and file names, and also makes them lowercase
def name_cleaner(cleanfile):
    newfile = cleanfile.replace('/', '-')
    newfile = newfile.replace(' ', '_')
    newfile = newfile.lower()
    return newfile

# main function: create one directory per show, download the mp3 into it
# and write the show description to an INFO file next to it
def main():
    # the location of the RSS file (path or URL) is the first command-line argument
    rss_file = sys.argv[1]
    feed_obj = feedparser.parse(rss_file)
    for title, link, description in parse_rss(feed_obj):
        os.mkdir(title, 0o775)
        mp3_path = os.path.join(title, title + ".mp3")
        urllib.request.urlretrieve(link, mp3_path)
        info_path = os.path.join(title, "INFO")
        with open(info_path, 'w', encoding='utf-8') as text_file:
            text_file.write(description + "\n")

if __name__ == "__main__":
    main()[/php]

shanenin

How do I integrate the Apple Pie Parser with Python?
Any suggestions, or a good tutorial on parsing text, would be appreciated.

datulaida
Newbie Poster
2 posts since Jun 2007

Please start a new thread! Hijacking old threads is kind of rude.

vegaseat
DaniWeb's Hypocrite
Moderator
5,989 posts since Oct 2004
