Aug 27th, 2005

parsing xml

I feel kind of dumb. I have been reading the Python docs all day and feel like I have not absorbed anything. I would like to parse info out of an XML document. Here is a URL to a sample doc:
http://freetalklive.com/netcast.xml

Let's say I want to parse out the URLs of the mp3s, and also the descriptions of the shows. I have been reading about xml.sax and xml.dom.minidom. They both seem like they may have the tools to do what I want, but I am kind of lost. I have also been trying to use regular string methods to do it.

Any suggestions, or a good tutorial with the beginner in mind, would be appreciated.
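For reference, here is a minimal sketch of how the mp3 URLs and show descriptions could be pulled out with xml.dom.minidom. It assumes the feed is ordinary RSS 2.0, where each `<item>` carries a `<description>` and an `<enclosure url="...">` pointing at the mp3. The sample XML, the example.com URLs, and the helper names `text_of` and `mp3s_and_descriptions` are made up for illustration and are not taken from the real feed.

```python
from xml.dom import minidom

# Hypothetical sample feed; assumes standard RSS 2.0 with <enclosure> tags.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <description>Channel-level description, not an episode.</description>
    <item>
      <title>Episode 1</title>
      <description>First episode.</description>
      <enclosure url="http://example.com/ep1.mp3" type="audio/mpeg" length="1"/>
    </item>
    <item>
      <title>Episode 2</title>
      <description>Second episode.</description>
      <enclosure url="http://example.com/ep2.mp3" type="audio/mpeg" length="1"/>
    </item>
  </channel>
</rss>"""


def text_of(node):
    """Join the text and CDATA children of a DOM element into one string."""
    return "".join(
        child.data
        for child in node.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


def mp3s_and_descriptions(xml_text):
    """Return a list of (mp3 url, description) tuples, one per <item>."""
    doc = minidom.parseString(xml_text)
    results = []
    for item in doc.getElementsByTagName("item"):
        enclosures = item.getElementsByTagName("enclosure")
        descriptions = item.getElementsByTagName("description")
        url = enclosures[0].getAttribute("url") if enclosures else None
        description = text_of(descriptions[0]) if descriptions else ""
        results.append((url, description))
    return results


for url, description in mp3s_and_descriptions(SAMPLE_RSS):
    print(url, "-", description)
```

Searching only inside each `<item>` keeps the channel-level `<description>` out of the results, which is one of the things that tends to go wrong with plain string methods.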
shanenin
Aug 27th, 2005

Re: parsing xml

I have found some ways to do this, both with feedparser.py and cElementTree.py.
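A rough sketch of the ElementTree approach, using the standard library's xml.etree.ElementTree, which offers the same interface that cElementTree did. It assumes a plain RSS 2.0 layout with the mp3 in an `<enclosure url="...">` tag. The sample XML and the function name `mp3s_and_descriptions_et` are hypothetical, not taken from the real feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; assumes standard RSS 2.0 with <enclosure> tags.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <description>First episode.</description>
      <enclosure url="http://example.com/ep1.mp3" type="audio/mpeg" length="1"/>
    </item>
    <item>
      <title>Episode 2</title>
      <description>Second episode.</description>
      <enclosure url="http://example.com/ep2.mp3" type="audio/mpeg" length="1"/>
    </item>
  </channel>
</rss>"""


def mp3s_and_descriptions_et(xml_text):
    """Return a list of (mp3 url, description) tuples, one per <item>."""
    root = ET.fromstring(xml_text)
    results = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        # An item without an enclosure yields None for the url.
        url = enclosure.get("url") if enclosure is not None else None
        description = item.findtext("description", default="")
        results.append((url, description))
    return results


for url, description in mp3s_and_descriptions_et(SAMPLE_RSS):
    print(url, "-", description)
```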

I would greatly appreciate any examples of how to do this with xml.sax. The xml.sax docs just plain dumbfounded me; it would be nice to make some sense out of them.
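For what it's worth, the xml.sax model boils down to this: the parser walks the document and calls methods on a handler object whenever an element starts, text arrives, or an element ends. The sketch below is one possible handler, assuming an RSS 2.0 feed with the mp3 in `<enclosure url="...">`. The class name `PodcastHandler`, the function `mp3s_and_descriptions_sax`, and the sample XML are all made up for illustration.

```python
import xml.sax

# Hypothetical sample feed; assumes standard RSS 2.0 with <enclosure> tags.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <description>Channel-level description, not an episode.</description>
    <item>
      <title>Episode 1</title>
      <description>First episode.</description>
      <enclosure url="http://example.com/ep1.mp3" type="audio/mpeg" length="1"/>
    </item>
    <item>
      <title>Episode 2</title>
      <description>Second episode.</description>
      <enclosure url="http://example.com/ep2.mp3" type="audio/mpeg" length="1"/>
    </item>
  </channel>
</rss>"""


class PodcastHandler(xml.sax.ContentHandler):
    """Collects (mp3 url, description) for every <item> in the feed."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None          # dict for the <item> being read, if any
        self._in_description = False  # True while inside an item's <description>

    def startElement(self, name, attrs):
        if name == "item":
            self._current = {"url": None, "description": ""}
        elif self._current is not None:
            if name == "enclosure":
                self._current["url"] = attrs.get("url")
            elif name == "description":
                self._in_description = True

    def characters(self, content):
        # Text can arrive in several chunks, so append rather than assign.
        if self._in_description:
            self._current["description"] += content

    def endElement(self, name):
        if name == "description":
            self._in_description = False
        elif name == "item" and self._current is not None:
            self.items.append((self._current["url"], self._current["description"]))
            self._current = None


def mp3s_and_descriptions_sax(xml_text):
    handler = PodcastHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.items


for url, description in mp3s_and_descriptions_sax(SAMPLE_RSS):
    print(url, "-", description)
```

The handler has to remember where it is in the document (the `_current` and `_in_description` flags), which is the main extra bookkeeping SAX asks for compared with a tree-based parser.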
shanenin
Aug 27th, 2005

Re: parsing xml

The feedparser module is so nice and easy to use. It is specifically designed to parse RSS XML files. Below is the script I wrote that downloads the mp3 files and makes a text file containing each mp3's description. It stores each mp3 and its description file in the same directory.

Here is the link to feedparser:
http://feedparser.org/
[php]
#!/usr/bin/env python
#
# This program downloads mp3 files and their descriptions based on an RSS feed file (XML).
#
import os
import sys
import urllib.request

import feedparser


# Parse the feed object and return a list of tuples, one per item:
# (cleaned title, link, description).
def parse_rss(feed_obj):
    rss_info = []
    for entry in feed_obj.entries:
        title = name_cleaner(entry.title)
        rss_info.append((title, entry.link, entry.description))
    return rss_info


# Replace "/" with "-" and spaces with underscores, and lowercase the result,
# so the title can be used as a directory or file name.
def name_cleaner(cleanfile):
    newfile = cleanfile.replace('/', '-')
    newfile = newfile.replace(' ', '_')
    return newfile.lower()


def main():
    # The location of the XML file is taken from the first command-line argument.
    rss_file = sys.argv[1]
    feed_obj = feedparser.parse(rss_file)

    for title, link, description in parse_rss(feed_obj):
        # One directory per item, holding the mp3 and an INFO text file.
        os.mkdir(title, 0o775)
        urllib.request.urlretrieve(link, "%s/%s.mp3" % (title, title))
        with open("%s/INFO" % title, 'w') as text_file:
            text_file.write(description + "\n")


main()
[/php]
shanenin
Jun 22nd, 2007

Re: parsing xml

How can I integrate the Apple Pie Parser with Python?
Any suggestions, or a good tutorial on working with parsed text, would be appreciated.
datulaida
Jun 22nd, 2007

Re: parsing xml

Please start a new thread! Hijacking old threads is kind of rude.
vegaseat (Moderator)
