parsing xml

shanenin, post #1, Aug 27th, 2005
I feel kind of dumb. I have been reading the Python docs all day and feel like I have not absorbed anything. I would like to parse information out of an XML document. Here is a URL to a sample document:
http://freetalklive.com/netcast.xml

Let's say I want to parse out the URLs of the mp3s, and also the descriptions of the shows. I have been reading about xml.sax and xml.dom.minidom. They both seem like they may have the tools to do what I want, but I am kind of lost. I have also been trying to use regular string methods to do it.

Any suggestions, or a good tutorial with the beginner in mind, would be appreciated.
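
For reference, here is a minimal sketch of the xml.dom.minidom route. It assumes the feed follows the usual podcast RSS 2.0 layout, where each show is an <item> whose mp3 URL sits in the url attribute of an <enclosure> element; the linked feed may be laid out differently. SAMPLE_RSS, node_text and shows_with_minidom are made-up names for illustration.

```python
from xml.dom import minidom

# Hypothetical stand-in for the real feed (assumed RSS 2.0 podcast layout).
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <description>Channel-level description</description>
    <item>
      <title>Show one</title>
      <description>First show</description>
      <enclosure url="http://example.com/one.mp3" type="audio/mpeg" length="1"/>
    </item>
    <item>
      <title>Show two</title>
      <description>Second show</description>
      <enclosure url="http://example.com/two.mp3" type="audio/mpeg" length="1"/>
    </item>
  </channel>
</rss>"""


def node_text(node):
    """Join the text and CDATA children of an element into one string."""
    return "".join(
        child.data
        for child in node.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


def shows_with_minidom(xml_text):
    """Return a list of (mp3_url, description) pairs, one per <item>."""
    doc = minidom.parseString(xml_text)
    shows = []
    for item in doc.getElementsByTagName("item"):
        enclosures = item.getElementsByTagName("enclosure")
        descriptions = item.getElementsByTagName("description")
        url = enclosures[0].getAttribute("url") if enclosures else ""
        text = node_text(descriptions[0]).strip() if descriptions else ""
        shows.append((url, text))
    return shows


for url, description in shows_with_minidom(SAMPLE_RSS):
    print(url, "-", description)
```

Searching inside each <item> rather than the whole document keeps the channel-level <description> out of the results.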
In a perfect world exceptions would not be needed.

shanenin, post #2, Aug 27th, 2005
I have found some ways to do this, with both feedparser.py and cElementTree.py.
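
A rough sketch of the ElementTree approach, using the xml.etree.ElementTree module that now ships with Python instead of the separate cElementTree download. It assumes the usual podcast RSS 2.0 layout (<item> elements with an <enclosure url="..."> child), which may not match the real feed exactly. FEED and shows_with_etree are names invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed (assumed RSS 2.0 podcast layout).
FEED = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Show one</title>
      <description>First show</description>
      <enclosure url="http://example.com/one.mp3" type="audio/mpeg" length="1"/>
    </item>
    <item>
      <title>Show two</title>
      <description>Second show</description>
    </item>
  </channel>
</rss>"""


def shows_with_etree(xml_text):
    """Return a list of (mp3_url, description) pairs, one per <item>."""
    root = ET.fromstring(xml_text)
    shows = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        # An item without an enclosure gets an empty URL instead of an error.
        url = enclosure.get("url", "") if enclosure is not None else ""
        description = (item.findtext("description") or "").strip()
        shows.append((url, description))
    return shows


for url, description in shows_with_etree(FEED):
    print(url or "(no mp3)", "-", description)
```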

I would greatly appreciate any examples of how to do this with xml.sax. The xml.sax docs just plain dumbfounded me; it would be nice to make some sense out of them.
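
One possible xml.sax version, as a sketch only. SAX does not build a tree; it calls your handler's methods as it meets each start tag, chunk of text and end tag, so the handler has to remember where it is in the document. The sketch assumes the usual podcast RSS 2.0 layout (<item> with <description> and <enclosure url="...">), and RSS_TEXT, ShowHandler and shows_with_sax are hypothetical names.

```python
import xml.sax

# Hypothetical sample feed (assumed RSS 2.0 podcast layout).
RSS_TEXT = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <description>Channel-level description</description>
    <item>
      <title>Show one</title>
      <description>First show</description>
      <enclosure url="http://example.com/one.mp3" type="audio/mpeg" length="1"/>
    </item>
    <item>
      <title>Show two</title>
      <description>Second show</description>
      <enclosure url="http://example.com/two.mp3" type="audio/mpeg" length="1"/>
    </item>
  </channel>
</rss>"""


class ShowHandler(xml.sax.ContentHandler):
    """Collect (mp3_url, description) for every <item> in the feed."""

    def __init__(self):
        super().__init__()
        self.shows = []
        self.in_item = False
        self.in_description = False
        self.url = ""
        self.text = []

    def startElement(self, name, attrs):
        if name == "item":
            # Start a fresh record for this show.
            self.in_item = True
            self.url = ""
            self.text = []
        elif self.in_item and name == "enclosure":
            self.url = attrs.get("url", "")
        elif self.in_item and name == "description":
            self.in_description = True

    def characters(self, content):
        # Text can arrive in several pieces, so collect and join later.
        if self.in_description:
            self.text.append(content)

    def endElement(self, name):
        if name == "description":
            self.in_description = False
        elif name == "item":
            self.shows.append((self.url, "".join(self.text).strip()))
            self.in_item = False


def shows_with_sax(xml_text):
    """Parse the feed text and return the handler's collected shows."""
    handler = ShowHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.shows


for url, description in shows_with_sax(RSS_TEXT):
    print(url, "-", description)
```

The in_item flag is what keeps the channel-level <description> from being mistaken for a show description.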

shanenin, post #3, Aug 27th, 2005
The feedparser module is so nice and easy to use. It is specifically designed to parse RSS XML files. Below is the script I wrote that downloads the mp3 files and makes a text file containing each mp3's description. It stores each mp3 and its description file in the same directory.

Here is the link to feedparser:
http://feedparser.org/
[php]
#!/usr/bin/env python
#
# Downloads mp3 files and their descriptions based on an RSS feed file (XML).
#
import os
import sys
import urllib.request  # urlretrieve lives here in Python 3

import feedparser


def name_cleaner(cleanfile):
    """Make a title safe to use as a directory or file name.

    Replaces "/" with "-" and spaces with "_", then lowercases the result.
    """
    return cleanfile.replace('/', '-').replace(' ', '_').lower()


def parse_rss(feed_obj):
    """Return a list of (title, link, description) tuples, one per feed entry."""
    rss_info = []
    for entry in feed_obj.entries:
        title = name_cleaner(entry.title)
        rss_info.append((title, entry.link, entry.description))
    return rss_info


def main():
    # The command-line argument is the location of the RSS (XML) file.
    feed_obj = feedparser.parse(sys.argv[1])
    for title, link, description in parse_rss(feed_obj):
        # One directory per show, holding the mp3 and an INFO text file.
        os.mkdir(title, 0o775)
        urllib.request.urlretrieve(link, "%s/%s.mp3" % (title, title))
        with open("%s/INFO" % title, 'w') as text_file:
            text_file.write(description + "\n")


main()
[/php]

datulaida, post #4, Jun 22nd, 2007
How do I integrate the Apple Pie Parser with Python?
Any suggestions, or a good tutorial on parsing text, would be appreciated.

vegaseat (Moderator), post #5, Jun 22nd, 2007
Please start a new thread! Hijacking old threads is kind of rude.
May 'the Google' be with you!
©2003 - 2009 DaniWeb® LLC