parsing xml
I feel kind of dumb. I have been reading the Python docs all day and feel like I have not absorbed anything. I would like to parse information out of an XML document. Here is a URL to a sample document:
http://freetalklive.com/netcast.xml
Let's say I want to parse out the URLs of the mp3s, and also the descriptions of the shows. I have been reading about xml.sax and xml.dom.minidom. They both seem like they may have the tools to do what I want, but I am kind of lost. I have also been trying to use regular string methods to do it.
Any suggestions, or a good tutorial with the beginner in mind, would be appreciated.
In a perfect world exceptions would not be needed.
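For the xml.dom.minidom route mentioned in the question, a minimal sketch could look like the one below. It assumes the feed is ordinary RSS 2.0, where each `<item>` carries a `<description>` and an `<enclosure url="...">` pointing at the mp3; the actual layout of netcast.xml is not confirmed here, so `SAMPLE_RSS`, `get_text` and `extract_episodes` are made-up names and data for illustration only.

```python
from xml.dom import minidom

# Hypothetical sample standing in for the real feed (assumed RSS 2.0 layout).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 1</title>
      <description>First episode.</description>
      <enclosure url="http://example.com/ep1.mp3" type="audio/mpeg" length="123"/>
    </item>
    <item>
      <title>Episode 2</title>
      <description>Second episode.</description>
      <enclosure url="http://example.com/ep2.mp3" type="audio/mpeg" length="456"/>
    </item>
  </channel>
</rss>
"""


def get_text(node):
    """Join the text and CDATA children of a DOM element into one string."""
    return "".join(
        child.data
        for child in node.childNodes
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE)
    )


def extract_episodes(xml_text):
    """Return a list of (mp3_url, description) tuples, one per <item>."""
    doc = minidom.parseString(xml_text)
    episodes = []
    for item in doc.getElementsByTagName("item"):
        descriptions = item.getElementsByTagName("description")
        enclosures = item.getElementsByTagName("enclosure")
        description = get_text(descriptions[0]).strip() if descriptions else ""
        url = enclosures[0].getAttribute("url") if enclosures else ""
        episodes.append((url, description))
    return episodes


if __name__ == "__main__":
    for url, description in extract_episodes(SAMPLE_RSS):
        print(url, "-", description)
```

To try it on a real file, read the file's contents and pass them to `extract_episodes` in place of `SAMPLE_RSS`. Plain string methods tend to break on CDATA sections and attribute ordering, which is why a real parser is usually the safer choice.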
The feedparser module is so nice and easy to use. It is specifically designed to parse RSS XML files. Below is the script I wrote, which downloads the mp3 files and makes a text file containing each mp3's description. It stores each mp3 and its description file in the same directory.
Here is the link to feedparser:
http://feedparser.org/
[php]
#!/usr/bin/env python
#
# This program downloads mp3 files and their descriptions based on an
# RSS feed file (XML).
#
import os
import sys
import urllib.request

import feedparser


# Parse the feed object and return a list of tuples, one per item:
# (title, link, description). The title is cleaned so it can be used
# as a directory and file name.
def parse_rss(feed_obj):
    rss_info = []
    for entry in feed_obj.entries:
        title = name_cleaner(entry.title)
        link = entry.link
        description = entry.description
        rss_info.append((title, link, description))
    return rss_info


# Replace "/" with "-" and spaces with underscores, and lowercase the
# result, so the name is safe to use for files and directories.
def name_cleaner(cleanfile):
    newfile = cleanfile.replace('/', '-')
    newfile = newfile.replace(' ', '_')
    newfile = newfile.lower()
    return newfile


def main():
    # The first command-line argument is the location of the XML file.
    rss_file = sys.argv[1]
    feed_obj = feedparser.parse(rss_file)
    for title, link, description in parse_rss(feed_obj):
        # One directory per item, holding the mp3 and an INFO text file.
        os.mkdir(title, 0o775)
        mp3_path = "%s/%s.mp3" % (title, title)
        urllib.request.urlretrieve(link, mp3_path)
        output_file = "%s/INFO" % title
        with open(output_file, 'w') as text_file:
            text_file.write(description + "\n")


main()[/php]
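One thing to watch with the script above: `os.mkdir` raises an error if the directory already exists, so running it a second time on the same feed stops at the first item. A possible variation, sketched here with only the standard library, creates the directory if needed and skips mp3 files that were already fetched. The function name `save_episode` and its `fetch` parameter are hypothetical, added so the download step can be swapped out when trying the code without a network connection.

```python
import os
import tempfile
import urllib.request


def save_episode(base_dir, name, url, description,
                 fetch=urllib.request.urlretrieve):
    """Create base_dir/name/, download url into it, and write an INFO file.

    `fetch` is any callable taking (url, path); it defaults to
    urllib.request.urlretrieve and is only a parameter for illustration.
    """
    episode_dir = os.path.join(base_dir, name)
    os.makedirs(episode_dir, exist_ok=True)  # no error if it already exists
    mp3_path = os.path.join(episode_dir, name + ".mp3")
    if not os.path.exists(mp3_path):         # skip files fetched earlier
        fetch(url, mp3_path)
    with open(os.path.join(episode_dir, "INFO"), "w") as info:
        info.write(description + "\n")
    return mp3_path


if __name__ == "__main__":
    # Demo with a stand-in download function, so nothing is fetched.
    def fake_fetch(url, path):
        open(path, "wb").close()

    with tempfile.TemporaryDirectory() as tmp:
        print(save_episode(tmp, "episode_1", "http://example.com/ep1.mp3",
                           "First episode.", fetch=fake_fetch))
```

In the original script this would replace the body of the loop in `main()`, called with the cleaned title, the link, and the description of each item.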