| ||
| parsing xml I feel kind of dumb. I have been reading the Python docs all day and feel like I have not absorbed anything. I would like to parse info out of an XML document. Here is a URL to a sample doc: http://freetalklive.com/netcast.xml Let's say I want to parse out the URLs of the mp3s, and also the descriptions of the shows. I have been reading about xml.sax and xml.dom.minidom. They both seem like they may have the tools to do what I want, but I am kind of lost. I have also been trying to use regular string methods to do it. Any suggestions, or a good tutorial with the beginner in mind, would be appreciated. |
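For the minidom route mentioned in the question, a minimal Python 3 sketch might look like the following. It assumes the feed uses the usual RSS 2.0 podcast layout, where each `<item>` has a `<description>` and an `<enclosure url="...">` pointing at the mp3. The function name `mp3s_and_descriptions` and the sample feed are made up for illustration and are not taken from the real netcast.xml.

```python
from xml.dom import minidom


def mp3s_and_descriptions(xml_text):
    """Return a list of (mp3_url, description) tuples, one per <item>."""
    doc = minidom.parseString(xml_text)
    results = []
    for item in doc.getElementsByTagName("item"):
        # Assumption: the mp3 address lives in <enclosure url="...">.
        enclosures = item.getElementsByTagName("enclosure")
        url = enclosures[0].getAttribute("url") if enclosures else None

        # Join the text (or CDATA) children of the first <description>.
        text = ""
        descriptions = item.getElementsByTagName("description")
        if descriptions:
            text = "".join(
                node.data
                for node in descriptions[0].childNodes
                if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE)
            )
        results.append((url, text.strip()))
    return results


# Made-up sample data in the assumed RSS 2.0 shape.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example show</title>
    <item>
      <title>Episode 1</title>
      <description>First episode</description>
      <enclosure url="http://example.com/ep1.mp3" type="audio/mpeg" length="123"/>
    </item>
    <item>
      <title>Episode 2</title>
      <description><![CDATA[Second <b>episode</b>]]></description>
      <enclosure url="http://example.com/ep2.mp3" type="audio/mpeg" length="456"/>
    </item>
  </channel>
</rss>"""

for url, description in mp3s_and_descriptions(SAMPLE_FEED):
    print(url, "->", description)
```

To try it on a real feed, the document could be fetched with `urllib.request.urlopen(...).read()` and the resulting bytes passed to the same function. Whether the real feed actually uses `<enclosure>` for its mp3 links would need checking first.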
| ||
| Re: parsing xml I have found some ways to do this, both with feedparser.py and cElementTree.py. I would greatly appreciate any examples of how to do this with xml.sax. The xml.sax docs just plain dumbfounded me; it would be nice to make some sense out of them. |
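Since an xml.sax example was requested, here is a rough Python 3 sketch of how it could go. With SAX you write a handler class, and the parser calls its methods as it meets each opening tag, chunk of text, and closing tag, so the handler has to remember where it is in the document. The sketch assumes a standard RSS 2.0 podcast layout (`<item>` containing `<description>` and `<enclosure url="...">`). The names `PodcastHandler` and `parse_feed` and the sample XML are invented for illustration.

```python
import xml.sax


class PodcastHandler(xml.sax.ContentHandler):
    """Collects (mp3_url, description) pairs from the <item> elements."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.items = []              # finished (url, description) tuples
        self.in_item = False         # True while inside an <item>
        self.in_description = False  # True while inside an item's <description>
        self.url = None
        self.text = []               # characters() may arrive in several chunks

    def startElement(self, name, attrs):
        if name == "item":
            self.in_item = True
            self.url = None
            self.text = []
        elif self.in_item and name == "enclosure":
            # Assumption: the mp3 address is the enclosure's url attribute.
            self.url = attrs.get("url")
        elif self.in_item and name == "description":
            self.in_description = True

    def characters(self, content):
        if self.in_description:
            self.text.append(content)

    def endElement(self, name):
        if name == "description":
            self.in_description = False
        elif name == "item":
            self.items.append((self.url, "".join(self.text).strip()))
            self.in_item = False


def parse_feed(xml_text):
    """Parse an RSS document held in a string and return the collected items."""
    handler = PodcastHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.items


# Made-up sample data in the assumed RSS 2.0 shape.
SAMPLE = """<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Example show</title>
    <description>Channel-level text, ignored by the handler</description>
    <item>
      <title>Episode 1</title>
      <description>First episode</description>
      <enclosure url="http://example.com/ep1.mp3" type="audio/mpeg" length="123"/>
    </item>
  </channel>
</rss>"""

for url, description in parse_feed(SAMPLE):
    print(url, description)
```

The `in_item` flag is there because the channel itself also has a `<description>`, and without the flag that text would be mixed in with the show descriptions.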
| ||
| Re: parsing xml The feedparser module is nice and easy to use. It is specifically designed to parse RSS XML files. Below is the script I wrote; it downloads the mp3 files and makes a text file containing each mp3's description. It stores each mp3 and its description file in the same directory. Here is the link to feedparser: http://feedparser.org/
[php]
#!/usr/bin/env python
#
# This program downloads mp3 files and descriptions based on an RSS feed file (XML).
# Requires Python 3 and the feedparser module.
#
import os
import sys
import urllib.request

import feedparser


# Replaces "/" with "-" and spaces with underscores in directory and file
# names, and makes the whole name lowercase.
def name_cleaner(cleanfile):
    newfile = cleanfile.replace('/', '-')
    newfile = newfile.replace(' ', '_')
    return newfile.lower()


# Parses the feed object and returns a list of tuples, one per item:
# (title, link, description).
def parse_rss(feed_obj):
    rss_info = []
    for entry in feed_obj.entries:
        title = name_cleaner(entry.title)
        rss_info.append((title, entry.link, entry.description))
    return rss_info


# Main function: takes the location of the XML file as a command-line argument.
def main():
    rss_file = sys.argv[1]
    feed_obj = feedparser.parse(rss_file)
    for title, link, description in parse_rss(feed_obj):
        # One directory per item, holding the mp3 and its INFO file.
        os.mkdir(title, 0o775)
        mp3_path = "%s/%s.mp3" % (title, title)
        urllib.request.urlretrieve(link, mp3_path)
        with open("%s/INFO" % title, 'w') as text_file:
            text_file.write(description + "\n")


if __name__ == "__main__":
    main()
[/php] |
| ||
| Re: parsing xml How can I integrate the Apple Pie Parser with Python? Any suggestions, or a good tutorial on working with parsed text, would be appreciated. |
| ||
| Re: parsing xml Please start a new thread! Hijacking old threads is kind of rude. |