Hi,
I'm making a sort of RSS reader.
I'm using the Universal Feed Parser to do this.
The feed is just a list of TV shows and the dates they air.
I can fetch the feed fine; what I'm trying to do now is split it up into chunks that I can manipulate.
I'll give you the code bit by bit so as not to confuse too much (and I do apologize for such messy code):

def episode_info (feed_url): #get episode information, returns a dictionary
	"""This is a function that is fed an xml feed and returns a dictionary
	holding episode titles, content, numbers, and dates"""

	d = feedparser.parse(feed_url)

	entry_number = len(d['entries']) - 1

	episode_name = []
	episode_content = []
	while entry_number != -1: #add entries to episode_name and episode_content
		episode_name.append (d.entries[entry_number].title)
		episode_content.append (d.entries[entry_number].description)
		entry_number = entry_number - 1
	episode_name.reverse()

This first part just takes the feed and splits it in two: episode_name holds the title of each entry, and episode_content holds the rest (the description).
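
A minimal sketch of the same split written as a plain for loop. It assumes each entry can be read like a dictionary with 'title' and 'description' keys; split_entries and the sample data are made-up names for illustration, not part of the original program. Walking the entries front to back keeps both lists in feed order, so no reverse() call is needed.

```python
def split_entries(entries):
    """Return (titles, descriptions), both in feed order."""
    titles = []
    descriptions = []
    for entry in entries:
        titles.append(entry["title"])
        descriptions.append(entry["description"])
    return titles, descriptions


# Hypothetical stand-in for the parsed feed's entries.
sample_entries = [
    {"title": "Show A", "description": "Summary A<br />Season 1, Episode 1"},
    {"title": "Show B", "description": "Summary B<br />Season 1, Episode 2"},
]

titles, descriptions = split_entries(sample_entries)
print(titles)
```

Since both lists are filled in the same pass, position i in one always refers to the same entry as position i in the other.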

Here's where it gets tricky:

	content = []
	episode_number = []
	episode_date = []
	entry_number = len(episode_content)-1
	while entry_number != -1: # splits content into a list with three entries, the summary, and the episode number and the date
		if '<br />' in episode_content[entry_number]:
			index1 = episode_content[entry_number].index('<br />')
			content.append (episode_content[entry_number][0:index1])			
		else:
			print 'there is no index1'
		if '–' in episode_content[entry_number]:
			index2 = episode_content[entry_number].index('–')
			episode_number.append (episode_content[entry_number][index1+6:index2])
			if '<p><sub><i>' in episode_content[entry_number]:
				index3 = episode_content[entry_number].index('<p><sub><i>')
				epdate = episode_content[entry_number][index2+13:index3].replace('/', '-')
				epdate = epdate.split('-')
				epdate = epdate[2] + '-' + epdate[0] + '-' + epdate[1]
				epdate = epdate.replace(' ', '')
				episode_date.append (epdate)
		else:
			episode_number.append (episode_content[entry_number][index1+6:])
			episode_date.append (0)
		
		entry_number = entry_number - 1

So the first part to be split off is the actual content, which is separated from the rest by '<br />'. I'm using if/else here because I don't know what else to use. All the feeds I use have <br />, but I put the if in there just in case, and it seems to work fine.
Then there's the second part, which is separated by '–'. Between the '<br />' and the '–' is a small bit of text, normally something like "Season 2, Episode 5".
After that comes the date, in the format ' – Aired: 12/22/2009' or 'Airs: 5/10/2010'. I realise I have to adjust the code depending on whether it's Airs or Aired. I'll do that later.
Here's the problem: sometimes the "Airs/Aired" date isn't there. If the date is there, I want it added to the dictionary (as it should be doing already). If there's no date, I want to either skip that entry or mark its date as 0 in the dictionary, so I can test for a date later in the program. I can't figure out why this part isn't working.
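
A sketch of one way to make the date optional, assuming a description looks like 'summary<br />Season 2, Episode 5 – Aired: 12/22/2009<p><sub><i>...'. The names parse_description and DATE_RE are hypothetical, the sketch targets Python 3, and it returns None rather than 0 when a piece is missing. Matching both "Airs" and "Aired" with a regular expression avoids relying on a fixed character offset after the dash.

```python
import re
from datetime import datetime

# Matches "Airs: 5/10/2010" as well as "Aired: 12/22/2009".
DATE_RE = re.compile(r"(?:Airs|Aired):\s*(\d{1,2}/\d{1,2}/\d{4})")


def parse_description(description):
    """Return (summary, episode_number, date); missing parts are None."""
    summary, separator, rest = description.partition("<br />")
    summary = summary.strip()
    if not separator:
        # No '<br />' at all: treat the whole text as the summary.
        return summary, None, None

    # Drop the trailing '<p><sub><i>' footer when there is one.
    rest = rest.split("<p><sub><i>")[0]

    # Text before the dash is the episode number; the date, if any, follows.
    number, _, tail = rest.partition("–")
    date = None
    match = DATE_RE.search(tail)
    if match:
        parsed = datetime.strptime(match.group(1), "%m/%d/%Y")
        date = parsed.strftime("%Y-%m-%d")  # zero-padded year-month-day
    return summary, number.strip() or None, date
```

Because every call returns exactly three values, lists built from these results stay the same length even when an entry has no date.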
The program finishes off by returning all the values gathered in a dictionary:

	entry_number = len(episode_content) - 1
	episodes = {}
	while entry_number != -1: # put it all in a dictionary: episodes
		if episode_date[entry_number] == 0:
			print 'There is no date, therefore the episode cannot be added to the calendar'
			break
		else:
			episodes[entry_number] = [episode_name[entry_number],content[entry_number], episode_number[entry_number], episode_date[entry_number]]
			entry_number = entry_number - 1
	
	return episodes
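
One possible way to build the dictionary so that a missing date skips just that entry instead of ending the loop. This is a sketch, not a drop-in fix: build_episodes is a hypothetical helper, and it assumes the four lists are the same length and in the same order, with 0 (or None) marking a missing date.

```python
def build_episodes(names, summaries, numbers, dates):
    """Map each entry's position to [name, summary, number, date]."""
    episodes = {}
    records = zip(names, summaries, numbers, dates)
    for index, (name, summary, number, date) in enumerate(records):
        if not date:
            # 0 or None marks a missing date: skip this entry only.
            print("No date for %r, skipping it" % name)
            continue
        episodes[index] = [name, summary, number, date]
    return episodes


# Made-up sample data; the second entry has no date.
episodes = build_episodes(
    ["Pilot", "Second", "Third"],
    ["Summary 1", "Summary 2", "Summary 3"],
    ["Season 1, Episode 1", "Season 1, Episode 2", "Season 1, Episode 3"],
    ["2009-12-22", 0, "2010-05-10"],
)
```

Skipped entries leave gaps in the keys, so code that reads the dictionary should not assume every index is present.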

The error I get at the moment is:

Traceback (most recent call last):
  File "rss.py", line 137, in <module>
    print fringe[0]
KeyError: 0

Not too sure what's happening there. Here's the code as a whole to give you a better idea of what's going on:

#!/usr/bin/env python

import feedparser

def episode_info (feed_url): #get episode information, returns a dictionary
	"""This is a function that is fed an xml feed and returns a dictionary
	holding episode titles, content, numbers, and dates"""

	d = feedparser.parse(feed_url)

	entry_number = len(d['entries']) - 1

	episode_name = []
	episode_content = []
	while entry_number != -1: #add entries to episode_name and episode_content
		episode_name.append (d.entries[entry_number].title)
		episode_content.append (d.entries[entry_number].description)
		entry_number = entry_number - 1
	episode_name.reverse()

	content = []
	episode_number = []
	episode_date = []
	entry_number = len(episode_content)-1
	while entry_number != -1: # splits content into a list with three entries, the summary, and the episode number and the date
		if '<br />' in episode_content[entry_number]:
			index1 = episode_content[entry_number].index('<br />')
			content.append (episode_content[entry_number][0:index1])			
		else:
			print 'there is no index1'
		if '–' in episode_content[entry_number]:
			index2 = episode_content[entry_number].index('–')
			episode_number.append (episode_content[entry_number][index1+6:index2])
			if '<p><sub><i>' in episode_content[entry_number]:
				index3 = episode_content[entry_number].index('<p><sub><i>')
				epdate = episode_content[entry_number][index2+13:index3].replace('/', '-')
				epdate = epdate.split('-')
				epdate = epdate[2] + '-' + epdate[0] + '-' + epdate[1]
				epdate = epdate.replace(' ', '')
				episode_date.append (epdate)
		else:
			episode_number.append (episode_content[entry_number][index1+6:])
			episode_date.append (0)
		
		entry_number = entry_number - 1


	entry_number = len(episode_content) - 1
	episodes = {}
	while entry_number != -1: # put it all in a dictionary: episodes
		if episode_date[entry_number] == 0:
			print 'There is no date, therefore the episode cannot be added to the calendar'
			break
		else:
			episodes[entry_number] = [episode_name[entry_number],content[entry_number], episode_number[entry_number], episode_date[entry_number]]
			entry_number = entry_number - 1
	
	return episodes


fringe = episode_info('http://feed43.com/lietome_timetable.xml')
print fringe[0]

Again, sorry about the messiness of the code, and if there are any questions I can answer to help solve the problem, I'll try my best to answer.
Thank you.

All 2 Replies

Change the last line to:

print list(fringe.keys())[0]

Seems to have done it. Thanks.
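
A note on why the KeyError can appear, going by the code posted in the question: the final loop counts down from the last entry and stops at the first one whose date is 0, so the lower keys, including 0, may never be added. list(fringe.keys())[0] avoids the error by printing whichever key comes first rather than the episode stored under it, and it would raise an IndexError on an empty dictionary. Below is a small sketch of more forgiving lookups; first_episode and the sample dictionary are made up for illustration.

```python
def first_episode(episodes):
    """Return the entry stored under the lowest key, or None if empty."""
    if not episodes:
        return None
    return episodes[min(episodes)]


# Hypothetical result in which entry 0 was never added.
fringe = {
    2: ["Episode C", "Summary C", "Season 1, Episode 3", "2010-05-10"],
    1: ["Episode B", "Summary B", "Season 1, Episode 2", "2009-12-22"],
}

print(fringe.get(0))          # None instead of a KeyError
print(first_episode(fringe))  # the entry stored under key 1
```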
