Hi there, I have a data set (http://app.lms.unimelb.edu.au/bbcswebdav/courses/600151_2008_1/datasets/entertainment/moviestats_large.csv), and I was wondering about the best way to tackle the question stated below:
Which director is the most productive?
I am new to Python and wondering if anyone could help me out a little with this problem. I have considered averaging the total earnings for each director over the number of movies they directed, but I am unsure how to go about that. Any help would be greatly appreciated.
Thanks

I can't see your data, so I can only guess at what it might contain. But I would think that to find the "most productive" director, you'd want to find the average for each director, and sort these averages. Your best bet would probably be a dictionary tying the directors and their averages, and sorting the averages. (It's a lot easier if you use your averages as the dictionary keys).
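As a rough sketch of that idea, and only a sketch since the real file isn't visible: the snippet below groups grosses by director, averages them, and sorts by the average. The director names and numbers are placeholders, not values from the data set. It keys the dictionary by director rather than by average, which is one way to avoid two directors with the same average overwriting each other.

```python
from collections import defaultdict

# Placeholder (director, gross) pairs; real values would come from the CSV file
movies = [
    ("Director A", 100),
    ("Director B", 250),
    ("Director C", 50),
    ("Director A", 300),
    ("Director C", 150),
    ("Director C", 100),
]

# Group the grosses by director
grosses = defaultdict(list)
for director, gross in movies:
    grosses[director].append(gross)

# Average gross per director (float() keeps the division exact in Python 2 as well)
averages = {d: sum(g) / float(len(g)) for d, g in grosses.items()}

# Sort the (director, average) pairs by average, highest first
ranking = sorted(averages.items(), key=lambda item: item[1], reverse=True)
for director, avg in ranking:
    print(director, avg)
```

The first entry of ranking is then the director with the highest average.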

Can you give us just a short subset of the .csv file? We are not allowed access to the page. This way we can help you.

An example of a small section of the data in the CSV file (copied and pasted from Excel):

name rating studio genre actor1 actor2 actor3 writer director date origin Washington Post Chicago Sun-Times The New York Times LA Weekly Los Angeles Times Rolling Stone Wall Street Journal Entertainment Weekly Empire Variety Salon.com The Onion (A.V. Club) TV Guide Slate metascore numReviews Worldwide Gross
Titanic PG Paramount Pictures Romance Leonardo DiCaprio Kate Winslet Billy Zane James Cameron James Cameron 1999 USA 60 100 100 30 100 90 20 90 70 50 74 11 1835300000


The Lord of the Rings: The Return of the King PG New Line Cinema Action Elijah Wood Sean Astin Ian McKellen Frances Walsh Peter Jackson 2004 USA / New Zealand 90 88 100 100 100 88 100 100 100 100 90 90 70 100 94 15 1129219252


Pirates of the Caribbean: Dead Man's Chest PG Buena Vista Pictures / Walt Disney Studios Action Johnny Depp Orlando Bloom Keira Knightley Ted Elliott Gore Verbinski 2006 USA 50 50 40 50 75 60 33 60 50 60 58 63 50 53 14 1060332628


Harry Potter and the Sorcerer's Stone PG Warner Bros. Fantasy Daniel Radcliffe Rupert Grint Emma Watson J.K. Rowling Chris Columbus 2002 UK / USA 50 100 40 40 80 70 60 75 90 60 70 50 64 13 968657891


Pirates of the Caribbean: At World's End PG Buena Vista Pictures Action Johnny Depp Geoffrey Rush Orlando Bloom Ted Elliott Gore Verbinski 2007 USA 70 70 70 50 40 50 60 60 40 42 63 60 50 13 958404152


Harry Potter and the Order of the Phoenix PG Warner Bros. Adventure Daniel Radcliffe Rupert Grint Emma Watson J.K. Rowling David Yates 2007 UK / USA 80 63 70 50 88 60 83 80 70 70 67 63 70 71 14 937000866


Star Wars: Episode I - The Phantom Menace PG 20th Century Fox Film Corporation Sci-fi Liam Neeson Ewan McGregor Natalie Portman George Lucas George Lucas 2001 USA 40 88 80 60 50 67 60 50 70 60 20 52 12 922379000


The Lord of the Rings: The Two Towers PG New Line Cinema Fantasy Elijah Wood Ian McKellen Viggo Mortensen Frances Walsh Peter Jackson 2003 USA / New Zealand 100 75 90 70 70 75 100 75 90 90 100 60 70 88 14 921600000


Jurassic Park PG Universal Pictures Suspense/Thriller Sam Neill Laura Dern Jeff Goldblum Michael Crichton Steven Spielberg 2001 USA 60 75 70 50 88 91 100 70 70 68 10 919700000


Harry Potter and the Goblet of Fire PG Warner Bros. Adventure Daniel Radcliffe Emma Watson Rupert Grint Steven Kloves Mike Newell 2006 UK / USA 80 88 80 60 90 75 90 67 60 90 90 80 75 100 81 15 892194397


Spider-Man 3 PG Columbia Pictures / Sony Pictures Releasing Action Tobey Maguire Kirsten Dunst James Franco Sam Raimi Sam Raimi 2007 USA 30 50 50 40 50 75 60 67 60 50 80 67 75 70 59 15 885430303


Shrek 2 PG DreamWorks Distribution LLC Action Mike Myers Eddie Murphy Cameron Diaz J. David Stem Andrew Adamson 2004 USA 40 75 70 90 80 88 90 91 80 90 80 60 70 90 75 15 880871036


Harry Potter and the Chamber of Secrets PG Warner Bros. Fantasy Daniel Radcliffe Emma Watson Rupert Grint Steven Kloves Chris Columbus 2003 USA 30 100 60 70 60 80 83 80 50 60 70 40 63 13 866300000


Finding Nemo G Walt Disney Pictures Family/Kids Albert Brooks Alexander Gould Ellen DeGeneres Andrew Stanton Andrew Stanton 2003 USA 90 100 90 100 80 88 90 100 80 70 80 70 90 89 14 865000000

Star Wars: Episode III - Revenge of the Sith PG Twentieth Century Fox Film Corp. Action Ewan McGregor Natalie Portman Hayden Christensen George Lucas George Lucas 2005 USA 50 88 90 90 70 50 60 67 80 90 40 70 70 60 68 15 848462555

The worldwide gross per movie may give some indication of productivity. I'm no accountant, but I would think gross minus production costs would be a better one.

It's impossible to parse the data you posted, because it is plain text without commas. The number of data items does not match the number of header items in all cases.

Assuming you can read the data into a list of dictionaries (header items would be the dict keys) and a header list, the following will produce a list of the data required to calculate the average gross.

# outputList is a list of dictionaries
# each dictionary represents one movie
# items in headerList are the keys in each dictionary
# 'director' is index 8
# 'Worldwide Gross' is index -1
# movie 'name' is index 0
director_dict = {}
for movie in outputList:
    obj = director_dict.setdefault(movie[headerList[8]], [[], []])
    obj[0].append(movie[headerList[0]])
    obj[1].append(int(movie[headerList[-1]]))

# Calculate average 'Worldwide Gross' for each director
resultList = []
# .items() and // behave the same way in Python 2 and Python 3
for director, value in director_dict.items():
    resultList.append([sum(value[1]) // len(value[1]), len(value[1]), director])

Example element of resultList:

[1835300000, 1, 'James Cameron']

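Each element holds the average gross, the number of movies, and the director. From there it is straightforward to sort the list and find the director with the highest average.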
HTH
-BV
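A minimal sketch of ranking a list in that [average, count, director] layout. The three entries are worked out by hand from the sample rows posted earlier in the thread, so they say nothing about the full file, and the names by_average and by_count are just illustrative.

```python
# resultList in the [average_gross, movie_count, director] layout,
# computed by hand from the sample rows posted in this thread
resultList = [
    [1009368390, 2, 'Gore Verbinski'],
    [1835300000, 1, 'James Cameron'],
    [1025409626, 2, 'Peter Jackson'],
]

# Highest average gross first
by_average = sorted(resultList, key=lambda r: r[0], reverse=True)

# Alternative reading of "productive": most movies first
by_count = sorted(resultList, key=lambda r: r[1], reverse=True)

top = by_average[0]
print("%s: average gross %d over %d movie(s)" % (top[2], top[0], top[1]))
```

Which ordering answers "most productive" depends on how you define it: a director with one huge film tops the average but sits last by movie count.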

Thanks for the effort. It has helped in some respects, but the file is a CSV (comma-separated values) file. The above information was copied and pasted directly from Microsoft Excel and therefore does not include the commas.

###This is the first row in Microsoft Excel###
name, rating, studio, genre, actor1, actor2, actor3, writer, director, date, origin, Washington Post, Chicago Sun-Times, The New York Times, LA Weekly, Los Angeles Times, Rolling Stone, Wall Street Journal, Entertainment Weekly, Empire, Variety, Salon.com, The Onion (A.V. Club), TV Guide, Slate, metascore, numReviews, Worldwide Gross

###second row###
Titanic, PG, Paramount Pictures, Romance Leonardo, DiCaprio, Kate Winslet, Billy Zane, James Cameron, James Cameron, 1999, USA, 60, 100, 100, 30, 100, 90, 20, 90, 70, 50, 74, 11, 1835300000

###third row###
The Lord of the Rings: The Return of the King, PG, New Line Cinema Action, Elijah Wood, Sean Astin, Ian McKellen, Frances Walsh, Peter Jackson, 2004, USA / New Zealand, 90, 88, 100, 100, 100, 88, 100, 100, 100, 100, 90, 90, 70, 100, 94, 15, 1129219252

###fourth row###
Pirates of the Caribbean: Dead Man's Chest, PG, Buena Vista Pictures / Walt Disney Studios, Action, Johnny Depp, Orlando Bloom, Keira Knightley, Ted Elliott, Gore Verbinski, 2006, USA, 50, 50, 40, 50, 75, 60, 33, 60, 50, 60, 58, 63, 50, 53, 14, 1060332628

###fifth row###
Harry Potter and the Sorcerer's Stone, PG, Warner Bros., Fantasy, Daniel Radcliffe, Rupert Grint, Emma Watson, J.K. Rowling, Chris Columbus, 2002, UK / USA, 50, 100, 40, 40, 80, 70, 60, 75, 90, 60, 70, 50, 64, 13, 968657891

###sixth row###
Pirates of the Caribbean: At World's End, PG, Buena Vista Pictures, Action, Johnny Depp, Geoffrey Rush, Orlando Bloom, Ted Elliott, Gore Verbinski, 2007, USA, 70, 70, 70, 50, 40, 50, 60, 60, 40, 42, 63, 60, 50, 13, 958404152

###seventh row###
Harry Potter and the Order of the Phoenix, PG, Warner Bros., Adventure, Daniel Radcliffe, Rupert Grint, Emma Watson, J.K. Rowling, David Yates, 2007, UK / USA, 80, 63, 70, 50, 88, 60, 83, 80, 70, 70, 67, 63, 70, 71, 14, 937000866

###eighth row###
Star Wars: Episode I - The Phantom Menace, PG, 20th Century Fox Film Corporation, Sci-fi, Liam Neeson, Ewan McGregor, Natalie Portman, George Lucas, George Lucas, 2001, USA, 40, 88, 80, 60, 50, 67, 60, 50, 70, 60, 20, 52, 12, 922379000


As I said previously, this is only a small sample from the CSV file. Any other help would be greatly appreciated.
Many Thanks.

sch009,

I was hoping you would make an effort at coding this yourself. You can parse the data with csv.reader(), and as mentioned in my earlier post, construct a dictionary from the header and data. A problem still exists: the number of data items does not match the number of header items in all cases. You should have blank fields to match the number of header items.
Example:
Titanic,PG,Paramount Pictures,Romance,Leonardo DiCaprio,Kate Winslet,Billy Zane,James Cameron,James Cameron,1999,USA,60,100,100,30,100,90,20,90,70,50,,,,,74,11,1835300000

The following will create a dictionary from the CSV file and pad the data as required.

import csv

fn = 'movies.csv'

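# blank fields go just before the last three items
# (metascore, numReviews, Worldwide Gross), as in the example row above
insert_pos = -3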

f = open(fn)
reader = csv.reader(f)
headerList = next(reader)  # next() works in Python 2.6+ and Python 3
outputList = []
n = len(headerList)

for line in reader:
    # test for blank lines
    if line:
        diff = n-len(line)
        if diff > 0:
            for i in range(diff):
                line.insert(insert_pos,'')
        dd = {}
        for i, key in enumerate(headerList):
            dd[key]=line[i]
        
        outputList.append(dd)

f.close()

From here, the code in my earlier post will create a list of results.
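If only the director and the gross are needed, another option is to skip the padding and read those two fields by position. This is just a sketch and it assumes that any missing fields fall among the review columns, so 'director' stays at index 8 and 'Worldwide Gross' stays last. The sample text is three of the rows posted above with proper commas; DIRECTOR_COL, GROSS_COL and totals are names made up for the example.

```python
import csv
import io

# Header plus three sample rows from this thread; the rows are shorter than
# the header because some review scores are missing
sample = u"""name,rating,studio,genre,actor1,actor2,actor3,writer,director,date,origin,Washington Post,Chicago Sun-Times,The New York Times,LA Weekly,Los Angeles Times,Rolling Stone,Wall Street Journal,Entertainment Weekly,Empire,Variety,Salon.com,The Onion (A.V. Club),TV Guide,Slate,metascore,numReviews,Worldwide Gross
Titanic,PG,Paramount Pictures,Romance,Leonardo DiCaprio,Kate Winslet,Billy Zane,James Cameron,James Cameron,1999,USA,60,100,100,30,100,90,20,90,70,50,74,11,1835300000
Pirates of the Caribbean: Dead Man's Chest,PG,Buena Vista Pictures / Walt Disney Studios,Action,Johnny Depp,Orlando Bloom,Keira Knightley,Ted Elliott,Gore Verbinski,2006,USA,50,50,40,50,75,60,33,60,50,60,58,63,50,53,14,1060332628
Pirates of the Caribbean: At World's End,PG,Buena Vista Pictures,Action,Johnny Depp,Geoffrey Rush,Orlando Bloom,Ted Elliott,Gore Verbinski,2007,USA,70,70,70,50,40,50,60,60,40,42,63,60,50,13,958404152
"""

DIRECTOR_COL = 8   # assumed position of 'director', counted from the left
GROSS_COL = -1     # assumed: 'Worldwide Gross' is always the last field

reader = csv.reader(io.StringIO(sample))
header = next(reader)

# Collect the list of grosses for each director
totals = {}
for row in reader:
    if not row:
        continue  # skip blank lines
    totals.setdefault(row[DIRECTOR_COL], []).append(int(row[GROSS_COL]))

for director, grosses in sorted(totals.items()):
    print(director, len(grosses), sum(grosses))
```

For the real file, io.StringIO(sample) would be replaced by the opened CSV file. The padding approach is still the one to use if you need the individual review scores lined up with their headers.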
