Hello everyone, I'm new to this website and to programming in general. I'm taking a class and we're starting with Python. I'm doing a programming project in which I need to do data mining for a company. I'm stuck trying to get the company's monthly average for each month. I can get one month's average, but not every month of every year. Can anyone help me with what I did wrong?

Here's the program I have so far:

def getDataList(fileName):
    dataFile = open(fileName, "r")
    dataList = []
    for line in dataFile:
        dataList.append(line.strip().split(','))
    return dataList

def getmonthavg(datalist):
    d= datalist
    monthlyaveragelist= []
    tv=0
    vta=0
    month= d[0][0][5:7]
    year= d[0][0][0:4]
    for i in d:
        v= float(i[5])
        c= float(i[6])
        if i[0][5:7] == month and i[0][0:4] == year:
            vtimesc= v*c
            tv= tv + v
            vta= vta + vtimesc
    ap= vta/tv
    aptuple= (ap,i[0][0:7])
    monthlyaveragelist.append(aptuple)
    return monthlyaveragelist

b = getDataList('table.csv')
b.pop(0)
c=getmonthavg(b)
print c

## All 17 Replies

What are you trying to get the monthly average of? Can you explain your problem a bit better? It's unclear what problem you are having.

I'm in the same class and I'm having trouble too. The problem is that we are given a set of data and we have to data mine it. We are given a comma-separated values file with financial information for Google from 2004 to 2008. The file gives the daily volume and adjusted close for those 4 years. In the code, i is one line of the file, i[5] is the volume, and i[6] is the adjusted close. We have to compute the monthly average, which is the sum of the daily (adjusted close*volume) for the month divided by the sum of the volume for the month. We have to get those monthly averages (48 of them), append them to a list, and then find the six best and the six worst averages and print them.

This is the problem:

Data mining stock prices
Data mining is the process of sorting through large amounts of data and picking
out relevant information. It is usually used by business intelligence organizations, and
financial analysts, but is increasingly being used in the sciences to extract information
from the enormous data sets generated by modern experimental and observational
methods.
In this project, we want to do some preliminary data mining to the prices of some
company’s stock. So that we can speak in specifics, let’s look at Google. Your program
will calculate the monthly average prices of Google stock from 2004 to 2008, and tell
us the 6 best and 6 worst months for Google. We provide the data reading function;
you write the next two and a main that calls the functions.
(a) First you need a history of stock prices. Go to finance.yahoo.com, enter Google in
the search field, select “Historical Prices” (currently on the left of the page), and
where your Python program will be saved. The default name is “table.csv” so we
will use that name. The file format is indicated by the first few lines:
2008-09-19,461.00,462.07,443.28,449.15,10006000,449.15
2008-09-18,422.64,439.18,410.50,439.08,8589400,439.08
(b) getDataList(fileName)
The “csv” file is a “comma-separated file”, so we can split the data on commas.
The following function will read a file, split the lines in the file on commas, and
put the data in to a list that is returned. The result is a list of lists where each line
is a list. Also, every item is a string. To read our file, call it using our file name:
getDataList('table.csv'). Experiment with this function in the shell
to get a sense of what is returned.
def getDataList(fileName):
    dataFile = open(fileName,"r")
    dataList = []
    for line in dataFile:
        # strip end-of-line, split on commas, and append items to list
        dataList.append(line.strip().split(','))
    return dataList
(c) getMonthlyAverages(dataList)
In this function, you will use the dataList generated by the getDataList function as
the parameter. Use the Date, Volume, and Adj Close fields to calculate the average
monthly prices. Here is a formula for the average price for a month, where Vi is the
volume and Ci is the day i’s adjusted close price (Adj Close).
averagePrice = (V1*C1 + V2*C2 + ... + Vn*Cn) / (V1 + V2 + ... + Vn)
For each month create a tuple with two items: the average for that month, and the
date (you need only the month and year). Append the tuple for each month to a list
(e.g. monthlyAveragesList), and after calculating all the monthly averages, return
this list. We use tuples here because once these values are calculated we don’t want
to accidentally change them!
(d) printInfo(monthlyAveragesList)
In this function, you need to use the list of monthly averages calculated in the
getMonthlyAverages function. Here you will need to find and print the 6 best
(highest average price) and 6 worst (lowest average price) months for Google’s
stock. Print them in order from highest to lowest and print to 2 decimal places.
Format the output so that it looks good and include informative headers. This
function does not return anything.
(e) If you don’t call these functions, they are useless. Thus, you should write code to
call them.
Hints:
(a) The list sort() and reverse() methods will be useful. Experiment with how
they sort a list of tuples; notice how they sort on the first item.
(b) To create a tuple, put items in a comma-separated list with parentheses: (x,y).
(c) When working with a list-of-lists (or a list-of-tuples), the first item in the first
list is someList[0][0] and the second item in that same first list is
someList[0][1].
(W. F. Punch. The Practice of Computing Using Python. Addison-Wesley/CourseSmart, 02/25/2010. 336 - 337).

This is what I have so far:

def getDataList(fileName):
    dataFile = open(fileName, "r")
    dataList = []
    for line in dataFile:
        # strip end-of-line, split on commas, and append items to list
        dataList.append(line.strip().split(','))
    return dataList

# find monthly average
def getMonthlyAverage(dataList):
    for i in dataList:
        m ='08'
        n='2008'
        monthlyAverageList=[]
        numerator=0
        denominator=1
        month=i[0].split('-')[1]
        year=i[0].split('-')[0]
        vol=float(i[5])
        if m==month
            denominator=denominator + vol # sum of volumes
        elif not m==month:
            print numerator
            average=numerator/(denominator-1) #average
            print average
            monthlyAverageList.append(average)
            m= month #redefine month
            n=year #redefine year
    return monthlyAverageList

def printInfo(monthlyAverageList):
    monthlyAverageList.sort()
    print monthlyAverageList[0:5]
    monthlyAverageList.reverse()
    print monthlyAverageList[0:5]

fileName='table.csv'
dataList=getDataList(fileName)
dataList.pop(0)
monthlyAverageList=getMonthlyAverage(dataList)
#print monthlyAverageList
printInfo(monthlyAverageList)

The if statement on line 21 is missing its colon, and the indentation is off.

I do not see the monthly average being computed anywhere, either as sum of month/count or as weighted sum/sum of weights.

Use the Date, Volume, and Adj Close fields to calculate the average
monthly prices. Here is a formula for the average price for a month, where Vi is the
volume and Ci is the day i’s adjusted close price (Adj Close).
averagePrice = (V1*C1 + V2*C2 + ... + Vn*Cn) / (V1 + V2 + ... + Vn)

So you want to accumulate [close, volume] in a list of lists, and when the month field changes, calculate the formula, which is just a for() loop, looping through the list.
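As a rough illustration of that calculation (a sketch only, not the assignment's official solution), the function below assumes each inner list is stored as [close, volume], in that order. The name weighted_average and the two sample days are made up for this example, and print() is called as a function so the snippet also runs on Python 3.

```python
def weighted_average(close_vol_list):
    """Return sum(close * volume) / sum(volume) for one month's records."""
    numerator = 0.0      # running sum of close * volume
    denominator = 0.0    # running sum of volume
    for close, vol in close_vol_list:
        numerator += close * vol
        denominator += vol
    if denominator == 0:
        # no volume recorded (or an empty list), so the average is undefined
        return None
    return numerator / denominator


# two made-up days: volume 100 at a close of 10.0, volume 300 at a close of 20.0
print(weighted_average([[10.0, 100.0], [20.0, 300.0]]))   # prints 17.5
```

The day with the larger volume pulls the result toward its price, which is the point of weighting by volume.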

For each month create a tuple with two items: the average for that month, and the
date (you need only the month and year). Append the tuple for each month to a list
(e.g. monthlyAveragesList), and after calculating all the monthly averages, return
this list.

So now that we have the calculation result, append it to the totals list with the (previous) month and year, and re-initialize the close+volume list as an empty list, and start appending close, volume to the now empty list until the next change in date.

Start by testing the month (this_month != previous_month) and print when a change occurs. Then add code to append close+volume to the list of lists, and send it to a calculation function when the month changes, etc. I got 76 months of data from Yahoo, so test the length of the totals list and make sure it is 76 (or however much data you actually have). That's enough hints for now. Post back with any code that you are having problems with.

##pseudo_code ---> not even close to being a complete solution
##
if this_month != previous_month:
    ret_tuple = close_vol_calcs(close_vol_list, this_month, year)
    monthly_averages_list.append(ret_tuple)
    close_vol_list = []
    previous_month = this_month
junk_list = []
junk_list.append(close)
junk_list.append(vol)
close_vol_list.append(junk_list)
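The pseudo-code above could be fleshed out roughly as follows. This is a sketch under some assumptions: rows look like the ones getDataList returns (date first, volume at index 5, adjusted close at index 6), the header row has already been removed, rows for the same month are adjacent, and every month has a nonzero total volume. close_vol_calcs is the hypothetical helper named in the pseudo-code; the name monthly_averages and the third sample row are made up.

```python
def close_vol_calcs(close_vol_list, month, year):
    """Return (average, 'YYYY-MM') for one month's [close, volume] pairs."""
    total_cv = sum(close * vol for close, vol in close_vol_list)
    total_vol = sum(vol for close, vol in close_vol_list)
    return (total_cv / total_vol, year + "-" + month)


def monthly_averages(data_list):
    monthly_averages_list = []
    close_vol_list = []
    previous = None                      # (year, month) currently being collected
    for rec in data_list:
        year, this_month = rec[0].split("-")[:2]
        if previous is not None and (year, this_month) != previous:
            # the month changed: finish the month collected so far
            monthly_averages_list.append(
                close_vol_calcs(close_vol_list, previous[1], previous[0]))
            close_vol_list = []
        previous = (year, this_month)
        close_vol_list.append([float(rec[6]), float(rec[5])])
    if close_vol_list:
        # the last month never sees a "change", so it is stored here
        monthly_averages_list.append(
            close_vol_calcs(close_vol_list, previous[1], previous[0]))
    return monthly_averages_list


rows = [
    ["2008-09-19", "461.00", "462.07", "443.28", "449.15", "10006000", "449.15"],
    ["2008-09-18", "422.64", "439.18", "410.50", "439.08", "8589400", "439.08"],
    ["2008-08-29", "1", "1", "1", "1", "100", "50.0"],   # made-up row
]
for average, date in monthly_averages(rows):
    print("%s  %.2f" % (date, average))
```

Comparing (year, month) pairs rather than the month alone keeps, say, 2007-09 and 2008-09 apart even if the data ever skipped the months in between.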

I don't quite understand what you are trying to do with this part:

junk_list = []
junk_list.append(close)
junk_list.append(vol)
close_vol_list.append(junk_list)

However, this is what I have come up with. But now the command window is returning a blank list.

def getDataList(fileName):
    dataFile = open(fileName, "r")
    dataList = []
    for line in dataFile:
        # strip end-of-line, split on commas, and append items to list
        dataList.append(line.strip().split(','))
    return dataList

# find monthly average
def getMonthlyAverage(dataList):
    m ='08'
    n='2008'
    monthlyAverageList=[]
    numerator=[]
    denominator=[]
    for i in dataList:
        month=i[0].split('-')[1]
        year=i[0].split('-')[0]
        #print month!=m and year!=n
        vol=float(i[5])
        #if m==month and n==year:
        denominator.append(vol)     # sum of volumes
        if month!=m and year!=n:
            sumNum=sum(numerator)
            sumDen=sum(denominator)
            average=float(sumNum)/float(sumDen)     #average
            print average
            #a-tuple=(month, year, average)
            # monthlyAverageList.append(a-tuple)
        m=month     #redefine month
        n=year      #redefine year
    return monthlyAverageList

def printInfo(monthlyAverageList):
    monthlyAverageList.sort()
    print 'The 6 best averages for google are:'
    print monthlyAverageList
    monthlyAverageList.reverse()
    print 'The 6 worst averages for google are:'
    print monthlyAverageList

fileName='table.csv'
dataList=getDataList(fileName)
dataList.pop(0)
monthlyAverageList=getMonthlyAverage(dataList)
#print monthlyAverageList
printInfo(monthlyAverageList)

I didn't want to try the tuple part yet because I am still having trouble getting the averages. I have been testing and re-testing it for days, trying to debug that portion of the code.

The first thing you have to do is separate the data by month.

I don't quite understand what you are trying to do with this part:
junk_list = []
junk_list.append(close)
junk_list.append(vol)
close_vol_list.append(junk_list)

I thought, perhaps wrongly, that this code would make it obvious that you are creating a list of lists. Try the following code as an illustration (untested, so there may be typos). But the first thing you have to do is separate the data by month.

def get_monthly_average(dataList):
    ## do 30 records as a test
    data_list_short = dataList[1:31]
    print data_list_short
    close_vol_list = []

    for rec in data_list_short:
        substrs = rec.split(",")
        if len(substrs) > 6:   ## allow for possible malformed records
            ## the following will yield an error for the header record
            close = float(substrs[6].strip())
            vol = float(substrs[5].strip())
            junk_list = []
            junk_list.append(close)
            junk_list.append(vol)
            print "appending junk_list =", junk_list
            close_vol_list.append(junk_list)
            ## or
            ## close_vol_list.append([float(substrs[6].strip()), float(substrs[5].strip())])
    print "-"*30
    print "-"*30
    print close_vol_list

    ## print an example of (close X volume) used in the numerator of
    ## the calculations although this is for all records in the
    ## short list, not for one month
    for rec in close_vol_list:
        print close*vol, close, vol

It is saying that for rec.split() there is no attribute split for an object of type list.

You are doing something wrong. Post your code for the creation of "dataList", and print and post the first 10 or so records.

My mistake. It's a csv file so a list of lists. New code:

def get_monthly_average(dataList):
    ## do 30 records as a test
    data_list_short = dataList[1:31]
    print data_list_short
    close_vol_list = []

    for rec in data_list_short:
        ## the following will yield an error for the header record
        close = float(rec[6])
        vol = float(rec[5])
        junk_list = []
        junk_list.append(close)
        junk_list.append(vol)
        print "appending junk_list =", junk_list
        close_vol_list.append(junk_list)
        ## or
        ## close_vol_list.append([close, vol])
    print "-"*30
    print close_vol_list
    print "-"*30

    ## print an example of close X volume used in the numerator of
    ## the calculations although this is for all records in the
    ## short list, not for one month
    for rec in close_vol_list:
        ## since it was stored as close, volume, [0]=close, [1]=volume
        print rec[0]*rec[1], rec[0], rec[1], rec

Okay, so I applied that to my code and got this:

def getMonthlyAverage(dataList):
    m ='08'
    n='2008'
    monthlyAverageList=[]
    junkList=[]
    close_vol_list=[]
    numerator=[]
    denominator=[]
    for i in dataList:
        month=i[0].split('-')[1]
        year=i[0].split('-')[0]
        close=i[6]
        volume=i[5]
        # if m!=month and n!=year:
            #average=sum(float(close*volume))/sum(volume)
            #a_tuple= (month, year, average)
            #monthlyAverageList.append(a_tuple)
        junkList.append([close,volume])
        close_vol_list.append(junkList)
        print close_vol_list

        m=month                          #redefine month
        n=year                           #redefine year
    return monthlyAverageList

And I do get a list within a list. I understand that. But it is nearly impossible to split a list that big. Python keeps telling me that there are too many values for it to break up.

I'd have to see the complete error message and the line that caused it. Next, check for differences in the month field. Also, you can eliminate splitting the date twice.

for i in dataList:
    date = i[0].split("-")
    month=date[1]
    year=date[0]
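Tuple unpacking is another way to split the date only once. A small sketch, assuming every date string has the YYYY-MM-DD form shown in the sample data; year_and_month is a made-up helper name.

```python
def year_and_month(date_string):
    """Split 'YYYY-MM-DD' once and return (year, month) as strings."""
    year, month, day = date_string.split("-")
    return year, month


print(year_and_month("2008-09-19"))   # prints ('2008', '09')
```

If a date does not have exactly three parts, the unpacking raises a ValueError, which is a quick way to spot a stray header or malformed row.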

We are perhaps seeing this in two different ways. I am storing all of the records for one month and then calculating because I thought it would be easier to understand. You can add close*volume to a field, and add volume to a second field, and then calculate the formula when there is a new month. It is up to you as to how you want to do it.

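for i in dataList:
    ## month, year, close and volume come from the current record i
    if month != previous_month:
        ## the totals so far belong to the previous month, so store them first
        average=close_volume_total/volume_total
        a_tuple= (previous_month, previous_year, average)
        monthlyAverageList.append(a_tuple)
        close_volume_total = 0.0
        volume_total = 0.0
    close_volume_total += close * volume
    volume_total += volume
    previous_month = month
    previous_year = year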
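Here is how the running-totals approach might look as a complete function. It is a sketch, not the only way to do it, and it assumes rows shaped like the ones getDataList returns (date at index 0, volume at index 5, adjusted close at index 6), with the header removed and each month's rows adjacent. The function name and the sample rows are made up; the tuples are stored as (average, date), as the assignment text describes.

```python
def get_monthly_averages(data_list):
    monthly_average_list = []
    close_volume_total = 0.0
    volume_total = 0.0
    previous_key = None                  # 'YYYY-MM' of the month being totalled
    for rec in data_list:
        key = rec[0][0:7]                # 'YYYY-MM' part of the date
        if previous_key is not None and key != previous_key:
            # new month: store the finished month, then zero the totals
            monthly_average_list.append(
                (close_volume_total / volume_total, previous_key))
            close_volume_total = 0.0
            volume_total = 0.0
        close_volume_total += float(rec[6]) * float(rec[5])
        volume_total += float(rec[5])
        previous_key = key
    if previous_key is not None:
        # the final month has no following record to trigger the test above
        monthly_average_list.append(
            (close_volume_total / volume_total, previous_key))
    return monthly_average_list


# made-up rows; only the date, volume and adjusted close columns matter here
sample_rows = [
    ["2008-09-02", "", "", "", "", "100", "10.0"],
    ["2008-09-01", "", "", "", "", "300", "20.0"],
    ["2008-08-29", "", "", "", "", "50", "8.0"],
]
print(get_monthly_averages(sample_rows))
```

Starting previous_key at None avoids dividing by zero on the very first record, and the block after the loop makes sure the last month is not lost.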

Sorry for the delayed responses. I am doing many other things today.

Okay, I will try it. I appreciate the help. I am an engineering student with no previous programming experience, which is why it is so difficult for me to grasp this stuff. It's like learning and writing in another language.

Oh my gosh, it worked. Thank you. Now I just have to figure out how to sort it so that I get the six highest and the six lowest averages. Thank you, thank you, thank you!

def getDataList(fileName):
    dataFile = open(fileName, "r")
    dataList = []
    for line in dataFile:
        # strip end-of-line, split on commas, and append items to list
        dataList.append(line.strip().split(','))
    return dataList

# find monthly average
def getMonthlyAverage(dataList):
    m ='08'
    n='2008'
    monthlyAverageList=[]
    numerator=0.0
    denominator=0.0

    for i in dataList:
        #print i
        month=i[0].split('-')[1]
        year=i[0].split('-')[0]
        close=float(i[6])
        volume=float(i[5])
        if month!=m:
            average=numerator/denominator
            a_tuple=(month, year, average)
            monthlyAverageList.append(a_tuple)

        numerator+=close*volume
        denominator+=volume
        m=month
        n=year
    print monthlyAverageList
    return monthlyAverageList

def printInfo(monthlyAverageList):
    monthlyAverageList.sort()
    print 'The 6 best averages for google are:'
    print monthlyAverageList
    monthlyAverageList.reverse()
    print 'The 6 worst averages for google are:'
    print monthlyAverageList

fileName='table.csv'
dataList=getDataList(fileName)
dataList.pop(0)
monthlyAverageList=getMonthlyAverage(dataList)
#print monthlyAverageList
printInfo(monthlyAverageList)

They are called programming __languages__ for a reason. Double check the arithmetic, as you never zero the numerator or denominator after each month, and make sure the last month of data is in the final list. If it isn't, why not? For the sorting, see the Python Sorting Guide and the example below. Or do you have to do it in a more traditional manner, i.e. sort the list yourself?

import operator

fileName='table.csv'
dataList=getDataList(fileName)
dataList.pop(0)
monthlyAverageList=getMonthlyAverage(dataList)
#print monthlyAverageList
printInfo(monthlyAverageList)

##   assuming you are using Python2.6 or greater
monthlyAverageList.sort(key=operator.itemgetter(-1)) ## sort on last column
print monthlyAverageList

monthlyAverageList.sort(key=operator.itemgetter(-1), reverse=True)
print monthlyAverageList
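To go from a sorted list to the 6 best and 6 worst months, slicing is one option. This is a sketch that assumes the tuples are stored as (average, date), so they sort on the average as the assignment's hint describes; if the average is stored last, as in the code above, the itemgetter(-1) key would be needed instead. best_and_worst and print_info are illustrative names, and the sample averages are made up.

```python
def best_and_worst(monthly_averages_list, count=6):
    """Return (best, worst), each ordered from highest to lowest average."""
    # sorted() returns a new list, so the caller's list is not modified;
    # tuples compare on their first item, which is the average here
    ranked = sorted(monthly_averages_list, reverse=True)
    return ranked[:count], ranked[-count:]


def print_info(monthly_averages_list):
    best, worst = best_and_worst(monthly_averages_list)
    print("Best months (highest average first)")
    for average, date in best:
        print("%s  %.2f" % (date, average))
    print("Worst months (highest average first)")
    for average, date in worst:
        print("%s  %.2f" % (date, average))


# made-up averages for twelve months, just to exercise the functions
sample = [(100.0 + 10 * m, "2008-%02d" % m) for m in range(1, 13)]
print_info(sample)
```

With fewer than twelve months of data the two slices overlap, so the same month can show up in both lists.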

I'm glad that this thread was continuing even without me posting which I'm very sorry for. Been a tad bit busy with exams, work and papers. This is the best excuse I have so far. Also glad to see someone in the same class as me on this website.
