i have a folder that contains csv excel files that have numbers in the first column labeled 'x', second column label'y', and third column labeled'z'. i was trying to take all of the numbers in the x and calculate the averages and std, take all the numbers in the y column for all files and calculate the average and std...the same for the z column.this is what i have so far....but it doesn't seem to work

import csv
import os
from numpy import array


path="A:\\yoyo\\heyy\\folder"
dirList=os.listdir(path)
for file in dirList:
    fullpath=os.path.join(path,file)
##    print fullpath
    with open(fullpath, 'rb') as f:
        nums=[[val for val in line.split(',')] for line in f.readlines()]
##        print line
        anums = array([nums])
        for c in range(anums.shape[1]):
            column = anums[:,c]
            col_mean=column.mean()
            col_std=column.std()

            print col_mean
            print col_std

Recommended Answers

All 11 Replies

You do not do looping but let numpy to do matrix operations, when using numpy.
I think nums should contain numbers instead of strings and that you should not add [] to it for anums but keep same dimensions as nums.

import numpy

a = [1, 2, 34, 54]

mean = float(sum(a))/len(a)
s2 = sum((x - mean)**2  for x in a)/len(a)
standard_deviation = s2**0.5
print('without numpy: mean %f, standard deviation %f' % (mean,standard_deviation))

#better a without []
n = numpy.array(a)
print('with numpy: mean %f, standard deviation %f' % (n.mean(), n.std()))

# three datarows
x, y, z = a, [4, 5, 6, 7], [54, 23, 76, 98]
# transpose data rows as columns to get requested input
xyz = list(zip(*[x, y, z]))
n = numpy.array(xyz)
# calculate avg and std columnwise, first column must be same as above results
print('with numpy: mean %s, standard deviation %s' % (n.mean(0), n.std(0)))

how can i access each column by it self? to be more clear.... what i want to do is to calculate the avg and std for x,y,and z column for each file in the folder.And then print that value to a different excel.

A numpy array is accessed by row and column. Add some print statements to find out what the data looks like. Also, a numpy array is overkill. Read the records one by one and store/calculate the values. Start with some sample data that is small, say 10 records, so you don't have to go through all of the files and data each time. Also, read the directory tree and send each file name to a common function to process.

i tried to attach one of my files but i couldn't so it looks like this for each file...

x y z
0.045 0.45 0.0002
0.004 0.0001 0.0005678

so it goes like that...and i have more than 10,000 rows for each csv file. so how can i caluculate the avg and std only for the x column, calculate avg and std for the y column, and calculate avg and std for the z column......and how can i print each colum one by one???

x          y            z
0.045     0.45        0.0002
0.004     0.0001      0.0005678
x          y            z
0.045     0.45        0.0002
0.004     0.0001      0.0005678

I also suggest a hand made solution

import csv
from math import sqrt
import os

class MaxiAverageCalculator(object):
    def __init__(self):
        self.ncols = 3
        self.nrows = 0
        self.s = [0.0, 0.0, 0.0]
        self.s2 = [0.0, 0.0, 0.0]
        
    def run(self, fullpath):
        with open(fullpath, "rb") as infile:
            reader = csv.reader(infile, delimiter=",")
            self.colnames = list(next(reader)) # skip first line with column names
            assert len(self.colnames) == 3
            for row in reader:
                L = [ float(x) for x in row ]
                assert len(L) == 3
                for i, x in enumerate(L):
                    self.s[i] += x
                    self.s2[i] += x * x
                    self.nrows += 1
        self.avg = [x/self.nrows for x in self.s]
        self.std = [ sqrt((y/self.nrows) - a * a) for a, y in zip(self.avg, self.s2) ]
        print "Results for {0}".format(fullpath)
        for name, a, s in zip(self.colnames, self.avg, self.std):
            print "{0}: avg = {1:.5f}, std = {2:.5f}".format(name, a, s)
        print
            
if __name__ == "__main__":
    path="A:\\yoyo\\heyy\\folder"
    dirList=os.listdir(path)
    for file in dirList:
        fullpath=os.path.join(path,file)
        calc = MaxiAverageCalculator()
        calc.run(fullpath)

Line 23 of Gribouillis' code should be indented out of for loop, now it count row for every item of data (3 fold). Notice that code wants comma separated values not tabulated data.

commented: thanks for debugging +13
import csv
from math import sqrt
import os

class MaxiAverageCalculator(object):
    def __init__(self):
        self.fullpath = None

    def set_file(self, fullpath, ncols=3):
        if not fullpath.endswith('.csv'):
            raise ValueError("{fullpath} is not csv file!\n".format(fullpath=fullpath))
        self.fullpath = fullpath
        self.ncols = ncols
        self.nrows = 0
        self.s = [0.0, 0.0, 0.0]
        self.s2 = [0.0, 0.0, 0.0]
         with open(self.fullpath, "rb") as infile:
                reader = csv.reader(infile, delimiter=",")
                # skip first line with column names
                self.colnames = [name.strip() for name in next(reader)]
                assert len(self.colnames) == self.ncols
                for row in reader:
                    record = [float(x) for x in row]
                    assert len(record) == self.ncols
                    for i, x in enumerate(record):
                        self.s[i] += x
                        self.s2[i] += x * x
                    self.nrows += 1
                self.avg = [x/self.nrows for x in self.s]
                self.std = [sqrt((y/self.nrows) - a * a) for a, y in zip(self.avg, self.s2)]
        
    def __str__(self):
        if self.fullpath:
                result = ["Results for {0}".format(self.fullpath)]
                result.append('Rows read: {nrows}'.format(nrows=self.nrows))
                for name, a, s in zip(self.colnames, self.avg, self.std):
                    result.append("{0}: avg = {1:.5f}, std = {2:.5f}".format(name, a, s))
                return '\n'.join(result)+'\n'
        else:
            raise ValueError('CSV filename not set')
            
if __name__ == "__main__":
    path="data"
    dirList=os.listdir(path)
    mycalculator = MaxiAverageCalculator()
    for file in dirList:
        try:
            mycalculator.set_file(os.path.join(path,file))
        except ValueError as e:
            print e
        else:
            print mycalculator
import csv
from math import sqrt
import os
 
class MaxiAverageCalculator(object):
    def __init__(self):
        self.ncols = 3
        self.nrows = 0
        self.s = [0.0, 0.0, 0.0]
        self.s2 = [0.0, 0.0, 0.0]
 
    def run(self, fullpath):
        with open(fullpath, "rb") as infile:
            reader = csv.reader(infile, delimiter=",")
            self.colnames = list(next(reader))
            assert len(self.colnames) == 3
            for row in reader:
                L = [ float(x) for x in row ]
                assert len(L) == 3
                for i, x in enumerate(L):
                    self.s[i] += x
                    self.s2[i] += x * x
                self.nrows += 1
        self.avg = [x/self.nrows for x in self.s]
        self.std = [ sqrt((y/self.nrows) - a * a) for a, y in zip(self.avg, self.s2) ]
        print "Results for {0}".format(fullpath)
        for name, a, s in zip(self.colnames, self.avg, self.std):
            f.write(str(a))
            f.write(", ")
            f.write(str(s))
        f.write("\n")
##            print "{0}: avg = {1:.5f}, std = {2:.5f}".format(name, a, s)
##        print
 
if __name__ == "__main__":
    path="A:\\yoyo\\heyy\\folder"
    f=open("A\\yoy\\save.xls")
    f.write("xavg, xstd, yavg, ystd, zavg, zstd")
    f.write("\n")
    dirList=os.listdir(path)
    for file in dirList:
        fullpath=os.path.join(path,file)
        calc = MaxiAverageCalculator()
        calc.run(fullpath)

so i was trying to print the values to an 'xls' file....its printing the values as i expected it...u can see what im trying to do on the script. please explain..thank you

im meant its not printing the values as i wanted it..

its ok i got it...thanks for the help!

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.