Hi guys:-) I have no experience in programming, and i realise that asking for someone to write an entire programme for me might be a stretch, but unfortunately i havent got any mates that are good at programming. I think the program i need is fairly simple once you know how to write code:-)

I have come across some very uncooperative software on a piece of scientific equipment that I'm currently using. It generates excel files with a huge amount of data points (3-400.000) with each data point having 2 parameters and a time. I need to calculate the average of each parameter for all data points within each second. There are a varying amount of data points for each second, which is why i can't simply have excel do the calculations for me...

The excel files have parameter A in column A, parameter B in column B and the time in column C, if needed i can provide a data file to try the program on:-)

If any kind person will help me, we can figure out a way for me to compensate you for your help:-)


Keeping my fingers crossed, hope to hear from one of you soon:-)

Best regards, casper

Recommended Answers

As a starting point, here is a function which computes the averages for a sequence of values (parameter A, parameter B, time). The assumption is that the input data are sorted by time. Try to run it and see if it does what you want
[code=python]
from itertools import groupby

Jump to Post

Ok, I renamed your file inputfile.txt and the code below produces an outputfile.txt containing the averages.
[code=python]

averages.py

from itertools import groupby

def gen_averages(input_triples):
for second, triples in groupby(input_triples, lambda triple: int(triple[2])):
a, b = 0.0, 0.0
n = 0
for x, y, t in triples:
a += float(x)

Jump to Post

[QUOTE=larsen182;1236334]It works perfectly!!!! Thank you so much:-)
What do you want for it? I can pay you via my paypal account, would that work for you?

Best regards, Casper:-)[/QUOTE]
It's ok. It's only a small exercise for most members of the python forum :)

Also, there is a little theoretical …

Jump to Post

All 6 Replies

As a starting point, here is a function which computes the averages for a sequence of values (parameter A, parameter B, time). The assumption is that the input data are sorted by time. Try to run it and see if it does what you want

from itertools import groupby

def gen_averages(input_triples):
    for second, triples in groupby(input_triples, lambda triple: triple[2]):
        a, b = 0.0, 0.0
        n = 0
        for x, y, t in triples:
            a += float(x)
            b += float(y)
            n += 1
        yield (a/n, b/n, second)

def test_me():
    data = [
        (1, 1, 364),
        (2, 4, 365),
        (3, 5, 365),
        (8, -1, 365),
        (3, 2, 366),
        (1, 1, 366),
        (0, 0, 367),
    ]
    for triple in gen_averages(data):
        print(triple)

if __name__ == "__main__":
    test_me()

""" My output -->
(1.0, 1.0, 364)
(4.333333333333333, 2.6666666666666665, 365)
(2.0, 1.5, 366)
(0.0, 0.0, 367)
"""

It only remains to read the data in the excel files and perhaps write the output to other excel files. We can use the modules xlrd xlwr xlutils for this.
What are your OS and your version of python ? Also, can you attach an excel file (or a part of it) to a post (you may have to zip it first) ?

Thanks mate:-) I tried to write the code into python, and that went well:-) Im almost embarrased to ask, but how do i start the program? When i press "run module" I just get:
>>>
>>>

Reading my first post, im not sure i was clear enough on the output i need, i need the average value of each parameter (FSC and SSC) taken for all data points that have t=[0;1[, t=[1;2[, t=[2;3[ and so on as i need to see whether these average values changes over time.

I couldnt attach the whole file for some reason, even after it was zipped so i uploaded a little sample.

Im working on a Windows XP OS and i downloaded python 2.6.5., but if you prefer working on a different version i will get that:-)

I really appreciate you helping me mate:-)

Best regards, Casper:-)

Ok, I renamed your file inputfile.txt and the code below produces an outputfile.txt containing the averages.

# averages.py
from itertools import groupby

def gen_averages(input_triples):
    for second, triples in groupby(input_triples, lambda triple: int(triple[2])):
        a, b = 0.0, 0.0
        n = 0
        for x, y, t in triples:
            a += float(x)
            b += float(y)
            n += 1
        yield (a/n, b/n, second)

def gen_input(textfile):
    f = open(textfile)
    f.readline()
    for line in f:
        triple = tuple(float(x) for x in line.strip().replace(",", ".").split())
        assert(len(triple) == 3)
        yield triple

def compute_averages(inputfile, outputfile):
    with open(outputfile, "w") as out:
        for triple in gen_averages(gen_input(inputfile)):
            out.write("{0:<15.2f}{1:<15.2f}{2:d}\n".format(*triple))
if __name__ == "__main__":
    compute_averages("inputfile.txt", "outputfile.txt")
    print "done (see the output file)"

To run the program, you have 2 solutions: either you start IDLE (python gui), then you open averages.py (the name of the program) and you select 'Run Module' in idle's run menu, or you open a windows cmd shell, you go to the folder containing the program and you type python averages.py on the command line.

The program must be modified if you want to read and write excel files instead of text files.

It works perfectly!!!! Thank you so much:-)
What do you want for it? I can pay you via my paypal account, would that work for you?

Best regards, Casper:-)

It works perfectly!!!! Thank you so much:-)
What do you want for it? I can pay you via my paypal account, would that work for you?

Best regards, Casper:-)

It's ok. It's only a small exercise for most members of the python forum :)

Also, there is a little theoretical problem: your time samples are not uniformly distributed over the 1 second time intervals, so are you sure that a raw averaging is meaningful ? Shouldn't we design a special formula for the average over 1 second ?

commented: nice way to handle this +10

Really, wow thanks:-) The raw average should be meaningful, the values are a meassure of how individual cells scatter light, the scatter increases when the cells shrink. So the average scatter correlates well with the fraction of the cells sampled within one second that have shrunk.

This is all i need, thanks a bunch:-)