i have about a 1000 excel files that have data sets.it was collected from about a 20 patients. so i have a script that reads data sets for each patient from all of the excel files. im having trouble printing the values in separate excel files that each file represents each patient.

Recommended Answers

All 4 Replies

It's difficult to help you since we don't know how your excel files are formatted nor the format of the output files that you want. Assuming you have a function all_files() which lists your excel files and a function find_patient(filename) which extracts the patient identification from a file, you can sort your files by patient

from itertools import groupby

def by_patient():
    L = sorted(all_files(), key = find_patient)
    return list((patient_id, list(group)) for patient_id, group in groupby(L, find_patient))

Then you can traverse the files with a loop

for patient_id, filenames in by_patient():
    # here create an output file
    for filename in filenames:
        # add the content of filename to the output file

Edit: also note that "Help with patient records in excel files" is a much better forum thread title than "Python, Help me please!!" :)

Thank you....my excel files have more than a 1000 numbers under the columns that are labeled x, y, z for each excel file. i was trying to print the x, y, and z values for all of the excel files in to a single excel file(Trying_excel.xls)that would have all of the numbers under the x,y,z column for all of the excel files. This is what i have so far...but it doesnt seem to work...

import xlrd
import os
path = "c:\\Hello\\RawTrackingData"
dirList=os.listdir(path)
f = open('C:\\Hello\\Other_files\\Trying_excel.xls', 'w')
f.write('Xvalue, Yvalues, Zvalue')
f.write("\n")

Col_values=[]
Col_values1=[]
Col_values2=[]
for file in dirList:
    fullpath = os.path.join(path,file)
    if os.path.isfile(fullpath) == 1:
        wb = xlrd.open_workbook(fullpath)
        wb.sheet_names()
        sh = wb.sheet_by_name(u'RawTrackingData')

        for j in range(21,sh.nrows):
            Col_values.append(sh.cell(j,0).value)
            Col_values1.append(sh.cell(j,1).value)
            Col_values2.append(sh.cell(j,2).value)
        a = Col_values
        b = Col_values1
        c = Col_values2
        f.write(str(a))
        f.write(", ")
        f.write(str(b))
        f.write(", ")
        f.write(str(c))
        f.write(", ")
    f.write("\n")

Thank you....my excel files have more than a 1000 numbers under the columns that are labeled x, y, z for each excel file. i was trying to print the x, y, and z values for all of the excel files in to a single excel file(Trying_excel.xls)that would have all of the numbers under the x,y,z column for all of the excel files. This is what i have so far...but it doesnt seem to work...

import xlrd
import os
path = "c:\\Hello\\RawTrackingData"
dirList=os.listdir(path)
f = open('C:\\Hello\\Other_files\\Trying_excel.xls', 'w')
f.write('Xvalue, Yvalues, Zvalue')
f.write("\n")

Col_values=[]
Col_values1=[]
Col_values2=[]
for file in dirList:
    fullpath = os.path.join(path,file)
    if os.path.isfile(fullpath) == 1:
        wb = xlrd.open_workbook(fullpath)
        wb.sheet_names()
        sh = wb.sheet_by_name(u'RawTrackingData')

        for j in range(21,sh.nrows):
            Col_values.append(sh.cell(j,0).value)
            Col_values1.append(sh.cell(j,1).value)
            Col_values2.append(sh.cell(j,2).value)
        a = Col_values
        b = Col_values1
        c = Col_values2
        f.write(str(a))
        f.write(", ")
        f.write(str(b))
        f.write(", ")
        f.write(str(c))
        f.write(", ")
    f.write("\n")

Here is a program which works for me, using xlrd and xlwt

import os
from os.path import join as pjoin
import xlrd, xlwt

SOURCE_DIR = "/home/eric/Documents/Daniweb/t377865/ezrfuhefuh"
OUTPUT_DIR = "/home/eric/Documents/Daniweb/t377865/output"

def excel_filenames(dire):
    return sorted(pjoin(dire, name)
                        for name in os.listdir(dire) if name.endswith(".xls"))

def data_sheets(dire):
    for filename in excel_filenames(dire):
        wb = xlrd.open_workbook(filename)
        yield wb.sheet_by_name(u"RawTrackingData")

def value_triples(dire):
    for sh in data_sheets(dire):
        for j in xrange(21, sh.nrows):
            yield tuple(sh.cell_value(j, i) for i in range(3))

def create_output(srcdir, dstdir, workbook_name, sheet_name):
    wb = xlwt.Workbook()
    try:
        sh = wb.add_sheet(sheet_name)
        for col, val in enumerate("Xvalue Yvalue Zvalue".split()):
            sh.write(0, col, val)
        for row, triple in enumerate(value_triples(srcdir), 1):
            for col, val in enumerate(triple):
                sh.write(row, col, val)
    finally:
        wb.save(pjoin(dstdir, workbook_name))

if __name__ == "__main__":
    create_output(SOURCE_DIR, OUTPUT_DIR, "Trying_excel.xls", u"TrackingData")

Notice that the rows and columns in xlrd and xlwt are numbered from 0, so the row 21 is named 22 in openoffice. You will need to add the part which sorts the files patient by patient, etc. I attach my whole test data

Thread closed as there is double of it here. Please follow discussion there.

@g_amanu Do not multi post same question

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.