my excel files have more than a 1000 numbers under the columns that are labeled x, y, z for each excel file. i was trying to print the x, y, and z values for all of the excel files in to a single excel file(Trying_excel.xls)that would have all of the numbers under the x,y,z column for all of the excel files. This is what i have so far...but it doesnt seem to work...

import xlrd
import os
path = "c:\\Hello\\RawTracking"
dirList=os.listdir(path)
f = open('C:\\Hello\\Other_files\\Trying_excel.xls', 'w')
f.write('Xvalue, Yvalues, Zvalue')
f.write("\n")

Col_values=[]
Col_values1=[]
Col_values2=[]
for file in dirList:
    fullpath = os.path.join(path,file)
    if os.path.isfile(fullpath) == 1:
        wb = xlrd.open_workbook(fullpath)
        wb.sheet_names()
        sh = wb.sheet_by_name(u'RawTrackingData')

        for j in range(21,sh.nrows):
            Col_values.append(sh.cell(j,0).value)
            Col_values1.append(sh.cell(j,1).value)
            Col_values2.append(sh.cell(j,2).value)
        a = Col_values
        b = Col_values1
        c = Col_values2
        f.write(str(a))
        f.write(", ")
        f.write(str(b))
        f.write(", ")
        f.write(str(c))
        f.write(", ")
    f.write("\n")

Recommended Answers

All 13 Replies

This worked for me as I put some excel files with three colummns of numbers starting at row 21 in sheet 'RawTrackingData':

import xlrd
import os

path = 'data'

with open('Trying_excel.csv', 'w') as f:
    column_values=['', '', '']

    for filename in os.listdir(path):
        fullname = os.path.join(path, filename)
        if fullname.endswith('.xls'):
            print('Handling %s' % fullname)
            wb = xlrd.open_workbook(fullname)
            sh = wb.sheet_by_name(u'RawTrackingData')

            for j in range(21,sh.nrows):
                for ind in range(3):
                    column_values[ind] += ', ' + str(sh.cell(j,ind).value)
    # more reasonable format put lists one per row, x, y and z
    # take out wrong comma in beginning
    f.write('\n'.join(cv[2:] for cv in column_values)+'\n')

print('\nResult csv file:\n')
print(open('Trying_excel.csv').read())

Data actually starts at row 22 according to Excel, like Gribouillis said in other thread. Here version which does not put wrong comma in beginning of line:

import xlrd
import os

path = 'data'

with open('Trying_excel.csv', 'w') as f:
    column_values=['', '', '']

    for filename in os.listdir(path):
        fullname = os.path.join(path, filename)
        if fullname.endswith('.xls'):
            print('Handling %r' % filename)
            wb = xlrd.open_workbook(fullname)
            sh = wb.sheet_by_name(u'RawTrackingData')

            for j in range(21,sh.nrows):
                for ind in range(3):
                    column_values[ind] += ((', ' if column_values[ind] else '') +
                                           str(sh.cell(j,ind).value))
    # more reasonable format put lists one per row, x, y and z
    f.write('\n'.join(column_values)+'\n')

print('\nResult csv file:\n')
print(open('Trying_excel.csv').read())

heyy...Thank you very much. this was very helpful. so the script you posted above prints the x, y, and z values in the first, secound, and third 'ROWS', not 'COLUMNS'. i tried to fix it..but it is just not working...

ohhh..i actually figured it out.

for 10 diffrent patients i have 100 excel files saved in a folder. the first eight letters of the excel files for all of the the excels represents patient ID. so im trying to print the x, y, and z values for a patient in to one separate excel files. and i want the excels that are going to be created will be named by the patient id. by the end i want the script to create 10 diffrent excels each represents a patient. i have this so far.....please help!

import os
import xlrd, xlwt

file_index={}

SOURCE_DIR = "E:\\IMRT_C\\Preanalysis\\"
OUTPUT_DIR = "E:\\IMRT_C\\Working_files"
dirList = os.listdir(SOURCE_DIR)

for filename in dirList:
    first_eight = filename.partition("-")[0]
    if file_index.has_key(first_eight):
        file_index[first_eight].append(filename)
    else:
        file_index[first_eight] = [filename]

##print file_index

for L in file_index.keys():
    for filename in file_index[L]:
        fullpath=os.path.join(SOURCE_DIR, filename)
        wb = xlrd.open_workbook(fullpath)
        wb.sheet_by_name(u'RawTrackingData')

def value_triples(file_index):
    for sh in data_sheets(file_index):
        for j in xrange(21, sh.nrows):
            yield tuple(sh.cell_value(j, i) for i in range(3))

def create_output(srcdir, dstdir, workbook_name, sheet_name):
    wb = xlwt.Workbook()
    try:
        sh = wb.add_sheet(sheet_name)
        for col, val in enumerate("Xvalue Yvalue Zvalue".split()):
            sh.write(0, col, val)
        for row, triple in enumerate(value_triples(srcdir), 1):
            for col, val in enumerate(triple):
                sh.write(row, col, val)
    finally:
        wb.save(os.path.join(dstdir, workbook_name))

if __name__ == "__main__":
    create_output(SOURCE_DIR, OUTPUT_DIR, file_index.keys(), u"TrackingData")

I think the following code should work, if ordering the files for each patient alphabetically gives the correct order

from itertools import groupby
import os
from os.path import join as pjoin
import xlrd, xlwt

SOURCE_DIR = "E:\\IMRT_C\\Preanalysis\\"
OUTPUT_DIR = "E:\\IMRT_C\\Working_files"

def first_eight(string):
    return string[:8]

def files_by_patient(srcdir):
    filenames = sorted(f for f in os.listdir(srcdir) if f.endswith(".xls"))
    for ID, group in groupby(filenames, first_eight):
        yield ID, [ pjoin(srcdir, filename) for filename in group ]

def triples_by_patient(srcdir):
    for ID, files in files_by_patient(srcdir):
        yield ID, value_triples(xlrd.open_workbook(f).sheet_by_name(
                                          u'RawTrackingData') for f in files)

def value_triples(sheet_sequence):
    for sh in sheet_sequence:
        for j in xrange(21, sh.nrows):
            yield tuple(sh.cell_value(j, i) for i in range(3))

def create_workbook(dstdir, ID, triples, sheet_name = u"TrackingData"):
    name = pjoin(dstdir, ID + ".xls")
    wb = xlwt.Workbook()
    try:
        sh = wb.add_sheet(sheet_name)
        for col, val in enumerate("Xvalue Yvalue Zvalue".split()):
            sh.write(0, col, val)
        for row, triple in enumerate(triples, 1):
            for col, val in enumerate(triple):
                sh.write(row, col, val)
    finally:
        wb.save(name)
        
def main(srcdir, dstdir):
    for ID, triples in triples_by_patient(srcdir):
        create_workbook(dstdir, ID, triples)

if __name__ == "__main__":
    main(SOURCE_DIR, OUTPUT_DIR)

the script works when i have few excels saved in a folder, but it does work when i run the script on a folder that has more than 100 excels that has more than 10 excels for a patient.......this is the error message it prints out....please explain??

Traceback (most recent call last):
  File "C:\Amanu\Amanu_summer_2011_Scripts\Patient_ID_Columnvalues.py", line 45, in <module>
    main(SOURCE_DIR, OUTPUT_DIR)
  File "C:\Amanu\Amanu_summer_2011_Scripts\Patient_ID_Columnvalues.py", line 42, in main
    create_workbook(dstdir, ID, triples)
  File "C:\Amanu\Amanu_summer_2011_Scripts\Patient_ID_Columnvalues.py", line 36, in create_workbook
    sh.write(row, col, val)
  File "C:\Python26\Lib\site-packages\xlwt\Worksheet.py", line 1003, in write
    self.row(r).write(c, label, style)
  File "C:\Python26\Lib\site-packages\xlwt\Worksheet.py", line 1048, in row
    self.__rows[indx] = self.Row(indx, self)
  File "C:\Python26\Lib\site-packages\xlwt\Row.py", line 40, in __init__
    raise ValueError("row index (%r) not an int in range(65536)" % rowx)
ValueError: row index (65536) not an int in range(65536)

the script works when i have few excels saved in a folder, but it does work when i run the script on a folder that has more than 100 excels that has more than 10 excels for a patient.......this is the error message it prints out....please explain??

Traceback (most recent call last):
  File "C:\Amanu\Amanu_summer_2011_Scripts\Patient_ID_Columnvalues.py", line 45, in <module>
    main(SOURCE_DIR, OUTPUT_DIR)
  File "C:\Amanu\Amanu_summer_2011_Scripts\Patient_ID_Columnvalues.py", line 42, in main
    create_workbook(dstdir, ID, triples)
  File "C:\Amanu\Amanu_summer_2011_Scripts\Patient_ID_Columnvalues.py", line 36, in create_workbook
    sh.write(row, col, val)
  File "C:\Python26\Lib\site-packages\xlwt\Worksheet.py", line 1003, in write
    self.row(r).write(c, label, style)
  File "C:\Python26\Lib\site-packages\xlwt\Worksheet.py", line 1048, in row
    self.__rows[indx] = self.Row(indx, self)
  File "C:\Python26\Lib\site-packages\xlwt\Row.py", line 40, in __init__
    raise ValueError("row index (%r) not an int in range(65536)" % rowx)
ValueError: row index (65536) not an int in range(65536)

There seems to be a limitation of xlwt: row indexes are stored in a 16 bits integer, therefore a sheet can't have more than 65536 rows. From what I read elsewhere it was also a limitation of excel 2003.

I suggest that you write a csv file instead of an excel file (then perhaps you can convert the csv file to xls with openoffice or excel)

def create_workbook(dstdir, ID, triples):
    name = pjoin(dstdir, ID + ".csv")
    import csv
    with open(name, "wb") as outfile:
        wb = csv.writer(outfile, delimiter = ',')
        wb.writerow("Xvalue Yvalue Zvalue".split())
        wb.writerows(triples)

Another solution would be to write in more than 3 columns. I understand that according to the same limitation of excel 2003, there can be 256 colums, which means 85 patient records. This would allow 85 * 65535 = 5570475 records in an excel file. This may be sufficient for your needs. All you have to do is to chang

i changed it to CSV files and it prints out an error message that says the local variable 'wb' referenced before assignment

i changed it to CSV files and it prints out an error message that says the local variable 'wb' referenced before assignment

It's probably because the first time I wrote open(name, wb) instead of open(name, "wb") . Try with the new version.

ohh and heads up....take out "wb.save(name)" , csv.writer does not support that

thank you very much..this was very helpful!!

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.