There seems to be a variety of ways to do this on the net, does anybody have a tested solution?
Suggestions to look at various modules and such appreciated. I don't need the full code.
OK, so I used the info here:
To modify the script to batch save .xlsx to .csv It was a tremendous help and I thought I'd share my code here.
You'll need Python for Windows (I used 2.6) and the pywin32 extensions...easily downloadable off the internet. Install them on the Windows PC that has Excel 2007 on it. Save the following file as "xls2csv.py" into the directory that contains all your .XLSX files. This script will not delete the original .XLSX files.
# You must have Python for Windows (I used 2.6) and pywin32 extensions # installed and Excel 2007 on a Windows PC # Put this script into the dir where all the .XLSX files are and then cd to that dir # Usage: c:\python26\python.exe xls2csv.py import glob import os import time import win32com.client xlsx_files = glob.glob('*.xlsx') if len(xlsx_files) == 0: raise RuntimeError('No XLSX files to convert.') xlApp = win32com.client.Dispatch('Excel.Application') for file in xlsx_files: xlWb = xlApp.Workbooks.Open(os.path.join(os.getcwd(), file)) xlWb.SaveAs(os.path.join(os.getcwd(), file.split('.xlsx') + '.csv'), FileFormat=6) xlApp.Quit() time.sleep(2) # give Excel time to quit, otherwise files may be locked # Uncomment the two lines below if you want the script to remove # the orig .xlsx files when done #for file in xlsx_files: # os.unlink(file)
One thing that needs to be fixed here is that the .CSV files are not closing when the python script completes. It appears that some instances of Excel were staying open too.
Winzip was warning that the .CSV files I were adding were possibly corrupt. I worked around it by logging out and back into Windows.
If any python gurus know how to fix this in the code, I'd appreciate you sharing with us.
to the linux users, who don't have access to those windows modules basically those files (xlsx) are zipped files that contain xml files where the data is contained, so essentially you could write a programme that unzips the xlsx file and parse the xml files contained in the "xl/worksheets" folder to your preferred xml module parser. i used xml.dom.minidom module and i've tested it with the etree module.
xlWb.Close(SaveChanges = True)
insert this line before xlWb.Quit() to completely close excel