I have run into a new problem, this time with the re.findall() function.
The objective of this code is to iterate over the rows of an Excel sheet and write them to another Excel sheet, with the species name and the gene name in separate columns.
The regular expression seems to work fine for the first 4 rows of the Excel sheet. The last 3 rows are not printed, even though they contain the same sorts of names as the 4 that work, so I would expect them to work as well.
species name: gene name:
Homo sapiens CYP2C19
Homo sapiens CYP2C9
Danio rerio CYP39A1
Xenopus leavis CYP39A1 **(text is printed up to here)**
Mus musculus Cyp2c65
Mus musculus Cyp2c66
Danio rerio Cyp2c38
Without the re.findall() call all the rows are read, so is this a bug in re.findall()?
Can someone tell me what I'm doing wrong?
Here is part of my code (reading only the rows of the Excel sheet):
(see below for the Excel sheet used in this test)
import xlrd
import xlwt
import re

# inputfile:
wb = xlrd.open_workbook('Test_input.xls')

# Get the first sheet either by index or by name
sh = wb.sheet_by_index(0)
print "Number of rows: %s Number of cols: %s" % (sh.nrows, sh.ncols)

# Create an output workbook and worksheets
wbk = xlwt.Workbook()
sheet_total = wbk.add_sheet('names total')
sheet_split = wbk.add_sheet('names split')

# Check the sheet names
wb.sheet_names()

# Algorithm for reading and writing from file to file per row:
# Index individual cells:
rowx = 1
colx = 0
row = 0  # row counter for new Excel sheet
counter_row = 1  # while counter

print 'Printing rows of Excel sheet:'
sheet_total.write(row,0,'Rows')  # writes header in new Excel sheet
sheet_split.write(row,0,'Rows')
sheet_split.write(row,1,'Rows')

while counter_row < sh.nrows:
    row_cell = sh.cell(rowx,colx).value
    tuples = re.findall(r'(\w+\s\w+)\s*(CYP\w+)', row_cell)
    print 'TUPLES:', tuples
    rowx += 1
    print 'print_row:', rowx, colx, row_cell
    row += 1
    for tuple in tuples:
        print tuple  ## The whole match, print on sheet 1
        sheet_total.write(row,0,tuple)
        print tuple  ## Species name (group 1), print sheet 2, col 1
        sheet_split.write(row,0,tuple)
        print tuple  ## Gene name (group 2), print sheet 2, col 2
        sheet_split.write(row,1,tuple)
    if rowx == sh.nrows:
        rowx = 1
    counter_row += 1

wbk.save('reformatted.data.xls')
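One way to narrow this down is to run the same pattern on the seven names without any Excel reading or writing. The sketch below assumes each cell holds exactly the text listed in the table above (species name, whitespace, gene name); the names `cells`, `strict` and `relaxed` are made up for this test. The pattern spells the gene prefix as uppercase `CYP`, and regular expressions are case-sensitive by default, so a second version with `re.IGNORECASE` is included for comparison as one possible explanation to check.

```python
import re

# Assumed cell contents, copied from the table in this post.
cells = [
    "Homo sapiens CYP2C19",
    "Homo sapiens CYP2C9",
    "Danio rerio CYP39A1",
    "Xenopus leavis CYP39A1",
    "Mus musculus Cyp2c65",
    "Mus musculus Cyp2c66",
    "Danio rerio Cyp2c38",
]

# Same pattern as in the script: literal uppercase "CYP".
strict = re.compile(r'(\w+\s\w+)\s*(CYP\w+)')
# Same pattern, but letter case is ignored while matching.
relaxed = re.compile(r'(\w+\s\w+)\s*(CYP\w+)', re.IGNORECASE)

strict_hits = [strict.findall(cell) for cell in cells]
relaxed_hits = [relaxed.findall(cell) for cell in cells]

for cell, s, r in zip(cells, strict_hits, relaxed_hits):
    print("%-24s strict=%s relaxed=%s" % (cell, s, r))
```

If the last three names come back empty for `strict` but not for `relaxed`, the difference is in the spelling of the gene prefix (`Cyp` versus `CYP`) rather than in re.findall() itself.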
Edited by sinnebril: file upload