Problem with re.findall() (stops after 4 rows)

Question

sinnebril 0 Newbie Poster

13 Years Ago

Hi there!

I have run into a new problem, this time with the re.findall() function.
The objective of this code is to iterate over the rows in an Excel sheet and write them to another Excel sheet, with the species name and the gene name split into separate columns.

The regular expression seems to work fine for the first 4 rows of the Excel sheet. But the last 3 are not printed, even though they contain the same sort of names as the 4 that work. So it should be working...

Example:
species name:    gene name:
Homo sapiens     CYP2C19
Homo sapiens     CYP2C9
Danio rerio      CYP39A1
Xenopus leavis   CYP39A1    <- text is printed until here
Mus musculus     Cyp2c65
Mus musculus     Cyp2c66
Danio rerio      Cyp2c38

Without the re.findall() call all the rows are read, so is it a bug in re.findall()?

Can someone tell me what I'm doing wrong?

Here is part of my code (reading only the rows of the Excel sheet):
(see below for the Excel sheet used in this test)

import xlrd
import xlwt
import re

# inputfile:
wb = xlrd.open_workbook('Test_input.xls') 

#Get the first sheet either by index or by name
sh = wb.sheet_by_index(0)

print "Number of rows: %s   Number of cols: %s" % (sh.nrows, sh.ncols)

# Create a output workbook and worksheet
wbk = xlwt.Workbook()
sheet_total = wbk.add_sheet('names total') 
sheet_split = wbk.add_sheet('names split')

#Check the sheet names
wb.sheet_names()

#Algorithm for reading and writing from file to file per row:

#Index individual cells:
rowx = 1
colx = 0
row = 0  # row counter for new Excel sheet
counter_row = 1 # while counter

print 'Printing rows of Excel sheet:'
sheet_total.write(row,0,'Rows') # writes header in new Excel sheet
sheet_split.write(row,0,'Rows')
sheet_split.write(row,1,'Rows')

while counter_row < sh.nrows:
  row_cell = sh.cell(rowx,colx).value
  tuples = re.findall(r'(\w+\s\w+)\s*(CYP\w+)', row_cell)
  print 'TUPLES:', tuples

  rowx += 1
  print 'print_row:', rowx, colx, row_cell
  row += 1

  for tuple in tuples:
    print tuple   ## The whole match, print on sheet 1
    sheet_total.write(row,0,tuple)

    print tuple[0]  ## Species name (group 1), print sheet 2, col 1
    sheet_split.write(row,0,tuple[0])

    print tuple[1]  ## Gene name (group 2), print sheet 2, col 2
    sheet_split.write(row,1,tuple[1])

  if rowx == sh.nrows:
    rowx = 1
    counter_row += 1

wbk.save('reformatted.data.xls')

algorithm python

Test_input.zip (7.03 KB)

Edited 13 Years Ago by sinnebril because: file upload

TrustyTony 888 ex-Moderator

13 Years Ago

The last rows have mixed case, and you did not set the re.I flag to ignore case.

sinnebril 0 Newbie Poster · Answer 1 · 2012-04-12T13:35:28+00:00

The last rows have mixed case, and you did not set the re.I flag to ignore case.

Thanks pyTony!
What do you mean by mixed case?
Do you mean upper and lower case?
But this is the same for all the rows (the names of the species).
So why is it only working for the first 4 rows?

M.S. 53 Light Poster · Answer 2 · 2012-04-12T15:14:54+00:00

As pyTony said, the pattern is case-sensitive: it only finds the cells containing "CYP" and ignores the cells containing "Cyp".

sinnebril 0 Newbie Poster · Answer 3 · 2012-04-13T14:33:49+00:00

Thanks!

This is my new code and it works!

tuples = re.findall(r'(\w+\s\w+)\s*(CYP\w+)', row_cell, re.IGNORECASE)
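To see the effect of the flag in isolation, here is a minimal sketch that runs the same pattern over the seven sample rows from the question, once without and once with re.IGNORECASE. It assumes each cell holds the species name and gene name separated by a single space (the actual contents of Test_input.xls are not shown in the thread). The rows list and the split_rows helper are hypothetical names used only for illustration, and the snippet is written for Python 3 rather than the Python 2 used in the question.

```python
import re

# Sample rows copied from the question; the single-space separator
# between species and gene is an assumption about the cell contents.
rows = [
    "Homo sapiens CYP2C19",
    "Homo sapiens CYP2C9",
    "Danio rerio CYP39A1",
    "Xenopus leavis CYP39A1",
    "Mus musculus Cyp2c65",
    "Mus musculus Cyp2c66",
    "Danio rerio Cyp2c38",
]

# Same pattern as in the question: group 1 = species, group 2 = gene.
pattern = r'(\w+\s\w+)\s*(CYP\w+)'


def split_rows(cells, flags=0):
    """Return (species, gene) pairs for every cell the pattern matches."""
    pairs = []
    for cell in cells:
        pairs.extend(re.findall(pattern, cell, flags))
    return pairs


# Without the flag, "CYP" must be upper case, so the "Cyp..." rows are skipped.
case_sensitive = split_rows(rows)

# With re.IGNORECASE, "CYP" also matches "Cyp", so every row is found.
case_insensitive = split_rows(rows, re.IGNORECASE)

print(len(case_sensitive), "rows matched without the flag")
print(len(case_insensitive), "rows matched with re.IGNORECASE")
for species, gene in case_insensitive:
    print(species, "|", gene)
```

So re.findall() itself is not stopping after four calls; it simply returns an empty list for the cells whose gene name starts with "Cyp", and the for loop over an empty list writes nothing.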
