Reading columns and rows in Excel file and writing output into a new Excel file

Question

sinnebril 0 Newbie Poster

12 Years Ago

Hi there!

I have been searching on the forum and google, but I can't find a clear answer to my problem.
What I can find is how to read data from a file, but not from specific cells in Excel.

A small example of the input Excel file:

An/gene B  An/gene C  An/gene D  An/gene E  An/gene F  An/gene G
An/gene2						
An/gene3   Nr  B3					
An/gene4   Nr  B4    Nr  C4				
An/gene5   Nr  B5    Nr  C5	Nr  D5			
An/gene6   Nr  B6    Nr  C6	Nr  D6	   Nr  E6		
An/gene7   Nr  B7    Nr  C7	Nr  D7	   Nr  E7	Nr  F7	
An/gene8   Nr  B8    Nr  C8	Nr  D8	   Nr  E8	Nr  F8	 Nr  G8

Legend:
An/gene = animal and gene
Nr = number of identical scores

And the (raw) output should look like this:

An/gene2	An/gene B	
An/gene3	An/gene B	Nr  B3
An/gene4	An/gene B	Nr  B4
An/gene5	An/gene B	Nr  B5
An/gene6	An/gene B	Nr  B6
An/gene7	An/gene B	Nr  B7
An/gene8	An/gene B	Nr  B8
An/gene2	An/gene C	
An/gene3	An/gene C	
An/gene4	An/gene C	Nr  C4
An/gene5	An/gene C	Nr  C5
An/gene6	An/gene C	Nr  C6
An/gene7	An/gene C	Nr  C7
An/gene8	An/gene C	Nr  C8

The empty cells should be skipped of course.

Explanation of the input data:
I have a huge table with data that I want to print like the output file, to make it more clear what is happening. The data consist of a animal gene compared to a other animal gene. This comparison gives a identical score, that is what is printed in the third column of the output table.

So I would like to access cell by cell in a iteration, but in a structured way.
I know I can use a count function like + 1, but how do you tell Python to access for example cell B2?
An other option is to transform the data to a .csv file, but how do read then the file number by number in a specific order?

Can anyone give me some suggestions?
Much appreciated!

python

Edited 12 Years Ago by sinnebril because: pasting files

4 Contributors
17 Replies
3K Views
2 Months Discussion Span
Latest Post 12 Years Ago Latest Post by pk87

TrustyTony 888 ex-Moderator

12 Years Ago

Check http://www.daniweb.com/software-development/python/threads/378039., http://www.python-excel.org/ and
http://www.daniweb.com/software-development/python/code/293490, there is loads of posts on data files here.

Edited 12 Years Ago by TrustyTony because: n/a

hughesadam_87 54 Junior Poster

12 Years Ago

You can use f.readlines() to bring in your entire file into a list. By default, then use some manipulation to split this list at the '\n' characters and write each row into a dictionary. Then you can access rows by a row number keys and columns by a value key.

Do you know what I'm saying or is this unclear?

TrustyTony 888 ex-Moderator

12 Years Ago

At least you should remove line 8 or 9 as they overwrite each other.

Otherwise we would need sample excel data to see what happens (by attaching from Files from menu stripe)

TrustyTony 888 ex-Moderator

12 Years Ago

zip them first.

TrustyTony 888 ex-Moderator

12 Years Ago

You must use regular zip, not rar. I verified that Files accept zip duck

This attachment is potentially unsafe to open. It may be an executable that is capable of making changes to your file system, or it may require specific software to open. Use caution and only open this attachment if you are comfortable working with zip files.

duck.zip (2.6 KB)

Edited 12 Years Ago by TrustyTony

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

sinnebril 0 Newbie Poster · Answer 1 · 2012-03-08T22:12:51+00:00

Check http://www.daniweb.com/software-development/python/threads/378039., http://www.python-excel.org/ and
http://www.daniweb.com/software-development/python/code/293490, there is loads of posts on data files here.

Thanks pyTony, I didn't know http://www.daniweb.com/software-development/python/code/293490 jet, but the others I did know.

But I'm still not able to reverend to a specific cell.

Does anyone know how to do that?

sinnebril 0 Newbie Poster · Answer 2 · 2012-03-09T16:10:18+00:00

You can use f.readlines() to bring in your entire file into a list. By default, then use some manipulation to split this list at the '\n' characters and write each row into a dictionary. Then you can access rows by a row number keys and columns by a value key.
Do you know what I'm saying or is this unclear?

Thank you shoemoodoshaloo!
This also works for a Excel sheet? Or do I have to use the xlrd and xlwt library?

Can you give a example of how to access rows by row number keys and columns by a value key. The other commands like readlines() etc. are clear to me. Thanks!
I will try some things and get back to you.

sinnebril 0 Newbie Poster · Answer 3 · 2012-03-21T09:44:02+00:00

I figured out the question above, see code, but now I have other problem:
The code runs over one column, than it stops. How can I say, go to next column? Or read until EOF (end of file).

import xlrd
import xlwt
wb = xlrd.open_workbook('Test_input.xls')

# Create workbook and worksheet
wbk = xlwt.Workbook()
sheet = wbk.add_sheet('test')

#Check the sheet names
wb.sheet_names()

#Get the first sheet either by index or by name
sh = wb.sheet_by_index(0)
sh = wb.sheet_by_name(u'Sheet1')
print "Number of rows: %s   Number of cols: %s" % (sh.nrows, sh.ncols)

#Index individual cells:
rowx = 0
colx = 0
row = 0  # row counter for new Excel sheet

print 'Printing rows of Excel sheet:'

for row_cell in range(sh.nrows):
  row_cell = sh.cell(rowx,colx).value
  print_row = row_cell
  rowx += 1

  print 'print_row:', rowx, colx, print_row
  sheet.write(row,0,print_row)
  row += 1

rowx = 0
colx = 0
row = 0  # row counter for new Excel sheet

print 'Printing columns of Excel sheet:'

for col_cell in range(sh.ncols):
  col_cell = sh.cell(rowx,colx).value
  colx += 1

  print 'print_col:', col_cell
  sheet.write(row,1,col_cell)
  row += 1 
# The positions of the identical score index are 
rowx = 1
colx = 1
row = 0  # row counter for new Excel sheet

print 'Printing specific cells of Excel sheet:'
total = sh.nrows * sh.ncols
while range(total):
  cell = sh.cell(rowx,colx).value
  rowx += 1

  print 'print_cell:', cell
  sheet.write(row,2,cell)
  row += 1

  if sh.nrows == rowx:
    rowx = 1
    colx = 1
    colx += 1

    for cell in range(sh.ncols):
      cell = sh.cell(rowx,colx).value
      rowx += 1


  print 'print_cell:', cell
  sheet.write(row,2,cell)
  row += 1

wbk.save('test_output.xls')

Does anyone knows how to do that and what can I improve?
I know working with def() is better, but I havn't masterd that skill just jet.

sinnebril 0 Newbie Poster · Answer 4 · 2012-03-26T08:53:46+00:00

My biggest trouble is now, how can I get the python script to read the next column?
The loop just stops when it is at the end of the column, but it needs to go to the next.

Any suggestions? Thanks!

sinnebril 0 Newbie Poster · Answer 5 · 2012-03-27T11:31:54+00:00

Hi there!

I got my script finally working, see here below. But it is a bit big and 'quick and dirty'.
Any suggestions on how to make the code more compact and easier to read for outsiders?

import xlrd
import xlwt

# inputfile:
wb = xlrd.open_workbook('Test_input.xls') #--- adjust the input file!!!---#

#Get the first sheet either by index or by name
sh = wb.sheet_by_index(0)
sh = wb.sheet_by_name(u'Sheet1')

print "Number of rows: %s   Number of cols: %s" % (sh.nrows, sh.ncols)

# Create a output workbook and worksheet
wbk = xlwt.Workbook()
sheet = wbk.add_sheet('test') #--- adjust the sheet!!!---#

#Check the sheet names
wb.sheet_names()

#Algorithm for reading en writing from file to file per row, column or cell:

#Index individual cells:
rowx = 1
colx = 0
row = 0  # row counter for new Excel sheet
counter_row = 1 # while counter

print 'Printing rows of Excel sheet:'
sheet.write(row,0,'Rows') # writes heater in new Excel sheet

while counter_row < sh.nrows:
  row_cell = sh.cell(rowx,colx).value
  print_row = row_cell
  rowx += 1
  print 'print_row:', rowx, colx, print_row
  row += 1
  sheet.write(row,0,print_row)

  if rowx == sh.nrows:
    rowx = 1
    counter_row += 1

#Index individual cells:
rowx = 0
colx = 1
row = 0  # row counter for new Excel sheet
counter_col = 1 # while counter

print 'Printing columns of Excel sheet:'
sheet.write(row,1,'Column') # writes heater in new Excel sheet

while colx < sh.ncols:
  col_cell = sh.cell(rowx,colx).value
  print 'print_col:', rowx, colx, col_cell

  row += 1
  sheet.write(row,1,col_cell)
  counter_col += 1

  if counter_col == sh.ncols:
    colx += 1
    counter_col = 1

#Index individual cells 
rowx = 1
colx = 1
row = 0  # row counter for new Excel sheet
counter_cell = 2
total = sh.nrows + 1

print 'Printing specific cells of Excel sheet:'
sheet.write(row,2,'Identical score') # writes heater in new Excel sheet

while counter_cell < total:
  cell = sh.cell(rowx,colx).value
  rowx += 1
  print 'print_cell:', rowx, colx, cell
  row += 1
  sheet.write(row,2,cell)

  if rowx == sh.nrows:
    rowx = 1
    colx = counter_cell
    counter_cell += 1

wbk.save('reformatted.data.xls')

sinnebril 0 Newbie Poster · Answer 6 · 2012-03-29T09:06:27+00:00

At least you should remove line 8 or 9 as they overwrite each other.

Otherwise we would need sample excel data to see what happens (by attaching from Files from menu stripe)

Thanks pyTony, You are right about line 9, i will remove it.

Is it true that it is not posible to uplode '.xls' files? Also '.csv' files can't be uploded...?

sinnebril 0 Newbie Poster · Answer 7 · 2012-03-29T13:19:16+00:00

Zip is also not working...
(.rar files)
But you could copy/past my first post into a excel file...

sinnebril 0 Newbie Poster · Answer 8 · 2012-04-05T12:53:11+00:00

sinnebril 0 Newbie Poster

12 Years Ago

Sorry my bad, you are totaly right of course!
I always use .rar, but .zip is better.

Here is finaly the file: (test_input.xls)

This attachment is potentially unsafe to open. It may be an executable that is capable of making changes to your file system, or it may require specific software to open. Use caution and only open this attachment if you are comfortable working with zip files.

Test_input.zip (6.8 KB)

pk87 0 Newbie Poster · Answer 9 · 2012-05-09T12:34:43+00:00

@sinnebril
Simmilar to ur script, i need to read from one exel workbook and write to another. suppose the workbook have 10 columns and i only want to read 3 out them, also the position of those columns keep on changing with the udtaed data sheet so is there a way i can read the columns by name rather than number.
Also i tried your script as it is i gave me an error -

" Traceback (most recent call last):
  File "C:/Documents and Settings/pyadav/Desktop/try_python/New Folder/readXL4444.py", line 57, in <module>
    cell = sh.cell(rowx,colx).value
  File "C:\Program Files\Common Files\dSPACE\Python25\Lib\site-packages\xlrd\sheet.py", line 245, in cell
    self._cell_types[rowx][colx],
IndexError: array index out of range"

I am new to python, could you please help me out a bit.

sinnebril 0 Newbie Poster · Answer 10 · 2012-05-10T07:05:24+00:00

@ pk87,
Can you give me the exact file and script that you want to run?
You have to adjust the script, because I read all the collums and rows in a specific order. I use sh.ncols to determen how many collumns I have, but you have to refer to specific rows and collumns.

Did you already instal the xlrd and xlwt modules?

pk87 0 Newbie Poster · Answer 11 · 2012-05-10T12:07:25+00:00

@sinnebril
Thanks for responding. I managed to read columns by name and place them in seperate sheet as intented. Is there a way that i can write the data to a specific sheet of already existing exel file. I want to overwrite that sheet every time i run the script. Thanks

import xlrd
import xlwt
wb = xlrd.open_workbook('DiagnosticDataList_L494_BB90574_CSW03_Prerelease_Updated format.xls')
sh = wb.sheet_by_name(u'SystemReleaseTable expanded')
#print "Number of rows: %s   Number of cols: %s" % (sh.nrows, sh.ncols)


wbook = xlwt.Workbook()
sheet = wbook.add_sheet('test') 
wb.sheet_names()


rowx = 3
colx = 0
col  = 1
row = 3
for count_col in range(0,sh.ncols):
    row_cell = sh.cell(rowx,colx).value
    read_name = row_cell
    colx += 1
    if read_name == 'Group Name':
        found_col = count_col
        sheet.write(row,col,read_name)
        break
rowx = 4
colx = found_col
row = 4
for count_row in range(rowx,sh.nrows):
  row_cell = sh.cell(rowx,colx).value
  print_row = row_cell
  rowx += 1
  row += 1
  sheet.write(row,col,print_row)






rowx = 3
colx = 0
col  = 2
row = 3
for count_col in range(0,sh.ncols):
    row_cell = sh.cell(rowx,colx).value
    read_name = row_cell
    colx += 1
    if read_name == 'EBD':
        found_col = count_col
        sheet.write(row,col,read_name)
        break
rowx = 4
colx = found_col
row = 4
for count_row in range(rowx,sh.nrows):
  row_cell = sh.cell(rowx,colx).value
  rowx += 1
  row += 1
  if row_cell == 'Off':
      print_row = 1
  else :
      print_row = 0
  sheet.write(row,col,print_row)



rowx = 3
colx = 0
col  = 3
row = 3
for count_col in range(0,sh.ncols):
    row_cell = sh.cell(rowx,colx).value
    read_name = row_cell
    colx += 1
    if read_name == 'Abs':
        found_col = count_col
        sheet.write(row,col,read_name)
        break
rowx = 4
colx = found_col
row = 4
for count_row in range(rowx,sh.nrows):
  row_cell = sh.cell(rowx,colx).value
  rowx += 1
  row += 1
  if row_cell == 'On':
      print_row = 1
  else :
      print_row = 0
  sheet.write(row,col,print_row)





rowx = 3
colx = 0
col  = 4
row = 3
for count_col in range(0,sh.ncols):
    row_cell = sh.cell(rowx,colx).value
    read_name = row_cell
    colx += 1
    if read_name == 'VDC_Controller':
        found_col = count_col
        sheet.write(row,col,read_name)
        break
rowx = 4
colx = found_col
row = 4
for count_row in range(rowx,sh.nrows):
  row_cell = sh.cell(rowx,colx).value
  rowx += 1
  row += 1
  if row_cell == 'On':
      print_row = 0
  elif row_cell == 'Off' :
      print_row = 1
  else:
      print_row = 0
  sheet.write(row,col,print_row)  




rowx = 3
colx = 0
col  = 5
row = 3
for count_col in range(0,sh.ncols):
    row_cell = sh.cell(rowx,colx).value
    read_name = row_cell
    colx += 1
    if read_name == 'HDC':
        found_col = count_col
        sheet.write(row,col,read_name)
        break
rowx = 4
colx = found_col
row = 4
for count_row in range(rowx,sh.nrows):
  row_cell = sh.cell(rowx,colx).value
  rowx += 1
  row += 1
  if row_cell == 'On':
      print_row = 0
  elif row_cell == 'Off' :
      print_row = 1
  elif row_cell == 'RampOff' :
      print_row = 1    
  else:
      print_row = 0
  sheet.write(row,col,print_row)




rowx = 3
colx = 0
col  = 6
row = 3
for count_col in range(0,sh.ncols):
    row_cell = sh.cell(rowx,colx).value
    read_name = row_cell
    colx += 1
    if read_name == 'HBA':
        found_col = count_col
        sheet.write(row,col,read_name)
        break
rowx = 4
colx = found_col
row = 4
for count_row in range(rowx,sh.nrows):
  row_cell = sh.cell(rowx,colx).value
  rowx += 1
  row += 1
  if row_cell == 'On':
      print_row = 0
  elif row_cell == 'Off' :
      print_row = 1
  elif row_cell == 'RampOff' :
      print_row = 1    
  else:
      print_row = 0
  sheet.write(row,col,print_row)





rowx = 3
colx = 0
col  = 7
row = 3
for count_col in range(0,sh.ncols):
    row_cell = sh.cell(rowx,colx).value
    read_name = row_cell
    colx += 1
    if read_name == 'RSC':
        found_col = count_col
        sheet.write(row,col,read_name)
        break
rowx = 4
colx = found_col
row = 4
for count_row in range(rowx,sh.nrows):
  row_cell = sh.cell(rowx,colx).value
  rowx += 1
  row += 1
  if row_cell == 'On':
      print_row = 0
  elif row_cell == 'Off' :
      print_row = 1
  elif row_cell == 'RampOff' :
      print_row = 1    
  else:
      print_row = 0
  sheet.write(row,col,print_row)




wbook.save('reformatted.data.xls')

pk87 0 Newbie Poster · Answer 12 · 2012-05-10T12:10:06+00:00

@sinnebril
your script helped a lot here is the excel file.