Hi there,

My problem involves reading in and working with an excel file that contains blank cells randomly throughout the spreadsheet. The reason there are blank cells is because I have expression data for various cell types but for some of the cell types the expression data was not available. My program works great (Finding standard deviations, averages etc.) when I fill the empty cells with 0, but when I leave them blank I get the following error:

   Resistance.append(map(float, row[4:]))
ValueError: could not convert string to float:


Basically when I try to append the data to a new array and convert it to floats I get the error because of the blank cells. If I don't do the float function, I get errors everywhere in the programd because I can't do the Stdev, for example I will have the following error:

withinA1 = numpy.std(a1)
File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 2433, in std
return _wrapit(a, 'std', axis, dtype, out, ddof)
File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 37, in _wrapit
result = getattr(asarray(obj),method)(*args, **kwds)
TypeError: cannot perform reduce with flexible type


How can I read through and tell python that the empty cells should just be skipped over? I've tried:

for label in Reader:
while i < len(label):
# if either the item in col1 or col2 is empty remove both of them
if label[i] == '':
del label[i]
# otherwise, increment the index
else:
i += 1
if len(label) == 0 or label[j] == '':
continue
else:
Resistance.append(map(float, label[4:]))
i += 1


and:

for row in Reader:
if len(row) == 0 or row[i] == '':
i+=1
continue
else:
Resistance.append(map(float, row[4:]))
i+=1


To no avail. Any help would be greatly appreciated, thank you.

2
Contributors
8
Replies
14
Views
4 Years
Discussion Span
Last Post by Gribouillis

You could insert 0.0 when there is no data

Resistance.append(tuple((float(x) if x.strip() else 0.0) for x in row[4:]))


Edited by Gribouillis

That doesn't work because adding 0 changes the standard deviation

Well, here is something which should always work

def to_float(sequence):
"""Generate a sequence of floats by converting every item in
a given sequence to float, and ignoring failing conversions"""
for x in sequence:
try:
yield float(x)
except ValueError:
pass


Now if you want to append all the values in columns 4, 5, ... for all rows,
you can write a generator

def sheet_values(reader):
"""Generate the cell values that we want to convert and append to Resistance"""
for row in reader:
for x in row[4:]:
yield x


Finally, here is how to append all the values

Resistance.append(list(to_float(sheet_values(Reader)))


Perhaps you could describe what is this variable Resistance. Is it a list, a list of lists ? What
is your expected content for Resistance ?

Edited by Gribouillis

Thank you for your help.

The code does work indeed, the only problem is that now it is one large array, and I need to be able to distinguish the original rows from one another so that I may calculate standard deviations as I see fit. Sorry if that was not clear

Yes Resistance is a list of lists. My raw data is in the form of an excel spreadsheet. I have A1, A2, B1..B6, C1-C15, etc... My goal is to calculate a running standard deviation of each column (index) from A1 to A2, so STDEV OF A1[0]A2[0], A1[1]A2[1] etc. for each of the subclasses. I will then find the average standard deviation within each subclass

The only thing you need to do is determine which lists you want to append to Resistance. For example if you want to append the values from a column numbered k, you can write

def column_values(rows, k):
for row in rows:
yield row[k]

rows = list(Reader) # convert to list in case Reader is a generator.
Resistance.append(list(to_float(column_values(rows, 0))))


Edited by Gribouillis

Great, I guess I'm more of a beginner than I thought.

I am now getting this error:

ValueError: setting an array element with a sequence.

and I believe it is because since we took out some zeros, the arrays are of different sizes and the stdev indexing gets thrown off. Is there a way to tell python to completely ignore an index position rather than no longer assuming it's not there?

So, having something like:

[[[-1.0, -1.38, na, 0.24, 0.01, 0.47, -0.38, -0.64, -0.47, -0.71, 0.79, 1.61, 0.88, -0.99, -0.46, -0.23, -0.23, -1.0, -0.01, -0.52, -0.58, -1.17, 1.3, -1.08, -0.53, -0.75, 0.8, -0.23, -0.19, -0.83, -0.43, -0.07, 1.84, -0.4, -0.54, -0.4, 1.82, -0.61, 0.17]], [[-0.89, -0.59, -0.27, -0.63, 0.14, 1.35, -0.28, -0.51, -0.36, -0.52, 0.1, 1.58, -0.78, -0.86, -0.92, -0.41, -0.22, -1.0, -0.24, -0.2, -0.15, -0.73, 0.05, -0.68, -0.52, -0.27, 1.36, -0.44, -0.06, -0.78, -0.02, -0.22, 2.04, -0.16, -0.45, -0.34, 1.33, -0.71, 1.14]]]

I put an na up there as an example just so that the boxes are the same dimension

You should have a look in numpy masked arrays. I can not help you much here, because I never tried it.

This topic has been dead for over six months. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.