Hi there,

My problem involves reading in and working with an excel file that contains blank cells randomly throughout the spreadsheet. The reason there are blank cells is because I have expression data for various cell types but for some of the cell types the expression data was not available. My program works great (Finding standard deviations, averages etc.) when I fill the empty cells with 0, but when I leave them blank I get the following error:

   Resistance.append(map(float, row[4:]))
ValueError: could not convert string to float: 

Basically when I try to append the data to a new array and convert it to floats I get the error because of the blank cells. If I don't do the float function, I get errors everywhere in the programd because I can't do the Stdev, for example I will have the following error:

withinA1 = numpy.std(a1)
  File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 2433, in std
    return _wrapit(a, 'std', axis, dtype, out, ddof)
  File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 37, in _wrapit
    result = getattr(asarray(obj),method)(*args, **kwds)
TypeError: cannot perform reduce with flexible type

How can I read through and tell python that the empty cells should just be skipped over? I've tried:

for label in Reader:
    while i < len(label):
        # if either the item in col1 or col2 is empty remove both of them
        if label[i] == '':
            del label[i]
        # otherwise, increment the index
        else:
            i += 1
    if len(label) == 0 or label[j] == '':
        continue
    else:
        Resistance.append(map(float, label[4:]))
        i += 1

and:

for row in Reader:
    if len(row) == 0 or row[i] == '':
        i+=1
        continue
    else:
        Resistance.append(map(float, row[4:]))
        i+=1

To no avail. Any help would be greatly appreciated, thank you.

Recommended Answers

All 8 Replies

You could insert 0.0 when there is no data

Resistance.append(tuple((float(x) if x.strip() else 0.0) for x in row[4:]))

That doesn't work because adding 0 changes the standard deviation

Well, here is something which should always work

def to_float(sequence):
    """Generate a sequence of floats by converting every item in
    a given sequence to float, and ignoring failing conversions"""
    for x in sequence:
        try:
            yield float(x)
        except ValueError:
            pass

Now if you want to append all the values in columns 4, 5, ... for all rows,
you can write a generator

def sheet_values(reader):
    """Generate the cell values that we want to convert and append to Resistance"""
    for row in reader:
        for x in row[4:]:
            yield x

Finally, here is how to append all the values

Resistance.append(list(to_float(sheet_values(Reader)))

Perhaps you could describe what is this variable Resistance. Is it a list, a list of lists ? What
is your expected content for Resistance ?

Thank you for your help.

The code does work indeed, the only problem is that now it is one large array, and I need to be able to distinguish the original rows from one another so that I may calculate standard deviations as I see fit. Sorry if that was not clear

Yes Resistance is a list of lists. My raw data is in the form of an excel spreadsheet. I have A1, A2, B1..B6, C1-C15, etc... My goal is to calculate a running standard deviation of each column (index) from A1 to A2, so STDEV OF A1[0]A2[0], A1[1]A2[1] etc. for each of the subclasses. I will then find the average standard deviation within each subclass

The only thing you need to do is determine which lists you want to append to Resistance. For example if you want to append the values from a column numbered k, you can write

def column_values(rows, k):
    for row in rows:
        yield row[k]

rows = list(Reader) # convert to list in case Reader is a generator.
Resistance.append(list(to_float(column_values(rows, 0))))

Great, I guess I'm more of a beginner than I thought.

I am now getting this error:

ValueError: setting an array element with a sequence.

and I believe it is because since we took out some zeros, the arrays are of different sizes and the stdev indexing gets thrown off. Is there a way to tell python to completely ignore an index position rather than no longer assuming it's not there?

So, having something like:

[[[-1.0, -1.38, na, 0.24, 0.01, 0.47, -0.38, -0.64, -0.47, -0.71, 0.79, 1.61, 0.88, -0.99, -0.46, -0.23, -0.23, -1.0, -0.01, -0.52, -0.58, -1.17, 1.3, -1.08, -0.53, -0.75, 0.8, -0.23, -0.19, -0.83, -0.43, -0.07, 1.84, -0.4, -0.54, -0.4, 1.82, -0.61, 0.17]], [[-0.89, -0.59, -0.27, -0.63, 0.14, 1.35, -0.28, -0.51, -0.36, -0.52, 0.1, 1.58, -0.78, -0.86, -0.92, -0.41, -0.22, -1.0, -0.24, -0.2, -0.15, -0.73, 0.05, -0.68, -0.52, -0.27, 1.36, -0.44, -0.06, -0.78, -0.02, -0.22, 2.04, -0.16, -0.45, -0.34, 1.33, -0.71, 1.14]]]

I put an na up there as an example just so that the boxes are the same dimension

You should have a look in numpy masked arrays. I can not help you much here, because I never tried it.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.