1,105,344 Community Members

Python CSV Blank Cells Read as strings

Member Avatar
pbnoj
Newbie Poster
6 posts since Dec 2012
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

Hi there,

My problem involves reading in and working with an excel file that contains blank cells randomly throughout the spreadsheet. The reason there are blank cells is because I have expression data for various cell types but for some of the cell types the expression data was not available. My program works great (Finding standard deviations, averages etc.) when I fill the empty cells with 0, but when I leave them blank I get the following error:

   Resistance.append(map(float, row[4:]))
ValueError: could not convert string to float: 

Basically when I try to append the data to a new array and convert it to floats I get the error because of the blank cells. If I don't do the float function, I get errors everywhere in the programd because I can't do the Stdev, for example I will have the following error:

withinA1 = numpy.std(a1)
  File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 2433, in std
    return _wrapit(a, 'std', axis, dtype, out, ddof)
  File "C:\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 37, in _wrapit
    result = getattr(asarray(obj),method)(*args, **kwds)
TypeError: cannot perform reduce with flexible type

How can I read through and tell python that the empty cells should just be skipped over? I've tried:

for label in Reader:
    while i < len(label):
        # if either the item in col1 or col2 is empty remove both of them
        if label[i] == '':
            del label[i]
        # otherwise, increment the index
        else:
            i += 1
    if len(label) == 0 or label[j] == '':
        continue
    else:
        Resistance.append(map(float, label[4:]))
        i += 1

and:

for row in Reader:
    if len(row) == 0 or row[i] == '':
        i+=1
        continue
    else:
        Resistance.append(map(float, row[4:]))
        i+=1

To no avail. Any help would be greatly appreciated, thank you.

Member Avatar
Gribouillis
Posting Maven
3,454 posts since Jul 2008
Reputation Points: 1,140 [?]
Q&As Helped to Solve: 884 [?]
Skill Endorsements: 18 [?]
Moderator
 
0
 

You could insert 0.0 when there is no data

Resistance.append(tuple((float(x) if x.strip() else 0.0) for x in row[4:]))
Member Avatar
pbnoj
Newbie Poster
6 posts since Dec 2012
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

That doesn't work because adding 0 changes the standard deviation

Member Avatar
Gribouillis
Posting Maven
3,454 posts since Jul 2008
Reputation Points: 1,140 [?]
Q&As Helped to Solve: 884 [?]
Skill Endorsements: 18 [?]
Moderator
 
1
 

Well, here is something which should always work

def to_float(sequence):
    """Generate a sequence of floats by converting every item in
    a given sequence to float, and ignoring failing conversions"""
    for x in sequence:
        try:
            yield float(x)
        except ValueError:
            pass

Now if you want to append all the values in columns 4, 5, ... for all rows,
you can write a generator

def sheet_values(reader):
    """Generate the cell values that we want to convert and append to Resistance"""
    for row in reader:
        for x in row[4:]:
            yield x

Finally, here is how to append all the values

Resistance.append(list(to_float(sheet_values(Reader)))

Perhaps you could describe what is this variable Resistance. Is it a list, a list of lists ? What
is your expected content for Resistance ?

Member Avatar
pbnoj
Newbie Poster
6 posts since Dec 2012
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

Thank you for your help.

The code does work indeed, the only problem is that now it is one large array, and I need to be able to distinguish the original rows from one another so that I may calculate standard deviations as I see fit. Sorry if that was not clear

Member Avatar
pbnoj
Newbie Poster
6 posts since Dec 2012
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

Yes Resistance is a list of lists. My raw data is in the form of an excel spreadsheet. I have A1, A2, B1..B6, C1-C15, etc... My goal is to calculate a running standard deviation of each column (index) from A1 to A2, so STDEV OF A1[0]A2[0], A1[1]A2[1] etc. for each of the subclasses. I will then find the average standard deviation within each subclass

Member Avatar
Gribouillis
Posting Maven
3,454 posts since Jul 2008
Reputation Points: 1,140 [?]
Q&As Helped to Solve: 884 [?]
Skill Endorsements: 18 [?]
Moderator
 
1
 

The only thing you need to do is determine which lists you want to append to Resistance. For example if you want to append the values from a column numbered k, you can write

def column_values(rows, k):
    for row in rows:
        yield row[k]

rows = list(Reader) # convert to list in case Reader is a generator.
Resistance.append(list(to_float(column_values(rows, 0))))
Member Avatar
pbnoj
Newbie Poster
6 posts since Dec 2012
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]
 
0
 

Great, I guess I'm more of a beginner than I thought.

I am now getting this error:

ValueError: setting an array element with a sequence.

and I believe it is because since we took out some zeros, the arrays are of different sizes and the stdev indexing gets thrown off. Is there a way to tell python to completely ignore an index position rather than no longer assuming it's not there?

So, having something like:

[[[-1.0, -1.38, na, 0.24, 0.01, 0.47, -0.38, -0.64, -0.47, -0.71, 0.79, 1.61, 0.88, -0.99, -0.46, -0.23, -0.23, -1.0, -0.01, -0.52, -0.58, -1.17, 1.3, -1.08, -0.53, -0.75, 0.8, -0.23, -0.19, -0.83, -0.43, -0.07, 1.84, -0.4, -0.54, -0.4, 1.82, -0.61, 0.17]], [[-0.89, -0.59, -0.27, -0.63, 0.14, 1.35, -0.28, -0.51, -0.36, -0.52, 0.1, 1.58, -0.78, -0.86, -0.92, -0.41, -0.22, -1.0, -0.24, -0.2, -0.15, -0.73, 0.05, -0.68, -0.52, -0.27, 1.36, -0.44, -0.06, -0.78, -0.02, -0.22, 2.04, -0.16, -0.45, -0.34, 1.33, -0.71, 1.14]]]

I put an na up there as an example just so that the boxes are the same dimension

Member Avatar
Gribouillis
Posting Maven
3,454 posts since Jul 2008
Reputation Points: 1,140 [?]
Q&As Helped to Solve: 884 [?]
Skill Endorsements: 18 [?]
Moderator
 
1
 

You should have a look in numpy masked arrays. I can not help you much here, because I never tried it.

You
This article has been dead for over three months: Start a new discussion instead
Post:
Start New Discussion
Tags Related to this Article