Background:

I'm extracting values from a file which is sometimes an xls and sometimes an xlsx file. An xls is easily read with xlrd, but xlrd nor any other Python library (as far as I could find) supports xlsx, so instead I'm using xlsx2csv to convert to csv and then reading values from that.

I'm new to Python and only have beginner's knowledge of other scripting languages.

Problem:

I'd like to select values from a specific column of the csv file (and fill a list with them). This is where I'm getting confused. When I convert the CSV with

infile = path + "\\" + filename
xlsx2csv(infile, open("data.csv", "w+"))

and open it manually in Excel, I can see the values I want in column 5 at row 23 and onwards (with columns up to 5 and rows up to 23 containing values I do not want).

But when I open the csv file within Python with

values = csv.reader(open('data.csv', 'rb'), delimiter=' ')

I'm getting a list of lists. Printing the values with

for row in values:
    print row

gives the following (edited) result:

I feel like I've already gone wrong at this point. Or should I go with this and then split up the lists and get the last value from each separate list except the empty ones? I hope someone can help me to an elegant solution.

Could you post beginning of the original csv file or attach it from advanced view, manage attachments? I would guess that delimeter is ','

Could you post beginning of the original csv file or attach it from advanced view, manage attachments? I would guess that delimeter is ','

Thank you! I've changed the delimiter to ',' and the values now stay intact. I've also flagellated myself for such a silly oversight (I now realise that the words in sentences were getting broken up because they were separated by spaces and I used a space as a delimiter), but this doesn't seem to have made me any smarter so far.

Let me try and select the values again and report back with how I failed or succeeded.

Thank you! I've changed the delimiter to ',' and the values now stay intact. I've also flagellated myself for such a silly oversight (I now realise that the words in sentences were getting broken up because they were separated by spaces and I used a space as a delimiter), but this doesn't seem to have made me any smarter so far.

Let me try and select the values again and report back with how I failed or succeeded.

When you open a file with the CSV module, you're given a 'list', which is indexed. If you want the data in a spcific column to be put into another list, just scroll through it. If the data you need is in the 5th column (zero based), use that index. i.e.

MyValues = [] #create an empty list
values = csv.reader(open('data.csv', 'rb'), delimiter=' ')
for row in values:
  MyValues.append(row[5])

Thats pretty much it.

Thank you, that does it. (With row[4] instead of row[5], since as you pointed out it is zero based.)

There is one problem that pops up. I found that I was only getting values up to a certain row, then it failed with "IndexError: list index out of range". Looking at the original list I noticed that the breaking point was at a list with 4 items instead of 5 (so it can't reach row[4]). I don't even need that list, so I would have to start the for loop at a particular list (23).

You can put one list comprehension or normal for before your for reading out the unneeded rows, or from itertools import islice and list(islice(values,22) Alternatively you can try this line or variation of it for line 4:

MyValues.append(row[4] if len(row)>4 else None)

Edited 6 Years Ago by pyTony: n/a

You can put one list comprehension or normal for before your for reading out the unneeded rows, or from itertools import islice and list(islice(values,22) Alternatively you can try this line or variation of it for line 4:

MyValues.append(row[4] if len(row)>4 else None)

Brilliant. I went with the code snippet. All I had to do after that was put in another for loop to select values of a particular format and I had exactly what I wanted!

This question has already been answered. Start a new discussion instead.