How to parse in tricky .csv file content?

Question

shigehiro 0 Newbie Poster

17 Years Ago

Hi all,

I need your advice on a problem I have run into while trying to parse the contents of a .csv file.

Here is the scenario:
1) I have a csv file whose entries may look like this:
ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade --> Header
I,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80
R,0023,Grid Computing in today era,"Henry Williams,Tulali Mark",04-May-07,--NA--
MP,0100,"Thinking in Logical Way, How to do it?","Williams,Harly Dimitry",10.02.07,NA
1,0114,"Computational Research for Biological Science, How to?",Alalaa,15-Mar-06,

2) I have to parse the contents of this file into, preferably, a list of dictionaries.
So the expected output list would be something like this:

outputList = [{'ProjCat': 'I', 'RefNum': '0001', 'ProjTitle': 'Medical Research in XXX Field,2007', 'MemberName': 'Gary,Susan', 'ProjDeadline': '20.05.07', 'ProjGrade': '80'},
              {'ProjCat': 'R', 'RefNum': '0023', 'ProjTitle': 'Grid Computing in today era', 'MemberName': 'Henry Williams,Tulali Mark', 'ProjDeadline': '04-May-07', 'ProjGrade': '--NA--'},
              {'ProjCat': 'MP', 'RefNum': '0100', 'ProjTitle': 'Thinking in Logical Way, How to do it?', 'MemberName': 'Williams,Harly Dimitry', 'ProjDeadline': '10.02.07', 'ProjGrade': 'NA'},
              {'ProjCat': '1', 'RefNum': '0114', 'ProjTitle': 'Computational Research for Biological Science, How to?', 'MemberName': 'Alalaa', 'ProjDeadline': '15-Mar-06', 'ProjGrade': ''}
          ]

3) My problem is reading the file line by line, because the CSV file may contain string fields that themselves contain commas (such as the 'ProjTitle' and 'MemberName' fields).
What I currently have in hand is the lines as strings.
If I just use the 'split' method of str, it gives a misleading result: for example, "Medical Research in XXX Field,2007" is split into ['Medical Research in XXX Field', '2007'], which is not what I want.
Is there another way to split the fields correctly? Using a regular expression? Is there a good approach for solving this?

4) Is it possible for the value of a certain key in a dictionary to be left empty (as in the value for the 'ProjGrade' key of the last entry of the above outputList)?

Any suggestions would be welcomed.

Thanks in advance

Shige

data-science file-system python

Edited 12 Years Ago by Reverend Jim because: Fixed formatting

All 10 Replies

a1eio 16 Junior Poster

17 Years Ago

Well, as far as I can see, whatever resides within self[filename] is clearly not a valid filepath.

There was a hint as to what it might contain in the second error:

[IOError] of [Errno 36] File name too long: 'ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade\nI,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80\nR,0023,Grid Computing in today era,"Henry Williams,Tulali Mark",04-May-07,--NA--........'

ProjCat,RefNum,ProjTitle,.... etc. is not a filepath.

The open function needs a filepath string like 'textfile.txt', and you appear to be passing a very long string of words, commas etc.

You should read the documentation for the csv module, and I think Zope is confusing things. Are you trying to parse a file that's stored somewhere (has a filename ending in .csv etc.), or are you reading the csv data from Zope? If it's from Zope, then I don't think the open() command is what you want.
open() opens a file and returns a file object, which csv.reader() then parses; if self[filename] isn't a file, it will not work.
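
To illustrate the distinction drawn above: open() expects a path, whereas csv.reader only needs something it can read line by line. The following is a minimal sketch, assuming Python 3 and assuming the CSV text is already in memory as a string. The name csv_text is hypothetical and stands in for whatever str(self[filename]) returns in the poster's setup.

```python
import csv
import io

# Hypothetical stand-in for CSV text that is already in memory.
csv_text = (
    'ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade\n'
    'I,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80\n'
)

# io.StringIO wraps the string in a file-like object, so no path is involved.
# newline='' leaves line endings untouched, as the csv module recommends.
rows = list(csv.reader(io.StringIO(csv_text, newline='')))

print(rows[1][2])  # Medical Research in XXX Field,2007
```

Wrapping the text this way is one option among several; it has the side benefit that quoted fields containing line breaks are handed to the csv module intact.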

bvdet 75 Junior Poster

17 Years Ago

Try this:

csv.reader(str(self[filename]).split('\n'))

The argument to csv.reader can be any iterable object that produces one string (one line of the file) on each iteration.

vegaseat commented: Nice solution +8
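
As a rough illustration of why this works where str.split(',') does not, here is a small sketch, assuming Python 3. The sample rows are copied from the question; the variable names (raw, lines, parsed, naive) are made up for the example.

```python
import csv

# Sample rows copied from the question.
raw = (
    'ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade\n'
    'R,0023,Grid Computing in today era,"Henry Williams,Tulali Mark",04-May-07,--NA--\n'
    'MP,0100,"Thinking in Logical Way, How to do it?","Williams,Harly Dimitry",10.02.07,NA\n'
)

# A list of lines is an iterable of strings, which is all csv.reader needs.
lines = raw.splitlines()
parsed = list(csv.reader(lines))

# A plain split on commas breaks the quoted fields apart.
naive = lines[2].split(',')

print(len(naive), len(parsed[2]))  # 8 6
print(parsed[2][2])                # Thinking in Logical Way, How to do it?
```

One caveat: splitting the text on newlines first would break a quoted field that itself contains a line break. None of the sample rows do, so the approach is fine for data like this.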

a1eio 16 Junior Poster

17 Years Ago

You don't need to get a file object though. As solsteel pointed out, the csv module doesn't need a file object, it just needs something to iterate through. So if you split the self[filename] string at every newline ('\n'), you will end up with a list of lines which the csv reader can parse.

Solsteel's example looks perfect.

csv.reader(str(self[filename]).split('\n'))


bvdet 75 Junior Poster · Answer 1 · 2008-05-06T23:13:21+00:00

The csv module is ideal for parsing your data.

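import csv

fn = 'data.csv'
f = open(fn)
reader = csv.reader(f)
# the first row holds the column names; next(reader) replaces the
# Python 2-only reader.next()
headerList = next(reader)
outputList = []
for line in reader:
    # test for True in case there is a blank line
    if line:
        # map each header name to the field in the same column
        dd = {}
        for i, key in enumerate(headerList):
            dd[key]=line[i]
        outputList.append(dd)

f.close()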
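
The same list of dictionaries can also be built with csv.DictReader from the standard library, which takes the keys from the first row. This is only a sketch, assuming Python 3; it reads from an in-memory sample (two rows copied from the question) rather than from 'data.csv', and the names sample and output_list are made up for the example. It also touches on point 4 of the question: a missing trailing field simply comes back as an empty string.

```python
import csv
import io

# Header and two rows from the question; the last row has no ProjGrade.
sample = (
    'ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade\n'
    'I,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80\n'
    '1,0114,"Computational Research for Biological Science, How to?",Alalaa,15-Mar-06,\n'
)

# DictReader uses the first row as keys and yields one mapping per data row.
reader = csv.DictReader(io.StringIO(sample, newline=''))
output_list = [dict(row) for row in reader]

print(output_list[0]['MemberName'])       # Gary,Susan
print(repr(output_list[1]['ProjGrade']))  # ''
```

All values are strings, so 'RefNum' keeps its leading zeros; converting grades or dates to other types would be a separate step.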

shigehiro 0 Newbie Poster · Answer 2 · 2008-05-07T14:06:23+00:00

Hi Solsteel,
Thank you for your reply.

I followed your suggestion and tried out a simpler scenario first, storing each line as an entry in a dictionary.
Here are my attempts.
First attempt:

import csv
outputDict = {}

inputFile = open(self[filename],'r')
fileReader = csv.reader(inputFile)

keyIndex = 0
for line in fileReader:
    outputDict[keyIndex] = line
    keyIndex+=1

inputFile.close()

return outputDict

The above gave me the error "coercing to Unicode: need string or buffer, ImplicitAcquirerWrapper found", pointing to line 4 (inputFile = open(self[filename],'r')).

So my second attempt was to cast self[filename] to 'str', with a slight modification as below:

import csv
outputDict = {}

inputFile = open(str(self[filename]),'r')
fileReader = csv.reader(inputFile)

keyIndex = 0
for line in fileReader:
    outputDict[keyIndex] = line
    keyIndex+=1

inputFile.close()

return outputDict

Now it gave me an [IOError] of [Errno 36] File name too long: 'ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade\nI,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80\nR,0023,Grid Computing in today era,"Henry Williams,Tulali Mark",04-May-07,--NA--........'

1) How can I work around this problem?
Was I right to attempt my second approach as above?

2) Also, my file might contain Latin words such as 'México', 'Sã Joã'. How can I specify an encoding when parsing the file so that these words are rendered correctly?

Thanks again.!
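
The thread does not answer the encoding question above. As a minimal sketch only, assuming Python 3, assuming the stored data is available as bytes, and assuming it is UTF-8 encoded (the actual encoding of the file is not stated anywhere in the thread): decode to text first, then hand the lines to csv.reader. The name raw_bytes and the sample row are hypothetical.

```python
import csv

# Hypothetical bytes as they might come out of storage; UTF-8 is an assumption.
raw_bytes = 'City,Country\n"México, D.F.",Mexico\n'.encode('utf-8')

# Decode to text first; in Python 3, csv.reader works on str, not bytes.
text = raw_bytes.decode('utf-8')
rows = list(csv.reader(text.splitlines()))

print(rows[1][0])  # México, D.F.
```

When reading from a real path instead, the built-in open() accepts the same information directly, for example open(path, newline='', encoding='utf-8'). If the data is in a different encoding such as Latin-1, that name would be used in place of 'utf-8'.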

bvdet 75 Junior Poster · Answer 3 · 2008-05-07T18:10:36+00:00

self[filename] does not appear to be a valid file name. Following is an example of a valid file name:

r'H:\Zip_Files\618 Johnston\IFA040308\24747-IFA040308.zip'

shigehiro 0 Newbie Poster · Answer 4 · 2008-05-07T18:15:39+00:00

Aha... pardon me for not saying this earlier: I am actually developing in Python on Zope.
So the csv file is stored in the Zope database (ZODB)...
If I only use f = open(filename-path), it tells me that it can't find the file.
As such, I have to retrieve the file object by using 'self[filename]' instead of specifying the exact path to the file.

Shige

shigehiro 0 Newbie Poster · Answer 5 · 2008-05-07T19:10:54+00:00

Hmm.. it seems that you are right.
I will have to refer to Zope documentation of how to retrieve file object correctly. Will do that now...

:) Thanks.

shigehiro 0 Newbie Poster · Answer 6 · 2008-05-07T19:30:07+00:00

Cool!
it works now, I just have to pass in the iterable object as solsteel pointed out!

:D

a1eio 16 Junior Poster · Answer 7 · 2008-05-07T19:31:40+00:00

excellent, happy coding
