How to parse tricky .csv file content?

Thread Solved

Post #1 by shigehiro, May 6th, 2008
Hi all,

I need your advice on a problem I ran into while trying to parse the contents of a .csv file.

Here is the scenario:
1) I have a CSV file whose entries may look like this (the first line is the header):
ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade
I,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80
R,0023,Grid Computing in today era,"Henry Williams,Tulali Mark",04-May-07,--NA--
MP,0100,"Thinking in Logical Way, How to do it?","Williams,Harly Dimitry",10.02.07,NA
1,0114,"Computational Research for Biological Science, How to?",Alalaa,15-Mar-06,

2) I need to parse the contents of this file into, preferably, a list of dictionaries, so the expected output would be something like this:
  outputList = [
    {'ProjCat': 'I', 'RefNum': '0001', 'ProjTitle': 'Medical Research in XXX Field,2007', 'MemberName': 'Gary,Susan', 'ProjDeadline': '20.05.07', 'ProjGrade': '80'},
    {'ProjCat': 'R', 'RefNum': '0023', 'ProjTitle': 'Grid Computing in today era', 'MemberName': 'Henry Williams,Tulali Mark', 'ProjDeadline': '04-May-07', 'ProjGrade': '--NA--'},
    {'ProjCat': 'MP', 'RefNum': '0100', 'ProjTitle': 'Thinking in Logical Way, How to do it?', 'MemberName': 'Williams,Harly Dimitry', 'ProjDeadline': '10.02.07', 'ProjGrade': 'NA'},
    {'ProjCat': '1', 'RefNum': '0114', 'ProjTitle': 'Computational Research for Biological Science, How to?', 'MemberName': 'Alalaa', 'ProjDeadline': '15-Mar-06', 'ProjGrade': ''},
  ]

3) The problem comes when splitting each line into fields, because the string data may itself contain commas (as in the 'ProjTitle' and 'MemberName' fields).
What I currently have is each line as a string.
If I just use the str.split method, I get a misleading result: for example, "Medical Research in XXX Field,2007" is split into ['Medical Research in XXX Field', '2007'], which is not what I want.
Is there another way to split the fields correctly? Using regular expressions? Is there a good approach for solving this?

4) Is it possible for the value of a key in the dictionary to be left empty (as with the 'ProjGrade' key in the last entry of the outputList above)?
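Both points can be checked against the sample data with the standard csv module. A minimal sketch, assuming Python 3: str.split(',') cuts inside the quoted fields, while csv.reader honours the double quotes, so no regular expression is needed. A trailing comma, as in the last data row, comes back as an empty string rather than a missing column. Mapping empty strings to None is an optional convention shown only for illustration; csv does not do it by itself.

```python
import csv

header = 'ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade'
first_row = 'I,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80'
last_row = '1,0114,"Computational Research for Biological Science, How to?",Alalaa,15-Mar-06,'

# Naive approach: splits on every comma, including those inside quotes
naive = first_row.split(',')
print(len(naive))  # 8 pieces for 6 real fields

# csv.reader takes any iterable of strings; a one-element list is enough here
keys = next(csv.reader([header]))
parsed = next(csv.reader([first_row]))
print(parsed[2])  # Medical Research in XXX Field,2007

# The trailing comma yields an empty string for ProjGrade
record = dict(zip(keys, next(csv.reader([last_row]))))
print(repr(record['ProjGrade']))  # ''

# Optional: store None instead of '' so missing values are explicit
cleaned = {k: (v if v != '' else None) for k, v in record.items()}
print(cleaned['ProjGrade'])  # None
```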

Any suggestions would be welcome.

Thanks in advance

Shige

Post #2 by solsteel, May 6th, 2008
The csv module is ideal for parsing your data.
  import csv

  fn = 'data.csv'
  f = open(fn)
  reader = csv.reader(f)
  headerList = next(reader)  # first row: the field names
  outputList = []
  for line in reader:
      # test for True in case there is a blank line
      if line:
          dd = {}
          for i, key in enumerate(headerList):
              dd[key] = line[i]
          outputList.append(dd)

  f.close()
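For comparison, the standard library's csv.DictReader takes the field names from the first row and builds one dictionary per row, which replaces the header handling and inner loop above. A minimal sketch, assuming Python 3 and using io.StringIO to stand in for the real data.csv file (the sample rows are the ones from the question):

```python
import csv
import io

data = (
    'ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade\n'
    'I,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80\n'
    'R,0023,Grid Computing in today era,"Henry Williams,Tulali Mark",04-May-07,--NA--\n'
    'MP,0100,"Thinking in Logical Way, How to do it?","Williams,Harly Dimitry",10.02.07,NA\n'
    '1,0114,"Computational Research for Biological Science, How to?",Alalaa,15-Mar-06,\n'
    '\n'
)

# DictReader reads the header row itself and skips completely blank lines
reader = csv.DictReader(io.StringIO(data))

# dict(row) gives a plain dict regardless of the Python 3 minor version
outputList = [dict(row) for row in reader]

for row in outputList:
    print(row['RefNum'], row['MemberName'])
```

With a real file on disk in Python 3, the csv documentation recommends opening it as open(fn, newline='').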
Post #3 by shigehiro, May 7th, 2008
Hi Solsteel,
Thank you for your reply.

So I followed your suggestion and first tried a simpler scenario: storing each line as an entry in a dictionary.
Here are my attempts:
First attempt:
  import csv
  outputDict = {}

  inputFile = open(self[filename],'r')
  fileReader = csv.reader(inputFile)
  
  keyIndex = 0
  for line in fileReader:
    outputDict[keyIndex] = line
    
    keyIndex+=1
  
  inputFile.close()
  
  return outputDict
This gave me the error "coercing to Unicode: need string or buffer, ImplicitAcquirerWrapper found", pointing to line 4 (inputFile = open(self[filename],'r')).

So my second attempt was to convert self[filename] to a str, with a slight modification:
  import csv
  outputDict = {}

  inputFile = open(str(self[filename]),'r')
  fileReader = csv.reader(inputFile)
  
  keyIndex = 0
  for line in fileReader:
    outputDict[keyIndex] = line
    
    keyIndex+=1
  
  inputFile.close()
  
  return outputDict
Now I get IOError [Errno 36] File name too long: 'ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade\nI,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80\nR,0023,Grid Computing in today era,"Henry Williams,Tulali Mark",04-May-07,--NA--........'

1) How can I work around this problem?
Was my second approach the right idea?

2) Also, my file may contain accented words such as 'México' and 'São João'. How can I specify the encoding when parsing the file so that these words are rendered correctly?
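The thread does not come back to this question. One common approach, sketched here under assumptions, is to decode the raw bytes into text first and only then hand the text to csv. The sketch assumes Python 3 and a UTF-8 encoded file (if the file was saved as Latin-1 or cp1252, use that codec name instead); the sample bytes are made up for illustration.

```python
import csv
import io

# Made-up sample bytes standing in for the stored file (UTF-8 assumed)
raw = 'City,Country\nMéxico,MX\nSão João,BR\n'.encode('utf-8')

# Decode bytes to str with the file's actual encoding, then parse the text
text = raw.decode('utf-8')
rows = list(csv.reader(io.StringIO(text)))

print(rows[1][0])  # México
print(rows[2][0])  # São João
```

For a file on disk, open(path, newline='', encoding='utf-8') performs the same decoding while reading.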

Thanks again!

Post #4 by solsteel, May 7th, 2008
self[filename] does not appear to be a valid file name. The following is an example of a valid file name:
  r'H:\Zip_Files\618 Johnston\IFA040308\24747-IFA040308.zip'
Reply With Quote Quick reply to this message  
Join Date: Mar 2008
Posts: 5
Reputation: shigehiro is an unknown quantity at this point 
Solved Threads: 0
shigehiro shigehiro is offline Offline
Newbie Poster

Re: How to parse in tricky .csv file content?

 
0
  #5
May 7th, 2008
Originally Posted by solsteel View Post
self[filename] does not appear to be a valid file name. Following is an example of a valid file name:
  1. r'H:\Zip_Files\618 Johnston\IFA040308\24747-IFA040308.zip'
Ah, sorry for not mentioning this earlier: I am actually developing in Python under Zope, so the CSV file is stored in the Zope database (ZODB).
If I just use f = open(filename-path), I get an error saying the file cannot be found.
That is why I retrieve the file object with self[filename] instead of specifying the exact path to the file.

Shige

Post #6 by a1eio, May 7th, 2008
Well, as far as I can see, whatever is in self[filename] is clearly not a valid file path.

The second error gives a hint about what it actually contains:

[IOError] of [Errno 36] File name too long: 'ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade\nI,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80\nR,0023,Grid Computing in today era,"Henry Williams,Tulali Mark",04-May-07,--NA--........'
ProjCat,RefNum,ProjTitle,... and so on: that is not a file path.

The open() function needs a file path string like 'textfile.txt', but you appear to be passing a very long string of words, commas, etc.

You should read the documentation for the csv module; I also think Zope is confusing things. Are you trying to parse a file that is stored somewhere (one with a .csv filename, etc.), or are you reading the CSV data from Zope? If it's from Zope, then I don't think open() is what you want.
open() opens a file and returns a file object, which csv.reader() then parses; if self[filename] isn't a file path, that will not work.

Post #7 by solsteel, May 7th, 2008
Try this:
  csv.reader(str(self[filename]).split('\n'))
The argument to csv.reader can be any iterable object that returns a string each time its next() method is called (__next__() in Python 3); file objects and lists of strings both qualify.
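As a sketch of that point (Python 3 assumed; the sample text is a shortened version of the question's data), a list built with split('\n'), a generator, and an io.StringIO wrapper all produce the same rows:

```python
import csv
import io

text = 'RefNum,MemberName\n0001,"Gary,Susan"\n0114,Alalaa'

from_list = list(csv.reader(text.split('\n')))                  # list of lines
from_gen = list(csv.reader(line for line in text.split('\n')))  # generator
from_file = list(csv.reader(io.StringIO(text)))                 # file-like object

print(from_list == from_gen == from_file)  # True
```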
Post #8 by shigehiro, May 7th, 2008
Hmm, it seems that you are right.
I will have to check the Zope documentation on how to retrieve the file object correctly. I'll do that now...

Thanks.

Post #9 by a1eio, May 7th, 2008
You don't need a file object, though. As solsteel pointed out, the csv module doesn't require one; it just needs something it can iterate over. If you split the self[filename] string at every newline ('\n'), you end up with a list of lines that csv.reader can parse.
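Splitting on '\n' works for this data, but two cases are worth knowing about. A trailing newline leaves an empty last string, which csv.reader turns into an empty row (hence the if line: test in solsteel's first example), and a quoted field that itself contains a line break no longer comes through intact, because split('\n') throws the newline away before csv ever sees it. Handing the whole string to csv.reader through io.StringIO avoids the second problem. A sketch assuming Python 3, with a made-up multi-line title:

```python
import csv
import io

# Made-up sample: the quoted title spans two lines
text = 'RefNum,ProjTitle\n0001,"Medical Research\nin XXX Field"\n'

# Splitting first discards the newline inside the quoted field
split_rows = list(csv.reader(text.split('\n')))

# Let csv see the raw text so it can track quotes across line breaks
stream_rows = list(csv.reader(io.StringIO(text)))

print(split_rows)
print(stream_rows)
```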

Solsteel's example looks perfect.
Post #10 by shigehiro, May 7th, 2008
Cool!
It works now. I just had to pass in an iterable object, as solsteel pointed out!


©2003 - 2009 DaniWeb® LLC