May 6th, 2008

How to parse in tricky .csv file content?

Hi all,

I need your advice on a problem I have run into while trying to parse the contents of a .csv file.

Here is the scenario:
1) I have a CSV file with entries like the following:
ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade --> Header
I,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80
R,0023,Grid Computing in today era,"Henry Williams,Tulali Mark",04-May-07,--NA--
MP,0100,"Thinking in Logical Way, How to do it?","Williams,Harly Dimitry",10.02.07,NA
1,0114,"Computational Research for Biological Science, How to?",Alalaa,15-Mar-06,

2) I have to parse the contents of this file into, preferably, a list of dictionaries.
So the expected output list would be something like this:
outputList = [
    {'ProjCat': 'I', 'RefNum': '0001', 'ProjTitle': 'Medical Research in XXX Field,2007', 'MemberName': 'Gary,Susan', 'ProjDeadline': '20.05.07', 'ProjGrade': '80'},
    {'ProjCat': 'R', 'RefNum': '0023', 'ProjTitle': 'Grid Computing in today era', 'MemberName': 'Henry Williams,Tulali Mark', 'ProjDeadline': '04-May-07', 'ProjGrade': '--NA--'},
    {'ProjCat': 'MP', 'RefNum': '0100', 'ProjTitle': 'Thinking in Logical Way, How to do it?', 'MemberName': 'Williams,Harly Dimitry', 'ProjDeadline': '10.02.07', 'ProjGrade': 'NA'},
    {'ProjCat': '1', 'RefNum': '0114', 'ProjTitle': 'Computational Research for Biological Science, How to?', 'MemberName': 'Alalaa', 'ProjDeadline': '15-Mar-06', 'ProjGrade': ''},
]

3) My problem is reading the file line by line, because the CSV file may contain string fields that themselves contain commas (such as the 'ProjTitle' and 'MemberName' fields).
What I have at the moment is each line as a string.
If I just use the 'split' method of str, it gives a misleading result. For example, "Medical Research in XXX Field,2007" is split into ['Medical Research in XXX Field', '2007'], which is not what I want.
Is there another way to split the fields correctly? Using a regular expression, perhaps? Is there a good approach for solving this?

4) Is it possible for the value of a key in a dictionary to be left empty (as with the 'ProjGrade' key in the last entry of the outputList above)?
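
To make points 3 and 4 concrete, here is a minimal sketch using two rows from the sample file. It contrasts str.split with the standard library's csv.reader and shows that an empty string is a perfectly valid dictionary value. The variable names are illustrative only and are not taken from the original post.

```python
import csv

# One data row from the sample file in point 1.
line = 'I,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80'

# Naive splitting breaks the quoted fields apart at their inner commas.
naive_fields = line.split(',')

# csv.reader treats a double-quoted field as one value, commas included.
csv_fields = next(csv.reader([line]))

# The last sample row ends with a comma, so its final field is empty.
last_line = '1,0114,"Computational Research for Biological Science, How to?",Alalaa,15-Mar-06,'
last_row = next(csv.reader([last_line]))

# A dictionary value may be an empty string.
entry = {'ProjGrade': last_row[5]}
```

The naive split yields eight fragments for the first row, while csv.reader yields the six intended fields.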

Any suggestions would be welcomed.

Thanks in advance

Shige
Posted by shigehiro

May 6th, 2008

Re: How to parse in tricky .csv file content?

The csv module is ideal for parsing your data.

    import csv

    fn = 'data.csv'
    f = open(fn)
    reader = csv.reader(f)
    # the first row holds the column names
    # (next(reader) replaces the older reader.next() spelling)
    headerList = next(reader)
    outputList = []
    for line in reader:
        # test for True in case there is a blank line
        if line:
            dd = {}
            for i, key in enumerate(headerList):
                dd[key] = line[i]
            outputList.append(dd)

    f.close()
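
As an alternative to building each dictionary by hand, the csv module also provides csv.DictReader, which takes its keys from the header row. The sketch below is not from the thread; it uses io.StringIO as a stand-in for the opened file so that it runs without a data.csv on disk, and the names sample and output_list are illustrative.

```python
import csv
import io

# Stand-in for the real file; io.StringIO behaves like an open text file.
sample = io.StringIO(
    'ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade\n'
    'I,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80\n'
    '\n'
    '1,0114,"Computational Research for Biological Science, How to?",Alalaa,15-Mar-06,\n'
)

# DictReader reads the keys from the header row and skips fully blank lines.
output_list = [dict(row) for row in csv.DictReader(sample)]
```

The empty ProjGrade in the last row simply becomes an empty string in the resulting dictionary.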
Posted by solsteel

May 7th, 2008

Re: How to parse in tricky .csv file content?

Hi Solsteel,
Thank you for your reply.

I followed your suggestion and first tried a simpler scenario, storing each line as an entry in a dictionary.
Here are my attempts:
First attempt:
  import csv
  outputDict = {}

  inputFile = open(self[filename],'r')
  fileReader = csv.reader(inputFile)
  
  keyIndex = 0
  for line in fileReader:
    outputDict[keyIndex] = line
    
    keyIndex+=1
  
  inputFile.close()
  
  return outputDict
The above gave me an error of "coercing to Unicode: need string or buffer, ImplicitAcquirerWrapper found", pointing to line No. 4 (inputFile = open(self[filename],'r'))

So my second attempt was to cast self[filename] to 'str', with a slight modification as below:
  import csv
  outputDict = {}

  inputFile = open(str(self[filename]),'r')
  fileReader = csv.reader(inputFile)
  
  keyIndex = 0
  for line in fileReader:
    outputDict[keyIndex] = line
    
    keyIndex+=1
  
  inputFile.close()
  
  return outputDict
Now it gave me an [IOError] of [Errno 36] File name too long: 'ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade\nI,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80\nR,0023,Grid Computing in today era,"Henry Williams,Tulali Mark",04-May-07,--NA--........'

1) How can I work around this problem?
Was I right to attempt my second approach as above?

2) Also, my file might contain Latin words such as 'México', 'Sã Joã'. How can I specify the encoding when parsing the file so that these words are rendered correctly?

Thanks again.!
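
The thread does not go on to answer the encoding question. As a hedged sketch only: in Python 3 the csv module works on text strings, so one possible approach is to decode the stored bytes first and then parse the decoded lines. The sample bytes and the choice of UTF-8 below are assumptions, not details from the post.

```python
import csv

# Hypothetical raw bytes as they might come out of storage; UTF-8 is an assumption.
raw = 'City,Country\nMéxico,MX\n'.encode('utf-8')

# Decode to text first, then hand the lines to csv.reader.
text = raw.decode('utf-8')
rows = list(csv.reader(text.splitlines()))
```

If the data was saved in a different encoding (Latin-1, for example), that name would have to be passed to decode() instead.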
Last edited by shigehiro; May 7th, 2008 at 5:11 am.
Posted by shigehiro

May 7th, 2008

Re: How to parse in tricky .csv file content?

self[filename] does not appear to be a valid file name. Following is an example of a valid file name:
    r'H:\Zip_Files\618 Johnston\IFA040308\24747-IFA040308.zip'
Posted by solsteel

May 7th, 2008

Re: How to parse in tricky .csv file content?

Aha, pardon me for not saying this earlier: I am actually developing in Python under Zope,
so the csv file is stored in the Zope database.
If I only use f = open(filename-path), it tells me that it can't find the file.
As such, I have to retrieve the file object using 'self[filename]' instead of specifying the exact path to the file.

Shige
Posted by shigehiro

May 7th, 2008

Re: How to parse in tricky .csv file content?

As far as I can see, whatever is stored in self[filename] is clearly not a valid file path.

There was a hint about what it actually contains in the second error:

[IOError] of [Errno 36] File name too long: 'ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade\nI,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80\nR,0023,Grid Computing in today era,"Henry Williams,Tulali Mark",04-May-07,--NA--........'

ProjCat,RefNum,ProjTitle, and so on: that is not a file path.

The open() function needs a file path string like 'textfile.txt', and you appear to be passing a very long string of words, commas, etc.

You should read the documentation for the csv module, and I think Zope is confusing things. Are you trying to parse a file that is stored somewhere on disk (with a filename ending in .csv), or are you reading the CSV data from Zope? If it's from Zope, then I don't think open() is what you want.
open() opens a file and returns a file object, which csv.reader() then parses. If self[filename] isn't a file, it will not work.
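
The distinction between a path and the file's content can be sketched as follows. The string content is a hypothetical, shortened stand-in for what str(self[filename]) returned in the error message. io.StringIO is used here as one way to get a file-like object from text; it is not something suggested in the thread.

```python
import csv
import io

# Hypothetical stand-in for str(self[filename]): the file's content, not its path.
content = 'ProjCat,RefNum\nI,0001\nR,0023\n'

# open() interprets its argument as a path, so passing the content fails.
try:
    open(content)
    open_failed = False
except OSError:
    open_failed = True

# io.StringIO wraps the text in a file-like object that csv.reader accepts.
rows = list(csv.reader(io.StringIO(content)))
```

The exact OSError raised by open() depends on the platform and on the length of the string, which is why the sketch catches the base class.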
Last edited by a1eio; May 7th, 2008 at 9:42 am.
Posted by a1eio

May 7th, 2008

Re: How to parse in tricky .csv file content?

Try this:
    csv.reader(str(self[filename]).split('\n'))
The argument to csv.reader can be any iterable object that produces a string each time its next() method is called.
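
A small sketch of that point: csv.reader gives the same result whether the strings come from a list or from a generator. The sample lines and the helper name produce_lines are made up for illustration.

```python
import csv

lines = ['a,b,c', '1,"2,3",4']

# A list of strings works ...
from_list = list(csv.reader(lines))

# ... and so does a generator that yields one string at a time.
def produce_lines():
    yield 'a,b,c'
    yield '1,"2,3",4'

from_generator = list(csv.reader(produce_lines()))
```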
Posted by solsteel

May 7th, 2008

Re: How to parse in tricky .csv file content?

Hmm, it seems that you are right, a1eio.
I will have to check the Zope documentation for how to retrieve the file object correctly. I will do that now.

Thanks.
Posted by shigehiro

May 7th, 2008

Re: How to parse in tricky .csv file content?

You don't need to get a file object, though. As solsteel pointed out, csv.reader doesn't need a file object; it just needs something to iterate over. If you split the self[filename] string on every newline ('\n'), you end up with a list of lines which csv.reader can parse.

Solsteel's example looks perfect.
    csv.reader(str(self[filename]).split('\n'))
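
Putting the pieces of the thread together, a minimal sketch of the whole path from a CSV string to a list of dictionaries might look like this. The function name parse_csv_text and the sample text are hypothetical; text stands in for str(self[filename]). It uses splitlines() rather than split('\n') so that Windows-style line endings are handled as well.

```python
import csv

def parse_csv_text(text):
    """Return a list of dicts, one per non-blank data row of CSV text."""
    reader = csv.reader(text.splitlines())
    header = next(reader)
    # zip pairs each header name with the value in the same column.
    return [dict(zip(header, row)) for row in reader if row]

# Hypothetical stand-in for str(self[filename]).
text = (
    'ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade\r\n'
    'R,0023,Grid Computing in today era,"Henry Williams,Tulali Mark",04-May-07,--NA--\r\n'
    '\r\n'
    'MP,0100,"Thinking in Logical Way, How to do it?","Williams,Harly Dimitry",10.02.07,NA\r\n'
)
records = parse_csv_text(text)
```

One caveat of splitting the text into lines first: a quoted field that itself contains a line break would be cut in two, so this sketch assumes every record sits on a single line.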
Last edited by a1eio; May 7th, 2008 at 10:21 am.
Posted by a1eio

May 7th, 2008

Re: How to parse in tricky .csv file content?

Cool!
It works now. I just had to pass in an iterable object, as solsteel pointed out!

Last edited by shigehiro; May 7th, 2008 at 10:39 am.
Posted by shigehiro

This thread is solved
