How to parse tricky .csv file content?
Thread Solved
Join Date: Mar 2008
Posts: 5
Reputation:
Solved Threads: 0
Hi all,
I would appreciate your advice on a problem I ran into while trying to parse the contents of a .csv file.
Here is the scenario:
1) I have a csv file with entries like the following:
ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade --> Header
I,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80
R,0023,Grid Computing in today era,"Henry Williams,Tulali Mark",04-May-07,--NA--
MP,0100,"Thinking in Logical Way, How to do it?","Williams,Harly Dimitry",10.02.07,NA
1,0114,"Computational Research for Biological Science, How to?",Alalaa,15-Mar-06,
2) I have to parse the contents of this file, preferably into a list of dictionaries.
So, the expected output list would be something like this:
outputList = [
    {'ProjCat': 'I', 'RefNum': '0001', 'ProjTitle': 'Medical Research in XXX Field,2007', 'MemberName': 'Gary,Susan', 'ProjDeadline': '20.05.07', 'ProjGrade': '80'},
    {'ProjCat': 'R', 'RefNum': '0023', 'ProjTitle': 'Grid Computing in today era', 'MemberName': 'Henry Williams,Tulali Mark', 'ProjDeadline': '04-May-07', 'ProjGrade': '--NA--'},
    {'ProjCat': 'MP', 'RefNum': '0100', 'ProjTitle': 'Thinking in Logical Way, How to do it?', 'MemberName': 'Williams,Harly Dimitry', 'ProjDeadline': '10.02.07', 'ProjGrade': 'NA'},
    {'ProjCat': '1', 'RefNum': '0114', 'ProjTitle': 'Computational Research for Biological Science, How to?', 'MemberName': 'Alalaa', 'ProjDeadline': '15-Mar-06', 'ProjGrade': ''},
]
3) Now, I have a problem when reading the file line by line, because the CSV file may contain string data with embedded commas (for example, in the 'ProjTitle' and 'MemberName' fields).
What I currently have in hand is the lines as strings.
If I just use the 'split' method of str, it gives a misleading result, e.g. "Medical Research in XXX Field,2007" is split into ['Medical Research in XXX Field', '2007'], which is not what I want.
Is there any other way to split the fields correctly? Using a regular expression? Is there a good approach for solving this?
4) Is it possible for the value of a certain key in a dictionary to be left empty (as in the value for the 'ProjGrade' key of the last entry of the above outputList)?
Any suggestions would be welcome.
Thanks in advance
Shige
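To illustrate the splitting problem described in point 3, here is a minimal sketch, assuming Python 3. The variable names are illustrative only; the sample line is the first data row from the question.

```python
import csv

line = 'I,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80'

# Naive splitting breaks quoted fields apart at their embedded commas.
naive_fields = line.split(',')

# csv.reader accepts any iterable of strings, so a one-element list works.
csv_fields = next(csv.reader([line]))

print(len(naive_fields))  # 8
print(len(csv_fields))    # 6
print(csv_fields[2])      # Medical Research in XXX Field,2007
```

The csv module understands the double-quote convention, so a regular expression should not be needed for data in this shape.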
Join Date: Mar 2007
Posts: 110
Reputation:
Solved Threads: 31
The csv module is ideal for parsing your data.
import csv
fn = 'data.csv'
f = open(fn)
reader = csv.reader(f)
headerList = next(reader)  # the first row holds the field names
outputList = []
for line in reader:
    # test for True in case there is a blank line
    if line:
        dd = {}
        for i, key in enumerate(headerList):
            dd[key] = line[i]
        outputList.append(dd)
f.close()
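As a possible shortcut for building the list of dictionaries, csv.DictReader does the header handling itself. The sketch below assumes Python 3 and uses io.StringIO as a stand-in for an open file; the name output_list is illustrative. It also suggests an answer to question 4: a missing trailing value simply comes back as an empty string.

```python
import csv
import io

# Sample rows copied from the question; io.StringIO stands in for an open file.
sample = io.StringIO(
    'ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade\n'
    'I,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80\n'
    '1,0114,"Computational Research for Biological Science, How to?",Alalaa,15-Mar-06,\n'
)

# DictReader takes the keys from the first row and skips blank lines itself.
output_list = [dict(row) for row in csv.DictReader(sample)]

print(output_list[0]['ProjTitle'])        # Medical Research in XXX Field,2007
print(repr(output_list[1]['ProjGrade']))  # ''
```

Note that every value stays a string, so '0001' keeps its leading zeros.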
Join Date: Mar 2008
Posts: 5
Reputation:
Solved Threads: 0
Thank you for your reply.
So I followed your suggestion and first tried a simpler scenario, storing each line as an entry in a dictionary.
Here are my tries:
First attempt:
import csv
outputDict = {}
inputFile = open(self[filename],'r')
fileReader = csv.reader(inputFile)
keyIndex = 0
for line in fileReader:
    outputDict[keyIndex] = line
    keyIndex+=1
inputFile.close()
return outputDict
So, my second attempt was to cast self[filename] to 'str', with a slight modification as below:
import csv
outputDict = {}
inputFile = open(str(self[filename]),'r')
fileReader = csv.reader(inputFile)
keyIndex = 0
for line in fileReader:
    outputDict[keyIndex] = line
    keyIndex+=1
inputFile.close()
return outputDict
1) How can I work around this problem?
Was I right to try the second approach above?
2) Also, my file might contain words with Latin characters such as 'México', 'Sã Joã'. How can I specify an encoding when parsing the file so that these words are rendered correctly?
Thanks again.!
Last edited by shigehiro; May 7th, 2008 at 5:11 am.
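On the encoding question, one possible approach is sketched below. It assumes Python 3 and assumes the data is Latin-1 encoded; the real encoding of the file is not stated in the thread, and the sample bytes here are hypothetical.

```python
import csv
import io

# Hypothetical raw bytes, as they might come from a file or a database.
raw = 'City,Country\nMéxico,MX\n'.encode('latin-1')

# Decode with the encoding the data was actually written in (assumed here).
text = raw.decode('latin-1')
rows = list(csv.reader(io.StringIO(text, newline='')))

print(rows[1][0])  # México
```

When reading from a real path instead, the same idea would be open(path, encoding='latin-1', newline=''), with the encoding name swapped for whatever the file really uses.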
Join Date: Mar 2007
Posts: 110
Reputation:
Solved Threads: 31
self[filename] does not appear to be a valid file name. Following is an example of a valid file name:
r'H:\Zip_Files\618 Johnston\IFA040308\24747-IFA040308.zip'
Join Date: Mar 2008
Posts: 5
Reputation:
Solved Threads: 0
So the csv file is stored in ZopeDB...
If I only use f = open(filename-path), it complains that it can't find the file.
As such, I have to retrieve the file object by using 'self[filename]' instead of specifying the exact path to the file.
Shige
Well, as far as I can see, whatever resides within self[filename] is clearly not a valid filepath.
There was a sort of hint to what it might contain in the second error:
[IOError] of [Errno 36] File name too long: 'ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade\nI,0001,"Medical Research in XXX Field,2007","Gary,Susan",20.05.07,80\nR,0023,Grid Computing in today era,"Henry Williams,Tulali Mark",04-May-07,--NA--........'
ProjCat,RefNum,ProjTitle,.... etc etc, that is not a filepath.
The open function needs a filepath string like 'textfile.txt', and you appear to be passing a very, very long string of words, commas etc.
You should read the documentation for the csv module, and I think Zope is confusing things. Are you trying to parse a file that's stored somewhere (has a filename, .csv etc.)? Or are you reading the csv data from Zope? If it's from Zope then I don't think the open() command is what you want.
open() opens a file and returns a file object, which csv.reader() then parses; if self[filename] isn't a file then it will not work.
Last edited by a1eio; May 7th, 2008 at 9:42 am.
Join Date: Mar 2007
Posts: 110
Reputation:
Solved Threads: 31
Try this:
The argument to csv.reader can be any iterable object that produces a string (one line) on each iteration.
csv.reader(str(self[filename]).split('\n'))
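A slightly fuller sketch of the same idea, assuming Python 3. The string data is a hypothetical stand-in for str(self[filename]), since the Zope object itself is not available here. Two variants are shown: a list of lines, and an io.StringIO wrapper.

```python
import csv
import io

# Hypothetical stand-in for str(self[filename]): the whole CSV as one string.
data = (
    'ProjCat,RefNum,ProjTitle,MemberName,ProjDeadline,ProjGrade\n'
    'R,0023,Grid Computing in today era,"Henry Williams,Tulali Mark",04-May-07,--NA--\n'
)

# Option 1: a list of lines; splitlines() also copes with '\r\n' endings.
rows_from_lines = list(csv.reader(data.splitlines()))

# Option 2: a file-like wrapper, which also keeps newlines inside quoted fields.
rows_from_buffer = list(csv.reader(io.StringIO(data, newline='')))

print(rows_from_lines == rows_from_buffer)  # True
print(rows_from_lines[1][3])                # Henry Williams,Tulali Mark
```

Splitting on '\n' by hand works for this data, but it leaves an empty string after the final newline, which csv.reader returns as an empty row, so a blank-row check would still be needed.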
Join Date: Mar 2008
Posts: 5
Reputation:
Solved Threads: 0
I will have to refer to the Zope documentation on how to retrieve the file object correctly. Will do that now...
Thanks.

You don't need to get a file object though. As solsteel pointed out, the csv module doesn't need a file object, it just needs something to iterate through, so if you split the self[filename] string at every newline ('\n') then you will end up with a list of lines which the csv reader can parse.
Solsteel's example looks perfect.
Last edited by a1eio; May 7th, 2008 at 10:21 am.