Is there a way to read in a two-column CSV file and, based on the values in the first column, write out several different files? The input and output look something like this:

input.csv
call    Call Mom.
call    Call T-Mobile.
go      Go home.
go      Go to school.
go      Go to gas station.
play    Play music.
play    Play Beatles.


should produce 3 files:
call.xml
<value><tokens><token>Call</token><token>Mom</token></tokens></value>
<value><tokens><token>Call</token><token>T-Mobile</token></tokens></value>

go.xml
<value><tokens><token>Go</token><token>home</token></tokens></value>
<value><tokens><token>Go</token><token>to</token><token>school</token></tokens></value>
<value><tokens><token>Go</token><token>to</token><token>gas</token><token>station</token></tokens></value>

play.xml
<value><tokens><token>Play</token><token>music</token></tokens></value>
<value><tokens><token>Play</token><token>Beatles</token></tokens></value>
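For the output side, here is a minimal sketch of turning one phrase into one `<value>` line. It assumes tokens are separated by whitespace and the trailing period is dropped, as the sample output suggests ("Call Mom." becomes `Call`, `Mom`). `phrase_to_value` and `write_group` are hypothetical names. The standard library's `xml.etree.ElementTree` builds the elements and escapes characters such as `&` and `<`. Like the sample, each file holds one `<value>` element per line rather than a single-rooted XML document.

```python
import xml.etree.ElementTree as ET


def phrase_to_value(phrase):
    """Build <value><tokens><token>...</token>...</tokens></value> for one phrase.

    Assumption: tokens are whitespace-separated and trailing periods are
    dropped, matching the sample output in the question.
    """
    value = ET.Element("value")
    tokens = ET.SubElement(value, "tokens")
    for word in phrase.rstrip(".").split():
        ET.SubElement(tokens, "token").text = word
    # encoding="unicode" makes tostring return a str instead of bytes
    return ET.tostring(value, encoding="unicode")


def write_group(path, phrases):
    """Hypothetical helper: write one <value> line per phrase to path."""
    with open(path, "w", encoding="utf-8") as out:
        for phrase in phrases:
            out.write(phrase_to_value(phrase) + "\n")


if __name__ == "__main__":
    print(phrase_to_value("Go to gas station."))
```

Calling `write_group("call.xml", ["Call Mom.", "Call T-Mobile."])` would then write the two lines shown for call.xml.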

I'm stuck at the part where I check which items in the first column are the same and then save all the rows that share that value into a new list.

You could initialize a dictionary, use the first column as the key, and accumulate a growing string value joined by some character you don't expect to find in the data, say "_":

>>> d={}
>>> d["test"]="_".join((d.get("test",""),"abc"))
>>> d["test"]
'_abc'
>>> d["test"]="_".join((d.get("test",""),"xyz"))
>>> d["test"]
'_abc_xyz'
>>>

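where instead of "test" you'll have line.split('\t')[0] (it appears your data is separated by tabs), and instead of "abc" you'll have line.split('\t')[1]. Strip the trailing newline first (line.rstrip('\n')), otherwise every value ends in "\n". Also note that each accumulated string starts with "_", so the first piece of d[key].split("_") is empty and should be discarded.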
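Applied to a whole file, the dictionary approach might look like the sketch below. It assumes the two columns are separated by a single tab (if they are actually runs of spaces, `line.split(None, 1)` would be the equivalent); `group_rows` and `SAMPLE` are hypothetical names, and the sample data is inlined with `io.StringIO` so the snippet runs on its own. In practice you would pass `open("input.csv", encoding="utf-8")` instead.

```python
import io

# Sample data from the question, assumed to be tab-separated.
SAMPLE = (
    "call\tCall Mom.\n"
    "call\tCall T-Mobile.\n"
    "go\tGo home.\n"
    "go\tGo to school.\n"
    "go\tGo to gas station.\n"
    "play\tPlay music.\n"
    "play\tPlay Beatles.\n"
)


def group_rows(lines, sep="_"):
    """Group second-column values by first-column key.

    Accumulates a sep-joined string per key (the approach described above),
    then splits each string back into a list of rows.
    """
    d = {}
    for line in lines:
        line = line.rstrip("\n")  # lines read from a file keep their newline
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split("\t", 1)  # split only on the first tab
        d[key] = sep.join((d.get(key, ""), value))
    # Every accumulated string starts with sep, so drop the empty first piece.
    return {key: joined.split(sep)[1:] for key, joined in d.items()}


if __name__ == "__main__":
    for key, rows in group_rows(io.StringIO(SAMPLE)).items():
        print(key + ".xml", rows)
```

Each key then maps to the rows that belong in `key + ".xml"`. A dictionary of lists (for example `collections.defaultdict(list)` with `d[key].append(value)`) gives the same grouping without having to reserve a separator character.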
