I have a CSV file like this:

DB01967 ZIPA
DB01967 PFAZ
DB01992 YVBK
DB01992 ZAP70
DB02191 ZIPA
DB02319 YQHD
DB02552 ZFPP

I want to write out a CSV file in a format like this:

DB01967 ZIPA PFAZ
DB01992 YVBK ZAP70
DB02191 ZIPA
DB02319 YQHD
DB02552 ZFPP

I am totally new to Python and am having trouble with the parsing.

The Python standard library features a csv module for CSV file reading and writing. Read the module documentation to learn how to use it.

I'd use a dictionary or collections.OrderedDict to associate each unique key with a list of values.
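
A minimal sketch of that idea, assuming the fields are separated by a single space. The io.StringIO objects stand in for real files, and group_rows is just a name picked for illustration:

```python
import csv
import io
from collections import OrderedDict

# Sample rows from the question; io.StringIO stands in for an open file
sample = io.StringIO(
    "DB01967 ZIPA\nDB01967 PFAZ\nDB01992 YVBK\nDB01992 ZAP70\n"
    "DB02191 ZIPA\nDB02319 YQHD\nDB02552 ZFPP\n"
)

def group_rows(rows):
    """Map each first-column key to the list of its second-column values."""
    groups = OrderedDict()
    for row in rows:
        if not row:  # skip blank lines
            continue
        key, value = row[0], row[1]
        groups.setdefault(key, []).append(value)
    return groups

# Assumption: a single space is the field delimiter
groups = group_rows(csv.reader(sample, delimiter=' '))

# Write one row per key: the key followed by all of its values
out = io.StringIO()
writer = csv.writer(out, delimiter=' ', lineterminator='\n')
for key, values in groups.items():
    writer.writerow([key] + values)

print(out.getvalue())
```

With real files you would pass the objects returned by open() in place of the StringIO objects. The OrderedDict keeps the keys in the order they first appear in the input.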

A somewhat old-fashioned approach that better shows what you have to do:

# space separated data
raw_data = '''DB01967 ZIPA
DB01967 PFAZ
DB01992 YVBK
DB01992 ZAP70
DB02191 ZIPA
DB02319 YQHD
DB02552 ZFPP'''

# save the raw data as text
fname = 'data1.txt'
with open(fname, 'w') as fout:
    fout.write(raw_data)

# read the data back line by line
data_dict = {}
with open(fname, 'r') as fin:
    for line in fin:
        # split the line into key and value at the space
        key, val = line.split()
        # form the dictionary and handle key collisions
        data_dict.setdefault(key, []).append(val)

# pretty print the dictionary (shows keys in order too)
import pprint
pprint.pprint(data_dict)

'''
{'DB01967': ['ZIPA', 'PFAZ'],
 'DB01992': ['YVBK', 'ZAP70'],
 'DB02191': ['ZIPA'],
 'DB02319': ['YQHD'],
 'DB02552': ['ZFPP']}
'''

print('-'*30)  # print 30 dashes

# convert dictionary to space separated data text
new_data = ""
space = " "
newline = "\n"
# sort the keys
for key in sorted(data_dict.keys()):
    new_data += key + space
    # iterate through each value list
    for val in data_dict[key]:
        new_data += val + space
    new_data += newline

print(new_data)

'''
DB01967 ZIPA PFAZ 
DB01992 YVBK ZAP70 
DB02191 ZIPA 
DB02319 YQHD 
DB02552 ZFPP 
'''
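
The string concatenation above leaves a trailing space on every line. If that matters, str.join avoids it. Here is a small sketch that also writes the result to a file, since that is what the question asked for. The file name data2.txt and the helper format_groups are made up for illustration:

```python
# Grouped data as produced by the loop above
data_dict = {
    'DB01967': ['ZIPA', 'PFAZ'],
    'DB01992': ['YVBK', 'ZAP70'],
    'DB02191': ['ZIPA'],
    'DB02319': ['YQHD'],
    'DB02552': ['ZFPP'],
}

def format_groups(groups):
    """Return one 'key val1 val2 ...' string per key, with keys sorted."""
    return [' '.join([key] + groups[key]) for key in sorted(groups)]

lines = format_groups(data_dict)

# 'data2.txt' is a hypothetical output file name
with open('data2.txt', 'w') as fout:
    fout.write('\n'.join(lines) + '\n')
```

Joining the fields puts separators only between items, so no line ends with a stray space.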

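And here is a version with itertools 'magic', since it seems to be time to present solutions. Note that groupby only groups consecutive lines, so the input must already be sorted by key: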

from itertools import groupby
# space separated data
raw_data = '''DB01967 ZIPA
DB01967 PFAZ
DB01992 YVBK
DB01992 ZAP70
DB02191 ZIPA
DB02319 YQHD
DB02552 ZFPP'''.splitlines()

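# group consecutive lines that share the same first field
data_groups = groupby(raw_data, key=lambda x: x.split()[0])
for group, items in data_groups:
    # print() as a function, consistent with the earlier example
    print(group, ' '.join(item.split()[1] for item in items))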
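
If the lines are not already sorted by key, groupby will emit the same key more than once, because it only merges neighbouring items. One possible workaround, sketched here with a deliberately shuffled sample and a helper named first_field (both made up for illustration), is to sort on the same key first:

```python
from itertools import groupby

# Same kind of data as above, but deliberately out of order
raw_data = '''DB01992 YVBK
DB01967 ZIPA
DB01992 ZAP70
DB01967 PFAZ'''.splitlines()

def first_field(line):
    """Return the key, i.e. the first space-separated field of a line."""
    return line.split()[0]

# sorted() is stable, so values keep their input order within each key
ordered = sorted(raw_data, key=first_field)

grouped = [
    ' '.join([key] + [item.split()[1] for item in items])
    for key, items in groupby(ordered, key=first_field)
]
print('\n'.join(grouped))
```

The sort costs O(n log n), so for large unsorted files the dictionary approach shown earlier may be the simpler choice.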

Edited 4 Years Ago by pyTony
