Python parse

Question

abhik1368 0 Newbie Poster

12 Years Ago

I have a CSV file like this:

DB01967 ZIPA
DB01967 PFAZ
DB01992 YVBK
DB01992 ZAP70
DB02191 ZIPA
DB02319 YQHD
DB02552 ZFPP

I want to write out a CSV file in a format like this:

DB01967 ZIPA PFAZ
DB01992 YVBK ZAP70
DB02191 ZIPA
DB02319 YQHD
DB02552 ZFPP

I am totally new to Python and am having a problem with the parsing.

python


TrustyTony 888 pyMod

12 Years Ago

And here it is with itertools 'magic', since it seems to be time to present solutions:

from itertools import groupby
# space separated data
raw_data = '''DB01967 ZIPA
DB01967 PFAZ
DB01992 YVBK
DB01992 ZAP70
DB02191 ZIPA
DB02319 YQHD
DB02552 ZFPP'''.splitlines()

data_groups = groupby(raw_data, key=lambda x: x.split()[0])
for group, items in data_groups:
    # print() as a function, so this runs on Python 3
    print(group, ' '.join(item.split()[1] for item in items))
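
One caveat worth keeping in mind: groupby only merges adjacent lines that share a key, so the approach relies on the input already being ordered by the first column, as the sample data is. A minimal sketch for input that might not be ordered, assuming it is acceptable to sort by key first (the shuffled `unsorted_data` sample and the `first_column` helper are made up for illustration):

```python
from itertools import groupby

# Hypothetical, deliberately shuffled sample (not the original poster's file)
unsorted_data = '''DB01992 YVBK
DB01967 ZIPA
DB01992 ZAP70
DB01967 PFAZ'''.splitlines()


def first_column(line):
    """Return the key (first whitespace-separated field) of a line."""
    return line.split()[0]


# sorted() is stable, so values keep their original relative order per key
grouped_lines = []
for key, items in groupby(sorted(unsorted_data, key=first_column), key=first_column):
    grouped_lines.append(key + ' ' + ' '.join(item.split()[1] for item in items))

print('\n'.join(grouped_lines))
```

If the original key order matters more than sorted order, a dictionary-based approach is probably the better fit.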

Edited 12 Years Ago by TrustyTony


lrh9 95 Posting Whiz in Training · Answer 1 · 2012-04-09T03:59:28+00:00

The Python standard library features a csv module for CSV file reading and writing. Read the module documentation to learn how to use it.

I'd use a dictionary or collections.OrderedDict to associate each unique key with a list of values.
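
A rough sketch of that idea, assuming the fields are separated by a single space as in the posted sample (a true comma-separated file would use the default delimiter). The in-memory `sample` stands in for the real file, and the names `sample` and `groups` are only illustrative:

```python
import csv
import io
from collections import OrderedDict

# Stand-in for the real file; on disk this would be open(path, newline='')
sample = io.StringIO(
    "DB01967 ZIPA\n"
    "DB01967 PFAZ\n"
    "DB01992 YVBK\n"
    "DB01992 ZAP70\n"
    "DB02191 ZIPA\n"
    "DB02319 YQHD\n"
    "DB02552 ZFPP\n"
)

# Map each key to the list of values seen for it, in first-seen order
groups = OrderedDict()
for row in csv.reader(sample, delimiter=' '):
    if len(row) < 2:
        continue  # skip blank or incomplete lines
    key, value = row[0], row[1]
    groups.setdefault(key, []).append(value)

for key, values in groups.items():
    print(key, ' '.join(values))
```

Because the dictionary remembers every key it has seen, this does not depend on rows with the same key being next to each other.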

vegaseat 1,735 DaniWeb's Hypocrite Team Colleague · Answer 2 · 2012-04-09T18:46:25+00:00

A somewhat old-fashioned approach that better shows what you have to do ...

# space separated data
raw_data = '''DB01967 ZIPA
DB01967 PFAZ
DB01992 YVBK
DB01992 ZAP70
DB02191 ZIPA
DB02319 YQHD
DB02552 ZFPP'''

# save the raw data as text
fname = 'data1.txt'
with open(fname, 'w') as fout:
    fout.write(raw_data)

# read the data back line by line
data_dict = {}
with open(fname, 'r') as fin:
    for line in fin:
        # split the line into key and value at the space
        key, val = line.split()
        # form the dictionary and handle key collisions
        data_dict.setdefault(key, []).append(val)

# pretty print the dictionary (shows keys in order too)
import pprint
pprint.pprint(data_dict)

'''
{'DB01967': ['ZIPA', 'PFAZ'],
 'DB01992': ['YVBK', 'ZAP70'],
 'DB02191': ['ZIPA'],
 'DB02319': ['YQHD'],
 'DB02552': ['ZFPP']}
'''

print('-'*30)  # print 30 dashes

# convert dictionary to space separated data text
lines = []
space = " "
newline = "\n"
# sort the keys
for key in sorted(data_dict.keys()):
    # join the key and its value list with single spaces,
    # which avoids a trailing space at the end of each line
    lines.append(space.join([key] + data_dict[key]))
new_data = newline.join(lines)

print(new_data)

'''
DB01967 ZIPA PFAZ
DB01992 YVBK ZAP70
DB02191 ZIPA
DB02319 YQHD
DB02552 ZFPP
'''
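
Since the question asks for a file rather than screen output, the last step could be sketched roughly as follows. This assumes a dictionary shaped like the `data_dict` printed earlier (repeated here so the snippet runs on its own); the file name `grouped.txt` and the temporary directory are placeholders for illustration, not part of the original answer:

```python
import os
import tempfile

# Same mapping as the pretty-printed dictionary, repeated for a standalone run
data_dict = {
    'DB01967': ['ZIPA', 'PFAZ'],
    'DB01992': ['YVBK', 'ZAP70'],
    'DB02191': ['ZIPA'],
    'DB02319': ['YQHD'],
    'DB02552': ['ZFPP'],
}

# Hypothetical output location; replace with the path you actually want
out_path = os.path.join(tempfile.mkdtemp(), 'grouped.txt')

# write one line per key: the key followed by all of its values
with open(out_path, 'w') as fout:
    for key in sorted(data_dict):
        fout.write(' '.join([key] + data_dict[key]) + '\n')

# read the file back to check what was written
with open(out_path) as fin:
    written = fin.read()
print(written, end='')
```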