I have a file of the following format:
a 1
a 2
a 3
b 4
b 5
b 6
c 7
c 8
c 9

Here is my code:

def file_to_dict(fname):
    f = open("file.txt")
    d = {}
    for line in f:
        columns = line.split(" ")
        letters = columns[0]
        numbers = columns[1].strip()
        d[letters] = list(numbers)
    print d
    
if __name__ == "__main__":
    fname = "file.txt"

The output must be {"a": ["1", "2", "3"], "b": ["4", "5", "6"], "c": ["7", "8", "9"]}. But my output shows only the last repeated key and its value, i.e. {"a": ["3"], "b": ["6"], "c": ["9"]}. Can you help?

Why, yes! This case would be a good one to use the dictionary's get method, which will allow you to determine if the key is already in the dictionary or not, and act accordingly.

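def file_to_dict(fname):
    f = open(fname)  # use the argument rather than a hard-coded name
    d = {}
    for line in f:
        columns = line.split(" ")
        letters = columns[0]
        numbers = columns[1].strip()
        if d.get(letters):
            # key already present: add to its existing list
            d[letters].append(numbers)
        else:
            # first time we see this key: start a new list
            d[letters] = list(numbers)
    f.close()
    print d
    
if __name__ == "__main__":
    fname = "file.txt"
    file_to_dict(fname)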

Try that on for size.

Basically, get returns None by default if the key is not in the dictionary (you can pass a second argument to be returned as the default instead, but I prefer None). So first we check whether the key is already in the dictionary. If it is, we use the list method append to add the new number to the end of the existing list. If get returned None, we instead do what your original code did: create a list as the value for the key.
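
To make the behaviour of get concrete, here is a small sketch. The dictionary d and its contents are made up for illustration, and print is called with a single argument so the snippet should run on Python 2 or 3.

```python
# Hypothetical dictionary, for illustration only.
d = {"a": ["1"]}

present = d.get("a")       # key exists: returns the stored list
missing = d.get("b")       # key absent: returns None instead of raising KeyError
fallback = d.get("b", [])  # key absent: returns the supplied default

print(present)
print(missing)
print(fallback)
```

Note that get never adds the missing key to the dictionary; it only reports what is (or is not) there.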

HTH

Wow, thanks a lot! The idea of using a method didn't cross my mind. But for one of my test cases I changed the numbers to two-digit numbers and now the first two-digit number in a list becomes "ripped" as in {a:}. I am just wondering how does this happen? How come it messes up only the first number?

Because a string is an iterable object. The built-in list() converts any iterable to a list by iterating over it, so a string gets split into its individual characters. Example:

>>> list( (1,2,3,4,5) )
[1, 2, 3, 4, 5]
>>> list( 'Hi my name is bob' )
['H', 'i', ' ', 'm', 'y', ' ', 'n', 'a', 'm', 'e', ' ', 'i', 's', ' ', 'b', 'o', 'b']
>>> [ 'Hi my name is bob' ]
['Hi my name is bob']
>>>

Just use the square brackets instead of list() and you should be good to go.
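
Applied to the loop from the earlier answer, the change might look roughly like this. It is only a sketch: lines_to_dict is a hypothetical name, and it takes a list of lines instead of opening a file so that it can be run without file.txt.

```python
def lines_to_dict(lines):
    """Group the second column by the first column (sketch, not the original function)."""
    d = {}
    for line in lines:
        columns = line.split(" ")
        letters = columns[0]
        numbers = columns[1].strip()
        if d.get(letters):
            d[letters].append(numbers)
        else:
            # [numbers] keeps the whole string as one element;
            # list(numbers) would split "10" into "1" and "0".
            d[letters] = [numbers]
    return d

# Made-up sample lines with two-digit numbers.
sample = ["a 10\n", "a 20\n", "b 30\n"]
result = lines_to_dict(sample)
print(result)
```

That also explains why only the first number was affected: list() was only ever called when the key was first created, while later numbers went through append untouched.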

dict.setdefault is the way to go:

import re

def parse(text):
    # raw string so that \w, \s and \d reach the regex engine unchanged
    items = re.findall(r'^(\w+)\s+(\d+)\s*$', text, re.M)
    data = {}
    for key, val in items:
        # data.setdefault(key, []).append(val) #<= add string values
        data.setdefault(key, []).append(int(val)) #<= add int values
    return data

Test:

#text = file('data.txt', 'rt').read()
text = '''\
a 1
a 2
a 3
b 4
b 5
b 6
c 7
c 8
c 9
'''
>>> print parse(text)
{'a': [1, 2, 3], 'c': [7, 8, 9], 'b': [4, 5, 6]}


Here's my version using the .get() method:

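def file_to_dict( filename ):
    # read all lines; the with block closes the file for us
    with open( filename, "r" ) as f:
        lines = f.readlines()
    d = {}

    for line in lines:
        # split once: first field is the key, second is the value
        columns = line.split( " " )
        key = columns[ 0 ]
        value = columns[ 1 ].strip()
        # existing list (or an empty one) plus the new value
        d[ key ] = d.get( key, [] ) + [ value ]
    return d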

#test...
>>> file_to_dict( "t.txt" )
{'a': ['1', '2', '3'], 'c': ['7', '8', '9'], 'b': ['4', '5', '6']}
>>>

This might be a little easier to understand for a beginner (thanks to pythopian) ...

# parse text data into a dictionary using 
# split at newline and split at space

def parse2dict(text):
    data_dict = {}
    for line in text.split('\n'):
        if line:
            key, val = line.split()
            data_dict.setdefault(key, []).append(val)
    return data_dict

#text = file('data.txt', 'r').read()
text = """\
a 123
a 456
a 789
b 4
b 5
b 6
c 7
c 8
c 9
"""

print( parse2dict(text) )

"""
{'a': ['123', '456', '789'], 'c': ['7', '8', '9'], 'b': ['4', '5', '6']}
"""


here's my version using the .get() method ...

d.setdefault(key, []).append(value) is the preferred (and more efficient) Python way to express
d[ key ] = d.get( key, [] ) + [ value ], since it appends to the existing list instead of building a new list on every line. (Actually it's one of the recipes in the Python Cookbook.)
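
A quick way to convince yourself that the two forms build the same dictionary is sketched below. The pairs list and the variable names are invented for the example.

```python
# Made-up (key, value) pairs standing in for the parsed lines.
pairs = [("a", "1"), ("a", "2"), ("b", "4")]

# Form 1: get + concatenation creates a brand-new list each time.
with_get = {}
for key, value in pairs:
    with_get[key] = with_get.get(key, []) + [value]

# Form 2: setdefault returns the stored list, which is appended to in place.
with_setdefault = {}
for key, value in pairs:
    with_setdefault.setdefault(key, []).append(value)

same_result = with_get == with_setdefault

# setdefault hands back the very same list object on later calls.
first = with_setdefault["a"]
with_setdefault.setdefault("a", []).append("3")
same_object = first is with_setdefault["a"]

print(same_result)
print(same_object)
```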


This might be a little easier to understand for a beginner (thanks to pythopian) ...

Vegaseat, you are right that your code is probably easier to understand for a beginner than mine. I'd like to add, though, that there is also a substantial semantic difference between the two functions:

Yours would fail on lines containing unexpected patterns (e.g. "ValueError: too many values to unpack" if there are more than 2 terms in the line). Mine would skip such lines. This is not to say that one behavior is better than the other, but the reader should be aware of the difference.
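
The difference can be seen with a small test. This sketch repeats both functions from the thread (the regex version is shown storing strings, and with a raw-string pattern) and feeds them a made-up input whose second line has three terms.

```python
import re

def parse(text):
    # Regex version: lines that do not match the pattern are silently skipped.
    items = re.findall(r'^(\w+)\s+(\d+)\s*$', text, re.M)
    data = {}
    for key, val in items:
        data.setdefault(key, []).append(val)
    return data

def parse2dict(text):
    # Split version: unpacking fails if a line does not have exactly two terms.
    data_dict = {}
    for line in text.split('\n'):
        if line:
            key, val = line.split()
            data_dict.setdefault(key, []).append(val)
    return data_dict

# Made-up input: the second line has one term too many.
text = "a 1\nb 2 3\nc 4\n"

skipped = parse(text)

try:
    parse2dict(text)
    failed = False
except ValueError:
    failed = True

print(skipped)
print(failed)
```

Which behavior you want depends on whether a malformed line should be treated as an error or as noise.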
