Hi, I'm new to programming so please excuse any obvious questions.

I have a text file with the following entries:

mydescription, myword

yellow, mango
yellow, banana
orange, orange
green, pineapple
green, mango
pink, mango

What's important is that there is no order: both the 'mydescription' and 'myword' fields change randomly. I need to count the number of unique mydescriptions each myword appears with. For example, in the lines above mango appears with three unique colours (yellow, green and pink), so it would have a count of 3.

I'm trying to put them into a dictionary with the mywords as keys and mydescriptions as a list of values.

dict = {}

for line in file:
  
    mydesc,myword = line.split(',')
    if myword in dict:
        dict[myword] = list.append(mydescription) //is this correct?
    else:
        dict[myword] = create a new list with first entry here..

I'm unable to put the above algorithm into Python syntax. How do I create a list as a value to the dict so it can be:

dict[myword] = [list of mydescriptions associated with that word]

Thanks,
Mendo

Edited 6 Years Ago by mandoza671: n/a

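First, don't use the names of fundamental data types (like dict and list) as variable names, because that hides the built-ins. Also, append() modifies the list in place and returns None, so its result should not be assigned back to the dictionary. You could write: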

mydict = {}

for line in file:
  
    mydesc,myword = [w.strip() for w in line.split(',')]
    if myword in mydict:
        mydict[myword].append(mydesc)
    else:
        mydict[myword] = [mydesc,]
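
The sample file in the question also has a header line and blank lines, which would make the two-name unpacking above fail. Here is a minimal sketch of one way to skip them, assuming the header is literally "mydescription, myword". The function name group_descriptions, the file name myfile.txt and the io.StringIO stand-in for the open file are only illustrative, not part of the original post:

```python
import io

def group_descriptions(lines):
    """Map each word to the list of descriptions it appears with."""
    grouped = {}
    for line in lines:
        parts = [w.strip() for w in line.split(',')]
        # Skip blank or malformed lines and the header row (assumed format).
        if len(parts) != 2 or parts == ['mydescription', 'myword']:
            continue
        mydesc, myword = parts
        if myword in grouped:
            grouped[myword].append(mydesc)
        else:
            grouped[myword] = [mydesc]
    return grouped

# Stand-in for open('myfile.txt') (hypothetical name); contents mirror the question.
sample = io.StringIO(
    "mydescription, myword\n"
    "\n"
    "yellow, mango\n"
    "yellow, banana\n"
    "orange, orange\n"
    "green, pineapple\n"
    "green, mango\n"
    "pink, mango\n"
)
grouped = group_descriptions(sample)
print(grouped['mango'])
```

With a real file you would pass the open file object instead of the StringIO stand-in.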

Edited 6 Years Ago by Gribouillis: n/a

For unique descriptions, you would want to test the list also, only appending if that color is not already in the list, or use a set. Also, strip() both the description and the word before adding to the dictionary to get rid of spaces and newlines. Then you would use the length of the list/set.
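
A minimal sketch of the list-based variant described above, with a membership test before appending and len() for the count. The lines list is a stand-in for the file, and the names are illustrative:

```python
lines = [  # stand-in for the file
    "yellow, mango",
    "green, mango",
    "pink, mango",
    "pink, mango",   # duplicate, should not be counted twice
    "yellow, banana",
]

unique_descs = {}
for line in lines:
    mydesc, myword = [w.strip() for w in line.split(',')]
    if myword not in unique_descs:
        unique_descs[myword] = []
    # Only append a description the first time it is seen for this word.
    if mydesc not in unique_descs[myword]:
        unique_descs[myword].append(mydesc)

for myword, descs in unique_descs.items():
    print(myword, len(descs))
```

The membership test scans the list each time, which is fine for a handful of descriptions per word; a set does the same check without the scan.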

Edited 6 Years Ago by woooee: n/a

I prefer to use setdefault to avoid branching in the Python code (I assume the built-in type handles this more efficiently):

mydict = {}

for line in file:
    mydesc,myword = [w.strip() for w in line.split(',')]
    mydict.setdefault(myword,[]) # after this, we know mydict[myword] is a list
    mydict[myword].append(mydesc)

If your file might contain duplicates, make the default value a set() instead of an empty list, and change append to add:

file = [ # a stub for testing
    "yellow, mango",
    "yellow, banana",
    "orange, orange",
    "green, pineapple",
    "green, mango",
    "pink, mango",
    "pink, mango",
    "pink, mango",
  ]
mydict = {}
for line in file:
    mydesc,myword = [w.strip() for w in line.split(',')]
    mydict.setdefault(myword,set())
    mydict[myword].add(mydesc)
for k,v in mydict.items():
    print(k,v)
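
To get the count the original question asks for, take the length of each set. Here is a short sketch, which also uses collections.defaultdict in place of setdefault (a different technique from the code above, shown only as an alternative). The names lines, descs_by_word and counts are illustrative:

```python
from collections import defaultdict

lines = [  # same stub data as above
    "yellow, mango",
    "yellow, banana",
    "orange, orange",
    "green, pineapple",
    "green, mango",
    "pink, mango",
    "pink, mango",
    "pink, mango",
]

descs_by_word = defaultdict(set)  # missing keys start out as an empty set
for line in lines:
    mydesc, myword = [w.strip() for w in line.split(',')]
    descs_by_word[myword].add(mydesc)

# Number of unique descriptions per word.
counts = {word: len(descs) for word, descs in descs_by_word.items()}
print(counts)
```

Here mango ends up with a count of 3, because the repeated "pink, mango" lines collapse into a single set entry.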

Edited 6 Years Ago by griswolf: handle duplicates
