Can someone please show me the best way to achieve this in the fewest lines? I'm a recovering PHP coder and I have a working solution, but I was wondering if there is a quicker, more Pythonic way to do this:

I have in "results" [('Basp1', 'Aen2'), ('Basp1', 'Ahy18'), ('Basp1', 'Ahy26'), ('Yeps1', 'Eco1'), ('Yeps1', 'Ahy7'), ('Yeps1', 'Asa1'), ('Cagi1', 'Ahy15'), ('Cagi1', 'Ahy24'), ('Cagi1', 'Ahy31'), ('Bado1', 'Aen2'), ('Bado1', 'Eco1'), ('Bado1', 'Ahy38'), ('Rhle1', 'Ahy2'), ('Rhle1', 'Aen2'), ('Rhle1', 'Aen1'), ('Prma1', 'Aen2'), ('Prma1', 'Aen1'), ('Prma1', 'Ahy26'), ('Cysp1', 'Aen2'), ('Cysp1', 'Aen1'), ('Cysp1', 'Ahy26'), ('Basp2', 'Aen2'), ('Basp2', 'Eco1'), ('Basp2', 'Ahy38'), ('Naph1', 'Eco1'), ('Naph1', 'Ahy40'), ('Naph1', 'Ahy30'), ('Phte1', 'Eco1'), ('Phte1', 'Ahy2'), ('Phte1', 'Aen2'), ('Phte1', 'Aen2'), ('Sepr1', 'Ahy2'), ('Sepr1', 'Eco1'), ('Sepr1', 'Ahy23'), ('Mesp1', 'Asa1'), ('Mesp1', 'Ahy33'), ('Mesp1', 'Ahy34'), ('Prsp1', 'Aen2'), ('Prsp1', 'Aen1'), ('Prsp1', 'Ahy26'), ('Brsp2', 'Aen2'), ('Brsp2', 'Aen1'), ('Brsp2', 'Ahy2'), ('Lapl1', 'Aca1'), ('Lapl1', 'Eco1'), ('Lapl1', 'Ave1')] The first values of each 3 consecutive tuples are the same.

I want to build a dictionary that looks like this:
keys = {'Basp1': ['Aen2', 'Ahy18', 'Ahy26'], ...}

Right now I have two solutions:

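# Solution 1: check for the key first
keys = {}
for pairs in results:
    if pairs[0] in keys:  # dict.has_key() no longer exists in Python 3
        keys[pairs[0]].append(pairs[1])
    else:
        keys[pairs[0]] = [pairs[1]]

# Solution 2: try to append, and create the list if the key is missing
keys = {}
for pairs in results:
    try:
        keys[pairs[0]].append(pairs[1])
    except KeyError:
        keys[pairs[0]] = [pairs[1]]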

Thanks!

Look at the setdefault method on dictionaries. If you first call keys.setdefault(pairs[0], []), the key is guaranteed to exist, so you can then always call keys[pairs[0]].append(pairs[1]).


You can try this as suggested above:

keys = {}
for pair in results:
    keys.setdefault(pair[0], []).append(pair[1])
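
To see what this produces, here is a small self-contained run on the first six tuples from the question. The name `sample` is only for this illustration; it stands in for the full `results` list.

```python
# First six (Query, Hit) tuples from the question's results list.
sample = [('Basp1', 'Aen2'), ('Basp1', 'Ahy18'), ('Basp1', 'Ahy26'),
          ('Yeps1', 'Eco1'), ('Yeps1', 'Ahy7'), ('Yeps1', 'Asa1')]

keys = {}
for query, hit in sample:
    # setdefault returns the list already stored under query, or stores
    # and returns the new empty list if query is not in the dict yet.
    keys.setdefault(query, []).append(hit)

print(keys['Basp1'])  # ['Aen2', 'Ahy18', 'Ahy26']
print(keys['Yeps1'])  # ['Eco1', 'Ahy7', 'Asa1']
```

Because setdefault returns the stored list itself, the append modifies the list inside the dictionary, so no second lookup is needed.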

Would you mind telling us what the different strings represent? (My initial impression was that they were the names of genes and gene complexes, but I'm not certain.) Having some idea of what the data represents might give us some new ideas as to how to manipulate it more effectively.

Yes, these are symbols for genes. These are results extracted from a Smith-Waterman Search.
Each tuple is a (Query, Hit).

One more way is to use defaultdict, but I think this is an unnecessary optimization, as your alternatives are what I consider good Python style.

from collections import defaultdict
keys = defaultdict(list)
for key, value in results:
	keys[key].append(value)
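
One thing to be aware of with defaultdict: merely indexing a missing key creates an entry for it. The sketch below shows that behaviour on a shortened sample and, as one possible remedy, converts the result to a plain dict once it is built. `sample` and the key 'NoSuchGene' are made up for the illustration.

```python
from collections import defaultdict

# Shortened sample of (Query, Hit) tuples, for illustration only.
sample = [('Basp1', 'Aen2'), ('Basp1', 'Ahy18'), ('Yeps1', 'Eco1')]

keys = defaultdict(list)
for key, value in sample:
    keys[key].append(value)

# Indexing a missing key on a defaultdict inserts it with an empty list.
missing = keys['NoSuchGene']   # hypothetical key, not in the data
print('NoSuchGene' in keys)    # True

# A plain dict copy raises KeyError on missing keys again.
plain = dict(keys)
del plain['NoSuchGene']        # drop the accidental entry
print(plain)  # {'Basp1': ['Aen2', 'Ahy18'], 'Yeps1': ['Eco1']}
```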

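Then there is a more functional programming style with groupby. Each group yields whole (key, value) tuples, so you must transpose it and take only the second values. Note that groupby only merges consecutive items with equal keys, which holds for this data: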

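import itertools
# zip(*group) transposes the pairs; row 1 holds the hits.
# The outer list() turns that row from a tuple into a list.
keys = dict((key, list(list(zip(*group))[1]))
            for key, group in itertools.groupby(results, key=lambda x: x[0]))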
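
groupby starts a new group every time the key changes, so it only works directly when equal keys are adjacent, as they are in the question's data. If that is not guaranteed, sorting by key first is one way around it. A minimal sketch with made-up pairs (`pairs` is not from the thread's data):

```python
import itertools
from operator import itemgetter

# Made-up pairs where the two 'a' entries are NOT adjacent.
pairs = [('a', 1), ('b', 5), ('a', 2)]

# Without sorting, groupby yields two separate 'a' groups and the
# second one overwrites the first in the resulting dict.
unsorted_result = {key: [value for _, value in group]
                   for key, group in itertools.groupby(pairs, key=itemgetter(0))}
print(unsorted_result)  # {'a': [2], 'b': [5]}

# sorted() is stable, so values keep their original relative order.
ordered = sorted(pairs, key=itemgetter(0))
sorted_result = {key: [value for _, value in group]
                 for key, group in itertools.groupby(ordered, key=itemgetter(0))}
print(sorted_result)  # {'a': [1, 2], 'b': [5]}
```

The sort costs O(n log n), so for unsorted input the setdefault or defaultdict loops are probably the simpler choice.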

Another simple example ...

# convert to a dictionary and handle key collisions

a = [
('a', 1),
('b', 5),
('c', 7),
('a', 2),
('a', 3)
]

d = {}
for k, v in a:
    d.setdefault(k, []).append(v)

print(d)  # {'a': [1, 2, 3], 'b': [5], 'c': [7]} on Python 3.7+ (insertion order)
Comments
Just what I was looking for