Hello everybody!

I thought I was getting more comfortable with data structures, but I've run into a problem I can't solve. It looks like this:

I have that kind of text file:

ABCD vvvv 1e-12
ABCD hhhh 1e-6
ABCD ggggg 1e-3
ASDE ffffff 1e-57
ASDE dddd 0.001

I would like to read (and write into another file) only the line with the lowest number in the third column for each distinct value in the first column. The result should look like this:

ABCD vvvv 1e-12
ASDE ffffff 1e-57

I would be grateful for any tips you can give me. Thank you!

This is what my script looks like:

data = open('list','r')
lista = data.readlines()
list = lista.reverse()
for i in range(len(list)-1):
....if list[0:3] != list[i+1][0:3]:
........print list[0:-1]

Actually, I have a problem with the reverse() function (it gives me an empty list); the rest works fine because the list file is already sorted the way it has to be. Please suggest another solution if this one isn't fine.
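A likely cause of the "empty" result: list.reverse() reverses the list in place and returns None, so an assignment such as list = lista.reverse() stores None rather than the reversed lines. A small sketch of the difference, written for Python 3 and using made-up sample lines (all variable names here are illustrative only):

```python
# Sample lines standing in for the result of readlines() (illustrative data).
lines = ['ABCD vvvv 1e-12\n', 'ABCD hhhh 1e-6\n', 'ASDE ffffff 1e-57\n']

# reverse() changes the list in place and returns None.
result = lines.reverse()
print(result)  # None

# The list itself is now reversed: the former last line comes first.
first_line = lines[0]

# To get a reversed copy without touching the original, use slicing
# or the built-in reversed().
original = ['a', 'b', 'c']
copy_by_slice = original[::-1]
copy_by_reversed = list(reversed(original))
```

Naming a variable list also hides the built-in list type for the rest of the script, so a different name is usually safer.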


Use a dictionary with the first column (e.g. "ABCD") as the key. If the key is already in the dictionary, compare the values. If the current record has a lower value, replace the record for that dictionary item.

Woooee, thanks, but I think I'll need some more help. I am not very familiar with dictionaries.

woooee is on the right track. I broke out a sample list into distinct steps, so you can figure out how to do this ...

# raw data test list from file via readlines()
q = ['A aa 9\n', 'A ac 3\n', 'B ff 4\n',
'B vv 1\n', 'C hh 5\n', 'A qq 3\n', 'C dd 8\n']

# create a list of lists
qq = []
for item in q:
    # split each line on whitespace into [key, label, value]
    qq.append(item.split())

print qq

print '-'*60

# create a dictionary, multiple values are in a list
d = {}
for item in qq:
    d.setdefault(item[0], []).append((float(item[2]), item[1]))

print d

print '-'*60

# modify the dictionary so the values are sorted
dd = {}
for k, v in d.items():
    v = sorted(v)
    # pick the lowest value
    dd[k] = v[0]

print dd

print '-'*60

# reform the list of the lowest values
sp = ' '
z = []
for k, v in dd.items():
    z.append(k + sp + v[1] + sp + str(v[0]))

print z

my output -->
[['A', 'aa', '9'], ['A', 'ac', '3'], ['B', 'ff', '4'], ['B', 'vv', '1'], ['C', 'hh', '5'], ['A', 'qq', '3'], ['C', 'dd', '8']]
{'A': [(9.0, 'aa'), (3.0, 'ac'), (3.0, 'qq')], 'C': [(5.0, 'hh'), (8.0, 'dd')], 'B': [(4.0, 'ff'), (1.0, 'vv')]}
{'A': (3.0, 'ac'), 'C': (5.0, 'hh'), 'B': (1.0, 'vv')}
['A ac 3.0', 'C hh 5.0', 'B vv 1.0']
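The sort in the third step is only used to find the smallest entry, so min() would do the same job without ordering the whole list. A minimal sketch on the same test data, written for Python 3 (the name lowest is introduced here for illustration):

```python
q = ['A aa 9\n', 'A ac 3\n', 'B ff 4\n',
     'B vv 1\n', 'C hh 5\n', 'A qq 3\n', 'C dd 8\n']

# Group (value, label) tuples by the first column.
d = {}
for line in q:
    key, label, value = line.split()
    d.setdefault(key, []).append((float(value), label))

# min() compares tuples element by element, so the float decides first
# and the label only breaks ties (here 'ac' wins over 'qq' for key 'A').
lowest = {key: min(values) for key, values in d.items()}
print(lowest)
```

As with the sorted() version, ties on the number are resolved by the label, which may or may not be what you want for real data.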

Here is an example using woooee's suggestion:

s = '''ABCD vvvv 1e-12
ABCD hhhh 1e-6
ABCD ggggg 1e-3
ASDE ffffff 1e-57
ASDE dddd 0.001'''

dd = {}
for item in s.split('\n'):
    itemList = item.split()
    if itemList[0] in dd:
        # key already seen: keep the record with the lower value
        if float(dd[itemList[0]][1]) > float(itemList[2]):
            dd[itemList[0]] = itemList[1:]
    else:
        # first record for this key
        dd[itemList[0]] = itemList[1:]

for key in dd:
    print '%s %s' % (key, ' '.join(dd[key]))

>>> ABCD vvvv 1e-12
ASDE ffffff 1e-57
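The original question also asked for the result to be written to another file. One possible way to wrap the same dictionary logic around file input and output is sketched below for Python 3. The function name write_lowest and the output file name in the usage note are placeholders, not anything from the thread. It keeps the original text of each winning line, so values such as 1e-12 are written back unchanged.

```python
def write_lowest(in_path, out_path):
    """Copy to out_path the line with the lowest third column per first column."""
    best = {}  # key -> (value as float, original line text)
    with open(in_path) as infile:
        for line in infile:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or incomplete lines
            key, value = fields[0], float(fields[2])
            # Store the line if the key is new or this value is lower.
            if key not in best or value < best[key][0]:
                best[key] = (value, line.rstrip('\n'))
    with open(out_path, 'w') as outfile:
        # sorted() gives a predictable key order in the output file.
        for key in sorted(best):
            outfile.write(best[key][1] + '\n')
```

Calling write_lowest('list', 'lowest.txt') would read the file named list from the question and write one line per key to lowest.txt (an assumed output name). Unlike the script in the question, this does not depend on the input being sorted.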

Here is a Python 2.5 version that uses groupby and generator comprehensions.

I guess it's not much use if you haven't got to them in your Python studies yet, but here goes:

>>> from StringIO import StringIO
>>> from itertools import groupby
>>> # Dummy file
>>> instring = '''ABCD vvvv 1e-12
... ABCD hhhh 1e-6
... ABCD ggggg 1e-3
... ASDE ffffff 1e-57
... ASDE dddd 0.001'''; infile = StringIO(instring)
>>> # Assumes already sorted on first field
>>> print '\n'.join(
... ' '.join(
...         min(grp[1], key=lambda x: float(x[2]))
...        )
...        for grp in groupby(
...                         (line.strip().split() for line in infile),
...                         key=lambda x: x[0]
...                        )
...        )
ABCD vvvv 1e-12
ASDE ffffff 1e-57
  • line 1: StringIO just makes a text string look like a file - good for examples.
  • line 2: groupby is used to form sub-iterators, one for each value in column 0 of your input.

    The nested generator comprehension is best explained from the innermost levels outwards:

  • line 15: Inputs a line from the file, strips leading & trailing blanks with strip(), then splits it into fields with split().
  • line 14: forms sub-iterators grouped on the key (line 16), which is a function returning the leftmost column.
  • line 12: For each sub-iterator (the value grp[1] returned from groupby), return just the minimum, where the minimum is determined by the key function in line 12, i.e. the floating point representation of the string from the third column (index is 2).
  • line 11: min gives a list of split fields. The join turns this back into a line of space separated words, one for each group.
  • line 10: prints the minimal lines using a newline to separate each value.
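
For readers who find the nested generator expressions hard to follow, the same grouping can be spelled out with ordinary loops. This is a sketch for Python 3 (where StringIO lives in the io module). It uses min because the question asked for the lowest value, and like the version above it assumes the input is already sorted on the first column. The names rows, results and lowest are introduced here for illustration.

```python
from io import StringIO
from itertools import groupby

# Dummy file, as in the example above.
infile = StringIO('''ABCD vvvv 1e-12
ABCD hhhh 1e-6
ABCD ggggg 1e-3
ASDE ffffff 1e-57
ASDE dddd 0.001''')

# Step 1: split every line into its fields.
rows = [line.split() for line in infile]

# Step 2: groupby yields (key, sub-iterator) pairs for runs of equal keys.
results = []
for key, group in groupby(rows, key=lambda fields: fields[0]):
    # Step 3: pick the row whose third field is numerically smallest.
    lowest = min(group, key=lambda fields: float(fields[2]))
    # Step 4: join the fields back into a single line.
    results.append(' '.join(lowest))

print('\n'.join(results))
```

Because groupby only merges adjacent rows with the same key, unsorted input would need a sort on the first column before step 2.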

Thinking about it, if you have trouble with dicts then this only serves to keep my mind working as I sit at home with a summer cold. Yesterday I wouldn't have been able to focus, so maybe I'm on the mend!

Have fun, :)

- Paddy.
