ENSTRUG00000000009      ENSTRUT00000000011      1026    509     5896
ENSTRUG00000000011      ENSTRUT00000000014      420     63      482
ENSTRUG00000000012      ENSTRUT00000000015      10902   15313   93157
ENSTRUG00000000012      ENSTRUT00000000016      2844    23243   60985

This is my input file. It has five columns. For each unique entry in the first column, I need to keep only the line with the highest value in the third column, like this:

ENSTRUG00000000009      ENSTRUT00000000012      1026    1503    6379
ENSTRUG00000000011      ENSTRUT00000000014      420     63      482
ENSTRUG00000000012      ENSTRUT00000000015      10902   15313   93157

My code is:

from sys import *
import operator
file = open(argv[1],'r')
outfile = open(argv[2],'w')
buffer = []
gene = ''
cds = {}
rec = file.readlines()
for line in rec :
        field = line.split()
        if (gene != field[0]):
                header = field[0]#header is the variable caries the values
                print header,
                #outfile.writelines(header+"\t")
                gene = field[0]
                transcript = field[1]
                #print transcript
        cds[field[1]]=field[2]
        #print cds
        protein = max(cds.iteritems(), key=operator.itemgetter(1))[0]
        print protein

Edited 6 Years Ago by parijat24: n/a

I do not understand the first line of your result:

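import sys
# Fall back to test file names when arguments are not given on the command line.
with open(sys.argv[1] if sys.argv[1:] else 'test.txt', 'r') as infile:
    with open(sys.argv[2] if sys.argv[2:] else 'test_out.txt', 'w') as outfile:
        # Sort ascending by the third column, so that for each gene the row
        # with the highest value is the last one stored in the dict.
        rec = (line.split(None, 1) for line in sorted(infile, key=lambda x: int(x.split()[2])))
        result = dict(rec)
        for key, item in sorted(result.items()):
            line = "%s      %s" % (key, item)
            sys.stdout.write(line)  # item still ends with its own newline
            outfile.write(line)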
"""Output:
ENSTRUG00000000009      ENSTRUT00000000011      1026    509     5896
ENSTRUG00000000011      ENSTRUT00000000014      420     63      482
ENSTRUG00000000012      ENSTRUT00000000015      10902   15313   93157
"""
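
Another way to get the same result, without relying on sort order or on the dict overwriting earlier rows, is to keep a running "best row" per gene. This also avoids two problems in the code from the question: `cds` is never cleared when the gene changes, and `field[2]` is compared as a string, so '420' would rank above '10902'. The following is only a sketch written for Python 3. The names `best_rows` and `write_best` are made up for this example, and it assumes whitespace-separated columns, an integer third column, and that the first row wins when two rows tie.

```python
def best_rows(lines):
    """Return {gene: fields}, keeping per gene the row with the largest third column."""
    best = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        gene = fields[0]
        value = int(fields[2])  # compare as numbers, not as strings
        # Strict ">" keeps the first row seen when two rows tie (an assumption).
        if gene not in best or value > int(best[gene][2]):
            best[gene] = fields
    return best


def write_best(in_path, out_path):
    """Read in_path and write one tab-separated row per gene to out_path."""
    with open(in_path) as infile:
        best = best_rows(infile)
    with open(out_path, "w") as outfile:
        for gene in sorted(best):
            outfile.write("\t".join(best[gene]) + "\n")
```

To use it as a script, call `write_best(sys.argv[1], sys.argv[2])` after `import sys`. Because each row is split into fields, the result does not depend on fixed column positions in the line.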

Edited 6 Years Ago by pyTony: n/a
