Hi all,

I have a text file as follows:

>s1
MPPRRSIVEVKVLDVQKRRVPNKHYVYIIRVTWSSGATEAIYRRYSKFFDLQMQMLDKFP MEGGQKDPKQRIIPFLPGKILFRRSHIRDVAVKRLIPIDEYCKALIQLPPYISQCDEVLQ FFETRPEDLNPPKEEHIGKKKSGNDPTSVDPMVLEQYVVVADYQKQESSEISLSVGQVVD

>s2
MAEVRKFTKRLSKPGTAAELRQSVSEAVRGSVVLEKAKLVEPLDYENVITQRKTQIYSDP LRDLLMFPMEDISISVIGRQRRTVQSTVPEDAEKRAQSLFVKECIKTYSTDWHVVNYKYE DFSGDFRMLPCKSLRPEKIPNHVFEIDEDCEKDEDSSSLCSQKGGVIKQGWLHKANVNST

. . .

I want to count the letter 'P' in each sequence. The output should be:

> s1:10

> s2:20

To achieve this, I wrote the following Python script:

infile=open("file1.txt",'r')
out=open("file2.csv",'w')
for line in infile:
    line = line.strip("\n")
    if line.startswith('>'):
        name=line
    else:
        pattern = line.count('P')
        print '%s:%s' %(name,pattern)
        out.write('%s:%s\n' %(name,pattern))

It reads the file line by line and prints one count per line instead of one per sequence:

> s1:2

> s1:3

> s1:5

> s2:10

> s2:10

But I would like to have the output as follows:

> s1:10

> s2:20 . . .

Can anybody help with how to do this?

Thanks in advance,
Ni

>>> f1
'>s1\nMPPRRSIVEVKVLDVQKRRVPNKHYVYIIRVTWSSGATEAIYRRYSKFFDLQMQMLDKFP MEGGQKDPKQRIIPFLPGKILFRRSHIRDVAVKRLIPIDEYCKALIQLPPYISQCDEVLQ FFETRPEDLNPPKEEHIGKKKSGNDPTSVDPMVLEQYVVVADYQKQESSEISLSVGQVVD'
>>> name, data = f1.split(None, 1)
>>> print name[1:], ':', data.count('P')
s1 : 15
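
The session above handles a single record. As a rough sketch of how the same idea might extend to a whole file, the text can be split on '>' first and each chunk counted separately. The function name count_letter_per_record and the short sample string are made up for illustration, and the sketch assumes '>' never appears inside the sequence data. The header is separated at the first newline so that any description after the name is not counted.

```python
def count_letter_per_record(text, letter='P'):
    """Return a list of (name, count) pairs, one per '>' record in text."""
    results = []
    # Splitting on '>' gives one chunk per record. Anything before the
    # first header (usually an empty string) is skipped.
    for chunk in text.split('>'):
        if not chunk.strip():
            continue
        # The header is everything up to the first newline; the rest is
        # sequence data, possibly spread over several lines.
        header, _, data = chunk.partition('\n')
        results.append((header.strip(), data.count(letter)))
    return results


# Hypothetical two-record sample; s1 spans two lines.
sample = ">s1\nMPPRR\nSIVEP\n\n>s2\nMAEVRKP\n"
for name, count in count_letter_per_record(sample):
    print('%s:%s' % (name, count))
```

With a real file, the text would come from something like open("file1.txt").read(), which is fine as long as the file fits comfortably in memory.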

Edited 5 Years Ago by pyTony: n/a

The easiest approach to understand is to store the previous record and, each time a ">" line is reached, print the count for the record that came before it.

seq=""">s1\n
 MPPRRSIVEVKVLDVQKRRVPNKHYVYIIRVTWSSGATEAIYRRYSKFFDLQMQMLDKFP MEGGQKDPKQRIIPFLPGKILFRRSHIRDVAVKRLIPIDEYCKALIQLPPYISQCDEVLQ FFETRPEDLNPPKEEHIGKKKSGNDPTSVDPMVLEQYVVVADYQKQESSEISLSVGQVVD\n

 >s2\n
 MAEVRKFTKRLSKPGTAAELRQSVSEAVRGSVVLEKAKLVEPLDYENVITQRKTQIYSDP LRDLLMFPMEDISISVIGRQRRTVQSTVPEDAEKRAQSLFVKECIKTYSTDWHVVNYKYE DFSGDFRMLPCKSLRPEKIPNHVFEIDEDCEKDEDSSSLCSQKGGV\n"""

split_seq = seq.split("\n")
name = ""
pattern = 0
previous_rec = ""
for line in split_seq:
    line = line.strip()
    if line.startswith('>'):     ## print previous name and use previous record
        if len(name):            ## first ">" won't have a previous
            pattern = previous_rec.count('P')
            print '%s:%s' %(name,pattern)
        name=line
        previous_rec = ""        ## start collecting the new record
    else:
        if len(line):            ## skip any empty lines
            previous_rec += line ## a record may span several lines

if len(name):                    ## the last record has no ">" after it
    pattern = previous_rec.count('P')
    print '%s:%s' %(name,pattern)
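
A variation on the loop above, offered only as a sketch: instead of storing the record text, keep a running count and add to it for every sequence line until the next ">" header appears. This reads one line at a time, so it should also suit large files. The generator name iter_counts and the sample lines are hypothetical, and the single-argument print form is used so the code runs on both Python 2 and Python 3.

```python
def iter_counts(lines, letter='P'):
    """Yield (header, count) for each '>' record found in lines."""
    name = None
    count = 0
    for line in lines:
        line = line.strip()
        if line.startswith('>'):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, count
            name = line
            count = 0
        elif name is not None:
            # Sequence line (empty lines simply add zero).
            count += line.count(letter)
    # The last record has no header after it, so emit it here.
    if name is not None:
        yield name, count


# Hypothetical sample; any iterable of lines works, including an open file.
sample_lines = [">s1", "MPPRR", "SIVEP", "", ">s2", "MAEVRKP"]
for name, count in iter_counts(sample_lines):
    print('%s:%s' % (name, count))
```

To produce the file asked for in the question, the same loop could run over open("file1.txt") and call out.write('%s:%s\n' % (name, count)) for each pair.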

Edited 5 Years Ago by woooee: n/a
