
Hi all,

I have a text file as follows:

>s1
MPPRRSIVEVKVLDVQKRRVPNKHYVYIIRVTWSSGATEAIYRRYSKFFDLQMQMLDKFP
MEGGQKDPKQRIIPFLPGKILFRRSHIRDVAVKRLIPIDEYCKALIQLPPYISQCDEVLQ
FFETRPEDLNPPKEEHIGKKKSGNDPTSVDPMVLEQYVVVADYQKQESSEISLSVGQVVD

>s2
MAEVRKFTKRLSKPGTAAELRQSVSEAVRGSVVLEKAKLVEPLDYENVITQRKTQIYSDP
LRDLLMFPMEDISISVIGRQRRTVQSTVPEDAEKRAQSLFVKECIKTYSTDWHVVNYKYE
DFSGDFRMLPCKSLRPEKIPNHVFEIDEDCEKDEDSSSLCSQKGGVIKQGWLHKANVNST

. . .

I want to count the letter 'P' in each sequence. The output should be:

> s1:10

> s2:20

To achieve this, I wrote the following Python script:

infile=open("file1.txt",'r')
out=open("file2.csv",'w')
for line in infile:
    line = line.strip("\n")
    if line.startswith('>'):
        name=line
    else:
        pattern = line.count('P')
        print '%s:%s' %(name,pattern)
        out.write('%s:%s\n' %(name,pattern))

It reads the file line by line and gives the following result:

> s1:2

> s1:3

> s1:5

> s2:10

> s2:10

But I would like the output to be as follows:

> s1:10

> s2:20 . . .

Can anybody help me do this?

Thanks in advance, Ni

>>> f1
'>s1
MPPRRSIVEVKVLDVQKRRVPNKHYVYIIRVTWSSGATEAIYRRYSKFFDLQMQMLDKFP MEGGQKDPKQRIIPFLPGKILFRRSHIRDVAVKRLIPIDEYCKALIQLPPYISQCDEVLQ FFETRPEDLNPPKEEHIGKKKSGNDPTSVDPMVLEQYVVVADYQKQESSEISLSVGQVVD'
>>> name, data = f1.split(None, 1)
>>> print name[1:], ':', data.count('P')
s1 : 15
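
The same split-and-count idea can be applied to every record in the file at once. This is only a minimal sketch, assuming Python 3, that each record starts with '>' and that the header is a single word; the function name count_per_record and the short sample data are made up for illustration.

```python
def count_per_record(text, letter="P"):
    """Return a list of (name, count) pairs, one per '>' record in text."""
    results = []
    # Splitting on '>' yields one chunk per record; the first chunk is
    # whatever precedes the first '>' (normally empty) and is skipped.
    for chunk in text.split(">")[1:]:
        parts = chunk.split(None, 1)   # header name, then the rest
        if not parts:
            continue                   # a bare '>' with nothing after it
        name = parts[0]
        data = parts[1] if len(parts) > 1 else ""
        # Newlines inside data do not matter: count() scans the whole string.
        results.append((name, data.count(letter)))
    return results


sample = ">s1\nMPPRR\nSIPVE\n>s2\nMAEVP\n"
for name, count in count_per_record(sample):
    print("%s:%d" % (name, count))
```

Because everything after the header is kept as one string, a sequence that is wrapped over several lines is still counted as a single record. If a header contained extra words after the name, they would end up in the counted data, so that case would need separate handling.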

Edited by pyTony: n/a


The easiest approach to understand is to store the previous record and, whenever a ">" line is reached, count the 'P's in the record that came before it.

seq=""">s1\n
 MPPRRSIVEVKVLDVQKRRVPNKHYVYIIRVTWSSGATEAIYRRYSKFFDLQMQMLDKFP MEGGQKDPKQRIIPFLPGKILFRRSHIRDVAVKRLIPIDEYCKALIQLPPYISQCDEVLQ FFETRPEDLNPPKEEHIGKKKSGNDPTSVDPMVLEQYVVVADYQKQESSEISLSVGQVVD\n

 >s2\n
 MAEVRKFTKRLSKPGTAAELRQSVSEAVRGSVVLEKAKLVEPLDYENVITQRKTQIYSDP LRDLLMFPMEDISISVIGRQRRTVQSTVPEDAEKRAQSLFVKECIKTYSTDWHVVNYKYE DFSGDFRMLPCKSLRPEKIPNHVFEIDEDCEKDEDSSSLCSQKGGV\n"""

split_seq = seq.split("\n")
name = ""
previous_rec = ""
for line in split_seq:
    line = line.strip()
    if line.startswith('>'):     ## print previous name and use previous record
        if len(name):            ## first ">" won't have a previous
            pattern = previous_rec.count('P')
            print('%s:%s' % (name, pattern))
        name = line
        previous_rec = ""        ## start collecting the next record
    else:
        if len(line):            ## skip any empty lines
            previous_rec += line ## a record may span several lines

if len(name):                    ## the last record has no ">" after it
    pattern = previous_rec.count('P')
    print('%s:%s' % (name, pattern))
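
For the original task of reading file1.txt and writing file2.csv, the same bookkeeping can also be done with a running total per header instead of a stored record. The following is a sketch under assumptions rather than a definitive version: it targets Python 3, the helper name count_letter is hypothetical, and the demo uses a short in-memory stand-in for the real file.

```python
import io


def count_letter(lines, letter="P"):
    """Yield (name, count) for each '>' record in an iterable of lines."""
    name = None
    total = 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:         # emit the record that just ended
                yield name, total
            name, total = line, 0        # start a new record
        elif name is not None:
            total += line.count(letter)  # add this line to the running total
    if name is not None:                 # emit the final record
        yield name, total


# Demo on an in-memory file. With real files, pass open("file1.txt") instead
# and write each result with out.write("%s:%s\n" % (name, count)).
demo = io.StringIO(">s1\nMPPRR\nSIPVE\n\n>s2\nMAEVP\n")
for name, count in count_letter(demo):
    print("%s:%s" % (name, count))
```

On the demo data this prints ">s1:3" and ">s2:1". Keeping only a running total means the whole sequence never has to be held in memory, which may matter for long records.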

Edited by woooee: n/a
