Hey, thanks to everyone who is helping me learn this language.
Again, there is one text file,
file1.txt

>sp|P81928[/B]|140U_DROME

67 198 Tim17 8.9e-19 No_clan

>sp|P20905|5HT1R_DROME

179 507 7tm_1 1.1e-97 CL0192

>sp|P28285|5HT2A_DROME

243 805 7tm_1 3.2e-73 CL0192

>sp|P28286|5HT2B_DROME

107 588 7tm_1 7.2e-82 CL0192


* Here the numbers represent the start and end of the subsequence that has to be extracted.

The next file is the sequence file, file2.txt

>sp|P81928|140U_DROME RPII140-upstream gene protein OS=Drosophila melanogaster GN=140up PE=2 SV=2
MNFLWKGRRFLIAGILPTFEGAADEIVDKENKTYKAFLASKPPEETGLERLKQMFTIDEF
GSISSELNSVYQAGFLGFLIGAIYGGVTQSRVAYMNFMENNQATAFKSHFDAKKKLQDQF
TVNFAKGGFKWGWRVGLFTTSYFGIITCMSVYRGKSSIYEYLAAGSITGSLYKVSLGLRG
MAAGGIIGGFLGGVAGVTSLLLMKASGTSMEEVRYWQYKWRLDRDENIQQAFKKLTEDEN
PELFKAHDEKTSEHVSLDTIK
>sp|P20905|5HT1R_DROME 5-hydroxytryptamine receptor 1 OS=Drosophila melanogaster GN=5-HT7 PE=2 SV=1
MALSGQDWRRHQSHRQHRNHRTQGNHQKLISTATLTLFVLFLSSWIAYAAGKATVPAPLV
EGETESATSQDFNSSSAFLGAIASASSTGSGSGSGSGSGSGSGSGSYGLASMNSSPIAIV
SYQGITSSNLGDSNTTLVPLSDTPLLLEEFAAGEFVLPPLTSIFVSIVLLIVILGTVVGN
VLVCIAVCMVRKLRRPCNYLLVSLALSDLCVALLVMPMALLYEVLEKWNFGPLLCDIWVS
FDVLCCTASILNLCAISVDRYLAITKPLEYGVKRTPRRMMLCVGIVWLAAACISLPPLLI
LGNEHEDEEGQPICTVCQNFAYQIYATLGSFYIPLSVMLFVYYQIFRAARRIVLEEKRAQ
THLQQALNGTGSPSAPQAPPLGHTELASSGNGQRHSSVGNTSLTYSTCGGLSSGGGALAG
HGSGGGVSGSTGLLGSPHHKKLRFQLAKEKKASTTLGIIMSAFTVCWLPFFILALIRPFE
TMHVPASLSSLFLWLGYANSLLNPIIYATLNRDFRKPFQEILYFRCSSLNTMMRENYYQD
QYGEPPSQRVMLGDERHGARESFL
>sp|P28285|5HT2A_DROME 5-hydroxytryptamine receptor 2A OS=Drosophila melanogaster GN=5-HT1A PE=2 SV=2
MAHETSFNDALDYIYIANSMNDRAFLIAEPHPEQPNVDGQDQDDAELEELDDMAVTDDGQ
LEDTNNNNNSKRYYSSGKRRADFIGSLALKPPPTDVNTTTTTAGSPLATAALAAAAASAS
VAAAAARITAKAAHRALTTKQDATSSPASSPALQLIDMDNNYTNVAVGLGAMLLNDTLLL
EGNDSSLFGEMLANRSGQLDLINGTGGLNVTTSKVAEDDFTQLLRMAVTSVLLGLMILVT
IIGNVFVIAAIILERNLQNVANYLVASLAVADLFVACLVMPLGAVYEISQGWILGPELCD
IWTSCDVLCCTASILHLVAIAVDRYWAVTNIDYIHSRTSNRVFMMIFCVWTAAVIVSLAP
QFGWKDPDYLQRIEQQKCMVSQDVSYQVFATCCTFYVPLLVILALYWKIYQTARKRIHRR
RPRPVDAAVNNNQPDGGAATDTKLHRLRLRLGRFSTAKSKTGSAVGVSGPASGGRALGLV
DGNSTNTVNTVEDTEFSSSNVDSKSRAGVEAPSTSGNQIATVSHLVALAKQQGKSTAKSS
AAVNGMAPSGRQEDDGQRPEHGEQEDREELEDQDEQVGPQPTTATSATTAAGTNESEDQC
KANGVEVLEDPQLQQQLEQVQQLQKSVKSGGGGGASTSNATTITSISALSPQTPTSQGVG
IAAAAAGPMTAKTSTLTSCNQSHPLCGTANESPSTPEPRSRQPTTPQQQPHQQAHQQQQQ
QQQLSSIANPMQKVNKRKETLEAKRERKAAKTLAIITGAFVVCWLPFFVMALTMPLCAAC
QISDSVASLFLWLGYFNSTLNPVIYTIFSPEFRQAFKRILFGGHRPVHYRSGKL


I want to extract the subsequences from these sequences according to the protein IDs.

All 6 Replies

I see you are working with Drosophila genetics.
I'm not sure what you are trying to do. Can you explain it further?

Wow, that's some huge geek cred for knowing what that is.

Anyhow, I don't exactly know what you want either, so yeah.

For example, the first line of the first file looks like a different format (a closing bold tag with no opening tag). Otherwise, is it that you want to pick the identifier between > and the second | (>sp|P20905| gives id = sp|P20905), and when it is found, take the start and end indexes from the line two lines below it? And are those indexes 0-based or 1-based?

Something like this for file1:

with open('file1.txt') as f:
    inp = f.read()

# each record starts with '>'; the text before the first '>' is empty
data = []
for part in inp.split('>'):
    fields = part.split()   # header, start, end, then the remaining columns
    if len(fields) >= 3:
        data.append(fields[:3])

# the id is the first 9 characters of the header, e.g. sp|P81928
idend = len('sp|P81928')
info = [(header[:idend], int(start), int(end))
        for header, start, end in data if header.startswith('s')]
print(info)
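And a rough sketch of the second half, in case it is what you are after: read file2.txt into a dictionary keyed by the same kind of id, then slice each sequence with the ranges from file1. The function names read_fasta and extract are made up for this example, the demo record is invented rather than taken from your files, and the slicing assumes the indexes are 1-based and inclusive, which is only a guess until you confirm it.

```python
def read_fasta(lines):
    """Return {id: sequence} from FASTA lines; id is e.g. 'sp|P81928'."""
    seqs = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith('>'):
            # '>sp|P81928|140U_DROME ...' -> 'sp|P81928'
            current = '|'.join(line[1:].split('|')[:2])
            seqs[current] = []
        elif line and current is not None:
            # sequence lines are wrapped, so collect them and join later
            seqs[current].append(line)
    return {key: ''.join(parts) for key, parts in seqs.items()}


def extract(seqs, info):
    """Cut each (id, start, end) range; ASSUMES 1-based, inclusive indexes."""
    result = {}
    for seq_id, start, end in info:
        if seq_id in seqs:
            result[seq_id] = seqs[seq_id][start - 1:end]
    return result


# tiny made-up record, not the real data
fasta = """>sp|X00001|DEMO_TEST demo protein
ABCDEFGHIJ
KLMNOPQRST
""".splitlines()
seqs = read_fasta(fasta)
subs = extract(seqs, [('sp|X00001', 3, 12)])
print(subs)  # {'sp|X00001': 'CDEFGHIJKL'}
```

For the real files you would pass open('file2.txt') to read_fasta and the info list built from file1 to extract. Ids that appear in file1 but not in file2 are silently skipped here; you may prefer to report them.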

No, it's not like that, sorry.

I just want this: given a sequence of length 400, I have to cut out the part of the sequence ranging from one index to another index.
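In Python terms that kind of cut is a string slice. A minimal sketch, assuming the two indexes count from 1 and include both ends (if they are 0-based the start - 1 adjustment has to go); the helper name cut is made up for illustration:

```python
def cut(sequence, start, end):
    """Return residues start..end, counting from 1 and including both ends."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError('range %d-%d does not fit length %d'
                         % (start, end, len(sequence)))
    # Python slices are 0-based and exclude the end, hence start - 1
    return sequence[start - 1:end]


demo = 'MNFLWKGRRF'  # first 10 residues of the P81928 sequence above
print(cut(demo, 2, 5))  # NFLW
```

The range check is there so that a range running past the end of a sequence raises an error instead of quietly returning a shorter string.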

If those were not the right ids and indexes, I am afraid I cannot help you.

Wow, that's some huge geek cred for knowing what that is.

Anyhow, I don't exactly know what you want either, so yeah.

Heh, I'm great at Biology (got first place in a state competition), so Drosophila melanogaster is very familiar to me. It's a fruitfly. I've helped raise them before.

Anyways, OP, can you provide us with some kind of desired output, so that we know exactly what you want?
