I have this sequence in the text file

>sp|P20905|5HT1R_DROME 5-hydroxytryptamine receptor 1 OS=Drosophila melanogaster GN=5-HT7 PE=2 SV=1
MALSGQDWRRHQSHRQHRNHRTQGNHQKLISTATLTLFVLFLSSWIAYAAGKATVPAPLV
EGETESATSQDFNSSSAFLGAIASASSTGSGSGSGSGSGSGSGSGSYGLASMNSSPIAIV
SYQGITSSNLGDSNTTLVPLSDTPLLLEEFAAGEFVLPPLTSIFVSIVLLIVILGTVVGN
VLVCIAVCMVRKLRRPCNYLLVSLALSDLCVALLVMPMALLYEVLEKWNFGPLLCDIWVS
FDVLCCTASILNLCAISVDRYLAITKPLEYGVKRTPRRMMLCVGIVWLAAACISLPPLLI
LGNEHEDEEGQPICTVCQNFAYQIYATLGSFYIPLSVMLFVYYQIFRAARRIVLEEKRAQ
THLQQALNGTGSPSAPQAPPLGHTELASSGNGQRHSSVGNTSLTYSTCGGLSSGGGALAG
HGSGGGVSGSTGLLGSPHHKKLRFQLAKEKKASTTLGIIMSAFTVCWLPFFILALIRPFE
TMHVPASLSSLFLWLGYANSLLNPIIYATLNRDFRKPFQEILYFRCSSLNTMMRENYYQD
QYGEPPSQRVMLGDERHGARESFLD

I want to split this into list as

','MALSGQDWRRHQSHRQHRNHRTQGNHQKLISTATLTLFVLFLSSWIAYAAGKATVPAPLV
EGETESATSQDFNSSSAFLGAIASASSTGSGSGSGSGSGSGSGSGSYGLASMNSSPIAIV
SYQGITSSNLGDSNTTLVPLSDTPLLLEEFAAGEFVLPPLTSIFVSIVLLIVILGTVVGN
VLVCIAVCMVRKLRRPCNYLLVSLALSDLCVALLVMPMALLYEVLEKWNFGPLLCDIWVS
FDVLCCTASILNLCAISVDRYLAITKPLEYGVKRTPRRMMLCVGIVWLAAACISLPPLLI
LGNEHEDEEGQPICTVCQNFAYQIYATLGSFYIPLSVMLFVYYQIFRAARRIVLEEKRAQ
THLQQALNGTGSPSAPQAPPLGHTELASSGNGQRHSSVGNTSLTYSTCGGLSSGGGALAG
HGSGGGVSGSTGLLGSPHHKKLRFQLAKEKKASTTLGIIMSAFTVCWLPFFILALIRPFE
TMHVPASLSSLFLWLGYANSLLNPIIYATLNRDFRKPFQEILYFRCSSLNTMMRENYYQD
QYGEPPSQRVMLGDERHGARESFLD']

Recommended Answers

All 3 Replies

If the input value is in variable info:

>>> info.split('\n',1)
['>sp|P20905|5HT1R_DROME 5-hydroxytryptamine receptor 1 OS=Drosophila melanogaster GN=5-HT7 PE=2 SV=1', 'MALSGQDWRRHQSHRQHRNHRTQGNHQKLISTATLTLFVLFLSSWIAYAAGKATVPAPLV\nEGETESATSQDFNSSSAFLGAIASASSTGSGSGSGSGSGSGSGSGSYGLASMNSSPIAIV\nSYQGITSSNLGDSNTTLVPLSDTPLLLEEFAAGEFVLPPLTSIFVSIVLLIVILGTVVGN\nVLVCIAVCMVRKLRRPCNYLLVSLALSDLCVALLVMPMALLYEVLEKWNFGPLLCDIWVS\nFDVLCCTASILNLCAISVDRYLAITKPLEYGVKRTPRRMMLCVGIVWLAAACISLPPLLI\nLGNEHEDEEGQPICTVCQNFAYQIYATLGSFYIPLSVMLFVYYQIFRAARRIVLEEKRAQ\nTHLQQALNGTGSPSAPQAPPLGHTELASSGNGQRHSSVGNTSLTYSTCGGLSSGGGALAG\nHGSGGGVSGSTGLLGSPHHKKLRFQLAKEKKASTTLGIIMSAFTVCWLPFFILALIRPFE\nTMHVPASLSSLFLWLGYANSLLNPIIYATLNRDFRKPFQEILYFRCSSLNTMMRENYYQD\nQYGEPPSQRVMLGDERHGARESFLD']
>>

You have to find some unique combination that you can split on. If SV= is always the final string then you can split on that. You could also split on finding X number of capital letters in a row.

## an easy way to create the string from the post
seq = ["sp|P20905|5HT1R_DROME 5-hydroxytryptamine receptor 1", 
"OS=Drosophila melanogaster GN=5-HT7 PE=2 SV=1 ",
"MALSGQDWRRHQSHRQHRNHRTQGNHQKLISTATLTLFVLFLSSWIAYAAGKATVPAPLV",
"EGETESATSQDFNSSSAFLGAIASASSTGSGSGSGSGSGSGSGSGSYGLASMNSSPIAIV",
"SYQGITSSNLGDSNTTLVPLSDTPLLLEEFAAGEFVLPPLTSIFVSIVLLIVILGTVVGN",
"VLVCIAVCMVRKLRRPCNYLLVSLALSDLCVALLVMPMALLYEVLEKWNFGPLLCDIWVS",
"FDVLCCTASILNLCAISVDRYLAITKPLEYGVKRTPRRMMLCVGIVWLAAACISLPPLLI",
"LGNEHEDEEGQPICTVCQNFAYQIYATLGSFYIPLSVMLFVYYQIFRAARRIVLEEKRAQ",
"THLQQALNGTGSPSAPQAPPLGHTELASSGNGQRHSSVGNTSLTYSTCGGLSSGGGALAG",
"HGSGGGVSGSTGLLGSPHHKKLRFQLAKEKKASTTLGIIMSAFTVCWLPFFILALIRPFE",
"TMHVPASLSSLFLWLGYANSLLNPIIYATLNRDFRKPFQEILYFRCSSLNTMMRENYYQD",
"QYGEPPSQRVMLGDERHGARESFLD"]

seq_str = "".join(seq)
result = seq_str.find("SV=")
if result > -1:     ## 'SV=' found 
    result += 5     ## skip over SV=
    print seq_str[:result]
    print
    print seq_str[result:]

thanks alot for the help ,

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.