Finding inverted repeat pattern from a FASTA sequence using python

Question

sudipta.mml 0 Newbie Poster

12 Years Ago

Suppose my long sequence looks like 5'-AGGGTTTCCC*TGACCT*TCACTGC*AGGTCA*TGCA-3'. The two subsequences marked between stars in this long sequence are together called an inverted repeat pattern. The length and the combination of the four letters A, T, G, C in these two subsequences vary, but there is a relation between them. Notice that the complement of the first subsequence is ACTGGA (A pairs with T, and G pairs with C), and when you invert this complementary subsequence (i.e. the last letter comes first) it matches the second subsequence. A large number of such patterns are present in a FASTA sequence (containing 10 million A, T, G, C letters), and I want to find these patterns along with their start and end positions. Could anyone help me with this?
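
To make the relation concrete, here is a minimal sketch (assuming Python 3; the helper name `reverse_complement` is only an illustration, not something from the thread). It takes the first subsequence, TGACCT, whose complement is the ACTGGA mentioned above, and checks that reversing that complement gives the second subsequence, AGGTCA.

```python
# Complement table for DNA letters: A<->T, G<->C.
COMPLEMENT = str.maketrans("ATGC", "TACG")

def reverse_complement(seq):
    """Return the complement of seq, read in reverse order."""
    return seq.translate(COMPLEMENT)[::-1]

seq = "AGGGTTTCCCTGACCTTCACTGCAGGTCATGCA"
first = "TGACCT"
second = reverse_complement(first)

print(first.translate(COMPLEMENT))        # complement only: ACTGGA
print(second)                             # reversed complement: AGGTCA
print(seq.find(first), seq.find(second))  # 0-based start positions: 10 23
```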

python

All 4 Replies

TrustyTony 888 ex-Moderator

12 Years Ago

After a bit of googling around, this looks like a nice, medium-tough scientific paper digging deep into the subject: http://www.cse.msu.edu/~cse891/Sect001/notes_alignment.pdf

sudipta.mml 0 Newbie Poster · Answer 1 · 2013-01-14T05:29:22+00:00

When running this script, I get this error:

Traceback (most recent call last):
  File "irc.py", line 12, in <module>
    print list(ivp('AGGGTTTCCCTGACCTTCACTGCAGGTCATGCA', 6, 6))
  File "irc.py", line 10, in ivp
    if sub.translate(mapping)[::-1] in s :
TypeError: expected a character buffer object

Can anyone help me fix it?

def substrings(s, lmin, lmax):
    for i in range(len(s)):
        for l in range(lmin, lmax+1):
            subst = s[i:i+l]
            if len(subst) == l:
                yield i, l, subst
def ivp(s, lmin, lmax):
    mapping = {'A': 'T', 'G': 'C', 'T': 'A', 'C': 'G'}
    for i, l, sub in substrings(s, lmin, lmax):
        if sub.translate(mapping)[::-1] in s :
            yield i, l, sub
print list(ivp('AGGGTTTCCCTGACCTTCACTGCAGGTCATGCA', 6, 6))
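
The TypeError in the traceback comes from `sub.translate(mapping)`: in Python 2, `str.translate` expects a 256-character table (built with `string.maketrans`), not a dict. A minimal sketch of a corrected script follows, assuming Python 3, where `str.maketrans` builds a table that `str.translate` accepts. Apart from that and the `print()` call, the logic is the same as in the script above.

```python
def substrings(s, lmin, lmax):
    """Yield (start, length, substring) for every window of length lmin..lmax."""
    for i in range(len(s)):
        for l in range(lmin, lmax + 1):
            subst = s[i:i + l]
            if len(subst) == l:
                yield i, l, subst

def ivp(s, lmin, lmax):
    # str.maketrans builds a translation table that str.translate accepts.
    mapping = str.maketrans("AGTC", "TCAG")
    for i, l, sub in substrings(s, lmin, lmax):
        # Reverse complement of the window; report it if it occurs anywhere in s.
        if sub.translate(mapping)[::-1] in s:
            yield i, l, sub

result = list(ivp('AGGGTTTCCCTGACCTTCACTGCAGGTCATGCA', 6, 6))
print(result)
```

Note that this check also reports windows that are their own reverse complement (CTGCAG in this sequence), and it only reports where the first window starts, not where its partner sits.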

sudipta.mml 0 Newbie Poster · Answer 2 · 2013-01-14T05:42:34+00:00

@pyTony, the link http://www.cse.msu.edu/~cse891/Sect001/notes_alignment.pdf is not opening.

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 3 · 2013-01-14T11:18:50+00:00

It was not long ago when I googled it, but the link seems to be dead now. Maybe you could check http://sols.unlv.edu/Schulte/BIO480/SequenceAlign.pdf
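
For a sequence of the size mentioned in the question (about 10 million letters), testing each substring with `in s` rescans the whole sequence every time. One possible alternative, sketched here under stated assumptions, is to index every k-mer once and then look up each k-mer's reverse complement, reporting start and end positions of both arms. The names `read_fasta`, `inverted_repeats`, `min_gap` and `max_gap` are illustrative and not from the thread; the reader assumes a single-record FASTA file, the arm length is fixed at `k`, and the index keeps every position in memory, which may need a lot of RAM for a sequence that long.

```python
from collections import defaultdict

# Complement table for DNA letters: A<->T, G<->C (other letters stay unchanged).
COMPLEMENT = str.maketrans("ATGC", "TACG")

def read_fasta(path):
    """Return the sequence of a single-record FASTA file as one uppercase string."""
    with open(path) as handle:
        return "".join(line.strip() for line in handle
                       if not line.startswith(">")).upper()

def inverted_repeats(seq, k, min_gap=0, max_gap=50):
    """Yield (start1, end1, start2, end2) for inverted repeats with arms of length k.

    Positions are 0-based with exclusive ends. The second arm must start
    between min_gap and max_gap letters after the first arm ends.
    """
    # Map every k-mer to the list of positions where it starts.
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)

    for i in range(len(seq) - k + 1):
        rc = seq[i:i + k].translate(COMPLEMENT)[::-1]
        for j in index.get(rc, ()):
            gap = j - (i + k)
            if min_gap <= gap <= max_gap:
                yield i, i + k, j, j + k

seq = "AGGGTTTCCCTGACCTTCACTGCAGGTCATGCA"
hits = list(inverted_repeats(seq, 6))
print(hits)  # [(10, 16, 23, 29)]
```

For a real file, `seq = read_fasta("genome.fa")` would replace the hard-coded string; the file name here is a placeholder.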
