Hi, I am working with biological sequences, but I am having some problems finalizing my script.

Let's say that I want to compare two DNA sequences, e.g.:
seq1='ATGGAGGCAATGGCGGCCAGCACTTCCCTGCCTGACCCTGGAGACTTTGACCGGAACGTG
CCCCGGATCTGTGGGGTGTGTGGAGACCGAGCCACTGGCTTTCACTTCAATGCTATGACC
TGTGAAGGCTGCAAAGGCTTCTTCAGGCGAAGCATGAAGCGGAAGGCACTATTCACCTGC
CCCTTCAACGGGGACTGCCGCATCACCAAGGACAACCGACGCCACTGCCAGGCCTGCCGG
CTCAAACGCTGTGTGGACATCGGCATGATGAAGGAGTTCATTCTGACAGATGAGGAAGTG'
seq2='ATGGAGGCAATGGCGGCCAGCACTTCCCTGCCTGACCCTGGAGACTTTGACCGGAATGTG
CCCCGGATCTGTGGGGTGTGTGGAGACCGAGCCACTGGCTTTCACTTCAATGCTATGACC
TGTGAAGGCTGCAAAGGCTTCTTCAGGCGAAGCATGAAGCGGAAGGCACTATTCACCTGC
CCCTTCAATGGGGACTGCCGCATCACCAAGGACAACCGGCGCCACTGCCAGGCCTGCCGG
CTCAAACGCTGTGTGGACATCGGCATGATGAAGGAGTTCATCCTGACAGATGAGGAAGTG
CAGAGGAAGCGGGAGATGATCCTGAAGCGGAAGGAGGAGGAGGCCTTGAAGGACAGTCTG
CGGCCCAAGCTGTCTGAGGAGCAGCAGCGCATCATTGCCATACTGCTGGACGCCCACCAT
AAGACCTACGACCCCACCTACTCCGACTTCTGCCAGTTCCGGCCTCCAGTTCGTGTGAAT
GATGGTGGAGGGAGCCATCCTTCCAGGCCCAACTCCAGACACACTCCCAGCTTCTCTGGG
GACTCCTCCTCCTCCTGCTCAGATCACTGTATCACCTCTTCAGACATGATGGAC---TCG'

My aim is to identify the variations between these two sequences in terms of codons, and the position at which each changed codon falls. Then I want to know the amino acid encoded by that codon. Any help?
This is what I have done so far, but it does not show exactly what I want:

def sequence_compare(seq_a, seq_b):
    # Compare only the part where the two sequences overlap.
    length = min(len(seq_a), len(seq_b))
    mismatches = []
    for pos in range(length):
        if seq_a[pos] != seq_b[pos]:
            mismatches.append('|')  # mark a mismatching base
        else:
            mismatches.append(' ')
    print(seq_a)
    print(''.join(mismatches))
    print(seq_b)

sequence_compare(seq1, seq2)

This gives me just the positions of the mismatches, but what if I want to know the related codon?

Basically, I would like to see something like ATC-ATT, along with the related amino acid change.
Can anyone help me with this? Any suggestions, please?

I do not know how to estimate your effort, but here it goes:

diff = [(ind, s1, s2) for ind, (s1, s2) in enumerate(zip(seq1, seq2)) if s1 != s2]
for info in diff:
	place = info[0] // 3 * 3
	print('Difference at %3i triple starts from %3i: %r %r' % (info[0], place, seq1[place:place+3], seq2[place:place+3]))
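
The snippet above reports the differing codons but not the amino acids they encode. A minimal sketch of that last step follows, assuming the standard genetic code and DNA written with the letters T, C, A and G. The names CODON_TABLE and translate_codon are made up for this illustration and are not part of the code above; anything that is not a complete standard codon (for example the '---' gap) is reported as '?'.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each of the three codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Hypothetical helper table: maps each DNA codon to a one-letter amino acid ('*' = stop).
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon):
    """Return the one-letter amino acid for a codon, or '?' if it is not a standard codon."""
    return CODON_TABLE.get(codon.upper(), "?")


# Example: the ATC-ATT change mentioned in the question.
old_codon, new_codon = "ATC", "ATT"
print("%s-%s: %s -> %s" % (old_codon, new_codon,
                           translate_codon(old_codon), translate_codon(new_codon)))
```

If both codons translate to the same letter, the change is synonymous (silent) at the protein level. For real work, a maintained library such as Biopython is likely a safer choice than a hand-built table.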

Heya, the problem has been solved. Thanks very much for the help, pyTony!
But I would still like to know whether there is any other option based on what I have done so far. I am new to Python, and I am trying to understand whether I am right or wrong.

Thanks

You did not use CODE tags, so it is not easy to comment on your code. It looks like you could modify it to collect the indexes of the differences in a list, in place of my first list comprehension (the differing letters were not actually used, so they do not need to be saved; they are easy to get once you know the index). You could also loop in steps of three and compare slices of three between the sequences. Then you could store that information instead of the single differing letters as I did, and the last loop would not be needed.
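
A rough sketch of the steps-of-three idea, in case it helps. The function name codon_differences is invented for this example, and it assumes both sequences start in the same reading frame at index 0. It also strips line breaks first, because sequences pasted over several lines carry newline characters that would otherwise shift the frame.

```python
def codon_differences(seq_a, seq_b):
    """Return (start_index, codon_a, codon_b) for every codon that differs."""
    # Remove line breaks and spaces so they do not shift the reading frame.
    seq_a = "".join(seq_a.split())
    seq_b = "".join(seq_b.split())
    # Only complete codons inside the overlapping region are compared.
    usable = min(len(seq_a), len(seq_b)) // 3 * 3
    diffs = []
    for start in range(0, usable, 3):
        codon_a = seq_a[start:start + 3]
        codon_b = seq_b[start:start + 3]
        if codon_a != codon_b:
            diffs.append((start, codon_a, codon_b))
    return diffs


# Small made-up example; pass seq1 and seq2 from the question the same way.
for start, codon_a, codon_b in codon_differences("ATGATCGGA", "ATGATTGGA"):
    print("codon %d (base index %d): %s-%s" % (start // 3 + 1, start, codon_a, codon_b))
```

Because whole codons are compared, a codon with two changed bases is reported once rather than twice, which the per-letter approach would not do.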
