Perl program to print out 10 characters of a start codon

Question

Anthony Cameron -2 Light Poster

14 Years Ago

Hello, I would like to know if I need to use a regular expression to match the desired substring in order to print out 10 characters of the start codon ATG.

My dna sequence is "CATAGAGATA"

Thanks for any advice.

perl

3 Contributors
9 Replies
342 Views
23 Hours Discussion Span
Latest Post 14 Years Ago Latest Post by Anthony Cameron

All 9 Replies

mitchems 12 Posting Whiz in Training

14 Years Ago

You need to provide more information in order for us (or me anyway) to create such a regex. What does your data look like?

mitchems 12 Posting Whiz in Training

14 Years Ago

When you say "upstream", you mean BEFORE the ATG, correct?

mitchems 12 Posting Whiz in Training

14 Years Ago

Here's some code that does that, if the ATG occurs once:

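use strict;
use warnings;
my $seq="CCCCATAGAGATAGAGATAGAGAACCCCGCGCGCTCGCATGGGG";
my $term="ATG";
# Capture the 20 bases immediately upstream of the first $term
# that has at least 20 bases before it
if ($seq =~ /([ACGT]{20})\Q$term\E/) {
    print "$1\n";
} else {
    print "No $term with 20 bases upstream was found\n";
}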

Output:

AGAGAACCCCGCGCGCTCGC

d5e5 commented: Looks good to me. +2
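
On the original question of whether a regular expression is needed at all: it is not strictly required. Here is a minimal sketch using Perl's built-in index and substr, assuming the sequence is a plain string of bases. The helper name upstream_of is made up for illustration and does not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the $len characters
# immediately before the first occurrence of $codon in $seq, or undef if
# the codon is absent or has fewer than $len characters before it.
sub upstream_of {
    my ($seq, $codon, $len) = @_;
    my $pos = index($seq, $codon);   # 0-based offset, -1 if not found
    return undef if $pos < 0;        # codon not present
    return undef if $pos < $len;     # not enough sequence upstream
    return substr($seq, $pos - $len, $len);
}

my $seq = "CCCCATAGAGATAGAGATAGAGAACCCCGCGCGCTCGCATGGGG";
my $up  = upstream_of($seq, "ATG", 20);
print defined $up ? "$up\n" : "No usable ATG found\n";
```

For the sequence in this thread it prints AGAGAACCCCGCGCGCTCGC. Note that this sketch only looks at the first ATG; if that one sits too close to the start of the sequence it gives up rather than trying a later occurrence.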


d5e5 109 Master Poster · Answer 1 · 2010-12-01T21:38:06+00:00

Hello, I would like to know if I need to use a regular expression to match the desired substring in order to print out 10 characters of the start codon ATG.
My dna sequence is "CATAGAGATA"
Thanks for any advice.

I don't think I understand the question. Your dna sequence consists of 10 characters and you want to print out 10 characters starting with the substring 'ATG'? I don't see any occurrence of the substring 'ATG' in your sequence. Can we shuffle the dna sequence until it contains (or starts with?) 'ATG'? Please tell us how you would determine the output without using a program and then maybe we can advise how to write a program that does it.

For example, does the following do what you want?

#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle); #This module includes a method to shuffle arrays.

my $str = "CATAGAGATA";
my @arr;

while (1){
    @arr = $str =~ m/[AGCT]/g; #Convert string into array of single letters
    @arr = shuffle(@arr); #Shuffle the letters of the array randomly
    last if join('', @arr[0..2]) eq 'ATG'; # Exit loop if first 3 elements spell the start codon
}

print "Shuffled sequence is:\n";
print join('', @arr), "\n";

This outputs a random arrangement of the letters that starts with ATG, for example:

Shuffled sequence is:
ATGACATAGA

Anthony Cameron -2 Light Poster · Answer 2 · 2010-12-01T23:55:10+00:00

Thank you for your response. I missed some characters: the DNA sequence is "CCCCATAGAG". I am supposed to print out the 10 characters upstream of the start codon ATG, so I think my output should be the 10 bases upstream of ATG. I don't even know where to start because the question confuses me. Thank you.

d5e5 109 Master Poster · Answer 3 · 2010-12-02T01:15:10+00:00

The question confuses me too. I still see only 10 bases and I don't see any 'ATG' in the sequence. Whoever gave you this question may have made a mistake.

Anthony Cameron -2 Light Poster · Answer 4 · 2010-12-02T01:23:26+00:00

The question confuses me too. I still see only 10 bases and I don't see any 'ATG' in the sequence. Whoever gave you this question may have made a mistake.

You were right; I checked and it was a mistake. The DNA sequence is "CCCCATAGAGATAGAGATAGAGAACCCCGCGCGCTCGCATGGGG", and I need to print out the 20 characters upstream of the start codon ATG.

Anthony Cameron -2 Light Poster · Answer 5 · 2010-12-02T01:43:40+00:00

When you say "upstream", you mean BEFORE the ATG, correct?

That is correct

Anthony Cameron -2 Light Poster · Answer 6 · 2010-12-02T02:38:17+00:00

Thanks for the help; it makes a lot of sense now.
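
The regex answer in this thread was offered for the case where ATG occurs once. If a sequence contains several start codons, one possible approach is a lookahead match in a loop. This is only a sketch under that assumption; the helper name all_upstream and the short test sequence are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): for every ATG that has at
# least $len bases before it, collect those $len upstream bases.
sub all_upstream {
    my ($seq, $len) = @_;
    my @found;
    # The lookahead (?=...) matches without consuming characters, so
    # overlapping windows are all visited, one start position at a time.
    while ($seq =~ /(?=([ACGT]{$len})ATG)/g) {
        push @found, $1;
    }
    return @found;
}

# Made-up sequence containing two start codons
my @hits = all_upstream("AACCGGTTATGCATGA", 4);
print "$_\n" for @hits;
```

With the made-up sequence above it prints GGTT and then ATGC, one per line. Run against the 44-base sequence from this thread with a length of 20, it returns the single result AGAGAACCCCGCGCGCTCGC.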
