Hello, I would like to know if I need to use a regular expression to match the desired substring in order to print out 10 characters of the start codon ATG.

My DNA sequence is "CATAGAGATA"

Thanks for any advice.

You need to provide more information in order for us (or me anyway) to create such a regex. What does your data look like?

I don't think I understand the question. Your DNA sequence consists of 10 characters, and you want to print out 10 characters starting with the substring 'ATG'? I don't see any occurrence of 'ATG' in your sequence. Can we shuffle the DNA sequence until it contains (or starts with?) 'ATG'? Please tell us how you would determine the output without using a program, and then maybe we can advise how to write a program that does it.

For example, does the following do what you want?

#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle); # Provides shuffle() for randomly reordering a list

my $str = "CATAGAGATA";
my @arr = $str =~ m/[AGCT]/g; # Convert string into array of single letters

while (1) {
    @arr = shuffle(@arr); # Shuffle the letters of the array randomly
    last if join('', @arr[0 .. 2]) eq 'ATG'; # Exit loop if first 3 elements = start codon
}

print "Shuffled sequence is:\n";
print join('', @arr), "\n";

The output differs from run to run because the shuffle is random. One possible result:

Shuffled sequence is:
ATGCAAGATA

Thank you for your response. I missed some characters: the DNA sequence is "CCCCATAGAG". I am supposed to print out the 10 characters upstream of the start codon ATG, so I think my output should be the 10 bases upstream of ATG. I don't even know where to start because the question confuses me. Thank you.

The question confuses me too. I still see only 10 bases and I don't see any 'ATG' in the sequence. Whoever gave you this question may have made a mistake.

You were right. I checked and it was a mistake. The DNA sequence is "CCCCATAGAGATAGAGATAGAGAACCCCGCGCGCTCGCATGGGG", and I need to print out the 20 characters upstream of the start codon ATG.

Here's some code that does that, if the ATG occurs once:

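use strict;
use warnings;

my $seq  = "CCCCATAGAGATAGAGATAGAGAACCCCGCGCGCTCGCATGGGG";
my $term = "ATG";

# Capture the 20 bases immediately before the start codon.
# [ACGT] restricts the capture to DNA bases; \Q...\E quotes the search term.
if ($seq =~ /([ACGT]{20})\Q$term\E/) {
    print "$1\n";
} else {
    print "No $term with 20 upstream bases found\n";
}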

Output:

AGAGAACCCCGCGCGCTCGC

Looks good to me.
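
On the original question of whether a regular expression is needed at all: probably not. Below is a minimal sketch of the same lookup with Perl's built-in index and substr. The helper name upstream_of is made up for illustration and is not from the posts above. Note one difference in behaviour: this version only looks at the first ATG and gives up if that one has fewer than 20 bases before it, while the regex would move on to a later ATG.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the $len bases
# immediately upstream of the first occurrence of $codon, or undef
# when the codon is absent or sits too close to the start.
sub upstream_of {
    my ($seq, $codon, $len) = @_;
    my $pos = index($seq, $codon);   # index returns -1 when not found
    return if $pos < 0;              # no codon in the sequence
    return if $pos < $len;           # fewer than $len bases upstream
    return substr($seq, $pos - $len, $len);
}

my $seq = "CCCCATAGAGATAGAGATAGAGAACCCCGCGCGCTCGCATGGGG";
my $up  = upstream_of($seq, "ATG", 20);
print defined $up ? "$up\n" : "No ATG with 20 upstream bases found\n";
```

For the sequence above this should print the same 20 bases as the regex version. For the two earlier sequences, which contain no ATG, it prints the "not found" message instead of an uninitialized-value warning.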

Thanks for the help and it makes a lot of sense now.
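
For anyone who lands here with a sequence where ATG occurs more than once, here is a rough sketch that reports the upstream bases for every occurrence. The helper name all_upstream and the short test sequence are invented for illustration; occurrences with too few bases before them are skipped.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collect the $len bases
# upstream of every occurrence of $codon, skipping any occurrence
# that has fewer than $len bases before it.
sub all_upstream {
    my ($seq, $codon, $len) = @_;
    my @hits;
    my $pos = -1;
    # index() with a start offset walks through successive occurrences
    while (($pos = index($seq, $codon, $pos + 1)) >= 0) {
        push @hits, substr($seq, $pos - $len, $len) if $pos >= $len;
    }
    return @hits;
}

# Made-up example sequence with two ATG codons
my $seq = "GGATGCCATGTT";
print "$_\n" for all_upstream($seq, "ATG", 2);
```

With the made-up sequence this prints GG and then CC, one per line.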
