Matching the first and last parts of a string
Hi everybody! I think this is a very simple Perl problem, but I still need your help.
Here's my sample input file:
>blast
ATGGGCCTAC
ATCCACSTAT
Please note that there could be more sequence lines than these two, and the Perl script should skip the first line, which starts with '>'.
The script should treat the remaining lines as a single line and check whether that line starts with ATG and ends with TAT. If it does, the output should be "gene". Otherwise it should be "not gene".
But my Perl script is not reading the whole file. It processes one line at a time. Here's my script:
#!/usr/bin/perl
print "Print your file name with location\n";
$dnafile=<STDIN>;
chomp $dnafile;
open (DNA, $dnafile) || die "Cannot open the file : $!";
while ($dna=<DNA>)
{
    chomp ($dna);
    ### Check that the line does not start with the '>' character
    if ($dna=~/^[^>]/)
    {
        @dna=split ('', $dna);
        print "$dna";
    }
    if (($dna=~/^ATG/) && ($dna=~/TAT$/)) {
        print "gene";
    }
    else {
        print "Not gene\n";
    }
}
Please let me know how I can improve it.
Thanks
ghosh22
Junior Poster in Training
53 posts since Aug 2010
Reputation Points: 10
Solved Threads: 0
"But my perl script is not taking the whole file."
You can read the whole file into a single string like this:
undef $/; # Clear the input record separator so one read returns the whole file
open (FILEHANDLE, $input_file) || die "Cannot open $input_file : $!";
my $file_content = <FILEHANDLE>;
close (FILEHANDLE);
print $file_content;
or
open (FILEHANDLE, $input_file) || die "Cannot open $input_file : $!";
read FILEHANDLE, my $file_content, -s FILEHANDLE; # Read as many bytes as the file size
close (FILEHANDLE);
print $file_content;
k_manimuthu
Junior Poster in Training
93 posts since Jun 2009
Reputation Points: 55
Solved Threads: 24
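One caveat with the first variant above: undef $/ changes the record separator for the rest of the program, so any later line-by-line read would also return a whole file. A minimal sketch of a scoped version follows. The helper name slurp_file and the temporary file are made up for illustration and are not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not from the thread): return a file's entire content.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    # 'local' gives $/ an undefined value only until this sub returns,
    # so reads elsewhere in the program still go line by line.
    local $/;
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Write the sample input from the question to a temporary file, then slurp it.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} ">blast\nATGGGCCTAC\nATCCACSTAT\n";
close $tmp_fh;

my $file_content = slurp_file($tmp_name);
print $file_content;
```

In a real script you would pass the path typed by the user instead of the temporary file name.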
Hi, thanks. So you mean that I have to add this piece of code before the while loop?
ghosh22
Hi Ghosh,
Read the links below and try the updated code.
File Handling
File Contents
Regular Expression
and more
open (FIN, $input_file) || die "Cannot open $input_file : $!";
read FIN, my $file, -s FIN;   # Read the whole file into $file
close (FIN);
if ($file =~ m{
        ^          # Match the beginning of the string
        >          # Match the '>' character
        [^\n]+\n   # Match the rest of the first line
        ATG.*TAT   # Match 'ATG', then any characters, then 'TAT'
        $          # Match the end of the string
    }xs)
{
    print "\nGene";
}
else
{
    print "\nNot Gene";
}
k_manimuthu
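A note on the pattern above: it is matched against the raw file content, so it depends on the file ending right after TAT with at most one newline. A blank line at the end, or Windows line endings, would likely make it report "Not Gene". A rough sketch of one way around that is to strip the header line and all whitespace first and then anchor the test with \A and \z. The name is_gene_text is hypothetical and used only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: decide gene / not gene from the slurped file text.
sub is_gene_text {
    my ($text) = @_;
    $text =~ s/\A>[^\n]*\n?//;   # drop the header line that starts with '>'
    $text =~ s/\s+//g;           # remove newlines, carriage returns and spaces
    # \A and \z anchor at the true start and end of the string.
    return $text =~ /\A ATG .* TAT \z/x ? 1 : 0;
}

# Sample input from the question, with extra blank lines at the end.
print is_gene_text(">blast\nATGGGCCTAC\nATCCACSTAT\n\n\n") ? "Gene\n" : "Not Gene\n";
# Same input, but the last line ends with TAA instead of TAT.
print is_gene_text(">blast\nATGGGCCTAC\nATCCACSTAA\n") ? "Gene\n" : "Not Gene\n";
```

The first call prints "Gene" and the second prints "Not Gene".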
k_manimuthu's answers should work fine. Here is a slightly different way to do the same thing.
#!/usr/bin/perl
use strict;
use warnings;
my $input_file = 'blast.txt';
open my $fh, '<', $input_file or die "Cannot Open the $input_file : $!";
my $sequence;
while (<$fh>){
    chomp;
    $sequence .= $_ unless m/^>/; # Skip the line that starts with >
}
print $sequence, "\n";
if ($sequence =~ /^ATG.*TAT$/){
    print "The above sequence starts with ATG and ends with TAT, so it's a gene.";
}
else{
    print "The above sequence is not a gene.";
}
close $fh;
This gives the following output:
ATGGGCCTACATCCACSTAT
The above sequence starts with ATG and ends with TAT, so it's a gene.
d5e5
Practically a Posting Shark
810 posts since Sep 2009
Reputation Points: 159
Solved Threads: 159
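For anyone wondering why the chomp matters in the script above: without the /s modifier, the dot in .* does not match a newline, so the pattern cannot reach from ATG on the first sequence line to TAT on the last one unless the newlines are removed first. A small sketch to show the difference, using the two sample lines from the question. The sub name verdict is made up for this example.

```perl
use strict;
use warnings;

# The two sequence lines as they would come from the file, newlines included.
my @lines = ("ATGGGCCTAC\n", "ATCCACSTAT\n");

# Joined without removing the newlines.
my $raw = join '', @lines;

# Joined after chomp, as in the script above.
my $joined = '';
for my $line (@lines) {
    my $copy = $line;
    chomp $copy;
    $joined .= $copy;
}

# Hypothetical helper: apply the same test as the script above.
sub verdict {
    my ($seq) = @_;
    return $seq =~ /^ATG.*TAT$/ ? "gene" : "not gene";
}

print verdict($raw), "\n";     # prints "not gene": .* stops at the first newline
print verdict($joined), "\n";  # prints "gene"
```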
Hi, thanks. Would you please let me know the meaning of .= in the line "$sequence .= $_ unless m/^>/;"?
thanks
ghosh22
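It is the concatenation assignment operator. $sequence .= $_; appends $_ to the end of $sequence, so it means the same as:
$sequence = $sequence . $_;
## Another example of concatenation with the . operator
$first_name = 'Mani';
$last_name = 'Muthu';
$full_name = $first_name . " " . $last_name;
print $full_name; # prints Mani Muthu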
k_manimuthu
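To see that the two forms really give the same result, here is a short sketch that builds the sequence from the question both ways. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $long_form  = '';
my $short_form = '';

# Append the two sample sequence lines one after the other.
for my $piece ('ATGGGCCTAC', 'ATCCACSTAT') {
    $long_form  = $long_form . $piece;   # explicit concatenation and assignment
    $short_form .= $piece;               # the same thing with the .= operator
}

print "$short_form\n";                                        # prints ATGGGCCTACATCCACSTAT
print $long_form eq $short_form ? "same\n" : "different\n";   # prints same
```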