Apr 30th, 2007

Problems with pattern matching, hope you can help!

Hi there,

I am working on a Perl script and hope someone can help me. Here is a quick description of my project:

My program is given a FASTA file, a signal description file and a deviation (a number) as input on the command line.
A FASTA file looks like this:
>U00659.CDS.1 product:"insulin GGCCC
CCGCAGAAGCGTGGCATCGTGGAGCAGTGCTGCGCCGGCGTCTGCTCTCTCTACCAGCTG
AAAGACCAGACGGAGATGATGGTAAAGAGAGGTATTGTAGA
>X13559.CDS.1 product:"preproinsulin " DNA org:"Oncorhynchus keta" (CDS extraction)
ATGGCCTTCTGGCTCCAAGCTGCATCTCTGCTGGTGTTGCTGGCGCTCTCCCCCGGGGTA
GATGCTGCAGCTGCCCAGCACCTGTGTGGCTCTCACCTGGTGGACGCCCTCTATCTGGTG
TGTGGAGAGAAAGGATT
>J02989.CDS.1 note:"preproinsulin " DNA org:"Aotus trivirgatus" (CDS extraction)
ATGGCCCTGTGGATGCACCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCCGAG
CCAGCCCCGGCCTTTGTGAACCAGCACCTGTGCGGCCCCCACCTGGTGGAAGCCCTCTAC
CTGGTGTGCGGGGAGCGAGGTTTC
Each entry in a FASTA file starts with a header line beginning with '>'; this line should be ignored.
What I want to match against is the sequence itself (e.g. ATCGCGCTATA).

A signal description file is a text file that looks like this:

# Shine-Delgarno
T 7
T 8
G 6
A 5
C 5
A 5
# intervening unimportant bases
* 15-21
# Pribnow box
T 8
A 8
T 6
A 6
AT 5
T 8

Each line of the file is one of three kinds:

1) One or more allowed letters at this position, followed by the penalty
for having a mismatch at that position.

2) The star character *, denoting unimportant characters in the sequence, followed by an interval giving
how many of these unimportant characters are allowed.

3) The hash character #, meaning the line is a comment and should be ignored by the program.


Okay, now to the main thing: the output should list all matches in each FASTA entry, clearly stating the location of each match.

The deviation is an important factor. If the deviation is set to 0, then the search for the signal
is reduced to a regular expression. If the deviation is set to 16 in the above example,
then mismatches with a combined penalty of 16 or less are allowed.
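
One possible reading of the deviation-0 case: every letter line becomes a character class and the * line becomes a bounded wildcard, so the whole signal collapses into a single regular expression. The following is only a minimal sketch under that assumption. The `@signal` structure and the names `build_regex` and `find_exact` are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical parsed form of the signal description:
# letter positions carry the allowed letters, the gap carries a length range.
my @signal = (
    { letters => 'T' }, { letters => 'T' }, { letters => 'G' },
    { letters => 'A' }, { letters => 'C' }, { letters => 'A' },
    { gap => [15, 21] },
    { letters => 'T' }, { letters => 'A' }, { letters => 'T' },
    { letters => 'A' }, { letters => 'AT' }, { letters => 'T' },
);

# Turn the signal into regex source text, e.g. [T][T]...{15,21}...[AT][T]
sub build_regex {
    my @elements = @_;
    my $re = '';
    for my $e (@elements) {
        if ($e->{gap}) {
            my ($min, $max) = @{ $e->{gap} };
            $re .= '.{' . $min . ',' . $max . '}';
        }
        else {
            $re .= '[' . $e->{letters} . ']';
        }
    }
    return $re;
}

# Report every 1-based start position where the signal matches exactly.
# The lookahead consumes nothing, so overlapping matches are also found.
sub find_exact {
    my ($dna, @elements) = @_;
    my $re = build_regex(@elements);
    my @hits;
    while ($dna =~ /(?=($re))/g) {
        push @hits, { start => pos($dna) + 1, match => $1 };
    }
    return @hits;
}

# Made-up test sequence: a perfect signal with a 15-base gap.
my $dna = 'GG' . 'TTGACA' . ('C' x 15) . 'TATAAT' . 'GG';
for my $hit (find_exact($dna, @signal)) {
    print "match at position $hit->{start}: $hit->{match}\n";
}
```

Because the gap quantifier is greedy, this reports one alignment per start position (the one with the longest workable gap). A deviation above 0 cannot be expressed this way and needs explicit penalty counting.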

This is what I have tried so far, but I can't figure out how to use the deviation number and set up the pattern match. I am pretty lost:


#!/usr/bin/perl -w

use strict;
#############
# Step 1 #
#############
# The program is given a FASTA file, a signal description file and a deviation number as input on the command line; print a message if there are errors.
# Errors: make sure that the deviation is a number


sub usage {
    my ($msg) = @_;
    print "$msg\n\n" if defined $msg;
    print "Usage: project.pl <fastafile.fsa> <signaldescriptionfile.txt> <deviation>\n";
    exit;
}
if (scalar @ARGV !=3){
    &usage("Wrong number of arguments");
}

my ($fastafile, $signaldescription, $deviation) = @ARGV ;

if ($deviation =~ m/^\d+$/){ # correct input
    print "Thanks!\n";
}else{
    &usage ("I want a number please!");
}

################
# Step 2 #
################
# Working with the signal description:
# read the file and make sure to put the penalties and characters in two separate arrays;
# lines starting with # should be ignored;
# the * marks an unimportant stretch of 15-21 bases that should be skipped (have not figured that out yet):

open(IN,'<',$signaldescription ) or die "Could not find file\n";
my @character = ();
my @penalty = ();
my $comment ='';
while (defined (my $line = <IN>)) {
    chomp ($line);
    if ($line =~ m/^#/) {
        if ($comment ne ''){
            my ($character, $penalty) = split (' ',$line);
            push @character, $character;
            push @penalty, $penalty;
        }
    }
}


close IN;


############
# Step 3 #
############
# Work with the FASTA file:
# use regular expressions to look at the FASTA file and ignore the header line:


# $fragment: the pattern to search for
# $fraglen: the length of $fragment
# $buffer: a buffer to hold the DNA from the input file
# $position: the position of the buffer in the total DNA

my($fragment, $fraglen, $buffer, $position) = (@karaktere, '', 0);

my ($headline, $line, $dna) = ('', '', '');

open(IN, '>', $fastafilename) or die "Could not read file ($fastafile)\n";

# The first line of a FASTA file is a header and begins with '>'

while (defined ($line = <IN>)) {
    if ($line =~ m/^>/) {
        if ($headline ne '') { # after the sequence has been read I want to look for the match

            # write data to file (the matches):
            chomp $headline;
            print OUT "$headline\n";
            for (my $i = 0; $i < length($reversecomplementdna); $i += 60) {
                print OUT substr($reversecomplementdna, $i, 60), "\n";
            }
            # Get ready for next turn in the loop
            $dna = '';+
        }
        $headline = $line;
    }
    else {
        # Read the DNA
        chomp $line;
        $dna .= $line;
    }
}
#########################

Thanks a lot for your time; I would really appreciate any help.

Thanks,
MojoS

Apr 30th, 2007

Is this school work?
KevinADC

May 1st, 2007

Yes, it's one of the projects I am working on at my uni, and I would appreciate it if someone could help me out with some comments and ideas on how to approach it, because I am pretty lost.

Thanks
MojoS

May 1st, 2007

First thing you have to do is fix the errors:

Global symbol "@karaktere" requires explicit package name at script line 68.
Global symbol "$fastafilename" requires explicit package name at script line 72.
Global symbol "$reversecomplementdna" requires explicit package name at script line 83.
Global symbol "$reversecomplementdna" requires explicit package name at script line 84.
syntax error at script line 88, near "}"

How come you are not getting help from a teacher or fellow student?
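
For reference, the "Global symbol" errors come from names that are never declared under `use strict` (`@karaktere`, `$fastafilename`, `$reversecomplementdna`), and the syntax error comes from the stray `+` after `$dna = '';`. Two further things look suspicious in the posted code: the FASTA file is opened with '>' (write mode) rather than '<', and the loop only handles a record when the next header arrives, so the last record is never processed. A minimal sketch of a FASTA reader that avoids these problems might look like this. The name `read_fasta` and the sample data are illustrative assumptions, not the poster's code.

```perl
use strict;
use warnings;

# Read FASTA records from an open filehandle.
# Returns a list of [header, sequence] pairs; sequence lines are joined.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    my ($header, $dna) = (undef, '');
    while (defined(my $line = <$fh>)) {
        chomp $line;
        next if $line eq '';
        if ($line =~ /^>/) {
            # A new header closes the previous record, if there was one.
            push @records, [$header, $dna] if defined $header;
            ($header, $dna) = ($line, '');
        }
        else {
            $dna .= $line;
        }
    }
    # The last record has no following header, so store it after the loop.
    push @records, [$header, $dna] if defined $header;
    return @records;
}

# Demonstration on an in-memory file with made-up data. With a real file
# one would use open(my $fh, '<', $fastafile), with '<' for reading.
my $text = ">seq1 example\nACGT\nTTGA\n>seq2 example\nGGCC\n";
open(my $fh, '<', \$text) or die "Could not open in-memory file: $!\n";
my @records = read_fasta($fh);
close $fh;

for my $r (@records) {
    my ($header, $dna) = @$r;
    print "$header: ", length($dna), " bases\n";
}
```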
KevinADC

May 1st, 2007

Thanks for looking through it. Unfortunately my professor is not always available, and it would take some time before I could continue without any advice (because I'm stuck). As I'm doing this project on my own, I don't have any fellow students to ask or exchange views with.
But thanks anyway.
MojoS

May 1st, 2007

OK, well, fix those errors; otherwise your code will not even compile.
KevinADC

May 3rd, 2007

Okay, I have fixed it, thanks!
MojoS

May 3rd, 2007

Okay, but can you advise me on how to do the pattern match if I want to ignore some characters at a specific position?

Signal description:
# Shine-Delgarno
T 7
T 8
G 6
A 5
C 5
A 5
* 15-21 # intervening unimportant bases
# Pribnow box
T 8
A 8
T 6
A 6
AT 5
T 8


open(IN,'<',$signaldescription ) or die "Could not find file\n";
my @character = ();
my @penalty = ();
my $comment ='';
while (defined (my $line = <IN>)) {
    chomp ($line);
    if ($line =~ m/^#/) {
        if ($comment ne ''){
            my ($character, $penalty) = split (' ',$line);
            push @character, $character;
            push @penalty, $penalty;
        }
    }
}


close IN;


Here I have tried to save the characters in one array and the penalties (the numbers) in another array, but I can't figure out how to write my Perl script so that it skips the given number of positions when it meets the character *.

Thanks.
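
One way to treat the * line, sketched here as an illustration and not as the intended solution of the assignment: split the signal into the block before the gap and the block after it, then try every start position and every allowed gap length, adding up the penalties of the mismatching positions. The sketch assumes exactly one gap, as in the example file. The names `block_penalty` and `find_signal`, the `[letters, penalty]` layout and the test sequence are all made up.

```perl
use strict;
use warnings;

# Penalty of aligning one block of signal positions to $dna at offset $at.
# Each position is [allowed_letters, penalty]; a mismatch adds its penalty.
sub block_penalty {
    my ($dna, $at, @block) = @_;
    my $total = 0;
    for my $i (0 .. $#block) {
        my ($letters, $penalty) = @{ $block[$i] };
        my $base = substr($dna, $at + $i, 1);
        $total += $penalty if index($letters, $base) < 0;
    }
    return $total;
}

# Try every start and every gap length; keep alignments whose combined
# penalty is at most $deviation. Positions are reported 1-based.
sub find_signal {
    my ($dna, $deviation, $left, $gap, $right) = @_;
    my ($min, $max) = @$gap;
    my @hits;
    for my $start (0 .. length($dna) - @$left - $min - @$right) {
        my $p_left = block_penalty($dna, $start, @$left);
        next if $p_left > $deviation;
        for my $g ($min .. $max) {
            my $r_at = $start + @$left + $g;
            last if $r_at + @$right > length($dna);
            my $p = $p_left + block_penalty($dna, $r_at, @$right);
            push @hits, { start => $start + 1, gap => $g, penalty => $p }
                if $p <= $deviation;
        }
    }
    return @hits;
}

# The signal from the post, split at the * line.
my @left  = (['T', 7], ['T', 8], ['G', 6], ['A', 5], ['C', 5], ['A', 5]);
my @gap   = (15, 21);
my @right = (['T', 8], ['A', 8], ['T', 6], ['A', 6], ['AT', 5], ['T', 8]);

# Made-up sequence with one mismatch in each block (penalties 5 and 6).
my $dna = 'GG' . 'TTGACT' . ('C' x 17) . 'TATGAT' . 'GG';
for my $hit (find_signal($dna, 16, \@left, \@gap, \@right)) {
    print "match at $hit->{start}, gap $hit->{gap}, penalty $hit->{penalty}\n";
}
```

With a deviation of 0 this reports only exact matches, so the same routine can cover both cases at the cost of being slower than a regular expression.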
MojoS

May 3rd, 2007

If I have the time I might try and help. Your questions go well beyond a request for general help and will take some time to assist you with.
KevinADC

May 4th, 2007

What is $comment being used for?

my $comment ='';

Further on you have:

if ($comment ne ''){

But $comment is blank (''), so that condition is always false and the statements inside the block are never executed.
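
Note also that the outer test, `if ($line =~ m/^#/)`, selects only the comment lines, which is the opposite of what the description asks for. A minimal sketch of a parsing loop that skips comments and treats the * line separately could look like the following. The name `parse_signal` and the hash layout are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Parse signal description lines into a list of elements.
# Letter lines become { letters => 'AT', penalty => 5 };
# the star line becomes { gap => [15, 21] }.
sub parse_signal {
    my ($fh) = @_;
    my @signal;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        $line =~ s/#.*//;          # drop comments, whole-line or trailing
        next if $line !~ /\S/;     # skip lines that are now empty
        my ($what, $value) = split ' ', $line;
        if ($what eq '*') {
            my ($min, $max) = $value =~ /^(\d+)-(\d+)$/
                or die "Bad gap range: $line\n";
            push @signal, { gap => [$min, $max] };
        }
        else {
            push @signal, { letters => $what, penalty => $value };
        }
    }
    return @signal;
}

# Demonstration on an in-memory file holding a shortened description.
my $text = "# Shine-Delgarno\nT 7\nT 8\n"
         . "* 15-21 # intervening unimportant bases\nAT 5\n";
open(my $fh, '<', \$text) or die "Could not open in-memory file: $!\n";
my @signal = parse_signal($fh);
close $fh;

for my $e (@signal) {
    if ($e->{gap}) {
        print "gap of $e->{gap}[0] to $e->{gap}[1] bases\n";
    }
    else {
        print "letters $e->{letters}, penalty $e->{penalty}\n";
    }
}
```

Keeping letters and penalty together in one element, and not in two parallel arrays, makes it harder for the two to drift out of step when the * line is skipped.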
KevinADC