have some problems with pattern match hope you can help!!

#1 MojoS, Apr 30th, 2007
Hi there,

I am working on a Perl script and hope someone can help me. Here is a quick description of my project:

My program is given a FASTA file, a signal description file and a deviation (a number) as input on the command line.
A FASTA file looks like this:
>U00659.CDS.1 product:"insulin GGCCC
CCGCAGAAGCGTGGCATCGTGGAGCAGTGCTGCGCCGGCGTCTGCTCTCTCTACCAGCTG
AAAGACCAGACGGAGATGATGGTAAAGAGAGGTATTGTAGA
>X13559.CDS.1 product:"preproinsulin " DNA org:"Oncorhynchus keta" (CDS extraction)
ATGGCCTTCTGGCTCCAAGCTGCATCTCTGCTGGTGTTGCTGGCGCTCTCCCCCGGGGTA
GATGCTGCAGCTGCCCAGCACCTGTGTGGCTCTCACCTGGTGGACGCCCTCTATCTGGTG
TGTGGAGAGAAAGGATT
>J02989.CDS.1 note:"preproinsulin " DNA org:"Aotus trivirgatus" (CDS extraction)
ATGGCCCTGTGGATGCACCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCCGAG
CCAGCCCCGGCCTTTGTGAACCAGCACCTGTGCGGCCCCCACCTGGTGGAAGCCCTCTAC
CTGGTGTGCGGGGAGCGAGGTTTC
Each entry in a FASTA file starts with a header line that begins with '>'; these lines should be ignored.
The main thing is the sequence (e.g. ATCGCGCTATA), which is what I want to match against.

A signal description file is a text file that looks like this:

# Shine-Delgarno
T 7
T 8
G 6
A 5
C 5
A 5
# intervening unimportant bases
* 15-21
# Pribnow box
T 8
A 8
T 6
A 6
AT 5
T 8

Each line of the file is one of three kinds:

1) One or more allowed letters at this position, followed by a penalty for having a mismatch at that position. For example, "AT 5" means that A or T is accepted and any other base costs 5.

2) The star character *, denoting unimportant characters in the sequence, followed by an interval giving how many of these unimportant characters are allowed.

3) The hash character #, meaning this line is a comment and should be ignored by the program.
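
As an illustration of the three line kinds above, here is a minimal parsing sketch. The sub name parse_signal and the hash layout (type, letters, penalty, min, max) are assumptions made for this example, not part of the assignment; it also assumes a comment may follow data on the same line, and it reads from an in-memory string so that it runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse signal-description text into a list of elements (hypothetical layout):
#   { type => 'base', letters => 'AT', penalty => 5 }
#   { type => 'gap',  min => 15, max => 21 }
sub parse_signal {
    my ($fh) = @_;
    my @elements;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        $line =~ s/#.*//;              # drop comments, whole-line or trailing
        next unless $line =~ /\S/;     # skip lines that are now empty
        my ($what, $value) = split ' ', $line;
        if ($what eq '*') {
            # Assumes "min-max"; a single number is treated as a fixed length.
            my ($min, $max) = $value =~ /^(\d+)(?:-(\d+))?$/
                or die "Bad interval '$value'\n";
            $max = $min unless defined $max;
            push @elements, { type => 'gap', min => $min, max => $max };
        }
        else {
            push @elements,
                { type => 'base', letters => uc $what, penalty => $value };
        }
    }
    return @elements;
}

# Demo on an in-memory copy of a few lines in the format described above.
my $text = "# Shine-Delgarno\nT 7\nAT 5\n* 15-21 # gap\nT 8\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!\n";
my @signal = parse_signal($fh);
close $fh;

for my $e (@signal) {
    my $desc = $e->{type} eq 'gap'
        ? "gap $e->{min}-$e->{max}"
        : "base [$e->{letters}] penalty $e->{penalty}";
    print "$desc\n";
}
```

Keeping the elements in one ordered list, rather than in separate arrays, means the position of the * line relative to the bases is not lost.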


Okay, now to the main thing: the output should list all matches in each FASTA entry, clearly stating the location of each match.

The deviation is an important factor. If the deviation is set to 0, then the search for the signal is reduced to a regular expression. If the deviation is set to 16 in the above example, then mismatches with a combined penalty of 16 or less are allowed.
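
One possible way to apply the deviation is to walk the signal element by element, add up the penalty of every mismatching position, and give up as soon as the sum exceeds the deviation. The sketch below is only an illustration under assumptions: the element list is a shortened, made-up signal (not the one from the assignment), the names match_from and find_matches are hypothetical, and only the cheapest match per start position is reported.

```perl
use strict;
use warnings;

# Hypothetical element list, in the shape a parser of the signal file might produce.
my @signal = (
    { type => 'base', letters => 'T',  penalty => 7 },
    { type => 'base', letters => 'A',  penalty => 5 },
    { type => 'gap',  min => 1, max => 2 },
    { type => 'base', letters => 'AT', penalty => 5 },
    { type => 'base', letters => 'G',  penalty => 8 },
);

# Try to match elements $i.. of @signal at offset $pos of $dna with $budget
# penalty points left. Returns (end offset, penalty used) for the cheapest
# match, or an empty list if nothing fits within the budget.
sub match_from {
    my ($dna, $pos, $i, $budget) = @_;
    return ($pos, 0) if $i == @signal;           # all elements consumed
    my $e = $signal[$i];
    if ($e->{type} eq 'gap') {
        my @best;
        for my $len ($e->{min} .. $e->{max}) {   # try every allowed gap length
            last if $pos + $len > length($dna);
            my ($end, $cost) = match_from($dna, $pos + $len, $i + 1, $budget)
                or next;
            @best = ($end, $cost) if !@best || $cost < $best[1];
        }
        return @best;
    }
    return if $pos >= length($dna);
    my $base = uc substr($dna, $pos, 1);
    my $cost = index($e->{letters}, $base) >= 0 ? 0 : $e->{penalty};
    return if $cost > $budget;                   # too expensive, stop here
    my ($end, $rest) = match_from($dna, $pos + 1, $i + 1, $budget - $cost)
        or return;
    return ($end, $cost + $rest);
}

# Scan every start position; positions in the result are 1-based.
sub find_matches {
    my ($dna, $deviation) = @_;
    my @hits;
    for my $start (0 .. length($dna) - 1) {
        my ($end, $cost) = match_from($dna, $start, 0, $deviation) or next;
        push @hits, { start => $start + 1, end => $end, penalty => $cost };
    }
    return @hits;
}

my $dna = 'CCTACCAGTTCGGAG';
for my $deviation (0, 5) {
    my @hits = find_matches($dna, $deviation);
    print "deviation $deviation: ", scalar(@hits), " match(es)\n";
    print "  $_->{start}..$_->{end} penalty $_->{penalty}\n" for @hits;
}
```

With a deviation of 0 this behaves like an exact pattern match; raising the deviation lets positions with small penalties mismatch first.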

I have tried this so far, but I can't figure out how to use the deviation number and set up the pattern match. I am pretty lost:


#!/usr/bin/perl -w

use strict;
#############
# Step 1 #
#############
# The program is given a FASTA file, a signal description file and a deviation number as input on the command line; report an error if something is wrong:
# Errors: make sure that the deviation is a number


sub usage {
    my ($msg) = @_;
    print "$msg\n\n" if defined $msg;
    print "Usage: project.pl <fastafile.fsa> <signaldescriptionfile.txt> <deviation>\n";
    exit;
}
if (scalar @ARGV != 3) {
    &usage("Wrong number of arguments");
}

my ($fastafile, $signaldescription, $deviation) = @ARGV;

if ($deviation =~ m/^\d+$/) { # correct input
    print "Thanks!\n";
} else {
    &usage("I want a number please!");
}

################
# Step 2 #
################
# working with signal description:
# read the file and make sure to put the characters and the penalties in two separate arrays,
# lines starting with # should be ignored
# the * marks an unimportant stretch of 15-21 bases that should be ignored (have not figured that out yet):

open(IN,'<',$signaldescription ) or die "Could not find file\n";
my @character = ();
my @penalty = ();
my $comment ='';
while (defined (my $line = <IN>)) {
    chomp ($line);
    if ($line =~ m/^#/) {
        if ($comment ne ''){
            my ($character, $penalty) = split (' ',$line);
            push @character, $character;
            push @penalty, $penalty;
        }
    }
}


close IN;


############
# Step 3 #
############
# Work with the FASTA file:
# Use regular expressions to look at the FASTA file and ignore the header lines:


# $fragment: the pattern to search for
# $fraglen: the length of $fragment
# $buffer: a buffer to hold the DNA from the input file
# $position: the position of the buffer in the total DNA

my($fragment, $fraglen, $buffer, $position) = (@karaktere, '', 0);

my ($headline, $line, $dna) = ('', '', '');

open(IN, '>', $fastafilename) or die "Could not read file ($fastafile)\n";

# The first line of a FASTA file is a header and begins with '>'

while (defined ($line = <IN>)) {
    if ($line =~ m/^>/) {
        if ($headline ne '') { # after the sequence has been read I want to look for the match

            # write data to file (the matches):
            chomp $headline;
            print OUT "$headline\n";
            for (my $i = 0; $i < length($reversecomplementdna); $i += 60) {
                print OUT substr($reversecomplementdna, $i, 60), "\n";
            }
            # Get ready for next turn in the loop
            $dna = '';+
        }
        $headline = $line;
    }
    else {
        # Read the DNA
        chomp $line;
        $dna .= $line;
    }
}
#########################

Thanks a lot for your time; I would really appreciate it if you can help me.

Thanks,
MojoS

#2 KevinADC, Apr 30th, 2007
Is this school work?

#3 MojoS, May 1st, 2007
Yes, it's one of the projects I am working on at my uni, and I would appreciate it if someone could help me out with some comments and ideas on how to work it out, because I am pretty lost...

Thanks

#4 KevinADC, May 1st, 2007
First thing you have to do is fix the errors:

Global symbol "@karaktere" requires explicit package name at script line 68.
Global symbol "$fastafilename" requires explicit package name at script line 72.
Global symbol "$reversecomplementdna" requires explicit package name at script line 83.
Global symbol "$reversecomplementdna" requires explicit package name at script line 84.
syntax error at script line 88, near "}"

How come you are not getting help from a teacher or fellow student?
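
The "Global symbol ... requires explicit package name" messages come from use strict: the names in the messages were never declared with my, while the script does declare $fastafile and $dna. Note also that mode '>' in open opens a file for writing, not reading. A minimal FASTA-reading sketch with every variable declared could look like the following; the sub name read_fasta, the returned [header, sequence] pairs and the in-memory demo data are assumptions made for this example.

```perl
use strict;
use warnings;

# Read FASTA records from a filehandle.
# Returns a list of [header, sequence] pairs (hypothetical layout).
sub read_fasta {
    my ($fh) = @_;
    my @records;
    my ($headline, $dna) = ('', '');
    while (defined(my $line = <$fh>)) {
        chomp $line;
        if ($line =~ /^>/) {
            # A new header: store the record collected so far, if any.
            push @records, [$headline, $dna] if $headline ne '';
            ($headline, $dna) = ($line, '');
        }
        else {
            $dna .= $line;     # sequence lines are joined into one string
        }
    }
    push @records, [$headline, $dna] if $headline ne '';   # the last record
    return @records;
}

# Demo data in memory; a real script would open the file with mode '<'.
my $fasta = ">seq1 demo\nATGGCC\nTTCTGG\n>seq2 demo\nCCGCAG\n";
open my $fh, '<', \$fasta or die "Could not read FASTA data: $!\n";
my @records = read_fasta($fh);
close $fh;

printf "%s: %d bases\n", $_->[0], length($_->[1]) for @records;
```

Storing the last record after the loop matters, because no further header line follows it to trigger the push inside the loop.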

#5 MojoS, May 1st, 2007
Thanks for looking through it. Unfortunately my professor is not always available, and it would take some time before I could go on with this without any advice (because I'm stuck). As I'm doing this project on my own, I don't have any fellow students to ask or exchange views with.
But thanks anyway.

#6 KevinADC, May 1st, 2007
OK, well, fix those errors otherwise your code will not even compile.

#7 MojoS, May 3rd, 2007
Okay, I have fixed it, thanks!

#8 MojoS, May 3rd, 2007
Okay, but can you advise me how to work with pattern matching if I want to ignore some characters at a specific position?

Signal description:
# Shine-Delgarno
T 7
T 8
G 6
A 5
C 5
A 5
* 15-21 # intervening unimportant bases
# Pribnow box
T 8
A 8
T 6
A 6
AT 5
T 8


open(IN,'<',$signaldescription ) or die "Could not find file\n";
my @character = ();
my @penalty = ();
my $comment ='';
while (defined (my $line = <IN>)) {
    chomp ($line);
    if ($line =~ m/^#/) {
        if ($comment ne ''){
            my ($character, $penalty) = split (' ',$line);
            push @character, $character;
            push @penalty, $penalty;
        }
    }
}


close IN;


Here I have tried to save the characters in one array and the penalties (the numbers) in another array, but I can't figure out how to write my Perl script so that it skips a certain stretch of positions when it meets the character *...

Thanks.
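
For the deviation-0 case, one way to skip the unimportant stretch is to translate the signal into a regular expression: each letter line becomes a character class, and the * line becomes a bounded "any character" quantifier such as .{15,21}. The sketch below assumes the signal has already been read into a list of [letters, value] pairs (a hypothetical layout) and uses a made-up test sequence.

```perl
use strict;
use warnings;

# Hypothetical parsed form of the signal file: [letters, penalty] or ['*', 'min-max'].
my @signal = (
    ['T', 7], ['T', 8], ['G', 6], ['A', 5], ['C', 5], ['A', 5],
    ['*', '15-21'],
    ['T', 8], ['A', 8], ['T', 6], ['A', 6], ['AT', 5], ['T', 8],
);

my $pattern = '';
for my $item (@signal) {
    my ($what, $value) = @$item;
    if ($what eq '*') {
        my ($min, $max) = split /-/, $value;
        $pattern .= sprintf '.{%d,%d}', $min, $max;   # any bases, min to max of them
    }
    else {
        $pattern .= '[' . $what . ']';                # one of the allowed letters
    }
}
my $regex = qr/$pattern/i;

# Made-up sequence: both boxes separated by 17 unimportant bases.
my $dna = 'GG' . 'TTGACA' . ('C' x 17) . 'TATAAT' . 'GG';

my @hits;
while ($dna =~ /$regex/g) {
    push @hits, [ $-[0] + 1, $+[0] ];   # 1-based start and end of the match
    pos($dna) = $-[0] + 1;              # resume one base later to allow overlaps
}

print "pattern: $pattern\n";
print "match at $_->[0]..$_->[1]\n" for @hits;
```

The penalties are not used here; they only matter once a deviation above 0 allows mismatches, which a plain regular expression cannot count.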

#9 KevinADC, May 3rd, 2007
If I have the time I might try to help. Your question, or questions, go well beyond a request for general help and will take some time to assist you with.

#10 KevinADC, May 4th, 2007
What is $comment being used for?

my $comment ='';

Further on you have:

if ($comment ne ''){

But $comment is blank (''), so that condition is always false and the statements that follow are never executed.
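
A possible fix, assuming the intent was to skip comment lines and store everything else, is to drop $comment and skip the lines that start with #. The sketch keeps the two arrays from the posted code and reads from an in-memory string (an assumption, so that it runs on its own).

```perl
use strict;
use warnings;

# Demo data in memory; a real script would open the signal description file.
my $text = "# Pribnow box\nT 8\nA 8\nAT 5\n";
open my $in, '<', \$text or die "Could not open signal description: $!\n";

my (@character, @penalty);
while (defined(my $line = <$in>)) {
    chomp $line;
    next if $line =~ m/^#/;         # comment line: skip it
    next unless $line =~ m/\S/;     # blank line: skip it
    my ($character, $penalty) = split ' ', $line;
    push @character, $character;
    push @penalty,   $penalty;
}
close $in;

print "$character[$_] => $penalty[$_]\n" for 0 .. $#character;
```

In the posted loop the split only ran for lines that do start with #, so even with the $comment test removed the arrays would have collected the comment lines rather than the data.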

©2003 - 2009 DaniWeb® LLC