kindly help

#1, rayken1, Jun 30th, 2009
FT CDS complement(join ( 14006...14068, 19351..20068))
FT /locus_tag="TP01_0004"
FT /note="go function: nutrient reservoir activity [goid
FT 0045889]



The above text is being read as a string, and I would like a regex to match it. I tried the following:

/^FT \s CDS \s \ / complement[0-9]/ # search line 1
/^FT \s .* \ / locus _tag = (.*)/ # search line 2
/^FT ‘s / \t///#/ ‘ .* \ / note (.*) / #search line 3


It seems not to be working well. Kindly help.

#2, Prakash_8111, Jun 30th, 2009
Hi,
Can you clearly explain what you want to do?

#3, rayken1, Jul 2nd, 2009
Hi Prakash,
What I want to do is:

a) The file with the above info is opened and read as strings.
b) I need a regex that can capture this block of text,

FT CDS complement(join ( 14006...14068, 19351..20068))
FT /locus_tag="TP01_0004"
FT /note="go function: nutrient reservoir activity [goid
FT 0045889]


and copy the last two statements to a new file.
Thanks.

#4, onaclov2000, Jul 6th, 2009
First, I would really like to see the output (if any).

Second, for the line
FT CDS complement(join ( 14006...14068, 19351..20068))
I would recommend:
/^FT \s+ CDS \s+ complement\(join\([0-9]+\)\)$/

Basically, we want to escape each of the ( and ) characters. Adding + to \s says "match one or more whitespace characters" (in case you have more than one space), and adding + to the character class [0-9] says "match at least one digit, possibly more". I'm not sure whether the entire line should contain only that, but if so, add a $ at the end to anchor the match to the end of the string/line.
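
To make that concrete, here is a minimal sketch (not part of the original post) that tries the pattern on one CDS line copied from the file structure posted later in this thread, assuming single spaces as shown there. Two things seem worth checking: without the /x modifier, spaces written inside a Perl pattern are matched literally, and [0-9]+ accepts digits only, so the dots and commas in the ranges are not covered. The names $as_posted and $variant, and the variant pattern itself, are assumptions made for illustration.

```perl
use strict;
use warnings;

# Sample feature line, copied from the file structure posted in this thread.
my $line = 'FT CDS complement(join(18028..18116,19351..20668))';

# Pattern as suggested above. Without /x the spaces inside the pattern
# are literal, and [0-9]+ accepts digits only.
my $as_posted = qr/^FT \s+ CDS \s+ complement\(join\([0-9]+\)\)$/;

# Hypothetical variant: no literal spaces, and a character class that
# also allows the dots and commas found in ranges such as 18028..18116.
my $variant = qr/^FT\s+CDS\s+complement\(join\([0-9.,]+\)\)$/;

my $posted_ok  = $line =~ $as_posted ? 1 : 0;
my $variant_ok = $line =~ $variant   ? 1 : 0;

print "as posted: ", ($posted_ok  ? "match" : "no match"), "\n";
print "variant:   ", ($variant_ok ? "match" : "no match"), "\n";
```

If the ranges can also contain < or > (as in the gene and mRNA lines of the file), the character class would need those added as well.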

If that gets you going in the right direction, let me know if you still can't get the other two.

Hope that helped.

#5, rayken1, Jul 13th, 2009
Hi Prakash and Onaclov2000,

Thanks for your answers. I am still having trouble solving my problem. Below are the structure of the file I am dealing with and the code I have come up with so far, using Onaclov2000's regex, which captures only the first line in the file.



The structure of the text file is as follows:

FT CDS complement(7216..17805)
FT /locus_tag="TP01_0003"
FT /codon_start=1
FT /protein_id="XP_765530.1"
FT /db_xref="GI:71031777"
FT /db_xref="GeneID:3502673"
FT gene complement(<7216..>17805)
FT /locus_tag="TP01_0003"
FT /db_xref="GeneID:3502673"
FT mRNA complement(<7216..>17805)
FT /locus_tag="TP01_0003"
FT /product="hypothetical telomeric SfiI fragment 20 protein
FT 3"
FT /transcript_id="XM_760437.1"
FT /db_xref="GI:71031776"
FT /db_xref="GeneID:3502673"
FT CDS complement(join(18028..18116,19351..20668))
FT /locus_tag="TP01_0004"
FT /note="go_function: nutrient reservoir activity [goid
FT 0045735]"

FT /codon_start=1
FT /protein_id="XP_765531.1"
FT /db_xref="GI:71031779"
FT /db_xref="GeneID:3503550"
FT gene complement(<18028..>20668)
FT /locus_tag="TP01_0004"
FT /db_xref="GeneID:3503550"
FT mRNA complement(join(<18028..18116,19351..>20668))
FT /locus_tag="TP01_0004"
FT /product="hypothetical protein"
FT /transcript_id="XM_760438.1"
FT /db_xref="GI:71031778"
FT /db_xref="GeneID:3503550"
FT CDS complement(20951..21967)
FT /locus_tag="TP01_0005"
FT /codon_start=1
FT /protein_id="XP_765532.1"
FT /db_xref="GI:71031781"
FT /db_xref="GeneID:3503551"
FT gene complement(<20951..>21967)
FT /locus_tag="TP01_0005"
FT /db_xref="GeneID:3503551"
FT mRNA complement(<20951..>21967)
FT /locus_tag="TP01_0005"
FT /product="hypothetical protein"
FT /transcript_id="XM_760439.1"
FT /db_xref="GI:71031780"
FT /db_xref="GeneID:3503551"


This is my code

#!/usr/bin/perl
$file = 'Muguga.embl ';

open (F, $file) || die ("Could not open $file!");

while ($line = <F>)
{
($field1,$field2,$field3,$field4) = split( "\t" , $line);

print "$field1 $field2 $field3 $field4 \n";
my $string = (FT CDS complement(join(18028..18116,19351..20668))); # string to be searched

if ($string = ~ m/^FT \s+ CDS \s+ complement\(join\([0-9]+\)\)$/)
#search for the first line highlighted in bold
{
print 'match'
} else{
print 'no match';
}
}
close (F);


My wish is to be able to search for the lines that are in bold and print them out.

I would be grateful if you could help with my code, though I am still a Perl newbie.

Thanks.

#6, onaclov2000, Jul 13th, 2009
First things first: looking at your code, you're only searching the string you provide,
my $string = (FT CDS complement(join(18028..18116,19351..20668))); # string to be searched

so obviously that is the only one you will see printed. Try it with the input data instead.

Second, from what I understand of your code, you're splitting each line up, so you won't find the data you want unless you either look at the whole line or modify the regex to look only at the last field (for example).
So I would recommend changing it to this (for the relevant part):
while ($line = <F>)
{
    print $line;

    if ($line =~ m/^FT \s+ CDS \s+ complement\(join\([0-9]+\)\)$/)
    # search for the first line highlighted in bold
    {
        print 'match';
    } else {
        print 'no match';
    }

    # insert additional regexes here

}
close (F);
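
For anyone who wants to try the whole-line approach without the data file, here is a self-contained sketch. It is only an illustration: the in-memory filehandle stands in for Muguga.embl, the sample lines are copied from the file structure posted in this thread, and the simplified pattern is an assumption rather than the one discussed above (which, as far as I can tell, still will not match these lines because of the digits-only class). Note that the binding operator is written =~ with no space; written as "= ~", Perl assigns to the variable instead, so the if no longer reflects whether the pattern matched.

```perl
use strict;
use warnings;

# Stand-in for Muguga.embl: a few lines copied from the posted file
# structure, read through an in-memory filehandle.
my $data = <<'END';
FT CDS complement(7216..17805)
FT /locus_tag="TP01_0003"
FT CDS complement(join(18028..18116,19351..20668))
FT /locus_tag="TP01_0004"
END

open(my $fh, '<', \$data) or die "Could not open in-memory data: $!";

my @matched;   # whole lines that matched the pattern
while (my $line = <$fh>) {
    chomp $line;
    # Test the whole line, not a field produced by split.
    # Hypothetical pattern: only checks for a CDS line that uses join(...).
    if ($line =~ /^FT\s+CDS\s+complement\(join\(/) {
        push @matched, $line;
        print "match: $line\n";
    }
    else {
        print "no match: $line\n";
    }
}
close($fh);
```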



Third, once you've done that, try writing a regex for the second line you're trying to get. If you have problems, post it and I'll gladly take a look at your regex and make some suggestions.
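
As a starting point for the second and third lines, something along these lines might work. This is a rough sketch under assumptions: the patterns and the variable names $locus and $note_start are made up for illustration, and the sample lines are taken from the file structure posted in this thread. The m{...} delimiters are used so the / in /locus_tag and /note does not need escaping.

```perl
use strict;
use warnings;

# Qualifier lines copied from the posted file structure.
my @lines = (
    'FT /locus_tag="TP01_0004"',
    'FT /note="go_function: nutrient reservoir activity [goid',
    'FT 0045735]"',
);

my ($locus, $note_start);
for my $line (@lines) {
    # /locus_tag="...": capture the text between the double quotes.
    if ($line =~ m{^FT\s+/locus_tag="([^"]*)"}) {
        $locus = $1;
    }
    # /note="...: the value may continue on the next line, so only the
    # opening quote is required here.
    elsif ($line =~ m{^FT\s+/note="(.*)}) {
        $note_start = $1;
    }
}

print "locus_tag: $locus\n";
print "note starts with: $note_start\n";
```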

Fourth, I would recommend putting a counter in, so that when you print "match" you print the line number as well (it makes troubleshooting easier).
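
A counter could look roughly like this. It is a sketch only: the names $line_no and @hits and the simplified pattern are assumptions, and the sample lines come from the file structure posted in this thread.

```perl
use strict;
use warnings;

# A few lines copied from the posted file structure.
my @lines = (
    'FT gene complement(<18028..>20668)',
    'FT CDS complement(join(18028..18116,19351..20668))',
    'FT /locus_tag="TP01_0004"',
);

my $line_no = 0;   # counter: number of the line being examined
my @hits;          # line numbers where the pattern matched

for my $line (@lines) {
    $line_no++;
    if ($line =~ /^FT\s+CDS\s+complement\(join\(/) {
        push @hits, $line_no;
        print "match at line $line_no: $line\n";
    }
}
```

When reading from a filehandle, Perl's built-in $. variable holds the current input line number, so a separate counter is optional there.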

Fifth (wow, I talk a lot, sorry): you can turn that one regex into something that will match each line you're looking for, but IMHO it'll make for a messy-looking regex. If you put the right things in parentheses, the regex returns them in variables, so you can extract them. But again, I'm for cleaner, easier-to-follow code, so I would give each particular line its own regex, unless the lines are really similar, and these really don't look like it.
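
For comparison, a single combined pattern might look something like the sketch below. It uses the /x modifier so the pattern can be spread over several lines with comments. Everything here is an assumption for illustration (the pattern, the name $combined, and the %found keys), and the sample lines are taken from the file structure posted in this thread.

```perl
use strict;
use warnings;

# One pattern for all three kinds of line. With /x, whitespace and
# comments inside the pattern are ignored.
my $combined = qr{
    ^FT\s+
    (?:
        CDS\s+complement\(join\(([0-9.,]+)\)\)   # capture 1: joined ranges
      | /locus_tag="([^"]*)"                       # capture 2: locus tag
      | /note="(.*)                                # capture 3: start of note
    )
}x;

my %found;
for my $line (
    'FT CDS complement(join(18028..18116,19351..20668))',
    'FT /locus_tag="TP01_0004"',
    'FT /note="go_function: nutrient reservoir activity [goid',
) {
    next unless $line =~ $combined;
    # Only the branch that matched sets its capture variable.
    $found{ranges} = $1 if defined $1;
    $found{locus}  = $2 if defined $2;
    $found{note}   = $3 if defined $3;
}

print "$_ => $found{$_}\n" for sort keys %found;
```

Even with /x this is harder to read than three separate patterns, which is the trade-off described above.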

Onaclov.

#7, wickedxter, Jul 13th, 2009
$string =~ /FT\s*CDS\s*\w+\(join\(.+\)\)/;

Try this; it isn't tested.
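
Here is one way the pattern could be used for the original goal of copying the /locus_tag and /note lines that follow a CDS join(...) line. This is a sketch under assumptions, not a tested solution for the real file: the in-memory filehandle stands in for Muguga.embl, the state variables $in_block and $in_note and the qualifier patterns are made up for illustration, and the lines are collected in @keep instead of being written to a new file.

```perl
use strict;
use warnings;

# Stand-in for the data file: lines copied from the posted file structure.
my $data = <<'END';
FT CDS complement(7216..17805)
FT /locus_tag="TP01_0003"
FT /codon_start=1
FT CDS complement(join(18028..18116,19351..20668))
FT /locus_tag="TP01_0004"
FT /note="go_function: nutrient reservoir activity [goid
FT 0045735]"
FT /codon_start=1
END

open(my $in, '<', \$data) or die "Could not open in-memory data: $!";

my @keep;          # lines that would be written to the new file
my $in_block = 0;  # true while inside a CDS join(...) feature
my $in_note  = 0;  # true while a /note value continues on following lines

while (my $line = <$in>) {
    chomp $line;
    if ($line =~ /FT\s*CDS\s*\w+\(join\(.+\)\)/) {   # pattern from this post
        ($in_block, $in_note) = (1, 0);
    }
    elsif ($in_block && $line =~ m{^FT\s+/(locus_tag|note)=}) {
        push @keep, $line;
        # A note is unfinished when its closing quote is not on this line.
        $in_note = ($1 eq 'note' && $line !~ /"$/) ? 1 : 0;
    }
    elsif ($in_note) {
        push @keep, $line;
        $in_note = 0 if $line =~ /"$/;
    }
    elsif ($line =~ m{^FT\s+/}) {
        # Some other qualifier: still inside the feature, nothing to keep.
    }
    else {
        $in_block = 0;   # a new feature key ends the block
    }
}
close($in);

print "$_\n" for @keep;
```

To write to a new file instead, the final print could go to a filehandle opened for writing.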

#8, rayken1, Jul 20th, 2009
Thanks a lot, Wickedxter. The code worked perfectly.
God bless you.