Jun 30th, 2009

kindly help

FT CDS complement(join ( 14006...14068, 19351..20068))
FT /locus_tag="TP01_0004"
FT /note="go function: nutrient reservoir activity [goid
FT 0045889]"



The block above is being read as a string, and I would like a regex for it. This is what I tried:

/^FT \s CDS \s \ / complement[0-9]/ # search line 1
/^FT \s .* \ / locus _tag = (.*)/ # search line 2
/^FT ‘s / \t///#/ ‘ .* \ / note (.*) / #search line 3


It seems not to be working well. Kindly help.
rayken1
Jun 30th, 2009

Re: kindly help

Hi
Can you explain clearly what you want to do?
Prakash_8111
Jul 2nd, 2009

Re: kindly help

Hi Prakash,
What I want to do is:

a) The file with the above information is opened and read as strings.
b) I need a regex that can capture this block of lines,

FT CDS complement(join ( 14006...14068, 19351..20068))
FT /locus_tag="TP01_0004"
FT /note="go function: nutrient reservoir activity [goid
FT 0045889]"


and copy the last two statements to a new file.
Thanks.
rayken1
Jul 6th, 2009

Re: kindly help

First, I would really like to see the output (if any).

Second, for the line

FT CDS complement(join ( 14006...14068, 19351..20068))

I would recommend:

/^FT \s+ CDS \s+ complement\(join\([0-9]+\)\)$/

Basically, we want to escape each of the ( and ) characters. Adding + to \s says "match one or more whitespace characters" (in case you have more than one space), and the + after the character class [0-9] says "match at least one digit, possibly more". I'm not sure whether the line should contain only that, but if so, add a $ at the end to anchor the match to the end of the string/line.
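As a hedged illustration of those points: in Perl, spaces inside a pattern are matched literally unless the /x modifier is used, and [0-9]+ on its own does not cover the dots and commas in a coordinate list such as 18028..18116,19351..20668. The sketch below is one possible variant, not the pattern suggested above; the variable name $cds_join and the character class [0-9.,<>] are assumptions made for this example, and it targets the compact join(...) form without inner spaces.

```perl
use strict;
use warnings;

# Hypothetical pattern. With /x, whitespace in the pattern is ignored,
# so whitespace in the data has to be matched explicitly with \s+.
my $cds_join = qr/
    ^FT \s+ CDS \s+          # feature key with one or more spaces around it
    complement\( join\(      # literal parentheses must be escaped
    ( [0-9.,<>]+ )           # coordinates: digits, dots, commas, < and >
    \) \) \s* $              # closing parentheses, then end of line
/x;

for my $line (
    'FT CDS complement(join(18028..18116,19351..20668))',
    'FT CDS complement(7216..17805)',
) {
    if ($line =~ $cds_join) {
        print "match: $1\n";          # $1 holds the captured coordinates
    } else {
        print "no match: $line\n";
    }
}
```

Only the first sample line contains join(...), so only that one should be reported as a match.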

If that gets you going in the right direction, let me know if you still can't get the other two.

Hope that helps.
onaclov2000
Jul 13th, 2009

Re: kindly help

Hi Prakash and Onaclov2000,

Thanks for your answers. I am still having trouble solving my problem. Below are the structure of the file I am dealing with and the code I have come up with so far using Onaclov2000's regex, which captures only the first line in the file.



The structure of the text file is as follows:

FT CDS complement(7216..17805)
FT /locus_tag="TP01_0003"
FT /codon_start=1
FT /protein_id="XP_765530.1"
FT /db_xref="GI:71031777"
FT /db_xref="GeneID:3502673"
FT gene complement(<7216..>17805)
FT /locus_tag="TP01_0003"
FT /db_xref="GeneID:3502673"
FT mRNA complement(<7216..>17805)
FT /locus_tag="TP01_0003"
FT /product="hypothetical telomeric SfiI fragment 20 protein
FT 3"
FT /transcript_id="XM_760437.1"
FT /db_xref="GI:71031776"
FT /db_xref="GeneID:3502673"
FT CDS complement(join(18028..18116,19351..20668))
FT /locus_tag="TP01_0004"
FT /note="go_function: nutrient reservoir activity [goid
FT 0045735]"
FT /codon_start=1
FT /protein_id="XP_765531.1"
FT /db_xref="GI:71031779"
FT /db_xref="GeneID:3503550"
FT gene complement(<18028..>20668)
FT /locus_tag="TP01_0004"
FT /db_xref="GeneID:3503550"
FT mRNA complement(join(<18028..18116,19351..>20668))
FT /locus_tag="TP01_0004"
FT /product="hypothetical protein"
FT /transcript_id="XM_760438.1"
FT /db_xref="GI:71031778"
FT /db_xref="GeneID:3503550"
FT CDS complement(20951..21967)
FT /locus_tag="TP01_0005"
FT /codon_start=1
FT /protein_id="XP_765532.1"
FT /db_xref="GI:71031781"
FT /db_xref="GeneID:3503551"
FT gene complement(<20951..>21967)
FT /locus_tag="TP01_0005"
FT /db_xref="GeneID:3503551"
FT mRNA complement(<20951..>21967)
FT /locus_tag="TP01_0005"
FT /product="hypothetical protein"
FT /transcript_id="XM_760439.1"
FT /db_xref="GI:71031780"
FT /db_xref="GeneID:3503551"


This is my code

#!/usr/bin/perl
$file = 'Muguga.embl';

open (F, $file) || die ("Could not open $file!");

while ($line = <F>)
{
($field1,$field2,$field3,$field4) = split( "\t" , $line);

print "$field1 $field2 $field3 $field4 \n";
my $string = 'FT CDS complement(join(18028..18116,19351..20668))'; # string to be searched

if ($string =~ m/^FT \s+ CDS \s+ complement\(join\([0-9]+\)\)$/)
#search for the first line highlighted in bold
{
print 'match';
} else{
print 'no match';
}
}
close (F);


My wish is to be able to search for the lines that are in bold (the CDS complement(join(...)) line and the /locus_tag and /note lines that follow it) and print them out.

I would be grateful for any help with my code, as I am still a Perl newbie.

Thanks.
rayken1
Jul 13th, 2009

Re: kindly help


First things first: looking at your code, you're only searching the fixed string that you assign to $string, not the lines you read from the file, so obviously that is the only thing you will ever see matched and printed. Try it with the input data.

Second, from what I understand of your code, you're splitting each line into fields, so you won't find the data you want unless you look at the whole line, or modify the regex to look only at the last field (for example).
So I would recommend changing the loop to this:
while ($line = <F>)
{
print "$line\n";

if ($line =~ m/^FT \s+ CDS \s+ complement\(join\([0-9]+\)\)$/)
#search for the first line highlighted in bold
{
print 'match';
} else{
print 'no match';
}

#insert additional regexes here

}
close (F);



Third, once you've done that, try writing a regex for the second line you're trying to get. If you have problems, post it and I'll gladly take a look at your regex and make some suggestions.

Fourth, I would recommend adding a counter, so that when you print "match" you print the line number as well (it makes troubleshooting easier).
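A minimal sketch of that idea, assuming you are happy to use Perl's built-in $. variable (the line number of the filehandle last read) instead of a hand-written counter. The sample data, the in-memory filehandle, and the array name @matched_lines are assumptions made so the example runs on its own; with the real file you would open Muguga.embl instead.

```perl
use strict;
use warnings;

# Sample data standing in for the real file (an assumption for this sketch).
my $data = <<'END';
FT CDS complement(7216..17805)
FT /locus_tag="TP01_0003"
FT CDS complement(join(18028..18116,19351..20668))
FT /locus_tag="TP01_0004"
END

# Open the string as an in-memory file so the example is self-contained.
open(my $fh, '<', \$data) or die "Could not open in-memory data: $!";

my @matched_lines;    # line numbers of the lines that matched
while (my $line = <$fh>) {
    chomp $line;
    if ($line =~ /^FT\s+CDS\s+complement\(join\(/) {
        # $. holds the current line number of the filehandle last read
        print "match at line $.: $line\n";
        push @matched_lines, $.;
    }
}
close($fh);
```

Here only the third sample line is a CDS complement(join(...)) line, so that is the only line number reported.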

Fifth (wow, I talk a lot, sorry): you can turn that one regex into something that matches every line you're looking for, but IMHO it will make for a messy-looking regex. You can do it, though: the regex returns capture variables, so if you put the right things in parentheses you can extract them. Still, I prefer cleaner, easier-to-follow code, so I would write a separate regex for each kind of line, unless the lines are REALLY similar, and these really don't look like it.
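To make the "one regex per kind of line, with parentheses to capture" approach concrete, here is a rough sketch. The sub name extract_join_cds, the hash keys, and the sample lines are all hypothetical, and it makes a simplifying assumption: only a /note value is allowed to continue onto the next line inside a CDS join block.

```perl
use strict;
use warnings;

# Returns one hash per CDS complement(join(...)) feature, holding the
# captured location, locus_tag and note. Names here are hypothetical.
sub extract_join_cds {
    my @lines = @_;
    my (@records, $current, $note_open);
    for my $line (@lines) {
        if ($note_open) {
            # Continuation of a /note value that spans several lines.
            my $more = $line =~ /^FT\s+(.*)$/ ? $1 : '';
            $note_open = 0 if $more =~ s/"\s*$//;   # closing quote ends the note
            $current->{note} .= " $more";
        }
        elsif ($line =~ /^FT\s+CDS\s+complement\(join\((.+)\)\)/) {
            $current = { location => $1 };          # start a new record
            push @records, $current;
        }
        elsif ($line =~ /^FT\s+\w+\s+\S/) {
            $current = undef;                       # gene, mRNA, plain CDS, ...
        }
        elsif ($current && $line =~ /^FT\s+\/locus_tag="([^"]*)"/) {
            $current->{locus_tag} = $1;
        }
        elsif ($current && $line =~ /^FT\s+\/note="(.*)$/) {
            my $text = $1;
            $note_open = 1 unless $text =~ s/"\s*$//;   # no closing quote yet
            $current->{note} = $text;
        }
    }
    return @records;
}

my @sample = (
    'FT CDS complement(7216..17805)',
    'FT /locus_tag="TP01_0003"',
    'FT CDS complement(join(18028..18116,19351..20668))',
    'FT /locus_tag="TP01_0004"',
    'FT /note="go_function: nutrient reservoir activity [goid',
    'FT 0045735]"',
    'FT /codon_start=1',
    'FT gene complement(<18028..>20668)',
    'FT /locus_tag="TP01_0004"',
);

for my $rec (extract_join_cds(@sample)) {
    print "$rec->{locus_tag}: $rec->{note}\n";
}
```

To copy the results to a new file, the print could be pointed at a filehandle opened for writing instead of standard output.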

Onaclov.
onaclov2000
Jul 13th, 2009

Re: kindly help

$string =~ /FT\s*CDS\s*\w+\(join\(.+\)\)/;

Try this; it isn't tested.
wickedxter
Jul 20th, 2009

Re: kindly help

Thanks a lot, Wickedxter. The code worked perfectly.
God bless you.
rayken1
