Regexp


thorne44, post #1
Jan 14th, 2006
Hi,

I'm currently writing a routine to extract one-, two- and three-word phrases from a string, but with two- or three-word phrases I'm not getting all the phrases. For example,

the string "blah blah blah" will show one occurrence of "blah blah" when really there are two.

My code is

while ($str =~ m/(\w+) (\w+)/g)
{

}

Any help would be appreciated.

Thanks
Bruce

Rashakil Fol, post #2
Jan 15th, 2006
That's not your code; it doesn't output anything at all.

While doing a global search, Perl continues just after the end of the previous match.
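
To make that concrete, here is a minimal sketch (not the poster's code) that runs the same pattern on the example string and records where each match ends. It relies only on Perl's built-in pos(); the @pairs array is a name made up for illustration.

```perl
use strict;
use warnings;

my $str = "blah blah blah";
my @pairs;    # illustrative name, not from the original post

# In scalar context, each successful //g match leaves pos($str) at the end
# of the match, and the next attempt starts from that offset.
while ($str =~ m/(\w+) (\w+)/g) {
    push @pairs, "$1 $2 (ends at offset " . pos($str) . ")";
}

print "$_\n" for @pairs;
# prints: blah blah (ends at offset 9)
```

The first match consumes both words, so only " blah" is left for the next attempt, and a single word cannot satisfy a two-word pattern.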
All my posts may be redistributed under the GNU Free Documentation License.

thorne44, post #3
Jan 15th, 2006
Originally Posted by Rashakil Fol:
"That's not your code; it doesn't output anything at all.

While doing a global search, Perl continues just after the end of the previous match."

The regexp line is the only line that matters. The code inside just puts $1 and $2 into a hash.

The full code is

while ($str =~ m/(\w+) (\w+)/g)
{
    $keywords{'I_'.$1.'_'.$2}{'Cnt'} += 2;
    $keywords{'I_'.$1.'_'.$2}{'Word'} = "$1 $2";
}

My problem is that Perl continues after the end of the last match, so with a string of "This is a test" I'll get
"This is" and "a test" but I won't get "is a", even though it's a valid phrase.

Thanks
Bruce

Rashakil Fol, post #4
Jan 16th, 2006
Then don't do it that way. Match only single words, and take the array of single words and work with that. If you're worried about in-between characters, and only want a single space between, do another match for contiguous strings of non-word characters, and then you're set to write some code that ties things together.
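
A rough sketch of that suggestion, assuming the characters between words can be ignored (the second match for non-word characters is left out). The names @words and %phrases are invented for the example and are not the original %keywords structure.

```perl
use strict;
use warnings;

my $str = "This is a test";

# Collect the single words first; in list context //g returns every match.
my @words = $str =~ m/(\w+)/g;

# Build every run of $n consecutive words, for $n = 1, 2 and 3.
my %phrases;    # illustrative container: phrase => number of occurrences
for my $n (1 .. 3) {
    for my $i (0 .. $#words - $n + 1) {
        my $phrase = join ' ', @words[$i .. $i + $n - 1];
        $phrases{$phrase}++;
    }
}

print "$_: $phrases{$_}\n" for sort keys %phrases;
```

With this input the hash ends up with nine phrases, including the "is a" that the two-word regexp skipped. Because the words are joined with a single space regardless of what separated them, text such as "is, a" would also count as "is a" unless the separators are checked as well.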

thorne44, post #5
Jan 16th, 2006
Originally Posted by Rashakil Fol:
"Then don't do it that way. Match only single words, and take the array of single words and work with that. If you're worried about in-between characters, and only want a single space between, do another match for contiguous strings of non-word characters, and then you're set to write some code that ties things together."

That's what I was going to do, but I was hoping that with a regexp I could do it more neatly and faster than with a split array.

I'd expect something simple like this could be done easily with regexp. I was hoping there was a flag or something that I didn't know about so it could look at more than the last match.

Oh well back to the hard way :rolleyes:

Thanks
Bruce
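
For what it's worth, there is no flag for this, but a zero-width lookahead can give overlapping matches with a regexp alone. The sketch below was not posted in the thread; it assumes words are separated by exactly one space, as in the original pattern, and @pairs is a made-up name.

```perl
use strict;
use warnings;

my $str = "This is a test";
my @pairs;    # illustrative name, not from the original post

# Only the first word is consumed. The second word is captured inside a
# zero-width lookahead, so the next match is free to start on it.
while ($str =~ m/\b(\w+)(?= (\w+))/g) {
    push @pairs, "$1 $2";
}

print "$_\n" for @pairs;
# prints:
# This is
# is a
# a test
```

Three-word phrases would follow the same idea with a longer lookahead, for example (?= (\w+) (\w+)), keeping the assumption of single spaces between words.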
©2003 - 2009 DaniWeb® LLC