
Regexp

Hi,

I'm currently writing a routine to extract one-, two- and three-word phrases from a string, but for two- and three-word phrases I'm not getting all of them. For example,

the string "blah blah blah" shows one occurrence of "blah blah" when there are really two.

My code is

while ($str =~ m/(\w+) (\w+)/g)
{

}

Any help would be appreciated.

Thanks
Bruce

thorne44
Newbie Poster
3 posts since Jan 2006
Reputation Points: 10
Solved Threads: 0
Skill Endorsements: 0

That's not your code; it doesn't output anything at all.

During a global (/g) search, Perl resumes just after the end of the previous match, so a word that ends one match can never start the next one.

Rashakil Fol
Super Senior Demiposter
Team Colleague
2,732 posts since Jun 2005
Reputation Points: 1,153
Solved Threads: 182
Skill Endorsements: 25
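A minimal sketch of the behaviour described above, using the "This is a test" string that comes up later in the thread. pos($str) reports where the next /g attempt will begin; after the first match it points past "is", so no attempt ever starts at "is" and "is a" is never found. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Sample string from the thread (variable names are illustrative).
my $str = "This is a test";
my @phrases;

# With /g, each new attempt starts at pos($str), i.e. just after the
# end of the previous match, so matches can never overlap.
while ($str =~ m/(\w+) (\w+)/g) {
    push @phrases, "$1 $2";
    printf "matched '%s %s', next search starts at offset %d\n",
        $1, $2, pos($str);
}

print join(", ", @phrases), "\n";   # prints: This is, a test
```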


The regexp line is the only line that matters. The code inside just stores $1 and $2 in a hash.

The full code is:

my %keywords;
while ($str =~ m/(\w+) (\w+)/g)
{
    # One entry per two-word phrase, keyed as I_<first>_<second>
    $keywords{'I_'.$1.'_'.$2}{'Cnt'} += 2;
    $keywords{'I_'.$1.'_'.$2}{'Word'} = "$1 $2";
}

My problem is that Perl continues after the end of the last match, so with the string "This is a test" I get "This is" and "a test", but not "is a", even though it's a valid phrase.

Thanks
Bruce

thorne44

Then don't do it that way. Match only single words, collect them into an array, and work with that. If you care about the characters in between and only want words separated by a single space, do a second match for the runs of non-word characters; then you can write code that ties the two together.

Rashakil Fol
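A sketch of the word-array approach suggested above, under a few assumptions of my own: count_phrases() is a hypothetical helper, words are lowercased, and any run of non-word characters counts as a break (the optional single-space check is not implemented). Every phrase is built from consecutive words, so overlapping phrases are counted naturally.

```perl
use strict;
use warnings;

# Hypothetical helper: count every 1..$max_len word phrase in $str.
sub count_phrases {
    my ($str, $max_len) = @_;
    # Collect single words first (lowercasing is an illustrative choice).
    my @words = map { lc $_ } ($str =~ /(\w+)/g);
    my %count;
    for my $start (0 .. $#words) {
        for my $len (1 .. $max_len) {
            my $end = $start + $len - 1;
            last if $end > $#words;    # phrase would run past the last word
            $count{ join ' ', @words[$start .. $end] }++;
        }
    }
    return \%count;
}

my $counts = count_phrases("blah blah blah", 3);
printf "%-16s %d\n", $_, $counts->{$_} for sort keys %$counts;
```

For "blah blah blah" this reports "blah blah" twice, which is the count the original post expected.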


That's what I was going to do, but I was hoping a regexp would be neater and faster than splitting into an array.

I'd expect something this simple to be easy with a regexp. I was hoping there was a flag or something I didn't know about that would let the next match start inside the previous one.

Oh well, back to the hard way. :rolleyes:

Thanks
Bruce

thorne44
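This approach was not suggested in the thread. As far as I know, Perl has no modifier that makes m//g return overlapping matches, but a zero-width lookahead gets the same effect. The lookahead captures the whole phrase without consuming it, so each /g iteration only advances to the next word boundary. The sketch below assumes words are separated by exactly one space, as in the original pattern; overlapping_phrases() is a hypothetical helper name.

```perl
use strict;
use warnings;

# Count overlapping $n-word phrases with a single regex.
sub overlapping_phrases {
    my ($str, $n) = @_;
    my $more = $n - 1;    # number of words after the first one
    my %count;
    # \b anchors each attempt at a word boundary; (?=(...)) captures the
    # phrase without consuming it, so the next phrase may overlap this one.
    while ($str =~ /\b(?=(\w+(?: \w+){$more}))/g) {
        $count{$1}++;
    }
    return \%count;
}

my $pairs = overlapping_phrases("blah blah blah", 2);
print "$_: $pairs->{$_}\n" for sort keys %$pairs;   # blah blah: 2
```

The \b matters: without it, the lookahead would also succeed inside words and capture fragments such as "his is".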

© 2013 DaniWeb® LLC