The following code exits immediately and produces the message below when the search pattern "\b\w+{3}\b" is entered:

#!/usr/bin/perl -w
#matchtest1.pl
use strict;

my($pattern);
my($true);
$_ = '1: A silly sentence (495,a) *BUT* one which will be useful. (3)';

do {
   print "Enter a regular expression: ";
   chomp($pattern = <STDIN>);

   if (/($pattern)/g){ # Search pattern must be enclosed in ()'s
      print "$pattern found in $_\n";
      print "\$& = $&\n";
      print "\$1 is '$1'\n" if defined $1;
      print "\$2 is '$2'\n" if defined $2;
      print "\$3 is '$3'\n" if defined $3;
      print "\$4 is '$4'\n" if defined $4;
      print "\$5 is '$5'\n" if defined $5;
   }
   else{
      print "$pattern NOT found\n";
   }
   print "Enter 1 to continue, 0 to exit -> "; chomp($true=<STDIN>);
} while ($true);

(Yes, there is a nested quantifier in the search pattern "\b\w+{3}\b", but why does this cause the code to abort?)

Nested quantifiers in regex; marked by <-- HERE in m/(\b\w+{ <-- HERE 3}\b)/ at matchtest1.pl line 13, <STDIN> line 1.


Bonus Question!!!
Although the global modifier is used (/($pattern)/g), patterns with multiple matches only return the first match in $1. Why?

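It aborts because you can't have nested quantifiers. The pattern you type in is compiled at run time when it is interpolated into the match, and a pattern that fails to compile is a fatal error unless you trap it with eval.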
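If you want the script to keep prompting after a bad pattern, one option is to compile the input inside an eval block and report the error yourself. This is only a minimal sketch: compile_pattern is a hypothetical helper name (not part of the original script), and \b\w{3}\b is my guess at what the pattern was meant to be (whole words of exactly three characters).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (compiled regex, undef) on success,
# or (undef, error message) when the pattern does not compile.
sub compile_pattern {
    my ($pattern) = @_;
    my $re = eval { qr/($pattern)/ };    # a bad pattern dies here; eval traps it
    return (undef, $@) unless defined $re;
    return ($re, undef);
}

# The first pattern is the one from the question; the second is an
# assumed correction, not something the original poster confirmed.
for my $pattern ('\b\w+{3}\b', '\b\w{3}\b') {
    my ($re, $err) = compile_pattern($pattern);
    if (defined $re) {
        print "'$pattern' compiled\n";
    }
    else {
        print "'$pattern' rejected: $err";    # $err already ends with a newline
    }
}
```

In the original loop you would call something like this right after reading $pattern and skip the match when it returns an error.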

Your expectation of what the "g" modifier does is also wrong. Here is an excerpt from perlretut:

Global matching

The final two modifiers //g and //c concern multiple matches. The modifier //g stands for global matching and allows the matching operator to match within a string as many times as possible. In scalar context, successive invocations against a string will have //g jump from match to match, keeping track of position in the string as it goes along. You can get or set the position with the pos() function.

The use of //g is shown in the following example. Suppose we have a string that consists of words separated by spaces. If we know how many words there are in advance, we could extract the words using groupings:

$x = "cat dog house"; # 3 words
$x =~ /^\s*(\w+)\s+(\w+)\s+(\w+)\s*$/; # matches,
# $1 = 'cat'
# $2 = 'dog'
# $3 = 'house'

But what if we had an indeterminate number of words? This is the sort of task //g was made for. To extract all words, form the simple regexp (\w+) and loop over all matches with /(\w+)/g:

while ($x =~ /(\w+)/g) {
    print "Word is $1, ends at position ", pos $x, "\n";
}

prints

Word is cat, ends at position 3
Word is dog, ends at position 7
Word is house, ends at position 13

A failed match or changing the target string resets the position. If you don't want the position reset after failure to match, add the //c modifier, as in /regexp/gc. The current position in the string is associated with the string, not the regexp. This means that different strings have different positions and their respective positions can be set or read independently.
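To see the position rules in action, here is a small sketch that reuses the "cat dog house" string from the excerpt. The show_pos helper is a hypothetical name added only to print an undefined position readably.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = "cat dog house";

# Hypothetical helper: pos() is undef when no position is set.
sub show_pos {
    my ($p) = @_;
    return defined $p ? $p : 'undef';
}

$x =~ /\w+/g;      # matches "cat"
print "after matching a word:   ", show_pos(pos($x)), "\n";    # 3

$x =~ /\d+/g;      # no digits: the match fails and the position is reset
print "after a failed //g:      ", show_pos(pos($x)), "\n";    # undef

$x =~ /\w+/g;      # starts over from the beginning, matches "cat" again
$x =~ /\d+/gc;     # fails, but //c keeps the position where it was
print "after a failed //gc:     ", show_pos(pos($x)), "\n";    # 3

$x =~ /\w+/g;      # continues from position 3, matches "dog"
print "after the next word:     ", show_pos(pos($x)), "\n";    # 7
```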

In list context, //g returns a list of matched groupings, or if there are no groupings, a list of matches to the whole regexp. So if we wanted just the words, we could use

@words = ($x =~ /(\w+)/g); # matches,
# $words[0] = 'cat'
# $words[1] = 'dog'
# $words[2] = 'house'
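Applied to the script in the question: the if (/($pattern)/g) test runs the match only once per pass through the do loop, so you only ever see one match. Looping on the match, or calling it in list context, returns the rest. A minimal sketch using the sentence from the original script, with \b\w{3}\b as an assumed stand-in for the intended pattern:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = '1: A silly sentence (495,a) *BUT* one which will be useful. (3)';

# Assumed intent: whole words of exactly three word characters.
my $re = qr/(\b\w{3}\b)/;

# Scalar context: each evaluation of the condition resumes where the
# previous match ended, so the loop visits every match in turn.
while ($text =~ /$re/g) {
    print "found '$1'\n";
}

# List context: all captured groups are returned at once.
my @all = ($text =~ /$re/g);
print "all matches: @all\n";    # all matches: 495 BUT one
```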

Thanks so much for your time, Kevin.
I looked up the CPAN docs and figured out where my thinking was taking a left turn. It's sorted out and back on track now; you really saved me hours of banging my head against the CRT. As usual, most problems with code are simple AFTER you see the answer!

Thank You!
