0

Hello there,

I would like to know if anyone knows a solution for excluding a group within a group in a regular expression. For example:

/(name|(?:last[ ])name|(?:first[ ])name)/i

I would like only the word "name" to be returned (this is just an example), but when I take capture group 1 after matching against "last name", I get "last name" instead of just "name", even though I used "?:" on the inner group. I think it's because I didn't use "?:" on the outer group, so I would like to know: is it possible to achieve this?

Edited by minitauros

0

For you maniacs who are interested in the real regex:

/
(
    (?<![\d])[0-9]{4}(?![\d])|['"][0-9]{2}(?![\d])
    |(?:(?:jan|feb|mrt|apr|mei|jun|jul|aug|sept?|okt|nov|dec)[ \t]?['"]?)\d{2}
)                                                                                          # Find first year
\s*
(?:[a-z]+[\s]+|\d{1,3}[\s]+|\d{5,}[\s]+){0,2}                                              # Anything between first year and until part?
(?:tot[ ]en[ ]met|t\/m|\bt[ ]m\b|\btm\b|\btot\b|\?|(?<![A-Za-z])\-(?![A-Za-z]))?           # Find until part (currently optional; maybe it should be required if errors ever start occurring?)
\s*
(?:[a-z]+[\s]+|[\d\/]{1,3}[\s]+|\d{5,}[\s]+|[\d]{1,3}[\/\-]){0,2}                          # Anything between until part and second year?
(
    (?<![\d])[0-9]{4}(?![\d])|['"][0-9]{2}(?![\d])
    |(?:(?:jan|feb|mrt|apr|mei|jun|jul|aug|sept?|okt|nov|dec)[ \t]?['"]?)\d{2}
)                                                                                          # Find second year
/ixs

I want the first and last year to get returned.

0

1 sept 07 - 16 dec 10

I want to find the year numbers. BUT the text might also be something like: 1 sept 2007 - 16 dec 2010, or
sept 2007 - dec 2010
or
09-2007 - 12-2010
or
'07 - '10

and so on.

0

"And so on" may explain it for you, but for a regex you need to specify exactly what you want. That means writing out every format you want matched and joining them with "or" (|), since a regex cannot "guess". Once you have all the patterns, there may be shortcuts.

1 sept 07 - 16 dec 10
1 sept 2007 - 16 dec 2010
\d{1,2} [a-z]+ \d\d(\d\d)? - \d{1,2} [a-z]+ \d\d(\d\d)?

09-2007 - 12-2010
\d{2}-\d{4} - \d{2}-\d{4}

'07 - '10
'\d{2} - '\d{2}
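
To try these out, the three patterns can be joined with "|" and run against the sample strings. This is only a rough sketch in Python (the thread does not say which language is being used); the capture groups around the years and the helper name find_years are my own additions for illustration.

```python
import re

# The three patterns from above, with the year parts wrapped in capture groups.
PATTERNS = [
    r"\d{1,2} [a-z]+ (\d\d(?:\d\d)?) - \d{1,2} [a-z]+ (\d\d(?:\d\d)?)",
    r"\d{2}-(\d{4}) - \d{2}-(\d{4})",
    r"'(\d{2}) - '(\d{2})",
]

# "Or" them together; each pattern goes in a non-capturing group so "|" stays scoped.
COMBINED = re.compile("|".join(f"(?:{p})" for p in PATTERNS), re.IGNORECASE)

def find_years(text):
    """Return (first_year, second_year) as strings, or None if no pattern matches."""
    m = COMBINED.search(text)
    if m is None:
        return None
    # Only the groups of the alternative that actually matched are set.
    return tuple(g for g in m.groups() if g is not None)

for sample in ["1 sept 07 - 16 dec 10", "09-2007 - 12-2010", "'07 - '10"]:
    print(sample, "->", find_years(sample))
```

Note that "sept 2007 - dec 2010" is not covered by any of the three patterns, so it would need a pattern of its own.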

Edited by pritaeas

0

Thanks pritaeas! I know how that works, I'm just wondering how I can exclude a regex group within a regex group :). So if the container group matches, I want to fetch only part of that match (by using ?: to exclude the rest, because I need to know the order of the matches).

0

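If you run the first regex pattern (from the OP) against "last name", the bare "name" alternative cannot match at the start of the string, so the engine takes the second alternative, which is valid. Because that alternative sits inside the outer capture group, group 1 holds the whole "last name"; "?:" only stops the inner group from getting a number of its own. Provide some test data to explain if this is not what you want: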

(\w+ )?(name)

This will return "name" in the second capture group for all three inputs ("name", "last name" and "first name").
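
A quick way to compare the two patterns side by side, sketched in Python as an assumption (any PCRE-like engine should behave the same for these patterns). The variable names are made up for the example.

```python
import re

# Pattern from the original post: the outer group captures everything inside it,
# including the text matched by the inner (?:...) groups.
op_pattern = re.compile(r"(name|(?:last[ ])name|(?:first[ ])name)", re.IGNORECASE)

# Suggested pattern: the optional prefix lands in group 1, "name" in group 2.
suggested = re.compile(r"(\w+ )?(name)", re.IGNORECASE)

for line in ["name", "last name", "first name"]:
    op_group1 = op_pattern.search(line).group(1)
    name_only = suggested.search(line).group(2)
    print(f"{line!r}: OP group 1 = {op_group1!r}, suggested group 2 = {name_only!r}")
```

For the plain "name" input, group 1 of the suggested pattern is simply unset, because the optional prefix did not take part in the match.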

Edited by pritaeas

0

That is indeed a good one and you are right. However, I may have phrased my question poorly. What I need is for the regex to match only last name and first name, but to exclude "last" and "first" from the capture. All this must be done in one regex (at least that would be my ideal situation). So, for example:

([a-z]{8}|(?:last[ ])[a-z]{4}|(?:first[ ])[a-z]{4})

I want to find an unknown word consisting of 4 letters, and it MUST be preceded by either "last" or "first". I want to exclude that "last" or "first" from the capture, because I want to use $1 to fetch the [a-z] part, and a word of exactly 8 chars may also match. Sorry, I find it hard to explain the exact need :p. Thanks a lot for your input though!
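
One possible way to get this in a single regex is to move "last" and "first" into lookbehind assertions: they must be present, but they are not consumed, so they never end up in $1. The sketch below uses Python purely as an assumption for testing; the helper name capture is made up. Python requires each lookbehind to be fixed-width, hence one lookbehind per prefix. If the engine is PCRE, a branch reset group, (?|([a-z]{8})|last[ ]([a-z]{4})|first[ ]([a-z]{4})), should be another option, since every alternative then fills $1.

```python
import re

# Group 1 holds only the [a-z] part: the lookbehinds check for "last " or
# "first " in front of the 4-letter word without making it part of the match.
PATTERN = re.compile(
    r"((?<=last[ ])[a-z]{4}|(?<=first[ ])[a-z]{4}|[a-z]{8})",
    re.IGNORECASE,
)

def capture(text):
    """Return the content of group 1, or None when nothing matches."""
    m = PATTERN.search(text)
    return m.group(1) if m else None

for sample in ["last name", "first name", "lastname", "name"]:
    print(repr(sample), "->", repr(capture(sample)))
```

Like the original pattern, this has no word boundaries, so "last names" would still yield "name"; adding \b around the alternatives would tighten that if needed.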

Edited by minitauros
