I need some quick regex help with matching the same word twice in a pattern.

For example, I'm trying to match on the PHP code: if (isset($foo) AND $foo) or if (isset($bar) AND $bar) where $foo and $bar are any valid variable name. However, I don't want to match on if (isset($foo) AND $bar).

Can anyone help me with this? Thanks!!

I posted a suggestion but after I got the exact definition of a php variable and fired up my regexptester it didn't pan out. I'll play with it and hopefully have something for you in a bit.

This expression will find all of the php variables in a line

(\$[a-zA-Z_][a-zA-Z_0-9]*)

\$              The initial literal $
[a-zA-Z_]       A lower/upper case letter or underscore
[a-zA-Z_0-9]*   0 or more lower/upper case letters, digits, or underscores

If I run this against

if ($foo == $bar): $foobar = $foo+$bar

I get the following matches

if ($foo == $bar): $foobar = $foo+$bar
    ----    ----   -------   ---- ----
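
The same check can be reproduced outside a regex tester. Here is a minimal Python sketch, assuming the standard `re` module; the name `PHP_VAR` is just an illustrative label, not something from the thread. No capture group is needed when only the names are being listed.

```python
import re

# The variable pattern from above, without the capture group.
PHP_VAR = re.compile(r"\$[a-zA-Z_][a-zA-Z_0-9]*")

line = "if ($foo == $bar): $foobar = $foo+$bar"

# findall returns every non-overlapping match, left to right.
variables = PHP_VAR.findall(line)
print(variables)  # ['$foo', '$bar', '$foobar', '$foo', '$bar']
```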

Here's where it gets tricky. You can use numbered back references as in \1, \2, etc. where \1 refers to the first group (in this case a php variable), \2 the second, etc.

The problem is trying to detect duplicates. If I add to my pattern

.*\1

which means any sequence of characters (possibly empty) followed by whatever the first group matched, I end up with

if ($foo == $bar): $foobar = $foo+$bar
        -----------------------------

which says "I have found a line with a duplicate $foo". That's dandy, but if the line is

if ($foo == $bar): $foobar = $fnord

then the result is a false positive, because \1 also matches the $foo at the start of $foobar:

if ($foo == $bar): $foobar = $fnord
        -------------------

If you are willing to accept the odd false positive then

(\$[a-zA-Z_][a-zA-Z_0-9]*).*\1

may be what you are looking for.
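
To see the genuine hit and the false positive side by side, here is a small Python sketch, assuming the standard `re` module. The names `NAIVE` and `naive_match` are made up for illustration.

```python
import re

# Variable pattern followed by ".*\1": anything, then the same text again.
NAIVE = re.compile(r"(\$[a-zA-Z_][a-zA-Z_0-9]*).*\1")

def naive_match(line):
    """Return the matched text, or None if nothing repeats."""
    m = NAIVE.search(line)
    return m.group(0) if m else None

# Genuine repeat: $foo really appears twice.
print(naive_match("if ($foo == $bar): $foobar = $foo+$bar"))

# False positive: the second "$foo" is only the start of $foobar.
print(naive_match("if ($foo == $bar): $foobar = $fnord"))
```

The backreference only compares text, so it has no notion of where a variable name ends.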

If you are scanning your code with a python script you could use

(\$[a-zA-Z_][a-zA-Z_0-9]*)

to match all of the php variables in a line, extract them into a dict, and check whether any entry occurs more than once. Pretty trivial as python scripts go.
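
A minimal sketch of that script idea, assuming Python 3.7+ and using `collections.Counter` as the dict. The function name `repeated_variables` is hypothetical.

```python
import re
from collections import Counter

PHP_VAR = re.compile(r"\$[a-zA-Z_][a-zA-Z_0-9]*")

def repeated_variables(line):
    """Return the variable names that occur more than once in the line."""
    counts = Counter(PHP_VAR.findall(line))
    return [name for name, n in counts.items() if n > 1]

print(repeated_variables("if ($foo == $bar): $foobar = $foo+$bar"))
print(repeated_variables("if ($foo == $bar): $foobar = $fnord"))
```

Because each match is a whole variable name, $foo and $foobar are counted separately, which avoids the false positive from the backreference approach.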

OK. Try this

(\$[a-zA-Z_][a-zA-Z_0-9]*)(?=.*\1[^a-zA-Z_0-9])

(\$[a-zA-Z_][a-zA-Z_0-9]*)      a php variable (grouped)
(?=.*\1[^a-zA-Z_0-9])           a look-ahead expression 

The second part says "look ahead for the matched php variable as long as it isn't part of a longer variable name." If you get a line with a hit then you have a repeated php variable in that line.
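
Two edge cases are worth testing before relying on this. The look-ahead needs some character after the repeated name, so a repeat at the very end of the line can be missed. The first group can also backtrack to a shorter prefix (for example $foo inside $foobar). The Python sketch below uses a variation of the pattern, not the one posted above: it swaps in a negative look-ahead `(?![a-zA-Z_0-9])` after both the group and the backreference. The names `REPEAT` and `has_repeat` are made up for illustration.

```python
import re

WORD = r"[a-zA-Z_0-9]"

# Assumed variant: the name must end where the group ends, and the
# repeated name must not be followed by another name character.
REPEAT = re.compile(
    r"(\$[a-zA-Z_]" + WORD + r"*)(?!" + WORD + r")"
    r"(?=.*\1(?!" + WORD + r"))"
)

def has_repeat(line):
    """True if some php variable appears at least twice in the line."""
    return REPEAT.search(line) is not None

print(has_repeat("if ($foo == $bar): $foobar = $foo+$bar"))
print(has_repeat("if ($foo == $bar): $foobar = $fnord"))
print(has_repeat("$a = $a"))                 # repeat at end of line
print(has_repeat("$foobar = 1; $foo = 2;"))  # prefix only, not a repeat
```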

I really appreciate the time you took, but I think you misunderstood my question. I'm trying to match specifically on the following pattern:

if (isset(<<capture group>>) AND <<backreference to capture group>>)

I just don't know how to do a backreference.

I found that my code does a lot of if (isset($var) AND $var) and so I wanted to use a regex find/replace refactor all my code en masse to something such as if (!empty($var)).

Currently the find/replace feature of my IDE looks like this:

Find => if (isset(\$(.+)) AND <don't know what to put here>)
Replace => !empty($1)

A back reference is written as a backslash followed by the number (1-9) of the group you want, e.g. \1 for the first group. You create a group by enclosing the part of the pattern you want to capture in parentheses. In your case it would be

if \(isset\((\$[a-zA-Z_][a-zA-Z_0-9]*)\) AND \1\)

Note that to put in a literal ( or ) you have to escape with a backslash.
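
If the IDE's find/replace turns out to be awkward, the same substitution can be scripted. This is a minimal Python sketch, assuming the standard `re` module. The function name `refactor` and the replacement text `if (!empty(...))` are taken from the goal stated earlier in the thread, not from any particular IDE's syntax.

```python
import re

# The pattern from above: literal parentheses escaped, \1 is the backreference.
PATTERN = re.compile(r"if \(isset\((\$[a-zA-Z_][a-zA-Z_0-9]*)\) AND \1\)")

def refactor(source):
    """Rewrite 'if (isset($x) AND $x)' as 'if (!empty($x))'."""
    # In the replacement string, \1 inserts the captured variable name.
    return PATTERN.sub(r"if (!empty(\1))", source)

print(refactor("if (isset($foo) AND $foo) {"))  # if (!empty($foo)) {
print(refactor("if (isset($foo) AND $bar) {"))  # unchanged
```

In many editors the replacement side uses $1 rather than \1 for the captured group, so check which form your IDE expects.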
