I've been trying to match some incoming strings from a MUD (a text-based RPG), and I'm having some trouble.

I need a regex that will match the <mob> tag in these strings. You can have a separate regex per sentence to match the surrounding text.

You would stomp <mob> into the ground.
<mob> would be easy, but is it even worth the work out?
No Problem! <mob> is weak compared to you.
<mob> looks a little worried about the idea.
<mob> should be a fair fight!
<mob> snickers nervously.

Example sentences:
A chubby porcupine should be a fair fight!
No Problem! A small dark viper is weak compared to you.

An example regex could be '(.*) should be a fair fight!', but that captures the entire mob name as a single group instead of putting each word in its own group.

<mob> can be a bunch of words separated by spaces.

However, I only want words that are longer than two characters in length.

Also I would like each word stored as a separate group for easy access.

For example, in the case of "A chubby porcupine", I would only want chubby and porcupine as the result.

I have medium skill with regex; I tried a bunch of things with no success. Best I got was matching one of the words with length greater than two, like porcupine.

Edited 6 Years Ago by shwick: n/a

Very quickly, here's code that works for the first sentence; the others follow the same pattern. Note that I had to step outside regex a little to filter out the short words.

#!/usr/bin/python

import re

s0 = r"You would stomp (?P<mob>(\S+\s)+)into the ground."
r0 = re.compile(s0)



test0 = [
  ["You would stomp a hairy blonde gorilla into the ground.",('hairy','blonde','gorilla')],
  ["You would not stomp a hairy blonde gorilla into the ground.",()],
  ["You would stomp an ex into the ground.",()],
  ["You would stomp many a bad guy into the ground.",('many','bad','guy')],
  ]

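for t, expected in test0:
  m = r0.match(t)
  # print() with a single formatted string behaves the same on Python 2 and 3
  print('target %s' % t)
  print('expect %s' % str(expected))
  if m:
    # keep only the words longer than two characters
    mob = [w for w in m.group('mob').split() if len(w) > 2]
  else:
    mob = ()
  print('found  %s' % str(tuple(mob)))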


""" Output:
target You would stomp a hairy blonde gorilla into the ground.
expect ('hairy', 'blonde', 'gorilla')
found  ('hairy', 'blonde', 'gorilla')
target You would not stomp a hairy blonde gorilla into the ground.
expect ()
found  ()
target You would stomp an ex into the ground.
expect ()
found  ()
target You would stomp many a bad guy into the ground.
expect ('many', 'bad', 'guy')
found  ('many', 'bad', 'guy')
"""
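One way to extend the single-sentence version to several templates is to generate one regex per template. This is only a sketch: the names TEMPLATES, template_to_regex and mob_words are made up for illustration, and only three of the six sentences are listed to keep it short.

```python
import re

# Hypothetical subset of the templates from the question.
TEMPLATES = [
    "You would stomp <mob> into the ground.",
    "<mob> should be a fair fight!",
    "No Problem! <mob> is weak compared to you.",
]

def template_to_regex(template):
    # Split the template around the placeholder and escape the literal text.
    before, _, after = template.partition("<mob>")
    return re.compile("^" + re.escape(before) + r"(?P<mob>.+?)" + re.escape(after) + "$")

PATTERNS = [template_to_regex(t) for t in TEMPLATES]

def mob_words(sentence):
    # Return the words of <mob> longer than two characters, or () if nothing matches.
    for pattern in PATTERNS:
        m = pattern.match(sentence)
        if m:
            return tuple(w for w in m.group("mob").split() if len(w) > 2)
    return ()

print(mob_words("A chubby porcupine should be a fair fight!"))
# ('chubby', 'porcupine')
```

The short-word filter still happens outside the regex, exactly as in the code above; the regex only isolates the mob name.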

Here is a version which builds a single regex for all the sentences. You can add as many sentences as you wish, as long as each contains a single occurrence of <mob>.

# tested with python 2.6 and 3.1
import re

sentences = """
You would stomp <mob> into the ground.
<mob> would be easy, but is it even worth the work out?
No Problem! <mob> is weak compared to you.
<mob> looks a little worried about the idea.
<mob> should be a fair fight!
<mob> snickers nervously.
""".strip().split("\n")

mob_re = re.compile(r"([^<]*)\<mob\>(.*)")
valid_set = None

def make_group(substrings):
    L = sorted(set(re.escape(s) for s in substrings), key = len, reverse = True)
    M = L if L[-1] else L[:-1]
    group = "(?:%s)" % ("|".join(s for s in M))
    if not L[-1]:
        group = group + "?"
    return "(%s)" % group

def make_re(sentences):
    global valid_set
    L = [(m.group(1), m.group(2)) for m in [mob_re.match(s) for s in sentences]]
    valid_set = set(L)
    begin, end = (make_group(x[i] for x in L) for i in range(2))
    return re.compile(r"%s(\s*(?:[a-zA-Z]\s*)*)%s" % (begin, end))

mud_re = make_re(sentences)


def match_mob(sentence):
    m = mud_re.match(sentence)
    if m:
        begin, mob, end = (m.group(i) for i in (1, 2, 3))
        if (begin, end) in valid_set:
            return [x for x in mob.strip().split() if len(x) > 2]
    return None

def main():
    for s in [
        "You would stomp a hairy blonde gorilla into the ground.",
        "A chubby porcupine should be a fair fight!",
        "No Problem! A small dark viper is weak compared to you.",
        ]:
        print(s)
        print(match_mob(s))

main()

"""
My output --->
You would stomp a hairy blonde gorilla into the ground.
['hairy', 'blonde', 'gorilla']
A chubby porcupine should be a fair fight!
['chubby', 'porcupine']
No Problem! A small dark viper is weak compared to you.
['small', 'dark', 'viper']
"""

Note: match_mob returns None if there is no match; otherwise it returns the list of words longer than two characters that make up <mob>.
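As far as I can tell, the reason for checking (begin, end) against valid_set is that a single combined regex will also accept the beginning of one sentence glued to the end of another. Here is a reduced sketch with a hand-written pattern for just two templates; the names pairs, combined and mixed are illustrative and not part of the code above.

```python
import re

# (prefix, suffix) pairs for two of the templates, written out by hand.
pairs = {
    ("You would stomp ", " into the ground."),
    ("", " should be a fair fight!"),
}

# Simplified stand-in for the generated regex: any prefix, a name, any suffix.
combined = re.compile(
    r"^(You would stomp |)([a-zA-Z ]+?)( into the ground\.| should be a fair fight!)$"
)

def match_mob(sentence):
    m = combined.match(sentence)
    # Accept only prefix/suffix combinations that come from the same template.
    if m and (m.group(1), m.group(3)) in pairs:
        return [w for w in m.group(2).split() if len(w) > 2]
    return None

mixed = "You would stomp a rat should be a fair fight!"
print(combined.match(mixed) is not None)  # True: the regex alone accepts it
print(match_mob(mixed))                   # None: the pair check rejects it
```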

Edited 6 Years Ago by Gribouillis: n/a

thanks for the solutions

it looks like it can't be done by only using regex, which is fine
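For anyone wondering why a single pattern falls short: in Python's re module a group that sits inside a repetition keeps only its last match, so you cannot get one group per word when the number of words varies. A small sketch of that behaviour, plus a two-step workaround with re.findall (the variable names are just for illustration):

```python
import re

sentence = "A chubby porcupine should be a fair fight!"

# The repeated group is overwritten on every repetition,
# so only the last word is still available afterwards.
m = re.match(r"(?:(\w+)\s)+should be a fair fight!", sentence)
last_only = m.groups()
print(last_only)  # ('porcupine',)

# Two steps instead: capture the whole name, then pick words of 3+ characters.
whole = re.match(r"(.+) should be a fair fight!", sentence).group(1)
words = re.findall(r"\b\w{3,}\b", whole)
print(words)  # ['chubby', 'porcupine']
```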

My opinion is that it can be done without re.

# tested with python 2.6 and 3.1
sentences = """
You would stomp <mob> into the ground.
<mob> would be easy, but is it even worth the work out?
No Problem! <mob> is weak compared to you.
<mob> looks a little worried about the idea.
<mob> should be a fair fight!
<mob> snickers nervously.
Hey! This is <mob>
""".strip().split("\n")
## added test case for <mob> in the end

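def match_mob(sent):
    # try each template's (prefix, '<mob>', suffix) triple in turn
    for prefix, _, suffix in matching:
        if prefix:
            head, found, rest = sent.partition(prefix)
            # the prefix must be present and must start the sentence
            if head or not found:
                continue
        else:
            rest = sent
        if suffix:
            mob, found, tail = rest.partition(suffix)
            # the suffix must be present and must end the sentence
            if tail or not found:
                continue
        else:
            mob = rest
        return tuple(part for part in mob.split() if len(part) > 2)
    return None

# split each template around the <mob> placeholder
matching = [s.partition('<mob>') for s in sentences]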

for s in [
    "You would stomp a hairy blonde gorilla into the ground.",
    "A chubby porcupine should be a fair fight!",
    "No Problem! A small dark viper is weak compared to you.",
    "Hey! This is a big ape",
]:
    print(s)
    print(match_mob(s))

""" Output:
You would stomp a hairy blonde gorilla into the ground.
('hairy', 'blonde', 'gorilla')
A chubby porcupine should be a fair fight!
('chubby', 'porcupine')
No Problem! A small dark viper is weak compared to you.
('small', 'dark', 'viper')
Hey! This is a big ape
('big', 'ape')
"""
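In case str.partition is unfamiliar: it always returns a 3-tuple (text before, separator, text after), and when the separator is not found it returns the whole string followed by two empty strings. That is what makes the "found at the very start" and "found at the very end" checks possible. A quick illustration (variable names are only for this example):

```python
template = "You would stomp <mob> into the ground."
before, sep, after = template.partition("<mob>")
print((before, sep, after))
# ('You would stomp ', '<mob>', ' into the ground.')

sentence = "You would stomp a hairy blonde gorilla into the ground."
head, found, rest = sentence.partition(before)
# head is empty because the prefix starts the sentence
print((head, found, rest))
# ('', 'You would stomp ', 'a hairy blonde gorilla into the ground.')

# Separator missing: the original string comes back with two empty strings.
missing = "A chubby porcupine snickers nervously.".partition(before)
print(missing)
# ('A chubby porcupine snickers nervously.', '', '')
```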
Comments
nice use of str.partition