something like this:

open red box with key

or

open red box

should be broken down into open as .group("verb"), red box as .group("object"), with as .group("preposition"), and key as .group("indirectobj")

My current pattern is "^(?P<verb>open)\W*(?P<object>\w*\W{1})\W*(?P<preposition>with|\Z)\W*(?P<indirectobj>\w*)". It's not working, and I'm kinda out of ideas.
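
For reference, a quick check of that pattern against the four example sentences used in this thread might look like the sketch below. The helper name check and the sentence list are assumptions made for illustration. It suggests the pattern only matches when the object is a single word followed by "with", and that the captured object then keeps its trailing space.

```python
import re

# The pattern exactly as posted in the question (split over two lines only for width).
PATTERN = re.compile(
    r"^(?P<verb>open)\W*(?P<object>\w*\W{1})\W*"
    r"(?P<preposition>with|\Z)\W*(?P<indirectobj>\w*)"
)

# Example sentences assumed from this thread.
SENTENCES = ("open red box with key", "open red box", "open box", "open box with key")


def check(sentence):
    """Return the named groups as a dict, or None when the pattern does not match."""
    m = PATTERN.match(sentence)
    return m.groupdict() if m else None


for s in SENTENCES:
    print("%r -> %r" % (s, check(s)))
```

The object group asks for one run of word characters followed by exactly one non-word character, so a two-word object such as "red box", or an object sitting at the very end of the string, cannot satisfy it.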

I do not know about regex magic so much, but here is my plain Python code, if it may help you:

sents= ('open red box with key','open red box','open box', 'open box with key')
for sentence in sents:
    sentence = sentence.split()
    print sentence
    verb, obj = sentence.pop(0), sentence.pop(0)  # named obj because object is a built-in name in Python
    if len(sentence)<=1:
        if sentence:
            obj+= ' '+sentence.pop(0)
        print 'verb: ',verb,',object:',obj
    elif len(sentence)==2:
        preposition, indirectobj = sentence.pop(0),sentence.pop()
        print 'verb: ',verb,', object: ',obj,', preposition: ',preposition, ', indirectobj: ', indirectobj
    else:
        obj+= ' '+sentence.pop(0)
        preposition, indirectobj = sentence.pop(0),sentence.pop()
        print 'verb: ',verb,', object: ',obj,', preposition: ',preposition, ', indirectobj: ', indirectobj

Yeah, but my parser should also expect something like "look", which would make a regex much more efficient. Thanks for the idea though; I'll try to incorporate it along with the regex.

If you use findall, you can use something as simple as this.

import re

text = '''\
my open verb in in red box test kk55 with me as key.
'''

test_match = re.findall(r'open|red|box|with|key' ,text)
print test_match
#->['open', 'red', 'box', 'with', 'key']

Not groups but a list you can use.

print test_match[2]
#->box

Not exactly sure you got what I mean.

I need a regex pattern that would find the verb, object, preposition, and indirect object. Everything but the verb is optional, and object and indirect object can be up to 2 words each.
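
One way those requirements could be written as a single pattern is sketched below. It is not a definitive answer: it assumes that any single word may be the verb, that "with" is the only preposition, and that a preposition is always followed by an indirect object. The names COMMAND and parse are made up for this example. A negative lookahead keeps "with" from being swallowed as part of the object.

```python
import re

# Assumptions: any single word is a verb, "with" is the only preposition,
# and a preposition must be followed by a 1-2 word indirect object.
COMMAND = re.compile(r"""
    ^\s*
    (?P<verb>\w+)                              # required verb
    (?:\s+(?P<object>                          # optional object, one or two words,
        (?!with\b)\w+                          #   neither of which may be "with"
        (?:\s+(?!with\b)\w+)?
    ))?
    (?:\s+(?P<preposition>with)                # optional preposition ...
       \s+(?P<indirectobj>\w+(?:\s+\w+)?)      # ... plus indirect object, one or two words
    )?
    \s*$
""", re.VERBOSE)


def parse(command):
    """Return a dict of the named groups, or None if the command does not fit."""
    m = COMMAND.match(command)
    return m.groupdict() if m else None


for cmd in ("open red box with key", "open red box", "open box with key", "look"):
    print("%r -> %r" % (cmd, parse(cmd)))
```

Groups that are absent come back as None, so a bare "look" gives only a verb. To allow more prepositions, the alternation (and both lookaheads) would need to list them.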
