Searching Text (Python)

vegaseat

Here is a generator function that uses find() to locate every occurrence of a substring in a text. It yields each position (index) in the text where the substring starts. The sample text here can be replaced by text read in from a file.

# find the indexes of all occurrences of a substring in a text
# tested with Python24     vegaseat     03oct2005

str1 = \
'''Many many years ago when I was twenty three,
I got married to a widow, who was pretty as could be.
This widow had a grown-up daughter
who had much hair of red.
My father fell in love with her,
and soon the two were wed.

This made my dad my son-in-law
and changed my very life.
My daughter was my mother,
for she was my father's wife.

To complicate the matters worse,
although it brought me joy,
I soon became the father
of a bouncing baby boy.

My little baby then became
a brother-in-law to dad,
and so became my uncle,
which made me very sad.

For if he was my uncle,
it also made him brother
to the widow's grown-up daughter,
who of course, was my stepmother.

Father's wife then had a son,
who kept them on the run.
This boy became my grandson,
for he was my daughter's son.

My wife is now my mother's mother
and makes me rather blue.
Because, although she is my wife,
she's my grandmother too.

If my wife is my grandmother,
then I am her grandchild.
Every time I think of it,
it simply drives me wild.

For now I have become
the strangest case you ever saw.
As the husband of my grandmother,
I am my own grandpa!'''

def findSubstring(text, substring):
    """yield the start index of each occurrence of substring in text;
    pos is moved up by one before each new search, so overlapping
    occurrences are reported too"""
    pos = -1
    while True:
        # move index up on next call
        pos = text.find(substring, pos + 1)
        # not found or done
        if pos < 0:
            break
        yield pos


substring = 'daughter'
for index in findSubstring(str1, substring):
    # the parentheses make this line work in Python 2 and Python 3 alike
    print("'%s' is at index %d" % (substring, index))
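
As mentioned at the top, the sample text can come from a file instead. Here is a rough sketch of how that might look, assuming Python 3 (the snippet above was tested with Python 2.4). The names find_substring and find_in_file, the utf-8 default, and the throwaway temp file are made up for this illustration and are not part of the original snippet.

```python
import os
import tempfile


def find_substring(text, substring):
    """Yield the start index of each occurrence of substring in text."""
    pos = text.find(substring)
    while pos >= 0:
        yield pos
        # resume one character past the last hit, so overlapping hits count too
        pos = text.find(substring, pos + 1)


def find_in_file(path, substring, encoding="utf-8"):
    """Hypothetical helper: read the whole file, return a list of indexes."""
    with open(path, encoding=encoding) as handle:
        return list(find_substring(handle.read(), substring))


# write a small throwaway file so the sketch runs on its own
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("My daughter was my mother,\nfor he was my daughter's son.\n")
    sample_path = tmp.name

positions = find_in_file(sample_path, "daughter")
os.remove(sample_path)
print(positions)  # [3, 41]
```

This reads the whole file into memory in one go, which is fine for small texts. For very large files you would probably want to search in chunks, and that takes extra care with matches that straddle a chunk boundary.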