
finding text between two specified words, when one of the two words changes

 

Basically, I am trying to extract the text between two words inside a loop, where the first of the two words changes after each piece of information is extracted.

So, for example, the string is:

string = alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo

So I want to extract the text between alpha and end and then bravo and end. I have quite a few of these unique words in my file so I have a list and a counter to go through them. See the code below:

string = 'alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo'

words = ['alpha', 'bravo'] #there will be more words here

counter = 0

stringOut = ''

#going through the list of words

while counter < len(words):

    firstWord = words[counter]

    lastWord = 'end'     

    data = string[string.find(firstWord)+len(firstWord):string.find(lastWord)].strip() 

    #this will give the text between the first occurrence of "alpha" and "end"
    #since I want just the smallest string between "alpha" and "end", I use another while loop
    #to see if firstWord occurs again

    while firstWord in data:
            ignore,ignore2,data = data.partition(str(firstWord))

    counter = counter + 1

    stringOut += str(data) + str('\n')

print('output string is \n' + str(stringOut))

#this code gives the correct output for the text between the first word ("alpha") and "end".
#but when the list moves to the next string "bravo", it takes the text between the first "bravo" 
#and the "end" that was associated with the information required for "alpha" ("somethingA")

Can anyone help me with this please? Any suggestions are welcome.

Many thanks.

 
4
 

Just for fun, here is one with regex, but I guess this is a school task?

import re

string = 'alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo'
pattern = re.compile(r'(alpha|bravo)(\s\w+\s)(end)')
for match in pattern.finditer(string):
    print(match.group(2).strip())

"""Output-->
somethingA
somethingB
"""
 
0
 

This code would fail with words like 'bend' or 'send' inside the data, but it could give you an idea.

t = "alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo"
keywords = 'alpha', 'bravo'
if 'end' in t:
    for part in t.split('end')[:-1]:
        last, key = max((part.rfind(key)+len(key), key) for key in keywords)
        print(key, ':', part[last:])
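
One possible way around the 'bend'/'send' problem, offered as a sketch rather than a drop-in replacement: split on `end` only where it stands as a whole word, using `re.split` with `\b` word boundaries. The sample text below is a made-up variation of the original with 'send' placed inside the data, and `found` is just an illustrative name.

```python
import re

# Hypothetical sample: 'send' appears inside the data for alpha
t = "alpha 111 bravo 222 alpha send somethingA end, 333 bravo somethingB end 444 alpha 555 bravo"
keywords = 'alpha', 'bravo'

found = []
# \bend\b matches 'end' only as a whole word, so 'send' is left alone
for part in re.split(r'\bend\b', t)[:-1]:
    # pick the keyword whose last occurrence is nearest the end of this part
    pos, key = max((part.rfind(k), k) for k in keywords)
    if pos != -1:  # skip parts that contain no keyword at all
        found.append((key, part[pos + len(key):].strip()))

print(found)
```

The keywords themselves are still located with a plain `rfind`, so a keyword embedded in a longer word would still be picked up.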
 
0
 

Thank you both for your replies. snippsat, your answer works perfectly.

I managed to get around the problem by marking the index in the string where the word appears:

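word = ['alpha', 'bravo'] #...
counter = 0

#fileString holds the text read from the file
#skip past the first occurrence of the current word, then find the next one
marker0 = fileString.index(word[counter])
marker0 = marker0 + len(word[counter])
marker0 = fileString.index(word[counter], marker0)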

But your solution looks more robust, as it doesn't matter how many times the word appears in the string before the required information!

Thanks! :)

 
0
 

But would using this line of code:

pattern = re.compile(r'(alpha|bravo)(\s\w+\s)(end)')

mean that the whole list of words needs to be written into the pattern manually?
The reason I am asking is that I have many of these unique words that I need to extract info for. So:

word = ['alpha', 'bravo', '....', '.....' , 'etc'] #could be quite a few here

Is there a way to use a variable within the re.compile statement?

 
0
 
'|'.join(words)
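
To spell that out: the alternation can be built from the list and dropped into the pattern with ordinary string formatting. This is a minimal sketch, assuming the list is called `words` and the text keeps the same "keyword, single value, end" shape as the example. The `re.escape` call is an addition here, as a precaution in case a keyword contains regex metacharacters.

```python
import re

string = 'alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo'
words = ['alpha', 'bravo']  # extend with the remaining keywords

# Join the escaped keywords into one alternation: alpha|bravo|...
alternation = '|'.join(re.escape(word) for word in words)
pattern = re.compile(r'(%s)(\s\w+\s)(end)' % alternation)

# group(1) is the keyword that matched, group(2) the value before 'end'
results = [(m.group(1), m.group(2).strip()) for m in pattern.finditer(string)]
for keyword, value in results:
    print(keyword, value)
```

Keeping the keyword in `group(1)` makes it possible to tell which word each extracted value belongs to.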