
finding text between two specified words, when one of the two words changes

romes87

Basically, I am trying to extract the text between two marker words inside a loop, where the first marker word changes on each pass once its information has been extracted.

So, for example, the string is:

string = alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo

So I want to extract the text between alpha and end and then bravo and end. I have quite a few of these unique words in my file so I have a list and a counter to go through them. See the code below:

string = 'alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo'

words = ['alpha', 'bravo'] #there will be more words here

counter = 0

stringOut = ''

#going through the list of words

while counter < len(words):

    firstWord = words[counter]

    lastWord = 'end'     

    data = string[string.find(firstWord)+len(firstWord):string.find(lastWord)].strip() 

    #this will give the text between the first occurrence of "alpha" and "end"
    #since I want just the smallest string between "alpha" and "end", I use another while loop
    #to see if firstWord occurs again

    while firstWord in data:
        ignore, ignore2, data = data.partition(firstWord)

    counter = counter + 1

    stringOut += str(data) + str('\n')

print('output string is \n' + str(stringOut))

#this code gives the correct output for the text between the first word ("alpha") and "end".
#but when the list moves to the next string "bravo", it takes the text between the first "bravo" 
#and the "end" that was associated with the information required for "alpha" ("somethingA")

Can anyone help me with this please? Any suggestions are welcome.

Many thanks.

snippsat

Here is one for fun with regex, but I guess this is a school task?

import re

string = 'alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo'
pattern = re.compile(r'(alpha|bravo)(\s\w+\s)(end)')
for match in pattern.finditer(string):
    print(match.group(2).strip())

"""Output-->
somethingA
somethingB
"""
pyTony

This code would fail with words like 'bend' or 'send' inside the data, because it splits on every occurrence of the letters 'end', but it could give you an idea.

t = "alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo"
keywords = 'alpha', 'bravo'
if 'end' in t:
    for part in t.split('end')[:-1]:
        last, key = max((part.rfind(key)+len(key), key) for key in keywords)
        print('%s : %s' % (key, part[last:]))
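
One possible way around the 'bend'/'send' problem, as a sketch rather than a tested drop-in: split on 'end' only where it stands as a whole word, using re.split with \b word boundaries. The function name extract and its terminator parameter are made up for illustration, and the sample string has an extra 'send' added to show the difference.

```python
import re

def extract(text, keywords, terminator='end'):
    """Return (keyword, text) pairs for the keyword closest before each terminator."""
    found = []
    # Split only on the terminator as a whole word, so 'bend' or 'send' stay intact
    parts = re.split(r'\b' + re.escape(terminator) + r'\b', text)
    for part in parts[:-1]:  # the last piece has no terminator after it
        best = None
        for key in keywords:
            # Collect whole-word occurrences of this keyword and keep the last one
            hits = list(re.finditer(r'\b' + re.escape(key) + r'\b', part))
            if hits and (best is None or hits[-1].end() > best[0]):
                best = (hits[-1].end(), key)
        if best is not None:
            found.append((best[1], part[best[0]:].strip()))
    return found

t = "alpha 111 bravo 222 alpha send somethingA end, 333 bravo somethingB end 444 alpha 555 bravo"
print(extract(t, ['alpha', 'bravo']))
```

Unlike the split('end') version, this keeps 'send somethingA' together as the text for 'alpha'.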
romes87

Thank you both for your replies. snippsat, your answer works perfectly.

I managed to get around the problem by marking the index in the string where the word appears a second time:

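word = ['alpha', 'bravo'] #...
counter = 0

#fileString holds the text being searched
#skip past the first occurrence of the current word, then find the next one
marker0 = fileString.index(word[counter])
marker0 = marker0 + len(word[counter])
marker0 = fileString.index(word[counter], marker0)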
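
On the sample string, that idea looks roughly like this. It is only a sketch: fileString is assumed here to be the sample string from the first post, and the names first, second and stop are just for illustration. Note that str.index raises ValueError if the word is not found.

```python
fileString = 'alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo'
word = 'alpha'

# Position of the first occurrence of the word
first = fileString.index(word)
# Restart the search just past the first occurrence to get the second one
second = fileString.index(word, first + len(word))
# First 'end' that comes after the second occurrence
stop = fileString.index('end', second)

data = fileString[second + len(word):stop].strip()
print(data)
```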

But your solution looks more robust, as it doesn't matter how many times the word appears in the string before the required information!

Thanks! :)

romes87

But would using this line of code :

pattern = re.compile(r'(alpha|bravo)(\s\w+\s)(end)')

mean that the whole list of words needs to be written into the pattern manually?
The reason I am asking is that I have many of these unique words that I need to extract info for. So:

word = ['alpha', 'bravo', '....', '.....' , 'etc'] #could be quite a few here

Is there a way to use a variable within the re.compile statement?

pyTony
'|'.join(words)
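
To spell that out, here is a minimal sketch that builds the alternation from the list instead of typing it by hand, using the sample string from earlier in the thread. The re.escape call and the \b word boundaries are additions that were not in the pattern posted above; they are meant to guard against keywords containing regex metacharacters and against matches inside longer words. Like the original pattern, it only handles a single word between the keyword and 'end'.

```python
import re

string = 'alpha 111 bravo 222 alpha somethingA end, 333 bravo somethingB end 444 alpha 555 bravo'
words = ['alpha', 'bravo']  # extend with as many keywords as needed

# Join the escaped keywords into one alternation: alpha|bravo
alternation = '|'.join(re.escape(w) for w in words)
pattern = re.compile(r'\b(' + alternation + r')\s+(\w+)\s+end\b')

# group(1) is the keyword that matched, group(2) the word before 'end'
results = [(m.group(1), m.group(2)) for m in pattern.finditer(string)]
for key, value in results:
    print(key, ':', value)
```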