Hi,
I'm trying to match patterns of the following types:

sentence1 = "keywords=walter&keywords=scott"
sentence2 = "keywords=john&"
sentence3 = "keywords=james&keywords=john&keywords=brian&"

So basically the keywords=somestring& part can occur once or multiple times. I am trying to extract the string(s) between '=' and '&'. I have come up with the following so far:

pattern = re.compile(r"(keywords=[A-Z0-9._%+-]+[&])+",re.IGNORECASE)

This matches the third sentence, which contains multiple "keywords=" parts, but it does not give me the three matches.

pattern.search(sentence3).groups()

gives me ('keywords=brian&',) and not the other two keywords present in it.

What am I doing wrong?

Thanks,
Addy

All 5 Replies

Show your code perhaps?

Below is a small program using the above regex. It reads a text file line by line; the file may contain lines like

"sometext herekeywords=walter&keywords=scott"
"keywords=john&sometexthere"
"keywords=james&keywords=john& keywords=brian&" etc

import re

def getKeyWords(myline):
    # Search once and keep the match object (None if nothing matched)
    match = kwpattern.search(myline)
    if match:
        # Extract the captured keyword from the match
        return match.groups()

# Regex to match the keywords
kwpattern = re.compile(r"keywords=([A-Z0-9._%+-]+)[&]*", re.IGNORECASE)

# Open the file and read it line by line
with open('somefile.log') as o:
    for line in o:
        # Get the keywords present in the line
        kw = getKeyWords(line)

Perhaps something like this:

>>> import re
>>> patt = r'keywords=([^&]+)[&]'
>>> re.findall(patt, "keywords=james&keywords=john&keywords=brian&")
['james', 'john', 'brian']
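
For what it's worth, here is a small sketch of why the original pattern hands back only one keyword. When a capturing group is repeated with "+", the match object keeps only the last repetition of that group, even though the whole run is matched. The variable names below are made up for illustration.

```python
import re

sentence3 = "keywords=james&keywords=john&keywords=brian&"

# The original pattern: one capturing group, repeated with "+".
repeated = re.compile(r"(keywords=[A-Z0-9._%+-]+[&])+", re.IGNORECASE)
match = repeated.search(sentence3)

# The whole run is matched, but the group holds only its final repetition.
whole_match = match.group(0)
last_repetition = match.groups()

# findall scans for every non-overlapping match of the unrepeated pattern.
single = re.compile(r"keywords=([^&]+)&", re.IGNORECASE)
all_keywords = single.findall(sentence3)

print(whole_match)      # the full string
print(last_repetition)  # ('keywords=brian&',)
print(all_keywords)     # ['james', 'john', 'brian']
```

So the regex itself is not far off; it is the repeated group plus search() that hides the earlier keywords.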

Because a keyword does not always end with an '&' character, a slight adjustment:

>>> patt = r'keywords=([^&]+)'
>>> re.findall(patt, "keywords=james&keywords=john&keywords=brian")
['james', 'john', 'brian']
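
Applied to lines read from a file, this might look like the sketch below. The function name extract_keywords is hypothetical, and the character class is changed to [^&\s] on the assumption that keywords never contain whitespace; without that, the last keyword on a line would pick up the trailing newline.

```python
import re

# Assumption: a keyword runs until the next "&" or any whitespace.
KEYWORD_RE = re.compile(r"keywords=([^&\s]+)", re.IGNORECASE)

def extract_keywords(line):
    """Return every keyword found in one line (empty list if none)."""
    return KEYWORD_RE.findall(line)

sample_lines = [
    "sometext herekeywords=walter&keywords=scott\n",
    "keywords=john&sometexthere\n",
    "keywords=james&keywords=john& keywords=brian&\n",
]

for line in sample_lines:
    print(extract_keywords(line))
```

With a real log you would loop over the open file object instead of sample_lines.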

I am not a regex user, so I would do something like the following. You might prefer to find the first "&" and slice from there instead of using split and join.

test_data = ["sometext herekeywords=walter&keywords=scott",
             "keywords=john&sometexthere",
             "keywords=james&keywords=john& keywords=brian&"]

delim = 'keywords='
found_list = []
for rec in test_data:
   ## find all occurrences of 'keywords='
   start = rec.find(delim)
   ctr = 0     ## I always limit while loops when testing
   while (start > -1) and (ctr < 10):
      rec = rec[start+len(delim):]  ## everything after delim
      if "&" in rec:
         substrs = rec.split("&")
         found_list.append(substrs[0])  ## before first "&"
         rec = "&".join(substrs[1:])  ## put the rest back together
      else:
         found_list.append(rec)

      start = rec.find(delim)
      ctr += 1

print(found_list)
#
# output is ['walter', 'scott', 'john', 'james', 'john', 'brian']
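
The find-and-slice variant mentioned above could be sketched roughly like this. The function name keywords_by_slicing and its parameters are invented for the example; it is one possible way to do it, not a tested drop-in.

```python
def keywords_by_slicing(rec, delim="keywords=", sep="&"):
    """Collect the text between each delim and the following sep."""
    found = []
    start = rec.find(delim)
    while start != -1:
        value_start = start + len(delim)
        # Look for the separator after the value; fall back to end of string.
        end = rec.find(sep, value_start)
        if end == -1:
            end = len(rec)
        found.append(rec[value_start:end])
        # Continue searching after the value just consumed.
        start = rec.find(delim, end)
    return found

test_data = ["sometext herekeywords=walter&keywords=scott",
             "keywords=john&sometexthere",
             "keywords=james&keywords=john& keywords=brian&"]

found_list = []
for rec in test_data:
    found_list.extend(keywords_by_slicing(rec))

print(found_list)
```

Because each search starts past the previous value, the loop always advances and does not need a counter to guard against running forever.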