Regular expression to find all words possible

Question

arindam31 -8 Junior Poster in Training

13 Years Ago

Hi guys,

I want to find every word in a paragraph that starts with a letter of my choice, for example 'p'. I want all the possible results. How can I do that?

Take the example :

'This rule says that any match that begins earlier in the string is always preferred over any plausible match that begins later. This rule doesn\'t say anything about how long the winning match might be (we\'ll get into that shortly), merely that among all the matches possible anywhere in the string, the one that begins the leftmost in the string is chosen. Actually, since more than one plausible match can start at the same earliest point, perhaps the rule should read "a match...\'\' instead of "the match...,\'\' but that sounds odd'

The results should be
preferred
plausible
..
..
..

Also, I want to know how I can get multiple search results when we already know there will be multiple matches.

Thanks

python

JoshuaBurleson 23 Posting Whiz

13 Years Ago

Show us the code you have so far, you need to show effort around here.

TrustyTony 888 ex-Moderator

13 Years Ago

Regex differs from globbing in that * and + modify the preceding symbol, and "any character" is . rather than ?. So p+ matches one or more p's, i.e. p, pp, ppp, ... Look for the special sequences starting with \ and put a suitable one between the p and the +.

Read the re documentation on the Internet or in the IDLE help file.
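
To make the point about + concrete, here is a minimal sketch. The sample string is made up for illustration and is not from the thread.

```python
import re

sample = "apple pie, pepper"  # hypothetical sample text

# 'p+' means "one or more p", so it only ever matches runs of the letter p.
runs = re.findall(r"p+", sample)
print(runs)   # ['pp', 'p', 'p', 'pp']

# '\w' matches a single word character, so 'p\w+' is a p followed by
# one or more word characters (note that it can also start mid-word).
tails = re.findall(r"p\w+", sample)
print(tails)  # ['pple', 'pie', 'pepper']
```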

arindam31 -8 Junior Poster in Training · Answer 1 · 2011-09-28T23:28:01+00:00

Show us the code you have so far, you need to show effort around here.

Of course.

These are the things I was trying.

>>> str1

'This rule says that any match that begins earlier in the string is always preferred over any plausible match that begins later. This rule doesn\'t say anything about how long the winning match might be (we\'ll get into that shortly), merely that among all the matches possible anywhere in the string, the one that begins the leftmost in the string is chosen. Actually, since more than one plausible match can start at the same earliest point, perhaps the rule should read "a match...\'\' instead of "the match...,\'\' but that sounds odd'

>>> var=re.compile("p+")
>>> var2=var.search(str1)
>>> var2.group()

Traceback (most recent call last):
File "<pyshell#82>", line 1, in <module>
var2.group()
AttributeError: 'NoneType' object has no attribute 'group'

This makes me doubt whether the '+' sign can be used with letters.
I was expecting to get at least the first match.
Also, I have no idea how to get multiple search results.

By the way, thank you for replying.
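
An AttributeError like the one above is what Python raises when search() finds nothing and returns None. A minimal sketch of guarding against that case follows; the helper name first_match is hypothetical and the sample strings are made up.

```python
import re

def first_match(pattern, text):
    """Return the first matched substring, or None when nothing matches."""
    found = re.search(pattern, text)
    # search() returns None on failure, so check before calling group().
    return found.group() if found else None

hit = first_match(r"p+", "apple")    # 'pp'
miss = first_match(r"p+", "banana")  # None: there is no p to match
print(hit, miss)
```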

arindam31 -8 Junior Poster in Training · Answer 2 · 2011-09-28T23:40:43+00:00

This is what happens when I use this code.

>>> var=re.compile("per+")
>>> var2=var.search(str1)
>>> var2.group()
'per'

I can't understand why the plus sign is not working here.
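
For what it's worth, the + here is working: it modifies only the r directly before it, so the pattern reads "p, then e, then one or more r". A small sketch with made-up test words:

```python
import re

# In 'per+' the + applies to the 'r' alone, not to the whole 'per'.
pattern = re.compile(r"per+")

results = [pattern.search(word).group() for word in ("perhaps", "perry")]
print(results)  # ['per', 'perr']
```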

arindam31 -8 Junior Poster in Training · Answer 3 · 2011-09-28T23:50:31+00:00

Regex differs from globbing in that * and + modify the preceding symbol, and "any character" is . rather than ?. So p+ matches one or more p's, i.e. p, pp, ppp, ... Look for the special sequences starting with \ and put a suitable one between the p and the +.
Read the re documentation on the Internet or in the IDLE help file.

Thank you, boss. That was indeed the problem. The code below worked.

>>> var=re.compile("p\w+")
>>> var2=var.search(str1)
>>> var2.group()
'preferred'

Now, for the second part: how can I get all the results matching the pattern?

Will findall() work?

arindam31 -8 Junior Poster in Training · Answer 4 · 2011-09-29T00:19:47+00:00

Hurrah, I found the solution myself... Eh, sorry for that.

>>> var=re.compile(r"p\w+")
>>> var2=[]
>>> var2=var.findall(str1)
>>> var2
['preferred', 'plausible', 'possible', 'plausible', 'point', 'perhaps']

findall() does work.
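
One caveat: p\w+ can also start in the middle of a word, and it skips a capital P. That happens not to matter for the paragraph in this thread, but a sketch of a stricter variant, assuming you want whole words only and case-insensitive matching, could look like this (the sample sentence is made up):

```python
import re

sample = "A happy puppy picked up a Pear"  # hypothetical sample text

# Without a word boundary, the match can begin inside 'happy'.
loose = re.findall(r"p\w+", sample)
print(loose)   # ['ppy', 'puppy', 'picked']

# '\b' anchors the p to the start of a word; IGNORECASE also accepts 'P'.
strict = re.findall(r"\bp\w*", sample, flags=re.IGNORECASE)
print(strict)  # ['puppy', 'picked', 'Pear']
```

Using \w* rather than \w+ also lets a one-letter word "p" count as a match; keep \w+ if that is not wanted.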

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 5 · 2011-09-29T00:28:35+00:00

Yes, I also checked it in plain Python, without re:

print(','.join(w for w in (''.join(c for c in word if c.isalpha()) for word in 'This rule says that any match that begins earlier in the string is always preferred over any plausible match that begins later. This rule doesn\'t say anything about how long the winning match might be (we\'ll get into that shortly), merely that among all the matches possible anywhere in the string, the one that begins the leftmost in the string is chosen. Actually, since more than one plausible match can start at the same earliest point, perhaps the rule should read "a match...\'\' instead of "the match...,\'\' but that sounds odd'.split()) if w.startswith('p')))
preferred,plausible,possible,plausible,point,perhaps

snippsat 661 Master Poster · Answer 6 · 2011-09-29T04:08:45+00:00

Just a note: var2=[] is not needed, as findall() already returns a list. finditer() can also be a good solution when iterating over larger files.
Always use a raw string with regex: r' '

import re

data = '''\
This rule says that any match that begins earlier
in the string is always preferred over any plausible match that begins later.'''

pattern = re.compile(r'p\w+')
for match in pattern.finditer(data):
    print(match.group())

'''Out-->
preferred
plausible
'''
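
Because finditer() yields match objects rather than plain strings, it can also report where each word was found. A minimal sketch, assuming a short made-up sample and a word-boundary variant of the pattern:

```python
import re

data = "any plausible match is preferred"  # hypothetical short sample
pattern = re.compile(r"\bp\w+")

# Each match object knows its own start offset as well as its text.
hits = [(m.start(), m.group()) for m in pattern.finditer(data)]
print(hits)  # [(4, 'plausible'), (23, 'preferred')]
```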

arindam31 -8 Junior Poster in Training · Answer 7 · 2011-09-29T15:47:47+00:00

Nice tip, dude. I thought we had to create the list variable ourselves first. Thanks for the correction.
