Regex problem (search, findall)

Question

mnmo88 0 Newbie Poster

13 Years Ago

Hi guys,
This is my first post here. I'm a big fan of the site, and so far every time I've gotten stuck on something at uni I've come here, so thanks for helping me survive this far.

Anyway, I'm here because I need something. Well, I would've said thanks either way <<< liar

I'm writing a Python parser that should find any time format in a given source.
I thought I was doing well until I realized that the search function in re only returns the first occurrence of the pattern given to it. So I used findall with the same pattern, and this is what happened:

>>> import re
>>> hourRE = re.compile(r"[0-2]?\d(:[0-5]\d(-[0-2]?\d:[0-5]\d|[ap]m)?|[ap]m)", re.IGNORECASE)
>>> temp = hourRE.search("10am , 10:00 , 3pm")
>>> temp.group()
'10am'
>>> temp = hourRE.findall("10am , 10:00 , 3pm")
>>> temp
[('am', ''), (':00', ''), ('pm', '')]

Could anyone tell me why this happens and what it means?

python regex


All 4 Replies

griswolf 304 Veteran Poster

13 Years Ago

Each parenthesized sub-expression creates a group. http://docs.python.org/library/re.html#regular-expression-syntax (look for (...) ) When the pattern contains groups, the findall method returns the groups rather than the whole match.
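As a rough illustration of that rule, here is a small sketch showing how the shape of findall's result changes with the number of groups in the pattern. The three short patterns are made up for this example and are not the poster's pattern.

```python
import re

text = "10am , 10:00 , 3pm"

# No groups: findall returns the full text of each match.
no_groups = re.findall(r"\d+[ap]m", text)

# One group: findall returns only what that group captured.
one_group = re.findall(r"\d+([ap]m)", text)

# Two groups: findall returns one tuple of captures per match.
two_groups = re.findall(r"(\d+)([ap]m)", text)

print(no_groups)   # ['10am', '3pm']
print(one_group)   # ['am', 'pm']
print(two_groups)  # [('10', 'am'), ('3', 'pm')]
```

The original pattern has two groups, one nested inside the other, which is why each findall result is a 2-tuple.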


mnmo88 0 Newbie Poster · Answer 1 · 2011-01-21T05:06:23+00:00

Hey griswolf,
thanks for the fast reply.

OK, I'm fine with the grouping, but why does search() find the whole time while findall() skips the first two digits and finds an empty character, or whatever this '' is, even though both results come from the same pattern?

griswolf 304 Veteran Poster · Answer 2 · 2011-01-21T05:16:09+00:00

search is returning the first entire match. findall is returning every match, but for each one it gives only the text captured by the groups, not the whole match. Try running it with an example that has more than one match. (I actually find their docs a little confusing right here. Running simple examples is a good way to learn what they should have said.)

Edit to add:
I found this example useful:

import re

# Two capturing groups: group 1 may repeat zero times (*), group 2 at least once (+).
pattern = r'([abc]?\d)*([def]?\d)+'
patRe = re.compile(pattern)

tests = [
  'xd1x', 'xd1xd2xd3', 'xxa1d1xxd2xxa3d3xx',
  ]
for t in tests:
  # search: the whole text of the first match only
  print("S (%s)" % t, patRe.search(t).group())
  # findall: one tuple of group captures for every match
  print("F (%s)" % t, patRe.findall(t))
"""output:
S (xd1x) d1
F (xd1x) [('', 'd1')]
S (xd1xd2xd3) d1
F (xd1xd2xd3) [('', 'd1'), ('', 'd2'), ('', 'd3')]
S (xxa1d1xxd2xxa3d3xx) a1d1
F (xxa1d1xxd2xxa3d3xx) [('a1', 'd1'), ('', 'd2'), ('a3', 'd3')]
"""

See that findall returns a list of tuples. Each tuple is one match, and each item in the tuple is the part of the string that matched the corresponding group in that match. A group that did not take part in the match shows up as ''.
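To see the whole match and the group captures side by side for the pattern from the question, one option is re's finditer, which yields match objects instead of strings. This is only a sketch; the names hour_re and rows are made up here.

```python
import re

# The pattern from the question, unchanged.
hour_re = re.compile(
    r"[0-2]?\d(:[0-5]\d(-[0-2]?\d:[0-5]\d|[ap]m)?|[ap]m)", re.IGNORECASE
)

rows = []
for m in hour_re.finditer("10am , 10:00 , 3pm"):
    # group(0) is the whole match; groups() holds the captured groups.
    # A group that did not participate is None here ('' in findall).
    rows.append((m.group(0), m.groups()))
    print(m.group(0), m.groups())
```

With this input the inner group never participates, which is where the second, empty item in each findall tuple comes from.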

mnmo88 0 Newbie Poster · Answer 3 · 2011-01-21T05:44:24+00:00

I think I know the problem now. I think the problem is in the regex pattern, because when I tried a simple example as you said, this is what turned out:

>>> time = re.compile("[0-9][ap]m")
>>> time.findall("7am 8am 2am")
['7am', '8am', '2am']
>>> time.search("7am 8am 2am").group()
'7am'
>>>

Thanks for your help. I'll try to fix the regex. By the way, did you see any mistakes in it?
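One possible way to get whole times back from findall, not something suggested in the thread itself, is to turn the capturing groups into non-capturing ones with (?:...). The sketch below keeps the original pattern otherwise unchanged; the names hour_re, times and more are made up for illustration.

```python
import re

# Same pattern as in the question, but with (?:...) instead of (...),
# so the pattern has no capturing groups and findall returns whole matches.
hour_re = re.compile(
    r"[0-2]?\d(?::[0-5]\d(?:-[0-2]?\d:[0-5]\d|[ap]m)?|[ap]m)", re.IGNORECASE
)

times = hour_re.findall("10am , 10:00 , 3pm")
print(times)  # ['10am', '10:00', '3pm']

more = hour_re.findall("9:00-10:30 and 7:15pm")
print(more)   # ['9:00-10:30', '7:15pm']
```

This only changes what findall reports; it does not change which strings the pattern accepts.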
