
Hi guys,
This is my first time posting here. I'm a big fan of the site, and so far every time I've gotten stuck on something at uni I've come here, so thanks for helping me survive.

Anyway, I'm here because I need something (well, I would've said thanks either way <<< liar).

I'm writing a Python parser that can find any time format in a given source.
I thought I was doing well until I realized that the search function in re only returns the first occurrence of the pattern given to it. So I used findall with the same pattern, and this is what happened:

>>> import re
>>> hourRE = re.compile(r"[0-2]?\d(:[0-5]\d(-[0-2]?\d:[0-5]\d|[ap]m)?|[ap]m)", re.IGNORECASE)
>>> temp = hourRE.search("10am , 10:00 , 3pm")
>>> temp.group()
'10am'
>>> temp = hourRE.findall("10am , 10:00 , 3pm")
>>> temp
[('am', ''), (':00', ''), ('pm', '')]

Could anyone tell me why this happens and what it means?

Last Post by mnmo88
0

hey griswolf,
thanks for the fast reply.

OK, I'm fine with the grouping, but why does search() find the whole time while findall() skips the leading digits and returns an empty string (or whatever this '' is), even though both results come from the same pattern?

0

search returns the first match, and calling group() on it gives the entire matched text. findall, when the pattern contains groups, returns only the groups of each match rather than the whole match. Try running it with an example that has more than one match. (I actually find their docs a little confusing right here. Running simple examples is a good way to learn what they should have said.)

edit add:
I found this example useful

import re

# Two capturing groups, so findall returns a 2-tuple per match
pattern = r'([abc]?\d)*([def]?\d)+'
patRe = re.compile(pattern)

tests = [
  'xd1x','xd1xd2xd3','xxa1d1xxd2xxa3d3xx',
  ]
for t in tests:
  # search: first match only, group() is the whole matched text
  print("S (%s)" % t, patRe.search(t).group())
  # findall: every match, but only the captured groups
  print("F (%s)" % t, patRe.findall(t))
"""output:
S (xd1x) d1
F (xd1x) [('', 'd1')]
S (xd1xd2xd3) d1
F (xd1xd2xd3) [('', 'd1'), ('', 'd2'), ('', 'd3')]
S (xxa1d1xxd2xxa3d3xx) a1d1
F (xxa1d1xxd2xxa3d3xx) [('a1', 'd1'), ('', 'd2'), ('a3', 'd3')]
"""

See that findall shows a list of tuples. Each tuple is one match, and the items in the tuple are the parts of the string that matched each group in that match. An empty string '' means that group matched nothing in that match.
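
If you want the whole matched text for every match while keeping the groups in the pattern, one option is finditer, which yields match objects instead of group tuples. A minimal sketch using the pattern and test string from the original question (the names hour_re, groups_only and whole_matches are just for illustration):

```python
import re

# Pattern copied from the original question; it has two capturing groups
hour_re = re.compile(r"[0-2]?\d(:[0-5]\d(-[0-2]?\d:[0-5]\d|[ap]m)?|[ap]m)", re.IGNORECASE)
text = "10am , 10:00 , 3pm"

# findall returns only the captured groups because the pattern has groups
groups_only = hour_re.findall(text)

# finditer yields one match object per match; group() is the full matched text
whole_matches = [m.group() for m in hour_re.finditer(text)]
print(whole_matches)
```

Each match object also gives you m.groups() if you still need the individual pieces.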

Edited by griswolf: n/a

0

I think I know the problem now: it's in the regex pattern.
When I tried a simple example as you said, this is what turned out:

>>> time = re.compile("[0-9][ap]m")
>>> time.findall("7am 8am 2am")
['7am', '8am', '2am']
>>> time.search("7am 8am 2am").group()
'7am'
>>>

Thanks for your help, I'll try to fix the regex. By the way, did you see any mistakes in it?
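
The simple pattern behaves differently because it has no parentheses, so findall has no groups to return and falls back to the whole match. A possible fix for the original pattern, as a sketch only, is to turn its groups into non-capturing ones with (?:...). The variable names and the extra range in the sample string are made up for illustration:

```python
import re

# Same structure as the original pattern, but every group is non-capturing,
# so findall returns the full matched text instead of group tuples
hour_re = re.compile(
    r"[0-2]?\d(?::[0-5]\d(?:-[0-2]?\d:[0-5]\d|[ap]m)?|[ap]m)",
    re.IGNORECASE,
)

times = hour_re.findall("10am , 10:00 , 3pm , 9:00-10:30")
print(times)
```

One thing to watch, separate from the grouping: [0-2]?\d also accepts hours such as 29, so the hour part may need tightening depending on what should count as a valid time.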

Edited by mnmo88: n/a
