hey, I tried it with
mpaaget = re.compile('<div class="info-content">(.*?)</div>')
but then I got something else. Could it be because there is a new line after <div class="info-content">? How do I take care of that?
Yes, your regular expression doesn't allow for the whitespace (including that newline) between the tags. Modify it like so to match zero or more (*) whitespace characters (\s):
>>> m = re.compile(r'<h5><a href="/mpaa">MPAA</a>:</h5>\s*<div class="info-content">\s*(.*?)\s*</div>')
>>> m.findall(h)
['Rated PG for some scary moments and mild language. (also 2009 extended version)']
>>> m.match(h)
>>>
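A self-contained version of the session above, as a sketch. The real page source h isn't shown in the thread, so the HTML fragment below is a made-up stand-in with line breaks around the rating text. It also shows why m.match(h) printed nothing: match() only tries at the very start of the string, whereas findall() and search() scan the whole string.

```python
import re

# Hypothetical stand-in for the page source `h`; note the newlines and
# indentation between the tags, which a bare '>(.*?)<' would not allow for.
h = '''<div class="txt-block">
<h5><a href="/mpaa">MPAA</a>:</h5>
<div class="info-content">
  Rated PG for some scary moments and mild language.
</div>
</div>'''

# \s* soaks up any run of whitespace (spaces, tabs, newlines) between tags.
m = re.compile(r'<h5><a href="/mpaa">MPAA</a>:</h5>\s*<div class="info-content">\s*(.*?)\s*</div>')

ratings = m.findall(h)
print(ratings)  # ['Rated PG for some scary moments and mild language.']

# match() only succeeds if the pattern matches at position 0 of the string;
# here the string starts with '<div class="txt-block">', so it returns None.
print(m.match(h))  # None

# search() scans forward for the first match anywhere in the string.
print(m.search(h).group(1))
```

Because the group is lazy (.*?) and followed by \s*, the trailing newline is left out of the captured text rather than ending up inside it.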
jlm699
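This time it's because '?' is a special character in regular expressions, and your pattern uses it literally in the URL (/List?ratings=7). After an ordinary character, the question mark is a quantifier meaning 0 or 1 of that character (i.e. optional), just as the asterisk (*) means 0 or more and the plus (+) means 1 or more. Placed after another quantifier, as in (.*?), it instead makes that match non-greedy. To match the question mark character itself you need to escape it in your regex like so: \? . The full regular expression then becomes: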
>>> c = re.compile(r'<a href="/List\?ratings=7">(.*?)</a>')
>>> c.findall(t)
['7.2']
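A quick sketch of both meanings of '?' discussed above. The fragment t below is an assumption (the thread doesn't show the real markup). Unescaped, '?' makes the preceding 't' optional, so the pattern looks for /Lisratings or /Listratings and never sees the literal '?'. Escaping it by hand, or with re.escape(), fixes that; the '?' after '.*' is what keeps the capture lazy.

```python
import re

# Hypothetical fragment standing in for the page source `t`.
t = '<a href="/List?ratings=7">7.2</a> (<a href="/List?votes=1">1,234 votes</a>)'

# Unescaped: '?' means "the previous character ('t') is optional",
# so this pattern can never match the literal text '/List?ratings'.
bad = re.compile(r'<a href="/List?ratings=7">(.*?)</a>')
print(bad.findall(t))  # []

# Escaped: '\?' matches a literal question mark.
good = re.compile(r'<a href="/List\?ratings=7">(.*?)</a>')
print(good.findall(t))  # ['7.2']

# re.escape() backslash-escapes any regex metacharacters in a literal string.
href = re.escape('/List?ratings=7')
print(re.findall(r'<a href="' + href + r'">(.*?)</a>', t))  # ['7.2']

# Greedy vs lazy: '.*' runs on to the LAST '</a>' in the string,
# swallowing the second link, while '.*?' stops at the first one.
print(re.findall(r'<a href="/List\?ratings=7">(.*)</a>', t))
```

re.escape() is handy when the literal part comes from a variable, since you don't have to remember which characters are special.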
jlm699
These are flags for compile():
re.MULTILINE (or re.M): ^ and $ match at the start and end of the string and of each line
re.DOTALL (or re.S): . matches any character, including a newline
re.IGNORECASE (or re.I): case-insensitive matching
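A minimal sketch of the three flags, assuming a made-up HTML snippet with an upper-case tag and text that spans two lines. Flags are passed as the second argument to re.compile() (or as the flags argument of re.findall()) and can be combined with |.

```python
import re

# Hypothetical markup: mixed-case tag, and the text spans two lines.
html = '''<DIV class="info-content">
Rated PG for some scary moments
and mild language.
</div>'''

pattern = r'<div class="info-content">\s*(.*?)\s*</div>'

# No flags: '<div' does not match '<DIV', so nothing is found.
print(re.compile(pattern).findall(html))  # []

# re.I alone fixes the case, but '.' still refuses to cross '\n'.
print(re.compile(pattern, re.I).findall(html))  # []

# re.S (DOTALL) lets '.' match newlines too; flags combine with '|'.
multi = re.compile(pattern, re.S | re.I)
print(multi.findall(html))  # ['Rated PG for some scary moments\nand mild language.']

# re.M (MULTILINE): '^' and '$' match at each line, not only at the string's ends.
text = 'first line\nsecond line\nthird line'
print(re.findall(r'^\w+', text))        # ['first']
print(re.findall(r'^\w+', text, re.M))  # ['first', 'second', 'third']
```

Note that re.S is what would have solved the original newline problem without \s*, although the captured text then keeps any embedded newlines.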
vegaseat
Question Answered as of 3 Years Ago by ghostdog74, jlm699 and vegaseat