As a personal project, I've decided to write a small script that takes a film title from `raw_input`, looks up its IMDB rating, and returns the result. As an extra challenge, I decided to use the `re` module.
This is how far I have got (yes, I have yet to wrap most things in functions; I will do that once I have ironed out the following problems):
```python
from BeautifulSoup import BeautifulSoup
import urllib2
import re

# get source code of page (function used later)
def fetchsource(url):
    url = urllib2.urlopen(url)
    source = url.read()
    return source

# ask for film title
title = raw_input("Please enter a film title: ")

# format the raw_input string for searching
raw_string = re.compile(' ')  # search for a space in string
searchstring = raw_string.sub('+', title)  # replace with +
print searchstring

# find the film page url
url = "http://www.imdb.com/find?s=" + searchstring
print url
source = fetchsource(url)
soup = BeautifulSoup(source)
filmlink = soup.find('a', href=re.compile("title\/tt[0-9]*\/"))
print filmlink
```
If you run this code, it prints the search string and the search URL correctly. The problem is that my regex for extracting the film page URL from the search results page never matches anything, so `filmlink` is always `None`. I'm not sure why nothing is being found.
Is my regex bad, or have I not put the right options in?
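One way to narrow down whether the regex itself is at fault is to test it on its own, outside BeautifulSoup. The href values below are hypothetical examples written for illustration, not taken from a real search results page. The sketch also shows the difference between `search()`, which looks for the pattern anywhere in the string, and `match()`, which only tries at the very start; that matters here because the hypothetical hrefs begin with a slash or a full `http://` prefix.

```python
import re

# The same pattern as in the question, written as a raw string.
# Forward slashes need no escaping in a Python regex, so "title/tt[0-9]*/"
# matches the same strings as "title\/tt[0-9]*\/".
# Note: [0-9]* also accepts zero digits; [0-9]+ would require at least one.
film_pattern = re.compile(r"title/tt[0-9]*/")

# Hypothetical href values, assumed for illustration only.
sample_hrefs = [
    "/title/tt0111161/",
    "http://www.imdb.com/title/tt0068646/",
    "/name/nm0000151/",
]

for href in sample_hrefs:
    # search() scans the whole string; match() only tries position 0.
    found_anywhere = film_pattern.search(href) is not None
    found_at_start = film_pattern.match(href) is not None
    print("%s search=%s match=%s" % (href, found_anywhere, found_at_start))
```

If the pattern finds strings like these with `search()` but `soup.find()` still returns `None`, that would suggest the problem lies in the HTML actually being fetched rather than in the regex.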
Also, I don't fully understand what `re.compile()` is doing, even though it works! Could somebody explain it in a sentence or two?
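In short, `re.compile()` turns a pattern string into a reusable pattern object, and that object's methods (`.sub()`, `.search()`, `.match()`) do the same job as the module-level functions `re.sub()`, `re.search()` and `re.match()` called with the pattern string. In the `soup.find()` call, passing a compiled pattern as `href` is what tells BeautifulSoup to test each link's href against the regex instead of requiring an exact string match. A minimal sketch of the equivalence, using the space-to-plus substitution from the code above and a sample title assumed for illustration:

```python
import re

title = "The Godfather Part II"  # sample input, assumed for illustration

# Compile once: the result is a pattern object that can be reused.
space_pattern = re.compile(" ")
compiled_result = space_pattern.sub("+", title)

# The module-level function compiles (and caches) the pattern behind the
# scenes, so it produces the same result.
module_result = re.sub(" ", "+", title)

# For a fixed single character, plain string replacement does the same job.
replace_result = title.replace(" ", "+")

print(compiled_result)  # prints The+Godfather+Part+II
```

Compiling mainly pays off when the same pattern is reused many times; for a one-off replacement of a single character, `str.replace()` is the simpler choice.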
Many thanks for your help.