I am trying to use Beautiful Soup to scrape some information from a website, Locationary.com. I am a member, but even when I'm logged in, this doesn't work.
This first bit of code prints the HTML of the Locationary.com home page in a "pretty" form, and it works:
    import urllib2
    from BeautifulSoup import BeautifulSoup

    # Fetch the home page and pretty-print its HTML (Python 2, Beautiful Soup 3)
    page = urllib2.urlopen('http://www.locationary.com/').read()
    soup = BeautifulSoup(page)
    print soup.prettify()
However, when I request a longer URL, such as one of the site's place pages, I get a bad result:
    import urllib2
    from BeautifulSoup import BeautifulSoup

    # Same code, but fetching a place page instead of the home page
    page = urllib2.urlopen('http://www.locationary.com/place/en/US/North_Carolina/Raleigh/Noodles_%26_Company-p1022884996.jsp').read()
    soup = BeautifulSoup(page)
    print soup.prettify()
With the above code, Python prints something like this:
‹ (followed by a large dot that won't copy)
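One way to see what those characters actually are is to look at the raw bytes as hex instead of printing them through the console's code page. The sketch below is Python 3 (not the Python 2 used above), and `hex_prefix` and `looks_like_gzip` are hypothetical helper names. For reference, the byte 0x8B displays as "‹" in Windows-1252, and every gzip stream begins with the bytes 1f 8b, so a gzip-compressed response is one plausible explanation, though that is only a guess here.

```python
import gzip

# First two bytes of every gzip stream (RFC 1952)
GZIP_MAGIC = b"\x1f\x8b"


def hex_prefix(body: bytes, n: int = 8) -> str:
    """Return the first n bytes of body as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in body[:n])


def looks_like_gzip(body: bytes) -> bool:
    """True if body starts with the gzip magic number."""
    return body.startswith(GZIP_MAGIC)


if __name__ == "__main__":
    # Demo on locally compressed data; with a real page you would pass the
    # bytes returned by .read() instead.
    sample = gzip.compress(b"<html></html>")
    print(hex_prefix(sample, 3))     # 1f 8b 08
    print(looks_like_gzip(sample))   # True
```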
Does anybody know why this is happening? Why do I get the HTML of the home page but not of the site's other pages? And what are these few strange characters Python is printing?
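If the response body does start with the gzip magic bytes 1f 8b, a possible workaround is to decompress it before handing it to Beautiful Soup. This is a minimal Python 3 sketch under that assumption (it has not been checked against Locationary.com); `decode_body` and `fetch_html` are hypothetical names, and only the standard library (`urllib.request`, `gzip`, `zlib`) is used. It checks both the `Content-Encoding` header and the magic bytes, since a server may compress a body without declaring it.

```python
import gzip
import urllib.request
import zlib
from typing import Optional


def decode_body(raw: bytes, content_encoding: Optional[str] = None) -> bytes:
    """Undo gzip/deflate compression if the header or magic bytes indicate it."""
    encoding = (content_encoding or "").strip().lower()
    # Trust either the declared encoding or the gzip magic number
    if encoding == "gzip" or raw.startswith(b"\x1f\x8b"):
        return gzip.decompress(raw)
    if encoding == "deflate":
        # "deflate" may be zlib-wrapped or raw; try zlib-wrapped first
        try:
            return zlib.decompress(raw)
        except zlib.error:
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    # Not compressed: return the bytes unchanged
    return raw


def fetch_html(url: str) -> bytes:
    """Fetch url and return the body bytes, decompressed if necessary."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        return decode_body(raw, response.headers.get("Content-Encoding"))
```

The bytes returned by `fetch_html` could then be passed to `BeautifulSoup(...)` in place of `page`.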
I would appreciate any help. Thanks!