I want to fetch a web page and parse the links in it. I am using the following code:

import urllib

file = urllib.urlopen("file:///home/suresh/html_parser/Category:Sports.html")
content = file.read()  # raw bytes (a Python 2 str), not yet decoded
# Process the page.

But since the page contains UTF-8 content, it's not able to parse it properly. If I save the page locally, then it's able to parse it. How can I handle this problem?

Are you getting an error when you try to parse it?

What about:

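# Decode the UTF-8 bytes first; calling .encode() directly on a Python 2
# byte string with non-ASCII bytes raises UnicodeDecodeError.
content = file.read().decode('utf-8').encode('ascii', 'ignore')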
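
Note that stripping to ASCII throws away the non-ASCII characters, including any that appear inside link targets. If you need to keep them, another option is to decode the bytes as UTF-8 and parse the resulting text. Here is a minimal sketch of that idea, assuming Python 3 (the question's `urllib.urlopen` is Python 2) and assuming the page really is UTF-8. `LinkCollector`, `extract_links` and the sample markup are hypothetical names made up for illustration, not anything from the original code.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(raw_bytes, encoding="utf-8"):
    """Decode raw page bytes (assumed UTF-8) and return the link targets."""
    # errors="replace" keeps going on malformed bytes instead of raising
    text = raw_bytes.decode(encoding, errors="replace")
    parser = LinkCollector()
    parser.feed(text)
    parser.close()
    return parser.links


# Stand-in for the bytes returned by read(); the markup is made up.
sample = '<a href="/wiki/Fußball">Fußball</a> <a href="/wiki/Tennis">Tennis</a>'.encode("utf-8")
links = extract_links(sample)
print(links)
```

In real use you would pass the bytes you read from the page instead of `sample`. If the page is not actually UTF-8, the `encoding` argument would need to change.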