Hello,

I want to fetch a web page and parse the links in it. I am using the following code:

import urllib

file = urllib.urlopen("file:///home/suresh/html_parser/Category:Sports.html")
content = file.read()
# Process the page.

But since the page contains UTF-8 content, it's not able to parse it properly. However, if I save the page locally, then it is able to parse it. How can I handle this problem?

Are you getting an error when you try to parse it?

What about:

content = file.read().encode('ascii', 'ignore')

?
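
Another option, if dropping the non-ASCII characters is not acceptable, might be to decode the bytes as UTF-8 before parsing. Below is a minimal sketch of that idea, assuming Python 3 (where urlopen lives in urllib.request rather than urllib) and assuming the page really is UTF-8. The names LinkCollector, extract_links and fetch_links are made up for illustration and are not from the original code.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(raw_bytes, encoding="utf-8"):
    # Decode the raw bytes to text first. The encoding is an assumption;
    # errors="replace" substitutes a placeholder for undecodable bytes
    # instead of raising an exception.
    text = raw_bytes.decode(encoding, errors="replace")
    parser = LinkCollector()
    parser.feed(text)
    parser.close()
    return parser.links


def fetch_links(url):
    # Works for file:// URLs as well as http:// ones.
    with urlopen(url) as response:
        return extract_links(response.read())
```

You would then call something like fetch_links("file:///home/suresh/html_parser/Category:Sports.html") and get back a list of href strings with the non-ASCII characters preserved.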
