
Hello,

I want to fetch a web page and parse the links in it. I am using the following code:

import urllib

file = urllib.urlopen("file:///home/suresh/html_parser/Category:Sports.html")
content = file.read()
# Process the page.

But since the page contains UTF-8 content, it isn't parsed properly. If I save the page locally, though, it parses fine. How can I handle this problem?
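One common approach is to decode the raw bytes explicitly as UTF-8 before handing them to the parser, so non-ASCII characters survive. Below is a minimal sketch in Python 3 (not the Python 2 `urllib.urlopen` used above), assuming the page really is UTF-8. `LinkCollector` and `extract_links` are hypothetical names, and the link extraction uses the standard `html.parser.HTMLParser`.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; a value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(raw_bytes, encoding="utf-8"):
    # Assumption: the page is UTF-8. Decode explicitly so non-ASCII text
    # is kept; undecodable bytes become U+FFFD instead of raising.
    text = raw_bytes.decode(encoding, errors="replace")
    collector = LinkCollector()
    collector.feed(text)
    collector.close()
    return collector.links


sample = '<a href="/wiki/Fútbol">Fútbol</a> <a href="/wiki/Cricket">Cricket</a>'.encode("utf-8")
print(extract_links(sample))  # ['/wiki/Fútbol', '/wiki/Cricket']
```

For the real page you would pass the bytes from `urllib.request.urlopen(...).read()`. For an HTTP response, `page.headers.get_content_charset()` may report the declared charset. A `file://` URL usually carries none, so falling back to UTF-8 is an assumption.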

Last Post by tyincali

Are you getting an error when you try to parse it?

What about decoding the bytes and then dropping the non-ASCII characters?

content = file.read().decode('utf-8', 'ignore').encode('ascii', 'ignore')
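Calling `.encode('ascii', 'ignore')` directly on the raw page is a problem. In Python 2, a byte string is implicitly decoded as ASCII first, which raises `UnicodeDecodeError` on non-ASCII bytes. In Python 3, bytes have no `.encode()` at all. A minimal Python 3 sketch of the decode-then-strip idea follows. `to_ascii_lossy` is a hypothetical name, and it assumes UTF-8 input.

```python
def to_ascii_lossy(raw_bytes, encoding="utf-8"):
    # Decode first (assumed UTF-8); invalid bytes become U+FFFD.
    text = raw_bytes.decode(encoding, errors="replace")
    # Then drop every character with no ASCII representation.
    return text.encode("ascii", errors="ignore").decode("ascii")


raw = 'Café <a href="/x">Fútbol</a>'.encode("utf-8")
print(to_ascii_lossy(raw))  # Caf <a href="/x">Ftbol</a>
```

This trades correctness for simplicity. The markup stays parseable, but any non-ASCII characters in link text or URLs are silently lost, as "Fútbol" becoming "Ftbol" shows.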
