Python and Unicode

Question

suresh_iyengar 0 Newbie Poster

15 Years Ago

Hello,

I want to fetch a web page and parse the links in it. I am using the following code:

import urllib

file = urllib.urlopen("file:///home/suresh/html_parser/Category:Sports.html")
content = file.read()  # raw bytes, not yet decoded
# Process the page.

But since the page contains UTF-8 content, it's not able to parse it properly. If I save the page locally, then it is able to parse it. How do I handle this problem?

python



tyincali 21 Light Poster · Answer 1 · 2008-10-29T01:57:42+00:00

Are you getting an error when you try to parse it?

What about decoding it as UTF-8 first and then dropping the non-ASCII characters?

content = file.read().decode('utf-8').encode('ascii', 'ignore')
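
If you would rather keep the non-ASCII text than throw it away, one option is to decode the bytes to Unicode and parse that. Below is a minimal sketch, assuming Python 3 and the standard library's html.parser module (the original post uses Python 2's urllib.urlopen). The names LinkCollector and extract_links, and the sample markup, are made up for illustration and are not from the original code.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(raw_bytes, encoding="utf-8"):
    # Decode the bytes to text before parsing. The encoding is an
    # assumption (UTF-8, as stated in the question); errors="replace"
    # keeps parsing going if a few bytes are invalid.
    text = raw_bytes.decode(encoding, errors="replace")
    collector = LinkCollector()
    collector.feed(text)
    collector.close()
    return collector.links


# Made-up sample standing in for the bytes returned by read().
sample = '<a href="/wiki/Cricket">Caf\u00e9</a> <a href="/wiki/Football">Football</a>'.encode("utf-8")
print(extract_links(sample))
```

In Python 3 the bytes would come from urllib.request.urlopen(url).read() instead of urllib.urlopen(url).read(); the decoding step stays the same.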