Python and Unicode


Post #1 by suresh_iyengar (Newbie Poster, joined Oct 2008), Oct 27th, 2008
Hello,

I want to fetch a web page and parse the links in it. I am using the following code:

  import urllib

  file = urllib.urlopen("file:///home/suresh/html_parser/Category:Sports.html")
  content = file.read()  # raw bytes, not yet decoded from UTF-8
  # Process the page.

But because the page contains UTF-8 content, the code cannot parse it properly. If I save the page locally, however, it parses fine. How can I handle this problem?
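A common way to handle this is to decode the bytes into text before parsing. The following is a minimal Python 3 sketch, not the poster's code (the thread uses Python 2, where `urllib.urlopen` later became `urllib.request.urlopen`). It uses the standard-library `html.parser.HTMLParser`, takes the charset from the response's Content-Type header and assumes UTF-8 when none is given. `LinkCollector`, `extract_links` and `fetch_links` are hypothetical helper names, and replacing undecodable bytes (`errors="replace"`) is an assumed policy.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # A bare <a href> yields value None; skip it.
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(raw: bytes, charset: str = "utf-8") -> list:
    # Decode bytes to text first, so the parser works on str, not bytes.
    # errors="replace" is an assumption: bad bytes become U+FFFD.
    text = raw.decode(charset, errors="replace")
    parser = LinkCollector()
    parser.feed(text)
    parser.close()
    return parser.links


def fetch_links(url: str) -> list:
    with urlopen(url) as response:
        # Use the server's declared charset if any; otherwise assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return extract_links(response.read(), charset)


if __name__ == "__main__":
    sample = '<a href="/wiki/F\u00fatbol">F\u00fatbol</a><a href="/wiki/Cricket">x</a>'
    print(extract_links(sample.encode("utf-8")))  # ['/wiki/Fútbol', '/wiki/Cricket']
```

For the page in the question, the call would be `fetch_links("file:///home/suresh/html_parser/Category:Sports.html")`. Keeping the decoding step separate from the parsing step makes it easy to test on byte strings without touching the network.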
Post #2 by tyincali (Light Poster, joined Oct 2008), Oct 28th, 2008
Are you getting an error when you try to parse it?

What about this?
  # Decode the UTF-8 bytes first, then drop any non-ASCII characters.
  content = file.read().decode('utf-8').encode('ascii', 'ignore')
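Stripping to ASCII with `'ignore'` silently discards every non-ASCII character. A short Python 3 illustration, using the assumed sample word "Fútbol", shows what survives each step:

```python
# The bytes as they would arrive from the server (UTF-8 encoded).
raw = "F\u00fatbol".encode("utf-8")
print(raw)  # b'F\xc3\xbatbol'

# Decoding keeps every character intact.
text = raw.decode("utf-8")
print(text)  # Fútbol

# Re-encoding to ASCII with 'ignore' drops the accented character.
ascii_only = text.encode("ascii", "ignore")
print(ascii_only)  # b'Ftbol'
```

If link text or URLs containing non-ASCII characters matter, parsing the decoded text rather than the ASCII-stripped bytes keeps them intact.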

©2003 - 2009 DaniWeb® LLC