Python and Unicode
Join Date: Oct 2008
Posts: 1
Solved Threads: 0
Hello,
I want to fetch a web page and parse the links in it. I am using the following code:
import urllib

file = urllib.urlopen("file:///home/suresh/html_parser/Category:Sports.html")
content = file.read()
# Process the page.
But since the page contains UTF-8 content, it is not parsed properly. If I save the page locally, however, it parses fine. How can I handle this problem?
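One common way to handle this, shown here as a minimal Python 3 sketch (an assumption, since the code above is Python 2), is to decode the downloaded bytes to text using the charset the server declares, falling back to UTF-8, and only then hand the text to an HTML parser. `LinkCollector`, `extract_links` and `fetch_html` are hypothetical helper names for illustration, not part of any library.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(raw, encoding="utf-8"):
    """Decode raw bytes to str first, then parse; returns the list of hrefs."""
    text = raw.decode(encoding)
    parser = LinkCollector()
    parser.feed(text)
    parser.close()
    return parser.links


def fetch_html(url):
    """Return the page's raw bytes and the charset declared in its
    Content-Type header, assuming UTF-8 when none is declared."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read(), charset
```

Usage would be along the lines of `links = extract_links(*fetch_html(url))`. Decoding before parsing means the parser works on characters rather than bytes, so multi-byte UTF-8 sequences in link text or URLs come through intact.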
Join Date: Oct 2008
Posts: 45
Solved Threads: 6
Are you getting an error when you try to parse it?
What about:
# Python 2: decode the UTF-8 bytes first, then drop any non-ASCII characters
content = file.read().decode('utf-8').encode('ascii', 'ignore')
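To illustrate the trade-off in this suggestion, here is a small Python 3 sketch (`strip_non_ascii` and `decode_utf8` are hypothetical names). Encoding to ASCII with `'ignore'` silently discards every non-ASCII character, whereas decoding the bytes as UTF-8 keeps them. In Python 3 the bytes have to be decoded before they can be re-encoded.

```python
def strip_non_ascii(raw):
    """Lossy: decode UTF-8 bytes, then drop anything outside ASCII."""
    return raw.decode("utf-8").encode("ascii", "ignore").decode("ascii")


def decode_utf8(raw):
    """Lossless: decode UTF-8 bytes to a str, keeping every character."""
    return raw.decode("utf-8")


page = "Category:Fútbol".encode("utf-8")
print(strip_non_ascii(page))  # Category:Ftbol
print(decode_utf8(page))      # Category:Fútbol
```

Stripping to ASCII can make the parser stop complaining, but it can also corrupt link targets that contain non-ASCII characters, so decoding is usually the safer choice when the links themselves matter.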





