I have a question. I'm trying to get text out of a website containing Spanish characters (like ñ or á) using urllib2, with #-*- coding: latin-1 -*- near the beginning of the Python file. However, when I write the text to a file, I get something else: for example, ñ appears as ñ (so español appears as español). Even if I manually put in a line like letter="ñ" and print it to the screen, it appears as ±. Any advice? As I mentioned in the title, I'm using Python 2.7 on Windows 7 (though I get the same output in a file on Ubuntu). Thanks in advance!
hokeysmoke
Recommended Answers
Are you sure that the site is not using UTF-8, for example?
This is printing ok in Python 2.7.2/Windows XP, both typed from keyboard and copied from this post in DaniWeb.
# -*- coding: cp1252 -*-
print 'Viva España!'
print 'español'
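One possible explanation for the ± on screen, offered as a sketch rather than a diagnosis: a latin-1 (or cp1252) source file stores ñ as the single byte 0xF1, and a Windows console set to code page 437 or 850 draws that same byte as ±. The console code page is an assumption here, since the thread does not state it. The snippet uses unicode escapes so that it behaves the same on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# The single byte a latin-1 source file stores for the letter n-tilde.
latin1_byte = u'\xf1'.encode('latin-1')

# Assumed console code pages: both map byte 0xF1 to the plus-minus sign.
seen_cp437 = latin1_byte.decode('cp437')
seen_cp850 = latin1_byte.decode('cp850')

print(repr(seen_cp437))
print(repr(seen_cp850))
```

If that is the cause, printing a unicode literal (u'español') instead of a byte string lets Python 2 encode the text for the console's own code page.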
So I checked it out, and it looks like UTF-8 to me:
# -*- coding: cp1252 -*-
import urllib2

print 'Viva España!'
print 'español'

site = urllib2.urlopen('http://www.wordreference.com/')
test = site.read()
site.close()
print …
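If the page really is UTF-8, the bytes returned by site.read() need to be decoded as UTF-8 before use. A typical symptom of reading UTF-8 bytes as cp1252 or latin-1 is that ñ turns into the two characters Ã±. Below is a minimal sketch of the decode-then-write round trip. It avoids the network by using a hard-coded byte string as a stand-in for site.read(), and the file name salida.txt is made up for illustration. It is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Stand-in for site.read(): the UTF-8 bytes a page might return (assumed sample).
raw = u'espa\xf1ol'.encode('utf-8')

# Reading those bytes with the wrong codec produces two characters for one letter.
garbled = raw.decode('cp1252')

# Decoding with the page's actual encoding recovers the intended text.
text = raw.decode('utf-8')

# Write the unicode text with an explicit encoding (hypothetical file name).
path = os.path.join(tempfile.gettempdir(), 'salida.txt')
with io.open(path, 'w', encoding='utf-8') as out:
    out.write(text)

# Read it back with the same encoding to confirm nothing was lost.
with io.open(path, encoding='utf-8') as check:
    round_trip = check.read()

print(repr(garbled))
print(repr(round_trip))
```

With urllib2 on Python 2, the charset declared in the HTTP headers (if any) can usually be read with site.info().getparam('charset'). A page may instead declare its encoding only in a meta tag, so treat the header as a hint rather than a guarantee.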