Spanish Characters in Python 2.7 (using Windows 7)

Question

hokeysmoke 0 Newbie Poster

12 Years Ago

I'm trying to get text out of a website containing Spanish characters (like ñ or á) using urllib2, with #-*- coding: latin-1 -*- near the beginning of the Python file. However, when I write the text to a file, I get something else: for example, ñ appears as Ã± (so español appears as espaÃ±ol). Even if I manually put in a line such as letter="ñ" and print it to the screen, it appears as ±. Any advice? As I mentioned in the title, I'm using Python 2.7 on Windows 7 (though I get the same output in a file on Ubuntu). Thanks in advance!
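
Both symptoms are consistent with bytes being produced in one encoding and read back in another. The sketch below, written to run on Python 2.7 and 3, reproduces them under two assumptions the post does not confirm: that the page text arrives as UTF-8 and is later viewed as Latin-1, and that the screen is a Windows console using code page 437.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

word = "español"

# UTF-8 stores ñ as the two bytes C3 B1.
utf8_bytes = word.encode("utf-8")

# Reading those bytes as Latin-1 turns each byte into its own character.
misread_in_file = utf8_bytes.decode("latin-1")

# Latin-1 stores ñ as the single byte F1, which code page 437 draws as ±.
latin1_byte = "ñ".encode("latin-1")
misread_on_console = latin1_byte.decode("cp437")

# Comparisons are printed instead of the strings themselves so that the
# output stays ASCII-only on any console.
print(misread_in_file == "espaÃ±ol")
print(misread_on_console == "±")
```

If both lines print True, the characters were never lost; they were only interpreted with the wrong encoding at some step.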

character python spanish urllib2


All 6 Replies

TrustyTony 888 pyMod

12 Years Ago

Are you sure the site is not using UTF-8, for example?

This prints fine in Python 2.7.2 on Windows XP, both when typed from the keyboard and when copied from this post on DaniWeb.

# -*- coding: cp1252 -*-
print 'Viva España!'
print 'español'

TrustyTony 888 pyMod

12 Years Ago

So I checked it out, and it looks like UTF-8 to me:

# -*- coding: cp1252 -*-
import urllib2
print 'Viva España!'
print 'español'
site = urllib2.urlopen('http://www.wordreference.com/')
test = site.read()
site.close()
print test.decode('utf8')
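
Rather than guessing the encoding, it can usually be read from the charset parameter of the page's Content-Type response header. The helper below is a hypothetical sketch (charset_from_content_type is not part of urllib2) that parses such a header value; in Python 2.7 the value should be available through site.info().getheader('Content-Type'). The sample header and bytes are made up for illustration.

```python
from __future__ import print_function, unicode_literals


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset named in a Content-Type value, or a default."""
    # Parameters follow the media type, separated by semicolons.
    for part in header_value.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.lower() == "charset" and value:
            return value.strip("\"'").lower()
    return default


# Hypothetical header value and body bytes standing in for a real response.
header = "text/html; charset=UTF-8"
body = b"espa\xc3\xb1ol"

encoding = charset_from_content_type(header)
text = body.decode(encoding)
print(encoding)
print(text == "espa\u00f1ol")
```

Pages can also declare their encoding in a meta tag, so the header is a first thing to check rather than a guarantee.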


hokeysmoke 0 Newbie Poster · Answer 1 · 2012-02-14T12:10:35+00:00

Thanks! At the screen level, your cp1252 code prints the ñ just fine. But if I try to write the output to a file with f.write('español'), even when typing it in from the keyboard, the ñ does not come out properly in the file opened as f.

The website I'm accessing is within the www.wordreference.com domain. In links, accented characters are percent-encoded, for example habr%c3%a1 for habrá, but in regular text (the vast majority of the page), words like habrá appear without any apparent special encoding.
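
The habr%c3%a1 form seen in links is URL percent-encoding: %c3%a1 stands for the bytes C3 A1, which are the UTF-8 encoding of á. That suggests, without proving it, that the site uses UTF-8 for its regular text as well. A small sketch of undoing the percent-encoding, written to run on both Python 2.7 and 3:

```python
from __future__ import print_function

try:
    from urllib.parse import unquote_to_bytes  # Python 3
except ImportError:
    # Python 2: urllib.unquote returns a byte string when given one.
    from urllib import unquote as unquote_to_bytes

# Turn each %XX escape back into the byte it stands for.
raw = unquote_to_bytes(b"habr%c3%a1")

# Interpret the resulting bytes as UTF-8 to get the actual word.
word = raw.decode("utf-8")

print(repr(raw))
print(word == u"habr\u00e1")
```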

M.S. 53 Light Poster · Answer 2 · 2012-02-15T02:23:06+00:00

For writing to a file, I suggest codecs:

# -*- coding: utf-8 -*-
import codecs
out_file = codecs.open("some_file.txt", 'w', 'utf8')
out_file.write(u'español')  # unicode string; the codec encodes it as UTF-8
out_file.close()
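
An alternative sketch uses io.open, which is available in Python 2.7 and 3. In text mode it refuses byte strings outright, so an undecoded string fails immediately instead of ending up garbled in the file. The function name save_text, the file location, and the sample bytes standing in for site.read() are all made up for illustration.

```python
from __future__ import print_function, unicode_literals

import io
import os
import tempfile


def save_text(raw_bytes, path, source_encoding="utf-8"):
    """Decode downloaded bytes once, then write the text out as UTF-8."""
    text = raw_bytes.decode(source_encoding)
    with io.open(path, "w", encoding="utf-8") as out_file:
        out_file.write(text)
    return text


# Stand-in for site.read(): the UTF-8 bytes of "habrá español".
sample = b"habr\xc3\xa1 espa\xc3\xb1ol"
out_path = os.path.join(tempfile.gettempdir(), "some_file.txt")
saved = save_text(sample, out_path)

# Read the file back with the same encoding to confirm the round trip.
with io.open(out_path, encoding="utf-8") as in_file:
    print(in_file.read() == saved)
```

The idea is to decode bytes as soon as they arrive, work with unicode text in between, and encode only when writing.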

hokeysmoke 0 Newbie Poster · Answer 3 · 2012-02-16T05:55:32+00:00

The urllib2 code you posted gave me the following error at the print test.decode('utf8') line:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 6695-6700: character maps to <undefined>

Is there a way around this? Thanks!
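
The traceback suggests that decoding succeeded and that the failure came from printing: the console uses code page 437, which includes ñ and á but lacks some other characters that appear on the page. One possible workaround is to replace whatever the console cannot show. In the sketch below, console_safe is a hypothetical helper, not a standard function, and the sample string is made up.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import sys


def console_safe(text, encoding=None):
    """Return text with characters the console cannot encode shown as '?'."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Encode with replacement, then decode again to get printable text back.
    return text.encode(encoding, "replace").decode(encoding)


# U+2192 (a right arrow) has no mapping in Python's cp437 codec; ñ does.
sample = "español \u2192 Spanish"
cleaned = console_safe(sample, "cp437")
print(cleaned == "español ? Spanish")
```

Calling print console_safe(test.decode('utf8')) would then lose the unsupported characters on screen only; writing the decoded text to a UTF-8 file keeps them intact.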

hokeysmoke 0 Newbie Poster · Answer 4 · 2012-02-16T06:00:27+00:00

The code using codecs still gave me:
espaÂ¤ol

(This is based on how it looks in MS Word, opened as UTF-8, and in WordPad.)
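
One plausible chain that produces exactly Â¤ is sketched below. It assumes, without confirmation from the thread, that the ñ was typed at a cp437 console, that the resulting byte was then treated as Latin-1 before being encoded to UTF-8, and that the viewer showed the file as cp1252. Other chains are possible; the sketch only shows that the output is a misread ñ and not random corruption.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

typed = "ñ"

# A cp437 console delivers ñ as the single byte A4.
console_byte = typed.encode("cp437")

# Treated as Latin-1, byte A4 is the currency sign ¤ (U+00A4).
as_latin1 = console_byte.decode("latin-1")

# Encoding that character as UTF-8 gives the two bytes C2 A4.
file_bytes = as_latin1.encode("utf-8")

# A viewer reading the file as cp1252 shows one character per byte.
shown = file_bytes.decode("cp1252")

print(repr(console_byte))
print(repr(file_bytes))
print(shown == "Â¤")
```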
