Hello,

I am new to Python.

I have a Unicode string in Tamil.

When I call sys.getdefaultencoding(), the output is "Cp1252".

When I use text = testString.decode("utf-8"), I get the error "UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-8: character maps to undefined"

Please help.

Regards,

You are decoding a string that is not actually encoded in 'utf-8'. I think it is encoded in 'cp1252', as you said. The same thing happens with my Python under Windows XP:

>>> a='hyvä'
>>> a.decode('cp1252')
u'hyv\xe4'
>>> print a.decode('cp1252')
hyvä
>>> a.decode('utf-8')

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    a.decode('utf-8')
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe4 in position 3: unexpected end of data
>>> ord(a[-1])
228
>>> print hex(228)
0xe4
>>>
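
One more thing worth checking: in Python 2, calling .decode() on an object that is already unicode first encodes it implicitly with the default encoding, and that hidden step is what raises a UnicodeEncodeError instead of a UnicodeDecodeError. Here is a rough sketch of a guard against both problems. The helper name to_text and the candidate list ("utf-8", then "cp1252") are my own assumptions for illustration, not anything from your code, so adjust them to whatever your data really uses.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def to_text(value, encodings=("utf-8", "cp1252")):
    """Hypothetical helper: return value as Unicode text.

    Byte strings are decoded with the first candidate encoding that works;
    text that is already Unicode is returned untouched.
    """
    if not isinstance(value, bytes):
        # Already Unicode: decoding it again would make Python 2 encode it
        # implicitly first, which is where a UnicodeEncodeError comes from.
        return value
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError("none of the candidate encodings matched")


# Sample inputs (assumed for the demo, written with escapes so the
# source file encoding does not matter).
latin_bytes = u"hyv\xe4".encode("cp1252")        # same bytes as 'hyvä' above
tamil_text = u"\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"   # already Unicode
tamil_bytes = tamil_text.encode("utf-8")         # real UTF-8 bytes

print(to_text(latin_bytes) == u"hyv\xe4")  # falls back to cp1252
print(to_text(tamil_bytes) == tamil_text)  # decodes as UTF-8
print(to_text(tamil_text) is tamil_text)   # returned as-is, no decode
```

All three lines should print True. Trying 'utf-8' before 'cp1252' is deliberate: cp1252 accepts almost any byte sequence, so if it were tried first it would silently turn real UTF-8 data into garbage instead of failing.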