Hi,

How can I convert a 64-bit unicode string into a text string? For example, I'm converting characters like this:

str = unichr(int('00A9', 16))

But how can I convert unicode 'U2082', or any other character beyond the ASCII range?

Thank you.

All 5 Replies

A Unicode string can contain a great many characters that do not fit in the ASCII range. UTF-8, however, can encode all of them using variable-length codes. (This is for Python 2.6; Python 3 changes a lot in this area.)

# -*- coding: utf-8 -*-
a = u'asfasdfö'        # unicode string
b = a.encode('utf8')   # byte string holding the UTF-8 encoding of a
print a
print b
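To tie this back to the original question, here is a minimal sketch that builds U+2082 from its hex string and encodes it to UTF-8. The try/except shim is my own addition so that the snippet also runs on Python 3, where unichr was folded into chr; the variable names are only illustrative.

```python
try:
    unichr          # exists on Python 2
except NameError:
    unichr = chr    # Python 3: chr() already returns a text character

# Build the character from its hexadecimal code point, as in the question.
sub_two = unichr(int('2082', 16))      # U+2082 SUBSCRIPT TWO

# UTF-8 needs three bytes for this code point.
utf8_bytes = sub_two.encode('utf-8')

# Decoding the bytes gives back the original character.
round_trip = utf8_bytes.decode('utf-8')

print(len(utf8_bytes))
print(round_trip == sub_two)
```

U+2082 lies inside the Basic Multilingual Plane, so this should behave the same on UCS2 and UCS4 builds.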

However, Wikipedia says:

The Python language environment officially only uses UCS-2 internally since version 2.1, but the UTF-8 decoder to "Unicode" produces correct UTF-16. Python can be compiled to use UCS-4 (UTF-32) but this is commonly only done on Unix systems.

I found this code by googling: http://www.xml.com/cs/user/view/cs_msg/2915

As I say in the article: "if possible, use a Python install compiled to use UCS4 character storage." Micah Dubinko asked how to check whether your current Python build is such. The best test right now is to take advantage of one of the bugs present in UCS2 builds and not UCS4 builds. The test that Eric van der Vlist came up with, for example:

if len(u'\U00010800') == 1:
    print "UCS4"
else: #len is 2 in UCS2 builds
    print "UCS2"
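As an alternative to the len() trick above, the standard sys module exposes sys.maxunicode, which reports the largest code point a single character can hold. The sketch below is only my suggestion, not part of the quoted article, and the variable name build is made up for illustration.

```python
import sys

# sys.maxunicode is 0xFFFF on a narrow (UCS2) build
# and 0x10FFFF on a wide (UCS4) build.
if sys.maxunicode > 0xFFFF:
    build = "UCS4"
else:
    build = "UCS2"

print(build)
```

Either way, a "UCS2" result only affects characters above U+FFFF, which a narrow build stores as two code units. Characters such as U+00A9 or U+2082 fit in a single unit on both kinds of build.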

The above code outputs "UCS2". Does that mean my Python doesn't support 64-bit unicode, and that I can't output any unicode strings other than ASCII?

ok, but how do I do that in Python? Thanks.

You can use the unidecode module, available from PyPI: http://pypi.python.org/pypi . For example:

>>> str = unichr(int('00A9', 16))
>>> str
u'\xa9'
>>> from unidecode import unidecode
>>> unidecode(str)
'(c)'

Also, you should not use 'str' as a variable name because it's the name of a builtin type.
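If installing a third-party module is not an option, a rougher approximation is possible with the standard unicodedata module. This is only a sketch under my own assumptions: to_ascii is a hypothetical helper name, and the approach is NFKD normalization followed by dropping whatever is still non-ASCII, which is not how unidecode works.

```python
import unicodedata

def to_ascii(text):
    """Hypothetical helper: decompose, then drop anything still non-ASCII."""
    # NFKD splits compatibility characters into simpler pieces,
    # e.g. SUBSCRIPT TWO becomes a plain digit 2.
    decomposed = unicodedata.normalize('NFKD', text)
    # Characters with no ASCII equivalent are silently discarded.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(to_ascii(u'H\u2082O'))   # subscript two decomposes to '2'
print(to_ascii(u'caf\xe9'))    # e-acute decomposes to 'e' plus a combining accent
```

Unlike unidecode, this drops the copyright sign entirely instead of turning it into '(c)', because U+00A9 has no decomposition.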
