ASCII string containing unicode to UTF-8 ?!?!


PicoDoM 0 Newbie Poster

16 Years Ago

So, I am slamming my head into my desk right now. I am trying to take a string containing Unicode character codes and convert it to a Python unicode string. I thought it would be simple, but I am having major issues. Any help would be greatly appreciated. This is what I am confused about.

Starting with this: test = "\u2022" I want to convert it to a unicode string, which should look like u'\u2022'. But when I try to convert test with test.encode("utf-8"), it gives me back u'\\u2022', which when printed just shows "\u2022". That is not helpful at all!

Check this out:

>>> test = "\u2022"
>>> test.decode("utf-8")
u'\\u2022'
>>> test.encode("utf-8")
u'\\u2022'
>>> print test.decode("utf-8")
\u2022
>>> print test.encode("utf-8")
\u2022

So, I must be missing something. I am retrieving the original string externally, so I cannot make it unicode from the start; I need to be able to convert it after the fact. I feel like I have tried everything. It would be great if there was a simple fix.

Thanks very much!
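
A short sketch of why the session above hands the text back unchanged. The thread uses Python 2; this is written for Python 3 as an assumption about what a reader would run today, and the variable names are only illustrative. The external data is six plain ASCII characters (a backslash, "u", and four digits), and ASCII text is already valid UTF-8, so decoding it as UTF-8 has nothing to change.

```python
# Python 3 sketch; the thread itself is Python 2.
# Six ASCII bytes: backslash, 'u', '2', '0', '2', '2' (not a bullet character).
raw = b"\\u2022"

# UTF-8 decoding maps each ASCII byte to the same character,
# so the backslash escape is left exactly as it was.
as_utf8 = raw.decode("utf-8")

print(len(raw), len(as_utf8))
print(as_utf8 == "\\u2022")
```

What is needed is a step that interprets the escape sequence itself, not a change of byte encoding.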

jice 53 Posting Whiz in Training

16 Years Ago

test = u"\u2022"
print test.encode("utf-8")
ÔÇó

Have you tested this one?

jice 53 Posting Whiz in Training

16 Years Ago

This one is not very clean but it may work till you've got a better solution...

test = '\u2022'
exec 'print u"%s".encode("utf-8")' % (test)
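
If the string really comes from outside, running it through exec means running whatever it contains. A possible variation on the same idea, sketched here for Python 3 (the thread is Python 2), is to let ast.literal_eval parse the text as a string literal only. The helper name unescape_with_literal_eval is made up for this example, and the sketch assumes the input contains no double quotes or newlines; such input raises an error instead of being converted.

```python
import ast


def unescape_with_literal_eval(text: str) -> str:
    # Wrap the text in double quotes so it parses as a string literal.
    # ast.literal_eval only evaluates literals, so no arbitrary code runs.
    return ast.literal_eval('"%s"' % text)


# Six characters, as they would arrive from an external source.
test = "\\u2022"
bullet = unescape_with_literal_eval(test)

# The single bullet character, encoded as UTF-8 bytes.
print(bullet.encode("utf-8"))
```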

jice 53 Posting Whiz in Training

16 Years Ago

If you find something better, don't hesitate to post it here... I'd be glad to know it.

PicoDoM 0 Newbie Poster · Answer 1 · 2007-11-20T01:57:22+00:00

test = u"\u2022"
print test.encode("utf-8")
ÔÇó
Have you tested this one?

Yes, I have tried this, but it does not solve the problem I am currently working with. I need to be able to start with the plain ASCII string "\u2022" and then, after the fact, convert it to a unicode string that looks like u'\u2022'.

PicoDoM 0 Newbie Poster · Answer 2 · 2007-11-20T14:34:10+00:00

That looks like it's gonna work!

Thank you so much jice, you have no idea how happy I am to have this solution.

PicoDoM 0 Newbie Poster · Answer 3 · 2007-11-21T06:41:00+00:00

I definitely will, but jeez, from the hunting I did before I posted on here, I don't know if there is anything else out there.

PicoDoM 0 Newbie Poster · Answer 4 · 2007-11-21T12:54:29+00:00

One thing though: for my uses, since the "\u2022" is already basically in unicode form, just not in a unicode string, your code does not need the .encode("utf-8"), since it's already being inserted into a unicode-declared string.

meastman 0 Newbie Poster · Answer 5 · 2007-11-25T04:02:25+00:00

You may also want to try this:
test.decode('raw_unicode_escape')

>>> test = "\u2022"
>>> test.decode('raw_unicode_escape')
u'\u2022'
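
For anyone reading this on Python 3, here is a rough equivalent of the session above. There, str has no .decode method, so the sketch assumes the external data arrives as bytes. The helper name unescape_unicode is hypothetical. The raw_unicode_escape codec interprets \uXXXX escapes and passes all other bytes through as Latin-1.

```python
def unescape_unicode(raw: bytes) -> str:
    # Interpret \uXXXX (and \UXXXXXXXX) escapes in the byte string;
    # every other byte is decoded as Latin-1.
    return raw.decode("raw_unicode_escape")


# Six ASCII bytes, as received from an external source.
raw = b"\\u2022"
text = unescape_unicode(raw)

# One character now; encode it only when UTF-8 bytes are actually needed.
print(len(text))
print(text.encode("utf-8"))
```

If the input is already a str, it can be turned into bytes first with .encode("ascii"), provided it really is pure ASCII.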

PicoDoM 0 Newbie Poster · Answer 6 · 2007-11-26T07:19:22+00:00

meastman, thank you, that is basically what I was looking for in the first place. I could swear that I saw a list of the arguments that could be passed to encode and decode and didn't see this. Anyway, thank you very much! It's nice to have the "proper" solution. jice's is a bit more interesting though, haha. Thank you everyone!