
Unicode ord and unichr

Hi!

How can I recover the original integer from variable a when I use the following code:

a = unichr(275).encode('utf8')



When I try this:

print ord(a)


It raises an error...

Peagles
 

This is going to be tough, because a is now an encoded string of two bytes, '\xc4\x93' (the UTF-8 encoding of the character). The function ord() only accepts a single character.
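
The failure described above can be reproduced with a minimal sketch. The try/except shim around unichr is my own addition so the snippet also runs on Python 3, where chr() plays the same role; the names failed and byte_count are just for illustration.

```python
# Python 2 has unichr(); on Python 3 chr() does the same job (compatibility shim).
try:
    unichr
except NameError:
    unichr = chr

a = unichr(275).encode('utf8')   # UTF-8 encoding of U+0113: bytes 0xC4 0x93

try:
    ord(a)
    failed = False
except TypeError:
    # ord() wants exactly one character (or one byte), not a 2-byte string
    failed = True

byte_count = len(a)   # number of bytes in the encoded string
```

So the encoded value has to be turned back into a single character before ord() can be applied to it.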

This works ...

>>> b = unichr(275)
>>> b
u'\u0113'
>>> ord(b)
275


Is there a way to decode('utf8')? Sorry, I just don't work with unicode much.

vegaseat
 

You could try this:

>>> a = unichr(275).encode('utf8')
>>> b = a.decode('utf8')
>>> b
u'\u0113'
>>> ord(b)
275
ghostdog74
 

OK, thanks. This has helped me understand the problem and partially solve it. However, there is an additional problem.

When I concatenate several UTF-8 encoded characters like this:

a = unichr(88).encode('utf8')
b = unichr(257).encode('utf8')
c = unichr(109).encode('utf8')
d = unichr(258).encode('utf8')
print a,b,c,d
e = a+b+c+d
print e
for i in e:
    print i



How can I retrieve the original integers (88, 257, 109, 258) from string e?
Since characters above 127 are encoded as two or more bytes, how can I then determine which bytes belong together, so that I can decode them together (using ord())?

In other words, can I split the string in such a way that it contains 'whole' characters?
I know that Flash ActionScript can do this, so this advanced language must be able to do the same.

Peagles
 

....
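# decode('utf8') turns the bytes into a unicode string, one item per character;
# decode('string_escape') only matters if e holds literal backslash escapes
dec = e.decode('string_escape').decode('utf8')
for i in dec: print ord(i),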
...
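
On the question of which bytes belong together: in UTF-8, every byte after the first byte of a multi-byte character has the bit pattern 10xxxxxx, so character boundaries can be found without decoding. Here is a rough sketch of that idea. The helper split_utf8_chars is hypothetical (not a library function) and assumes the input is valid UTF-8; in practice decode('utf8') does this work for you.

```python
def split_utf8_chars(data):
    """Split a UTF-8 byte string into one byte string per character.

    Hypothetical helper for illustration; assumes `data` is valid UTF-8.
    """
    chars = []
    for byte in bytearray(data):               # bytearray yields ints on Python 2 and 3
        if (byte & 0xC0) == 0x80 and chars:    # continuation byte: 10xxxxxx
            chars[-1].append(byte)
        else:                                  # ASCII byte or lead byte starts a new character
            chars.append(bytearray([byte]))
    return [bytes(c) for c in chars]

e = u'X\u0101m\u0102'.encode('utf8')   # same bytes as a+b+c+d in the question
parts = split_utf8_chars(e)            # one byte string per whole character
code_points = [ord(p.decode('utf8')) for p in parts]
```

Each element of parts is a 'whole' character in encoded form, so it can be decoded on its own and passed to ord().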
ghostdog74
 

Thank you so much!

Peagles
 

This seems to work too:

a = unichr(88).encode('utf8')
b = unichr(257).encode('utf8')
c = unichr(109).encode('utf8')
d = unichr(258).encode('utf8')
print a,b,c,d
e = a+b+c+d
print e
print
for c in e:
    print c
print
for i in e.decode('utf8'):
    print ord(i)
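
If the integers are needed as a list rather than printed, the same decode step can feed a list comprehension. This is only a sketch: the names originals and recovered are mine, and the unichr shim is there so it also runs on Python 3.

```python
# Python 2 has unichr(); on Python 3 chr() does the same job (compatibility shim).
try:
    unichr
except NameError:
    unichr = chr

originals = [88, 257, 109, 258]

# Encode each code point to UTF-8 and concatenate, as in the posts above
e = b''.join(unichr(n).encode('utf8') for n in originals)

# Decode once, then take ord() of each whole character
recovered = [ord(ch) for ch in e.decode('utf8')]
```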
Ene Uran
 

This question has already been solved
