Hi!

How can I recover the original integer from variable a when I use the following code:

a = unichr(275).encode('utf8')

When I try this:

print ord(a)

It raises an error...

This is going to be tough, because a is a byte string made up of two bytes: typing a at the interactive prompt shows '\xc4\x93'. The function ord() only handles a single character, so it rejects a string of length two.
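
To see why ord() complains, here is a small sketch that inspects the encoded value. The unichr fallback and the names byte_values and ord_error are only there for illustration, so that the snippet runs on Python 2 and Python 3 alike.

```python
try:
    unichr            # Python 2 built-in
except NameError:
    unichr = chr      # assumption: on Python 3, chr() plays the role of unichr()

a = unichr(275).encode('utf8')      # UTF-8 needs two bytes for code point 275
byte_values = list(bytearray(a))    # the individual byte values
print(len(a))                       # 2
print(byte_values)                  # [196, 147], i.e. 0xc4 0x93

try:
    ord(a)                          # ord() wants exactly one character
    ord_error = None
except TypeError as exc:
    ord_error = exc
print(ord_error)
```

So the error is not about Unicode as such: ord() is simply being handed two bytes instead of one character.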

This works ...

>>> b = unichr(275)
>>> b
u'\u0113'
>>> ord(b)
275

Is there a way to decode('utf8')? Sorry, I just don't work with unicode much.

You could try this:

>>> a = unichr(275).encode('utf8')
>>> b = a.decode('utf8')
>>> b
u'\u0113'
>>> ord(b)
275

OK, thanks. This has helped me understand the problem and partially solve it. However, there is an additional problem.

When I concatenate several UTF-8 encoded characters like this:

a = unichr(88).encode('utf8')
b = unichr(257).encode('utf8')
c = unichr(109).encode('utf8')
d = unichr(258).encode('utf8')
print a,b,c,d
e = a+b+c+d
print e
for i in e:
    print i

How can I retrieve the original integers (88, 257, 109, 258) from string e?
Since the characters above 255 here are encoded as two bytes each, how can I then determine which bytes belong together, so I can decode them together and call ord() on the result?

In other words, can I split the string in such a way that each piece contains one 'whole' character?
I know that Flash ActionScript can do this, so this advanced language must be able to do the same.

....
dec = e.decode('string_escape').decode('utf8')
for i in dec: print ord(i),
...
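
On the question of which bytes belong together: in UTF-8 every continuation byte has the bit pattern 10xxxxxx, so a byte that does not match that pattern starts a new character. Below is a rough sketch of splitting by hand. The helper name split_utf8 is made up for this post, and it assumes the input is valid UTF-8. In practice decode('utf8') does this work for you.

```python
try:
    unichr            # Python 2 built-in
except NameError:
    unichr = chr      # assumption: on Python 3, chr() plays the role of unichr()

def split_utf8(data):
    """Split a UTF-8 byte string into one byte string per character.

    Hypothetical helper: assumes data is valid UTF-8.
    """
    chunks = []
    for value in bytearray(data):
        if (value & 0xC0) == 0x80 and chunks:
            chunks[-1].append(value)            # 10xxxxxx: continuation byte
        else:
            chunks.append(bytearray([value]))   # anything else starts a character
    return [bytes(chunk) for chunk in chunks]

e = b''.join(unichr(n).encode('utf8') for n in (88, 257, 109, 258))
pieces = split_utf8(e)
numbers = [ord(piece.decode('utf8')) for piece in pieces]
print(numbers)        # [88, 257, 109, 258]
```

Each piece holds exactly one encoded character, so it decodes to a single unicode character that ord() accepts.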

This seems to work too:

a = unichr(88).encode('utf8')
b = unichr(257).encode('utf8')
c = unichr(109).encode('utf8')
d = unichr(258).encode('utf8')
print a,b,c,d
e = a+b+c+d
print e
print
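for ch in e:    # renamed from c so the earlier variable c is not overwritten
    print ch    # one byte per iteration, so two-byte characters come out split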
print
for i in e.decode('utf8'):
    print ord(i)
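
The same idea can be wrapped in a small function. This is just a sketch: the name code_points is invented here, and the unichr fallback is only there so the snippet also runs on Python 3.

```python
try:
    unichr            # Python 2 built-in
except NameError:
    unichr = chr      # assumption: on Python 3, chr() plays the role of unichr()

def code_points(utf8_bytes):
    """Return the integer code point of every character in a UTF-8 byte string."""
    # Decode first so that multi-byte sequences become single characters,
    # then ord() works on each character in turn.
    return [ord(ch) for ch in utf8_bytes.decode('utf8')]

e = b''.join(unichr(n).encode('utf8') for n in (88, 257, 109, 258))
result = code_points(e)
print(result)         # [88, 257, 109, 258]
```

One caveat: on a narrow Python 2 build, characters above U+FFFF decode to two surrogate halves, so this would report two numbers for each of them.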