I have been searching high and low on Google, and I cannot figure out how to convert Unicode to integers. Take the Unicode code point U+3001, for example. I know this is supposed to be the ideographic comma, and that its UTF-8 encoding, written in hexadecimal, is 0xE38081. I know that if I convert 0xE38081 to an integer, it is supposed to be 14909569. 14909569 is the answer I want, but I cannot figure out how to get it in Python.

>>> unichr(0x3001)
>>> str(unichr(0x3001))
>>> int('\xe3\x80\x81',16)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 16: '\xe3\x80\x81'
>>> int('0xe38081',16)

How come int() won't take the syntax \xE3\x80\x81? How can I strip or replace the \x? The string methods strip() and replace() do not work either. Is there another method that can deal with a Unicode code point?
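One reading of the error, for what it is worth: in the literal '\xe3\x80\x81' each \xNN is an escape sequence for a single byte, so the string holds three bytes and contains no backslash or x characters at all. That would explain why int() finds no hex digits and why strip() and replace() find nothing to remove. Below is a minimal sketch, assuming the goal is the UTF-8 bytes of U+3001 read as one big-endian number. binascii.hexlify is the standard library call that spells bytes out as hex digits; the variable names are only illustrative.

```python
import binascii

# U+3001 IDEOGRAPHIC COMMA; the u'' prefix works in Python 2 and in Python 3.3+
ch = u'\u3001'

utf8_bytes = ch.encode('utf-8')             # three bytes: E3 80 81
hex_digits = binascii.hexlify(utf8_bytes)   # the bytes spelled out as ASCII hex digits
value = int(hex_digits, 16)                 # parse those digits as a base-16 number

print(value)     # 14909569, the integer form of the UTF-8 bytes
print(ord(ch))   # 12289, the code point itself (0x3001)
```

Note that ord() gives the code point (0x3001), which is a different number from the integer built out of the UTF-8 bytes (0xE38081).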

Here is my roundabout way. I got there in the end, though maybe not in the most elegant way, as Python did not let me take ord() of the individual bytes making up the UTF-8 letter. So I ended up taking the repr of that letter and working on it with string operations.

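# Assumed definitions (not shown in the original post):
# b is the letter encoded as UTF-8, c is the repr of that byte string
b = u'\u3001'.encode('utf8')
c = repr(b)
print b
print c
d = r"0x" + c.translate(None, r"\x'")  # Python 2: delete backslashes, x's and quotes
print d, int(d, 16)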
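For comparison, here is a sketch that avoids the repr manipulation by folding the bytes together directly. It assumes the letter is U+3001 encoded as UTF-8. `utf8_to_int` is a hypothetical helper name, and bytearray is used because iterating over it yields integers in both Python 2 and Python 3.

```python
def utf8_to_int(text):
    """Return the UTF-8 encoding of text read as one big-endian integer."""
    result = 0
    for byte in bytearray(text.encode('utf-8')):
        # Shift what we have so far up by one byte and append the next byte
        result = (result << 8) | byte
    return result

n = utf8_to_int(u'\u3001')
print(n)
print(hex(n))
```

On Python 3, int.from_bytes(u'\u3001'.encode('utf-8'), 'big') should give the same number without a loop.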