I have been searching high and low on Google, and I cannot figure out how to convert Unicode to integers. Take the Unicode code point U+3001 (u'\u3001'), for example. I know this is the ideographic comma, and that its UTF-8 encoding is the three bytes E3 80 81, i.e. 0xE38081 in hexadecimal. If I convert 0xE38081 to an integer, it is supposed to be 14909569. 14909569 is the answer I want, but I cannot figure out how to do this in Python.
>>> unichr(0x3001)
u'\u3001'
>>> str(unichr(0x3001))
'\xe3\x80\x81'
>>> int('\xe3\x80\x81',16)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 16: '\xe3\x80\x81'
>>> int('0xe38081',16)
14909569
>>>
Why won't int() accept '\xe3\x80\x81'? How can I strip or replace the \x? The string functions strip() and replace() do not work on it either. Is there another method that can deal with a Unicode code point?
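For reference, here is a minimal sketch of the conversion being asked for. It assumes the goal is to read the UTF-8 bytes of a character as one big-endian integer. The helper name utf8_as_int is hypothetical, not a built-in. The point it illustrates is that '\xe3\x80\x81' is three raw bytes, not the six-character text 'e38081': the \x is only the escape notation used when the string is displayed, so there is nothing for strip() or replace() to remove. Turning the bytes into hex digits first gives int() something it can parse.

```python
import binascii


def utf8_as_int(ch):
    """Return the UTF-8 encoding of ch read as a single big-endian integer.

    utf8_as_int is a hypothetical helper name used only for this sketch.
    """
    # Encode explicitly rather than relying on str() and the default encoding.
    encoded = ch.encode('utf-8')          # three bytes: E3 80 81 for U+3001
    # hexlify turns the raw bytes into the ASCII hex digits 'e38081',
    # which int() can parse in base 16.
    return int(binascii.hexlify(encoded), 16)


value = utf8_as_int(u'\u3001')
print(value)
```

This should behave the same on Python 2 and Python 3. On Python 3 only, int.from_bytes(encoded, 'big') is a more direct way to do the last step.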