A Unicode string can contain characters that do not fit in the ASCII range. UTF-8, however, can encode all of them as variable-length byte sequences. (This refers to Python 2.6; Python 3 changed much of the string system.)
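A small sketch of what "variable length" means in practice. It assumes Python 3, where every str is Unicode, and the sample characters are arbitrary picks: ASCII letters take one byte in UTF-8, while code points further up take two, three or four bytes.

```python
# Python 3 sketch: how many bytes UTF-8 uses for individual code points.
SAMPLES = ['A', '\u00e9', '\u20ac', '\U00010800']  # A, e-acute, euro sign, U+10800

def utf8_byte_lengths(chars):
    """Map each character to the length of its UTF-8 encoding in bytes."""
    return {ch: len(ch.encode('utf-8')) for ch in chars}

for ch, n in utf8_byte_lengths(SAMPLES).items():
    print('U+%04X -> %d byte(s)' % (ord(ch), n))
```

U+10800 is the same character used in the build test further down; it needs four UTF-8 bytes because it lies above U+FFFF.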
However, Wikipedia says:
The Python language environment officially only uses UCS-2 internally since version 2.1, but the UTF-8 decoder to "Unicode" produces correct UTF-16. Python can be compiled to use UCS-4 (UTF-32) but this is commonly only done on Unix systems.
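A UCS-2 build cannot hold a code point above U+FFFF in a single 16-bit unit, so it stores the character as a UTF-16 surrogate pair. The helper below (the name `to_surrogate_pair` is hypothetical) sketches the standard UTF-16 split and cross-checks it against the built-in codec.

```python
# Sketch: split a supplementary code point (above U+FFFF) into the
# UTF-16 surrogate pair that a UCS-2 ("narrow") build stores internally.
def to_surrogate_pair(code_point):
    """Return (high, low) surrogates for a code point in U+10000..U+10FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError('not a supplementary code point: %#x' % code_point)
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low

high, low = to_surrogate_pair(0x10800)
print('U+10800 -> %04X %04X' % (high, low))  # U+10800 -> D802 DC00

# Cross-check against the UTF-16 codec: the same two 16-bit units.
assert u'\U00010800'.encode('utf-16-be') == b'\xd8\x02\xdc\x00'
```

Those two 16-bit units are why the character counts as length 2 on a UCS-2 build.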
As I say in the article: "if possible, use a Python install compiled to use UCS4 character storage." Micah Dubinko asked how to check whether your current Python build is such. The best test right now is to take advantage of one of the bugs present in UCS2 builds and not UCS4 builds. The test that Eric van der Vlist came up with, for example:
if len(u'\U00010800') == 1:
    print("UCS4 build")
else:  # len is 2 in UCS2 builds (stored as a surrogate pair)
    print("UCS2 build")
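A possibly more direct check, offered as an alternative to the length trick: `sys.maxunicode` is 0xFFFF on narrow (UCS-2) builds and 0x10FFFF on wide (UCS-4) builds. From Python 3.3 on (PEP 393) strings use a flexible internal representation, the narrow/wide distinction no longer exists, and the length test above always reports 1. The function name `unicode_build` is hypothetical.

```python
import sys

def unicode_build():
    """Classify how the running interpreter stores Unicode strings."""
    if sys.version_info >= (3, 3):
        # PEP 393: flexible string representation, no narrow/wide builds.
        return 'flexible (PEP 393)'
    if sys.maxunicode == 0xFFFF:
        return 'UCS2 (narrow)'
    return 'UCS4 (wide)'

print(unicode_build())
```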