Using Python 3.2 with Wing IDE: chr(i) converts the integer i to a one-character string. How can I tell which Unicode encoding is being used? When I execute the code below, I get one character symbol per code point, including control characters such as ETX, ..., then the alphanumerics, and from 128 upward special symbols such as ... ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂà etc. According to locale.getpreferredencoding(), my Linux system uses UTF-8 and my Windows system uses CP-1252.

It works fine on Linux. When I run the same code on Windows, it raises an exception at \x80 in module cp437.py, in def encode(...). I assume the encode method is called implicitly?
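Yes, print() hands the str to sys.stdout, a text stream that encodes it with sys.stdout.encoding before the bytes reach the console; judging by cp437.py in the traceback, the Windows console here appears to use code page 437. A minimal sketch, using a hypothetical helper first_unencodable, that reproduces the failure at \x80 without touching the console:

```python
def first_unencodable(encoding, limit=256):
    """Hypothetical helper: return the first code point below `limit`
    that `encoding` cannot encode, or None if all of them encode."""
    for i in range(limit):
        try:
            chr(i).encode(encoding)  # the same step print() triggers implicitly
        except UnicodeEncodeError:
            return i
    return None

# cp437 maps byte 0x80 to 'Ç', so it has no byte for code point U+0080
print(first_unencodable("cp437"))  # 128
# UTF-8 (and Latin-1 for this range) can encode every code point below 256
print(first_unencodable("utf-8"))  # None
```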

How can I get chr() to use UTF-8 on Windows without changing the locale?

# print code points 0-255 (range(256) stops before 256, so 255 is included)
for i in range(256):
    print(chr(i), end="")


In Python 3, strings are Unicode, and what you are talking about is the encoding of bytes. Use the bytes type and it is just bytes. If you want to encode them in UTF-8, do so explicitly. But to me that does not make sense, since UTF-8 is a variable-length encoding of all Unicode characters, not a single-byte encoding like ISO-8859-15 etc.
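A small sketch of the str/bytes distinction described above: chr() produces text, and bytes only appear when you encode explicitly. It also shows why UTF-8 is variable length while ISO-8859-15 uses one byte per character.

```python
# chr() returns a str holding one Unicode code point; no encoding is involved yet
ch = chr(0xE9)                     # 'é'

# Encoding happens only when text is turned into bytes, and only when you ask
utf8 = ch.encode("utf-8")          # b'\xc3\xa9': 2 bytes
latin9 = ch.encode("iso-8859-15")  # b'\xe9': 1 byte (single-byte encoding)

# A bytes object is just a sequence of integers 0-255
data = bytes(range(256))
print(len(ch), len(utf8), len(latin9), data[0x80])  # 1 2 1 128

# UTF-8 length grows with the code point: 'A', 'é', '€'
sizes = [len(chr(cp).encode("utf-8")) for cp in (0x41, 0xE9, 0x20AC)]
print(sizes)  # [1, 2, 3]
```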

Thanks, I agree with you for most purposes.

I'm just using this technique for debug output of byte streams that are part of a file protocol I'm deciphering. Some of the time, patterns are easier for me to recognize as characters than as a hex stream like \xC3. It works differently on Linux versus Windows. When I'm away from the office I'm on my Windows 7 laptop, so I'm just trying to make it behave the same.
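For this kind of protocol debugging, one option is a hexdump-style view with both hex and character columns. The sketch below is an assumption, not something from the thread: dump and its ascii_only flag are hypothetical names. Decoding with latin-1 maps byte n to chr(n), so the character column matches the original chr() loop; control and invisible characters become '.', and ascii_only=True gives output that prints identically on any console.

```python
def dump(data, width=16, ascii_only=False):
    """Hypothetical helper: one line per `width` bytes as offset, hex and characters."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("{:02x}".format(b) for b in chunk)
        # latin-1 decodes byte n to code point n, exactly what chr(n) gives
        text = chunk.decode("latin-1")
        # Non-printable characters (C0/C1 controls, DEL, NBSP, soft hyphen) become '.';
        # with ascii_only, anything above '~' becomes '.' too
        shown = "".join(
            c if c.isprintable() and (not ascii_only or ord(c) < 0x7F) else "."
            for c in text
        )
        lines.append("{:08x}  {}  {}".format(offset, hex_part.ljust(width * 3 - 1), shown))
    return lines

# ASCII-only output is safe on a cp437, cp1252 or UTF-8 console alike
for line in dump(b"\x02ABC\x03\x80\xc3\xa9", ascii_only=True):
    print(line)
```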

After some googling, I think sys.stdout, which is the console window I'm writing to, has a default encoding. There is also an environment variable, PYTHONIOENCODING. sys.stdout uses a different encoding on Linux and on Windows. I still haven't quite figured it all out. I just want a way to override the encoding of sys.stdout, which appears to be picked up from the system locale settings: CP-1252 on Windows, but UTF-8 on Linux.
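One way to override the stdout encoding from inside Python 3.2 is to wrap the underlying binary stream in a new io.TextIOWrapper with an explicit encoding and error handler. A sketch, assuming a hypothetical helper rewrap; it is demonstrated on io.BytesIO buffers so the effect of each encoding is visible as raw bytes:

```python
import io
import sys

def rewrap(binary_stream, encoding, errors="strict"):
    """Hypothetical helper: a text layer with an explicit encoding and error handler."""
    return io.TextIOWrapper(binary_stream, encoding=encoding, errors=errors)

# The encoding Python chose for the console (or PYTHONIOENCODING, if it is set)
print("stdout encoding:", sys.stdout.encoding)

sample = chr(0x80) + chr(0xE9)  # the code point that failed, plus 'é'

# UTF-8 can encode every code point here, so this never raises
utf8_buf = io.BytesIO()
utf8_out = rewrap(utf8_buf, "utf-8")
utf8_out.write(sample)
utf8_out.flush()
utf8_bytes = utf8_buf.getvalue()    # b'\xc2\x80\xc3\xa9'

# cp437 has no byte for U+0080; errors="replace" writes '?' instead of raising
cp437_buf = io.BytesIO()
cp437_out = rewrap(cp437_buf, "cp437", errors="replace")
cp437_out.write(sample)
cp437_out.flush()
cp437_bytes = cp437_buf.getvalue()  # b'?\x82' ('é' is 0x82 in cp437)
```

In a real script the same idea would be sys.stdout = rewrap(sys.stdout.buffer, "utf-8"), assuming the stream has a .buffer attribute (an IDE's redirected stream may not). Alternatively, set PYTHONIOENCODING before starting Python, e.g. utf-8, or cp437:replace using its encodingname:errorhandler syntax. Either way, the console still has to display the bytes it receives, so its own code page matters too.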

It seems that you can change the cmd console encoding to UTF-8 with the command chcp 65001. See here and here.

Yes, kind of, but note this comment on http://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how:

Note there are serious implementation bugs in Windows's code page 65001 support which will break many applications that rely on the C standard library IO methods, so this is very fragile. (Batch files also just stop working in 65001.) Unfortunately UTF-8 is a second-class citizen in Windows. – bobince Dec 29 '11 at 21:51

I am using Wing IDE. There is a Debug window, an OS window and a Python Shell window, and each one of them can have its own code page set independently. Once I discovered that, I set the Debug window to UTF-8, and I now get the same print behavior on both Linux and Windows.

Problem Solved.

Thanks all.

Be a part of the DaniWeb community
