decoding arbitrary byte data

ribot 0 Newbie Poster

15 Years Ago

Hello, I'm trying to work with IRC in Python 3, and decoding bytes to Unicode is turning into quite a dilemma.

1. Encoding the å character to bytes as UTF-8 works when sending, but the bytes I receive often seem to be encoded in Latin-1, or at least they decode correctly as Latin-1. Does this mean that data transmitted on IRC servers can be encoded in any arbitrary character set (UTF-8, Latin-1, etc.)?

2. If that is the case, is there some way to figure out what the encoding is? The utf-8 codec cannot decode the byte \xe5, which latin-1 decodes to å, while utf-8 decodes \xc3\xa5 to å. So it seems there is no way to know which encoding is being used on IRC?

3. My fallback approach is to use the OS default charset. For example, with tkinter, the text widget must be using some charset. How can I find out which one? At least then an IRC client would correctly decode text written in the system's own language when chatting.
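
To make the three points concrete, here is a minimal sketch of one common workaround, not a definitive answer: try strict UTF-8 first, and fall back to a single-byte encoding only when that fails. The names `decode_irc_line`, `fallback`, and `system_encoding` are made up for illustration. The sketch assumes each received line arrives as a `bytes` object, and it cannot detect the true encoding; it only guesses in a fixed order.

```python
import locale


def decode_irc_line(raw: bytes, fallback: str = "latin-1") -> str:
    """Decode one raw IRC line (hypothetical helper, not part of any library).

    Strict UTF-8 is tried first because arbitrary non-UTF-8 text rarely
    happens to be valid UTF-8. If it fails, the fallback codec is used.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" keeps this from raising if the fallback codec
        # has no mapping for some byte (latin-1 itself maps all 256 values).
        return raw.decode(fallback, errors="replace")


# The OS/locale default mentioned in point 3. Passing False avoids
# changing the process locale as a side effect of the lookup.
system_encoding = locale.getpreferredencoding(False)
```

With this, `decode_irc_line(b"\xc3\xa5")` and `decode_irc_line(b"\xe5")` both give `'å'`, and `decode_irc_line(raw, fallback=system_encoding)` uses the system default instead of Latin-1. On the tkinter side, as far as I can tell the Text widget in Python 3 takes and returns `str`, so the decoding has to happen before the text reaches the widget rather than inside it.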

python
