Hi there everyone.
I'm trying to get started learning about character encoding. I want to understand what it is, how it is used, and how to interpret it. I know there are a lot of web resources out there on this kind of thing, but I was hoping to use a book that explains it in plain English.

Not a super advanced book like the O'Reilly one, but a beginner's book, something at the 'For Dummies' level. I haven't been able to find one so far, though I thought someone out there might know if one exists.
I'm programming in C++, if that is at all relevant.

Thank you.

There's not really a 'for Dummies' level book because the topic is very complex. The O'Reilly book is probably your best bet for an introduction, and if it's too difficult you're probably lacking in prerequisite education.

Perhaps if you point out some parts of the book that are troublesome, we can clarify things or point you toward a resource for further learning.

Thanks very much. I am reading through the O'Reilly book now, which is quite extensive, so I have no doubt there will be questions.

I am a little confused about something. I now understand what UTF-32, UTF-16 and UTF-8 are, but what I can't understand is why UTF-32 was invented in the first place, when UTF-8 could easily be substituted for it by using multiple bytes only where needed, as opposed to a lot of empty wydes in UTF-32, for example.

Surely there was nothing stopping UTF-8 from being used from the beginning. Does anyone know why this didn't happen?

Thanks

Because UTF-8 didn't exist in the beginning? It was proposed years later.

I understand that, but it doesn't make sense that it wasn't introduced first. There were always 8 bits in a byte. Why create an encoding standard that is stored with multiple zeros when, say, this could be done with 16 bits or 8 bits? Maybe it was simply to differentiate software in the beginning (a Microsoft way of thinking).

There were always 8 bits in a byte.

That's not universally true. To the best of my knowledge there aren't any platforms where a byte is fewer than 8 bits, but there certainly are platforms where it's more.
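Since you're working in C++, here is a minimal sketch of how to check this on your own platform. The standard macro `CHAR_BIT` from `<climits>` gives the number of bits in a byte, and the C and C++ standards only guarantee that it is at least 8. The helper name `bits_in` is made up for this illustration.

```cpp
#include <climits>   // CHAR_BIT
#include <cstddef>   // std::size_t

// The standard guarantees at least 8 bits per byte; it does not forbid more.
static_assert(CHAR_BIT >= 8, "guaranteed by the C and C++ standards");

// Hypothetical helper: width in bits of a type on the current platform.
// sizeof counts bytes, so multiply by the number of bits per byte.
template <typename T>
constexpr std::size_t bits_in() {
    return sizeof(T) * CHAR_BIT;
}
```

On most desktop systems `bits_in<char>()` is 8, but portable code should not assume that.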

Why create an encoding standard that is stored with multiple zeros when, say, this could be done with 16 bits or 8 bits?

Storage cost is only part of the problem. Complex encodings take time to process. UTF-32 is fixed-width: every code point occupies exactly one 32-bit unit, so finding the Nth code point is a simple index. UTF-16 and UTF-8 are variable-width, so a code point can span several code units that have to be decoded first. When you can afford the space, UTF-32 is the simplest to work with. UTF-16 and UTF-8 reduce storage cost, and the price of that is extra processing.
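To make the trade-off concrete, here is a rough sketch in C++ rather than a production implementation. The function names (`encode_utf8`, `nth_utf32`, `offset_of_nth_utf8`) are invented for this example, and the UTF-8 scan assumes well-formed input.

```cpp
#include <cstddef>
#include <string>

// Encode one Unicode code point as UTF-8 (1 to 4 bytes).
// Returns an empty string for values that are not valid scalar values.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return out;
    if (cp < 0x80) {                       // 1 byte: plain ASCII
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// UTF-32: the n-th code point is a plain array index, constant time.
// The caller must ensure n < s.size().
char32_t nth_utf32(const std::u32string& s, std::size_t n) {
    return s[n];
}

// UTF-8: walk from the start and count lead bytes, linear time.
// Returns the byte offset where the n-th code point starts,
// or s.size() if the string has fewer code points than that.
std::size_t offset_of_nth_utf8(const std::string& s, std::size_t n) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Continuation bytes look like 10xxxxxx; anything else starts a code point.
        bool is_lead = (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80;
        if (is_lead) {
            if (seen == n) return i;
            ++seen;
        }
    }
    return s.size();
}
```

The letter A costs one byte in UTF-8 and four in UTF-32, while the euro sign (U+20AC) costs three bytes in UTF-8 and still four in UTF-32. The space saving is real, but so is the extra scanning in `offset_of_nth_utf8` compared with the single index in `nth_utf32`.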

Thanks. That explains it well.
