I've searched all over for an answer to this, including this forum, so sorry if I missed something.
Anyway, I'd like to get a numeric code for extended characters like ß or ü.
I don't use them much myself, as I'm a native English speaker! But they come up often enough that I should be able to support them when they do.
I have found some information on wide ("long") chars, but I couldn't find a resource I understood well enough to actually use.
For example:

```cpp
#include <iostream>

int main() {
    char c = 'h';
    int i = c;               // char is promoted to int
    std::cout << i << "\n";  // prints 104
}
```

i is now 104, the character's standard ASCII code.
How can I consistently get the same number from one of the extended characters, and convert back again if needed?

Recommended Answers

All 6 Replies

If you hurry, you can still correct the code tags.

[code=cplusplus]

Notice: no spaces, and it's cplusplus, not c++.


Are you talking about converting UNICODE wchar_t* to char*? Here is a thread that shows one way to do it.

If this is a serious project (as opposed to something for learning Unicode), I'd suggest ICU. Managing Unicode is a bitch without a good library.

Hi, thanks for your speedy answers! I'm having a look at ICU. I'll mark the thread "solved" in a day or so just in case any other good ideas turn up.

Hi again. ICU seems enormously complex. Is it overkill when all I need is a number-to-character conversion and back again? Or is it the only way?

ICU is enormously complex because Unicode is enormously complex.

when all I need is a number to character and back again conversion?

Let's assume you want to do it manually. You'd need to support at least UTF-8, UTF-16 (including surrogates), and UTF-32. The process is different for converting each of those into a code point. Now, in all honesty that's not especially difficult. It's more difficult than calling a library function, but straightforward, in my opinion.

The hard part comes when you realize that you're probably not just converting a character to a code point, you're likely introducing general Unicode support including I/O and comparisons, which opens up a can of worms like normalization (and normalization is stupidly complex if you're thinking about doing it manually).

OK, thanks, everyone, for your insight. I appreciate it!
