Newbie alert: numerical value for Unicode characters

I've searched all over for an answer to this, including this forum, so sorry if I missed something. Anyway, I'd like to get a numerical code for extended characters like ß or ü.
I don't use them much myself, as I'm a native English speaker, but they pop up often enough that I should be able to support them when they arise.
I have found some information on wide characters (wchar_t), but I didn't find a resource I understood well enough to actually use.
For example:

#include <iostream>

int main() {
	char c = 'h';
	int i = c;               // implicit conversion from char to its integer code
	std::cout << i << "\n";  // prints 104
}

i is now 104, the ASCII code for 'h'.
How can I consistently get the same number from one of the extended characters, and convert back again if needed?

Dorson8009
Newbie Poster
7 posts since Jan 2011
Reputation Points: 10
Solved Threads: 0
 

If you hurry up, you can correct the code tags.


[code=cplusplus]


Notice: no spaces, and it's cplusplus, not c++.


Are you talking about converting UNICODE wchar_t* to char*? Here is a thread that shows one way to do it.

Ancient Dragon
Retired & Loving It
Team Colleague
30,049 posts since Aug 2005
Reputation Points: 5,662
Solved Threads: 2,343
 

If this is a serious project (as opposed to something for learning Unicode), I'd suggest ICU (International Components for Unicode). Managing Unicode is a bitch without a good library.

Narue
Bad Cop
Administrator
15,460 posts since Sep 2004
Reputation Points: 6,464
Solved Threads: 1,401
 

Hi, thanks for your speedy answers! I'm having a look at ICU. I'll mark the thread "solved" in a day or so just in case any other good ideas turn up.

Dorson8009
 

Hi again. ICU seems enormously complex. Is it overkill when all I need is a number to character and back again conversion? Or is this the only way?

Dorson8009
 

ICU is enormously complex because Unicode is enormously complex.

when all I need is a number to character and back again conversion?


Let's assume you want to do it manually. You'd need to support at least UTF-8, UTF-16 (including surrogates), and UTF-32. The process is different for converting each of those into a code point. Now, in all honesty that's not especially difficult. It's more difficult than calling a library function, but straightforward, in my opinion.

The hard part comes when you realize that you're probably not just converting a character to a code point. You're likely introducing general Unicode support, including I/O and comparisons, which opens up a can of worms like normalization (and normalization is stupidly complex if you're thinking about doing it manually).

Narue
 

OK, thanks people for your insight, I appreciate it!

Dorson8009
 

This question has already been solved
