As I emphasized (*piece by piece*) in the question, I have to read the ANSI file piece by piece to conserve resources, say 40 kilobytes per read.

Now I am handling an ANSI file that contains Chinese characters (encoded using the GBK charset, two bytes for each Chinese character and one byte for each ASCII character).

I want to convert the ANSI file to Unicode. This is easy with the Win32 API MultiByteToWideChar (I am on Windows Mobile). The problem is that if I read the file piece by piece, a read will often end in the middle of a Chinese character, leaving its first byte at the end of one piece and its second byte at the start of the next. How do I avoid that?
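One common way to handle this is to convert only the complete characters in each piece and carry a dangling lead byte over to the front of the next piece. Below is a minimal sketch in portable C, assuming plain GBK / code page 936 (lead bytes 0x81-0xFE, no GB18030 four-byte sequences). `gbk_complete_prefix`, `gbk_read_in_pieces`, `gbk_sink`, `piece_stats` and `count_piece` are hypothetical names used only for illustration. On Windows the Win32 function `IsDBCSLeadByteEx(936, b)` could stand in for the hand-written lead-byte test; whether it is available in a given Windows Mobile SDK is worth checking.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumption: plain GBK / code page 936, lead bytes 0x81-0xFE,
   no GB18030 four-byte sequences. */
static int is_gbk_lead_byte(unsigned char b)
{
    return b >= 0x81 && b <= 0xFE;
}

/* Number of bytes at the start of buf that form complete characters.
   buf must start on a character boundary. The scan runs from the front
   because a trail byte (0x40-0xFE) can have the same value as a lead byte,
   so looking only at the last byte of a piece is not enough. */
size_t gbk_complete_prefix(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        size_t step = is_gbk_lead_byte(buf[i]) ? 2 : 1;
        if (i + step > len)
            break;                  /* trail byte is still in the file */
        i += step;
    }
    return i;                       /* len - i is always 0 or 1 */
}

/* Hypothetical callback that is only ever handed whole characters. */
typedef void (*gbk_sink)(const unsigned char *bytes, size_t len, void *ctx);

/* Reads fp in pieces of at most `chunk` bytes (chunk >= 1). buf must hold
   chunk + 1 bytes: a dangling lead byte is moved to the front of buf and
   the next piece is read in right after it. Returns 0 on success, -1 on a
   read error. */
int gbk_read_in_pieces(FILE *fp, unsigned char *buf, size_t chunk,
                       gbk_sink sink, void *ctx)
{
    size_t carry = 0;
    for (;;) {
        size_t got = fread(buf + carry, 1, chunk, fp);
        if (got == 0) {
            if (carry > 0)
                sink(buf, carry, ctx);  /* file ends with a lone lead byte */
            return ferror(fp) ? -1 : 0;
        }
        size_t filled = carry + got;
        size_t n = gbk_complete_prefix(buf, filled);
        if (n > 0)
            sink(buf, n, ctx);
        carry = filled - n;             /* 0 or 1 leftover byte */
        memmove(buf, buf + n, carry);
    }
}

/* Example sink: counts the pieces and bytes it was handed. */
struct piece_stats { size_t pieces, bytes; };

void count_piece(const unsigned char *bytes, size_t len, void *ctx)
{
    struct piece_stats *st = ctx;
    (void)bytes;
    st->pieces++;
    st->bytes += len;
}
```

With 40 KB pieces, `buf` would be 40 * 1024 + 1 bytes. On Windows, a real sink would pass each piece to something like `MultiByteToWideChar(936, 0, (LPCSTR)bytes, (int)len, wbuf, wbufLen)`. Because every piece ends on a character boundary, no call ever sees half a character.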

Thank you in advance!
-- Kevin Tse

Last Post by Ancient Dragon

I don't know a thing about Chinese characters, but I would suggest reading the file 2 bytes at a time. After reading 2 bytes, test whether they form a valid Chinese character. If not, assume they are two ASCII characters. Then write out either the Chinese character (after converting it to Unicode, since GBK bytes are not Unicode; MultiByteToWideChar can do that) or the two ASCII characters converted to Unicode. Converting from ASCII to Unicode is quite simple:

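```c
wchar_t c;            /* Unicode character (a UTF-16 code unit on Windows) */
char ascii = 'A';     /* must be plain ASCII, 0-127 */
c = (wchar_t)ascii;   /* ASCII code points are unchanged in Unicode */
```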
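One caveat with reading exactly 2 bytes at a time: if an ASCII character comes right before a Chinese one, the 2-byte read pairs the ASCII byte with the Chinese character's lead byte. Also, a GBK trail byte can be in the ASCII range (e.g. `81 40`, where 0x40 is `@`), so "assume two ASCII characters" can misfire. A variation that avoids this reads one byte and reads a second byte only after a lead byte. The sketch below assumes plain GBK / code page 936 (lead bytes 0x81-0xFE). `gbk_getc` and `ascii_to_wide` are hypothetical helpers, and the ASCII widening is the same cast as above.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumption: plain GBK / code page 936, lead bytes 0x81-0xFE. */
static int is_gbk_lead_byte(int b)
{
    return b >= 0x81 && b <= 0xFE;
}

/* Reads one GBK character from fp into out[0..1]. Returns the number of
   bytes stored: 1 for a single-byte (ASCII) character, 2 for a double-byte
   character, 0 at end of file. A lead byte at the very end of the file is
   returned alone (1). getc() yields bytes as unsigned char values or EOF. */
int gbk_getc(FILE *fp, unsigned char out[2])
{
    int b = getc(fp);
    if (b == EOF)
        return 0;
    out[0] = (unsigned char)b;
    if (!is_gbk_lead_byte(b))
        return 1;               /* single byte, never paired with the next one */
    int t = getc(fp);
    if (t == EOF)
        return 1;               /* truncated character at end of file */
    out[1] = (unsigned char)t;
    return 2;
}

/* Widens a single-byte character; only valid for ASCII (0x00-0x7F). */
wchar_t ascii_to_wide(unsigned char b)
{
    return (wchar_t)b;
}
```

Since stdio buffers the underlying reads, calling `getc` per byte does not mean one system call per byte. Two-byte results would still go through MultiByteToWideChar (code page 936) to become Unicode.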