As I emphasize *piece by piece* in the question, I have to read the ANSI file piece by piece to conserve resources, say 40 kilobytes for each read.

Now I am handling an ANSI file that contains Chinese characters (encoded using the GBK charset, two bytes for each Chinese character and one byte for each ASCII character).

I want to convert the ANSI file to Unicode. This can easily be done with the Win32 API MultiByteToWideChar (I am on Windows Mobile). The problem is that if I read the file piece by piece, a chunk will often end in the middle of a Chinese character, so I read only half of it. How do I avoid that?

Thank you in advance!
-- Kevin Tse

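I don't know much about Chinese characters, but I would suggest walking through each chunk one character at a time. Test each byte to see whether it is the lead byte of a two-byte Chinese character (in GBK, a value from 0x81 to 0xFE). If it is, take the next byte along with it; if not, treat it as a single ASCII character. Reading a fixed 2 bytes at a time will not work, because an odd run of ASCII characters shifts the alignment. If a chunk ends on a lead byte, hold that byte back and put it in front of the next chunk. The Chinese characters are not already in Unicode format, so they still have to go through MultiByteToWideChar. Converting from ASCII to Unicode, on the other hand, is quite simple: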

```c
wchar_t c;           // Unicode (wide) character
char ascii = 'A';
c = (wchar_t)ascii;  // valid only for 7-bit ASCII values (0x00-0x7F)
```
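
For the chunk boundary itself, here is a minimal sketch of one way to avoid splitting a character. It assumes each buffer starts on a character boundary and scans forward, because a GBK trail byte can fall in the ASCII range and so cannot be recognised by looking backwards from the end. The names `is_gbk_lead_byte`, `gbk_complete_prefix`, and `gbk_carry_tail` are made up for illustration and are not part of any API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GBK lead bytes are 0x81..0xFE; bytes below 0x80 are single-byte ASCII. */
static int is_gbk_lead_byte(unsigned char b)
{
    return b >= 0x81 && b <= 0xFE;
}

/* Returns how many bytes at the start of buf form whole characters.
   buf must begin on a character boundary. The result is either len,
   or len - 1 when the chunk ends with a lead byte whose trail byte
   has not been read yet. */
size_t gbk_complete_prefix(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    assert(buf != NULL || len == 0);
    while (i < len) {
        if (is_gbk_lead_byte(buf[i])) {
            if (i + 1 >= len)
                break;          /* dangling lead byte: stop before it */
            i += 2;             /* two-byte character */
        } else {
            i += 1;             /* single-byte character */
        }
    }
    return i;
}

/* Moves the unconverted tail (0 or 1 bytes) to the front of buf so the
   next read can append after it. Returns the number of bytes carried. */
size_t gbk_carry_tail(unsigned char *buf, size_t len, size_t complete)
{
    size_t rest = len - complete;
    memmove(buf, buf + complete, rest);
    return rest;
}
```

In the read loop, each new chunk would be read into the buffer after the carried byte. The first `gbk_complete_prefix` bytes would then go to MultiByteToWideChar (code page 936 is the Windows code page for GBK), and `gbk_carry_tail` would run before the next read.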