How do I correctly read an ANSI file piece by piece?

Question

kevintse 0 Light Poster

15 Years Ago

As the title emphasizes, I have to read the ANSI file piece by piece to conserve resources, say 40 kilobytes per read.

Now I am handling an ANSI file that contains Chinese characters (encoded using the GBK charset, two bytes for each Chinese character and one byte for each ASCII character).

I want to convert the ANSI file to Unicode. This is easily done with the Win32 API MultiByteToWideChar (I am on Windows Mobile). The problem is that if I read the file piece by piece, a read will often end in the middle of a Chinese character, so I get only half of it. How do I avoid that?

Thank you in advance!
-- Kevin Tse

api c++ file-system windows-api


Ancient Dragon 5,243 Achieved Level 70 Team Colleague Featured Poster · Answer 1 · 2009-07-22T17:42:01+00:00

I don't know a thing about Chinese characters, but I would suggest reading the file 2 bytes at a time. After reading 2 bytes, test whether they form a valid Chinese character. If not, assume it's two ASCII characters. Then write out either the Chinese character (after converting it from GBK to Unicode, for example with MultiByteToWideChar) or the two ASCII characters converted to Unicode. Converting from ASCII to Unicode is quite simple:

wchar_t c;            // Unicode (wide) character
char ascii = 'A';
c = (wchar_t)ascii;   // safe for 7-bit ASCII values only
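
For the 40 KB reads described in the question, another option is to keep the large reads and hold back a trailing lead byte until the next read. Below is a rough sketch of that idea, not a tested Windows Mobile implementation. It assumes GBK (code page 936) lead bytes lie in 0x81..0xFE and that every other byte is a single-byte character. The names isGbkLeadByte, completeGbkPrefix and splitGbkChunks are made up for illustration, and the file is simulated with a std::string so the logic can be run anywhere. The scan has to start at a known character boundary, because a GBK trail byte can fall in the same range as a lead byte, so looking only at the last byte of a chunk is not enough.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Assumption: GBK (code page 936) lead bytes are 0x81..0xFE;
// any other byte stands for a single-byte character.
inline bool isGbkLeadByte(unsigned char b) {
    return b >= 0x81 && b <= 0xFE;
}

// Returns how many bytes at the start of buf[0..len) form whole characters.
// buf must start on a character boundary. The result is len or len - 1.
std::size_t completeGbkPrefix(const unsigned char* buf, std::size_t len) {
    std::size_t i = 0;
    while (i < len) {
        const std::size_t step = isGbkLeadByte(buf[i]) ? 2 : 1;
        if (i + step > len) {
            break;  // lead byte whose trail byte is in the next chunk
        }
        i += step;
    }
    return i;
}

// Simulates reading `data` in reads of up to chunkSize bytes and returns
// pieces that never end in the middle of a two-byte character.
std::vector<std::string> splitGbkChunks(const std::string& data,
                                        std::size_t chunkSize) {
    std::vector<std::string> pieces;
    if (chunkSize == 0) {
        return pieces;
    }
    std::string pending;  // bytes held back from the previous read
    std::size_t pos = 0;
    while (pos < data.size()) {
        // One "read": append up to chunkSize new bytes after the carry-over.
        const std::size_t n = std::min(chunkSize, data.size() - pos);
        pending.append(data, pos, n);
        pos += n;

        const std::size_t whole = completeGbkPrefix(
            reinterpret_cast<const unsigned char*>(pending.data()),
            pending.size());
        if (whole > 0) {
            // In the real program this piece is what would be handed to
            // MultiByteToWideChar with code page 936.
            pieces.push_back(pending.substr(0, whole));
        }
        pending.erase(0, whole);  // keep the dangling lead byte, if any
    }
    if (!pending.empty()) {
        // The input ended on a lead byte: truncated or corrupt data.
        pieces.push_back(pending);
    }
    return pieces;
}
```

In a real program the loop body would be a file read into a buffer whose first byte is the held-back lead byte (if there was one), followed by a conversion of only the first completeGbkPrefix(...) bytes. On desktop Win32 the lead-byte test can be delegated to IsDBCSLeadByteEx(936, b); whether that call is available on a given Windows Mobile SDK is something to check.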
