I am writing a program for Windows Mobile phones that needs to read text files whose character sets (charsets) are unknown. I need to convert whatever text I read to Unicode (UTF-16 LE) to display it properly; in practice I probably only need to handle UTF-8, UTF-16 BE, GBK, and BIG5. The conversion itself is quite simple using the Windows API, but I don't know how to detect a file's encoding at runtime.

Does anyone have any ideas?
Your replies will be greatly appreciated!

-- Kevin Tse


All 2 Replies

Thank you.
The article you pointed me to only shows how to read Unicode and ANSI files, which I already knew. I know I can easily determine the encoding of files that have a BOM (Byte Order Mark), but ANSI files such as GBK and BIG5 have no BOM, and UTF-8 and UTF-16 files can also come without one.
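For the BOM-less UTF-8 case, one common heuristic (not something from the article) is to check whether the whole buffer is well-formed UTF-8, since GBK or BIG5 text containing non-ASCII characters rarely passes that check. A rough sketch, where `LooksLikeUtf8` is a hypothetical helper name and the file is assumed to be already loaded into memory:

```cpp
#include <cstddef>

// Hypothetical helper: returns true if the buffer is well-formed UTF-8.
// This is only a heuristic for guessing the encoding: pure ASCII also
// passes, and short GBK/BIG5 text could pass by coincidence.
bool LooksLikeUtf8(const unsigned char* data, std::size_t size)
{
    std::size_t i = 0;
    while (i < size) {
        const unsigned char lead = data[i];
        std::size_t extra = 0;  // continuation bytes expected after lead

        if (lead < 0x80)                       extra = 0;  // ASCII
        else if (lead >= 0xC2 && lead <= 0xDF) extra = 1;
        else if (lead >= 0xE0 && lead <= 0xEF) extra = 2;
        else if (lead >= 0xF0 && lead <= 0xF4) extra = 3;
        else return false;  // stray continuation byte or invalid lead

        if (extra > size - i - 1) return false;  // sequence is truncated

        if (extra >= 1) {
            const unsigned char second = data[i + 1];
            // Reject overlong forms, surrogates, and values above U+10FFFF.
            if (lead == 0xE0 && second < 0xA0) return false;
            if (lead == 0xED && second > 0x9F) return false;
            if (lead == 0xF0 && second < 0x90) return false;
            if (lead == 0xF4 && second > 0x8F) return false;
        }

        // Every continuation byte must have the bit pattern 10xxxxxx.
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((data[i + k] & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}
```

A file that fails this check is certainly not UTF-8; a file that passes is probably UTF-8 or plain ASCII, but that remains a guess.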

Anyway, the article gave me a clue: I can ALWAYS test for a BOM first. UTF-8, UTF-16 LE, and UTF-16 BE files may have BOMs, so I can treat every file without a BOM as an ANSI file. The assumption is not accurate, but I think it will do what I want most of the time.
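The BOM test described above could look roughly like the following. This is a minimal sketch, not code from the article: the `Encoding` enum and the `DetectByBom` function are hypothetical names, and the caller is assumed to have read at least the first few bytes of the file into a buffer.

```cpp
#include <cstddef>

// Ansi here just means "no BOM found"; in this thread that is assumed
// to be GBK or BIG5, which this function cannot tell apart.
enum class Encoding { Utf8, Utf16LE, Utf16BE, Ansi };

// Hypothetical helper: inspects the first bytes of a file buffer.
// bomLength receives the number of bytes to skip before the text starts.
Encoding DetectByBom(const unsigned char* data, std::size_t size,
                     std::size_t& bomLength)
{
    // UTF-8 BOM: EF BB BF
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
        bomLength = 3;
        return Encoding::Utf8;
    }
    // UTF-16 LE BOM: FF FE
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        bomLength = 2;
        return Encoding::Utf16LE;
    }
    // UTF-16 BE BOM: FE FF
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
        bomLength = 2;
        return Encoding::Utf16BE;
    }
    // No BOM: fall back to the "treat it as ANSI" assumption.
    bomLength = 0;
    return Encoding::Ansi;
}
```

The result could then select the conversion step, for example the code page passed to `MultiByteToWideChar` (`CP_UTF8`, or 936 for GBK and 950 for BIG5), while UTF-16 BE data only needs its byte pairs swapped. Choosing between GBK and BIG5 for the BOM-less case is still left to the program or the user.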
