
How to use wifstream to read a Unicode file

Hi All,
Currently I am working on a project that reads and writes Unicode files. I got it working using CFile together with the WideCharToMultiByte and MultiByteToWideChar conversion functions, but I am not sure whether those functions work correctly when a Unicode character takes more than 2 bytes.

Now I want to read the files (UTF-8, UTF-16 BE, UTF-16 LE) using wifstream.

Can anyone help me?

smaity
Newbie Poster
3 posts since Dec 2005
Reputation Points: 10
Solved Threads: 0
 

unichar can be more than 2 bytes? I thought it was always 2 bytes.

WolfPack
Postaholic
Moderator
2,051 posts since Jun 2005
Reputation Points: 572
Solved Threads: 115
 

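The size of wchar_t is implementation dependent. On MS-Windows wchar_t is 16 bits (historically a typedef for unsigned short); on most *nix systems it is 32 bits. Unicode itself needs only 21 bits per code point (the highest is U+10FFFF), so a 32-bit wchar_t can hold any character, while a 16-bit wchar_t needs a surrogate pair for anything above U+FFFF.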

That becomes a very big problem when attempting to port a UNICODE file between operating systems.
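
To see what this means on a particular compiler, a small check along these lines may help. It is only a sketch: the helper names wchar_bits and wchar_holds_any_code_point are made up for illustration, and the 21-bit threshold comes from the largest Unicode code point, U+10FFFF.

```cpp
#include <climits>
#include <cstddef>

// Number of bits in wchar_t on the platform this is compiled for.
constexpr std::size_t wchar_bits() {
    return sizeof(wchar_t) * CHAR_BIT;
}

// True if a single wchar_t can hold every Unicode code point
// (U+10FFFF needs 21 bits). If this is false, characters above
// U+FFFF have to be stored as surrogate pairs.
constexpr bool wchar_holds_any_code_point() {
    return wchar_bits() >= 21;
}
```

With a 16-bit wchar_t (the usual case on MS-Windows) the second function would be expected to return false, and with a 32-bit wchar_t it would return true.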

smaity: Not sure if this will help or not.

Ancient Dragon
Retired & Loving It
Team Colleague
30,049 posts since Aug 2005
Reputation Points: 5,662
Solved Threads: 2,343
 

Thank you, Ancient Dragon, for providing the link, but it is not enough; it gives no clear idea about the conversion.
This time I am trying to use wistream. I will read byte by byte, and after getting the BOM I will read all the bytes that make up one Unicode character. But once I have those bytes, how do I convert them back to a Unicode character to show in a text box or list control?

Do you have any idea how to use wistream for this?

Thanks.

smaity

I don't use C++ streams for UNICODE for the reasons you describe. It's a lot easier to use C's FILE with fopen() in binary mode, fread() and fwrite(). You don't have to worry about conversion that way. That works provided you don't want to move the file from one operating system to another and you don't want to use another editor such as Notepad.exe to read it.


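If you still want to use wfstreams, you can use mbstowcs() to convert from char* to wchar_t*, or wcstombs() to convert in the other direction. Both follow the current C locale (see setlocale()), so the result depends on which multibyte encoding that locale uses.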
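
The binary-mode approach combined with the BOM check mentioned earlier in the thread could look roughly like this. It is a sketch under assumptions: the names Encoding, BomInfo, detect_bom and read_all_bytes are invented for illustration, it uses std::ifstream in binary mode rather than fopen(), and it only looks for the three encodings asked about (a UTF-32 LE BOM also starts with FF FE and is not distinguished here).

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

enum class Encoding { Utf8, Utf16LE, Utf16BE, Unknown };

struct BomInfo {
    Encoding encoding;   // encoding indicated by the BOM, if any
    std::size_t length;  // number of BOM bytes to skip before decoding
};

// Read the whole file as raw bytes, with no character conversion.
std::string read_all_bytes(const std::string& path) {
    std::ifstream in(path.c_str(), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Inspect the first bytes of the buffer for a UTF-8 or UTF-16 BOM.
BomInfo detect_bom(const std::string& bytes) {
    auto b = [&bytes](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF)
        return {Encoding::Utf8, 3};
    if (bytes.size() >= 2 && b(0) == 0xFF && b(1) == 0xFE)
        return {Encoding::Utf16LE, 2};
    if (bytes.size() >= 2 && b(0) == 0xFE && b(1) == 0xFF)
        return {Encoding::Utf16BE, 2};
    // No BOM found: the caller has to guess or be told the encoding.
    return {Encoding::Unknown, 0};
}
```

A caller would pass the result of read_all_bytes() to detect_bom(), skip BomInfo::length bytes, and then decode the rest according to BomInfo::encoding.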

Ancient Dragon
But I have learned that wifstream/wistream uses wchar_t, which is 2 bytes on Windows. The problem is that if a Unicode character needs more than 2 bytes (a surrogate pair), it is not possible to read or show that character.
The VC compiler is not designed for that.

Thanks,

smaity

You will probably have to write your own conversion functions that encode those 32-bit code points as 16-bit (UTF-16) or 8-bit (UTF-8) units. Characters above U+FFFF, which include some of the rarer characters used in eastern languages (Chinese, Japanese, etc.), do not fit in one 16-bit unit; they take a surrogate pair in UTF-16, or four bytes in UTF-8.
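
As a rough idea of what such conversion functions might look like, here is a sketch that decodes UTF-8 bytes into 32-bit code points and then encodes those as UTF-16, using surrogate pairs where needed. The function names utf8_to_code_points and code_points_to_utf16 are invented for illustration. The decoder is simplified: it replaces malformed sequences with U+FFFD but does not reject overlong forms or out-of-range values, so it should not be treated as a validating implementation.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into Unicode code points (simplified, see note above).
std::u32string utf8_to_code_points(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // invalid lead byte

        if (i + extra >= in.size()) {  // sequence cut off at end of input
            out.push_back(0xFFFD);
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Encode code points as UTF-16 code units; values above U+FFFF
// become a high surrogate followed by a low surrogate.
std::u16string code_points_to_utf16(const std::u32string& in) {
    std::u16string out;
    for (char32_t cp : in) {
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}
```

On Windows, where wchar_t is 16 bits, the UTF-16 units produced this way could be copied into a wchar_t buffer for display; whether a given control renders surrogate pairs correctly is a separate question that this sketch does not address.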

Ancient Dragon
