Hello,

I have a file that contains some unusual characters (e.g. '╠'), and I need to read it and convert those characters to numbers. An example of what I am looking for would be code that reads a file and prints "Found" whenever '╠' is encountered. How do I do this while staying compatible with the standard std::fstream classes?

E.g.:

void testFile(const char *str)
{
    std::fstream file(str);
    char temp;
    int count=0;
    while (!file.eof())
    {
        file>>temp;
        if (temp=='╠')//I highly doubt this would work
            count++;
    }
    cout<<"Found "<<count<<" ╠s"<<endl;
}

I am sure that there is a way to store Unicode characters in a character type, and I seem to recall it's '\u####' or something like that. I think I need a different type than char, but for the life of me I can't remember what it is called. On top of that, I am unsure how reading those larger characters from std::fstream objects works.

All 7 Replies

Well, ╠ has the code 204 in extended ASCII (code page 437), though not in 7-bit ASCII proper. As far as working with Unicode goes, you might want to look into std::locale in the <locale> header.

Should you not be working with wstring and wchar_t types?

@NathanOliver yes, I know that ╠ happens to be expressible as code 204 in extended ASCII, but some of my other symbols (like ₆, for example) are not expressible there and require full Unicode (₆ is U+2086, I think). I am not sure locale will help me much, as I think it is more concerned with multiple interpretations of things, while Unicode is (theoretically) a universal standard.

@Suzie999 I think wchar_t is exactly the type whose name I was trying to remember (assuming that it is the one that supports full Unicode?). Now I just need to know whether I can extract those directly from an std::fstream object or have to do something special, and how to enter my unusual symbols in my source files.

Note: I believe all of the symbols I am referring to fit in a single UTF-16 code unit (none have Unicode values higher than FFFF), but I am not well educated on exactly how Unicode works.
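
To make the note above concrete: a code point at or below U+FFFF (and outside the surrogate range) occupies exactly one 16-bit UTF-16 code unit, and C++11 lets you spell it in source with a \u escape instead of pasting the symbol. The following is only a sketch assuming a C++11 compiler; isSingleUtf16Unit and toUtf16Unit are hypothetical helper names, not standard functions.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true when a code point fits in one UTF-16 code unit,
// i.e. it is at most U+FFFF and is not a surrogate value.
bool isSingleUtf16Unit(char32_t codePoint)
{
    bool isSurrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
    return codePoint <= 0xFFFF && !isSurrogate;
}

// Hypothetical helper: narrows a code point to a single UTF-16 code unit.
char16_t toUtf16Unit(char32_t codePoint)
{
    assert(isSingleUtf16Unit(codePoint)); // larger values need a surrogate pair
    return static_cast<char16_t>(codePoint);
}

// Universal character names (\uXXXX) name the symbols without relying on
// the encoding of the source file itself.
const char16_t boxChar = u'\u2560';      // U+2560, the box-drawing character
const char16_t subscriptSix = u'\u2086'; // U+2086, SUBSCRIPT SIX
const std::u16string sample = u"a\u2560b\u2086"; // four code units
```

Both symbols mentioned in this thread pass the isSingleUtf16Unit check, so each can be compared against a single char16_t value.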

If you are going to use either wchar_t or wstring then you need to use wfstream for file operations.
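
As a rough illustration of the wide-stream classes: for plain ASCII input, std::wifstream behaves much like std::ifstream, except that each character arrives as a wchar_t. The sketch below assumes an ASCII-only file and the default "C" locale; reading real Unicode text this way additionally requires a suitable locale on the stream, which is not shown. countWideChar is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>

// Hypothetical helper: counts how often `target` occurs in a file read
// through a wide stream. Assumes ASCII-only content and the default locale.
std::size_t countWideChar(const char *filename, wchar_t target)
{
    assert(filename != nullptr);
    std::wifstream file(filename); // wide counterpart of std::ifstream
    std::size_t count = 0;
    wchar_t ch;
    // get() reads one wide character at a time and, unlike operator>>,
    // does not skip whitespace. The loop ends on EOF or on a failed open.
    while (file.get(ch))
    {
        if (ch == target)
            ++count;
    }
    return count;
}
```

Called as countWideChar("notes.txt", L'a'), it returns the number of 'a' characters; a file that cannot be opened simply yields 0 in this sketch.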

Is there a tutorial or reference page somewhere that explains how to use wchar_t and wfstream? Do they act like normal char and fstream? How do I specify Unicode characters in C++? How do wchar_t and wfstream respond to normal ASCII input (given that Unicode is for the most part backwards compatible with ASCII)? Basically, could you show me an implementation of the function:

int countOccurencesOfUnicodeCharacter(const char *filename, int unicode)
{
    //somehow return the frequency of 'unicode' in the file specified by filename.
    //of particular interest to me is how to get unicode characters and how to compare with them.
}

Once I have the constituent parts of that function I should be able to adapt them to my needs.

Try the following implementation. It assumes that the file is encoded as UTF-16 in the machine's native byte order, with two bytes per character.

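#include <cstddef>
#include <cstring>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

#define CRAZYCHAR 0x2560 // U+2560 is '╠' in Unicode; 204 is only its code page 437 byte
#define UNICODETEXTFILE "myUnicodeTextFile.txt"
int main(void)
{
    std::stringstream myStringStream;
    std::ifstream fin(UNICODETEXTFILE, std::ios::binary); // Binary mode: read the raw bytes untranslated
    if (!fin)
    {
        std::cerr << "Error: Could not open " << UNICODETEXTFILE << "\n";
        return 1;
    }
    myStringStream << fin.rdbuf(); // Copy the complete file contents into a stringstream
    std::string const &myString = myStringStream.str();
    if (myString.size() % sizeof(char16_t) != 0)
    {
        std::cerr << "Error: Unicode file just isn't the right size\n"; // Must be even number. Two bytes per unit
        return 1;
    }
    // char16_t is 2 bytes on mainstream platforms; wchar_t is 2 bytes on Windows but 4 on most Unix-like systems
    std::u16string myWideString;
    myWideString.resize(myString.size() / sizeof(char16_t));
    if (!myString.empty())
        std::memcpy(&myWideString[0], myString.data(), myString.size()); // Copy all the data; assumes native byte order
    for (std::size_t x = 0; x < myWideString.size(); x++)
    {
        if (myWideString[x] == CRAZYCHAR)
        {
            std::cout << "Found Crazy Character" << std::endl;
        }
    }
    return 0;
}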
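
The same idea can be packaged in the function shape requested earlier in the thread. This is only a sketch under assumptions the thread does not confirm: the file is UTF-16 little-endian (a common format for "Unicode" text files on Windows) and the target value is at most FFFF. The name countUtf16LeCodeUnit is hypothetical. Assembling each unit from two bytes avoids depending on sizeof(wchar_t) or on the byte order of the machine running the code.

```cpp
#include <cassert>
#include <fstream>

// Hypothetical helper: counts occurrences of one 16-bit code unit in a file
// assumed to be UTF-16 little-endian. Returns -1 if the file cannot be opened.
int countUtf16LeCodeUnit(const char *filename, int unicode)
{
    assert(unicode >= 0 && unicode <= 0xFFFF); // single code units only
    std::ifstream file(filename, std::ios::binary);
    if (!file)
        return -1;

    int count = 0;
    char bytes[2];
    // read() fails once fewer than two bytes remain, so a trailing odd
    // byte is ignored rather than treated as a character.
    while (file.read(bytes, 2))
    {
        // Little-endian: low byte first, then high byte.
        int unit = static_cast<unsigned char>(bytes[0])
                 | (static_cast<unsigned char>(bytes[1]) << 8);
        if (unit == unicode)
            ++count;
    }
    return count;
}
```

For example, countUtf16LeCodeUnit("myUnicodeTextFile.txt", 0x2560) would count the ╠ characters. A byte-order mark at the start of the file (FEFF) is read as an ordinary unit and does not affect the count unless it is itself the target.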

Exactly what I was looking for. Thank you.
