So, I've gotten myself in a bit of a pickle. I have been serializing data I collect with:

vector<myClass> mVec;
... // mVec is filled with the data to collect

for (auto it = mVec.begin(); it != mVec.end(); ++it)
{
    myfile.write(reinterpret_cast<char*>(&(*it)), sizeof(*it));
}

I've wrapped this code in a templated function, and used it a LOT over the past couple of years. However, over the past 3 weeks, I have made a huge mistake because the object I've been serializing is not always the same size:

class myClass
{
    std::string m_str;
    // ... other data
};

Because myClass contains a string, its size can vary, so my usual ReadData() is not going to work:

template<class T>
static void ReadData(std::vector<T>* pData
    , const char* pszFileName)
{
    std::ifstream myfile;
    T tp;
    myfile.open(pszFileName, std::ios::binary);
    while (myfile.read(reinterpret_cast<char*>(&tp)
        , sizeof(tp)).gcount() == sizeof(tp))
    {
        pData->push_back(T(tp));
    }
    myfile.close();
}

That is the bad news. The good news is that the m_str member of my class has a bounded length (let's say a maximum of 10 characters).

Is there any way I can possibly recover this data of objects of unknown size? I am really hoping that I did not just lose 3 weeks of data over a silly mistake! I've been in permanent face-palm mode ever since I realized my error earlier today. I will be very, very thankful for any help!

All 7 Replies

Just rewrite the serialization routines to write the first byte (or 32-bit chunk) as the size. On the read side, read the size first and then read however many bytes it indicates.
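A minimal sketch of that length-prefix idea, assuming a 32-bit length written in the host's byte order. The names writeString and readString are made up for illustration, and a real reader would probably want to reject implausibly large lengths before resizing.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Write a 32-bit length (host byte order) followed by the characters.
// Hypothetical helper, not part of the original code.
void writeString(std::ostream& out, const std::string& s)
{
    const std::uint32_t len = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&len), sizeof(len));
    out.write(s.data(), len);
}

// Read the length first, then exactly that many bytes.
// Returns false if the stream ends before a full string was read.
bool readString(std::istream& in, std::string& s)
{
    std::uint32_t len = 0;
    if (!in.read(reinterpret_cast<char*>(&len), sizeof(len)))
        return false;
    s.resize(len);
    if (len > 0 && !in.read(&s[0], len))
        return false;
    return true;
}
```

The fixed-size members of a record can still be written with a plain write(); only the string member needs the length prefix.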

commented: That'll work in the future... +2

Thank you, that is a good way to do it in the future, but my problem is that I want to recover the data that I've serialized over the past 3 weeks.

I just noticed something in your code: you are serializing a string class by casting it to an array of char - that might not work the way you intend it to. The size of a std::string is not the size of the data it contains. You may deal with this in some way not in your example but, as provided, this would not work.

Hopefully you are serializing to the same machine you are reading from...

If they are, is there any field in the data that you can use to infer the size? If so, read up to that element and work out the remainder from it (assuming that element appears earlier in the structure).

If not, can you recreate the objects you've written and determine from them what the size would be?

To be honest, I'm more worried that you've serialized a std::string and lost the data you think has been stored all along...
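One way to catch this at compile time is to have the templated writer refuse any type that is not trivially copyable. This is only a sketch: the original writer function was not shown, so the name WriteData and its signature are assumptions.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical writer: dumps each element's bytes, but only compiles
// for types whose bytes fully describe them (no owned pointers).
template <class T>
void WriteData(const std::vector<T>& data, std::ostream& out)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable to be written as raw bytes");
    for (const T& item : data)
        out.write(reinterpret_cast<const char*>(&item), sizeof(item));
}
```

With this guard, a class that contains a std::string member fails to compile when passed to WriteData, instead of silently writing pointers to disk.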

Well, if your program is anything like the one below, then the strings may not be anywhere in the file, because std::string contains a pointer to where the string is located in memory, and writing out the class does not automatically write out those strings. In that case you have lost all three weeks of work without any hope of recovery. Open the file in a hex editor to verify whether the actual text of the strings is in the file.

Also, run the program and you will see that sizeof(MyClass) is the same for all records in the file regardless of the length of the string.

#include <fstream>
#include <string>
#include <iostream>

using namespace std;

class MyClass
{
    int a, b, c;
    string s1;
    float d, e, f;
public:
    MyClass()
    {
        a = b = c = 0;
        d = e = f = 0.F;
    }
    void setstr(string s)
    {
        s1 = s;
    }


};


int main()
{
    MyClass c;
    string s;
    ofstream out("text.dat", ios::binary);
    for (int i = 0; i < 5; i++)
    {
        cout << "Enter string # " << i << '\n';
        getline(cin, s);
        c.setstr(s);
        cout << "sizeof(c) = " << sizeof(c) << '\n';
        out.write((char*) &c, sizeof(c));
    }

}
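If a hex editor is not handy, the same check can be roughed out in code by listing runs of printable characters in the file's bytes. This is a sketch only, and printableRuns is a made-up helper. Whether any text shows up depends on the standard library: some implementations keep short strings inside the std::string object itself, so short text may appear in the file even though longer strings will not.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Return every run of printable characters that is at least minLen long.
// Hypothetical helper for eyeballing whether string text is in the file.
std::vector<std::string> printableRuns(const std::string& bytes, std::size_t minLen)
{
    std::vector<std::string> runs;
    std::string current;
    for (unsigned char ch : bytes)
    {
        if (std::isprint(ch))
        {
            current.push_back(static_cast<char>(ch));
        }
        else
        {
            if (current.size() >= minLen)
                runs.push_back(current);
            current.clear();
        }
    }
    if (current.size() >= minLen)
        runs.push_back(current);
    return runs;
}
```

To use it, read the whole file into a std::string (for example with std::ifstream opened in binary mode and std::istreambuf_iterator<char>) and look through the returned runs for the expected text.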

It is usually much easier to serialize C++ classes if you use char arrays instead of std::string. The characters are then stored inside the object itself, and sizeof(MyClass) stays the same regardless of the length of the actual string within the array.

commented: Very clear example and excellent advice! +2

Well, thank you all very much! This thread is resolved. So, at least I can recover the data minus the string member, AND, I have good advice on how to serialize in the future. Thank you.

I wanted to follow up on this thread for the final solution in case anyone else repeats my mistake.

When ReadData() overwrites the std::string data member with bytes from the file, it cannot put anything meaningful into the string's internal pointer. This is not a problem as long as the std::string member is never accessed, but when the MyClass object is destroyed, the std::string member's destructor runs and tries to free heap memory at an address that was never allocated.

So, I created a dummy variable in MyClass to replace the std::string variable. I changed:

class myClass
{
    std::string m_str;
    // ... other data
};

to

struct DumStr
{
    int a,b,c,d,e,f,g;
};

class myClass
{
    DumStr m_str;
    // ... other data
};

On my system, both std::string and DumStr objects take up 28 bytes.

In this way, the heap is not touched ever by myClass, and I was able to read the rest of the data, only losing the information stored in the string.
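A slightly more defensive version of the same trick sizes the placeholder from sizeof(std::string) instead of hard-coding seven ints, and lets the compiler check that the two layouts match. This is a sketch under assumptions: the id and value members stand in for the real "other data", all names here are hypothetical, and the reader must be built with the same compiler and settings that wrote the file.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

// Placeholder with the same size and alignment as std::string,
// but with no constructor or destructor that touches the heap.
struct StringPlaceholder
{
    alignas(std::string) unsigned char raw[sizeof(std::string)];
};

// Hypothetical layouts: id and value stand in for "other data".
struct OriginalRecord { std::string m_str;       int id; double value; };
struct RecoveryRecord { StringPlaceholder m_str; int id; double value; };

static_assert(sizeof(StringPlaceholder) == sizeof(std::string),
              "placeholder must be the same size as std::string");
static_assert(sizeof(RecoveryRecord) == sizeof(OriginalRecord),
              "recovery layout must match the layout that was written");
static_assert(std::is_trivially_copyable<RecoveryRecord>::value,
              "recovery record must be safe to fill from raw bytes");

// Read fixed-size records until the stream runs out of complete ones.
std::vector<RecoveryRecord> readRecords(std::istream& in)
{
    std::vector<RecoveryRecord> records;
    RecoveryRecord rec;
    while (in.read(reinterpret_cast<char*>(&rec), sizeof(rec)))
        records.push_back(rec);
    return records;
}
```

The bytes that used to be the string end up in m_str.raw and are simply ignored; nothing ever interprets them as a pointer.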

IMO the best solution is to not use std::string at all in MyClass but use a character array, which will ensure the string is written/read from the binary file without a problem.

class MyClass
{
   char str[126];
   // other objects
};

This solution is for writing data to new files, not for reading your existing data.
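For new files, a round trip with the fixed-size array might look roughly like this. It is a sketch: FixedRecord, its id member, and the helper functions are illustrative names, and strings longer than the array are silently truncated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical record: the text lives inside the object, so the raw
// bytes of the object contain everything that needs to be saved.
struct FixedRecord
{
    char str[126];
    int  id;   // stands in for "other objects"
};

static_assert(std::is_trivially_copyable<FixedRecord>::value,
              "FixedRecord must be safe to write as raw bytes");

// Copy s into the array, truncating if needed and always
// leaving room for a terminating '\0'.
void setStr(FixedRecord& r, const std::string& s)
{
    const std::size_t n = std::min(s.size(), sizeof(r.str) - 1);
    std::memset(r.str, 0, sizeof(r.str));
    std::memcpy(r.str, s.data(), n);
}

void writeRecord(std::ostream& out, const FixedRecord& r)
{
    out.write(reinterpret_cast<const char*>(&r), sizeof(r));
}

// Returns false when no complete record is left in the stream.
bool readRecord(std::istream& in, FixedRecord& r)
{
    return static_cast<bool>(in.read(reinterpret_cast<char*>(&r), sizeof(r)));
}
```

Value-initializing the record (FixedRecord r{};) before filling it keeps any padding bytes zeroed, which makes the files easier to inspect in a hex editor.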

commented: Yes, thank you. That is what I will do in the future! +2