Taking the size of an object

Question

caut_baia 9 Posting Whiz

14 Years Ago

Hi everyone. I'll just get to the point. I want to write objects of type B to a file in binary form using fstream and its 'write' member function. I need to pass a size for an object of type B, but since it contains dynamic data I can't just use a plain sizeof(object). Should I compute the size with a function like the one below?

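#include <deque>

class A {
  int x;
public:
  A () : x(0) {}
  ~A () {}
};


class B {
   std::deque<A*> vec;
public:
   B () {}
   B (int x) {
      // allocate one A per slot; filling the deque with copies of a single
      // pointer would make the destructor delete the same object x times
      for (int i = 0; i < x; ++i)
         vec.push_back(new A());
   }
   ~B ()  {
      while (vec.size())  {
         delete vec[0];
         vec.pop_front();
      }
   }
   int size ()  {
      return vec.size()*sizeof(A)+sizeof(*this); //this is it .. is it ok?
   }
};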

Thanks a lot

c++


Narue 5,707 Bad Cop

14 Years Ago

I'll just get to the point.

That's certainly better than the alternative.

I want to write objects of type B to a file in binary form using fstream and it's 'write' member function.

That would be unwise. B is not a POD type, which wreaks all kinds of havoc when trying to do a naive serialization. In the case of std::vector, the most obvious issue is that the item array will likely be represented by a pointer. The write member function performs a shallow write of the bytes of the object, so you would be serializing an address rather than the array data. The address is transient, and thus persisting it to file would be a wasted effort.
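
To make the "shallow write" point concrete, here is a minimal sketch. The `Holder` type and both helper functions are hypothetical names invented for illustration; they only mimic what a byte-wise `write` and `read` would do to an object that holds a pointer.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical type for illustration: it owns nothing, it only points at data.
struct Holder {
    const int* items;   // address of the first element
    std::size_t count;  // number of elements
};

// Copies the raw bytes of a Holder, which is what a naive write() would store.
std::vector<char> raw_bytes(const Holder& h)
{
    std::vector<char> bytes(sizeof(Holder));
    std::memcpy(bytes.data(), &h, sizeof(Holder));
    return bytes;
}

// Rebuilds a Holder from those bytes, as a naive read() would.
Holder from_raw_bytes(const std::vector<char>& bytes)
{
    assert(bytes.size() == sizeof(Holder));
    Holder h;
    std::memcpy(&h, bytes.data(), sizeof(Holder));
    return h;
}
```

The restored `Holder` carries the same address as the original, and none of the pointed-to integers are in the byte buffer. Within one run of the program that address still happens to be valid; after a restart it would not be.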

Narue 5,707 Bad Cop

14 Years Ago

So then .. can i get a hint on how to do this?

It's hard to go wrong with manual serialization:

#include <fstream>
#include <iostream>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <typeinfo>
#include <vector>

template <typename Target, typename Source>
Target lexical_cast(Source arg)
{
    std::stringstream conv;
    Target result;

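    // Note: operator>> stops at whitespace, so a std::string target only receives the first word.
    if (!(conv << arg && conv >> result))
        throw std::bad_cast(); // std::bad_cast has no standard constructor taking a message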

    return result;
}

template <typename T>
class MyVec {
    typename std::vector<T> _base;
public:
    void push_back(const T& value) { _base.push_back(value); }
    void clear() { _base.clear(); }

    std::string serialize() const
    {
        std::string s;

        for (typename std::vector<T>::size_type i = 0; i < _base.size(); i++) {
            s += lexical_cast<std::string, T>(_base[i]);

            if (i < _base.size() - 1)
                s += ',';
        }

        return s;
    }

    void deserialize(std::istream& in)
    {
        std::string part;

        while (std::getline(in, part, ','))
            _base.push_back(lexical_cast<T, std::string>(part));
    }

    // Declared as a plain (non-template) friend: re-declaring T here would shadow the class's T.
    friend std::ostream& operator<<(std::ostream& out, const MyVec<T>& v)
    {
        return out<< v.serialize();
    }
};

template <typename T>
void save(const MyVec<T>& v)
{
    std::ofstream out("test.txt");

    out<< v.serialize();
}

template <typename T>
void load(MyVec<T>& v)
{
    std::ifstream in("test.txt");

    v.deserialize(in);
}

int main()
{
    MyVec<std::string> v;

    v.push_back("a");
    v.push_back("b");
    v.push_back("this is a test");
    v.push_back("booga");
    v.push_back("meep");

    std::cout<<"Original:     '"<< v <<"'\n";
    save(v);
    v.clear();
    std::cout<<"Cleared:      '"<< v <<"'\n";
    load(v);
    std::cout<<"Deserialized: '"<< v <<"'\n";
}

Narue 5,707 Bad Cop

14 Years Ago

i assume 'naive serialization' means serializing using only the size of the object as a cue for performing the insertion/extraction..

I made up the term, so no worries. Basically it means treating every object as a sequence of bytes that can be written to some storage medium as-is and read back without any issues. This brings four categories to mind, in order of least to most complex in terms of serialization:

1. Verbatim sequences of bytes. These will pretty much be safe to perform naive serialization on because there's very little that can get in the way of safe and lossless I/O.
2. Plain Old Data (POD). There's no magic going on behind the scenes, such as virtual tables. POD types are generally safe to "pun" into sequences of bytes, but the issue of byte ordering does come up. For example, if you pun an int as big-endian and try to restore it as little-endian, you're not likely to get the same value. In this case you can correct the issue by reading and writing the bytes manually in the same order.
3. Non-shallow data. Pretty much any time you have a pointer, writing the value of the pointer is pointless because upon restoring the pointer, the address may not be valid any longer. For this kind of thing you have no choice but to perform a deep copy to the storage medium.
4. Non-POD. All bets are off, there's voodoo going on under the hood that you don't know about and bitwise serialization will very likely blow up in your face.
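
The byte-ordering fix mentioned for POD types ("reading and writing the bytes manually in the same order") might look like the sketch below. The function names are hypothetical, and the choice of least-significant-byte-first is an arbitrary assumption; any fixed order works as long as reader and writer agree.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>

// Writes a 32-bit value least significant byte first, whatever the host's byte order.
void write_u32_le(std::ostream& out, std::uint32_t value)
{
    for (int shift = 0; shift < 32; shift += 8)
        out.put(static_cast<char>((value >> shift) & 0xFF));
}

// Reads the four bytes back in the same order; returns false if the stream runs out.
bool read_u32_le(std::istream& in, std::uint32_t& value)
{
    std::uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        char byte;
        if (!in.get(byte))
            return false;
        result |= static_cast<std::uint32_t>(static_cast<unsigned char>(byte)) << shift;
    }
    value = result;
    return true;
}
```

Because the bytes are produced by shifting rather than by reinterpreting the object's memory, the file layout no longer depends on the machine that wrote it.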

mike_2000_17 2,669 21st Century Viking

14 Years Ago

Serialization 101:

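#include <fstream>
#include <string>
#include <vector>

class oserializer; //forward declaration
class iserializer;

//create a base abstract class for all serializable objects:
class serializable {
  public:
    virtual ~serializable() {} //virtual destructor, since objects are handled through base pointers
    virtual oserializer& serialize(oserializer&) const = 0;
    virtual iserializer& deserialize(iserializer&) = 0;
};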

//create a serializer class (for saving, o for output):
class oserializer {
  private:
    std::ofstream out_file; //hold a file to write to (std::ostream cannot open a file by name).
  public:
    oserializer(const std::string& filename) : out_file(filename.c_str(), std::ios::binary) { };
    
    //Create a bunch of overloads of operator << for each primitive types:
    oserializer& operator <<(const int& value) {
      //save to the file, any way you like (e.g. binary or XML), say binary:
      out_file.write(reinterpret_cast<const char*>(&value),sizeof(int));
      return *this;
    };
    // ... and so on for all primitive types (unsigned int, char, float, double, etc.)
    
    //Then, write a few overloads to handle some common STL containers (technically you have them all):
    template <class T>
    oserializer& operator <<(const std::vector<T>& vec) {
      *this << static_cast<int>(vec.size()); //keep a record of the size (stored as an int), this will be useful for loading.
      for(typename std::vector<T>::const_iterator it = vec.begin(); it != vec.end(); ++it)
        *this << *it; //save all the elements using one of the primitive overloads.
      return *this;
    };
    //so on for the other STL containers.

    //now, write an operator for a serializable object:
    oserializer& operator <<(const serializable& obj) {
      return obj.serialize(*this); //just call serialize.
    };
    //provide one for a pointer too: (or better: a smart pointer)
    oserializer& operator <<(const serializable* obj) {
      return obj->serialize(*this);
    };
};

//now you can create classes like this for example:
class Foo : public serializable {
  int a;
  float b;
  char c;
  public:
    virtual oserializer& serialize(oserializer& out) const {
      return out << a << b << c;
    };
    virtual iserializer& deserialize(iserializer& in) {
      return in >> a >> b >> c; 
    };
};

//and like this:
class Bar : public serializable {
  Foo* f;
  std::vector<Foo*> fv;
  public:
    virtual oserializer& serialize(oserializer& out) const {
      return out << f << fv;
    };
    virtual iserializer& deserialize(iserializer& in) {
      return in >> f >> fv; //isn't that nice! and perfectly safe!
    }; 
};

The above is the most basic form of it. You can make this a bit more fancy by adding names to the variables you save (such that it makes sense if saved in XML). You can make the serializer class a base class such that you can interchange the file format. You can keep records of objects that were saved in order to break cycles in object cross-references. You can use some template meta-programming techniques to avoid having a ton of overloaded operator functions, and so on.
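
The post only forward-declares `iserializer`, so the following is a guess at what a matching input side could look like, not the author's code. It assumes the element count was written as an `int` in the host's native layout, and, purely for brevity, it reads from any `std::istream` instead of owning a file.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical input-side counterpart; it reads from any std::istream
// rather than owning a file, which is an assumption made for brevity.
class iserializer {
    std::istream& in_;
  public:
    explicit iserializer(std::istream& in) : in_(in) {}

    // Primitive overload: read the raw bytes of an int (same layout the writer used).
    iserializer& operator>>(int& value) {
        in_.read(reinterpret_cast<char*>(&value), sizeof(int));
        return *this;
    }

    // Container overload: read the element count first, then each element.
    template <class T>
    iserializer& operator>>(std::vector<T>& vec) {
        int count = 0;
        *this >> count;
        vec.clear();
        for (int i = 0; i < count && in_; ++i) {
            T element;
            *this >> element;
            if (in_) vec.push_back(element);
        }
        return *this;
    }

    // True while no read has failed.
    bool ok() const { return !in_.fail(); }
};
```

The important property is symmetry: every `operator>>` consumes exactly what the corresponding `operator<<` produced, in the same order. Overloads for serializable objects and pointers would follow the same pattern but are left out here.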



Fbody 682 Posting Maven Featured Poster · Answer 1 · 2011-02-21T20:05:16+00:00

Unless it's more feasible for you to use a friend function (mostly for reusability), I think you would be better off incorporating your read/write directly into your class. This way, you can tell an object to write itself to the file or tell it to read its own data from the file. It would also give you better access to the member data.

i.e.:

class someClass {
 private:
   //data members
 public:
   //function members
   ofstream &writeFile(ofstream &outStream) {
     //necessary loops and output statements
     return outStream; //return the stream so calls can be chained
   }
   //other function members
};

int main() {
  someClass anObject;
  ofstream output("output.txt", ios::out | ios::trunc | ios::binary);

  //manipulate anObject

  anObject.writeFile(output);

  //remaining code
}

caut_baia 9 Posting Whiz · Answer 2 · 2011-02-21T20:09:46+00:00

So then, can I get a hint on how to do this? I mean storing a UDT object that contains dynamic data, which can then be retrieved and used within the program. I know it works for static objects, but I'm not sure how to do this, and I've been struggling for a few days now. Anything would be appreciated. And thanks, by the way.

caut_baia 9 Posting Whiz · Answer 3 · 2011-02-21T20:18:28+00:00

Thanks Fbody, but I have a hierarchy of objects and I must store the most derived form of it. It's about the fourth in the hierarchy, and each level contains different objects and dynamically allocated arrays. The most I can do is define a Size method for each, but I'm not even sure I can do this.

class A {};
class B : public A {
   //dynamic data
   //vectors of dynamic pointers to dynamic allocated objects
 };
class C : public A {
   //same as B
 };
//and so on
int main ()  {
  C b;
  C c;
  Write (b,b.Size()); //wrapper for fstream.write()    
  Read (c); 
 }

mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster · Answer 4 · 2011-02-22T00:06:04+00:00

I agree with Narue, there is no escaping it. You have to implement the serialization method for each class. If you have a hierarchy of classes, then make it a virtual function, and in each class store only that class's data and call the base class method to serialize the base's data (you can do it in either direction, e.g. call the base class serialize first and then serialize the derived class's data, or the other way around). Make sure that both the serialization and deserialization functions do it in the exact same order. Finally, you might want to take a look at Boost.Serialization for a good example of how to do this, and you might even want to use it because it is not very intrusive.
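
A minimal sketch of that pattern follows. `Shape` and `Circle` are hypothetical classes, and a whitespace-separated text format is assumed only to keep the example short; the same structure applies to a binary format.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical two-level hierarchy; the names are for illustration only.
class Shape {
    int id_;
  public:
    explicit Shape(int id = 0) : id_(id) {}
    virtual ~Shape() {}

    // Each class writes and reads only its own members.
    virtual void save(std::ostream& out) const { out << id_ << ' '; }
    virtual void load(std::istream& in) { in >> id_; }

    int id() const { return id_; }
};

class Circle : public Shape {
    double radius_;
  public:
    explicit Circle(int id = 0, double radius = 0.0) : Shape(id), radius_(radius) {}

    virtual void save(std::ostream& out) const {
        Shape::save(out);          // base class data first...
        out << radius_ << ' ';     // ...then this class's own data
    }
    virtual void load(std::istream& in) {
        Shape::load(in);           // same order as save()
        in >> radius_;
    }

    double radius() const { return radius_; }
};
```

Calling `save` through a `Shape` reference still writes the `Circle` data, because the call is dispatched virtually. Note that loading here assumes the caller already knows which concrete type to create; recording the type in the file is a separate problem.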

caut_baia 9 Posting Whiz · Answer 5 · 2011-02-23T03:28:39+00:00

Thanks a lot for the example, Narue and mike, though I don't really get it. I'm not really acquainted with some of the definitions, so I assume 'naive serialization' means serializing using only the size of the object as a cue for performing the insertion/extraction. I googled it but couldn't find much. I revisited the I/O section in the standard. I made every object inside the program static so there can't be an issue regarding the size of the objects being stored. One more thing, though: the problem occurs only when trying to store objects of type C, D, etc. in an A, B, C, D, E hierarchy. The object of type D is 848 bytes and contains two vectors of type C and some type A and B objects. Also, I've written a test program, and sometimes I can store, say, 1000 objects and retrieve them, but sometimes I can't. I can also access member functions from inside the objects being retrieved, so the data seems to be there and well organized, but when I try stashing them in a vector I get a crash. I have never encountered such problems before, and it is getting really tedious since I have had this problem for about four days now. Anyway, I appreciate your time and effort.

mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster · Answer 6 · 2011-02-23T07:07:23+00:00

What I call "naive serialization" is this:

class A {
  //some stuff.. possibly arrays, pointers or resources of any kind. 
};

int main() {
  A a;
  ofstream outfile("foo.dat");
  outfile.write((char*)&a, sizeof(A));
  return 0;
};

The above is "naive serialization", which is probably a term I made up, so looking it up is not going to yield anything. Possibly also because "naive" is an understatement in this case, but I couldn't give the appropriate name without violating Daniweb's policy on foul language!

I really recommend that you have a good read of the description of the Boost.Serialization library; it will answer your questions.

The way you are describing your solution, really doesn't seem good, very dangerous in fact. Here it is line-by-line:

>>I revisited the I/O section in the standard.
The I/O section of the standard is really not going to help for this purpose, read Boost.Serialization instead.

>>Made every object inside the program static
That's a horrible restriction to impose on your software. This will be a huge burden on all your future development in this project. You ought to find a method that doesn't involve such a restriction (in fact it shouldn't impose any restrictions, my serialization library doesn't and neither does Boost.Serialization).

>>so there can't be an issue regarding the size of the objects being stored.
The size of an object is always constant, regardless of whether it holds dynamic data or not. Dynamic data (i.e. a dynamic array or STL container) basically just means storing and handling a pointer (which has a fixed size).
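
A small sketch of that point, using two hypothetical helper functions: `sizeof` reports the footprint of the container object itself, while the element data it manages lives elsewhere and has to be accounted for separately.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the footprint of the vector object itself, not of its elements.
std::size_t object_footprint(const std::vector<int>& v)
{
    return sizeof(v);
}

// Returns the number of bytes of element data the vector manages elsewhere.
std::size_t element_bytes(const std::vector<int>& v)
{
    return v.size() * sizeof(int);
}
```

An empty vector and one holding a thousand elements give the same `object_footprint`, which is why passing `sizeof` to `write` never captures the elements.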

>>The problem occurs only when
You should say "the problems are only visible when.." because I guarantee there are plenty of problems that you just didn't happen to see.

>>The object of type D is 848 bytes and contains two vectors of type C and some type A and B objects.
How are those objects stored in the D class (by pointer or actual instances) and how are they individually being stored in the file (by calling their serialization function or by just dumping them in, like in my snippet of code above)? How are you marking the size of the vector, do you save an int representing the size before you save all the objects?

>>Also i've written a test program and sometimes i can store say .. 1000 objects and retrieve them.But sometimes i can't.
It looks like your method is not very deterministic (because of the heavy use of the word "sometimes"). Unless you are doing massive multithreading or using a random-number generator, the behaviour of your program should be very deterministic. So the phrase "sometimes this happens and sometimes that" usually means you have a big problem (typically: memory corruption!).

>>Also i can acces member functions from inside the objects being retrieved so the data seems to be there well organized
Being able to access member functions (unless they are virtual) has no relation to whether you successfully retrieved the object's data or not. Member functions are not part of the object's data, they are part of the program. The fact that you can access them after you loaded the object just means that the compiler did its job; it says nothing about whether your code is working.

>>when i try stashing then in a vector i get a crash.
Either the objects or the vector in which you are stashing them are probably corrupt; they would be if you used a method similar to the code above.

>>I have never encountered such problems before
When you start dealing with the binary footprint of classes, it is always a big step-up from typical programming problems. You are not the first to fall in this trap and you won't be the last. This is not an easy thing to understand, but the solution is easy once you do. As Narue and I have already mentioned, you need each class to implement its own serialization and deserialization function, and then all the STL containers and primitive types will require that you make a special serialization function for them too. Long story short, just use Boost.Serialization, all this work is done for you in there.

If you want to make your own, please be careful, and post code instead of vague explanations (at least, that's what I prefer).

mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster · Answer 7 · 2011-02-23T08:30:51+00:00

I made up the term, so no worries. Basically it means treating every object as a sequence of bytes that can be written to some storage medium as-is and read back without any issues. This brings four categories to mind, in order of least to most complex in terms of serialization:
1. Verbatim sequences of bytes. These will pretty much be safe to perform naive serialization on because there's very little that can get in the way of safe and lossless I/O.
2. Plain Old Data (POD). There's no magic going on behind the scenes, such as virtual tables. POD types are generally safe to "pun" into sequences of bytes, but the issue of byte ordering does come up. For example, if you pun an int as big-endian and try to restore it as little-endian, you're not likely to get the same value. In this case you can correct the issue by reading and writing the bytes manually in the same order.
3. Non-shallow data. Pretty much any time you have a pointer, writing the value of the pointer is pointless because upon restoring the pointer, the address may not be valid any longer. For this kind of thing you have no choice but to perform a deep copy to the storage medium.
4. Non-POD. All bets are off, there's voodoo going on under the hood that you don't know about and bitwise serialization will very likely blow up in your face.

5. An object hierarchy where there are cross-references (references or pointers) between different objects and possibly reference cycles. This requires the serializer to keep track of objects that have been saved already to avoid duplication and infinite loops if the hierarchy has cycles.
6. An object hierarchy containing classes whose definition is spread over several modules (shared libraries, DLLs, and/or executables). Then, the serializer will need a cross-modular RTTI system (which C++ does not provide).

Basically, any of the cases from 4 to 6 requires a lot more than a simple class analogous to the iostreams from the standard library. It requires the use of an RTTI system that is more persistent than the C++ RTTI. It requires a fair amount of template meta-programming. And it usually requires some form of smart pointers and/or reference counting. This is essentially what Boost.Serialization does. My serialization library is very similar as well.

A fairly simple serializer might work for cases 1 to 3. But they are also very limited, and from the looks of your hierarchy A,B,C,D,E, I imagine that your classes won't fit in the first three cases, because inheritance implies a virtual destructor which implies a virtual table which already puts you in case number 4.
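
For case 5 (cross-references and cycles), the bookkeeping could be sketched as below. `Node`, `save_tracker`, and the `obj`/`ref` text tokens are all hypothetical; real libraries such as Boost.Serialization do this in their own, more general way.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>

// Hypothetical node type whose 'next' pointer may form a cycle.
struct Node {
    int value;
    Node* next;
};

// Remembers which objects were already written and the id each one got.
class save_tracker {
    std::map<const Node*, int> ids_;
  public:
    // Writes the node the first time it is seen; afterwards writes only a
    // back-reference, which is what stops the recursion on a cycle.
    void save(std::ostream& out, const Node* node) {
        if (node == nullptr) { out << "null "; return; }

        std::map<const Node*, int>::const_iterator found = ids_.find(node);
        if (found != ids_.end()) { out << "ref" << found->second << ' '; return; }

        int id = static_cast<int>(ids_.size());
        ids_[node] = id;                 // register before recursing
        out << "obj" << id << ' ' << node->value << ' ';
        save(out, node->next);
    }
};
```

The node is registered before its `next` pointer is followed; registering it afterwards would let a cycle recurse forever. A loader would keep the mirror-image table, mapping each id back to the object it has already rebuilt.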

caut_baia 9 Posting Whiz · Answer 8 · 2011-02-23T09:11:28+00:00

So I should instead create two methods, Serialize and Deserialize, which process packet-like aggregates containing data from which objects can be reconstructed? That would be simple enough to store, because it would resemble a few primitive types, so I could use 'naive serialization'. Is that the most common way used for complex types? I feel like I'm beginning to sound absurd in what I'm trying to do.

caut_baia 9 Posting Whiz · Answer 9 · 2011-02-23T17:15:24+00:00

Thank you very much, mike, for the explicit example. I'll give it a shot right away.

Labdabeta 182 Posting Pro in Training Featured Poster · Answer 10 · 2011-02-25T21:01:46+00:00

Not sure if this is what you are looking for but whenever I have a class that I wish to write to a file I just overload the '<<' operator, then I can do it specially for each object and string them together like:

fstream file("MyFile.txt", ios_base::in|ios_base::out);
file<<MyClass()<<MyOtherClass()<<AnotherMyClass()<<2<<"BOB"<<endl;
And it works fine.
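
For readers who have not written such an overload, here is a minimal sketch. `Player` is a hypothetical class, and the space-separated text format is an assumption made for the example.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical class used only to illustrate the overload.
class Player {
    std::string name_;
    int score_;
  public:
    Player(const std::string& name, int score) : name_(name), score_(score) {}

    // Writes the members as text, separated by a space.
    friend std::ostream& operator<<(std::ostream& out, const Player& p) {
        return out << p.name_ << ' ' << p.score_;
    }
};
```

Since the overload takes a `std::ostream&`, the same code works for files, `std::cout`, and string streams. Reading the data back needs a matching `operator>>`, and a name containing spaces would need quoting or a length prefix to be recovered unambiguously.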
