So, in the scripting project I mentioned yesterday, I was thinking about how to implement arrays.

That got me thinking about how C++ gets arrays and inherited classes to interoperate, and I found out some weird things:

#include <iostream>
using namespace std;

class Y
{
public:
    int A;
    Y()
    {
        A = 0;
    }
};

class Z : public Y
{
public:
    int B;
    Z()
    {
        A = B = 0;
    }
};

Y variables[4];

int main(void)
{

    stuff[0] = Z();
    stuff[1] = Y();
    stuff[2] = Z();
    stuff[3] = Y();

    Z ztest = Z();
    ztest.A = 8;
    ztest.B = 9;

    Y* testY;

    Z* testZ = (Z*)&stuff[0];
    testZ->A = 1;
    testZ->B = 2;

    stuff[2] = ztest;

    testZ = (Z*)&stuff[0];
    cout << "Z: " << testZ->A << "," << testZ->B << endl;
    testY = (Y*)&stuff[1];
    cout << "Y: " << testY->A << endl;
    testZ = (Z*)&stuff[2];
    cout << "Z: " << testZ->A << "," << testZ->B << endl;


    return 0;
}

This kind of weirds me out, because it means that if I create a Z object and add it to an array meant to hold both Z and Y objects (so I can perform some universal action on them), then I'm inherently truncating data. The only way around it seems to be making it an array of pointers and manipulating the data that way; otherwise I can't cast it back out. That's not something I've really had to consider before.

Is this the reason more modern languages like C# use interface type situations and Handle Classes under the hood as references/pointers?

If you hold an array of Y objects, then you can only store Y objects in it, nothing else. When you do stuff[0] = Z();, what you are really doing is this: you create a temporary Z object, and then assign the Y part of that temporary to a Y object (the first in the stuff array); the temporary itself is destroyed shortly after. So all that remains in that slot of the array is a Y object, as if the Z object never existed. This is commonly called "object slicing". A good compiler should warn you about this, I think.

If you want to store polymorphic objects (e.g. objects of any derived class) inside an array, you have to use pointers (C++ does not allow arrays of references, and wrapping references is rather impractical in this case). That's just the way it is.

Is this the reason more modern languages like C# use interface type situations and Handle Classes under the hood as references/pointers?

Well, "modern" is a bit subjective, but I don't want to start another rant on that. Some languages use reference semantics, while C++ uses value semantics.

In C++, every variable is a value with a defined location in memory and a size that is fixed at compile time. We call that a deterministic memory model, meaning that you can always predict exactly when a variable is created and destroyed: at its declaration and at scope exit, respectively. You can also use the free store (or "dynamic memory" or "the heap") to store objects whose size and location are dynamic (determined at run time, and thus subject to change), which lets you explicitly control the lifetime and size of variables, or of allocated memory in general.

In languages like C# and Java, class objects are always accessed through a reference to an actual object whose location and size are determined by the virtual machine or runtime environment. We call that a managed memory model. It is also non-deterministic, because you have no means to control or predict exactly when an object will be destroyed.

Then there are other languages, Delphi for example, which have value semantics for all the fundamental types (integers, floats, strings, etc.) but reference semantics for all class objects. The memory model is deterministic but not as automatic as in C++, which is arguably pretty bad from a robustness point of view, so languages of this type are somewhat rare.

Usually, value semantics and deterministic, automatic memory models go together, and reference semantics and managed memory models go together. But each approach can compensate for its own drawbacks to some extent: smart pointers or garbage collection in C++, virtual machines that try to predict which objects are local and put them on the stack, and language features like finally blocks to cope with the lack of deterministic finalization code in C#/Java.

All that said, the main reason why a language like C# uses reference semantics for class objects is that it is a managed environment, meaning that it cannot allow them to be plain value-objects (because these wouldn't be manageable!). These languages simply "hide" the fact that such variables are references by not giving any syntax marker to say so, but that's just for convenience. A by-product of this is that a variable of a base-class type can refer to a derived-class object, but that's because the variable is a reference to something somewhere, not the actual value, and almost everything you do involves dereferencing to access memory chunks all over the place (which is often cited as one source of the performance penalty involved in running C#/Java code).

In C++, if you want to store an array of base-class pointers to derived-class objects, then I would recommend that you pick one kind of smart-pointer. Smart-pointers are classes which wrap a pointer and add some additional semantics to its operations (usually, automatic destruction). This allows you to create objects dynamically (with new) and wrap them in a small object that will take care of destroying that dynamic object when appropriate. For instance, the standard std::shared_ptr does reference-counting, meaning that it keeps a count of all the copies of that pointer and will only delete the pointed-to object when all copies of the pointer have been destroyed (i.e. nobody needs that object anymore, so it can be deleted). The standard std::unique_ptr, on the other hand, makes it impossible to copy the pointer (meaning that there is only one unique "owner" of the dynamic object), and it can thus safely delete the object when the unique_ptr itself is destroyed. For a one-size-fits-all solution, you can just use std::shared_ptr; otherwise, I suggest you read my tutorial on ownership design.

As an example, you can do:

#include <memory>
#include <iostream>

class Y {
public:
  int A;
  Y() { A = 0; }
};

class Z : public Y {
public:
  int B;
  Z() { A = B = 0; }
};

typedef std::shared_ptr<Y> Y_ptr;

Y_ptr stuff[4];

int main(void)
{

  stuff[0] = Y_ptr(new Z());
  stuff[1] = Y_ptr(new Y());
  stuff[2] = Y_ptr(new Z());
  stuff[3] = Y_ptr(new Y());

  Z ztest = Z();
  ztest.A = 8;
  ztest.B = 9;

  Y* testY;

  Z* testZ = (Z*)(stuff[0].get()); // the get() function gives you the raw-pointer.
  testZ->A = 1;
  testZ->B = 2;

  stuff[2] = Y_ptr( new Z(ztest) );

  testZ = (Z*)(stuff[0].get());
  std::cout << "Z: " << testZ->A << "," << testZ->B << std::endl;
  testY = (Y*)(stuff[1].get());
  std::cout << "Y: " << testY->A << std::endl;
  testZ = (Z*)(stuff[2].get());
  std::cout << "Z: " << testZ->A << "," << testZ->B << std::endl;

  return 0;
}

If you have a somewhat older compiler, you might have to replace the <memory> header with <tr1/memory>, and std::shared_ptr with std::tr1::shared_ptr.

Oops, stuff = variables. I changed a thing or two after copy-pasting and forgot to correct that.

Hm, I'm using the newest VC++ compiler and I've never gotten even a warning for doing this. The book I learned from specifically mentioned this as a pro of inheritance, so I found this quite disconcerting and figured it was worth sharing.
