Use of pointers to hierarchical data structures in C++

Question

rwarlord 0 Newbie Poster

13 Years Ago

I am having a hard time deciding whether to use pointers or not in my nested data structures.
Memory and CPU efficiencies are important as I am dealing with data files of sizes on the order of gigabytes.

I have decided to use several classes that are nested in a hiearchical manner to make the code more organized and re-usable.

Here is a simplified version of my code:

struct Segment
{
  int start, end, value;
};

class Set
{
private:
  vector<Segment*>* segments;
public:
  Set() {
    segments = new vector<Segment*>();
  }
  ~Set() {
    for (size_t i = 0; i < segments->size(); ++i) {
      delete segments->at(i);
    }
    delete segments;
  }
  vector<Segment*>* getSegments() {
    return segments;
  }
  // accessor functions...
};

class Collection
{
private:
  vector<Set*>* sets;
public:
  Collection() {
    sets = new vector<Set>();
  }
  ~Collection() {
    for (size_t i = 0; i < sets->size(); ++i) {
      delete sets->at(i);
    }
    delete sets;
  }
  void push_back(Set* set) {
    sets->push_back(set);
  }
  // other accessor functions...
};

int main()
{
  Collection collection;
  // open file, read file
  Set* set;
  while (!f.eof()) {
    Segment* segment = new Segment();
    // fill segment
    // instantiate set using "new" as needed, and fill it with segments
    collection.push_back(set);
  }
  
  // do other stuff
  
  return 0;
}

The main motivation to use pointers is so that the child data structures can be built up while reading in the data file, and a pointer is simply passed to the parent data structures, avoiding using accessor functions and copying pieces of data redundantly.

My two main concerns are:

1. Memory leak. Does the order in which the class destructors are called ensure that there is no memory leak? When the collection destructor is called, I needed to "delete sets->at(i)" to deallocate the Set objects allocated elsewhere, but does the Set destructor get called, or does a memory leak result?

2. Kdevelop using gdb does not resolve the pointers, so I cannot see the content of the Collection object during debugging. I have to print them out, which can be cumbersome.
Back in the days when I used to use Visual Studio, I think I was having similar problems when I was using pointers to classes...

Is there a better way to do this?
(Besides using several global vectors to store different data types and hope that I remember to process related data concurrently...)

c++

4 Contributors
6 Replies
394 Views
1 Day Discussion Span
Latest Post 13 Years Ago Latest Post by rwarlord

All 6 Replies

rubberman 1,355 Nearly a Posting Virtuoso

13 Years Ago

Better:

struct Segment
{
  int start, end, value;
};

class Set
{
private:
  vector<Segment*> segments;
public:
  Set() {}
  virtual ~Set() {
    for (size_t i = 0; i < segments.size(); ++i) {
      delete segments.at(i);
    }
  }
  vector<Segment*>& getSegments() {
    return segments;
  }
  const vector<Segment*>& getSegments() const {
    return segments;
  }
  // accessor functions...
};
 
class Collection
{
private:
  vector<Set*> sets;
public:
  Collection() { }
  ~Collection() {
    for (size_t i = 0; i < sets.size(); ++i) {
      delete sets.at(i);
    }
  }
  void push_back(Set* set) {
    sets.push_back(set);
  }
  // other accessor functions...
};
 
int main()
{
  Collection collection;
  // open file, read file
  Set* set = new Set;
  while (!f.eof()) {
    Segment* segment = new Segment();
    set->push_back(segment);
    // fill segment
    // instantiate set using "new" as needed, and fill it with segments
    collection.push_back(set);
  }
 
  // do other stuff
 
  return 0;
}

Memory management in C++ is no different that C by default, BUT IT COULD BE A LOT EASIER! There are techniques, such as using smart pointers and reference counters to hold your Segments and such, so you don't have to destroy them manually. I do this all the time, and as a result, have nearly zero delete statements in my code, and no memory leaks. I also have to deal with very large data sets, for statistical analysis of stocks and options, or performance and failure analysis of large computer networks, for example. My wife also does a lot of C++ coding, but she has data sets that dwarf the mind! She is a physicist and writes software to analyze and reduce the data coming out of research labs like CERN and Fermi Lab. They generate many terabytes of data daily (petabytes every month). So I think you can understand that I may have an idea what you are trying to do and why.

Anyway, one rule I try to follow that may help. Don't allocate any more than you have to. In your classes, your member variables were pointers, and that isn't necessary. Since you have to allocate them if you declare them as pointers, and then you have to destroy them also. Two steps you can eliminate, and lose nothing with regard to memory utilization.

mike_2000_17 2,669 21st Century Viking

13 Years Ago

I think the answers so far are very good, but I could have a few things to add, because I love answering good questions like this.

>>Memory and CPU efficiencies are important

Then don't use the function at(i) like you do at line 16 and 36. The at function includes bound-checking which is a waste when you are just looping through. That's just a small remark. Use the [] indexing operator or iterators instead.

>>1. Memory leak. Does the order in which the class destructors are called ensure that there is no memory leak? When the collection destructor is called, I needed to "delete sets->at(i)" to deallocate the Set objects allocated elsewhere, but does the Set destructor get called, or does a memory leak result?

To answer the question: You did it correctly in the sense that memory won't be leaking, as far as I can see. The destructors for the sets will be called and so will the default destructor of the segments. If you want to check for memory leaks, run your program in Valgrind (I would suggest you run it on a smaller amount of "test" data, both because if you do have a leak, you don't want to have massive leaks reported, and because valgrind is a virtual machine that is slow because it does a lot of bookkeeping to allow you to trace back the leaks, post mortem). For memory issues Valgrind is the tool to use.

On the grander scheme of things, the idea of even being concerned with memory leaks is, at least in the realm of experienced programmers, almost non-existing. I mean, we all had our nightmarish problems with memory leaks and corruption problems in the past when we made bad decisions that back-fired on us, but my point is after such crippling experiences and after having found ways to avoid these problems altogether, most experienced programmers value the use of certain techniques like RAII and Smart-Pointers as essential in any non-trivial programming task. With those, memory leak and corruption problems almost vanish completely and so does the use of delete .

>>Is there a better way to do this?
(Besides using several global vectors to store different data types and hope that I remember to process related data concurrently...)

Absolutely! You already have a much improved solution (and very simply so) from rubberman. And of course, what you suggested between parentheses is the worst.

There are a few more things you have to be worrying about here. Namely, the Big Three. You have implemented the destructor but no copy-constructor or assignment operator. Since there is no implementation of it, the compiler will generate a default one for you, where it will just do a simple copy of all the data members (i.e. a shallow copy). In your original code, it will not copy the memory pointed to by the pointer data member, but rather only the pointer value itself, leading to two pointers that point to the same location, you delete one, and you crash when you try to delete the other. You have thought "well I'm never gonna copy this huge data container anyways", well, you never know, maybe somebody else will, maybe you will forget about it and do it, or maybe a piece of code you didn't expect to trigger a copy will.

There are two options: disable copying (i.e. "noncopyable" idiom) or implement both the copy-constructor and copy-assignment. To disable the copying, just make both special functions private and with no implementation:

class Collection
{
private:
  vector<Set*> sets;

  Collection(const Collection&); //no implementation and private access.
  Collection& operator=(const Collection&); //idem. This make Collection non-copyable
public:
  Collection() { }
  ~Collection() {
    for (size_t i = 0; i < sets.size(); ++i) {
      delete sets.at(i);
    }
  }
  void push_back(Set* set) {
    sets.push_back(set);
  }
  // other accessor functions...
};

The second option is to implement the copy semantics. In this case, the preferred option is usually the copy-and-swap idiom. In this case, you make a swap function (which you would probably often use instead of the copy anyways) and then use it to implement the copy-assignment:

class Collection
{
private:
  vector<Set*> sets;
public:
  Collection() { }
  ~Collection() {
    for (size_t i = 0; i < sets.size(); ++i) {
      delete sets.at(i);
    }
  }
  Collection(const Collection& C) : sets(C.sets.size()) {
    for(size_t i=0; i<sets.size(); ++i)
      sets[i] = new Set( *(C.sets[i]) ); //deep-copy.
  };
  Collection& operator=(const Collection& C) {
    Collection tmp(C); //copy to temporary.
    swap(tmp);         //swap with the temporary.
    return *this;      //leave the old data in temporary to be destroyed at return from function.
  };
  Collection& swap(Collection& C) {
    sets.swap(C.sets); //swap the vectors, no overhead at all.
  };

  void push_back(Set* set) {
    sets.push_back(set);
  }
  // other accessor functions...
};

Now, you may notice from the above that you still have these very ugly loops that delete all the elements. This is because of raw pointers which need this kind of attention to avoid memory leaks. The solution to this is to either not use pointers at all and store them by value, or use a smart pointer. To not store by pointer, you would have to live with some overhead in copying the elements into the container, however, for large amounts of data this often is better at the end. The reason why it is better is because memory is kept more locale (when objects a contiguous they are much faster to access sequentially). So accessing operations on your containers will be much faster if you allow some overhead in placing them into the container (that is usually true in many situations in programming). And to minimize the overhead as much as possible, you can use swapping to your advantage too:

class Collection
{
private:
  vector<Set> sets; //notice a vector of sets, held by value.
public:
  Collection() { };
  ~Collection() { }; //no more loop needed.
  Collection(const Collection& C) : sets(C.sets) { }; //notice, no more deep-copy needed!
  Collection& operator=(const Collection& C) {
    Collection tmp(C); //copy to temporary.
    swap(tmp);         //swap with the temporary.
    return *this;      //leave the old data in temporary to be destroyed at return from function.
  };
  Collection& swap(Collection& C) {
    sets.swap(C.sets); //swap the vectors, no overhead at all.
    return *this;
  };

  void push_back(Set& set) { //pass by non-const reference.
    sets.push_back(Set());   //create an empty element at the back().
    sets.back().swap(set);   //swap the parameter with the back element with minimal.
  }
  // other accessor functions...
};

You will notice that you can't completely avoid copying Set objects because std::vector might perform copying when relocation is needed. You could use a std::list instead and avoid the occasional relocations and copies. However, I have found, in my experience, that just making the copy operation cheaper on the Set class is often better. In this case, Set can be made cheaper to copy by using a shared_ptr to store the segment container:

class Set {
private:
  boost::shared_ptr< vector<Segment> > segments;
public:
  Set() : segments(new vector<Segment>()) {}; 
  virtual ~Set() { }; //store-by-value so no need for destructor, and shared_ptr is automatically deleted.
  Set(const Set& S) : segments(S.segments) {}; //for shared ownership of segments
  //Set(const Set& S) : segments(new vector<Segment>(*(S.segments))) {}; //for "unique" ownership of segments.
  Set& operator=(const Set& S) {
    Set tmp(S); //same as before.
    return swap(tmp);
  };
  Set& swap(Set& S) {
    segments.swap(S.segments); //swap the pointers only.
    return *this;
  };

  boost::shared_ptr< vector<Segment> > getSegments() {
    return segments;
  }
  const boost::shared_ptr< vector<Segment*> > getSegments() const {
    return segments;
  }
  // accessor functions...
};

The above makes Set very cheap to copy and safe when it comes to memory leaks. In your example, Segment is also small and cheap to copy so it will be fine to hold it by value. If it is bigger, well you now have enough tools to find a good way to solve that problem. Also, notice that the above has no "delete" statements... what did we tell you! Delete statements are a thing of the past for many programmers.

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

Greywolf333 11 Junior Poster · Answer 1 · 2011-03-18T16:37:34+00:00

I am having a hard time deciding whether to use pointers or not in my nested data structures.
Memory and CPU efficiencies are important as I am dealing with data files of sizes on the order of gigabytes.
I have decided to use several classes that are nested in a hiearchical manner to make the code more organized and re-usable.
Here is a simplified version of my code:
struct Segment
{
  int start, end, value;
};

class Set
{
private:
  vector<Segment*>* segments;
public:
  Set() {
    segments = new vector<Segment*>();
  }
  ~Set() {
    for (size_t i = 0; i < segments->size(); ++i) {
      delete segments->at(i);
    }
    delete segments;
  }
  vector<Segment*>* getSegments() {
    return segments;
  }
  // accessor functions...
};

class Collection
{
private:
  vector<Set*>* sets;
public:
  Collection() {
    sets = new vector<Set>();
  }
  ~Collection() {
    for (size_t i = 0; i < sets->size(); ++i) {
      delete sets->at(i);
    }
    delete sets;
  }
  void push_back(Set* set) {
    sets->push_back(set);
  }
  // other accessor functions...
};

int main()
{
  Collection collection;
  // open file, read file
  Set* set;
  while (!f.eof()) {
    Segment* segment = new Segment();
    // fill segment
    // instantiate set using "new" as needed, and fill it with segments
    collection.push_back(set);
  }
  
  // do other stuff
  
  return 0;
}
The main motivation to use pointers is so that the child data structures can be built up while reading in the data file, and a pointer is simply passed to the parent data structures, avoiding using accessor functions and copying pieces of data redundantly.
My two main concerns are:
1. Memory leak. Does the order in which the class destructors are called ensure that there is no memory leak? When the collection destructor is called, I needed to "delete sets->at(i)" to deallocate the Set objects allocated elsewhere, but does the Set destructor get called, or does a memory leak result?
2. Kdevelop using gdb does not resolve the pointers, so I cannot see the content of the Collection object during debugging. I have to print them out, which can be cumbersome.
Back in the days when I used to use Visual Studio, I think I was having similar problems when I was using pointers to classes...
Is there a better way to do this?
(Besides using several global vectors to store different data types and hope that I remember to process related data concurrently...)

So, "sets" is a pointer to an array of vectors containing pointers to set pointers which point to arrays of vectors of segment pointers. I think hehe. It is a little strange to mix dynamic arrays and vectors to this degree, especially when vectors are already dynamic arrays. You'd be hard pressed to convince people that this is necessary.

I think that yes you should worry about a memory leak in Collection. Since calling delete on an array of pointers to vectors will cause the vectors to destruct, and the vectors that are "destructing" will call the destructor of the elements--which unfortunately, are only pointers, and the pointers won't automatically dereference and call the destructors of the actual classes, they'll just simply vanish. I can't be certain but I'm not comfortable at all with that... maybe one of the gurus here might know.

You'd be much better off using a 2d or 3d vector and do away with the pointers altogether. That way you can be certain that the destructors will all be called without the hassle of worrying about pointers and dereferencing them to destruct them. If you're worried about the size of it on the stack, stl vector stores things on the heap.

rubberman 1,355 Nearly a Posting Virtuoso Featured Poster · Answer 2 · 2011-03-19T12:24:46+00:00

Mike makes excellent points. And he took a good deal of his valuable time to point these details out. I'm usually a bit more parsimonious in my advice! :-) Anyway, most of what he says is spot on.

Greywolf333 11 Junior Poster · Answer 3 · 2011-03-19T12:35:43+00:00

I am having a hard time deciding whether to use pointers or not in my nested data structures.
Memory and CPU efficiencies are important as I am dealing with data files of sizes on the order of gigabytes.
I have decided to use several classes that are nested in a hiearchical manner to make the code more organized and re-usable.
Here is a simplified version of my code:
struct Segment
{
  int start, end, value;
};

class Set
{
private:
  vector<Segment*>* segments;
public:
  Set() {
    segments = new vector<Segment*>();
  }
  ~Set() {
    for (size_t i = 0; i < segments->size(); ++i) {
      delete segments->at(i);
    }
    delete segments;
  }
  vector<Segment*>* getSegments() {
    return segments;
  }
  // accessor functions...
};

class Collection
{
private:
  vector<Set*>* sets;
public:
  Collection() {
    sets = new vector<Set>();
  }
  ~Collection() {
    for (size_t i = 0; i < sets->size(); ++i) {
      delete sets->at(i);
    }
    delete sets;
  }
  void push_back(Set* set) {
    sets->push_back(set);
  }
  // other accessor functions...
};

int main()
{
  Collection collection;
  // open file, read file
  Set* set;
  while (!f.eof()) {
    Segment* segment = new Segment();
    // fill segment
    // instantiate set using "new" as needed, and fill it with segments
    collection.push_back(set);
  }
  
  // do other stuff
  
  return 0;
}
The main motivation to use pointers is so that the child data structures can be built up while reading in the data file, and a pointer is simply passed to the parent data structures, avoiding using accessor functions and copying pieces of data redundantly.
My two main concerns are:
1. Memory leak. Does the order in which the class destructors are called ensure that there is no memory leak? When the collection destructor is called, I needed to "delete sets->at(i)" to deallocate the Set objects allocated elsewhere, but does the Set destructor get called, or does a memory leak result?
2. Kdevelop using gdb does not resolve the pointers, so I cannot see the content of the Collection object during debugging. I have to print them out, which can be cumbersome.
Back in the days when I used to use Visual Studio, I think I was having similar problems when I was using pointers to classes...
Is there a better way to do this?
(Besides using several global vectors to store different data types and hope that I remember to process related data concurrently...)

Good stuff Mike 'n Rubber. Ah Mike's right he doesn't have a memory leak after all. For some reason I read it like he was using the vector functions to clear the elements.

rwarlord 0 Newbie Poster · Answer 4 · 2011-03-19T21:51:18+00:00

Thank you for all your feedbacks.
Your suggestions have much improved my code.

I agree that I had used many native pointers unnecessarily; references, store-by-values, and smart pointers can do better jobs.

I had not considered the sequential access advantage of copying data to a continguous block, as well as the other points raised.

Thank you for all your help!

Use of pointers to hierarchical data structures in C++

Recommended Answers Collapse Answers

All 6 Replies

Recommended Answers