Hello!

I'm writing some mathematical iterators over some big classes that I don't want to store in memory at all, but I still want to be able to iterate through them. As an example (but I have a few of these), think of a class Combinations({0,1,2...7}, 3), which means I would like to iterate over all subsets of size 3 from the set {0,1,2...7}

I already have the code for this, that's not my problem. The thing is, I'd like to make it compatible with the STL. Right now, I have only one class, which is called "CombinationsIterator" that has a function called bool Next(vector<int>& output) and when called, it kinda acts as std::next_permutation(output), just modifies output so that it represents the next combination. And so if I want to iterate over everything, I do this:

CombinationIterator<int> X(somevectorofints, 3);
vector<int> combination = X.First(); //I defined this in combinationiterator

do
{
    //operate with combination
} while (X.Next(combination)); //next returns false if there are no more combinations

Don't get me wrong: this works perfectly for now. But I'd like to make it compatible with the STL so that I get access to tons of neat functions like find_if() and so on.

As I said, one would never want to hold all combinations in memory.

Now, before someone starts to say "oh, but combinations is in such and such library...", yeah, maybe, but I have a bunch of those for many other mathematical objects.

What would this mean? Probably change next to ++it, but then what is an iterator? does it hold the "current combination" on its own, and calling *it returns a reference to current_combination?

But then when I copy the iterator (like it1 = it2) (I think iterators are supposed to be copyable), should it copy the whole thing? Isn't that a waste? Is there a smarter way to approach this? And what would the iterator "end()" be? begin() is easy. What's a comparison between iterators? Would I need to compare the whole thing?

I'm confused...

Thank you!

Comments
Good question!

This is a good question. Iterators are awesome but with the implementation of a custom iterator often comes a number of non-trivial issues, especially when you're pushing the envelop.

My first recommendation would be to get well acquainted with the different iterator concepts, if you haven't done so already. They are grouped in a logical fashion and it is important to understand the requirements and behavior you need to fulfill.

But I'd like to make it compatible with the STL so that I get access to tons of neat functions like find_if() and so on.

Aligning yourself with standard libraries and idiomatic, modern C++ is certainly a great idea. You should nevertheless be careful and make sure you are not bending your library or the standard library in ways that are too awkward.

Now, before someone starts to say "oh, but combinations is in such and such library...", yeah, maybe, but I have a bunch of those for many other mathematical objects.

I can't help myself, I must point you to the Boost.Iterator library, in case you weren't aware of it yet. They do have a permutation_iterator which you might consider, at least, for inspiration.

What would this mean? Probably change next to ++it, but then what is an iterator? does it hold the "current combination" on its own, and calling *it returns a reference to current_combination?

That's an option. Or, it could manufacture the "current combination" when it is dereferenced. Either way, all that is required for an input iterator is that it can deliver a "value". How it delivers that value is your business.

I think iterators are supposed to be copyable

Yes, they technically are required to be copyable, this is mostly because many STL algorithms take the iterators by value and will often pass them forward to other function templates that take them by value too. This is a bit annoying because most algorithms could be implemented without this requirement, i.e., with only the requirement that they be movable.

But then when I copy the iterator (like it1 = it2), should it copy the whole thing? Isn't that a waste? Is there a smarter way to approach this?

Of course, when you copy the iterator, you don't need it to copy the "whole thing" (whatever that means), because you can implement the copy operations as you see fit. And yes, it would seem wasteful to copy the "whole thing" every time. And to the third question, there are at least one smarter approach, one possibly suitable approach and one dirty approach (and the "ideal solution" I already mentioned, is to make a movable iterator).

1) The smart approach - Copy-on-Write:

A smarter way to do this would be through a Copy-on-Write (CoW) strategy. I'm gonna give an example using the next_permutation algorithm, because I don't understand your CombinationIterator class (not enough details given). Here is a simple permutation iterator that uses a CoW strategy:

template <typename T, typename Compare = std::less<T> >
class perm_iter_CoW {
  private:
    std::vector<T>* p_vect;  //<--  store pointer to the storage.
    int* ref_count;          //<--  store pointer to the shared reference count.
    Compare compare;

    // define a deep-copy function to use when a write is needed:
    void copy_storage() {
      p_vect = new std::vector<T>(*p_vect);  // deep-copy of storage.
      --(*ref_count);                        // decrement reference count.
      ref_count = new int(1);                // create a fresh reference count.
    };

  public:
    typedef perm_iter_CoW<T,Compare> self;

    // constructor from a range of values:
    template <typename FIter>
    perm_iter_CoW(FIter first, FIter last, Compare comp = Compare()) :
                  p_vect(new std::vector<T>(first, last)), 
                  ref_count(new int(1)), compare(comp) {
      std::sort(p_vect->begin(), p_vect->end(), compare);
    };

    // copy-constructor (shallow-copy + ref-count increment):
    perm_iter_CoW(const self& rhs) :
                  p_vect(rhs.p_vect), ref_count(rhs.ref_count), compare(rhs.compare) {
      ++(*ref_count);
    };

    // move-constructor:
    perm_iter_CoW(self&& rhs) :
                  p_vect(rhs.p_vect), ref_count(rhs.ref_count), 
                  compare(std::move(rhs.compare)) {
      rhs.p_vect = NULL;
      rhs.ref_count = NULL;
    };

    // swap function:
    friend void swap(self& lhs, self& rhs) {
      using std::swap;
      swap(lhs.p_vect, rhs.p_vect);
      swap(lhs.ref_count, rhs.ref_count);
      swap(lhs.compare, rhs.compare);
    };

    // copy-/move-and-swap:
    self& operator=(self rhs) {
      swap(*this,rhs);
      return *this;
    };

    // destructor:
    ~perm_iter_CoW() {
      if(ref_count) {
        if(*ref_count < 2) {
          delete p_vect;
          delete ref_count;
        } else {
          --(*ref_count);
        };
      };
    };

    // increment operator:
    self& operator++() {
      if( !ref_count )
        return *this;
      if( *ref_count > 1 )
        copy_storage();
      if( ! std::next_permutation(p_vect->begin(), p_vect->end(), compare) ) {
        delete p_vect;    p_vect = NULL;
        delete ref_count; ref_count = NULL;
      };
      return *this;
    };

    self operator++(int) {
      if( !ref_count )
        return *this;
      self result(*this);  // make a copy.
      if( *ref_count > 1 )
        copy_storage();
      if( ! std::next_permutation(p_vect->begin(), p_vect->end(), compare) ) {
        delete p_vect;    p_vect = NULL;
        delete ref_count; ref_count = NULL;
      };
      return result;
    };

    // and similarly for the rest..

};

The above scheme results in having only as many copies of the iterators as there are unique iterations under way, which is technically optimal. But again, the ideal solution would be to make the iterator movable but not copyable, which is not possible for an STL-compliant iterator.

2) The dirty approach - Shallow-copies and shared state:

In many cases, it would work to have a shallow-copy iterator, something like this:

template <typename T, typename Compare = std::less<T> >
class perm_iter_ref {
  private:
    std::vector<T>* p_vect;  //<--  store pointer to the storage.
    Compare compare;
  public:
    typedef perm_iter_CoW<T,Compare> self;

    // constructor from a vector:
    perm_iter_CoW(std::vector<T>& v, Compare comp = Compare()) :
                  p_vect(&v), compare(comp) {
      std::sort(p_vect->begin(), p_vect->end(), compare);
    };

    // leave the Big Five as shallow (default).

    // increment operator:
    self& operator++() {
      if( p_vect && !std::next_permutation(p_vect->begin(), p_vect->end(), compare) )
        p_vect = NULL;
      return *this;
    };

    self operator++(int) {
      self result(*this);
      if( p_vect && !std::next_permutation(p_vect->begin(), p_vect->end(), compare) )
        p_vect = NULL;
      return result;
    };

    // and so on..
};

The above would work in many cases (certainly in the basic linear iteration case), but it's not semantically correct. For example, the post-increment operator has incorrect behavior because the iterator it returns is the original iterator, because its state is the state of the vector it points too which is the same as the one that was just moved to the next permutation. This is the problem with trying to avoid the copying in this case, if you don't do a copy (at least sometimes), you can't fulfill the requirements because you'll have to share state between iterators (original and copy).

3) The "maybe good" approach - Store the recipe:

Another approach that might be appropriate in some cases is to "store the recipe" to manufacture the result of the dereferencing of the iterator, and then simply change the recipe as you move the iterator. A typical example of that would be something like an "index iterator":

template <typename T>
class index_iterator {
  private:
    const std::vector<T>* p_vect;
    std::size_t i;

    index_iterator(const std::vector<T>* p, std::size_t I) : p_vect(p), i(I) { };
  public:
    typedef index_iterator<T> self;

    static self begin(const std::vector<T>* p) {
      return self(p,0);
    };
    static self end(const std::vector<T>* p) {
      return self(p,p->size());
    };

    T operator*() const { return (*p_vect)[i]; };

    self& operator++() { ++i; return *this; };

    // ... so on ...
};

This example is assinine, but the main point here is that you store the recipe (the index into the vector and a pointer to the (immutable) vector), and then you manufacture the result when dereferencing it (i.e., looking up the value in the vector). That's the approach in a nutshell. This approach could be appropriate if the recipe is rather cheap to manipulate and rather cheap to use to construct the final result. For example, it would not be appropriate for the permutation iterator that I showed in the first approach, because performing several permutations at each dereferencing (to get to the "current" permutation) would be much worse than having to do a few deep-copies here and there. But in other cases, like stride iterators, or views, it is much cheaper to use this "store the recipe" approach.

And what would the iterator "end()" be? begin() is easy.

If you have trouble with the "end" iterator, just define the end iterator as the "invalid" iterator. This is what is used for std::back_inserter for example. You can typically just invalidate the iterator somehow when you get to the end of the range (like in my permutation iterator examples above, where I invalidate the iterator when next_permutation return false).

What's a comparison between iterators? Would I need to compare the whole thing?

In general, iterators are not the same because their values are the same, so you shouldn't have to compare the "whole thing", just the state of the iterators, and in general, it isn't sufficient (i.e., although iterators that are the same have the same value, iterators that have the same value are not necessarily the same iterators (at the same positions)). This is where the copy-on-write technique above sort of breaks down a bit, because you can't directly check that two iterators are the same. In the "store the recipe" approach, this is much easier, you can just compare the recipe (not that the copy-on-write approach could also be solved by storing the recipe just for the purpose of comparison).

I hope that gives you enough food for thoughts. I guess another final point to make is that if you can't figure out a reasonable way to comply with the requirements of an STL iterator, you shouldn't try to, because it could have unforeseen consequences.

Wow, most of this is probably beyond my current C++ skill level... :) But I think I'll try to do something like the CoW approach you mentioned and see what problems I run into. I had never heard of move semantics until a few days ago when I finally decided to study up on C++11x

I'll read up on all the stuff you mention, but all of it seems harder than I expected. I thought I was just being silly, after all, I thought iterators were meant for doing exactly what I wanted to do: to go over every element of a "set" (in the mathematical sense) without ever needing to create the actual set in memory. But I guess not really.

Anyway, I think I figured out how to solve the other one of my problems: comparison (== and !=). The iterator can have a member variable of type long unsigned int (or anything big enough) and start out at 0. Then the end can be, in the case of the CombinationsIterator example, just (n choose r), and two iterators are "the same" if the index matches.

This article has been dead for over six months. Start a new discussion instead.