This program does not crash but behaves badly due to not allocating enough space for vector z in line 21. If I comment out line 21 and instead uncomment line 23, things work well. If I comment out both lines 21 and 23, the program will crash, presumably because I am writing to a vector with no space allocated. For the program to work well when resizing on line 37, z has to have a size of at least 3. I assume that I am just getting lucky as well. Potentially only allocating z to be size 1 could make the program crash too. My understanding of STL is that the programmer is responsible for making sure that the size of the z vector is adequate when the program gets to line 27. No size check is done because checking sizes slows performance. My question is "Can I change the code so that sizes are checked or at least an exception is thrown that can be caught so the program does not crash? If so, how?".

Program output is as follows...

Before resizing-iterator:1: 1 2 3
Before resizing-z.end():1: 1
After resizing:3: 1 0 0

The first line gives the correct union, but proves that the set_union function did not resize the z vector from 1 to 3. Line 2 proves that the iterator returned is BEYOND the end of the vector. Line 3 shows that the resizing threw away the 2 and the 3.

#include <iostream>
#include <algorithm>
#include <vector>
using namespace std;

void print_int(int x)
    cout << " " << x;

int main()
    vector<int> x;
    vector<int> y;
    vector<int> z;
    z.resize(1);  // if this line is commented out, program will crash.  I am
                  // ASSUMING that it COULD crash with a size of 1.
    //z.resize(x.size() + y.size()); // probably the better way to do it

    vector<int>::iterator it;

    it = set_union(x.begin(), x.end(), y.begin(),
        y.end(), z.begin());

    cout << "Before resizing-iterator:" << z.size() << ":";
    for_each(z.begin(), it, print_int);
    cout << endl;
    cout << "Before resizing-z.end():" << z.size() << ":";
    for_each(z.begin(), z.end(), print_int);
    cout << endl;

    z.resize(it - z.begin());
    cout << "After resizing:" << z.size() << ":";
    for_each(z.begin(), z.end(), print_int);
    cout << endl;
    return 0;

Edited by VernonDozier: Fix line numbers and typo

4 Years
Discussion Span
Last Post by VernonDozier

set_union() assumes that the output iterator points to a collection with enough existing space to hold all of the results. If you're planning on using an empty vector or have set_union() grow the vector then you'll want to do it through a back inserter.


You can assume, for all practical matter, that the std::vector class just stores two fields: the size of the array and a pointer to the first element of the array. In other words, this is the same as what you would have to do if you allocated the dynamic array yourself with new[]. When you have a fresh vector (empty), the internal pointer is most likely set to NULL. You can also assume that the iterator type std::vector<int>::iterator is the same as int* (e.g., a pointer). So, when you call begin(), it will, in effect, just give you the pointer to the start of the array (the pointer that the vector stores internally). So, on a fresh and empty vector, that pointer is NULL (and so will be the iterator at begin()), and dereferencing it is going to cause a crash (access violation, segmentation fault). This explains the crash if you have both lines 21 and 23 commented out.

If you have resized the vector, then the internal pointer is no-longer NULL, which explains the non-crash behavior when using line 21 or 23. If you allocate space for 1 value, but end up writing 3 elements (via an iterator), you will be writing beyond the memory that was allocated (or allowed), this may or may not result in a crash, but in either case, it is bad. The reason why you get (1,0,0) after you resize the vector is because resizing from 1 to 3 adds two elements which are initialized to 0 (overwriting your previous values).

Using line 23 is indeed correct, but wasteful because the final vector might not need to be as big as x.size() + y.size(). So, the pertinent question is: how do you add only the set-union values without having to over-resize the destination vector ahead of time? The answer is: std::back_inserter.

The back-insert-iterator is an iterator that can be used to push_back() elements on an STL container. In other words, if it is a back-insert-iterator to vector v, doing *it++ = some_value; is equivalent to doing v.push_back(some_value);. So, you can replace line 25 to 28 with this:

set_union(x.begin(), x.end(), y.begin(), y.end(), back_inserter(z));

And with this, you don't need to initialize the size of vector z.

This question has already been answered. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.