Fast insertion into a sorted list

Question

dougy83 74 Posting Whiz in Training

11 Years Ago

Hi all,

Does anyone know how to quickly (i.e. O(log n) time) insert items into a sorted list? I'd prefer an existing working function if available; I didn't see any appropriate ones in boost or stl.

Some background: I'd like to keep a constant sized sorted list, and insert items into the list while maintaining order. The item at the tail of the list is discarded after every insertion to maintain a constant size.

This is very easy to do in C# using the SortedList container class, but I need to use C++. Number of items will be a few thousand, 16 bytes of data per item.

Thanks for any leads or help!
-Doug

c++

Edited 11 Years Ago by dougy83

2 Contributors
2 Replies
13K Views
3 Days Discussion Span
Latest Post 11 Years Ago Latest Post by dougy83

mike_2000_17 2,669 21st Century Viking

11 Years Ago

There are a number of ways to do it. Here are some options:

1) Use the C++ counterpart to "SortedList" which is called std::set (see docs). This is a container of elements that are sorted. However, just like SortedList, this is implemented using a binary search tree (some kind of red-black tree) that uses "links" between nodes. For storing elements of 16 bytes each (probably a POD-type), and when you only have a few thousand of them, this solution will be very far from optimal in terms of performance. Probably about as good as the C# version, which is terrible by C++ standards.

2) Keep a sorted array (vector) and insert elements into it. Something like this:

void addElementToVector(std::vector<T>& v, T elem, std::size_t max_size) {
  if( ( max_size <= v.size() ) && ( elem >= v.back() ) )
    return; // nothing to do.

  if( max_size > v.size() ) { // if empty room in v
    // add room at the end:
    v.push_back(elem);
  };
  // find the position for the element
  auto pos = std::upper_bound(v.begin(), v.end()-1, elem);
  // and move the array around it:
  std::move_backward(pos, v.end()-1, pos+1);
  // and set the new element:
  *pos = elem;
};

You can look at the docs for upper_bound and move_backward to figure out how this works. It's a simple matter of finding the proper place to put the new element within the existing array, and then move the sub-array by one position such that the new element can be placed where it needs to be. The complexity of this operation is O(log(N)) to find the insertion position, and O(N) to rotate the elements of the array. Before you say that this is inefficient, because of the copying (rotation), I must say that a memory rotation for a simple type (e.g., 16 bytes, POD-type) is going to be very efficient (because of cache architectures and because it optimizes to a "block" operation).

3) Use a heap instead of a sorted array. This problem that you are describing (i.e., keeping track of N amount of elements of the smallest values (or nearest distance, whatever..)) is a classic problem which is generally solved with a heap array. A heap is simpler (and thus, more efficient) than a sorted array, with the limitation that you can only know about the "maximum" value of the heap. For this problem, this is all you need, because the maximum value is needed to determine if the new element needs to be added to the array or not. In C++, you can just use the std::priority_queue class for this (or the underlying functions, std::make_heap, std::push_heap and std::pop_heap) to do this:

void addElementToQueue(std::priority_queue<T>& p, T elem, std::size_t max_size) {
  if( ( max_size <= p.size() ) && ( elem >= p.top() ) )
    return; // nothing to do.
  p.push(elem);
  if( max_size < v.size() )
    p.pop();
};

// or, with a "heap" vector:

void addElementToQueue(std::vector<T>& h, T elem, std::size_t max_size) {
  if( ( max_size <= h.size() ) && ( elem >= h[0] ) )
    return; // nothing to do.
  h.push_back(elem);
  std::push_heap(h.begin(), h.end());
  if( max_size < h.size() ) {
    std::pop_heap(h.begin(), h.end());
    h.pop_back();
  };
};

In both cases (which boil down to the exact same code), the complexity is O(log(N)) because that is the complexity of the push/pop functions on a heap. And this will cause no block copying because both push and pop only add or erase an element at the end of the vector.

The advantage of using a vector instead of the priority_queue is because if you want to retrieve, at the end of your algorithm, a sorted array of all the collected elements, you can simply do:

std::sort_heap(v.begin(), v.end());

which will turn the heap into a sorted array (from min to max).

This solution is by far the best, if it is sufficient for your problem. If you really need to keep a sorted array the whole time, you'll have to use either solution 1 or 2, depending on which performs better in your particular case (test them both!).

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

dougy83 74 Posting Whiz in Training · Answer 1 · 2013-12-22T08:12:06+00:00

Wow, thanks for the very detailed response! I wasn't aware of most of the STL classes and methods you mentioned. I would have replied earlier, but unfortunately I didn't click on the "watch this article" button.

I implemented the second method you showed, but using a prefilled vector of numeric_limits<double>::max(), and using a non-STL binary search and memcpy to move the elements before insertion. Using your STL code listing 2 is a little faster.

Line 12 of the first listing should be std::move_backward(pos, v.end()-1, v.end()); for correct operation.

The execution times are shown in the attached plot, and as you said, the heap is the best. I'll have to check out how the heap actually works.

Thanks very much for you time and great response.
-Doug
e10504a3f0fb187756102761dfa2af42