Quickselect, based on quicksort, and counting select, based on counting sort.

Each is capable of finding the kth smallest element in an unsorted/sorted list/array.

This is some example code for a QuickSelect algorithm. This doesn't include the partition function.

``````// return the kth smallest item
int quickSelect(int items[], int first, int last, int k) {
int pivot = partition(items, first, last);
if (k < pivot-first) {
return quickSelect(items, first, pivot, k);
} else if (k > pivot) {
return quickSelect(items, pivot+1, last, k-pivot);
} else {
return items[k];
}
}
``````

Counting select, based on the counting sort algorithm, looks something like this.

``````// return the kth smallest item
int countingSelect(int items[], int first, int last, int k) {
int counts[cap];
for (int c = 0; c < cap; c++) {
counts[c] = 0;
}
for (int i = first; i < last; i++) {
counts[items[i]] += 1;
}
int c = 0;
while (k >= 0) {
k -= counts[c++];
}
return c-1;
}
``````

I wish to write a set of guidelines to decide which of the two algorithms is most appropriate for specific situations or data sets. I need to think of situations, and than implement guidelines as to which algorithm is the better choice.

For this, I need a little help distinguishing which algorithm has specific strengths in certain areas/data sets and varieties, etc.

2
Contributors
1
6
Views
6 Years
Discussion Span
Last Post by mike_2000_17

The guidelines for the two algorithms are basically the same as for their respective mother-algorithms, i.e., the sorting algorithms.

The counting-selection algorithm is only applicable to a very tightly packed integer-valued array. What I mean by that is that the bound (`cap` in your function) has to be very small, underwise the memory consumption of the algorithm is really unacceptable. You see, the theoretical best algorithm for such a selection problem requires O(n) time and O(1) memory. The counting selection requires O(n + m) time and O(m) memory, where m is the maximum possible value in the array, this is significantly worse than the theoretically achievable performance. So, unless m is really small compared to n, it makes no sense to use this algorithm. And because you should be able to get O(n) time and O(1) memory for this problem, there is no reason to use this algorithm in any case, at least, from a theoretical point-of-view. If you do some comparative performance tests to show that for certain small array sizes and small bound on the values, that the counting-selection algorithm has sufficiently small constant-factor to make it faster than others in that regime, than that is your guideline.

As for the quick-selection algorithm, it is quite a bit better, with O(n) best-case time and O(1) memory, but it does have O(n^2) worse-case time. This algorithm applies in the general case (values don't have to be integers either) but it is not guaranteed to be fast in all cases, and that can be a problem. This is why the C++ standard library (and the C standard library, and standard libraries in many other languages) don't implement the quick-sort algorithm in its basic form, they mostly use intro-sort or some other hybrid sorting algorithm. You might consider doing the same thing, like implementing intro-select (the selection version of intro-sort). But surely, your quickselect algorithm is likely to perform better than your counting-selection algorithm in all but pathological cases.

As a final note, for a professional implementation, may I suggest that you do things in the style of STL algorithms, and, don't use recursion when it is not necessary:

``````// General quick-selection algorithm on a random-access range.
template <typename RandomAccessIter, typename CompareFnc>
typename std::iterator_traits<RandomAccessIter>::const_reference
quick_select(RandomAccessIter first, RandomAccessIter last, int k,
CompareFnc compare) {
RandomAccessIter k_it = first + k;
while(true) {
RandomAccessIter pivot = partition(first, last, compare);
if (k_it < pivot)
last = pivot;
else if (k_it > pivot)
first = pivot + 1;
else
return *k_it;
};
};

// The quick-selection algorithm with default less-than comparison on a random-access range.
template <typename RandomAccessIter>
typename std::iterator_traits<RandomAccessIter>::const_reference
quick_select(RandomAccessIter first, RandomAccessIter last, int k) {
return quick_select(first, last, k,
std::less<typename std::iterator_traits<RandomAccessIter>::value_type>());
};
``````