Recently we have been asked to create a hybrid sort based upon quicksorting down to a certain point and insertion sorting from then on. We are to calculate the efficiency based upon previous tests. I had very few problems implementing a quicksort algorithm, but for the life of me, I can't get anything to work on the hybrid version. Can anyone help, or provide tips or pointers?

I'd love to help but could you please be more informative? What do you mean by "I can't get anything to work"? How many passes of quicksort do you do before you do insertion sort?

data := "5 2 4 7 3 1 6 8"
left := 0
right := 7
pivot := data[0] = 5

partition(data, left, right, pivot)

data := "2 4 3 1 5 7 6 8"
left := 0
right := 7
pivot := data[4] = 5

quick_sort(data, left, right) {
  if left < right {
    if (right - left) < threshold {
      insertion_sort(data, left, right)
    } else {
      pivot := partition(data, left, right)
      quick_sort(data, left, pivot - 1)
      quick_sort(data, pivot + 1, right)
    }
  }
}

insertion_sort(data, left, right) {
  ....
}

Sorry, I have to get back to work. If you could please post code, I can take a look later and help you out more.

I always find it instructive to see what others have done, and then apply my own style and ideas to the problem. In the case of a Quicksort/Insertion sort hybrid, you could look in Google for some other folk's source code, or if you have VC++, you can look at the source code for their 'qsort()' function. It is not recursive and therefore a bit ugly, but it works great. They've got tuning variables in there for figuring out at what point the insertion sort kicks in.

Like I say, don't copy their code but learn from it.

void quickSort(int numbers[], int array_size)
{
    q_sort(numbers, 0, array_size - 1);
}


void q_sort(int numbers[], int left, int right)
{
    int pivot, l_hold, r_hold;

    l_hold = left;
    r_hold = right;
    pivot = numbers[left];
    while (left < right)
    {
        while ((numbers[right] >= pivot) && (left < right))
            right--;
        if (left != right)
        {
            numbers[left] = numbers[right];
            left++;
        }
        while ((numbers[left] <= pivot) && (left < right))
            left++;
        if (left != right)
        {
            numbers[right] = numbers[left];
            right--;
        }
    }
    numbers[left] = pivot;
    pivot = left;
    left = l_hold;
    right = r_hold;
    if (left < pivot)
        q_sort(numbers, left, pivot - 1);
    if (right > pivot)
        q_sort(numbers, pivot + 1, right);
}

This is code I used for sorting an array of randomly generated integer elements, using the left-most element as the pivot.

Basically I want the program to recurse to the point where insertion sort is more optimal than quicksort... and I honestly don't even have a clue where to start.

Assuming you have an insertion sort routine, try this modification to your code:

void q_sort(int numbers[], int left, int right)
{
    int pivot, l_hold, r_hold;

        // If there are few records left to look at, use an insertion sort
        //   because it is faster.
    if ((right - left) < INSERTION_SORT_THRESHOLD)
    {    
        InsertionSort( numbers, left, right );
        return;
    }
    l_hold = left;
    r_hold = right;
. . .

and then just tinker with INSERTION_SORT_THRESHOLD's value to see what happens. For example, make it zero to disable the insertion sort, make it huge to ONLY do an insertion sort, and then try something like 5 or 10.

On Windows, you can use GetTickCount() to return the time since boot in milliseconds (mod 32 bits), and that can make for a simple elapsed time counter:

void quickSort(int numbers[], int array_size)
{
    DWORD startTime = GetTickCount();
    q_sort(numbers, 0, array_size - 1);
    printf( "Elapsed time: %lu ms\n", (unsigned long)(GetTickCount() - startTime) );
}

This isn't perfect for production code because the tick count will flip over eventually, but for your testing on where to put the insertion sort threshold this should do fine.

>I can't get anything to work on the hybrid version. Can anyone help, provide tips or pointers?
Oddly enough, I posted the full implementation for an optimized quicksort in the code snippets on this site. One of the optimizations is terminating recursion on small subfiles and then calling insertion sort on the "almost" sorted file. An alternative is to use insertion sort during recursion, but that typically has less desirable properties than waiting until after the recursive steps. Because you're testing performance, it couldn't hurt to implement both and see what you get. :)

I also give examples of two other improvements to the basic algorithm, one common and the other not so common because it's rather difficult and not well known.

>to recurse to the point where insertion sort is more optimal than quicksort
This really needs to be tweaked by the implementor, but a cutoff somewhere between 5 and 25 will work nicely in general and that's where you usually find yourself setting it.

One interesting way to figure out optimizations is through experimentation. You can time each pass for each implementation, record the results, and then plot them side by side on a graph. You can then adjust the insertion sort version's threshold accordingly to see if there are improvements.

That's what I did. I found that, for the size of arrays I was dealing with, the threshold showed increased efficiency at around 25 and lower.

Narue,
I checked your excellent sorting snippet, and have a question. Is there a way to time this? Milliseconds is much too slow. I know one could increase the size of the array, but that's not challenging. Something like a microsecond or nanosecond timer would help.

Look at struct timeval (from <sys/time.h>), which holds seconds and microseconds.

>I know one could increase the size of the array, but that's not challenging
Not challenging, but certainly informative when comparing methods. Because quicksort's speed isn't noticeable until the array gets very large, you won't see much difference between sorting methods no matter what kind of resolution your timer has. It's best to use an array size that would be realistic for your application and span the timing across multiple calls so that you can get an average time distribution for the algorithm.

And if you're using my quicksort for 1000 items or less, that's what we in the field call "overkill". Try shellsort instead. :D

Since you asked, Windows and other OSes may have a higher-resolution timer. On Windows, check out:

QueryPerformanceCounter()
and
QueryPerformanceFrequency()

The other issue about performance is: how much work do you do in your comparison function? If the compare is trivial, like an int compare, then what you are timing is the speed of your algorithm, and that's where simpler sorts can shine.

If your compare is complicated or slow (read two records and compare some field, or do an international string compare on similar strings, ...), then the complexity of the sort algorithm is dwarfed by the compare time, and THAT's where all this fancy hybrid stuff will pay off.

Also, it's good to input different values of N (a scale of values from low to high) to prove the running time is correct. The improvement would be seen on a graph of time against N (or against N log N). You should also run the best and worst cases for the quicksort as well (with both implementations). If you're using *NIX like any respectable code monkey (hehe, jk), include <sys/time.h> and check out struct timeval.
