I recently unrolled an insertion sort, and on 60 items it was twice as slow as the looped version. I'm guessing that's because the unrolled code, at 1032 KB, was too big for the instruction cache. First, does that seem like a likely explanation? If so, to avoid the problem, do I just need to find the instruction cache size of my Intel Core Duo machines and stay under it, or is there more to it than that?
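For reference, a standard looped insertion sort looks roughly like the sketch below. It assumes plain `int` keys, and `insertion_sort` is an illustrative name, not necessarily the code that was benchmarked. The whole inner loop is a handful of instructions that are reused on every pass, which is exactly what an unrolled version trades away for straight-line code.

```c
#include <stddef.h>

/* Looped insertion sort: O(n^2) comparisons, but a very small code footprint. */
void insertion_sort(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        /* Shift larger elements one slot right until key's position is found. */
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}
```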
I've read a bit about data caches, but everything I've found online about the instruction cache is either an expensive paper, a patent, or something I can't follow.
I'm mostly hoping to learn, but I'm also hoping to write a fast sort for short lists to finish off a quicksort.
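Finishing off a quicksort with a short-list sort is commonly done with a size cutoff: quicksort stops partitioning once a range is small, and insertion sort handles what is left. The following is a minimal sketch under stated assumptions: `int` keys, a Lomuto partition with a middle-element pivot, and a hypothetical cutoff `SMALL_CUTOFF` of 16 (the best value depends on the machine and should be measured). All function names here are illustrative.

```c
#include <stddef.h>

/* Hypothetical cutoff; tune it by benchmarking on the target machine. */
#define SMALL_CUTOFF 16

/* Sorts a[lo..hi) with a plain looped insertion sort. */
static void insertion_sort_range(int *a, size_t lo, size_t hi)
{
    for (size_t i = lo + 1; i < hi; i++) {
        int key = a[i];
        size_t j = i;
        while (j > lo && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Lomuto partition of a[lo..hi) around the middle element; returns the
   pivot's final index. Left side < pivot, right side >= pivot. */
static size_t partition(int *a, size_t lo, size_t hi)
{
    size_t mid = lo + (hi - lo) / 2;
    swap_int(&a[mid], &a[hi - 1]);
    int pivot = a[hi - 1];
    size_t store = lo;
    for (size_t i = lo; i < hi - 1; i++) {
        if (a[i] < pivot) {
            swap_int(&a[i], &a[store]);
            store++;
        }
    }
    swap_int(&a[store], &a[hi - 1]);
    return store;
}

/* Partitions until every unsorted range has at most SMALL_CUTOFF elements;
   those small ranges are deliberately left unsorted. */
static void quicksort_coarse(int *a, size_t lo, size_t hi)
{
    while (hi - lo > SMALL_CUTOFF) {
        size_t p = partition(a, lo, hi);
        /* Recurse into the smaller side and loop on the larger one to bound stack depth. */
        if (p - lo < hi - (p + 1)) {
            quicksort_coarse(a, lo, p);
            lo = p + 1;
        } else {
            quicksort_coarse(a, p + 1, hi);
            hi = p;
        }
    }
}

void hybrid_sort(int *a, size_t n)
{
    quicksort_coarse(a, 0, n);
    /* One insertion-sort pass finishes the job: each element only moves
       within its own small unsorted range. */
    insertion_sort_range(a, 0, n);
}
```

This variant leaves the small ranges unsorted and finishes with a single insertion-sort pass over the whole array; the other common option is to call `insertion_sort_range` on each small range as soon as quicksort reaches it. Either way, the short-list sort stays a compact loop rather than unrolled code.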
Thanks for any help.