954,499 Members — Technology Publication meets Social Media
Username:
Password:
Lost login information?
Have something to say? Contribute New Article Reply to this Article

Unrolling made my sort slow.

I recently unrolled an insertion sort and on 60 items it was twice as slow as the looped version. I'm guessing that's because it was too big for the instruction cache, being 1032 KB. First, does that seem a likely explanation? If so, to avoid that problem, do I just need to find the instruction cache size of my Intel Core Duos and not go over it, or is there a bit more to it?
I've read a bit about data caches but everything I've googled up about instruction cache is either a paper (expensive), a patent or I can't understand it.
I'm mostly hoping to learn, also hoping to write a fast sort for short lists to finish off a quicksort.
Thanks for any help.

laehc
Newbie Poster
11 posts since Mar 2010
Reputation Points: 10
Solved Threads: 0
 

The cache works better when you HAVE loops, not when you don't.

If all you've got is straight-line code, then the very fast processor is essentially bypassing the cache and it is stuck waiting for slow memory to deliver the next instruction.

A populated cache on the other hand can keep up. Coupled with branch prediction (or speculative execution), a jump from one cache location to another is either a small overhead (or free).

> I'm guessing that's because it was too big for the instruction cache
For 60 elements - I doubt it.
Most tool chains have the ability to tell you how much code a single function occupies at the assembler level (try the map file).

Salem
Posting Sage
Team Colleague
11,531 posts since Dec 2005
Reputation Points: 5,862
Solved Threads: 953
 
I recently unrolled an insertion sort and on 60 items it was twice as slow as the looped version. I'm guessing that's because it was too big for the instruction cache, being 1032 KB.

The correct question would be "it was too big for the instruction cacheline". I don't know how large is the cache line on your system, but usually it is no more than 512 bytes. It is quite reasonable to expect that an unrolled code doesn't fit a cache line, and suffers all the consequences of extra cache misses, as Salem already explained.

nezachem
Posting Shark
903 posts since Dec 2009
Reputation Points: 719
Solved Threads: 194
 

Thanks Salem and nezachem, that's the info I wanted.

laehc
Newbie Poster
11 posts since Mar 2010
Reputation Points: 10
Solved Threads: 0
 

This question has already been solved

Post: Markdown Syntax: Formatting Help
You