The cache works better when you HAVE loops, not when you don't.
If all you've got is straight-line code, each instruction is executed exactly once, so the cache never gets to reuse anything it has loaded. The very fast processor is effectively stuck waiting for slow memory to deliver the next instruction.
A populated cache, on the other hand, can keep up. Coupled with branch prediction (or speculative execution), a jump from one cached location to another costs either a small overhead or nothing at all.
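To make the comparison concrete, here is a minimal sketch of the two shapes of code being discussed. The function names `sum_loop` and `sum_unrolled4` and the summing task are my own illustration, not the original poster's code. The rolled loop is a few instructions fetched once and then re-run from the instruction cache; the unrolled version trades more code bytes for fewer branches.

```c
#include <stddef.h>

/* Rolled version: the loop body is a handful of instructions that are
   fetched once and then executed repeatedly from the instruction cache. */
int sum_loop(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Partially unrolled by 4 (hypothetical example): more code bytes,
   a quarter of the loop branches. */
int sum_unrolled4(const int *a, size_t n)
{
    int total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)          /* leftover elements when n is not a multiple of 4 */
        total += a[i];
    return total;
}
```

Both return the same result; which one is faster depends on your compiler and CPU, so time them rather than assume.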
> I'm guessing that's because it was too big for the instruction cache
For 60 elements - I doubt it.
Most tool chains can tell you how many bytes of machine code a single function occupies (try the linker map file), so you can check rather than guess.
Salem