Hi. I'm hoping someone here is familiar with x86 optimization, because I'm stumped.

I wrote two programs that find prime numbers using the same algorithm, one in assembly and one in C.

No matter what I do to the assembly version, the C version is always faster once GCC has optimized it. I'd like to understand how GCC generates code that's so much faster, but I have no idea where to begin. Any help would be appreciated!