Hi. I'm hoping someone here is familiar with x86 optimization, because I'm stumped.
It seems that no matter what I do to my assembly version, the C version is always faster once GCC has optimized it. I'd like to understand how GCC generates code that's so much faster, but I have no idea where to begin. Any help would be appreciated!