User Name Password Register
DaniWeb IT Discussion Community
All
What is DaniWeb IT Discussion Community?
You're currently browsing the C++ section within the Software Development category of DaniWeb, a massive community of 428,572 software developers, web developers, Internet marketers, and tech gurus who are all enthusiastic about making contacts, networking, and learning from each other. In fact, there are 4,611 IT professionals currently interacting right now! Registration is free, only takes a minute and lets you enjoy all of the interactive features of the site.
Please support our C++ advertiser: Programming Forums

C++ Performance Tips

Join Date: May 2006
Location: France, Normandy
Posts: 10
Reputation: SuperKoko is an unknown quantity at this point 
Rep Power: 3
Solved Threads: 0
SuperKoko SuperKoko is offline Offline
Newbie Poster

Re: C++ Performance Tips

  #27  
May 21st, 2007
Originally Posted by thekashyap
2. I'm sure most ppl know this but writing as it's not already mentioned in this topic so far. Use unsigned int stored in registers for loop variables.
Reasons
2. registers are registers ! They're simply faster than memory access.
Most compilers ignore the "register" keyword, because their optimzations are far more advanced than back in the time, where a variable had only one storage in its life...

But, the register keyword can reduce performances on some compilers, because they interpret it as a strong request, from the programmer, to use a register for storage of the variable, which condemns one register.

Even on very old compilers where the register keyword was a useful hint, it would have been utterly stupid to use the register keyword everywhere, as it would be as slow, or slower, than using no keyword at all, because the compiler would not be better than using its default "guessing" algorithm.
The register keyword should be used only at places, where you want to give a hint to the compiler, that THIS variable needs to be accessed faster than other ones, even if it reduces performances of other variables.

1. unsigned int arithmatic is faster than signed int.
Ridiculous.
On two's complement machines, additions, substractions and multiplications have identical performances as signed or unsigned, since this is exactly the same operation.

For comparisons, the performances are identical on x86 architectures, since the same operation is used for all comparisons (cmp).
On all other architectures, there is absolutely nothing that would add a performance penalty on signed comparisons, because integer comparisons require only very very few transitors... The comparison itself, cannot require more than one CPU cycle.

For integer division, signed integer division was a bit slower than unsigned integer division on old CPU.
But, integer divisions are very rarely time critical, because division is not a very much used operation.

3. Optimizing the breaking conditions in for loops. If you know that the loop variable's range is from 0 to some +ve number AND it doesn't matter which way you traverse while looping, you can optimize the loop like this:
Assuming that the compiler doesn't optimize it for you.
GCC 3.4.5 (-O2) for i386, is clever enough to optimize the first loop with a decrementation operation.
This is because, forward loops are very common, and GCC has specific optimizations for this type of code.
On the other hand, GCC 3.4.5 is not clever enough to produce good code for your "manually optimized" loop.

for( int i = 0; i <= 30; i++ );

Produces this assembly code (MinGW 3.4.5 for Win32 -O2):
0x401300 :     dec    eax
0x401301 :     jns    0x401300

But:

for( int i = 30; i--; );

Produces:
0x401300 :     dec    eax
0x401301 :     cmp    eax,0xffffffff
0x401304 :     jne    0x401300

Why this pessimization?

First, the following code:
0x401300 : dec eax
0x401301 : js 0x401300
Would immediately stop if eax is negative... But, the conditional expression is about the inequality with zero!
The compiler could have seen that all the values of i will be in range [0,30), but GCC is not clever enough to understand that, because you wrote a weird loop.

A good code, benefiting from the fact that "dec eax" sets the zf and sf flags, would require that you stop when i reaches zero, not -1.

Your code confuses the compiler, and is a pessimization.
So, don't do that!

Note:
for( int i = 31; --i; );
Generates the right code:
0x401300 :     dec    eax
0x401301 :     jns    0x401300
With MinGW 3.4.5. I don't claim that it's ok. It may be a pessimization with another compiler.

4, Optimizing very small loops using switch-case.
When you know that the range of loop variable's value is
pretty small avoid the loop altogether.
Or... activate the -funroll-loops option (or your compiler equivalent), which is less error prone.

Also, you should notice that, this should only be used on the most critical code, as it greatly increases the code size, which can have a serious performance penalty.

5. Another way of optimizing loops is to unroll it:
Knowing that printf is so slow that the loop code itself is negligible...

I benchmarked the two programs, using NUL as the standard output.
There is no sensible speed difference.
Both require 3900 milliseconds to execute on my computer.

A revelant piece of code of the two things:
; without manual loop unrolling
; Assume a K6-2 CPU
0x401323 <main+67>:     inc    ebx
0x401324 <main+68>:     push   0x403000 ;1
0x401329 <main+73>:     call   0x4018a0 <printf>
0x40132e <main+78>:     add    esp,0x10
0x401331 <main+81>:     cmp    ebx,0xc350 ;1
0x401337 <main+87>:     jl     0x401320 <main+64> ; 3

; -> 3 CPU cycles + approximatively 39600 CPU cycles for the printf call (if the output stream is the NUL device).

; with manual loop unrolling
0x40134d <main+109>:    pop    eax
0x40134e <main+110>:    lea    eax,[ebx+1] ;1
0x401351 <main+113>:    pop    edx ;2
0x401352 <main+114>:    push   eax ;3
0x401353 <main+115>:    push   0x403000 ;4
0x401358 <main+120>:    call   0x4019c0 <printf>

; -> 4 CPU cycles + approximatively 39600 CPU cycles for the printf call.

So, your "optimization", approximatively reduce performances by 0.0025%.
Which is negligible.
Reply With Quote  
All times are GMT -4. The time now is 11:17 am.
Forum system based on vBulletin Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
©2003 - 2008 DaniWeb® LLC