User Name Password Register
DaniWeb IT Discussion Community
All
What is DaniWeb IT Discussion Community?
You're currently browsing the C++ section within the Software Development category of DaniWeb, a massive community of 374,019 software developers, web developers, Internet marketers, and tech gurus who are all enthusiastic about making contacts, networking, and learning from each other. In fact, there are 2,706 IT professionals currently interacting right now! Registration is free, only takes a minute and lets you enjoy all of the interactive features of the site.
Please support our C++ advertiser:
Views: 35034 | Replies: 40
Reply
Join Date: May 2006
Posts: 2,650
Reputation: WaltP is a name known to all WaltP is a name known to all WaltP is a name known to all WaltP is a name known to all WaltP is a name known to all WaltP is a name known to all 
Rep Power: 14
Solved Threads: 217
Moderator
WaltP's Avatar
WaltP WaltP is offline Offline
Posting Maven

Re: C++ Performance Tips

  #21  
Feb 21st, 2007
Originally Posted by thekashyap View Post
3. Optimizing the breaking conditions in for loops. If you know that the loop variable's range is from 0 to some +ve number AND it doesn't matter which way you traverse while looping, you can optimize the loop like this:

  1. //original loop
  2. for( int i = 0; i <= 30; i++ )
  3. {/*do your stuff*/}
  4. //optimized loop
  5. for( int i = 30; i--; )
  6. {/*do your stuff*/}

Except if your description is true, you'd be better off using
  1. i = 30;
  2. while (i--)
  3. {
  4. /*do your stuff*/
  5. }
I'm like a superhero, but without powers nor motivation.
Reply With Quote  
Join Date: Feb 2007
Location: Bangalore, India
Posts: 535
Reputation: thekashyap will become famous soon enough thekashyap will become famous soon enough 
Rep Power: 4
Solved Threads: 50
thekashyap's Avatar
thekashyap thekashyap is offline Offline
Posting Pro

Re: C++ Performance Tips

  #22  
Feb 21st, 2007
5. Another way of optimizing loops is to unroll it:
  1. //unoptimized loop
  2. int loop_count = 50000; /* could be anything */
  3. for( int j = 0; j < loop_count; j++ )
  4. printf("process(%d)\n", j);
  5.  
  6. //optimized one
  7. static int BLOCKSIZE = 8 ;
  8. /* The loop_count may not be divisible by BLOCKSIZE,
  9.   * go as near as we can first, then tidy up.
  10.   */
  11. int i = 0;
  12. int blocklimit = (loop_count / BLOCKSIZE) * BLOCKSIZE ;
  13.  
  14. /* unroll the loop in blocks of 8 */
  15. while( i < blocklimit )
  16. {
  17. printf("process(%d)\n", i);
  18. printf("process(%d)\n", i+1);
  19. printf("process(%d)\n", i+2);
  20. printf("process(%d)\n", i+3);
  21. printf("process(%d)\n", i+4);
  22. printf("process(%d)\n", i+5);
  23. printf("process(%d)\n", i+6);
  24. printf("process(%d)\n", i+7);
  25.  
  26. /* update the counter */
  27. i += 8;
  28.  
  29. }
  30.  
  31. //we already know how to optimize small loops.
  32. switch( loop_count - i )
  33. {
  34. case 7 : printf("process(%d)\n", i); i++;
  35. case 6 : printf("process(%d)\n", i); i++;
  36. case 5 : printf("process(%d)\n", i); i++;
  37. case 4 : printf("process(%d)\n", i); i++;
  38. case 3 : printf("process(%d)\n", i); i++;
  39. case 2 : printf("process(%d)\n", i); i++;
  40. case 1 : printf("process(%d)\n", i);
  41. }
Reply With Quote  
Join Date: Dec 2005
Posts: 3,040
Reputation: Salem has much to be proud of Salem has much to be proud of Salem has much to be proud of Salem has much to be proud of Salem has much to be proud of Salem has much to be proud of Salem has much to be proud of Salem has much to be proud of 
Rep Power: 19
Solved Threads: 343
Colleague
Salem's Avatar
Salem Salem is offline Offline
void main'ers are DOOMed

Re: C++ Performance Tips

  #23  
Feb 21st, 2007
> //A quicker method is to simply use the value as an
Except your other two methods never risk an out of bounds memory access.

> 1. unsigned int arithmatic is faster than signed int.
Where's your evidence?

> 2. registers are registers ! They're simply faster than memory access.
True, but any decent compiler nowadays is far more capable of deciding which variables would be best placed in registers.

> AND it doesn't matter which way you traverse while looping,
Well if you're using it for indexing an array, and your cache is optimised in favour of incremental access, then you lose badly.
I've only ever seen counting backwards to zero save ONE instruction on those machines which specifically have a 'test-and-branch' instruction.

> Another way of optimizing loops is to unroll it:
Or just use the gcc flag -funroll-loops
Better yet, bone up on all the magic which a modern compiler can do, which doesn't involve you mauling the code into an unreadable mess.
http://gcc.gnu.org/onlinedocs/gcc-4....timize-Options
If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
Reply With Quote  
Join Date: Feb 2007
Location: Bangalore, India
Posts: 535
Reputation: thekashyap will become famous soon enough thekashyap will become famous soon enough 
Rep Power: 4
Solved Threads: 50
thekashyap's Avatar
thekashyap thekashyap is offline Offline
Posting Pro

Question Re: C++ Performance Tips

  #24  
Feb 23rd, 2007
@WaltP
Except if your description is true, you'd be better off using
  1. i = 30;
  2. while (i--)
  3. {/*do your stuff*/}


>> Sorry but I fail to see the different between for and while loop.. Do you mean it's faster to use while instead of for?

@Salem
> //A quicker method is to simply use the value as an
Except your other two methods never risk an out of bounds memory access.
KashAI>> TRUE. So I hope that anyone able enough to understand this won't blindly copy my code.


> 1. unsigned int arithmatic is faster than signed int.
Where's your evidence?
KashAI>> I was afraid someone will ask. . Anyway, simple answer is I don't know.
But here is what I know:
1. In VS 6.0 (on Intel H/W) a simple for loop with loop variable being unsigned is about 2 seconds faster than when loop variable is signed int. (looped some 100K and 500K times to print the value of loop variable)
2. In most cases one can see that there are seperate assemply instructions for signed and unsigned arithmetic. Which at least indicates a difference in performance.
3. Number of flags applicable (CF=carry-over-flag, SG=sign-flag, OF=over
flow-flag) to signed and unsigned instructions' execution are different.
4. I'm vaguely remember an instruction called SBB (substract using borrow) which, if i'm not wrong, is only applicable to signed arithmetic.
And use of it is in case where the requested substraction of 2 signed numbers can not be completed with a single instruction due to register size.
> 2. registers are registers ! They're simply faster than memory access.
True, but any decent compiler nowadays is far more capable of deciding which variables would be best placed in registers.
KashAI>> So if I understand it right what I've written is correct, but not neccesarry.
> AND it doesn't matter which way you traverse while looping,
Well if you're using it for indexing an array, and your cache is optimised in favour of incremental access, then you lose badly.
I've only ever seen counting backwards to zero save ONE instruction on
those machines which specifically have a 'test-and-branch' instruction.
KashAI>> So in short, should one NOT optimize it this way? May be you could add some practical numbers for the benefit of readers which will help them in deciding whether to use this optimization or not?
E.g. "in 80% of cases cache is optimised in favour of incremental access" OR
"Now-a-days most machines support "
'test-and-branch' instruction".

> Another way of optimizing loops is to unroll it:
Or just use the gcc flag -funroll-loops

Better yet, bone up on all the magic which a modern compiler can do,
which doesn't involve you mauling the code into an unreadable mess.

http://gcc.gnu.org/onlinedocs/gcc-4....timize-Options

KashAI>> Now that is useful info. Just checked and found that VS 6.0 and Sun Workshop 6.0 also support lopp unrolling.

Reply With Quote  
Join Date: Feb 2007
Location: Bangalore, India
Posts: 535
Reputation: thekashyap will become famous soon enough thekashyap will become famous soon enough 
Rep Power: 4
Solved Threads: 50
thekashyap's Avatar
thekashyap thekashyap is offline Offline
Posting Pro

Re: C++ Performance Tips

  #25  
Feb 23rd, 2007
Sorry forgot one more thing regarding "1. unsigned int arithmatic is faster than signed int.
Where's your evidence?
"
See http://lkml.org/lkml/2006/3/20/385
Reply With Quote  
Join Date: Sep 2004
Posts: 3
Reputation: asaretitilope is an unknown quantity at this point 
Rep Power: 0
Solved Threads: 0
asaretitilope asaretitilope is offline Offline
Newbie Poster

Solution Re: C++ Performance Tips

  #26  
May 19th, 2007
When displaying output use printf every where posible over C++'s cout because of the of constructors to be initialized and series of cout class variable to be initialized and never ending list of inline codes to be executed . If you want to see all this details try using <trace into> tool that ships with your IDE compiler and you will see what i mean.
Reply With Quote  
Join Date: May 2006
Location: France, Normandy
Posts: 10
Reputation: SuperKoko is an unknown quantity at this point 
Rep Power: 3
Solved Threads: 0
SuperKoko SuperKoko is offline Offline
Newbie Poster

Re: C++ Performance Tips

  #27  
May 21st, 2007
Originally Posted by thekashyap
2. I'm sure most ppl know this but writing as it's not already mentioned in this topic so far. Use unsigned int stored in registers for loop variables.
Reasons
2. registers are registers ! They're simply faster than memory access.
Most compilers ignore the "register" keyword, because their optimzations are far more advanced than back in the time, where a variable had only one storage in its life...

But, the register keyword can reduce performances on some compilers, because they interpret it as a strong request, from the programmer, to use a register for storage of the variable, which condemns one register.

Even on very old compilers where the register keyword was a useful hint, it would have been utterly stupid to use the register keyword everywhere, as it would be as slow, or slower, than using no keyword at all, because the compiler would not be better than using its default "guessing" algorithm.
The register keyword should be used only at places, where you want to give a hint to the compiler, that THIS variable needs to be accessed faster than other ones, even if it reduces performances of other variables.

1. unsigned int arithmatic is faster than signed int.
Ridiculous.
On two's complement machines, additions, substractions and multiplications have identical performances as signed or unsigned, since this is exactly the same operation.

For comparisons, the performances are identical on x86 architectures, since the same operation is used for all comparisons (cmp).
On all other architectures, there is absolutely nothing that would add a performance penalty on signed comparisons, because integer comparisons require only very very few transitors... The comparison itself, cannot require more than one CPU cycle.

For integer division, signed integer division was a bit slower than unsigned integer division on old CPU.
But, integer divisions are very rarely time critical, because division is not a very much used operation.

3. Optimizing the breaking conditions in for loops. If you know that the loop variable's range is from 0 to some +ve number AND it doesn't matter which way you traverse while looping, you can optimize the loop like this:
Assuming that the compiler doesn't optimize it for you.
GCC 3.4.5 (-O2) for i386, is clever enough to optimize the first loop with a decrementation operation.
This is because, forward loops are very common, and GCC has specific optimizations for this type of code.
On the other hand, GCC 3.4.5 is not clever enough to produce good code for your "manually optimized" loop.

for( int i = 0; i <= 30; i++ );

Produces this assembly code (MinGW 3.4.5 for Win32 -O2):
0x401300 :     dec    eax
0x401301 :     jns    0x401300

But:

for( int i = 30; i--; );

Produces:
0x401300 :     dec    eax
0x401301 :     cmp    eax,0xffffffff
0x401304 :     jne    0x401300

Why this pessimization?

First, the following code:
0x401300 : dec eax
0x401301 : js 0x401300
Would immediately stop if eax is negative... But, the conditional expression is about the inequality with zero!
The compiler could have seen that all the values of i will be in range [0,30), but GCC is not clever enough to understand that, because you wrote a weird loop.

A good code, benefiting from the fact that "dec eax" sets the zf and sf flags, would require that you stop when i reaches zero, not -1.

Your code confuses the compiler, and is a pessimization.
So, don't do that!

Note:
for( int i = 31; --i; );
Generates the right code:
0x401300 :     dec    eax
0x401301 :     jns    0x401300
With MinGW 3.4.5. I don't claim that it's ok. It may be a pessimization with another compiler.

4, Optimizing very small loops using switch-case.
When you know that the range of loop variable's value is
pretty small avoid the loop altogether.
Or... activate the -funroll-loops option (or your compiler equivalent), which is less error prone.

Also, you should notice that, this should only be used on the most critical code, as it greatly increases the code size, which can have a serious performance penalty.

5. Another way of optimizing loops is to unroll it:
Knowing that printf is so slow that the loop code itself is negligible...

I benchmarked the two programs, using NUL as the standard output.
There is no sensible speed difference.
Both require 3900 milliseconds to execute on my computer.

A revelant piece of code of the two things:
; without manual loop unrolling
; Assume a K6-2 CPU
0x401323 <main+67>:     inc    ebx
0x401324 <main+68>:     push   0x403000 ;1
0x401329 <main+73>:     call   0x4018a0 <printf>
0x40132e <main+78>:     add    esp,0x10
0x401331 <main+81>:     cmp    ebx,0xc350 ;1
0x401337 <main+87>:     jl     0x401320 <main+64> ; 3

; -> 3 CPU cycles + approximatively 39600 CPU cycles for the printf call (if the output stream is the NUL device).

; with manual loop unrolling
0x40134d <main+109>:    pop    eax
0x40134e <main+110>:    lea    eax,[ebx+1] ;1
0x401351 <main+113>:    pop    edx ;2
0x401352 <main+114>:    push   eax ;3
0x401353 <main+115>:    push   0x403000 ;4
0x401358 <main+120>:    call   0x4019c0 <printf>

; -> 4 CPU cycles + approximatively 39600 CPU cycles for the printf call.

So, your "optimization", approximatively reduce performances by 0.0025%.
Which is negligible.
Reply With Quote  
Join Date: May 2006
Location: France, Normandy
Posts: 10
Reputation: SuperKoko is an unknown quantity at this point 
Rep Power: 3
Solved Threads: 0
SuperKoko SuperKoko is offline Offline
Newbie Poster

Re: C++ Performance Tips

  #28  
May 21st, 2007
When displaying output use printf every where posible over C++'s cout because of the of constructors to be initialized and series of cout class variable to be initialized and never ending list of inline codes to be executed . If you want to see all this details try using <trace into> tool that ships with your IDE compiler and you will see what i mean.

You should not do that before checking whether printf is slower or faster than cout.
This is very highly compiler dependent.
For example, Borland C++ 5.0 is faster for cout than for printf.

Check my answer on this thread:
http://www.codeguru.com/forum/showthread.php?t=383112

And, don't forget that micro-optimizations are the ennemy of real optimizations.
Reply With Quote  
Join Date: Apr 2007
Posts: 6
Reputation: kmillen is an unknown quantity at this point 
Rep Power: 0
Solved Threads: 0
kmillen kmillen is offline Offline
Newbie Poster

Solution Re: C++ Performance Tips

  #29  
Jun 22nd, 2007
A performance tip that has served me well with C++, C#, Java, VB, and scripting languages is counting backwards. When using a comparison in a loop (presumably for termination purposes), if possible, count down to zero instead of up to a non-zero value.

Since every machine language in existence has "compare to zero" operators, smart compilers can utilize this efficiency.

Normally, the compiler saves the comparison value in memory and then accesses it indirectly for each loop. However, many compilers will recognize the comparison to zero and optimize it with a single machine language command, bne, bge, etc..

In complex, nested loops, the optimization can be amazing. The main down fall I've observed is compiler consistency. I have not observed .NET's IL doing this and it doesn't always do this when going native.
Reply With Quote  
Join Date: Jul 2007
Posts: 2
Reputation: yumvpvip is an unknown quantity at this point 
Rep Power: 0
Solved Threads: 0
yumvpvip yumvpvip is offline Offline
Newbie Poster

Re: C++ Performance Tips

  #30  
Jul 11th, 2007
Originally Posted by bitforce View Post
An easy way to swap 2 variables without using another variable:

a=a+b;
b=a-b;
a=a-b;



I don't think so. the result will be the fllowigs:
a=b;
b=0;
I am a beginner please give me some advice
Reply With Quote  
Reply

Only community members can participate in forum threads. You must register or log in to contribute.

Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)

 

DaniWeb C++ Marketplace
Thread Tools Display Modes

Similar Threads
Other Threads in the C++ Forum

All times are GMT -4. The time now is 11:07 pm.
Forum system based on vBulletin Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
©2003 - 2008 DaniWeb® LLC