code optimization ...
I've never heard of using unsigned being "optimized." How does that one work? The inline and register keywords are nowadays just hints to the compiler; it will usually do what it thinks is best unless you give it specific flags.
>>How does it work?
Just use unsigned instead of signed int to declare a variable when you know that variable is never supposed to hold a negative value. The most common case would be:
vector<int> v = getSomeVec();
for (unsigned i = 0; i < v.size(); i++)
    cout << v[i] << endl;
How is it faster?
>> Here is what I wrote in the other thread (which remains uncontested). So that's my proof until proven otherwise.

----------------------------------------------
> 1. unsigned int arithmetic is faster than signed int.
Where's your evidence?
KashAI>> I was afraid someone would ask.
But here is what I know:
1. In VS 6.0 (on Intel H/W) a simple for loop with an unsigned loop variable is about 2 seconds faster than one with a signed int loop variable. (Looped some 100K and 500K times, printing the value of the loop variable.)
2. In most cases one can see that there are separate assembly instructions for signed and unsigned arithmetic, which at least indicates a difference in performance.
3. The flags applicable (CF=carry flag, SF=sign flag, OF=overflow flag) to the execution of signed and unsigned instructions are different.
4. I vaguely remember an instruction called SBB (subtract with borrow) which, if I'm not wrong, is only applicable to signed arithmetic.
Its use is in cases where the requested subtraction of 2 signed numbers cannot be completed with a single instruction due to register size.
Sorry, forgot one more thing regarding "1. unsigned int arithmetic is faster than signed int.
Where's your evidence?"
See http://lkml.org/lkml/2006/3/20/385
----------------------------------------------
Additionally:
1. Someone said signed/unsigned use the same instructions; that's not true.
2. By "data caching" I meant: if you are using ANY kind of data (i.e. not constant, and used in multiple places), keep it in global (or member or static) variables so you initialize it only once and then reuse it. This is a vast topic and applies case by case; I'll just quote 2 common examples:
- When you have to read some data from a file, read it once, store it in some container (map/vector) and look it up there instead of opening and searching the file each time. (I/O operations are costlier than memory access from a performance point of view.)
- Make constants used inside functions static. E.g.:
void my_class::my_func()
{
    static const char f_name[] = "my_class::my_func()";
    traceObj.write("%s: Entering", f_name);
}
This is better than:
void my_class::my_func()
{
    const char f_name[] = "my_class::my_func()";
    traceObj.write("%s: Entering", f_name);
}
Even better is this (but you might want to use f_name for something else as well):
void my_class::my_func()
{
    traceObj.write("my_class::my_func(): Entering");
}
Last edited by thekashyap; Apr 27th, 2007 at 8:18 am.
In many tutorials, I have read that changing the way a loop iterates, i.e. changing a loop from
for(i=0;i<10;i++)
to
for (i=10; i--; )
optimizes code. But in my case, it seems to be getting worse.
Where could I be going wrong?
I have checked everything: the loops start from 0 and are of the incrementing type.
Anything else I should bear in mind before making such changes?
Last edited by caltiger; May 9th, 2007 at 5:52 am.
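One thing worth checking before converting a counting-up loop to the `for (i=10; i--; )` form is that the body now sees the indices in reverse order, and that `i` is left at -1 rather than at the upper bound once the loop exits. A minimal sketch, with the hypothetical helper names `visit_up` and `visit_down`, that records the order in which each form visits its indices:

```cpp
#include <vector>

// Hypothetical helpers for illustration: record the indices each loop form visits.
std::vector<int> visit_up(int n) {
    std::vector<int> seen;
    for (int i = 0; i < n; i++)   // visits 0, 1, ..., n-1
        seen.push_back(i);
    return seen;
}

std::vector<int> visit_down(int n) {
    std::vector<int> seen;
    for (int i = n; i--; )        // tests i, then decrements: visits n-1, ..., 0
        seen.push_back(i);
    return seen;
}
```

If the loop body walks through memory or depends on earlier iterations, the reversed order changes what the code does and how it touches memory, which is one possible reason the rewritten loop measures slower. Whether it actually does on a given compiler can only be settled by timing it.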
>- use inline functions (new compilers do this implicitly though)
The compiler is in a better position to know what functions are best inlined. An explicit inline keyword strikes me as akin to the register keyword in premature optimization.
>- use registers.
Speaking of the register keyword, don't waste your time. A lot of compilers just ignore it, and those that don't tend to produce less efficient code because the programmer doesn't really know how to dole out register time.
>-- using the right type of variable, meaning, if you need to use decimal
>values in your program, if they're not too big, consider using float rather
>than double...
For size optimization, yes. For speed optimization, double is likely to be as fast or faster than float because many FPUs will work internally in double or long double precision. Matching the internal type can be faster by avoiding conversions.
>-- (where you can) using pointers instead of arrays, since it uses less memory
You're talking about a minimal constant size difference, if it exists at all. I wouldn't call this an optimization.
>-- cleaning useless information off the buffer
I don't really see how this matters.
>I've seen this data caching in a number of places... what exactly does that refer to..
Caching is saving the result of an expensive operation so that you can quickly refer to it at a later time without repeating the expensive operation. One example might be pulling data from a database over a slow connection. You trade space (storing it in memory) for speed (only making one pull) by saving the data in an internal data structure.
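The space-for-speed trade described above can be sketched in a few lines. Everything named here is an assumption made for illustration: `slow_lookup` is a hypothetical stand-in for the expensive operation (it just counts its own calls), and `cached_lookup` is one simple way to put a cache in front of it.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for an expensive operation (file search, database pull).
// It counts its calls so the effect of the cache is visible.
int g_slow_calls = 0;

int slow_lookup(const std::string& key) {
    ++g_slow_calls;
    return static_cast<int>(key.size()) * 100;  // placeholder "result"
}

// Cache in front of slow_lookup: each distinct key is computed only once.
int cached_lookup(const std::string& key) {
    static std::unordered_map<std::string, int> cache;
    auto it = cache.find(key);
    if (it != cache.end())
        return it->second;          // hit: no expensive call
    int value = slow_lookup(key);   // miss: pay the cost once
    cache.emplace(key, value);
    return value;
}
```

Calling `cached_lookup("alpha")` twice runs `slow_lookup` only once. Note that this sketch never invalidates an entry, so it is only appropriate when the underlying data does not change while the program runs.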
>when optimizing for speed, one very useful thing to do (as mentioned
>above like infinite times) is reduce the amount of lines in your
>program... since the compiler runs through less instructions, which
>takes less time to do...
The main benefit of shorter machine code is that more of it fits in the instruction cache. However, C isn't 1-to-1 in statements to instructions, so how many lines your code has isn't an indication of how many instructions the machine code will have. You should keep your code as simple as possible, but don't try to be concise in the name of optimization. More often than not, you'll end up with the opposite result because your compiler has a harder time optimizing the mess you created.
>In most cases one can see that there are separate assembly
>instructions for signed and unsigned arithmetic. Which at least
>indicates a difference in performance.
Not really, it indicates a difference in operation. Signed arithmetic is different from unsigned arithmetic at the instruction level.
>3. Number of flags applicable (CF=carry flag, SF=sign flag,
>OF=overflow flag) to signed and unsigned instructions' execution are
>different.
Once again because the operations are different and require the use of different flags.
>4. I vaguely remember an instruction called SBB (subtract with
>borrow) which, if I'm not wrong, is only applicable to signed arithmetic.
SBB is sign neutral, you can use it with both.
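To see why a subtract-with-borrow step does not care about signedness, here is a sketch that emulates it in C++: a 64-bit subtraction done on two 32-bit halves, with the borrow from the low half carried into the high half. The function name `sub_in_halves` is made up for this example, and the code models the idea rather than the exact code any compiler emits.

```cpp
#include <cstdint>

// Subtract two 64-bit values held as 32-bit halves, the way a 32-bit CPU
// would with a plain subtract on the low words followed by a
// subtract-with-borrow on the high words.
std::uint64_t sub_in_halves(std::uint64_t a, std::uint64_t b) {
    std::uint32_t a_lo = static_cast<std::uint32_t>(a);
    std::uint32_t a_hi = static_cast<std::uint32_t>(a >> 32);
    std::uint32_t b_lo = static_cast<std::uint32_t>(b);
    std::uint32_t b_hi = static_cast<std::uint32_t>(b >> 32);

    std::uint32_t lo = a_lo - b_lo;                  // low words, wraps on underflow
    std::uint32_t borrow = (a_lo < b_lo) ? 1u : 0u;  // what the carry flag would hold
    std::uint32_t hi = a_hi - b_hi - borrow;         // high words minus the borrow

    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

The same bit pattern comes out whether the inputs are read as unsigned values or as two's complement signed values: passing the bit patterns of -5 and 7 yields the bit pattern of -12.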
>See http://lkml.org/lkml/2006/3/20/385
This is a specific instance from the machine code output of a specific version of a specific compiler. Not exactly good proof that unsigned is faster than signed except in that specific instance.
These kinds of micro optimizations are often pointless if you're careful to write efficient algorithms and use intelligent data structures. Those are the big wins when it comes to code performance. Also, a lot of people try to make code optimizations when their programs are data bound and not CPU bound and wonder why there's no noticeable effect. Optimize where appropriate as well as when appropriate.
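For anyone who wants to check the signed versus unsigned claim on their own compiler instead of arguing about it, a rough timing harness along these lines is enough to start with. The names `sum_signed`, `sum_unsigned` and `time_us` are hypothetical, and this is a sketch rather than a rigorous benchmark: an optimizing compiler may reduce either loop to a closed-form expression, so the numbers only mean something alongside the generated assembly.

```cpp
#include <chrono>
#include <cstdint>

// Sum 0..n-1 with a signed loop counter.
std::uint64_t sum_signed(int n) {
    std::uint64_t total = 0;
    for (int i = 0; i < n; i++)
        total += static_cast<std::uint64_t>(i);
    return total;
}

// Same work with an unsigned loop counter.
std::uint64_t sum_unsigned(unsigned n) {
    std::uint64_t total = 0;
    for (unsigned i = 0; i < n; i++)
        total += i;
    return total;
}

// Time one call of f in microseconds (hypothetical helper).
template <typename F>
long long time_us(F f) {
    auto start = std::chrono::steady_clock::now();
    volatile std::uint64_t sink = f();  // volatile store keeps the result from being discarded
    (void)sink;
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

A call such as `time_us([] { return sum_signed(100000); })` gives one sample; repeating it many times and comparing against the unsigned version, at the optimization level you actually ship with, is the only evidence that applies to your own build.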
Last edited by Narue; May 9th, 2007 at 10:03 am.
New members chased away this month: 4
Just keep in mind that with caching, you also run the risk of the data in your hand becoming inconsistent with the data in the database. In almost all cases there is an algorithm that takes care of this, and you would rarely have to write it yourself in a real-world scenario since it is pretty complicated. Usually the framework / API which you use provides simpler ways of handling things. A nice example would be Entity beans in J2EE.
Last edited by ~s.o.s~; May 9th, 2007 at 1:01 pm.
I don't accept change; I don't deserve to live.
Jo Tujhe Jagaaye, Nindein Teri Udaaye Khwaab Hai Sachcha Wahi.
Nindon Mein Jo Aaye Jise To Bhul Jaaye Khawab Woh Sachcha Nahi.
Khwaab Ko Raag De, Nind Ko Aag De
I've been trying out some possibilities to optimise code...
Just one small doubt...
Which is better?
1.
for (i = 0; i < 100; i++)
{
    ;
    ;
    ;
}
//////////////
2.
for (i = 0; i < 25; i++)
{
    ;
    ;
}
for (i = 25; i < 50; i++)
{
    ;
    ;
}
for (i = 50; i < 100; i++)
{
    ;
    ;
}
Will there be any change in performance if I replace one for loop with 3 or 4 for loops? In both cases, the number of iterations will be the same.
Thanks.
>which is better?
The first. It's simpler and shows your intentions more clearly. And no, the second isn't likely to be any faster. In fact, it might be slower because your compiler could treat the loops as completely separate and not perform optimizations that would be done if the loops were merged.
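To make the comparison concrete, here is a sketch of the two shapes doing the same work. The function names and the choice of summing a vector are assumptions for illustration, since the original loop bodies were left empty. Both functions expect at least 100 elements.

```cpp
#include <vector>

// One loop over the whole range (expects data.size() >= 100).
long sum_one_loop(const std::vector<int>& data) {
    long total = 0;
    for (int i = 0; i < 100; i++)
        total += data[i];
    return total;
}

// The same range split into three consecutive loops.
long sum_split_loops(const std::vector<int>& data) {
    long total = 0;
    for (int i = 0; i < 25; i++)
        total += data[i];
    for (int i = 25; i < 50; i++)
        total += data[i];
    for (int i = 50; i < 100; i++)
        total += data[i];
    return total;
}
```

Both functions perform the same 100 additions and return the same result. The split version only adds loop setup code and gives the reader more to check, so it has no obvious way to come out ahead.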