Hi,
I have to port some code to an ARM processor... what are the most commonly used methods for optimizing C code?
I've read about limiting the number of parameters in a function, passing arguments by reference rather than by value, and trying to avoid global variables...

My question is: other than these commonly used techniques, are there any others?

Thanks.

Depends on what you're optimizing for. In general, it's fine to just use efficient algorithms and let the compiler do the nitty-gritty. It doesn't really matter how well you optimize something if you optimize the wrong part.

Of course... let the machine do your job. :)
Compilers provide options to generate optimized code for a given piece of hardware (optimizing with that hardware's full instruction set). I don't know anything about the ARM processor you mentioned, but check your compiler's manual to see whether it provides such an option.
Almost every compiler also supports the generic, platform-independent optimization options, so you can use those in any case.
Finally, just 2 words of caution:
1. When you use optimization options you lose debug information. I suggest you also test your application in its optimized form (if you have been testing the debug build only so far).
2. Regarding the "commonly used techniques": first find out what actually has to be optimized, using a tool like Quantify. There is no point spending time/effort optimizing a function that's called only 10 times in an hour, while removing a temporary variable from a small function can help performance because it's called 10,000 times. So if you're using the "commonly used techniques", use them where they're needed.
Other things you can do:
- data caching
- using unsigned variables
- using inline functions (newer compilers do this implicitly, though)
- using registers
Finally, I hope you've had a look at what everyone else has said in this thread.

Besides, you should consider:

-- using the right type of variable; meaning, if you need to use decimal values in your program and they're not too big, consider using float rather than double...

-- (where you can) using pointers instead of arrays, since they use less memory

-- being specific and simple in your processes, meaning you do in 10 lines of code what, with another process working just the same, would take 25

-- cleaning useless information off the buffer

-- using struct arrays instead of single-variable arrays where you can

and such things that may seem simple, but can optimize your memory usage.

>Other things you can do:
>- data caching
>- using unsigned variables
>- using inline functions (newer compilers do this implicitly, though)
>- using registers
>Finally, I hope you've had a look at what everyone else has said in this thread.

I've never heard of using unsigned being "optimized." How does that one work? The inline and register keywords are just hints these days; the compiler will usually do what it thinks is best unless you give it specific flags.

I've seen this "data caching" mentioned in a number of places... what exactly does it refer to?

Does it mean creating and using lookup tables instead of making the program calculate and generate the values?
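Lookup tables are indeed one common form of it: precompute results once and trade a little memory for speed. A minimal sketch (the squares table is just an illustrative stand-in for a genuinely expensive computation like trig or sqrt):

```c
/* Table of x*x for x in 0..15, filled in at compile time rather than
 * computed at every call. */
static const unsigned squares[16] = {
    0, 1, 4, 9, 16, 25, 36, 49, 64, 81,
    100, 121, 144, 169, 196, 225
};

unsigned square_small(unsigned x)
{
    /* A table lookup replaces the runtime computation; this only pays
     * off when the computation being replaced is genuinely expensive. */
    return squares[x & 15];
}
```

For a trivial multiply like this the table buys nothing; the technique matters when each entry would otherwise cost many cycles to recompute.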

>I've never heard of using unsigned being "optimized." How does that one work? The inline and register keywords are just hints these days; the compiler will usually do what it thinks is best unless you give it specific flags.

Exactly... from what I've heard, unsigned takes more cycles than signed...

It's not that unsigned takes more cycles than signed (correct me if I'm wrong); the thing is that the space a signed variable uses for negative numbers, an unsigned one uses for positive numbers... i.e. (assuming a 16-bit int)

int n

goes from -32768 to 32767, so, logically,

unsigned int n

goes from 0 to 65535... right?

Unsigned is no more optimized than signed; they're the exact same instructions for addition, multiplication, and division.

If you really want to optimize numerical code, use Fortran.

And before you optimize your code, ask if you really need to optimize anything.

And what are you optimizing for? Memory usage? Speed?


>And what are you optimizing for? Memory usage? Speed?

I'm optimizing for speed... sorry, forgot to mention it earlier... :)

When optimizing for speed, one very useful thing to do (as mentioned above like infinite times) is to reduce the number of lines in your program... since the compiler runs through fewer instructions, which takes less time...

It's also good to know a bit about your architecture. Some operations will take longer than others (multiplication and division usually do IIRC). But you'll want to make sure your code is streamlined in the right places first and foremost.

>I've never heard of using unsigned being "optimized." How does that one work? The inline and register keywords are just hints these days; the compiler will usually do what it thinks is best unless you give it specific flags.

Lots of people have said this (that's why you'd think unsigned is faster than signed).
>>How does it work?
Just use unsigned instead of signed int to declare a variable when you know that variable isn't supposed to hold a signed value. Most common would be:

vector<int> v = getSomeVec();
for (unsigned i = 0; i < v.size(); i++)
    cout << v[i] << endl;

How is it faster?
>> Here is what I wrote in the other thread (which remains uncontested). So that's my proof until proven otherwise. :)
----------------------------------------------
> 1. unsigned int arithmetic is faster than signed int.
Where's your evidence?
KashAI>> I was afraid someone would ask. :icon_smile:. Anyway, the simple answer is: I don't know.
But here is what I do know:
1. In VS 6.0 (on Intel hardware) a simple for loop with an unsigned loop variable is about 2 seconds faster than with a signed int one (looped some 100K and 500K times, printing the loop variable each pass).
2. In most cases one can see that there are separate assembly instructions for signed and unsigned arithmetic, which at least indicates a difference in performance.
3. The flags applicable to signed and unsigned instructions' execution (CF = carry flag, SF = sign flag, OF = overflow flag) are different.
4. I vaguely remember an instruction called SBB (subtract with borrow) which, if I'm not wrong, is only applicable to signed arithmetic. It is used when the requested subtraction of two signed numbers cannot be completed with a single instruction due to register size.

Sorry, forgot one more thing regarding "1. unsigned int arithmetic is faster than signed int.
Where's your evidence?"
See http://lkml.org/lkml/2006/3/20/385
----------------------------------------------

Additionally:
1. Someone said signed/unsigned use the same instructions; that's not true.
2. By "data caching" I meant: if you are using ANY kind of data (i.e. not constant, and used in multiple places), keep it in a global (or member, or static) variable so you initialize it only once and then just use it. This is a very vast topic and applies case by case; I'll just quote 2 common examples:
- When you have to read some data from a file, read it once, store it in some container (map/vector), and look it up there instead of opening and searching the file each time (I/O operations are costlier, performance-wise, than memory access).
- Make constants used inside functions static. E.g.
This:

void my_class::my_func()
{
      static const char f_name[] = "my_class::my_func()" ;
      traceObj.write("%s: Entering", f_name ) ;
}

is better than this:

void my_class::my_func()
{
      const char f_name[] = "my_class::my_func()" ;
      traceObj.write("%s: Entering", f_name ) ;
}

Even better is this (but you might wanna use f_name for something else as well):

void my_class::my_func()
{
      traceObj.write("my_class::my_func(): Entering" ) ;
}

In many tutorials, I have read that changing the way a loop iterates, i.e. changing a loop from
for (i = 0; i < 10; i++)
to
for (i = 10; i--; )
optimizes the code... but in my case, it seems to be getting worse...
Where could I be going wrong?

I have checked everything... the loops start from 0 and are of the incrementing type.
Anything else I should bear in mind before making such changes?

for (i = 10; i--; )

You can do this if you don't care about loop order.
You will find the answer if you check the generated assembly for both cases; then you will see which one produces less code.
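The supposed advantage of counting down is that on many CPUs the decrement itself sets the zero flag, so the loop needs no separate compare against n; whether it actually wins depends on your compiler and target, so check the assembly or measure. A sketch of both forms (sum_up and sum_down are hypothetical names), interchangeable here only because summing doesn't depend on iteration order:

```c
int sum_up(const int *a, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)   /* compares i against n on every pass */
        total += a[i];
    return total;
}

int sum_down(const int *a, int n)
{
    int total = 0;
    for (int i = n; i--; )        /* the decrement sets the flags; no
                                     separate compare against n needed */
        total += a[i];
    return total;
}
```

If the loop body depends on order (e.g. it prints the elements), the down-counting version visits them in reverse and the transformation is not valid.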

>- use inline functions (new compilers do this implicitly though)
The compiler is in a better position to know what functions are best inlined. An explicit inline keyword strikes me as akin to the register keyword in premature optimization.

>- use registers.
Speaking of the register keyword, don't waste your time. A lot of compilers just ignore it, and those that don't tend to produce less efficient code because the programmer doesn't really know how to dole out register time.

>-- using the right type of variable, meaning, if you need to use decimal
>values in your program, if they're not too big, consider using float rather
>than double...
For size optimization, yes. For speed optimization, double is likely to be as fast or faster than float because many FPUs will work internally in double or long double precision. Matching the internal type can be faster by avoiding conversions.

>-- (where you can) using pointers instead of arrays, since it uses less memory
You're talking about a minimal constant size difference, if it exists at all. I wouldn't call this an optimization.

>-- cleaning useless information off the buffer
I don't really see how this matters.

>I've seen this data caching in a number of places... what exactly does that refer to?
Caching is saving the result of an expensive operation so that you can quickly refer to it at a later time without repeating the expensive operation. One example might be pulling data from a database over a slow connection. You trade space (storing it in memory) for speed (only making one pull) by saving the data in an internal data structure.
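A minimal sketch of that idea in C; load_config_value is a hypothetical stand-in for the expensive operation (database pull, file read), and the static locals hold the cached result:

```c
/* Hypothetical stand-in for an expensive operation, e.g. a database
 * pull over a slow connection. */
static int load_config_value(void)
{
    return 42;
}

int get_config_value(void)
{
    static int cached;
    static int have_value;  /* becomes 1 once the expensive call has run */

    if (!have_value) {
        cached = load_config_value();  /* pay the cost once... */
        have_value = 1;
    }
    return cached;  /* ...every later call is a cheap memory read */
}
```

The trade-off is exactly the one described above: memory for the cached copy in exchange for skipping the expensive operation on every call after the first.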

>when optimizing for speed, one very useful thing to do (as mentioned
>above like infinite times) is reduce the amount of lines in your
>program... since the compiler runs through less instructions, which
>takes less time to do...
The only benefit of shorter code is fitting all of the instructions in a single cache line. However, C isn't 1-to-1 in statements to instructions, so how many lines your code has isn't an indication of how many instructions the machine code will have. You should keep your code as simple as possible, but don't try to be concise in the name of optimization. More often than not, you'll end up with the opposite result because your compiler had a harder time of optimizing the mess you created.

>In most cases one can see that there are separate assembly
>instructions for signed and unsigned arithmetic. Which at least
>indicates a difference in performance.
Not really, it indicates a difference in operation. Signed arithmetic is different from unsigned arithmetic at the instruction level.

>3. The flags applicable to signed and unsigned instructions' execution
>(CF = carry flag, SF = sign flag, OF = overflow flag) are
>different.
Once again because the operations are different and require the use of different flags.

>4. I vaguely remember an instruction called SBB (subtract with
>borrow) which, if I'm not wrong, is only applicable to signed arithmetic.
SBB is sign neutral, you can use it with both.

>See http://lkml.org/lkml/2006/3/20/385
This is a specific instance from the machine code output of a specific version of a specific compiler. Not exactly good proof that unsigned is faster than signed except in that specific instance.

These kinds of micro optimizations are often pointless if you're careful to write efficient algorithms and use intelligent data structures. Those are the big wins when it comes to code performance. Also, a lot of people try to make code optimizations when their programs are data bound and not CPU bound and wonder why there's no noticeable effect. Optimize where appropriate as well as when appropriate.

Just keep in mind that with caching you also run the risk of holding inconsistent data: the copy in your hand can differ from what's in the database. In almost all cases there is a well-established algorithm taking care of this, and it would rarely be something you implement yourself in a real scenario, since it's fairly complicated. Usually the framework/API you use provides simpler, easier ways of handling it; a nice example would be Entity beans in J2EE.

I've been trying out some possibilities to optimize code...

Just one small doubt...

Which is better?

1.
for (i = 0; i < 100; i++)
{
;
;
;
}

//////////////

2.
for (i = 0; i < 25; i++)
{
;
;
}

for (i=25;i<50;i++)
{
;
;
}

for (i=50; i<100; i++)
{
;
;
}

Will there be any change in performance if I replace one for loop with 3 or 4 for loops? In both cases, the number of iterations will be the same...

Thanks.

>which is better?
The first. It's simpler and shows your intentions more clearly. And no, the second isn't likely to be any faster. In fact, it might be slower because your compiler could treat the loops as completely separate and not perform optimizations that would be done if the loops were merged.

OK... one more doubt...
Suppose I have 2 arrays, a[100] and b[100].

I have to copy the elements from b to a...

but the first 50 elements are the negative of the next 50... so is it better to keep copying from b to a...

or just take the elements from a, negate them, and copy them to the remaining 50 locations of array a...?

e.g.:
a[99] = -a[0]
a[98] = -a[1]
a[97] = -a[2]
a[96] = -a[3]

and so on...

So is it better to copy from b to a, or just directly from a...?

Thanks for all the help so far... it has really helped me a lot...

:)

It would probably be better to copy from a because you're more likely to get better cache performance. But it's still pointless to optimize if you aren't sure that this code causes a problem.
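A sketch of the copy-from-a version under the mirror pattern described above, a[N-1-i] = -a[i], with the indices kept in bounds; fill_second_half is a hypothetical name:

```c
#define N 100

/* Build a[50..99] from a[0..49] in place, using the mirror pattern
 * a[N-1-i] = -a[i]. All reads and writes stay within one array, which
 * tends to be friendlier to the cache than also streaming through b. */
void fill_second_half(int a[])
{
    for (int i = 0; i < N / 2; i++)
        a[N - 1 - i] = -a[i];
}
```

With arrays this small, though, either version is cheap; profile before deciding it matters.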
