Hi,

if we had the following program::

int main()
{
	int a, d;

	int b=12;
	int c=13;

	a=b+c;
	d=b+c;

	return 0;
}

what would gcc produce as a result?

would it be::

fetch b
fetch c
add
store to a
fetch b
fetch c
add
store to d

Or

something more efficient like

fetch b
fetch c
add
store to a
store to d

How can i see the assembly produced by the compiler{if it is possible}?

Where does the compiler do a better job than the programmer?

From your experience where do you need to employ assembly while programming a real life project...

thanks in advance,
nicolas

PS:: sorry if i am asking a lot of things, but i am really interested in the optimization part of programming,
so plz contribute in every way you can...

>How can i see the assembly produced by the compiler{if it is possible}?
You can add a switch to most compilers that tell them to produce assembly output.

>Where does the compiler do a better job than the programmer?
Unless you're a good assembly programmer (and in some cases a fantastic assembly programmer), probably everywhere.

>From your experience where do you need to employ
>assembly while programming a real life project...
It depends on the project, but I'd start with nowhere and use assembly as little as possible and only when absolutely necessary.

> what would gcc produce as a result?
The compiler might infer that both operands are known at compile time, and simply store 25 in a and d.

The compiler might also infer that none of the calculations are used, and remove them completely, thus reducing the program to return 0;

> Where does the compiler do a better job than the programmer?
By consistently applying all optimisation tricks known to the vast team of programmers which developed the compiler, as opposed to the few techniques you've learnt so far, and can think to use in the given circumstance.
You may eventually beat the compiler on any given sample of code, but the effort on your part would be measured in days or weeks.

> From your experience where do you need to employ assembly while programming a real life project.
I've only ever used assembler in the early bootstrap of a processor before there is a viable environment to support writing C code.
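
To actually watch the "fetch, add, store" question play out, the values must not be known at compile time and the results must be visible outside the function, otherwise the optimiser is free to fold or delete everything. A minimal sketch of such a test, assuming a made-up function name add_twice and pointer out-parameters (neither comes from the original post):

```c
/* Hypothetical test function, not from the original program.
 * b and c arrive as parameters, so the compiler cannot fold b+c
 * into a constant, and the results are written through pointers,
 * so the stores cannot be thrown away as unused. */
void add_twice(int b, int c, int *a, int *d)
{
    *a = b + c;   /* first use of b+c */
    *d = b + c;   /* same expression again */
}
```

Compiling this with gcc -S, once without and once with -O2, and comparing the two .s files should show whether the addition is done once or twice. An optimising build would normally compute the sum once and store it twice, but the exact instructions depend on the compiler version and target.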

I am an avid and, I would even venture to say, an excellent assembly programmer. In the early 80's my assembly overlays to BASIC were an essential component of greater throughput. For example, sorting 1,000 elements in BASIC would take 11 minutes, and with the overlay 30 seconds, on the kind of hardware available then. BIG difference. Can any compiler code as effectively as a good assembly programmer? Never. Today, though, I even question why compilers have the ability for inline assembly at all. Most operating systems, and especially the two most popular ones, don't allow you to address hardware directly, and the amount of code you can crank out in an afternoon is monumental with a "C" compiler versus assembly.

So unless you plan for some reason on becoming an extremely good assembly coder, I wouldn't even bother with it. I know that if my experience didn't come from a time when it was needed, I wouldn't bother for the few milliseconds you might eke out of an application over what most compilers can do.

> How can i see the assembly produced by the compiler{if it is possible}?
gcc: to generate assembly code, use the -S switch. eg.
g++ -O2 -S -c whatever.cc
to see the C code together with the assembly it was converted to:
g++ -c -g -O2 -Wa,-a,-ad whatever.cc > whatever.asm
microsoft: use one of the /FA, /FAc, /FAs, /FAu compiler switches

> > From your experience where do you need to employ assembly while programming a real life project.
a. to implement something like the macros in <cstdarg> ( va_start, va_arg, va_end ) assembly may be required on some platforms.
b. defining things like sig_atomic_t, atomic_add etc. on some platforms.
c. to implement thunks and trampolines on some platforms.
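
For context on point (a), this is roughly what the macros look like from the user's side in C, where the header is <stdarg.h> rather than <cstdarg>. sum_ints is a hypothetical helper written only for illustration. The point is that va_start, va_arg and va_end must know where the platform's calling convention puts the arguments, which is why their implementation is left to the compiler or to platform-specific code.

```c
#include <stdarg.h>

/* Hypothetical helper: adds up `count` int arguments.
 * The caller must pass exactly `count` ints after the first argument. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);              /* start after the last named parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);     /* fetch the next argument as an int */
    va_end(ap);                       /* required cleanup before returning */

    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) walks the three trailing arguments in order. How that walk is done underneath (stack slots, registers, or both) is the platform-specific part.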

Comments
helpful as always!

thank you all for your answers:

with the example i stated above gcc didn't do a better job {if i understand correctly the output}

.file	"testGCC.c"
	.text
.globl main
	.type	main, @function
main:
	leal	4(%esp), %ecx
	andl	$-16, %esp
	pushl	-4(%ecx)
	pushl	%ebp
	movl	%esp, %ebp
	pushl	%ecx
	subl	$16, %esp
	movl	$12, -12(%ebp)
	movl	$13, -8(%ebp)
	movl	-8(%ebp), %eax
	addl	-12(%ebp), %eax
	movl	%eax, -20(%ebp)
	movl	-8(%ebp), %eax
	addl	-12(%ebp), %eax
	movl	%eax, -16(%ebp)
	movl	$0, %eax
	addl	$16, %esp
	popl	%ecx
	popl	%ebp
	leal	-4(%ecx), %esp
	ret
	.size	main, .-main
	.ident	"GCC: (GNU) 4.1.2 (Ubuntu 4.1.2-0ubuntu4)"
	.section	.note.GNU-stack,"",@progbits

and with -O2 i get::

.file	"testGCC.c"
	.text
	.align 2
	.p2align 4,,15
.globl main
	.type	main, @function
main:
.LFB2:
	leal	4(%esp), %ecx
.LCFI0:
	andl	$-16, %esp
	pushl	-4(%ecx)
.LCFI1:
	xorl	%eax, %eax
	pushl	%ebp
.LCFI2:
	movl	%esp, %ebp
.LCFI3:
	pushl	%ecx
.LCFI4:
	popl	%ecx
	popl	%ebp
	leal	-4(%ecx), %esp
	ret
.LFE2:
	.size	main, .-main
.globl __gxx_personality_v0
	.ident	"GCC: (GNU) 4.1.2 (Ubuntu 4.1.2-0ubuntu4)"
	.section	.note.GNU-stack,"",@progbits

> By consistently applying all optimisation tricks known to the vast team of programmers which developed the compiler, as opposed to the few techniques you've learnt so far, and can think to use in the given circumstance.

yes i {also} share this opinion but is there any sound example??

the debate started when i argued with a hardware oriented classmate that has the opinion that you can always write better code with assembly as long as you stick with functions and the complexity stays low... of course i couldn't agree with him{i've been reading this forum for some months now!} but i didn't have any solid example...


another question:: i've learned {a year ago} some pretty basic assembly stuff{doing loops,conditionals, simple algorithms, manipulating the stack}... is there anything else that someone should know?

> > From your experience where do you need to employ assembly while programming a real life project.
> a. to implement something like the macros in <cstdarg> ( va_start, va_arg, va_end ) assembly may be required on some platforms.
> b. defining things like sig_atomic_t, atomic_add etc. on some platforms.
> c. to implement thunks and trampolines on some platforms.

this is interesting....when you say trampoline you mean this, right?

When testing what assembler will be produced, don't use main() as the example function. Being the special function where the program starts, it tends to accumulate baggage which isn't present in any other function.

Eg.

void foo ( ) {
	int a, d;

	int b=12;
	int c=13;

	a=b+c;
	d=b+c;
}

int main(void) {
    foo();
    return 0;
}


$ gcc -S foo.c && cat foo.s
_foo:
        pushl   %ebp
        movl    %esp, %ebp
        subl    $16, %esp
        movl    $12, -12(%ebp)
        movl    $13, -16(%ebp)
        movl    -16(%ebp), %eax
        addl    -12(%ebp), %eax
        movl    %eax, -4(%ebp)
        movl    -16(%ebp), %eax
        addl    -12(%ebp), %eax
        movl    %eax, -8(%ebp)
        leave
        ret
# In other words, it does exactly what the code says

$ gcc -S -O2 foo.c && cat foo.s
_foo:
        pushl   %ebp
        movl    %esp, %ebp
        popl    %ebp
        ret
# The optimiser realises the results are never used, and the code is gone.

If foo() is declared 'static', then there isn't even that much. There is no function at all, and main() doesn't call it.
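
If the goal is to stop the optimiser from deleting everything, the results have to be used. A small variation on the example above, assuming a made-up name foo_used and a return value that the original foo() did not have:

```c
/* Hypothetical variant of foo(): same arithmetic, but the results
 * are returned, so the optimiser cannot simply discard them. */
int foo_used(void)
{
    int a, d;

    int b = 12;
    int c = 13;

    a = b + c;
    d = b + c;

    return a + d;   /* both results now contribute to the return value */
}
```

Since b and c are still compile-time constants here, an optimising build would be expected to fold the whole body down to returning a single constant rather than doing any additions at run time. Checking the -S output with and without -O2 is the way to confirm that for a particular compiler.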

> has the opinion that you can always write better code with assembly as long
> as you stick with functions and the complexity stays low
It's certainly always "possible", and nobody is going to disagree that given enough time and experience, you can produce the equivalent translation which is better than the compiler.
But when you can write code in a high level language which achieves 95% of the performance with only 5% of the coding effort, you really need a damn good reason to resort to assembler.

Sure, if you've got nothing else to do, and no set time limit in which to do it, you can sit down and craft your assembler code. But some of us have deadlines to meet and changing requirements to cope with.

If you're writing user-land programs for a desktop machine, then there really isn't any need.

Here is another example of where you might choose assembler.
You're writing PIC code and have very tight memory and/or timing constraints. Every byte and every clock tick matters, and you need to make sure you're getting the best of everything.

Comments
very helpful!

> when you say trampoline you mean this, right?
yes, except for the Java and Objective-C usage of the term.

perhaps the use of trampolines that you would be most familiar with is that in a thread function. the compiler would compile the code for a thread function like any other function; you *can* safely call it like a normal function if you want. when a return is executed from a thread function (other than the one which executes main), an implicit call needs to be made to the system call (pthread_exit, ExitThread) which ends a thread with proper cleanup. this cleanup code would call cleanup/cancellation handlers, destroy any thread specific data, release the stack(s) etc. this is achieved by placing a trampoline on the thread stack (where the return address is expected).

another common example of trampoline use is in implementing tail recursion (eg. in languages of the lisp family). almost all the implementations of compilers/interpreters are in C. generating code for tail recursion in C (without growing the stack) is achieved by a number of trampoline bounces.
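
The tail-recursion use of trampolines can be sketched in plain C. This is only an illustration under assumed names (struct bounce, even_step, odd_step and is_even are all made up here), not how any particular lisp implementation does it. Instead of calling each other, the two mutually recursive steps return a description of the next call, and a driver loop "bounces" until there is nothing left to call, so the C stack does not grow with n.

```c
#include <stdbool.h>
#include <stddef.h>

struct bounce;
typedef struct bounce (*step_fn)(unsigned n);

/* What a step hands back to the driver: either the next step to run
 * (next != NULL) or the final answer (next == NULL). */
struct bounce {
    step_fn  next;
    unsigned n;
    bool     result;
};

static struct bounce odd_step(unsigned n);

/* "n is even" step: done at 0, otherwise ask for odd_step(n - 1). */
static struct bounce even_step(unsigned n)
{
    if (n == 0)
        return (struct bounce){ NULL, 0, true };
    return (struct bounce){ odd_step, n - 1, false };
}

/* "n is odd" step: done at 0, otherwise ask for even_step(n - 1). */
static struct bounce odd_step(unsigned n)
{
    if (n == 0)
        return (struct bounce){ NULL, 0, false };
    return (struct bounce){ even_step, n - 1, false };
}

/* The trampoline: keep calling whatever step was returned last. */
bool is_even(unsigned n)
{
    struct bounce b = { even_step, n, false };
    while (b.next != NULL)
        b = b.next(b.n);
    return b.result;
}
```

Written as direct mutual recursion, a large n could exhaust the stack unless the compiler happens to turn the calls into jumps. With the loop above every step returns before the next one starts, at the cost of one indirect call per bounce.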

trampolines are also used to implement closures in C/C++. for example, in the (non-standard) implementation of nested functions in gcc. http://gcc.gnu.org/onlinedocs/gcc/Nested-Functions.html

posted by salem:

$ gcc -S -O2 foo.c && cat foo.s
_foo:
        pushl   %ebp
        movl    %esp, %ebp
        popl    %ebp
        ret
# The optimiser realises the results are never used, and the code is gone.

if we use the -fomit-frame-pointer switch, no frame pointer is created and the code reduces to

_foo:
    xorl %eax, %eax
    ret

this may cause problems while debugging code on systems with non-reentrant kernels (eg. linux/gdb) which is probably why gcc does not omit frame pointers by default. linux signal trampolines don't have meaningful frames; when a signal trampoline is invoked from a frameless function, there are two frameless functions in a row. gdb tries to solve this by looking for patterns (of frame pointer set up) in the stack. this usually works, except when a signal occurs just as a function is entered, but before the frame has been set up.

> ...perhaps the use of trampolines that you would be most familiar with is that in a thread function...

truth be told, i have never heard about trampolines before your post{my fault}... still everything you wrote is very interesting.

thanks vijayan!

PS::
for other newbies{like me} :: here is a link...that explains some basics of optimization
