Greetings

I want to know what goes on inside each step when C++ source code is compiled to native code. As I understand it, the steps are:
1- The preprocessor copies the contents of included header files into the source code file.
Q1: So what would the C++ source code file look like after this step?
Q2: Is the preprocessor a program?
2- The expanded source code file produced by step 1 is compiled into assembly language for the platform.
3- The assembly code generated by the compiler is assembled into object code (native code) for the platform.
Q1: Why is the expanded source code file compiled into assembly language first, instead of directly into native machine code?
After the assembly code is generated, I assume the assembler is the program responsible for translating the assembly instructions into native machine code.
Q2: So does the compiler's role end once it has generated the assembly code?
Q3: If the answer to Q2 is yes, does the compiler then pass the generated assembly code file to an assembler and force it to translate it?
4- The object code file generated by the assembler is linked together with the object code files for any library functions used to produce an executable file.
This is where the linker comes in.
Q1: Where is the linker located?
Q2: I do not understand what is meant by "linked together with the object code files for any library functions used to produce an executable file". For example, if I'm using a library for drawing, what does the quoted statement mean in that context?

Regards,

Amr Mohammed

Finally, what book would you recommend for someone who wants to start studying C++ as a beginner and eventually reach a professional level?

1-Q1. It would be easiest to give an example, rather than explain it. Let's say you have the files
foo.h

int foo(int x);
int bar(char* str);

foo.cpp

#include "foo.h"
#define QUUX 23

int foo(int x)
{
   return x * QUUX;
}

Then when the pre-processor has gone over it, the source would be:

int foo(int x);
int bar(char* str);

int foo(int x)
{
   return x * 23;
}

1-Q2. It depends on the compiler suite in question. Traditionally, the preprocessor was a separate program called cpp, and most Unix systems still provide a standalone cpp. Most Windows compilers, however, combine the preprocessor and the compiler into a single program.

3-Q1. Mostly for simplicity: the assembler already exists, so it makes sense to use it rather than duplicate the code needed to generate an object file. Note that, once again, some compilers don't use a separate assembler, but instead generate object code directly.

3-Q2. For compilers that work that way, yes.

3-Q3. Yes, though the word 'force' is a bit inaccurate.

4-Q1. Again, it depends on the compiler tool suite. Usually it is a separate program; on Unix (and Linux) it is called ld, and it can be used entirely separately from the compiler and assembler.

4-Q2. What the linker does is take a copy of the object code from your object files and the library files and write it into the new executable file, making the changes needed to resolve things that couldn't be known at assembly time, such as the addresses of the library functions. Mind you, the job of linking isn't entirely finished until the program is loaded; it is the loader that sets up the final addresses and memory layout of the program when it runs. For more flavor, see the online text of Linkers and Loaders by John Levine.

Greetings
First of all, thanks for your quick reply.

I don't understand what you mean by "the assembler already exists". Do you mean the assembler program? If so, why do we need to compile the preprocessed source code to assembly language at all? Why doesn't the compiler do every task itself, including translating to native code (object code)?

I didn't mean the literal meaning of the word 'force'; I meant something like the compiler calling the assembler and directing it to translate the generated assembly code file into native code. So what exactly happens in that step?

So you mean that the linker copies the native code in my application's object code file, combines it with copies of the native code from the library files I use in my source code (a drawing library, for example), and creates the new executable file?

Does the generated .exe file contain only native code?

These are a lot of questions. I was actually planning on writing a tutorial explaining all this stuff. I'm a bit busy right now, so in the meantime I'll briefly answer your questions.

1- The preprocessor copies the contents of included header files into the source code file.

Yes, among other things. The preprocessor also evaluates all the conditionals (e.g., #if, #elif, etc.), strips out the comments and other fluff that is irrelevant for compilation, performs the macro substitutions, performs trigraph substitutions (if required; trigraphs were removed from the language in C++17), and processes the compiler pragmas.
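To make this concrete, here is a small sketch (the macro names QUUX, SQUARE and USE_SQUARE are made up for illustration) showing three of the preprocessor's jobs: object-like macro substitution, function-like macro substitution, and conditional compilation. By the time the compiler proper sees scaled(), the comment is gone, the #else branch has been removed, and the body reads return ((x) * (x)) * 23;

```cpp
// Hypothetical macro names, chosen only for this illustration.
#define QUUX 23                 // object-like macro: each later QUUX becomes 23
#define SQUARE(x) ((x) * (x))   // function-like macro: textual substitution
#define USE_SQUARE 1            // switch tested by #if below

/* Comments like this one are replaced by a space before compilation proper. */
constexpr int scaled(int x) {
#if USE_SQUARE
    return SQUARE(x) * QUUX;    // compiler sees: return ((x) * (x)) * 23;
#else
    return x * QUUX;            // this branch never reaches the compiler
#endif
}
```

With GCC or Clang, g++ -E file.cpp prints the preprocessed text to standard output (MSVC's equivalent is /E), which is a handy way to see what the compiler actually receives.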

Q1: So what would the C++ source code file look like after this step?

That preprocessed source file probably never exists as an actual file (depending on how the compiler is implemented), but if it did, it would probably look like a large cpp file (with all the headers copy-pasted into it) that is free of conditional compilation blocks and comments. One could easily imagine that blank lines would also be pruned away. In any case, it would still be perfectly recognizable C++ code, since the preprocessor leaves ordinary C++ statements untouched apart from macro substitution.

Q2: Is the preprocessor a program?

In theory, it could be a separate program that runs before the compiler, and some compiler suites do that. However, many compilers don't, for a couple of good reasons. First, things like #pragma directives are there to configure the compiler (not the preprocessor), so they need to be registered with the compiler while the preprocessor runs. Second, as I hinted before, the preprocessor on most compilers probably doesn't output an intermediate source file (for each translation unit), because most compilers try to preserve the line numbering and general correspondence between the original source code and the code being compiled, in order to output useful messages (with line numbers and context) when a compilation error occurs. In other words, the preprocessor probably creates some sort of segmented structure of text (source code) with annotations that relate it back to the original. I'm sure there are other reasons. In any case, it seems many compiler vendors have settled on making the preprocessor part of the compiler, but of course it still runs before any actual compilation is done, so for all practical purposes you can think of it as a separate program (or step).
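When a compiler does write out the preprocessed text (for example with g++ -E), it keeps the link back to the original files by inserting line markers such as # 1 "foo.h" into the output; GCC's documentation calls these linemarkers. Standard C++ exposes the same idea through the #line directive, which the short sketch below uses (the file name foo.cpp is just an example): after the directive, __LINE__, __FILE__, and the compiler's own error messages all report the positions it set.

```cpp
// Compile-time string comparison (plain recursion, so it also works in C++11).
constexpr bool same_text(const char* a, const char* b) {
    return *a == *b && (*a == '\0' || same_text(a + 1, b + 1));
}

// From here on, the compiler treats the next line as line 100 of "foo.cpp",
// and its diagnostics refer to that position too.
#line 100 "foo.cpp"
constexpr int kLine = __LINE__;          // 100
constexpr const char* kFile = __FILE__;  // "foo.cpp"
```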

2- The expanded source code file produced by step 1 is compiled into assembly language for the platform.

Yes.

3- The assembly code generated by the compiler is assembled into object code (native code) for the platform.

Yes.

Q1: Why is the expanded source code file compiled into assembly language first, instead of directly into native machine code?

For a few reasons, I guess. First, it's convenient to be able to output the assembly listings (assembly code), especially for those interested in code optimization. Second, the assembly listings are often not as close to machine code as you might think. They can contain pseudo-instructions and directives that the assembler later expands into one or more real instructions depending on the specific details of the platform (e.g., operand sizes or the shortest available encoding). And finally, remember that most compiler vendors provide not just a C++ compiler but an entire suite of compilers (e.g., GCC includes compilers for C, Fortran, Java, Objective-C, Ada, Go, etc.), so it makes sense to write one program that translates assembly listings into machine code and reuse it for every compiler.
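As a concrete example of one assembly line turning into a variable number of machine instructions, RISC-V assemblers accept a pseudo-instruction li rd, imm ("load immediate"). The sketch below, assuming 32-bit (RV32) registers, mimics how such an assembler might expand it: a single addi when the constant fits in addi's signed 12-bit immediate, otherwise lui for the upper 20 bits plus an addi for the sign-extended low 12 bits (which is why the upper part gets adjusted). The function name expand_li is hypothetical, and a real assembler such as GNU as may pick a different sequence in some cases.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: expand the RV32 pseudo-instruction "li rd, imm"
// into real instructions, returned as assembly text.
std::vector<std::string> expand_li(const std::string& rd, std::int32_t imm) {
    // Small constants fit in addi's signed 12-bit immediate: one instruction.
    if (imm >= -2048 && imm <= 2047)
        return {"addi " + rd + ", x0, " + std::to_string(imm)};

    // Low 12 bits, sign-extended, because addi sign-extends its immediate.
    std::int32_t lo = ((imm & 0xFFF) ^ 0x800) - 0x800;
    assert(lo >= -2048 && lo <= 2047);

    // Upper 20 bits, adjusted so that (hi << 12) + lo == imm (mod 2^32).
    std::uint32_t hi =
        (static_cast<std::uint32_t>(imm) - static_cast<std::uint32_t>(lo)) >> 12;

    std::vector<std::string> out{"lui " + rd + ", " + std::to_string(hi)};
    if (lo != 0)  // lui alone is enough when the low 12 bits are zero
        out.push_back("addi " + rd + ", " + rd + ", " + std::to_string(lo));
    return out;
}
```

For example, expand_li("a0", 5) yields one instruction, while expand_li("a0", 0x12345678) yields lui a0, 74565 followed by addi a0, a0, 1656: the same source line, a different number of machine instructions.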

Q2: So does the compiler's role end once it has generated the assembly code?

Pretty much. The assembler is more or less a one-to-one translation with possibly some nano-optimizations. Of course, it's a delicate task to program an assembler (hence the separation of it from the compiler), but I wouldn't classify the assembly step as really being part of "compilation" because at that point the original code has evaporated and the rules of the language are no longer a concern.
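To illustrate the "more or less one-to-one" translation, here is a toy assembler for a made-up instruction set (the mnemonics LOAD, ADD, JMP and HALT and their opcodes are invented for this sketch; no real CPU uses them). It makes two passes, a common design: the first records the address of each label, and the second translates each instruction into bytes, filling in label addresses, including labels that appear later in the code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Made-up toy ISA: every instruction is exactly 2 bytes (opcode, operand).
const std::map<std::string, std::uint8_t> kOpcodes = {
    {"LOAD", 0x01}, {"ADD", 0x02}, {"JMP", 0x03}, {"HALT", 0xFF}};

std::vector<std::uint8_t> assemble(const std::vector<std::string>& lines) {
    // Pass 1: record the address of every label (a line such as "loop:").
    std::map<std::string, int> labels;
    int address = 0;
    for (const std::string& line : lines) {
        if (line.empty()) continue;
        if (line.back() == ':') {
            assert(address <= 0xFF && "toy ISA has one-byte addresses");
            labels[line.substr(0, line.size() - 1)] = address;
        } else {
            address += 2;  // each instruction occupies 2 bytes
        }
    }
    // Pass 2: translate each instruction into its bytes, one-to-one.
    std::vector<std::uint8_t> code;
    for (const std::string& line : lines) {
        if (line.empty() || line.back() == ':') continue;
        std::istringstream in(line);
        std::string mnemonic, operand;
        in >> mnemonic >> operand;  // operand stays empty for HALT
        auto op = kOpcodes.find(mnemonic);
        if (op == kOpcodes.end())
            throw std::runtime_error("unknown mnemonic: " + mnemonic);
        int value = 0;
        if (!operand.empty()) {
            auto label = labels.find(operand);
            value = (label != labels.end()) ? label->second : std::stoi(operand);
        }
        code.push_back(op->second);
        code.push_back(static_cast<std::uint8_t>(value));
    }
    return code;
}
```

Real assemblers also handle directives, variable-length encodings, and references to symbols defined in other files (which they record as relocations for the linker to fix up), but the core is the same table-driven translation.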

Q3: If the answer to Q2 is yes, does the compiler then pass the generated assembly code file to an assembler and force it to translate it?

Yes. Most compiler vendors ship the assembler as a separate program; in GCC, the assembler is simply called as (or as.exe in MinGW). It takes assembly listings and produces binary object code. When you invoke the compiler on some source code, the compiler proper only goes as far as generating assembly listings, and then the assembler is invoked to do the rest.

4- The object code file generated by the assembler is linked together with the object code files for any library functions used to produce an executable file.

Yes.

Q1: Where is the linker located?

It's a separate program, again, invoked by the compiler when you simply compile a piece of code. Of course, you can also invoke it separately, as most build systems do: invoke the compiler just to create the object files, then collect the desired object files and invoke the linker on them. With GCC, the linker is called ld (or ld.exe), while with MSVC (Microsoft), the linker program is called link.exe. The linker is usually located in the same directory as the compiler and the assembler; in the GNU toolchain, the assembler, the linker and related tools are distributed as a package called "binutils".

Q2: I do not understand what is meant by "linked together with the object code files for any library functions used to produce an executable file". For example, if I'm using a library for drawing, what does the quoted statement mean in that context?

I also have a hard time understanding that sentence, given its poor grammar. The process is pretty simple, at least conceptually. Any given translation unit (a cpp file and all the headers it #includes) will define a number of functions, classes, and variables. Most of them (by default) have "external linkage", meaning they are published after compilation as a set of symbols (a fancy word for "names") visible to the linker. The translation unit will also use a number of functions, classes, and variables that are not defined in it, only declared (usually in an included header file). These "used" elements are published too, as a set of "external references to be resolved".

The job of the linker is to take all the object files (and static libraries, which are essentially archives of object files) that were given to it, go through them, and build a big list of symbols: where each definition can be found (in which object file, and where in it) and where each symbol is needed. Then the linker starts at the main function and pulls every definition (the compiled code for a given symbol) that main uses into the destination executable, and works recursively from there, doing the same for each function that was pulled in, until everything that is needed is in the executable file. That's more or less what the linker does and how executables are produced; it gets a bit more complicated when you throw dynamic libraries into the mix.
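The symbol-resolution process described above can be sketched in a few lines. This is a toy model, not how ld or link.exe are actually implemented: the ObjectFile struct and the toy_link function are hypothetical, each "definition" simply lists the symbols its code refers to, and real linkers usually pull in whole object files (or sections) from a static library rather than individual functions. It still shows the two phases: building the symbol table, then starting at main and pulling in definitions transitively.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, heavily simplified object file: each symbol it defines is
// mapped to the symbols that its compiled code refers to.
struct ObjectFile {
    std::string name;
    std::map<std::string, std::vector<std::string>> definitions;
};

// Returns the symbols pulled into the "executable", in the order they were found.
std::vector<std::string> toy_link(const std::vector<ObjectFile>& inputs) {
    // Phase 1: symbol table mapping each defined symbol to its object file.
    std::map<std::string, const ObjectFile*> defined_in;
    for (const ObjectFile& obj : inputs) {
        for (const auto& def : obj.definitions) {
            if (!defined_in.emplace(def.first, &obj).second)
                throw std::runtime_error("multiple definition of " + def.first);
        }
    }
    // Phase 2: start at main and pull in every definition it needs, transitively.
    std::vector<std::string> pulled;
    std::set<std::string> seen{"main"};
    std::vector<std::string> worklist{"main"};
    while (!worklist.empty()) {
        std::string sym = worklist.back();
        worklist.pop_back();
        auto where = defined_in.find(sym);
        if (where == defined_in.end())
            throw std::runtime_error("undefined reference to " + sym);
        auto def = where->second->definitions.find(sym);
        assert(def != where->second->definitions.end());  // table is consistent
        pulled.push_back(sym);
        for (const std::string& ref : def->second) {
            if (seen.insert(ref).second)  // queue each symbol only once
                worklist.push_back(ref);
        }
    }
    return pulled;
}
```

With a main.o whose main calls foo and draw_line, a foo.o defining foo and an unused bar, and a drawing library defining draw_line, put_pixel and an unused draw_circle, only main, foo, draw_line and put_pixel end up in the result. A reference that has no definition anywhere produces the familiar "undefined reference" error.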

Finally, what book would you recommend for someone who wants to start studying C++ as a beginner and eventually reach a professional level?

Please refer to this thread.
