Understanding compilation stages – Preprocessor, Compiler, Assembler, Linker, Loader

When we compile a program on Linux with gcc, for example “gcc -o helloworld helloworld.c”, a single command creates an executable named “helloworld”. Behind the scenes, however, gcc runs the first four of the stages listed below; the fifth, the loader, comes into play only when the program is executed.

1) Preprocessor
2) Compiler
3) Assembler
4) Linker
5) Loader

1) Preprocessor – The C preprocessor is the macro preprocessor for the C language. It handles the inclusion of header files, macro expansion, conditional compilation, and line control. For example, suppose we write code like the following,

#define TEST 5
printf("%d \n", TEST);

After the preprocessing step, the same code becomes,

printf("%d \n", 5);

That is, the preprocessor finds all directives such as #define and #include and substitutes the corresponding definitions and header contents directly into the source code.

2) Compiler – GCC : GNU project C and C++ compiler

Help – “man gcc”

When you invoke GCC, it normally does preprocessing, compilation, assembly and linking.
The “overall options” allow you to stop this process at an intermediate stage.
For example, the -c option says not to run the linker. Then the output consists
of object files output by the assembler.

3) Assembler (as)

Help – “man as”

GNU as is really a family of assemblers.
as is primarily intended to assemble the output of the GNU C compiler “gcc” for use by the linker “ld”.

If you are invoking as via the GNU C compiler, you can use the -Wa option to pass arguments through to the assembler.
The assembler arguments must be separated from each other (and the -Wa) by commas. For example:

gcc -c -g -O -Wa,-alh,-L file.c

This passes two options to the assembler: -alh (emit a listing to standard output with high-level and assembly source)
and -L (retain local symbols in the symbol table).

4) Linker – (ld – The GNU linker)

ld combines a number of object and archive files, relocates their data and ties up symbol references.
Usually the last step in compiling a program is to run ld.

The loader, described below, is not a compilation step. It is one of the first stages of executing a program, in which the loader loads the shared libraries the application needs at startup.

5) Loader
ld.so/ld-linux.so – dynamic linker/loader

ld.so loads the shared libraries needed by a program, prepares the program to run, and then runs it.
Unless explicitly specified via the -static option to ld during compilation, all Linux programs are
incomplete and require further linking at run time.

In the next two posts we will see how these steps actually work when we compile the “helloworld.c” program.
1. understanding gcc compilation steps : linux compilation steps
2. from source code to executable : how executable is created during compilation on linux
