Hi, I have recently wanted to learn how to write Windows native binary code, in order to make a binary script which, in one form or another, will display hello world. I am hoping to eventually create a more efficient compiler using this method but as you should understand already, this is a rather undocumented section of programming. So basically I need to know for the gcc compiler, how did they manually set the ones and zeros in accordance to the execution environment when you need to know how the execution environment executes binary. Are there any docs on how Windows executes an executable in binary terms? I am hoping I don't get the odd person who says this isn't possible, as people used to do this with punch cards, and now I'm doing the modern equivalent, which is writing each one and zero to the hard drive. So please provide details on the Windows binary execution environment.

Thank you

cwarn23

but as you should understand already, this is a rather undocumented section of programming

Actually, it's very well documented if you know what to look for. Specifically, when directly writing machine code you need to do two things:

  1. Understand and create the formatting of PE (Portable Executable) files. This will be the boilerplate that allows Windows to execute your code.

  2. Understand the opcodes that correspond to the assembly language for your CPU architecture. Using a hex editor, you can insert those opcodes manually (i.e. "program" in machine code), just as an assembler would.
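To give a feel for the first point, here is a minimal Python sketch that reads the handful of header fields Windows uses to recognise a PE file: the "MZ" signature, the 32-bit offset stored at 0x3C, the "PE\0\0" signature, and the 20-byte COFF file header after it. The helper names (parse_pe_headers, build_stub) are my own, and build_stub only fabricates a header-only buffer for demonstration. It is not a runnable executable.

```python
import struct

def parse_pe_headers(data):
    """Read the fields that locate and identify a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew: little-endian 32-bit file offset of the PE signature, kept at 0x3C
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF file header follows the 4-byte signature
    (machine, number_of_sections, _timestamp, _symtab, _nsyms,
     optional_header_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {
        "pe_offset": pe_offset,
        "machine": machine,
        "number_of_sections": number_of_sections,
        "optional_header_size": optional_header_size,
        "characteristics": characteristics,
    }

def build_stub():
    """Hypothetical helper: a synthetic buffer with only these headers filled in."""
    dos = bytearray(0x40)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)      # PE signature right after the DOS header
    coff = struct.pack("<HHIIIHH",
                       0x014C,  # IMAGE_FILE_MACHINE_I386
                       1,       # one section
                       0, 0, 0, # timestamp, symbol table pointer, symbol count
                       0xE0,    # size of a PE32 optional header
                       0x0102)  # executable image | 32-bit machine
    return bytes(dos) + b"PE\0\0" + coff

headers = parse_pe_headers(build_stub())
print(headers)
```

Pointing parse_pe_headers at the first few hundred bytes of a real .exe should work the same way. The optional header and section table that follow are where most of the remaining work lies.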

Alternatively, you could read the source code for an assembler to see how it parses and assembles things, since programming in machine code is tantamount to replacing the assembler with your own brain and fingers.
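As a small illustration of "being the assembler", the sketch below hand-encodes two 32-bit x86 instructions in Python: mov eax, imm32 (opcode B8 followed by the immediate in little-endian order) and ret (opcode C3). The function names are mine, chosen for the example. The bytes it prints are what you would otherwise type into a hex editor.

```python
import struct

def mov_eax_imm32(value):
    # B8 = "mov eax, imm32"; the 32-bit immediate follows, least significant byte first
    return b"\xB8" + struct.pack("<I", value & 0xFFFFFFFF)

def ret():
    # C3 = near return
    return b"\xC3"

def assemble_return_constant(value):
    """Machine code for a function body that returns `value` in EAX."""
    return mov_eax_imm32(value) + ret()

code = assemble_return_constant(42)
print(" ".join("%02x" % b for b in code))  # b8 2a 00 00 00 c3
```

This only produces the code bytes. To run them under Windows they still have to be placed in a section of a PE file, or copied into executable memory.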

So basically I need to know for the gcc compiler, how did they manually set the ones and zeros in accordance to the execution environment when you need to know how the execution environment executes binary.

Just as a matter of note: strictly speaking, the GCC C compiler doesn't manually set the opcodes. GCC itself is just a driver program that parses the shell arguments and directs the subprograms it uses to do the actual work. Compiling a C program as done by GCC is actually a multistage process. While I am not certain about all of the details, as I understand it, the sequence is something like this:

1) The pre-processor, which handles all the directives and emits the compilable source code;
2) The compiler front-end, which generates a system-independent (and language-independent) intermediate form;
3) The inner optimizer, which performs tasks such as constant folding;
4) The compiler back-end, which generates assembly source for the target processor;
5) The peephole optimizer, which optimizes the assembly source for the given processor type;
6) The assembler, gas, which converts the assembly source to object code; and
7) The linker, which produces the program file.

Strictly speaking, there is still one more step as well: the loader, which converts the executable file into the actual executable image at load time.
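If you want to watch those stages separately, the gcc driver can be told to stop after each one. The Python sketch below only builds the command lines and prints them. It assumes a gcc on the PATH if you choose to run them, and the function name is hypothetical. Note that the -S step covers steps 2 through 5 above in one go, since as far as I know they all happen inside the compiler proper.

```python
def gcc_stage_commands(source):
    """Return one gcc command line per externally visible stage."""
    stem = source.rsplit(".", 1)[0]
    return [
        ["gcc", "-E", source, "-o", stem + ".i"],       # preprocess only
        ["gcc", "-S", stem + ".i", "-o", stem + ".s"],  # compile to assembly source
        ["gcc", "-c", stem + ".s", "-o", stem + ".o"],  # assemble to object code
        ["gcc", stem + ".o", "-o", stem],               # link into the program file
    ]

for command in gcc_stage_commands("hello.c"):
    print(" ".join(command))
```

Reading the .s file produced by the second command next to a hex dump of the .o file is a good way to see how assembly source maps onto opcodes.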

This multi-step approach is not universal, but most compilers do something like this internally, and the use of a stand-alone assembler is not at all uncommon. If you want to learn how to generate object code, you would want to learn about assemblers, as even those compilers which perform object code generation directly use techniques common to assemblers in general. While I haven't read it yet, I understand that the textbook Modern Compiler Design covers assemblers and linkers as well as compilers, and appears to discuss direct generation of object code from what little I've been able to gather.
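One of those common assembler techniques is the two-pass approach to labels: the first pass works out the address of every label, the second emits bytes with the jump displacements filled in. Below is a toy Python sketch covering just three x86 instructions (nop = 90, ret = C3, and the short jump EB with an 8-bit displacement). The tuple-based input format is invented for the example and is not how any real assembler takes its input.

```python
SIZES = {"nop": 1, "ret": 1, "jmp": 2}  # encoded length of each supported instruction

def assemble(program):
    # Pass 1: record the address each label refers to
    labels = {}
    address = 0
    for item in program:
        if item[0] == "label":
            labels[item[1]] = address
        else:
            address += SIZES[item[0]]

    # Pass 2: emit machine code, resolving jump targets
    out = bytearray()
    for item in program:
        op = item[0]
        if op == "label":
            continue
        if op == "nop":
            out.append(0x90)
        elif op == "ret":
            out.append(0xC3)
        elif op == "jmp":
            # The displacement is relative to the end of the 2-byte jump
            disp = labels[item[1]] - (len(out) + 2)
            if not -128 <= disp <= 127:
                raise ValueError("target out of range for a short jump")
            out += bytes([0xEB, disp & 0xFF])
        else:
            raise ValueError("unsupported instruction: %r" % (op,))
    return bytes(out)

forward = [("jmp", "end"), ("nop",), ("label", "end"), ("ret",)]
print(assemble(forward).hex())  # eb0190c3
```

The forward jump is the reason for the first pass: when the jmp is encoded, the label "end" has not been seen yet in program order.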

If you want to understand how the executable image is loaded and run, you'll first want to understand the loader, as well as the executable format being used. The online text Linkers and Loaders, while somewhat dated and full of typos (the print version fixes most of those), is an excellent introduction to the subject. For something more specific, you might want to look at this blog series, which covers the ELF format in detail. You might want a refresher on general computer architecture as well.
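To make the loader's core job a little more concrete, here is a deliberately simplified Python sketch: each section's bytes are copied from the file to that section's address within a zero-filled image. The Section type and map_image function are made up for the illustration. A real loader also applies relocations, resolves imports, and sets page protections, none of which is modelled here.

```python
from typing import NamedTuple

class Section(NamedTuple):
    name: str
    virtual_address: int  # offset from the image base once loaded
    virtual_size: int     # bytes the section occupies in memory
    raw_data: bytes       # bytes stored in the file (may be shorter)

def map_image(sections, alignment=0x1000):
    """Lay the sections out in a zero-filled buffer, as a loader would in memory."""
    end = max(s.virtual_address + s.virtual_size for s in sections)
    size = (end + alignment - 1) // alignment * alignment  # round up to a page
    image = bytearray(size)
    for s in sections:
        data = s.raw_data[:s.virtual_size]
        image[s.virtual_address:s.virtual_address + len(data)] = data
    return image

sections = [
    Section(".text", 0x1000, 6, bytes.fromhex("b82a000000c3")),
    Section(".data", 0x2000, 16, b"hello"),
]
image = map_image(sections)
print(len(image), image[0x1000:0x1006].hex())
```

Anything in memory beyond a section's raw data stays zero, which is how uninitialised data ends up zero-filled without taking space in the file.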

Unfortunately for us all, the x86 platform is a real bear to write a complete assembler for. But that's neither here nor there at this point. Some books which are relevant to it are x86 Disassembly and How Debuggers Work.

Edited 3 Years Ago by Schol-R-LEA
