I've written (more like copied) the following assembly code:

section .text
    global main 

main:
    mov edx,len
    mov ecx,msg
    mov ebx,1

    mov eax,4
    int 0x80

    mov eax,1
    int 0x80

section .data

    msg db  'Hello, world!', 0xa
    len equ $ - msg

I assembled it with NASM using the following command: nasm -g -f elf test.asm
and then linked it with gcc test.o -o testasm

Then I used GDB to debug testasm, and issued the command "disassemble main", which gave the following output:

0x080483c0 <+0>: mov $0xe,%edx
0x080483c5 <+5>: mov $0x804a010,%ecx
0x080483ca <+10>: mov $0x1,%ebx
0x080483cf <+15>: mov $0x4,%eax
0x080483d4 <+20>: int $0x80
0x080483d6 <+22>: mov $0x1,%eax
0x080483db <+27>: int $0x80
0x080483dd <+29>: nop
0x080483de <+30>: nop
0x080483df <+31>: nop

Which is not quite the same. I understand that machine code has no declarations of constants, but where in this disassembled code is the "Hello, world!" string? I tried to convert $0x804a010 to a string, but that's not it, so I don't know.

Any clues?

then I used GDB to debug testasm, and issued the command "disassemble main"

Thus GDB gave you the disassembly for main and nothing else. Because you put msg in a different section (.data), it is not stored next to the code; the linker is free to place that section elsewhere in the resulting binary. If I recall correctly, GDB's disassemble command only works on a range of memory, either explicit or identified with a symbol like main. Someone who's more familiar with GDB might be able to correct me here if I'm wrong.

You might try a dedicated disassembler on the compiled executable; for x86 disassembly, the free version of IDA has worked well for me.

In GDB you can see where the string is located, and what it contains, with:

(gdb) x/s &msg
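The value 0x804a010 in the disassembly is the address of msg, not the text itself, which is why converting it to characters gets you nowhere. Here is a small Python sketch of the difference (purely illustrative; the variable names are mine and are not part of the original program):

```python
import struct

# The immediate operand GDB showed for `mov $0x804a010,%ecx`.
msg_address = 0x0804A010

# Inside the instruction, the address is stored as four little-endian bytes.
address_bytes = struct.pack("<I", msg_address)
print(address_bytes.hex(" "))  # 10 a0 04 08

# The string is a different set of bytes, stored *at* that address in .data.
msg_bytes = b"Hello, world!\n"
print(msg_bytes.hex(" "))

# Its length is the value loaded into edx (0xe).
print(hex(len(msg_bytes)))  # 0xe
```

So the instruction only carries a pointer; the characters live wherever that pointer leads.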

Another thing you can do is run objdump -ds -Mintel <your program name> on the command line. On my 64-bit system the result is:

Test1:     file format elf64-x86-64

Contents of section .text:
 4000b0 ba0e0000 00b9d000 6000bb01 000000b8  ........`.......
 4000c0 04000000 cd80b801 000000cd 80        .............   
Contents of section .data:
 6000d0 48656c6c 6f2c2077 6f726c64 210a      Hello, world!.  
Contents of section .stab:
 0000 01000000 00000900 09000000 01000000  ................
 0010 64000000 b0004000 00000000 44000400  d.....@.....D...
 0020 b0004000 00000000 44000500 b5004000  ..@.....D.....@.
 0030 00000000 44000600 ba004000 00000000  ....D.....@.....
 0040 44000700 bf004000 00000000 44000800  D.....@.....D...
 0050 c4004000 00000000 44000900 c6004000  ..@.....D.....@.
 0060 00000000 44000a00 cb004000 00000000  ....D.....@.....
 0070 64000000 00000000                    d.......        
Contents of section .stabstr:
 0000 00546573 74312e53 00                 .Test1.S.       

Disassembly of section .text:

00000000004000b0 <_start>:
  4000b0:   ba 0e 00 00 00          mov    edx,0xe
  4000b5:   b9 d0 00 60 00          mov    ecx,0x6000d0
  4000ba:   bb 01 00 00 00          mov    ebx,0x1
  4000bf:   b8 04 00 00 00          mov    eax,0x4
  4000c4:   cd 80                   int    0x80
  4000c6:   b8 01 00 00 00          mov    eax,0x1
  4000cb:   cd 80                   int    0x80
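To connect the two parts of that output by hand, here is a short Python sketch. It only re-reads bytes copied from the dump above; the variable names are my own, and the addresses are specific to my build, so yours will differ.

```python
# Hex columns copied from the `.data` line of the objdump output.
data_start = 0x6000D0
data_bytes = bytes.fromhex("48656c6c 6f2c2077 6f726c64 210a")

# Decoding the section contents gives back the original string.
text = data_bytes.decode("ascii")
print(repr(text))  # 'Hello, world!\n'

# Bytes of the `mov ecx,0x6000d0` instruction from the .text section.
mov_ecx = bytes.fromhex("b9 d0 00 60 00")

# The first byte (0xb9) is the opcode; the remaining four bytes are the
# 32-bit operand, stored least significant byte first.
operand = int.from_bytes(mov_ecx[1:], "little")
print(hex(operand))  # 0x6000d0

# The operand is exactly the address where the .data contents begin.
print(operand == data_start)  # True
```

In other words, the string is in the file, just in .data rather than in the instruction stream that disassemble main walks through.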

As you are using NASM, you can also run set disassembly-flavor intel in GDB so that its disassembly uses the same Intel syntax as your source.

Wait, does that mean that the executable (ELF) file is not 100% machine code (0s and 1s)? The OS interprets them?

Exactly, and because -g was used there are additional sections (.stab and .stabstr) holding debug information. Sources assembled with -f bin don't have such embellishments; that format is typical of bootloaders or, more commonly at the time, .com files for MS-DOS.
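To see that an ELF file starts with a header rather than with instructions, you can look at its first few bytes. Below is a minimal Python sketch; describe_elf_ident is a hypothetical helper of mine, and the sample bytes are hand-built to stand in for a real file.

```python
def describe_elf_ident(header: bytes) -> dict:
    """Decode the magic number, class and byte order from an ELF header."""
    if len(header) < 6 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # Byte 4 (EI_CLASS): 1 means 32-bit objects, 2 means 64-bit objects.
    word_size = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")
    # Byte 5 (EI_DATA): 1 means little-endian, 2 means big-endian.
    byte_order = {1: "little-endian", 2: "big-endian"}.get(header[5], "unknown")
    return {"class": word_size, "data": byte_order}


# Hand-built stand-in for the first 16 bytes of a 32-bit little-endian ELF file.
sample = b"\x7fELF\x01\x01\x01" + bytes(9)
print(describe_elf_ident(sample))

# To try it on a real executable (assuming a file named testasm exists):
#     with open("testasm", "rb") as f:
#         print(describe_elf_ident(f.read(16)))
```

The loader reads this header and the tables that follow it to decide what to map into memory and where; only the contents of sections such as .text are actual machine code.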

The following example will not run, but it does show how a pure binary file would be assembled. It is your example, modified a little:

    [bits 32]

        mov edx,len
        mov ecx,msg
        mov ebx,1
        mov eax,4
        int 0x80
        mov eax,1
        int 0x80

    msg     db  'Hello, world!', 0xa
    len     equ     $ - msg

It would yield object code like this:

        00  BA0E000000        mov edx,0xe 
        05  B91D000000        mov ecx,0x1d 
        0A  BB01000000        mov ebx,0x1 
        0F  B804000000        mov eax,0x4 
        14  CD80              int 0x80 
        16  B801000000        mov eax,0x1 
        1B  CD80              int 0x80

        1D  48 65 6C 6C 6F 2C 20 77 
        25  6F 72 6C 64 21 0A

Code would go into memory exactly as you had written it.
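The 0x1d operand in that listing is simply the offset at which the message starts once the instructions have been laid end to end. A quick Python sketch to check the arithmetic, using the byte values copied from the listing above (the variable names are mine):

```python
# (mnemonic, encoded bytes) pairs copied from the listing.
instructions = [
    ("mov edx,0xe", "BA0E000000"),
    ("mov ecx,0x1d", "B91D000000"),
    ("mov ebx,0x1", "BB01000000"),
    ("mov eax,0x4", "B804000000"),
    ("int 0x80", "CD80"),
    ("mov eax,0x1", "B801000000"),
    ("int 0x80", "CD80"),
]

# A flat binary is just the instruction bytes followed by the data bytes.
code = b"".join(bytes.fromhex(encoded) for _, encoded in instructions)
msg = b"Hello, world!\n"
image = code + msg

# The message begins right after the last instruction.
msg_offset = len(code)
print(hex(msg_offset))  # 0x1d

# The operand of `mov ecx` (bytes 6..9 of the image) holds that same offset.
ecx_operand = int.from_bytes(image[6:10], "little")
print(hex(ecx_operand))  # 0x1d

print(image[msg_offset:])  # b'Hello, world!\n'
```

With no header and no sections, the file on disk and the image in memory are the same 43 bytes.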

The book Linkers and Loaders by John Levine (available online in a beta version) is an excellent resource in this regard. It gives details about several of the most common file formats (though for better or worse, the book does not cover Mach-O, the Mach kernel format used by MacOS). It is a bit dated now, but since there haven't been any major changes (AFAIK) to either PE or ELF (used by Windows and Linux, respectively), there isn't much loss of information due to its age.

Edited 4 Years Ago by Schol-R-LEA

Thanks guys, I appreciate the help. I'm reading "Computer Architecture The Hardware/Software Interface" and "Assembly Language Step by Step Programming With Linux" at the same time, and my brain is overloaded.

Edited 4 Years Ago by brunoccs
