I would just like to know how x86 ASM instructions are translated into binary. EG:
mov al,061h becomes
It is an agreed-upon mapping (a conversion): each instruction has a fixed numeric encoding, its op-code.
If you (d)ump or (u)nassemble that command in DEBUG in DOS, you will see the op-codes [B0 61].
-a 100
13B3:0100 mov al, 61
13B3:0102
-u 100
13B3:0100 B061 MOV AL,61
Each of those bytes converts to the binary you showed.
The op-codes more closely represent the individual commands: B0 (or 10110000) represents the command "MOV AL", and the next byte is the value.
If you do the same for the other low registers, you will see the order:
13B3:0100 B061 MOV AL,61
13B3:0102 B161 MOV CL,61
13B3:0104 B261 MOV DL,61
13B3:0106 B361 MOV BL,61
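The pattern in that listing (B0, B1, B2, B3) can be sketched in a few lines of Python. This is only an illustration, not how any real assembler is organised. The function names are made up for this example, and the AH to BH entries come from the standard 8086 register numbering rather than from the DEBUG output above.

```python
# 8-bit registers in 8086 encoding order (AL=0, CL=1, DL=2, BL=3, ...).
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

def encode_mov_r8_imm8(reg, value):
    """Encode 'MOV reg8, imm8' as two bytes: (0xB0 + register number), value."""
    index = REG8.index(reg.lower())
    if not 0 <= value <= 0xFF:
        raise ValueError("immediate must fit in one byte")
    return bytes([0xB0 + index, value])

def to_bits(data):
    """Render bytes as space-separated 8-bit binary strings."""
    return " ".join(f"{b:08b}" for b in data)

code = encode_mov_r8_imm8("al", 0x61)
print(code.hex().upper(), "=", to_bits(code))  # B061 = 10110000 01100001
```

Swapping "al" for "cl", "dl" or "bl" reproduces the B1, B2 and B3 lines of the listing.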
Thanks, but how exactly do I do this? I am used to C++, but I do have MASM and CV for writing assembly when necessary. Is there any way I could create something like a batch command to convert an ASM program into binary representation?
Yes. If I understand the question correctly:
You will be creating (for all intents and purposes) a compiler, or more precisely an assembler.
You will need to either create (or use) a mapping of the commands to the opcodes OR cheat.
The commands are a direct one-to-one (mostly) mapping from the command to the op-code (or binary) just as I showed earlier.
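As a rough idea of what such a mapping could look like, here is a small Python sketch. The table only covers a handful of one-byte 8086 instructions plus INT, the names `ONE_BYTE` and `assemble_line` are invented for this example, and numbers are read as hex without a suffix, the way DEBUG does. A real assembler needs far more than this (operands, labels, addressing modes).

```python
# Hypothetical lookup table for a few one-byte 8086 instructions.
ONE_BYTE = {
    "nop": 0x90,
    "ret": 0xC3,  # near return
    "hlt": 0xF4,
    "cli": 0xFA,
    "sti": 0xFB,
}

def assemble_line(line):
    """Assemble one source line into its machine-code bytes."""
    parts = line.lower().split()
    if len(parts) == 1 and parts[0] in ONE_BYTE:
        return bytes([ONE_BYTE[parts[0]]])
    if len(parts) == 2 and parts[0] == "int":
        # INT imm8 is opcode CD followed by the interrupt number.
        return bytes([0xCD, int(parts[1], 16)])
    raise ValueError(f"unsupported instruction: {line!r}")

program = ["int 21", "ret"]
machine_code = b"".join(assemble_line(line) for line in program)
print(machine_code.hex(" ").upper())  # CD 21 C3
```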
You can also cheat by first compiling the code, then reading the bytes from the compiled executable and converting them to binary.
Here is a list of opcodes and their values:
If your intent is to make a compiler, then don't bother doing the cheat method.
VC++ 2010 and earlier versions will create assembly listings of your C and C++ programs. You have the option of having it include the op-codes too.
So now, the bigger question is: What are you going to do with it after you convert it?
If this is merely for trivia, I would suggest doing it in a higher-level language.
For instance: Write a C++ program to convert a .com file to binary.
...and maybe convert the binary back to an executable .com file.
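The same idea, sketched in Python rather than C++ to keep it short. Everything here is an assumption made for illustration: the file name demo.com, the helper names, and the three-byte program (MOV AL,61 followed by RET) that gets written to a temporary directory so the example has something to read.

```python
import os
import tempfile

def bytes_to_bit_text(data):
    """Turn raw bytes into text such as '10110000 01100001'."""
    return " ".join(f"{b:08b}" for b in data)

def bit_text_to_bytes(text):
    """Turn whitespace-separated 8-bit groups back into raw bytes."""
    return bytes(int(group, 2) for group in text.split())

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.com")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(bytes([0xB0, 0x61, 0xC3]))  # MOV AL,61 / RET

    with open(path, "rb") as f:
        original = f.read()

    text = bytes_to_bit_text(original)
    print(text)  # 10110000 01100001 11000011

    # Convert the text back and confirm the round trip is lossless.
    rebuilt = bit_text_to_bytes(text)
    print(rebuilt == original)  # True
```

Since a .com file is just the raw machine code with no header, writing `rebuilt` back to disk would give the same file again.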
in addition to simply knowing what opcodes translate to what values, it is essential that you understand the so-called mode/reg/RM byte. there are quite a few addressing modes understood by the 8086 CPU. this byte is placed directly after the opcode byte if the opcode requires operand(s) that aren't immediate. (immediate, meaning given directly in the code's bytestream)
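to make that byte less abstract, here is a small Python sketch that pulls it apart. the field layout (2 bits mode, 3 bits reg, 3 bits R/M) is the standard 8086 one; the names `split_modrm` and `RM16` are just made up for this example, and the table ignores the special case where mode = 0 and R/M = 6 means a direct 16-bit address instead of [BP].

```python
# Memory operands selected by the R/M field when mode is 0, 1 or 2.
RM16 = ["[bx+si]", "[bx+di]", "[bp+si]", "[bp+di]",
        "[si]", "[di]", "[bp]", "[bx]"]

def split_modrm(byte):
    """Split a mode/reg/RM byte into its three bit fields."""
    mod = (byte >> 6) & 0b11   # bits 7-6: addressing mode
    reg = (byte >> 3) & 0b111  # bits 5-3: register (or operation, for groups)
    rm = byte & 0b111          # bits 2-0: register or memory operand
    return mod, reg, rm

# 8B 07 is MOV AX,[BX]; its second byte is the mode/reg/RM byte.
mod, reg, rm = split_modrm(0x07)
print(mod, reg, rm)  # 0 0 7
print(RM16[rm])      # [bx]
```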
this is the general format of an operation's bytestream encoding, taken from a document i wrote when i began writing my PC emulator:
1. Prefix code(s) i.e. repetition, seg override, or lock (optional)
2. Opcode byte (required)
3. Mod-Reg-R/M addressing mode byte (optional)
4. Low disp/addr/data (optional)
5. High disp/addr/data (optional)
6. Low data (optional)
7. High data (optional)
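the layout above can be sketched as a Python function that cuts one instruction into those parts. this is a simplified illustration under a few assumptions: the caller already knows whether the opcode takes a mode/reg/RM byte (the `has_modrm` flag is invented for this example), the input holds exactly one instruction, and whatever is left after the displacement is treated as immediate data.

```python
# 8086 prefix bytes: segment overrides, LOCK and the repeat prefixes.
PREFIXES = {0x26: "ES:", 0x2E: "CS:", 0x36: "SS:", 0x3E: "DS:",
            0xF0: "LOCK", 0xF2: "REPNE", 0xF3: "REP"}

def displacement_size(mod, rm):
    """Number of displacement bytes implied by the mode and R/M fields."""
    if mod == 0b01:
        return 1
    if mod == 0b10:
        return 2
    if mod == 0b00 and rm == 0b110:
        return 2  # direct 16-bit address
    return 0

def split_instruction(code, has_modrm):
    """Cut one instruction into prefixes, opcode, mode byte, disp and data."""
    pos = 0
    prefixes = []
    while pos < len(code) and code[pos] in PREFIXES:
        prefixes.append(PREFIXES[code[pos]])
        pos += 1
    opcode = code[pos]
    pos += 1
    modrm, disp = None, b""
    if has_modrm:
        modrm = code[pos]
        pos += 1
        size = displacement_size(modrm >> 6, modrm & 0b111)
        disp = code[pos:pos + size]
        pos += size
    return {"prefixes": prefixes, "opcode": opcode, "modrm": modrm,
            "disp": disp, "data": code[pos:]}

# 2E 8B 47 10 is MOV AX,CS:[BX+10]
parts = split_instruction(bytes([0x2E, 0x8B, 0x47, 0x10]), has_modrm=True)
print(parts)
```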
prefixes include stuff like REP, segment overrides, and lock. i suppose it's best if i just link the document for you:
that should contain everything you need to know to convert to binary manually. but note there is an error on page 3. the part that reads:
"the two displacement bytes are treated as a SIGN-EXTENDED 8-bit signed value when addr. mode = 1"
there is actually only one displacement byte when addr mode = 1, not two.
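for what it's worth, sign-extending that single displacement byte looks like this in Python. the function names and the sample base value are made up for the example; the wrap to 16 bits reflects how 8086 offsets are computed.

```python
def sign_extend8(byte):
    """Interpret one byte (0..255) as a signed value (-128..127)."""
    return byte - 0x100 if byte & 0x80 else byte

def effective_offset(base, disp8):
    """Add a sign-extended 8-bit displacement to a 16-bit base, wrapping."""
    return (base + sign_extend8(disp8)) & 0xFFFF

print(sign_extend8(0x10))  # 16
print(sign_extend8(0xFE))  # -2
print(f"{effective_offset(0x0100, 0xFE):04X}")  # 00FE
```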
i forgot to mention, there are also "group" opcodes where the actual operation is given in the register field of the mode/reg/RM byte. the 8086 bytecode format is very quirky.
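a quick Python sketch of one such group, opcode 80 (8-bit operand, 8-bit immediate), where the register field picks the operation. this only handles the register-operand form (mode = 3), and the function name is invented for the example.

```python
# Operations selected by the register field for opcode 0x80.
GROUP1 = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

def decode_group1_r8_imm8(code):
    """Decode a 3-byte '80 modrm imm8' instruction with a register operand."""
    opcode, modrm, imm = code
    if opcode != 0x80:
        raise ValueError("not opcode 0x80")
    if modrm >> 6 != 0b11:
        raise ValueError("only register operands (mode = 3) are handled")
    operation = GROUP1[(modrm >> 3) & 0b111]  # register field = operation
    register = REG8[modrm & 0b111]            # R/M field = operand
    return f"{operation} {register},{imm:02X}"

print(decode_group1_r8_imm8(bytes([0x80, 0xC3, 0x05])))  # add bl,05
print(decode_group1_r8_imm8(bytes([0x80, 0xFB, 0x05])))  # cmp bl,05
```

same opcode byte both times; only the register field in the second byte changes the operation.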