| | |
Assembly, machine code and compilers
![]() |
•
•
Join Date: Jun 2006
Posts: 10
Reputation:
Solved Threads: 0
Hi,
I'm trying to construct a compiler for a really simple stack based language to learn how my computer really works, and to learn how to create a compiler. So far, I have never done anything in assembly (only higher level languages). Could you give me advice about the following things:
- If I create this compiler, is it easier/better to compile to assembly, and then let an assembler translate that to machine code, or compile directly to machine code?
- In the assembly tutorials I skimmed so far, there seems to be only one stack. There probably need to be two stacks: one callstack and one stack for data. How can this be done?
- Where do I find information about machine code? I'm looking for the mapping between instructions and binary code. So information that says: 10101001 is the code for push on stack, for
example. (but this isn't necessary if the compiler will compile to assembly)
- This language will only deal with numbers, not strings or other data structures for now. But I do need a way to output (print) the number on top of the stack, this is os-specific, I assume?
- How does an assembler create a windows .exe? Does it have to add extra information and instructions besides the assembled code? Can my compiler create such an exe file, or is it better to use a separate program for that?
- Are there any good online resources on the subject? If there are, where do I find them?
Thanks for your help & time,
Jules
I'm trying to construct a compiler for a really simple stack based language to learn how my computer really works, and to learn how to create a compiler. So far, I have never done anything in assembly (only higher level languages). Could you give me advice about the following things:
- If I create this compiler, is it easier/better to compile to assembly, and then let an assembler translate that to machine code, or compile directly to machine code?
- In the assembly tutorials I skimmed so far, there seems to be only one stack. There probably need to be two stacks: one callstack and one stack for data. How can this be done?
- Where do I find information about machine code? I'm looking for the mapping between instructions and binary code. So information that says: 10101001 is the code for push on stack, for
example. (but this isn't necessary if the compiler will compile to assembly)
- This language will only deal with numbers, not strings or other data structures for now. But I do need a way to output (print) the number on top of the stack, this is os-specific, I assume?
- How does an assembler create a windows .exe? Does it have to add extra information and instructions besides the assembled code? Can my compiler create such an exe file, or is it better to use a separate program for that?
- Are there any good online resources on the subject? If there are, where do I find them?
Thanks for your help & time,
Jules
Last edited by julesjacobs; Jun 18th, 2006 at 1:56 pm.
•
•
•
•
- If I create this compiler, is it easier/better to compile to assembly, and then let an assembler translate that to machine code, or compile directly to machine code?
•
•
•
•
- In the assembly tutorials I skimmed so far, there seems to be only one stack. There probably need to be two stacks: one callstack and one stack for data. How can this be done?
For a stack based language, a call stack and a data stack is a good idea, but that's a layer above the actual assembly level. You might want to look at the Forth language for ideas.
•
•
•
•
- Where do I find information about machine code? I'm looking for the mapping between instructions and binary code. So information that says: 10101001 is the code for push on stack, for
example. (but this isn't necessary if the compiler will compile to assembly)
•
•
•
•
- This language will only deal with numbers, not strings or other data structures for now. But I do need a way to output (print) the number on top of the stack, this is os-specific, I assume?
•
•
•
•
- Are there any good online resources on the subject? If there are, where do I find them?
I'm here to prove you wrong.
•
•
Join Date: Jun 2006
Posts: 10
Reputation:
Solved Threads: 0
Thanks for the quick response!
Is there a 1-to-1 relationship between assembly and machine code? (so it is pretty easy to translate once you've parsed it?) I think I'm going to compile to assembly first, and after that directly to machine code. Does an assembler add extra things to the machine code it generates, or is the generated machine code executable without modifications? Is it possible to save it in a .exe, and execute it under windows?
Do you mean that there is only one "real" stack, but it looks like two? How can mix the calls stack and data stack? If someone evaluates a procedure that pops values off the stack, how can I be sure it doesn't pop return addresses?
Thanks, I've heard of Forth, but I don't know the language. I'll certainly look for interesting things in Forth.
Thanks!
Tama is an interpreted language? Looks good, but it's very minimal as you say (no control structures so far?). It would be turing complete if you add first class named blocks, though.
I already have an interpreter (written in Ruby, +- 100 loc):
The 'proc syntax to quote procedures isn't nice, better would be {pop} {swap pop} ifel, (so inline first-class procs) but this requires more parsing, and I'll do that later.
It's turing complete, you can use recursive procedures to create loops, but this doesn't work well because the language doesn't have proper tail calls yet. This is hard to implement in Ruby, but it's trivial with a compiler (just pop the callstack *before* calling the last procedure in a procedure).
•
•
•
•
Originally Posted by Narue
It's easier to delegate the nitty gritty machine code stuff to an assembler. If you're writing a simple compiler then that's the way you want to go. If you're writing a commercial quality compiler then it's probably better to build the machine code yourself. However, if this is a learning project for how the computer works, you can learn a great deal more by building the output files manually.
•
•
•
•
Originally Posted by Narue
There's no reason to use two stacks when one will do. The "callstack" and "stack for data" are one in the same, and while it's easy to corrupt the stack in assembly by not respecting this duality, a compiler abstracts that problem away for high level programmers.
•
•
•
•
Originally Posted by Narue
For a stack based language, a call stack and a data stack is a good idea, but that's a layer above the actual assembly level. You might want to look at the Forth language for ideas.
•
•
•
•
Originally Posted by Narue
Look for manuals about the processor you're writing the compiler for. For example, you can find opcode information for the x86 architecture at www.intel.com.
•
•
•
•
Originally Posted by Narue
None that I know of. Though you can search these forums for "Tama" and find a very early draft of a stack based scripting language. How much that will help, I have no idea, but it might be useful.
I already have an interpreter (written in Ruby, +- 100 loc):
Assembly Syntax (Toggle Plain Text)
Simple Stack Based Programming Language > swap-pop: swap pop ok > choose: 2 top 'pop 'swap-pop ifel ok > false 1 2 choose . 2 > true 1 2 choose . 1 > double: 2 * ok > square: dup * ok > f: 'double 'square ifel ok > 3 true f . 6 > 3 false f . 9 > 2 3 4 > stack 4, 3, 2 > clear > stack >
The 'proc syntax to quote procedures isn't nice, better would be {pop} {swap pop} ifel, (so inline first-class procs) but this requires more parsing, and I'll do that later.
It's turing complete, you can use recursive procedures to create loops, but this doesn't work well because the language doesn't have proper tail calls yet. This is hard to implement in Ruby, but it's trivial with a compiler (just pop the callstack *before* calling the last procedure in a procedure).
>Is there a 1-to-1 relationship between assembly and machine code?
It depends on whom you ask. Traditional assembly was indeed 1-to-1. Assembly was defined as human readable machine code. However, with the advent of macro facilities, the 1-to-1 source code correspondance with machine code went away. So "technically", when an assembler encodes the machine code, 1-to-1 is there because the macro parser has already run, but to an assembly programmer, 1-to-1 is not there unless you stay strictly low level and don't take full advantage of the assembler's feature set.
>Does an assembler add extra things to the machine code it generates,
>or is the generated machine code executable without modifications?
It depends on how you assemble the machine code. If you assemble to an object file, it's not executable without a linking step to resolve library calls. If you compile to an executable file, then yes, it is executable immediately after the assembly process.
>Is it possible to save it in a .exe, and execute it under windows?
Yes, and if you have an assembler that writes directly to a PE, or if you do it yourself with your compiler, the output file can be immediately executed. There's no magic here; an executable file is just a file format that Windows recognizes as executable. If you know how the format is structured (and the specification is available online), you can create it and produce a .exe file that actually works.
>Do you mean that there is only one "real" stack, but it looks like two?
At the assembly level, there's only one stack, and it only looks like one.
You can create as many stacks as you want, but only one is directly supported by the architecture (referring to x86 assembly).
>If someone evaluates a procedure that pops values off the stack, how
>can I be sure it doesn't pop return addresses?
Don't give them absolute control over the stack. Since you're not writing an assembly language, you can abstract away their ability to directly control the stack. This gives you plently of wiggle room in forcing them to never encounter that situation.
Alternatively, rather than letting them use your low level stack, you provide a separate stack data structure that they can't easily corrupt.
It depends on whom you ask. Traditional assembly was indeed 1-to-1. Assembly was defined as human readable machine code. However, with the advent of macro facilities, the 1-to-1 source code correspondance with machine code went away. So "technically", when an assembler encodes the machine code, 1-to-1 is there because the macro parser has already run, but to an assembly programmer, 1-to-1 is not there unless you stay strictly low level and don't take full advantage of the assembler's feature set.
>Does an assembler add extra things to the machine code it generates,
>or is the generated machine code executable without modifications?
It depends on how you assemble the machine code. If you assemble to an object file, it's not executable without a linking step to resolve library calls. If you compile to an executable file, then yes, it is executable immediately after the assembly process.
>Is it possible to save it in a .exe, and execute it under windows?
Yes, and if you have an assembler that writes directly to a PE, or if you do it yourself with your compiler, the output file can be immediately executed. There's no magic here; an executable file is just a file format that Windows recognizes as executable. If you know how the format is structured (and the specification is available online), you can create it and produce a .exe file that actually works.
>Do you mean that there is only one "real" stack, but it looks like two?
At the assembly level, there's only one stack, and it only looks like one.
You can create as many stacks as you want, but only one is directly supported by the architecture (referring to x86 assembly).>If someone evaluates a procedure that pops values off the stack, how
>can I be sure it doesn't pop return addresses?
Don't give them absolute control over the stack. Since you're not writing an assembly language, you can abstract away their ability to directly control the stack. This gives you plently of wiggle room in forcing them to never encounter that situation.
Alternatively, rather than letting them use your low level stack, you provide a separate stack data structure that they can't easily corrupt.
I'm here to prove you wrong.
•
•
Join Date: Jun 2006
Posts: 10
Reputation:
Solved Threads: 0
•
•
•
•
Originally Posted by Narue
Don't give them absolute control over the stack. Since you're not writing an assembly language, you can abstract away their ability to directly control the stack. This gives you plently of wiggle room in forcing them to never encounter that situation.
Alternatively, rather than letting them use your low level stack, you provide a separate stack data structure that they can't easily corrupt.
If this code will be compiled:
Assembly Syntax (Toggle Plain Text)
b: 3 +
(define a function b, which pushes 3 on the stack, and calls +)
It could be something like this:
Assembly Syntax (Toggle Plain Text)
label b push(3) push(:b1) ; push a return location goto(:+) label b1 pop ; pop :b1 goto(pop)
This wil not work, of course. pop will not pop :b1, but the result of +. Is there a simple way to avoid this? The compiler cannot know how many values a procedure will pop, because you can define things like:
Assembly Syntax (Toggle Plain Text)
c: if pop = 1 then pop else pop pop end
I have no idea how to solve this :o
Mark the address of the return value. If the stack pointer ever drops to that address, any further user pops are an error. But this is what I meant with a separate user stack. That way you can easily check whether there would be underflow and act accordingly without risking your own call stack.
I'm here to prove you wrong.
•
•
Join Date: May 2006
Posts: 19
Reputation:
Solved Threads: 0
Narue did a good job with the answers. On thing I wish to mention is that assemblers are processor specific and generally similar for all the products of one processor manufacturer. Thus you would find many similarities among intel processor assembly languages and the same would go for the assemblers of all motorola products. Each assembler has it's own traits because each processor is different. Among intel assemblers you may find some commands available on only a few processors because these features are not available on most of the processors. In general, however, you would find a large selection of commands that were available on all of their products. For assembly language information you want the processor manufacturers web site. There are a number of intel in disquise processors. Many smaller processor manufacturers use intel assemblers, which leads me to believe they are actually intel.
![]() |
Similar Threads
- writing a a program in machine code (Assembly)
- Windows "Machine Code" (IT Professionals' Lounge)
Other Threads in the Assembly Forum
- Previous Thread: Newbie Confused!
- Next Thread: Procedures
| Thread Tools | Search this Thread |






