Hello everyone, I am currently taking on a project: making a virtual machine. I was wondering if anyone has any tips on writing bytecode interpreters and virtual machines.
I am not really basing it on anything; I am just making a VM with 640K of memory and eight 32-bit registers. I have a rough instruction set, and I was wondering if it makes sense. I would post it now, but it is on my other computer, which I can't get to because I am on a trip. Is there a way to write C++ ints in binary, like you can in hex and octal (and decimal, obviously)? Or do I need to use AND and check the bits? For example, 10111001 (185) AND 10010101 (149) = 10010001 (145), and then:

if((185 & 149)==145){
        //Do code "145"
}

Before C++14 there was no binary form of numeric literal in standard C++ (C++14 added the 0b prefix, as in 0b10111001). Using the bitwise operators as you've shown is probably your best bet, though hex would probably make more sense than decimal.

if((0xb9 & 0x95) == 0x91){
        //Do code "0x91"
}

The reason hex makes sense for this is that 16 is a power of 2 (2^4), so each hex digit corresponds to exactly four bits and the digit places 'line up' with the bits. For example, 10 hex is equal to 00010000 binary. Decimal values do not 'line up' with binary values in this manner, making it harder to see the patterns in the bit values.
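To make the 'lining up' concrete, here is a minimal sketch that pulls two fields out of a byte with hex masks. The field layout (high nibble as opcode, low nibble as register number) and the names opcode_of and reg_of are invented for illustration; they are not taken from the instruction set discussed in this thread. The 0b literals in the checks need C++14 or later.

```cpp
#include <cstdint>

// Hypothetical layout for illustration only: the high nibble of a byte
// is the opcode and the low nibble is a register number.
constexpr std::uint8_t OPCODE_MASK = 0xF0;  // 1111 0000
constexpr std::uint8_t REG_MASK    = 0x0F;  // 0000 1111

// Keep the top four bits and shift them down to the bottom.
constexpr unsigned opcode_of(std::uint8_t byte) {
    return (byte & OPCODE_MASK) >> 4;  // one hex digit is exactly 4 bits
}

// Keep only the bottom four bits.
constexpr unsigned reg_of(std::uint8_t byte) {
    return byte & REG_MASK;
}

// C++14 and later accept binary literals directly.
static_assert(0b10111001 == 0xB9, "same value, different notation");
static_assert((0xB9 & 0x95) == 0x91, "the AND from the posts above");
```

With this layout, the byte 0xB9 splits into opcode 0xB and register 0x9 just by reading its two hex digits, which is much harder to see in the decimal value 185.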

If it helps, you might want to take a look at some existing virtual machines, such as the classic P-code Machine, which is remarkably simple (though written in Pascal, which may be unfamiliar to you). My own Pmac virtual machine might also prove helpful.

Why 640K memory, BTW? How are you handling the addressing fields?


Yeah, I would use hex; I only used decimal for people who don't understand hex that well. The 640K of memory is because I am not the "best" with dynamic memory... I haven't actually got to using it yet. I'll post more detail later.

Sorry for the detached post, I was on my phone.

I am starting to think I am designing it wrong.

Some areas of code are as follows:

void RespondEM(string FirstWord, string Statement){
    stringstream SS(Statement);
    if (FirstWord == "print"){
        Statement.erase(0, 6);
        VM.printstring(Statement);
        cout << "\n";
    }

    else if(FirstWord == "preg"){
        string *PH = new string[2];
        SS >> PH[0] >> PH[1];
        if(PH[1] == "a"||PH[1] == "A"){
            VM.print(&VM.A);
        }
        else if(PH[1] == "b"||PH[1] == "B"){
            VM.print(&VM.B);
        }
        else if(PH[1] == "c"||PH[1] == "C"){
            VM.print(&VM.C);
        }
        else if(PH[1] == "d"||PH[1] == "D"){
            VM.print(&VM.D);
        }
        else if(PH[1] == "e"||PH[1] == "E"){
            VM.print(&VM.E);
        }
        else if(PH[1] == "f"||PH[1] == "F"){
            VM.print(&VM.F);
        }
        else if(PH[1] == "g"||PH[1] == "G"){
            VM.print(&VM.G);
        }
        else if(PH[1] == "h"||PH[1] == "H"){
            VM.print(&VM.H);
        }
        else{
            cout << "Register not found. \n";
        }
        delete[]PH;
    }

    else if(FirstWord == "ldrg"){
        string *PHs = new string[2];
        int *PHi = new int;
        SS >> PHs[0] >> PHs[1] >> *PHi;
        if(PHs[1] == "a"||PHs[1] == "A"){
            VM.move(&VM.A, PHi);
        }
        else if(PHs[1] == "b"||PHs[1] == "B"){
            VM.move(&VM.B, PHi);
        }
        else if(PHs[1] == "c"||PHs[1] == "C"){
            VM.move(&VM.C, PHi);
        }
        else if(PHs[1] == "d"||PHs[1] == "D"){
            VM.move(&VM.D, PHi);
        }
        else if(PHs[1] == "e"||PHs[1] == "E"){
            VM.move(&VM.E, PHi);
        }
        else if(PHs[1] == "f"||PHs[1] == "F"){
            VM.move(&VM.F, PHi);
        }
        else if(PHs[1] == "g"||PHs[1] == "G"){
            VM.move(&VM.G, PHi);
        }
        else if(PHs[1] == "h"||PHs[1] == "H"){
            VM.move(&VM.H, PHi);
        }
        else{
            cout << "Register not found. \n";
        }
        delete PHi;
        delete[] PHs;
    }
    else if(FirstWord == ""){}
    else{
        cerr << "Command not found: " << FirstWord << "\n";
    }
}

This is code for my "beginning" assembly language, which I am planning to implement fully and then build the machine code for. Or should I be doing the opposite? I really do not know where to start, as your post is one of maybe ten things on the internet (exaggerating) that help with what I need.

I just think I am not going in the right direction, even if I am getting the right results. For example, should I be writing functions that return data types filled with the instructions, which are then turned into bytecode? Things like that are confusing me.

Some other design issues bother me too. When I start reading files as programs, should they be in a string format, like
"LDRG A, 5
PREG A"

or in some binary format? Or both? I don't have much experience with binary formats (which I have been meaning to catch up on), but I do have experience with string formats.

Just another question: if I make an enum with the registers in it, should the first register (A) be 0x41 (65 decimal, the ASCII code for 'A') so that I can compare it against the input character with if, or should I switch to case statements?
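One way to sidestep the question is to keep the enum values at 0 to 7 and subtract 'A' from the character instead. The sketch below assumes the registers are stored in an array rather than as eight separate members; Registers, register_index and reg_ref are hypothetical names, not part of the posted Machine class.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical layout: eight registers in an array, so a register
// letter can be turned into an index.
struct Registers {
    int r[8] = {0};  // r[0] is A, r[7] is H
};

// Returns 0..7 for "a".."h" (either case), or -1 if the name is invalid.
int register_index(const std::string& name) {
    if (name.size() != 1) return -1;
    int upper = std::toupper(static_cast<unsigned char>(name[0]));
    if (upper < 'A' || upper > 'H') return -1;
    return upper - 'A';  // 'A' is 0x41 in ASCII, so 'A' - 'A' == 0
}

// Gives access to a register by index; the caller validates the index.
int& reg_ref(Registers& regs, int index) {
    assert(index >= 0 && index < 8);
    return regs.r[index];
}
```

With a lookup like this, each of the long if/else chains in the preg and ldrg branches could shrink to a single call followed by one check for -1.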

While I have never seen a virtual machine which interpreted the assembly code directly, I'm sure it would work and I expect it has been done before. A bytecode would be much easier to interpret, but would require some sort of additional software - an assembler or a compiler for a higher-level language - to make programming it practical. Conversely, you wouldn't need a separate translator with the assembly approach, but it would take more work to interpret - you would, in effect, be combining the assembler with the interpreter.
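As a rough illustration of the "separate translator" route, here is a sketch of a one-line assembler that turns the text form into bytes. The opcode numbers, the byte layout (opcode, register, then a 32-bit little-endian immediate for LDRG) and the function name assemble_line are all assumptions made for this example, since the real instruction set was not posted. It only handles the spacing shown in the thread ("LDRG A, 5").

```cpp
#include <cctype>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical opcode numbers; the real instruction set was not posted.
enum Opcode : std::uint8_t { OP_LDRG = 0x01, OP_PREG = 0x02 };

// Translates one line such as "LDRG A, 5" or "PREG A" and appends the
// encoded bytes to `out`. Returns false if the line is not understood.
bool assemble_line(const std::string& line, std::vector<std::uint8_t>& out) {
    std::istringstream in(line);
    std::string mnemonic, reg;
    if (!(in >> mnemonic >> reg)) return false;
    for (char& c : mnemonic)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    if (reg.back() == ',') reg.pop_back();  // "A," becomes "A"
    if (reg.size() != 1) return false;
    int index = std::toupper(static_cast<unsigned char>(reg[0])) - 'A';
    if (index < 0 || index > 7) return false;

    if (mnemonic == "PREG") {
        out.push_back(OP_PREG);
        out.push_back(static_cast<std::uint8_t>(index));
        return true;
    }
    if (mnemonic == "LDRG") {
        std::int32_t value;
        if (!(in >> value)) return false;
        out.push_back(OP_LDRG);
        out.push_back(static_cast<std::uint8_t>(index));
        // Store the 32-bit immediate low byte first (little-endian).
        std::uint32_t bits = static_cast<std::uint32_t>(value);
        for (int shift = 0; shift < 32; shift += 8)
            out.push_back(static_cast<std::uint8_t>((bits >> shift) & 0xFF));
        return true;
    }
    return false;
}
```

Under these assumptions, "LDRG A, 5" becomes the six bytes 01 00 05 00 00 00 and "PREG A" becomes 02 00. The interpreter then never has to parse text at run time.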

By the way, I don't see any code for handling labels, traditionally the most important - or at least trickiest - part of assembling; the vast majority of assemblers make a separate pass just to collect the relative locations of the labels. What did you mean to do about forward references, that is, jumps to labels that have not been defined yet? Or did you intend to handle all the jump calculations manually?
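The separate label pass can be sketched in a few lines. This is a deliberately simplified model, not a real assembler: it assumes every instruction occupies exactly one slot, that a label is a line ending in ':', and that a hypothetical "JMP name" is the only instruction that refers to a label. The function name resolve_labels is made up for the example.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Simplified model: one slot per instruction, labels are lines ending
// in ':', and "JMP name" is the only instruction that uses a label.
// Returns the program with label names replaced by slot numbers.
std::vector<std::string> resolve_labels(const std::vector<std::string>& lines) {
    // Pass 1: record which slot each label points at.
    std::map<std::string, int> labels;
    int slot = 0;
    for (const std::string& line : lines) {
        if (line.empty()) continue;
        if (line.back() == ':')
            labels[line.substr(0, line.size() - 1)] = slot;
        else
            ++slot;
    }
    // Pass 2: emit instructions; every label is known by now, including
    // ones defined further down in the source.
    std::vector<std::string> out;
    for (const std::string& line : lines) {
        if (line.empty() || line.back() == ':') continue;
        std::istringstream in(line);
        std::string op, arg;
        in >> op >> arg;
        auto found = labels.find(arg);
        if (op == "JMP" && found != labels.end())
            out.push_back("JMP " + std::to_string(found->second));
        else
            out.push_back(line);  // unknown labels are left untouched
    }
    return out;
}
```

A single pass cannot do this on its own, because a jump to a label further down the file is read before the label's position is known.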

I would definitely say that a switch() is going to be easier for a lot of the code, at least with regards to the opcodes themselves.
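For reference, a switch-based interpreter loop might look roughly like the sketch below. The opcodes (HALT, LOADI, ADD), the three-byte encoding and the TinyVm struct are all hypothetical and chosen only to keep the example short; they are not the Pmac or the posted Machine design.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes: one opcode byte followed by operand bytes.
enum Opcode : std::uint8_t { OP_HALT = 0x00, OP_LOADI = 0x01, OP_ADD = 0x02 };

struct TinyVm {
    std::int32_t reg[8] = {0};
    bool fault = false;  // set on an unknown or truncated instruction
};

// Runs bytecode until HALT, a fault, or the end of the program.
void run(TinyVm& vm, const std::vector<std::uint8_t>& code) {
    std::size_t pc = 0;  // index of the next instruction byte
    while (pc < code.size()) {
        switch (code[pc]) {
        case OP_HALT:
            return;
        case OP_LOADI:  // LOADI reg, imm8: reg = imm8
            if (pc + 2 >= code.size()) { vm.fault = true; return; }
            vm.reg[code[pc + 1] & 7] = code[pc + 2];
            pc += 3;
            break;
        case OP_ADD:    // ADD dst, src: dst += src
            if (pc + 2 >= code.size()) { vm.fault = true; return; }
            vm.reg[code[pc + 1] & 7] += vm.reg[code[pc + 2] & 7];
            pc += 3;
            break;
        default:        // opcode not defined in this sketch
            vm.fault = true;
            return;
        }
    }
}
```

Each case is one instruction, so adding an instruction means adding one case, and the default branch catches anything undefined.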


Ohh, that was code from the beginning of Day 1 of the VM (and also the end, because I am on vacation): Load Register and Print Register. My VM class has a lot of stuff hidden away as member functions, but I haven't implemented a jump system yet.

class Machine {
    public:
        int A;
        int B;
        int C;
        int D;
        int E;
        int F;
        int G;
        int H;
        int Memory[163840];  // 640K bytes / 4-byte ints = 163840 words
        void move(int *A, int *B);
        void add(int *A, int *B);
        void sub(int *A, int *B);
        void mul(int *A, int *B);
        void div(int *A, int *B);
        void input(int *A);
        void print(int *A);
        void printstring(string A);
};

This is kind of the first thing I wrote, made to handle all the prints, I/O and memory. It is the Machine in the sense that it is the "box" and "monitor" of the VM. So far it resembles Python's IDLE with $ instead of >>>, because I wanted a way to test the instructions quickly without programming in a roughly 1/1000-finished programming language. As for the bytecode, I am PLANNING on making bytecode, but doing it string-by-string might make it unique, in a way. If I don't do the bytecode soon, I might just write it as an add-on DLL (or something similar); it depends, as I am concerned about performance.

So far the VM consists of a DLL (and a static library that loads it), a static library, and a normal console project. Like I said, that class is the box and monitor (and the keyboard, in a way). Every time the user enters input at the console, it calls up to three functions: one in the console project (implemented), one in the static library (PLANNED), and one in the DLL (implemented). The one in the console project will normally contain the instructions that involve the Machine class directly, dealing with the main instance of Machine. The DLL contains things like the safe-exit calls and generic things that don't need the Machine class (or that use it as little as possible). The static library is kind of the in-between of the two.


My ultimate goal for this project is to create a virtual machine that can be easily adapted to other programming languages/platforms (like Java for operating systems that have Java but I have no access to, and also implementations on phones, at least Android, and even a market, like the app, only for VM programs, with them all running exactly the same on phone or PC). So it cannot be long (well, within 500,000 lines of code at most) and cannot use system calls. I want to get a Windows-GCC build going first, then I'll hand the source around to my friends with Macs, and I can do some of the Linux builds myself. Like I said, I am being overambitious, but I think I am ready for it (or hope so, at least).

Tall order, but I'd be the last person to tell you it can't be done, so go for it!

"like Java for operating systems that have Java but I have no access to"

This sentence confused me a bit. Are you saying you don't have Java on your system, or did you mean you don't have access to the other operating systems? It isn't very clear.

As for not using system calls, I assume you mean that it can't call the system directly; you most certainly do use system calls for the standard iostream operations, they're just hidden behind an abstraction layer.

You might want to look at the source code for Mono, the open source implementation of Microsoft's .NET environment, VM, and languages/compilers. Also, you would learn a lot by inspecting the Java VM source code as well as languages such as Python.

Thanks for all the responses. @Schol-R-LEA, I mean I will have an implementation of the VM in Java, because even the systems I don't have will have Java, sort of like Python and Jython (almost the same idea, but not quite). As for the other things, I don't know where to get the sources; that is not my specialty. I will look.

I have more questions, but they are too long to type on my phone. I'll post when I'm on my laptop.

I really like the nice responses everyone has been giving. I was wondering whether design choices like using IF or SWITCH/CASE affect it. I see that in Schol-R-LEA's Pmac machine you use the SWITCH/CASE method, which makes me wonder whether you chose it for personal reasons and usability or for reasons to do with execution time. Great machine, though; it really helps me understand things, as some of the larger projects are harder to understand, even though they offer some advantages. I have been able to check out the Mono source, and it looks complicated, but it gives me ideas about how the file structure should look.

I used switch() for the main instruction system mainly because it was convenient and because it reflected the overall structure of the operation somewhat better - it is clear that the switch() operation was a single grouping of related operations, whereas separate if() statements, even when grouped together, are not necessarily so connected.

In truth, I was never seriously considering using ad hoc if() statements, though I had considered using a jump table; I decided against that mainly because most compilers will convert a switch() into a jump table anyway, and also because my bytecode is fairly sparse, meaning that there would be a lot of no-op cases in such a table.

As for efficiency, that depends on the particular compiler - though as I said, most compilers will turn a switch() into a jump table (at least when the case values are densely packed), which is quite a time-efficient way of matching a value to an operation.
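For comparison, an explicit jump table is essentially an array of function pointers indexed by the opcode. The sketch below uses made-up opcodes (0x10, 0x11, 0xFF) and a made-up Cpu struct; it is only meant to show the shape of the technique and why a sparse opcode space leaves most of the table pointing at a no-op.

```cpp
#include <array>
#include <cstdint>

// Minimal machine state for the illustration.
struct Cpu {
    std::int32_t acc = 0;
    bool halted = false;
};

using Handler = void (*)(Cpu&);

void op_nop(Cpu&) {}
void op_inc(Cpu& cpu) { ++cpu.acc; }
void op_dec(Cpu& cpu) { --cpu.acc; }
void op_halt(Cpu& cpu) { cpu.halted = true; }

// One slot per possible opcode byte. Unassigned opcodes fall back to
// op_nop, which is where a sparse bytecode wastes most of the table.
std::array<Handler, 256> make_table() {
    std::array<Handler, 256> table;
    table.fill(op_nop);
    table[0x10] = op_inc;   // hypothetical opcode numbers
    table[0x11] = op_dec;
    table[0xFF] = op_halt;
    return table;
}

// Dispatch is a single indexed call, with no chain of comparisons.
void step(Cpu& cpu, const std::array<Handler, 256>& table, std::uint8_t opcode) {
    table[opcode](cpu);
}
```

Here only 3 of the 256 slots do real work, which illustrates the trade-off mentioned above for a sparse bytecode.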

If I were implementing a virtual instruction set and needed to squeeze all the performance I could out of it, I would probably do some bit-masking tests to isolate the different groups of instructions, then use either switches or explicit jump tables to find the specific instructions. Most instruction sets, real or virtual, are designed so that related groups of instructions have some bit pattern in common, as can be seen in the MIPS processor family, a cleanly designed real-world CPU family.
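A rough sketch of that two-level decode follows. The encoding is invented for the example (top two bits select the group, low six bits select the operation) and does not correspond to MIPS or to any instruction set in this thread; decode simply returns a name so the result is easy to inspect.

```cpp
#include <cstdint>
#include <string>

// Hypothetical encoding: the top two bits of the opcode byte pick the
// group and the low six bits pick the operation inside that group.
constexpr std::uint8_t GROUP_MASK = 0xC0;  // 1100 0000
constexpr std::uint8_t OP_MASK    = 0x3F;  // 0011 1111

// Returns a name for the opcode; "?" names mark undefined operations.
std::string decode(std::uint8_t opcode) {
    const unsigned op = opcode & OP_MASK;
    switch (opcode & GROUP_MASK) {
    case 0x00:  // arithmetic group
        switch (op) {
        case 0:  return "add";
        case 1:  return "sub";
        default: return "arith?";
        }
    case 0x40:  // load/store group
        switch (op) {
        case 0:  return "load";
        case 1:  return "store";
        default: return "mem?";
        }
    case 0x80:  // branch group
        return op == 0 ? "jmp" : "branch?";
    default:    // 0xC0 is reserved in this sketch
        return "reserved";
    }
}
```

The outer switch has only four cases, and each inner switch only has to cover the operations of its own group.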

Now that I mention it, you might find it quite enlightening to study the MIPS instruction set, as it is the classic RISC architecture and is widely studied in Computer Systems courses. It is a much easier assembly language than the x86 instruction set, yet it is for a real-world CPU (even if it is mostly found in embedded systems and game consoles these days). There are a number of MIPS simulators for Windows, the best known probably being SPIM (it isn't the best, but it is fairly easy to use), which gives you another virtual machine example to study, albeit a complex one.
