Hi everybody,

Is it possible to write a C++ compiler in C++? It's bootstrapping, right?
In which languages are compilers usually written?
Thanks

Compilers can be written in pretty much any language (I've written one in Python, for example), though C and C++ are probably the most commonly used. There are specialized tools for writing lexical analyzers and parsers, the best known being Lex and Yacc and their various descendants and relatives (Flex, Bison, ANTLR, etc.), but you can write a compiler entirely in C++ if you choose to.

As for bootstrapping (the end result is often called a 'self-hosting' compiler), that refers more specifically to using a compiler to compile itself. This is often done as a test of the correctness (or at least the consistency) of a compiler's output: you compile the new compiler with your existing compiler, then compile it again using the compiler you just built, then compile it a third time using the second compiler's output, and if the resulting object code is identical to that produced in the second step, you can reasonably say that it is producing stable object code. It is not a proof that there are no bugs in the compiler, but it does help catch a lot of them along the way. The technique goes back at least to 1962, was popularized by the 'Small C Compiler' article around 1980, and is still widely used; the GCC build scripts even have an option to do this automatically.

In addition to the links I gave above, a paper that might interest you is Abdulaziz Ghuloum's 'An Incremental Approach to Compiler Construction'. It uses Scheme as a self-hosting language, and also explains how to take advantage of that language's regularities to avoid many of the details that would bog down most compiler projects.

Edited 4 Years Ago by Schol-R-LEA: n/a

>>Is it possible to write a C++ compiler in C++?

Well, the GCC steering committee believes so. I don't know of any compiler that is written entirely in C++, but it is certainly possible. The main obstacle is that compilers need to be rock solid, so you cannot just throw away decades of robust C code to satisfy some C++ zealots. But some projects, like GCC, are now allowing C++ code to be added to the existing C code, and I think there is also talk of creating a separate development branch to port the whole thing to C++ (or at least to make all the C code compliant with C++98, as opposed to ANSI C). Beyond that, I would imagine some compiler vendors use C++ for at least some parts of their compilers (e.g. the front end), but I don't know of any that is known to be entirely in C++, and that really doesn't matter much.

>>It's bootstrapping, right?

Sort of. You can already bootstrap GCC. When I build the latest GCC version from source (which I do every month or so), I compile it with the GCC compiler I already have. The build script that GCC uses compiles the entire C compiler three times: first with whatever C compiler is already installed on the computer, then with the newly built GCC C compiler, and then once more with the compiler produced by that second stage. That is bootstrapping. Once it has the final C compiler, it uses that compiler to build all the other GCC compilers (gfortran, g++, the Java stuff, Ada, Objective-C, etc.). If parts of the compiler's source are written in other languages (the Ada front end, for instance, is written in Ada), then those compilers have to be included in the bootstrapping sequence too. The idea behind this process is that bugs in your current compiler might introduce subtle bugs into the first-stage build; because the later stages are built by GCC itself, the final compiler no longer depends on the quirks of the original one, and comparing the object code of the second and third stages (which should be identical) catches many miscompilations. Additionally, compiler writers typically use an older language or an older standard to make sure that the compilers that will build their source are well tested and robust. GCC, for example, is now considering C++98, in part because by now most compilers do an almost flawless job of compiling C++98 code (at least its more pedantic dialects).

>>In which languages are compilers usually written?

For the most part, C is used to write compilers, as it is for most low-level, close-to-the-metal code. The main reason, I would say, is that the people with the skills needed to write this kind of code are generally more comfortable in C and prefer it. Other languages are then mixed in with the C code. I think that GCC (and probably the Intel compilers) have parts written in Fortran. You are also pretty much guaranteed to find assembly code (inline or in separate listings) in the source files of any compiler. And, as mentioned before, I would expect to find more and more C++ in some of the higher-level parts of the code.

As for other high-level languages like C# or Java, or scripting languages like Python and Perl, they usually have a core runtime environment, compiler, or virtual machine written in C or C++, plus a large standard library written mostly in the high-level language itself (or consisting of simple bindings to C/C++ implementations).
