Hello! I recently wrote a lexer/tokenizer and am now working on a parser for a language with a friend. Everything is going well so far, but I'm beginning to think about how I want to do code generation. Are there languages or systems that generate, for example, C code, compile it, and then execute it? If I were to do this, how should I set it up? What other ways are there to do code generation? Right now I'm thinking about generating C or maybe even D code and compiling that. Any ideas or other methods? Thanks!

I don't get what you mean by "code generation". Could you be a bit more specific? Are you talking about, say, a drop-down menu where you pick "Switch statement" and it generates a switch statement? Or a GUI where you drop in buttons and it generates the code?

Oh no. I'm creating my own tiny scripting language with a friend; we have a lexer and big parts of the parser complete. I just want to know what the simplest and quickest way would be to create and run the final program. I was thinking of generating C or C++ code and having the machine compile it, but that just doesn't seem like a good idea. I was curious what the easiest or best way is to develop a back-end for a compiler, or whether I should use a pre-existing one.

Forgive me if I'm wrong, but doesn't "scripting language" imply that the code is interpreted and not compiled? Generating C/C++ code from the input script and then compiling it with a C/C++ compiler seems to defeat the object here.

I guess the scripting language/lexer is written in C++? If so, shouldn't therefore the lexer decide how the code is interpreted?

I have no idea how it is written as I haven't done anything like this in many, many years and my experience then was lacking a little; but, I'm sure someone else could probably offer a better solution to this problem.

I suppose I used "scripting language" incorrectly in this case. Direct execution is my backup plan if I can't get some sort of compilation working. If I were to execute the code directly, the execution logic would partly live in the parser.
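For anyone reading along, here is a rough idea of what "direct execution" can look like once the parser has produced a tree. This is only a minimal sketch in Python (not D), and the AST shape is an assumption made up for illustration: nested tuples such as ("add", ("num", 2), ("num", 3)). The function name evaluate is hypothetical too.

```python
# Minimal tree-walking evaluator over a hypothetical tuple-based AST.
# Node shapes (assumed for illustration only):
#   ("num", value)
#   ("add" | "sub" | "mul", left_node, right_node)

def evaluate(node):
    kind = node[0]
    if kind == "num":
        # Leaf node: the value is stored directly in the tuple.
        return node[1]
    # Binary node: evaluate both children first, then combine them.
    left = evaluate(node[1])
    right = evaluate(node[2])
    if kind == "add":
        return left + right
    if kind == "sub":
        return left - right
    if kind == "mul":
        return left * right
    raise ValueError("unknown node kind: " + str(kind))


# 2 + 3 * 4
tree = ("add", ("num", 2), ("mul", ("num", 3), ("num", 4)))
result = evaluate(tree)
```

Keeping the evaluator as a separate pass over the tree, rather than running code from inside the parser, makes it easier to swap in a code generator later.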

There are two common options for code generation. Most compilers have an "intermediate language" that is kind of like C but less human-readable (i.e., a basic procedural language, but more compact). Other compilers simply generate C code and then compile it (for example, the Comeau C++ compiler just generates C code, and another example is MATLAB, which can also "compile" MATLAB code into C code and then compile that).
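To make the "generate C code" option concrete, here is a small sketch of a generator that turns an expression tree into a C program as text. It is written in Python for brevity, and everything about it is an assumption for illustration: the tuple-based AST (("num", 2), ("add", left, right), and so on) and the function names emit_expr and emit_program are made up, not taken from any of the tools mentioned above.

```python
# Sketch: translate a hypothetical tuple-based AST into C source text.
# Node shapes (assumed for illustration only):
#   ("num", value)
#   ("add" | "sub" | "mul", left_node, right_node)

C_OPERATORS = {"add": "+", "sub": "-", "mul": "*"}

def emit_expr(node):
    kind = node[0]
    if kind == "num":
        return str(node[1])
    if kind in C_OPERATORS:
        # Always parenthesise so precedence in the C output matches the tree.
        left = emit_expr(node[1])
        right = emit_expr(node[2])
        return "(" + left + " " + C_OPERATORS[kind] + " " + right + ")"
    raise ValueError("unknown node kind: " + str(kind))

def emit_program(node):
    # Wrap the expression in a main() that prints its value.
    lines = [
        "#include <stdio.h>",
        "",
        "int main(void) {",
        '    printf("%d\\n", ' + emit_expr(node) + ");",
        "    return 0;",
        "}",
    ]
    return "\n".join(lines) + "\n"


tree = ("add", ("num", 2), ("mul", ("num", 3), ("num", 4)))
c_source = emit_program(tree)
```

The resulting string is ordinary C, so it can be written to a file and handed to whatever C compiler is available.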

Writing a good back-end is super difficult (i.e., that's where all the crazy optimizations and static analysis go on). So, you would probably be much better off using an existing back-end, like GCC, LLVM, or any C compiler. Generating C code is probably the easiest thing, but if you want tighter integration with existing back-ends, you might want to use their intermediate languages (note that GCC and LLVM each have their own intermediate language).

Comments
Good advice.

Thanks for the info, I have researched these things, I have found limited resources on LLVM and none on GCC. What I'm mainly curious about is how the things you listed above that generate C code have that compiled? Do they use a compiler pre-installed on the host machine? Or do they have some sort of tiny compiler shipped with the software they use? I have been looking into this, is that something I should be using for compiling C?

And my lexer and parser are written in the D language, but it can interface to C/C++.

What I'm mainly curious about is how the things you listed above that generate C code have that compiled?

They would generally rely on the system's C compiler (if there is one). For instance, MATLAB's Real-Time Workshop (RTW) just generates a bunch of C source files and makefiles (and, I think, some Tcl scripts), and then it just "makes" that project; the build detects whatever C compiler is installed on the system and uses that.

This is pretty much the same as what you would do to create any library. You need build scripts that are capable of detecting the installed compiler (and dependencies) and use that to compile all the source. You just do exactly the same, but for your generated code.
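As a rough illustration of "detect the installed compiler and use it on the generated code", here is a minimal Python sketch. The function names find_c_compiler and compile_and_run are hypothetical, and it assumes a POSIX-like system with a GCC/Clang-style command line (`compiler file.c -o output`); MSVC uses different flags and would need its own branch.

```python
import os
import shutil
import subprocess
import tempfile

def find_c_compiler(candidates=("cc", "gcc", "clang")):
    # Honour the conventional CC environment variable first, then fall
    # back to a few common compiler names looked up on PATH.
    env_cc = os.environ.get("CC")
    names = ([env_cc] if env_cc else []) + list(candidates)
    for name in names:
        path = shutil.which(name)
        if path:
            return path
    return None

def compile_and_run(c_source):
    # Write the generated C to a temporary directory, compile it with
    # the detected compiler, run the result, and return its stdout.
    compiler = find_c_compiler()
    if compiler is None:
        raise RuntimeError("no C compiler found on PATH")
    with tempfile.TemporaryDirectory() as workdir:
        src = os.path.join(workdir, "generated.c")
        exe = os.path.join(workdir, "generated")
        with open(src, "w") as f:
            f.write(c_source)
        subprocess.run([compiler, src, "-o", exe], check=True)
        finished = subprocess.run([exe], capture_output=True, text=True, check=True)
        return finished.stdout
```

A real build script would also want to report compiler errors in a friendlier way, but the overall flow (find compiler, write source, invoke, run) stays the same.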

I have been looking into this, is that something I should be using for compiling C?

I don't see the point. If someone is going to install your special "New Language" compiler, wouldn't he be able to install any of the standard C compilers (GCC, Clang, MSVC, etc.)? In fact, he most probably would already have it installed.

Another possibility is to distribute your compiler as an add-on to GCC or LLVM (which already has many).

But, to be honest, I would worry about distribution later.

Thanks for the info, I have researched these things, I have found limited resources on LLVM and none on GCC.

In GCC, the intermediate language is called the Register Transfer Language (link to full specifications).
In LLVM, it is called the intermediate representation (link to full specifications).
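To give a feel for what targeting an intermediate language means in practice, here is a small sketch that prints LLVM's textual IR for an arithmetic expression. It is written in Python purely for illustration; the tuple-based AST, the function name emit_ir, and the generated function name @compute are all assumptions, and a real front-end would more likely go through LLVM's own APIs than build strings by hand.

```python
# Sketch: lower a hypothetical tuple-based AST to textual LLVM IR.
# Node shapes (assumed for illustration only):
#   ("num", value)
#   ("add" | "sub" | "mul", left_node, right_node)

IR_INSTRUCTIONS = {"add": "add", "sub": "sub", "mul": "mul"}

def emit_ir(node):
    lines = []
    counter = 0

    def walk(n):
        nonlocal counter
        kind = n[0]
        if kind == "num":
            # Integer constants can be used directly as operands.
            return str(n[1])
        if kind not in IR_INSTRUCTIONS:
            raise ValueError("unknown node kind: " + str(kind))
        left = walk(n[1])
        right = walk(n[2])
        # Each intermediate result gets a fresh name, since IR values
        # are assigned exactly once.
        counter += 1
        name = "%t" + str(counter)
        lines.append("  " + name + " = " + IR_INSTRUCTIONS[kind]
                     + " i32 " + left + ", " + right)
        return name

    result = walk(node)
    header = ["define i32 @compute() {", "entry:"]
    footer = ["  ret i32 " + result, "}"]
    return "\n".join(header + lines + footer) + "\n"


tree = ("add", ("num", 2), ("mul", ("num", 3), ("num", 4)))
ir_text = emit_ir(tree)
```

The output is plain text, so it can be saved to a `.ll` file and passed to LLVM's command-line tools for optimization and native code generation.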

Well, I'm just commenting here to share my first guess about making a scripting language.

First of all, figure out what software this lexer/tokenizer will run inside, e.g. an existing browser or your own home-made browser, which would give the script access to specific, flexible built-in functions for controlling things there. And on a specific platform such as Windows, all I can think of is to make your script generate an existing kind of script that the platform already understands.

Well it's just an idea. Just sharing :)
