Hi all, I've been thinking of writing a compiler in Java for my final year project. I have started to browse the net for help, but to be honest it has confused me rather than helped me get a clear picture of what I would like to do.
OK, let me explain what I mean by writing a compiler in Java: I mean to write a compiler in the Java language to compile another language such as C, C++, C#, or Python (any one of them).
So I was just wondering what the necessary steps may be and where I should start. I would be very grateful if you could recommend some good reading material.
Moreover, what if I need the same compiler as a cloud application? Meaning that the compiler would be running on my server and users could access it using a simple web browser.

Thanks in advance.

So you've taken a whole bunch of buzzwords and just strung them together without knowing what they mean, and now want us to tell you what you've thought up?
Good start for your project...

If you Google "writing a compiler" you will find lots and lots of tutorial material that will tell you what you need.
It shouldn't be difficult to extend it to run on a server from a web browser.

First off, as JamesCherrill states, there is a vast amount of tutorials and other resources covering the subject of compiler development around, and while much of it is of questionable value, it should be possible to find something you can use to begin your work. This forum reference, while dated, is a good place to start. Also, the OS-Dev Wiki group is currently getting ready to fork off a compiler dev wiki, and while it is only in the nascent stages, the pages they do have up may prove helpful (if perhaps a bit discouraging). Two (very different) resources I recommend are Crenshaw's "Let's Build a Compiler" series (which never did get completed, but still gives a good starting point), and Ghuloum's "An Incremental Approach to Compiler Construction", which gives a very detailed explanation of how to write a Scheme compiler (in Scheme, but it can be applied to any implementation language, as demonstrated here).

That having been said, you are probably not really aware of the scope of the project you are preparing to undertake. Compiler design is demanding of both programming skill and theoretical knowledge, and if you haven't taken a formal course on the subject (or at least an online course), you are likely to miss important factors on how to do it most effectively. At the very least, you should read at least one textbook on the subject, preferably cover to cover. There are at least two available online: Basics of Compiler Design by Mogensen, and Compilers and Compiler Generators. However, most of the serious texts, such as the Dragon Book, are quite expensive. Amazon has an extensive selection to choose from.

For a different perspective, you could also try Lisp In Small Pieces, which covers both compilers and interpreters for the Lisp family of languages (primarily Scheme). While most of the techniques really only apply to Lisp, there is a lot one can learn from it.

I wouldn't worry about putting it on the cloud, however; the compiler itself will be a big enough task, and then some. Besides, if you are targeting JVM bytecode as your final output, making it work through a web page will be the least difficult part of the project.

My final piece of advice is not to be too ambitious. Writing a professional-grade compiler is the work of years on the part of many programmers. Don't expect to support more than a small subset of the language you choose to implement in the time you have allotted for it. You'll need to have a thorough understanding of the source language, the implementation language (which from what you said would be Java), and the target language before you begin your design.

Hmm... I will put it in short then. To implement a compiler, simple things you need are...

  • 1) knowledge of the language you are writing the compiler in,
  • 2) knowledge of the input language,
  • 3) knowledge of the output to be used (i.e. computer architecture, hardware, language, etc.),
  • 4) knowledge of how to convert from the input language to the output (language), and
  • 5) knowledge of optimization (optional)

If you are not an expert in all of the first 4, your work goes down the drain, because the compiled result may not work or may contain bugs and/or vulnerabilities. No one would want to compile their code with your compiler if the output is not correct, let alone if it is slow (optimization). This task is too big for you to chew.

To let you know about the time to even implement a simple compiler without bugs (i.e. from a course), it would take months of coding and intensive testing. Not recommended...
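To give a feel for items 2 through 4 in the list above, here is a minimal sketch of a toy compiler in Java. Everything in it is an assumption made for illustration: the class name `TinyExprCompiler`, the input language (integer expressions with `+ - * /` and parentheses), and the output language (instructions `PUSH n`, `ADD`, `SUB`, `MUL`, `DIV` for an imaginary stack machine). It is nowhere near a compiler for C or Python, which is rather the point of the replies above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy compiler: translates integer expressions into
// instructions for an imaginary stack machine (not a real instruction set).
final class TinyExprCompiler {
    private final String src;                            // input program text
    private int pos = 0;                                 // current read position
    private final List<String> code = new ArrayList<>(); // emitted instructions

    private TinyExprCompiler(String src) { this.src = src; }

    // Compiles one expression and returns the instruction list.
    static List<String> compile(String source) {
        TinyExprCompiler c = new TinyExprCompiler(source);
        c.expression();
        c.skipSpaces();
        if (c.pos != c.src.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + c.pos);
        }
        return c.code;
    }

    // expression := term (('+' | '-') term)*
    private void expression() {
        term();
        while (true) {
            char op = peek();
            if (op != '+' && op != '-') return;
            pos++;
            term();
            code.add(op == '+' ? "ADD" : "SUB"); // operands are already on the stack
        }
    }

    // term := factor (('*' | '/') factor)*
    private void term() {
        factor();
        while (true) {
            char op = peek();
            if (op != '*' && op != '/') return;
            pos++;
            factor();
            code.add(op == '*' ? "MUL" : "DIV");
        }
    }

    // factor := NUMBER | '(' expression ')'
    private void factor() {
        char c = peek();
        if (c == '(') {
            pos++;
            expression();
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            pos++;
        } else if (Character.isDigit(c)) {
            int start = pos;
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            code.add("PUSH " + src.substring(start, pos));
        } else {
            throw new IllegalArgumentException("Expected a number or '(' at position " + pos);
        }
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    // Returns the next non-space character, or '\0' at end of input.
    private char peek() {
        skipSpaces();
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        // Prints: PUSH 1, PUSH 2, PUSH 3, PUSH 4, SUB, MUL, ADD (one per line)
        for (String instr : compile("1 + 2 * (3 - 4)")) System.out.println(instr);
    }
}
```

Even this sketch has to know the grammar of the input, the shape of the output, and the mapping between them; variables, control flow, types, and functions each add a comparable layer on top.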

I hate to say it, but Taywin is right. Your typical Compiler Development course runs a full semester and only gets as far as implementing the most basic language functionality (assignment, simple expressions, conditionals, and maybe definite loops), while targeting a simplified hardware emulator (that's about where the course I took left off, in fact; you can see the final version of my 'baby' Algol compiler in my GitHub repo). A graduate follow-up course would usually be needed just to have time to implement indefinite loops and function calls, with POD data structures as an extra credit option most won't get to. I don't want to discourage you, but even a toy compiler is a major undertaking.

I might add that if you really intend to press on with this, you'll want to know at least two things first: the basics of Finite State Automata, as they are used in the lexical analysis phase of compiling, and how to use a version control system such as Subversion, Mercurial, Git, or Bazaar, as you will code yourself into a corner at least once during the project.
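As a rough illustration of how a finite state automaton drives lexical analysis, here is a minimal sketch in Java. The class name `DfaLexer`, the three states, and the token spellings (`IDENT(...)`, `NUMBER(...)`, `SYMBOL(...)`) are all made up for this example; a real lexer would also handle keywords, string literals, comments, multi-character operators, and error reporting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer built as an explicit finite state machine.
final class DfaLexer {
    private enum State { START, IN_IDENT, IN_NUMBER }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder lexeme = new StringBuilder();
        State state = State.START;
        int i = 0;
        // Loop one step past the end so a virtual end-of-input character
        // ('\0') flushes whatever token is still being built.
        while (i <= input.length()) {
            char c = i < input.length() ? input.charAt(i) : '\0';
            switch (state) {
                case START:
                    if (Character.isLetter(c)) {
                        lexeme.append(c);
                        state = State.IN_IDENT;
                    } else if (Character.isDigit(c)) {
                        lexeme.append(c);
                        state = State.IN_NUMBER;
                    } else if (c != '\0' && !Character.isWhitespace(c)) {
                        tokens.add("SYMBOL(" + c + ")"); // single-character token
                    }
                    i++; // START always consumes the current character
                    break;
                case IN_IDENT:
                    if (Character.isLetterOrDigit(c)) {
                        lexeme.append(c);
                        i++;
                    } else {
                        // Do not consume c: START will look at it again.
                        tokens.add("IDENT(" + lexeme + ")");
                        lexeme.setLength(0);
                        state = State.START;
                    }
                    break;
                case IN_NUMBER:
                    if (Character.isDigit(c)) {
                        lexeme.append(c);
                        i++;
                    } else {
                        tokens.add("NUMBER(" + lexeme + ")");
                        lexeme.setLength(0);
                        state = State.START;
                    }
                    break;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [IDENT(x1), SYMBOL(=), NUMBER(42), SYMBOL(+), IDENT(y)]
        System.out.println(tokenize("x1 = 42+y"));
    }
}
```

Each `case` is one state of the automaton and each `if` branch is one transition, which is why the theory is worth learning before writing the lexer by hand or feeding a specification to a lexer generator.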

Let me be as blunt as I can about this last point: if you don't have an account on a repository such as Sourceforge, Heroku, or GitHub, get one right away and use it. Writing any non-trivial software without off-site revision control is just plain dumb, especially when most repos offer free basic accounts. It may seem like an unneeded extra step, but it will save your butt at least once. Which one you use is a matter of personal preference (I personally use Git for the most part, as you may have noticed), but like with indent styles, what matters most is that you use one of them, and that you're consistent about which one you use.

OP, are you talking about creating something like IDEOne, which is an "online compiler" and allows executing code remotely (or in the "cloud" as you put it)? If yes, that should be a doable project wherein you provide a web interface where users enter code, and submitting it executes the code in a sandboxed environment. A challenging project for sure, but definitely easier than "creating" a compiler in Java for all the languages...

That's an excellent point; we may have been reading more into the OP's words than was intended.

m4mukulgarg: Is your plan to make an existing compiler accessible through a web page (a la IDEOne), or is developing a new compiler from scratch your main goal? While neither is a small project, they are orders of magnitude apart in difficulty.

The way I understood it, he wants to create a compiler of his own, which may not be something you want to set a (near) deadline on for one person.

yes, he's got no clue what he's trying to do. He's just taken a bunch of buzzwords, strung them together, and now expects us to magically make it all work so he can turn it in as his own work.

jwenting: I agree that it certainly looks like that, and for all I know you may be right (the part suggesting it compile more than one language is particularly damning), but I prefer to give the benefit of the doubt for cases like this. I'd rather hear from the OP to see what they have to say for themselves before jumping to that conclusion.

Even if you are right, I'd rather treat it as an opportunity to teach something useful to someone rather than simply smack them down for their ignorance. Seeing how the OP hasn't replied to anything said so far, it is likely a moot point anyway; I am well aware, just as you are, that the majority of first-time posters never return. That doesn't mean we shouldn't make an effort to help those who eventually do.

(I'll also admit a bit of personal interest in this anyway, as language design and translation is one of my main interests in the field. I have a very ambitious ongoing project of my own in this area, though I have deliberately kept the core language minimal precisely to make it feasible to complete in a few years' time. That, plus the choice of Lisp dialects for both the implementation language (R6RS Scheme) and the source language (Thelema) - and even the target language (my Assiah x86 assembler) - mean I have some real shot at finishing it in a reasonable time frame. Even so, a 'reasonable time frame' is on the order of a year for the assembler and at least another year for the basic compiler, followed by maybe two more years for the library.)

Schol-R-LEA: even though I admit that JWenting's posts aren't poster examples of "constructive criticism", he does have a point.

One should never choose something as a final year project without knowing, at the very least, what it is and how to start. It's setting yourself up for problems, since you won't be able to make good estimates (both in time and effort) and probably won't reach either your goal (what to develop) or your deadline.

As you said yourself, a compiler for multiple languages, or even a compiler for a single language if all the details have to be worked out, is a serious project that might provide several developers work for years. This isn't something one person (whether very experienced or not) should commit to delivering in less than one year, especially if he knows up front he won't be able to work full-time on the project.

When someone starts a project without realizing what it involves, sometimes a (rather rude) wake-up call will be more effective than twenty people saying: "it'll be difficult, but maybe you can try this", even though the latter is a more approachable way to say he should either reduce the scope or change projects.

Sigh. Yes, that is true. Looking back on the thread, I have to say I let my enthusiasm for the subject get the better of me. I should have laid out more clearly how impractical the idea was, especially when the OP was considering implementing more than one source language.

In the end, JWenting and Stultuske are right: the OP is way over his head with this. Even a scaled-down toy language would be hard if not impossible to complete in the time given, especially if you don't have a clear idea already of how to do it. As I said earlier, even a two-semester course sequence rarely gets past the most basic compiler techniques, and trying to write a full compiler - by which I mean one that implements more than 80% of a language's definition, never mind a commercial-grade implementation - without the necessary background is setting yourself up for failure.
