hi

i hope you to help me .
my teacher in introduction to programming concept course ask me to make a small realy small compiler
that i did not study before ..
i don't have any idea about compiler never
if you can give me tutorial about it ..plz
i don't know where the starting point

Recommended Answers

All 29 Replies

I think you need to go back to your teacher and get some details. Asking you to create a compiler, regardless of size, in an intro to programming course is a little excessive.

we have holiday ..
so i cant asking ..

.. i just need the starting .. :(

.. i just need the starting ..

Compiler development is quite a can of worms to be opening when you're in an intro class (and presumably very green as far as programming goes). That's why I think you're misunderstanding the assignment and need clarification from the teacher.

But if you're really interested, I'd start with BNF for defining the grammar of your simple language. And you will need to define your own language to keep this compiler simple enough, because pretty much any posted grammar out there will have an eye more toward functionality than simplicity and be harder to parse[1].

Once you have a grammar you can write code to break a sequence of input characters (the source code) into tokens for subsequent parsing. Tokenization isn't especially difficult unless you have a ton of ambiguity in the grammar, which as a beginner you'll probably have a hard time avoiding at first.

Parsing is when the tokens are given semantic meaning, and this is where the simplest grammar possible makes your life easier. Ambiguities in the grammar will make parsing exponentially harder. Unless the grammar is strictly linear[2], you'll want to look at recursive descent parsing to evaluate it. I'd strongly recommend that your first language be linear though, so you can get a feel for the parts of a compiler without making any of them especially difficult.

A compiler is technically just a program that converts one language into another. Often the destination language is machine code, but you can output C++ as well and call it good. Alternatively you can write an interpreter that will directly execute your source code, which is very easy with a linear language.


[1] Unless you want to write a compiler for Brainfuck, or some other esoteric minimalist language.
[2] By linear I mean assembly-like in that there are no nested scopes, functions, loops, recursion, and such. Just straight top to bottom statements and maybe simple jumps (a gimped version of C++'s goto).

I doubt very much that your teacher is really expecting you to write a "real" compiler in the sense that is produces machine code, because that requires much more than a theoretical idea of the inner workings of a compiler, it requires to know very low-level details about how to set up application loading and entry points and things like that. I'm guessing he is expecting you to right a kind of virtual machine that interprets and executes a simple program.

You should start incrementally. I would suggest you restrict the domain of your language. For example, if you restrict the domain to simple numerical computing, then you can basically just write a glorified calculator.

I would suggest you start by writing a simple calculator that can parse and compute mathematical expressions (a = b + 4 * c) or something like that. Then add mathematical functions (sin, cos, pow, etc.). I would probably recommend that you use implicit types, this way you can assume all types are floating-point values (and round them if they ought to be integers).

When that is done, you can add things like arrays and loops. if-statements shouldn't be so hard either.

You should also make your "standard library" as simple as possible, just a few functions like "getUserInput(a)" to fill the variable "a" with some user input and similarly a "print(b)" to print the value of b to the console.

Your "compiler" should probably just parse / tokenize the code and then translate it to some simple byte codes (like a simple assembly language, but more high-level a bit). And then have your virtual machine execute that. If you can get that done, I don't see how your prof could refuse you the highest mark possible.

I doubt very much that your teacher is really expecting you to write a "real" compiler in the sense that is produces machine code, because that requires much more than a theoretical idea of the inner workings of a compiler, it requires to know very low-level details about how to set up application loading and entry points and things like that. I'm guessing he is expecting you to right a kind of virtual machine that interprets and executes a simple program.


.

she is know about that .
she give us a difficult question ..
she say ." i know you didn't study it befor ,so study know to do your combiler by yourself "

i realy didn't have any idea about this ..and i spend a lot of time to learn something .and search in the internet ,. but i fail to find something helpfull .:(
if you know haw i can write in c++ ..
she give just idea . waht should our compiler do .like :
a=a+b
or combiler to solve linear equation
or define loops

Ok. It seems to me that your teacher wants you to think about how to parse some well-formed statements into tokens and a structure that allows you to further evaluate them. There are several key steps you need to implement:

1. Reading the input and breaking it into separate pieces (called tokens or symbols).
2. Organizing the symbols into some sort of structure (such as a tree or list) that makes evaluation of them possible.
3. Evaluating the structure and symbols it contains to obtain a result.

So, the thing is to start you thinking about data structures and algorithms. As the title of Niklaus Wirth's (the inventor of the Pascal and Modula languages) seminal CS text book states "Algorithms + Data Structures = Programs". :-)

No doubt, the easiest language to write a compiler for is Brainf*ck. Writing an interpreter for this language is probably gonna require less lines of code that the "Hello World" program in Brainf*ck.

No doubt, the easiest language to write a compiler for is Brainf*ck. Writing an interpreter for this language is probably gonna require less lines of code that the "Hello World" program in Brainf*ck.

ROFLMAO! I think it should be renamed to "Brain Fart"... :-)

int lex() {
	getchar();
	switch (charclass)
		/* parse identifiers and reserved words */
	case LETTER:
		addchar();
		getchar();
		while (charClass == LETTER
			charClass == DIGIT) {
		addchar();
		getchar();
		}
		return lookup(lexeme);
		break;
		/* parse integer literals */
	case DIGIT:
		addchar();
		getchar();
		while (charClass == DIGIT) {
		addchar();
		getchar();
		}
		return INT_LIT;
		break;
} /* End of switch */
} /* end of function lex */

can any one explain this for me ..
how i can run it in c++ compiler ....

No doubt, the easiest language to write a compiler for is Brainf*ck. Writing an interpreter for this language is probably gonna require less lines of code that the "Hello World" program in Brainf*ck.

That's pretty strange. Then again so is this.

int lex() {
	getchar();
	switch (charclass)
		/* parse identifiers and reserved words */
	case LETTER:
		addchar();
		getchar();
		while (charClass == LETTER
			charClass == DIGIT) {
		addchar();
		getchar();
		}
		return lookup(lexeme);
		break;
		/* parse integer literals */
	case DIGIT:
		addchar();
		getchar();
		while (charClass == DIGIT) {
		addchar();
		getchar();
		}
		return INT_LIT;
		break;
} /* End of switch */
} /* end of function lex */

this exaple that me teacher give us ..
but i didn't understand ..
how i can run it in c++ compiler

^^
your welcome ..

plz how is understand the abpve code ..

Either that code is _very_ incomplete, or it is not valid C++ code (i.e. just a guideline example that doesn't really work or compile, but just gives an idea of the process).

what should i start with as a begaining..

You should start by defining the language that you will be "compiling".

You need to define the types that you will use.

Then, define the operators (+ - = * / etc.).

And then, define some standard functions like print and scan.

I would suggest you use a simple subset of C.

Then, start by writing a parser that will look for the different tokens (type names, operator symbols, standard functions names, etc.). Try to identify the tokens and see how well it does by taking a simple code in your programming language and printing out the list of tokens you find it in, with their identification as a variable, operation, function call, etc.

then....

i will use c++ language ..
my idea is make a compiler for a simple calculater ..


i will back

>>i will use c++ language ..

stick to a very simplified version of C. Don't even dream of using the C++ language as the language that your "compiler" will interpret (C++ is essentially the hardest major programming language to parse, very few good parsers for it exists and most C++ compilers are based the few that work).

Go with the simple calculator idea.

see
i search and i found the first stage is lexical analyser
2-syntactic analyzer
3-semantic analyzer

see

exp:      NUM             { $$ = $1;         }
        | exp  '+'     { $$ = $1 + $2;    }
        | exp  '-'     { $$ = $1 - $2;    }
        | exp  '*'     { $$ = $1 * $2;    }
        | exp  '/'     { $$ = $1 / $2;    }

laxem and token like this :

 laxems         tokens
  $$              idenifier
   =               assign_op
   $1               idenifier
    +                 Plus_op
     *                 Multi_op
     /                 division_op
      -                 sub_op
   $2                  idenifier

i think i lost very much
oooh my goad compiler realy a difficult thing ..

oooh my goad compiler realy a difficult thing ..

Compiler development is second only to operating system development as far as difficulty goes, in my opinion.

Compiler development is second only to operating system development as far as difficulty goes, in my opinion.

I concur.

I do not. Making a good compiler is very difficult indeed, but I can name few tasks much more difficult.

commented: Like answering question with the details give in many of these threads. +16

is there any ne who can make a small -realy small- compiler
like
a=b+c
just for sum operatoer
to understand how i can work to define my operators
because i didn't study this course at all ..

You are starting to get a handle on what the process of "compilation" consists. How complex each of these steps are, depends on the language. Some, such as a simple calculator, will be very simple, and not too much code. Others, such as languages like C++, are infinitely more complex. Fortunately, you don't need to write one of those! :-)

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<ctype.h>
#define BSIZE 128
#define NONE -1
#define EOS '\0'
#define NUM 256
#define DIV 257
#define MOD 258
#define ID 259
#define DONE 260
int lineno=1;
 int tokenval=NONE;
 int lookahead;


struct entry{
	char *lexptr;
	int token;
};

struct entry symtable[];

 
error(m)
char *m;
{
	fprintf(stderr, "line %d: %s\n", lineno, m);
	exit(1);
}
 
#define STRMAX 999
#define SYMMAX 100
char lexemes[STRMAX];
int lastchar = -1;
struct entry symtable[SYMMAX];
int lastentry = 0;
//lookupÏÇáÉ
int lookup(s)
char s[];
{
	int p;
	for (p = lastentry; p > 0; p =p-1)
		if(strcmp(symtable[p].lexptr, s) == 0)
			return p;
		return 0;
}



 

	int insert(s, tok)
char s[];
int tok;
{
	int len;
	len = strlen(s);
	if(lastentry + 1 >= SYMMAX)
		error("symbol table full");
	if(lastchar + len + 1 >= STRMAX)
		error("lexemes array full");
	lastentry = lastentry +1;
	symtable[lastentry].token = tok;
	symtable[lastentry].lexptr = &lexemes[lastchar + 1];
	lastchar = lastchar + len +1;
	strcpy(symtable[lastentry].lexptr, s);
	return lastentry;
}

 lexer 
char lexbuf[BSIZE];
 

int lexan()
{
	int t;
	while(1){
		t = getchar();
		if(t==' '|| t=='\t')
			;
		else if(t=='\n')
			lineno= lineno +1;
		else if(isdigit(t)){
			ungetc(t,stdin);
			scanf("%d",&tokenval);
			return NUM;
		}
		else if (isalpha(t)) {
			int p,b =0;
			while(isalnum(t)) {
				lexbuf[b] = t;
				t=getchar();
				b=b+1;
				if( b>= BSIZE)
					error("complier error");
			}
			lexbuf[b] = EOS;
			if (t != EOF)
				ungetc(t,stdin);
			p = lookup(lexbuf);
			if (p==0)
				p= insert(lexbuf, ID);
			tokenval = p;
			return symtable[p].token;
		}
		else if (t==EOF)
			return DONE;
		else {
			tokenval= NONE;
			return t;
		}
	}
}
 
int lookahead; 
match(t)
int t;
{
	if (lookahead ==t)
		lookahead = lexan();
	else error("syntax error");
}


 
emit(t, tval)
int t ,tval;
{
	switch(t) {
	case '+': case '-': case '*': case'/':
		printf("%c\n",t); break;
	case DIV:
		printf("DIV\n"); break;
	case MOD:
		printf("MOD\n"); break;
	case NUM:
		printf("%d\n", tval); break;
	case ID:
		printf("%s\n", symtable[tval].lexptr); break;
	default:
		printf("token %d, tokenval %d\n", t, tval);
	}
}
//factor 

int expr();
factor()
{
	switch(lookahead) {
	case')':
		match(')');
		expr();
		match(')');
		break;
	case NUM:
		emit(NUM, tokenval); 
		match(NUM);
		break;
	case ID:
		emit(ID,tokenval);
		match(ID); 
		break;
	default:
		error("syntax error");

	}
}
//ÏÇáÉterm
int term()
{
	int t;
	factor();
	while(1)
		switch(lookahead) {
case'*' : case'/' : case DIV : case MOD:
	t= lookahead;
	match(lookahead);  factor();  emit(t,NONE);
	continue;
default:
	return 0;


	}
}

 
int expr()
{
	int t;
	term();
	while(1)
		switch (lookahead) {
case '+': case'-':
	t =lookahead;
	match(lookahead); term(); emit(t,NONE);
	continue;
	default:
	return 0;

	}
}

// parse
parse()
{
	lookahead= lexan();
	while (lookahead != DONE) {
		expr();  match(';');
	}
}


//init 

struct entry keywords[] = {
	"div",DIV,
		"mod",MOD,
	
	0,    0
};
init()
{
	struct entry *p;
	for(p = keywords; p->token; p++)
		insert(p->lexptr, p->token);
}
main()
{
	init();
	parse();
	exit(1);
}

i make and my groupwork this code
what is the error ?

Well, some indicator of either compilation or linkage or runtime errors would be appropriate, before you ask us to analyze a 200+ lines of code program.

Also, change the function signatures to use ANSI instead of early K&R standards. IE, instead of this:

int insert(s, tok)
char s[];
int tok;
{
.
.
.
}

int expr()
{
.
.
.
}

use this:

int insert(char s[], int tok)
{
.
.
.
}

int expr(void)
{
.
.
.
}

Compilers: Principles, Techniques, and Tools by Andrew S. Tanenbaum

Compilers: Principles, Techniques, and Tools by Andrew S. Tanenbaum

That's a good one. Another is Holub's "Compiler Design in C".

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.