Hey all,

This is my first time posting on GoExpert, so I'm excited to see what the community here has to offer! Anyway, I am currently working on creating a compiler (using Flex and Bison) in C for a subset of the Pascal programming language in my Compilers course... and to be honest - I'm a bit stuck!

I think I've made a lot of progress, but I'm getting stuck on the transition from semantic analysis to code generation... and what's worse, I'm not entirely sure I've got the semantic analysis part down!

I've created a Lex file (.l), a Bison/YACC file for my grammar (parser), a symbol table file that holds my insert() and lookup() functions for the symbol records, and a code generation file (although there are some pieces of code in there that I'm unsure about, e.g. how I translated the STORE/JUMP instructions).

Anyway, let me get to the specific question/request: I need some help modifying my .y Bison file to install symbols, generate code, and back-patch labels for the different parts of the Pascal grammar. For example:

Here's a small snippet of my grammar (starting symbol):

program:            program_header   block   PERIOD
;
program_header: PROGRAM IDENT SEMICOLON { install( $2 ); }
;


** where install( $2 ) installs the identifier (symbol name) into the symbol table. **
** where lowercase are non-terminals and UPPERCASE are terminals (Keywords in Pascal) **

I really have plenty of questions and would probably bog down this forum post if I went into any more random detail... if you have any clue about Compiler Design using Flex/Lex and Bison/YACC, I would REALLY appreciate your attention and help. I could and probably would need to go into more detail if I get some responses.

Just let me know what I need to elaborate on or to upload any files you need to view like: the mini-Pascal grammar, my lex file, my bison file, my symbol table, code generator, etc.

Thanks in advance,

-Krysis09

Edited 3 Years Ago by happygeek: fixed formatting

This is no GoExpert, yet we do "have a clue".
So, what is your question?

Heh, sorry for the slip up there... anyway, on to my question. I've been referencing Anthony A. Aaby's book: Compiler Construction using Flex and Bison at the following link: Aaby's Book

I am at the point in my compiler creation at which I need to modify my parser file (.y file) to both install and generate code/labels for the grammar (pg. 38 in Aaby's book) for the subset of the Pascal language... but I'm confused as to how to go about it. More specifically, I don't understand WHEN to "install" identifiers, generate code, or, as Aaby put it, "context_check" identifiers in the grammar. I tried making a start on my calls to install and gen_code, but I doubt I'm doing it correctly...

Install, gen_code(), and context_check() are all methods described by Mr. Aaby which:
1. Place a symbol into the symbol table
2. Generate the code for a particular terminal symbol (I think!)
and
3. Check whether a particular symbol exists in the symbol table... (or something to that effect).

Here are the methods I've written (basically based off of Aaby's methods) to install, generate code, and check the context of symbols of the grammar:

void install( char *sym_name )
{
        symrec *s;

        s = lookup( sym_name );
        if(s == 0)
                s = insert(sym_name);
        else
        {
                errors++;
                printf( "%s is already defined\n", sym_name );
        }
}

void context_check( enum code_ops operation, char *sym_name )
{
        symrec *identifier;

        identifier = lookup( sym_name );

        if( identifier == 0 )
        {
                errors++;
                printf( "%s", sym_name );
                printf( "%s\n", " is an undeclared identifier." );
        }
        else
                gen_code( operation, identifier->offset );
}
/* Generates code at the current offset */
void gen_code( enum code_ops operation, int arg )
{
        code[code_offset].op = operation;
        code[code_offset++].arg = arg;
}

Here insert() and lookup() are functions in my symbol table header file that insert a new symbol and look up a symbol name in the symbol table, respectively.

Attached is my .y file and the grammar for the subset of Pascal that I'll be creating the compiler for.

Hopefully, I've explained myself well enough.

EDIT: Symtab.h contains my symbol table, codeGen.h = code generator, stackMachine = contains instructions, pascalScanner = lex file, pascalParser = parser (Bison/YACC) file, pascalGrammar = actual Pascal grammar.

Edited 6 Years Ago by Krysis: Attaching files (lex, symbol Table, code Generator)

Attachments
/* Code Generation */

int data_offset = 0; 	// Initial offset starting at ZERO

/* Reserves the next data location and returns its address */
int data_location()
{
	return data_offset++;
}

int code_offset = 0;	// Start of the code segment

/* Reserves a code location (to be back-patched later) and returns its address */
int reserve_loc()
{
	return code_offset++;
}

/* Returns the current code offset (the address of the next instruction) without reserving it */
int gen_label()
{
	return code_offset;
}

/* Generates code at the current offset */
void gen_code( enum code_ops operation, int arg )
{
	code[code_offset].op = operation;
	code[code_offset++].arg = arg;
}

/* Generates code at some previously reserved address */
void back_patch( int addr, enum code_ops operation, int arg )
{
	code[addr].op = operation;
	code[addr].arg = arg;
}
MicroPascal Grammar:

        program   ->  program_header block "."
        program_header   ->  "program" identifier ";"
        block   ->   declarations compound_statement
        declarations   ->   declarations declaration | e
        declaration   ->   "var" variable_decls | procedure_decl
        variable_decls   ->   variable_decls variable_decl |
                              variable_decl
        variable_decl   ->   identifier_list ":" identifier ";"
        procedure_decl   ->   procedure_header block ";"
        procedure_header   ->   "procedure" identifier
                                "(" parameter_list ")" ";" |
                                "procedure" identifier ";"
        parameter_list   ->   parameter_group |
                                parameter_list ";" parameter_group
        parameter_group   ->   "var" identifier_list ":" identifier |
                                identifier_list ":" identifier
        compound_statement   ->  "begin" statement_list "end"
        statement_list   ->  statement | statement_list ";"
                                statement
        statement   ->  e // empty statement // |
                        identifier |
                        identifier "(" expression_list ")" |
                        identifier ":=" expression |
                        "while" expression "do" statement |
                        "repeat" statement_list "until" expression |
                        "if" expression "then" statement |
                        "if" expression "then" statement
                        "else" statement |
                        compound_statement
        expression_list   ->  expression | expression_list "," expression
        expression   ->  simple_expression |
                         simple_expression rel_op simple_expression
        simple_expression   ->  term | add_op term |
                                simple_expression add_op term
        term   ->  factor | term mul_op factor
        factor   ->  identifier |
                     constant |
                     "(" expression ")"
        constant   ->  string_constant | number
        number   ->  unsigned_integer |
                        floating_point
        identifier_list   ->  identifier |
                              identifier_list "," identifier
/*
Use the line:

%option yylineno

when compiling with flex

*/

%option yylineno

%{

#include "y.tab.h"
#include <math.h>
#include <string.h>		/* For strdup */
#include "micropc.tab.h"	/* For token definitions and yylval */

void fatal( char *, char * );
void error( char *, char * );

%}

letter	[a-zA-Z]
digit	[0-9]
singquo \'\'
string	\'[^'\n\r]*\'|\'[^'\n\r]*{singquo}[^'\n\r]*\'
comment "{".*"}"|"(*".*"*)"
ident	{letter}({letter}|{digit})*
uint	{digit}{digit}*
error	{uint}{ident}+
plus	\+
minus   \-
exp	[eE]({plus}|{minus})?{uint}
float	{uint}"."{uint}|{uint}"."{uint}{exp}
times   \*
divide  \/
equals  \=
assign	":="
comma	\,
colon	\:
smcol	\;
lftpar	\(
rgtpar	\)
lessth	\<
grthan	\>
lesseq	"<="
lesgrt	"<>"
grteq	">="
ws      [ \n\t\r]
period	"."

%%

[iI][fF]				return IF;
[tT][hH][eE][nN]			return THEN;
[bB][eE][gG][iI][nN]			return BEGIN1;
[eE][nN][dD]				return END;
[eE][lL][sS][eE]			return ELSE;
[dD][iI][vV]				return MULOP;
[pP][rR][oO][cC][eE][dD][uU][rR][eE]	return PROCED;
[vV][aA][rR]				return VAR;
[pP][rR][oO][gG][rR][aA][mM]		return PROGRAM;
[wW][hH][iI][lL][eE]			return WHILE;
[dD][oO]				return DO;
[rR][eE][pP][eE][aA][tT]		return REPEAT;
[uU][nN][tT][iI][lL]			return UNTIL;
[fF][uU][nN][cC][tT][iI][oO][nN]        |
function	return KEYWORD; 

{ident}		{ yylval.id = (char *) strdup(yytext);
		  return (IDENT); }
{singquo}	printf( "SINGLE QUOTE: '");
{string}        return STRCONST;
{error}		{ error ( "Illegal combination (INT and IDENT): ", yytext); }
{uint}		return UNSIGNED;
{exp}		return EXPONENT;
{float}		return FLOAT;
{minus}         |
{plus}		return ADDOP;
{divide}       	|
{times}		return MULOP;
{equals}	return EQUALS;
{lessth}	|
{grthan}	|
{lesseq}	|
{lesgrt}	|
{grteq}		return RELOP;
{lftpar}	return LFTPAR;
{rgtpar}	return RGTPAR;
{assign}	return ASSIGN;
{comma}		return COMMA;
{colon}		return COLON;
{smcol}		return SEMICOLON;
{comment}	;
{ws}		;
{period}	return PERIOD;
.		{ error( "Illegal input: ", yytext ); }

%%

int yywrap( void )
{
  return 1;
}
/*
int main( int argc, char **argv )
{
  int tok;

  if ( argc > 1 ) {

    yyin = fopen( argv[1], "r" );

    if ( yyin == NULL )
      fatal( argv[0], "cannot open input file" );
  }

  while( tok = yylex() ) {
    switch( tok ) {
    case KEYWORD:
      printf( "KEYWORD: %s\n", yytext );
      break;
    case IDENT:
      printf( "INDENTIFIER: %s\n", yytext );
      break;
    case UNSIGNED:
      printf( "UNSIGNED INT: %s (%d)\n", yytext, atoi(yytext) );
      break;
    case FLOAT:
      printf( "FLOATING POINT: %s (%g)\n", yytext, atof(yytext) );
      break;
    case RELOP:
      printf( "RELATIONAL OPERATION: %s\n", yytext );
      break;
    case PLUS:
      printf( "PLUS\n");
      break;
    case MINUS:
      printf( "MINUS\n");
      break;
    case TIMES:
      printf( "TIMES\n");
      break;
    case DIVIDE:
      printf( "DIVIDE\n");
      break;
    case EQUALS:
      printf( "EQUALS\n");
      break;
    case LFTPAR:
      printf( "LPAR\n");
      break;
    case RGTPAR:
      printf( "RPAR\n");
      break;
    case STRCONST:
      printf( "STRING CONSTANT: %s\n", yytext);
      break;
    case EXPONENT:
      printf( "EXPONENT: %s\n", yytext);
      break;
    default:
      printf( "Unknown token: %d\n", tok );
      break;
    } // switch
  } // while

return 0;
} // main
*/
/*
 * Declare tokens
 */
%start program
%token KEYWORD
%token <id> IDENT
%token <intval> UNSIGNED
%token FLOAT 
%token ADDOP  
%token MINUS   
%token MULOP   
%token DIVIDE  
%token EQUALS  
%token RELOP   
%token LFTPAR  
%token RGTPAR  
%token STRCONST 
%token EXPONENT 
%token ASSIGN  
%token COMMA   
%token COLON   
%token SEMICOLON 
%token PERIOD  
%token NESTED  
%token PROGRAM
%token VAR    
%token PROCED
%token BEGIN1
%token END    
%token <lbls> WHILE  
%token DO     
%token REPEAT 
%token UNTIL   
%token <lbls> IF      
%token THEN    
%token ELSE    
%token DIV

%expect 1
%{

#include <stdio.h> 		/* For error messages and I/O */
#include <stdlib.h>		/* For malloc in symbol table */
#include <string.h>		/* For strcmp in symbol table */
#include "symtab.h"		/* The Symbol Table Header File */
#include "stackMachine.h"	/* The Stack Machine Header File */
#include "codeGen.h"		/* The Code Generator Header File */

#define YYDEBUG 1 	/* For debugging */

void free(void *ptr);
void yyerror ( const char *msg );
int yylex( void );
int errors;			/* Error count-incremented in CodeGen */

struct lbs
{
	int for_goto;
	int for_jump;
};

/* Allocate space to hold the labels */
struct lbs *newlblrec()
{
	return (struct lbs*) malloc(sizeof(struct lbs));
}

void install( char *sym_name )
{
	symrec *s;
	
	s = lookup( sym_name );
	if(s == 0)
		s = insert(sym_name);
	else
	{
		errors++;
		printf( "%s is already defined\n", sym_name );
	}
}

void context_check( enum code_ops operation, char *sym_name )
{
	symrec *identifier;
	
	identifier = lookup( sym_name );

	if( identifier == 0 )
	{
		errors++;
		printf( "%s", sym_name );
		printf( "%s\n", " is an undeclared identifier." );
	}
	else
		gen_code( operation, identifier->offset );
}

extern int yylineno;

%}

%union semanticRecord	/* The Semantic Records */
{
	int intval;		/* For Integer Values */
	char *id;		/* For Identifiers */
	struct lbs *lbls;	/* For label back-patching */
}
/*

	MicroPascal Grammar:

	program   ->  program_header block "."  
	program_header   ->  "program" identifier ";"  
	block   ->   declarations compound_statement  
	declarations   ->   declarations declaration | e  
	declaration   ->   "var" variable_decls | procedure_decl  
	variable_decls   ->   variable_decls variable_decl |  
			      variable_decl  
	variable_decl   ->   identifier_list ":" identifier ";"  
	procedure_decl   ->   procedure_header block ";"  
	procedure_header   ->   "procedure" identifier  
				"(" parameter_list ")" ";" |  
				"procedure" identifier ";"  
	parameter_list   ->   parameter_group |  
				parameter_list ";" parameter_group  
	parameter_group   ->   "var" identifier_list ":" identifier |  
				identifier_list ":" identifier  
	compound_statement   ->  "begin" statement_list "end"  
	statement_list   ->  statement | statement_list ";"  
				statement  
	statement   ->  e // empty statement // |  
			identifier |  
			identifier "(" expression_list ")" |  
			identifier ":=" expression |  
			"while" expression "do" statement |  
			"repeat" statement_list "until" expression |  
			"if" expression "then" statement |  
			"if" expression "then" statement  
			"else" statement |  
			compound_statement  
	expression_list   ->  expression | expression_list "," expression  
	expression   ->  simple_expression |  
			 simple_expression rel_op simple_expression  
	simple_expression   ->  term | add_op term |  
				simple_expression add_op term  
	term   ->  factor | term mul_op factor  
	factor   ->  identifier |  
		     constant |  
		     "(" expression ")"  
	constant   ->  string_constant | number  
	number   ->  unsigned_integer |  
			floating_point  
	identifier_list   ->  identifier |  
			      identifier_list "," identifier 

*/

%%
program:	program_header 
		block 
		PERIOD 		{ gen_code( HALT, 0 ); YYACCEPT; }
		;

program_header:	PROGRAM IDENT SEMICOLON { install( $2 ); }
		;

block:		declarations compound_stat
		;

declarations:	declarations declaration
		| 
		/* empty statement */
		;

declaration:	VAR variable_decls
		|
		procedure_decl
		;

variable_decls:	variable_decls variable_decl
		|
		variable_decl
		;

variable_decl:	identifier_lst COLON IDENT SEMICOLON 	{ context_check( PUSHA, $3 ); }
		;

procedure_decl:	procedure_head block SEMICOLON
		;

procedure_head:	PROCED IDENT LFTPAR parameter_list RGTPAR SEMICOLON { install( $2 ); }
		|
		PROCED IDENT SEMICOLON { install( $2 ); }
		;

parameter_list:	parameter_grp
		|
		parameter_list SEMICOLON parameter_grp
		;

parameter_grp:	VAR identifier_lst COLON IDENT { context_check( $4 ); }
		|
		identifier_lst COLON IDENT { context_check( $3 ); }
		;

compound_stat:	BEGIN1 statement_list END
		;

statement_list:	statement
		|
		statement_list SEMICOLON statement
		;

statement:	/* empty statement */
		|
		IDENT { context_check( $1 ); }
		|
		IDENT LFTPAR expression_lst RGTPAR { context_check( $1 ); }
		|
		IDENT ASSIGN expression { context_check( $1 ); }
		|
		WHILE expression DO statement
		|
		REPEAT statement_list UNTIL expression
		|
		IF expression THEN statement
		|
		IF expression THEN statement ELSE statement
		|
		compound_stat
		;

expression_lst:	expression
		|
		expression_lst COMMA expression
		;

expression:	simple_express
		|
		simple_express RELOP simple_express
		;

simple_express:	term
		|
		ADDOP term
		|
		simple_express ADDOP term
		;

term:		factor
		|
		term MULOP factor
		;

factor:		IDENT { context_check( $1 ); }
		|
		constant
		|
		LFTPAR expression RGTPAR
		;

constant:	STRCONST
		|
		number
		;

number:		UNSIGNED
		|
		FLOAT
		;

identifier_lst:	IDENT { context_check( $1 ); }
		|
		identifier_lst COMMA IDENT { context_check( $3 ); }
		;


%%
#ifdef YYDEBUG
extern int yydebug;
#endif
extern int yylineno;

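int main( int argc, char *argv[] )
{
	extern FILE *yyin;
	++argv; --argc;

	/* Read from the named file if one is given; otherwise keep the default input */
	if( argc > 0 && ( yyin = fopen( argv[0], "r" ) ) == NULL )
	{
		perror( argv[0] );
		return 1;
	}
//	yydebug = 1;
	yyparse();
	return 0;
}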
void yyerror( const char *msg )
{
	printf("Line %d: %s\n", yylineno, msg);
}
/* Stack Machine Header */

/* Code Operations: Representation in Machine */

enum code_ops { PUSHCI, PUSHI, POPI, PUSHGI, POPGI, FETCHI, POPII, PUSHA, PUSHGA, PUSHS, STOREI,
		ADDI, SUBI, MULI, DIVI, NEGI,
		EQI, NEI, LTI, LEI, GTI, GEI,
		JUMPZ, JUMPNZ, JUMP,
		ENTER, ALLOC,
		SETRVI, RETURN, RETURNF, CALL,
		INT, INTB, FLT, FLTB,
		HALT };

/* Code Operations: External Representation */

char *op_name[] = { "pushcI", "pushI", "popI", "pushgI", "popgI", "fetchI", "popiI", "pusha", "pushga", "pushs", "storeI",
		  "addI", "subI", "mulI", "divI", "negI",
		  "eqI", "neI", "ltI", "leI", "gtI", "geI",
		  "jumpz", "jumpnz", "jump",
		  "enter", "alloc",
		  "setrvI", "return", "returnf", "call",
		  "int", "intb", "flt", "fltb",
		  "halt" };

struct instruction
{
	enum code_ops op;
	int arg;
};

/* Code Segment */

struct instruction code[999];

/* Run-time Data and Expression Stack */

int stack[999];

/* Special Purpose Registers */

int	pc = 0;		// Program Counter
struct 	instruction ir;	// Instruction Register
int	ar = 0;		// Frame Pointer (Activation Record)
int	top = 0;	// Stack Pointer (Top of Stack)

/* Fetch Execution Code Cycle */

void fetch_execute_cycle()
{
	do
	{
		/* Fetch the operation */
		ir = code[pc++];
	
		/* Execute the operation */
		switch (ir.op)
		{
			case HALT:
				printf( "halt\n" );
				break;
			case PUSHCI:
				top = top + 1;
				stack[top] = ir.arg;
				break;
			case PUSHI:
				top = top + 1;
				stack[top] = stack[ar + ir.arg];
				break;
			case POPI:
				stack[ar + ir.arg] = stack[top--];
				break;
			case PUSHGI:
				top++;
				stack[top] = stack[ar + ir.arg];
				break;
			case POPGI:
				stack[ar + ir.arg] = stack[top--];
				break;
			case FETCHI:
				stack[top] = stack[stack[top]];
				break;
			case POPII:
				stack[stack[top-1]] = stack[top];
				top -= 2;
				break;
			case PUSHA:
				top++;
				stack[top] = ar + ir.arg;
				break;
			case PUSHGA:
				top++;
				stack[top] = ar + ir.arg;
				break;
			case PUSHS:
				top++;
				stack[top] = ir.arg;
				break;
			case STOREI:
				stack[stack[top-1]] = stack[top];
				stack[top-1] = stack[top];
				top--;
				break;
			case ADDI:
				stack[top-1] = stack[top-1] + stack[top];
				top--;
				break;
			case SUBI:
				stack[top-1] = stack[top-1] - stack[top];
				top--;
				break;
			case MULI:
				stack[top-1] = stack[top-1] * stack[top];
				top--;
				break;
			case DIVI:
				stack[top-1] = stack[top-1] / stack[top];
				top--;
				break;
			case NEGI:
				stack[top] = 0 - stack[top];
				break;
			case EQI:
				if( stack[top-1] == stack[top] )
					stack[--top] = 1;
				else
					stack[--top] = 0;
				break;
			case NEI:
				if( stack[top-1] != stack[top] )
					stack[--top] = 1;
				else
					stack[--top] = 0;
				break;
			case LTI:
				if( stack[top-1] < stack[top] )
					stack[--top] = 1;
				else
					stack[--top] = 0;
				break;
			case LEI:
				if( stack[top-1] <= stack[top] )
					stack[--top] = 1;
				else
					stack[--top] = 0;
				break;
			case GTI:
				if( stack[top-1] > stack[top] )
					stack[--top] = 1;
				else
					stack[--top] = 0;
				break;
			case GEI:
				if( stack[top-1] >= stack[top] )
					stack[--top] = 1;
				else
					stack[--top] = 0;
				break;
			case JUMPZ:
				if( stack[top--] == 0 )
					pc = ir.arg;
				break;
			case JUMPNZ:
				if( stack[top--] != 0 )
					pc = ir.arg;
				break;
			case JUMP:
				pc = ir.arg;
				break;
			case ENTER:
				stack[top+1] = &stack[ar]; // base(N)?
				stack[top+2] = ar;
				top += 3;
				break;
			case ALLOC:
				top = top + ir.arg;
				break;
			case SETRVI:
				stack[ar] = stack[top--];
				break;
			case RETURN:
				top = ar - 1;
				pc = stack[ar+2];
				ar = stack[ar+1];
				break;
			case RETURNF:
				top = ar;
				pc = stack[ar+2];
				ar = stack[ar+1];
				break;
			case CALL:
				ar = top - (ar+2);
				stack[ar+2] = pc;
				pc = ir.arg;
				break;
			case INT:
				stack[top] = ((int)stack[top]);
				break;
			case INTB:
				stack[top-1] = ((int)stack[top-1]);
				break;
			case FLT:
				stack[top] = ((long)stack[top]);
				break;
			case FLTB:
				stack[top-1] = ((long)stack[top-1]);
				break;
		}
	}
	while (ir.op != HALT);

} // end fetch_execute_cycle()
/**
 **    This example can be compiled with either C or C++ compiler
 **/

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <search.h>

/*   this is an Entry   */

struct symrec
{
  char *name;	/* Name of the symbol */
  int offset;	/* Data offset */
  int   val;	/* Value of symbol */
};
typedef struct symrec symrec;

symrec *root = NULL;

/* and its comparison function -- note the explicit casts */

int symrec_cmp( const void *e1, const void *e2)
{
  return strcmp( ((symrec*)e1)->name, ((symrec*)e2)->name );
}

void printEntry( const void *node, VISIT v, int level )
{
  symrec *e = (*(symrec **)node);

  if( v == postorder || v == leaf )
    printf( "Level: %d, %s: %d\n", level,
            e->name, e->val );
}
/*
void insert( char *key, int value )
{
	symrec *new_symrec;
	symrec **found;

	new_symrec = (symrec*) malloc( sizeof(symrec) );
	new_symrec->name = strdup( key );
	new_symrec->val = value;

	printf( "Inserting %s...\n", key);
	fflush( stdout );
	found = (symrec**) tsearch( new_symrec, (void**)&root, symrec_cmp);

	// If already exists (a nested block comment here would end the outer comment early)
	if (*found != new_symrec )
		printf("symrec already exists.\n");
	else
		printf("Inserted %s.\n", key);
}
*/
symrec *insert( char *key )
{
        symrec *new_symrec;
        symrec **found;

        new_symrec = (symrec*) malloc( sizeof(symrec) );
        new_symrec->name = strdup( key );
	
	new_symrec->offset = data_location();

        printf( "Inserting %s...\n", key);
        fflush( stdout );
        found = (symrec**) tsearch( new_symrec, (void**)&root, symrec_cmp);

        /* If already exists */
        if (*found != new_symrec )
	{
                printf("symrec already exists.\n");
	}
	else
	{
            	printf("Inserted %s.\n", key);
	}
	/* tsearch returns a pointer to the stored symrec pointer, so dereference it */
	return *found;
}
symrec *lookup( char *key )
{
	symrec probe;		/* Temporary search key; only the name is compared */
	symrec **found;

	probe.name = key;

	found = (symrec**) tfind( &probe, (void**)&root, symrec_cmp );

	if ( found == NULL )
	{
		printf("Could not locate symrec.\n");
		return 0;
	}
	else
	{
		printf( "Found %s. Value: %d\n", (*found)->name, (*found)->val );
		return *found;	/* tfind returns a pointer to the stored symrec pointer */
	}
}
/*
int main ()
{
  symrec **found;
  symrec  *new_symrec;

  /* create a new symrec */
/*  new_symrec = (symrec*) malloc( sizeof(symrec) );
  new_symrec->name = strdup( "b" );
  new_symrec->val = 11;

  insert(new_symrec->name, new_symrec->val);

  /* repeat with a couple more entries */
/*  new_symrec = (symrec*) malloc( sizeof(symrec) );
  new_symrec->name = strdup( "a" );
  new_symrec->val = 12;

  insert(new_symrec->name, new_symrec->val); 

  new_symrec = (symrec*) malloc( sizeof(symrec) );
  new_symrec->name = strdup( "c" );
  new_symrec->val = 13;

  insert(new_symrec->name, new_symrec->val);

  /* now try to insert an already existing key */
/*  new_symrec = (symrec*) malloc( sizeof(symrec) );
  new_symrec->name = strdup( "a" );
  new_symrec->val = 14;
 
  insert(new_symrec->name, new_symrec->val);

  new_symrec = (symrec*) malloc( sizeof(symrec) );
  new_symrec->name = strdup( "d" );
  new_symrec->val = 141;

  insert(new_symrec->name, new_symrec->val);

  /* and now lookup an existing key */
/*  new_symrec = (symrec*) malloc( sizeof(symrec) );
  new_symrec->name = strdup( "b" );
  printf( "Looking up \"b\"..." );
  fflush( stdout );

  lookup(new_symrec->name);

  /* and now lookup an non-existing key */
/*  new_symrec = (symrec*) malloc( sizeof(symrec) );
  new_symrec->name = strdup( "xxx" );
  printf( "Looking up \"xxx\"..." );
  fflush( stdout );
  found = (symrec**) tfind( new_symrec, (void**)&root, symrec_cmp );
  if ( found == NULL )
    printf("not there\n");
  else
    printf( "found it: %s val %d\n", (*found)->name, (*found)->val );

  twalk( root, printEntry );

  return 0;
}
*/

I only had a brief look at the grammar file.

One thing that strikes immediately is that you call context_check in the variable_decl rule. This is definitely wrong. The variable declaration introduces a new identifier, so you want to install it into a symbol table, rather than make sure it exists.

Another is that you do not need to (and in fact, cannot) context_check on expressions (as you do in the 2nd and 3rd choices in the statement rule). You are correctly checking at the IDENT choice, and that should be enough.

Regarding gen_code, in my opinion, it does not belong to the context_check. A check is a check, no more, no less. Code should be generated later, when you have a sensible piece of code (such as a statement) parsed into an AST. But, at the state the project is right now, I wouldn't worry about code generation at all.

Hmm, I see.

Soo, I made the suggested corrections to the grammar (removed the unnecessary context_checks), but when I came to this part of your reply... I was confused. What do you mean "gen_code does not belong to the context_check"?

Can you clarify as to what direction I need to move towards to get my grammar looking more "sensible"? Other than removing the context_checks which were placed in error, I really didn't know where to progress from there.

Thanks.

You have the context_check function which calls gen_code(). In my opinion this is wrong.

context_check( enum code_ops operation, char *sym_name )
{
        symrec *identifier;
        identifier = lookup( sym_name );
        if( identifier == 0 )  {
            ...
        } else
                gen_code( operation, identifier->offset );
}

The said opinion is in part due to the functionality of context_check, which only validates the existence of the symbol. Really checking the context is much more involved (see below).

Can you clarify as to what direction I need to move towards to get my grammar looking more "sensible"? Other than removing the context_checks which were placed in error, I really didn't know where to progress from there.

The "sensible" referred to the Pascal code being parsed. As for direction, I'd concentrate efforts first on the symbol table. Right now you only store symbol names there. You need much more, of course.
First of all, symbol types. I am not sure if your grammar supports any type system, but at least you have to somehow tell apart variables from procedures.
Second, local variables. Currently your symbol table is flat, which means that every symbol belongs to a global scope. Each procedure should have a private symbol table, and a proper name lookup discipline should be devised.

That will give you a more or less solid validator.

In parallel, you must create a bunch of test cases (the more the better). Think of the most incomprehensible yet valid constructs; think of the most subtle syntax errors. Once your validator correctly parses valid programs and correctly flags erroneous ones, you may start to worry about code generation.

Thanks.

Always welcome.

Hey Krysis

I have a similar project for one of my courses now, with a changed grammar. Will you be able to upload the whole project files so that I can use them for my project?

Thanks in advance

If you check the dates, you would see that this thread is four years old, and that Krysis has not been on Daniweb since then.

Please avoid thread necromancy in the future, thank you.

Edited 1 Year Ago by Schol-R-LEA
