Hi,
I'm writing a program with lex and yacc. I want to add some invariants at certain places in an input file. My expected output is the same input file with the invariants placed among its statements. How can I do it? I can transform the input file to output in lex, but the order of invariants and statements is not right: I want each invariant placed before its statement. Any suggestions?

I have used Lex and Yacc and hate them! Part of the problem is that writing parsing rules is not trivial, and the other part is that they generate in-line code that cannot be easily modified - any changes are wiped out the next time you generate the code. My preference, after having written a number of parsers over the years, is to use an optimizing finite-state machine that builds a parser from a text-based description of the event-state transitions. Once the basic functions are defined, you can just adjust the description to change the behavior of the state machine - no recompilation required. I have used this to build display processors, FTP and other TCP/IP protocol handlers, and more. I have found this approach better, faster, and more efficient than the Lex+Yacc approach.

That said, this approach requires a lot of up-front work, but downstream it provides a lot of leverage. First of all, you REALLY need to understand how finite-state machines work.
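To make the table-driven idea more concrete, here is a minimal sketch in C (an assumption, not the poster's actual system) of a state machine whose transitions are loaded from a text description with one "from event to" triple per line. The functions load_transitions and next_state, the fixed table size, and the description format are all hypothetical; the point is only that changing the behavior means editing the description text, not the code.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TRANSITIONS 64
#define NAME_LEN 32   /* matches the %31s width used with sscanf below */

/* One row of the transition table: in state `from`, event `event`
   moves the machine to state `to`. */
struct transition {
    char from[NAME_LEN];
    char event[NAME_LEN];
    char to[NAME_LEN];
};

static struct transition table[MAX_TRANSITIONS];
static int table_size;

/* Load transitions from a text description containing one
   "from event to" triple per line; lines without three words are
   skipped. Returns the number of transitions loaded. */
int load_transitions(const char *description)
{
    const char *line = description;

    table_size = 0;
    while (*line != '\0' && table_size < MAX_TRANSITIONS) {
        char buf[3 * NAME_LEN + 8];
        const char *nl = strchr(line, '\n');
        size_t len = nl != NULL ? (size_t)(nl - line) : strlen(line);
        struct transition *t = &table[table_size];

        if (len >= sizeof buf)
            len = sizeof buf - 1;     /* truncate over-long lines */
        memcpy(buf, line, len);       /* scan one line at a time */
        buf[len] = '\0';
        if (sscanf(buf, "%31s %31s %31s", t->from, t->event, t->to) == 3)
            table_size++;
        if (nl == NULL)
            break;
        line = nl + 1;
    }
    return table_size;
}

/* Look up the next state for (state, event); NULL means the table has
   no transition for that pair. */
const char *next_state(const char *state, const char *event)
{
    int i;

    for (i = 0; i < table_size; i++)
        if (strcmp(table[i].from, state) == 0
            && strcmp(table[i].event, event) == 0)
            return table[i].to;
    return NULL;
}
```

With a description containing the lines "IDLE ident NAME" and "NAME lparen CALL", next_state("IDLE", "ident") yields NAME. A real driver would also attach an action to each transition, which this sketch leaves out.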

Part of the problem is that writing parsing rules is not trivial

I think that is the crux of the matter. lex and yacc are fine tools for writing a parser/generator. The nature of languages and their compilation is that they are largely static and don't change often.

any changes are wiped out the next time you generate the code

It is also entirely expected to have inputs yield new outputs on every invocation - compilers would be less useful if that were not the case.

That being said, it sounds like the problem being described is that the inserted code is placed after the statement in question instead of before. You can fix this by changing where in the generator the new code is emitted. Without an example from you, this is hard to describe specifically, but you can start to understand your code flow (as far as lex and yacc are concerned) by inserting debugging statements that help you follow the code path.
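One way to get the ordering right is to stop writing text as soon as lex matches it. The sketch below assumes a simple statement-at-a-time tool and uses hypothetical names (stmt_append, stmt_add_invariant, stmt_flush). It keeps the current statement's text in a buffer, records any invariant the statement triggers, and only when the statement is complete writes the invariant first and the statement second.

```c
#include <stdio.h>
#include <string.h>

#define STMT_MAX 1024

/* Hypothetical pending state: the text of the statement being scanned
   and the invariant(s) it triggered, held back until the statement ends. */
static char stmt_buf[STMT_MAX];
static char inv_buf[STMT_MAX];

/* Append scanned text (e.g. yytext) to the pending statement;
   text beyond STMT_MAX is silently truncated. */
void stmt_append(const char *text)
{
    strncat(stmt_buf, text, STMT_MAX - strlen(stmt_buf) - 1);
}

/* Record an invariant for the pending statement, one per line. */
void stmt_add_invariant(const char *inv)
{
    strncat(inv_buf, inv, STMT_MAX - strlen(inv_buf) - 1);
    strncat(inv_buf, "\n", STMT_MAX - strlen(inv_buf) - 1);
}

/* Call when the statement is complete: copies the invariants followed
   by the statement into out, then clears the pending state. */
void stmt_flush(char *out, size_t outsz)
{
    snprintf(out, outsz, "%s%s", inv_buf, stmt_buf);
    stmt_buf[0] = '\0';
    inv_buf[0] = '\0';
}
```

In the scanner you would call stmt_append(yytext) for each token of a statement, stmt_add_invariant(...) when memcpy is matched, and on the terminating ';' call stmt_flush and write the result to the output file. A real tool would need to handle statements longer than the fixed buffers.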

Thanks, all.
My invariants are statements that should be placed before particular statements of C code. For example, when we reach <memcpy(d,s,n)> I want to add a statement like (require n>0) before this statement, and both statements (the input and the added invariant) should be written to an output file. My input is C source code.
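Because the invariant depends on the call's arguments, the tool has to pull a specific argument out of the call text. Here is a minimal C sketch, assuming the full text of the call is available as a string; the helper names call_argument and memcpy_precondition are hypothetical. It splits on commas that are not nested inside parentheses and builds the "require n>0;" line from the third argument.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Copy the n-th (0-based) top-level argument of a call such as
   "memcpy(d,s,n)" into out, trimmed of surrounding blanks. Commas inside
   nested parentheses, as in "f(a,b)", do not split arguments. Commas
   inside string or character literals are NOT handled by this sketch.
   Returns 1 on success, 0 if the argument is missing or empty. */
int call_argument(const char *call, int n, char *out, size_t outsz)
{
    const char *p = strchr(call, '(');
    int depth = 0, index = 0;
    size_t len = 0, start = 0;

    if (p == NULL || outsz == 0)
        return 0;
    for (p++; *p != '\0'; p++) {
        if (*p == '(') {
            depth++;
        } else if (*p == ')') {
            if (depth == 0)
                break;              /* closing parenthesis of the call */
            depth--;
        } else if (*p == ',' && depth == 0) {
            if (index == n)
                break;              /* end of the wanted argument */
            index++;
            continue;
        }
        if (index == n && len + 1 < outsz)
            out[len++] = *p;
    }
    out[len] = '\0';
    if (*p == '\0' || index != n)
        return 0;                   /* unbalanced call or too few arguments */

    while (len > 0 && isspace((unsigned char)out[len - 1]))
        out[--len] = '\0';
    while (isspace((unsigned char)out[start]))
        start++;
    memmove(out, out + start, len - start + 1);
    return len > start;
}

/* Build the invariant from the thread for memcpy(d,s,n): "require n>0;".
   "require" is the poster's annotation syntax, not C. */
int memcpy_precondition(const char *call, char *out, size_t outsz)
{
    char size_arg[128];

    if (!call_argument(call, 2, size_arg, sizeof size_arg))
        return 0;
    snprintf(out, outsz, "require %s>0;", size_arg);
    return 1;
}
```

For example, memcpy_precondition("memcpy(d,s,n)", buf, sizeof buf) leaves "require n>0;" in buf.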

*it sounds like the problem being described is that the inserted code is placed after the statement in question instead of before. You can fix this by changing where in the generator the new code is emitted. *

Yes, that's my problem. But how can I fix it? Should I modify the y.tab.c file? That's hard... How can I change the yacc file to reach my goal?

It is hard to address your problem without seeing an example of your code. Language parsing is non-trivial and there are many things that you may need to do to get the results you want.

With that being said, here is a simplified example of what I mean.

File: example.l

%{
#include <stdio.h>
#include "y.tab.h"
%}
%%
special return KEYWORD;
%%

This simply returns the token KEYWORD when the text special is found in the input stream.

File: example.y

%{
#include <stdio.h>
int yylex (void);
void yyerror (const char *str) { fprintf (stderr, "error: %s\n", str); }
int yywrap (void) { return 1; }
%}
%token KEYWORD
%%
words:
      /* empty */
    | words special
    ;
special:
    { printf ("<before>"); } KEYWORD { printf ("<after>"); }
    ;
%%
int main (void) { return yyparse (); }

This grammar just accepts a stream of input; when the text special is found, <before> is printed before the KEYWORD token is consumed, and <after> is printed once it has been processed.

My best guess is that you have some code inserted where the <after> appears in my example when you actually want it placed where the <before> appears. This may or may not map to the problem you are experiencing.


That's an interesting solution, but unfortunately it does not work for me: this approach works for a terminal but not for a non-terminal. In the example, special is essentially a terminal, so it works.
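For a non-terminal, a common yacc technique (not something proposed in the thread) is to give every rule a string semantic value, for example with %union { char *text; }, and have each action build its text from its children. The action for the whole statement then sees the complete text and can print the invariant before it. The helper concat_text below is hypothetical, and the rule names in its comment assume the widely circulated ANSI C yacc grammar.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper for yacc actions whose semantic values are
   strings. It returns a newly malloc'd concatenation of `count`
   strings, so a rule such as

       postfix_expression: postfix_expression '(' argument_expression_list ')'
             { $$ = concat_text(4, $1, "(", $3, ")"); }

   passes the full text of the call up to the enclosing statement rule,
   whose action can print the invariant first and then the statement.
   The caller owns the result and must free() it; NULL means out of
   memory. */
char *concat_text(int count, ...)
{
    va_list ap;
    size_t total = 1;               /* room for the terminating '\0' */
    char *result, *end;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += strlen(va_arg(ap, char *));
    va_end(ap);

    result = malloc(total);
    if (result == NULL)
        return NULL;

    end = result;
    va_start(ap, count);
    for (i = 0; i < count; i++) {
        const char *s = va_arg(ap, char *);
        size_t n = strlen(s);

        memcpy(end, s, n);          /* append without rescanning result */
        end += n;
    }
    va_end(ap);
    *end = '\0';
    return result;
}
```

For this to work, the scanner also has to hand each token's text to the parser, for example by copying yytext into yylval.text, because yytext is overwritten by the next match.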

If it is not a terminal you can do this on the parsing side (as opposed to the generating side). For example:

%{
#include <stdio.h>
#include "y.tab.h"
%}
%%
\n      { printf ("\n"); }
[\t ]+  { printf (" "); }
memcpy  { printf ("[inserted]\n"); return TOKEN; }
[a-z]+  { return TOKEN; }
%%

This will insert the text [inserted] before every occurrence of memcpy in the source. The TOKEN is still sent to the generator so that normal processing can take place.

Thanks for your response.
But I don't think that is a good solution either, because the string that I want to insert is not constant; for example, it contains the second argument of the function...
I'm actually confused; I don't know the solution.
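Since the inserted string depends on the arguments and the tool has to cover several functions, one option is to describe each function's invariant as data instead of code. This sketch assumes the arguments have already been split into strings. The table entries other than memcpy, and the names invariant_rule, rules and build_invariant, are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative rule table: for each library function, which argument
   the invariant is about and a printf-style template for it. Only the
   memcpy entry comes from the thread; the "require" syntax is the
   poster's annotation, not C. */
struct invariant_rule {
    const char *function;   /* callee name as written in the source */
    int arg_index;          /* 0-based argument substituted for %s */
    const char *format;     /* template containing exactly one %s */
};

static const struct invariant_rule rules[] = {
    { "memcpy", 2, "require %s>0;" },
    { "strlen", 0, "require %s != NULL;" },
    { "fclose", 0, "require %s != NULL;" },
};

/* Write the invariant for a call to `function` whose argument texts are
   args[0..nargs-1]. Returns 1 if a rule matched, 0 otherwise. */
int build_invariant(const char *function, const char *args[], int nargs,
                    char *out, size_t outsz)
{
    size_t i;

    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        if (strcmp(rules[i].function, function) == 0
            && rules[i].arg_index < nargs) {
            snprintf(out, outsz, rules[i].format,
                     args[rules[i].arg_index]);
            return 1;
        }
    }
    return 0;
}
```

Supporting another library function then means adding a row to the table rather than writing new grammar actions.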

After reading your posts again, I'm not sure what you want to achieve can be done directly at the source level. For example, you mentioned wanting to do the following:

when we reach <memcpy(d,s,n)> I want to add a statement like (require n>0) before this statement

What happens when you need to handle memcpy (d, s, get_size ())? What are you going to insert? require get_size () > 0? If so, what if get_size has side effects?
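If the tool does stay at the source level, one common way around the double-evaluation problem is to evaluate the argument once into a temporary, state the invariant on the temporary, and pass the temporary to the call. The C sketch below assumes the call appears as a stand-alone statement and that its arguments are already split into strings. The temporary name inv_size is hypothetical and assumed not to clash with names in the user's code.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* 1 if text is a plain C identifier. Such an argument is treated here
   as safe to evaluate twice, which ignores volatile objects and
   object-like macros that expand to calls. */
static int is_identifier(const char *text)
{
    static const char ident_chars[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_0123456789";

    return text[0] != '\0' && !isdigit((unsigned char)text[0])
        && text[strspn(text, ident_chars)] == '\0';
}

/* Rewrite a stand-alone statement "memcpy(dst, src, size);" together
   with its invariant. A plain identifier is used directly; any other
   size expression is evaluated once into the temporary inv_size.
   Returns what snprintf returns. */
int rewrite_memcpy(const char *dst, const char *src, const char *size,
                   char *out, size_t outsz)
{
    if (is_identifier(size))
        return snprintf(out, outsz, "require %s>0; memcpy(%s, %s, %s);",
                        size, dst, src, size);
    return snprintf(out, outsz,
                    "{ size_t inv_size = %s; require inv_size>0; "
                    "memcpy(%s, %s, inv_size); }",
                    size, dst, src);
}
```

For memcpy (d, s, get_size ()) this produces { size_t inv_size = get_size (); require inv_size>0; memcpy(d, s, inv_size); }, so get_size runs only once. Wrapping the statement in braces would break if the call's result were used inside a larger expression, which this sketch does not handle, and it does not remove the need for runtime information to check the invariant.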

It might be better to try to add elements to the generated code instead of at the source level.

Again, I cannot help much more without some example of what you are doing. Maybe you need to add another rule to the grammar; maybe you can support what you want with the existing grammar and just haven't looked at it the right way; maybe what you are trying to do requires tools beyond the ones you are using.

Hard to tell currently.

Dear L7Sqr,
You're right. I'm working on static analysis of source code, and limitations like the one you mentioned are known in this area (because they need some runtime information).

What do you mean by elements?

My grammar is the ANSI C grammar (I got it from the net), and my program should take C code and add invariants for certain library functions (as I said previously, before memcpy it should add the precondition mentioned above).
Thanks

Why not use something like LLVM? That will let you know what each argument is, reorder it if necessary, and add code where you like while analyzing the source.

A benefit of this approach is that clang (built on top of LLVM) has static analysis tools built in, so you won't have to design everything from scratch.

llvm
clang
A decent starting point

But I've spent a lot of time writing the program, and it already generates invariants for several functions. If I change my tool, it will be time-consuming.

Well, you are going to need to make a decision. You need to decide whether to

  1. continue using a tool maybe not well suited to your goals
  2. change to a tool designed to do what you want

I would suggest you not let sunk costs influence your decision. You may have invested time in lex and yacc but if they are misaligned with your goals then you will continue to carry their baggage if you stick with them. Use the knowledge you have gained (that is the real product of your effort) and apply that to a more applicable set of tools.


You've helped me a lot; thank you so much for your advice. I really have no time, but I will think about your suggestion and try to make a reasonable decision.
With best wishes.
