
Building an AST with Bison/Flex

Hi friends,
I need to write a simple parser for PHP files (classes, functions, et al.), and I have read both the Flex and Bison manuals. I have read somewhere that I must build an Abstract Syntax Tree (AST) and work from that, but I cannot find any tutorial on how to build an AST with Bison.

Please help me with a link or an explanation.
Thanks!

evstevemd
Senior Poster
3,713 posts since Jun 2007
Reputation Points: 462
Solved Threads: 392
 

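Define a generic node type, somewhat akin to

/* Example node kinds only; define one per language entity. */
enum node_type_t { NODE_CLASS, NODE_FUNCTION, NODE_IDENTIFIER };

typedef struct node_s {
    enum node_type_t node_type;    /* which language entity this node represents */
    void * node_specific_data;     /* payload, e.g. an identifier name */
    struct node_s ** children;     /* array of pointers to child nodes */
} node_t;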


You may need other fields as well.

Define "node constructors" for each language entity (terminals and nonterminals alike).

In the semantic action for each rule, call the relevant constructor, assign its result to $$, allocate the children array, and fill it with pointers to the child nodes, available as $1, $2, etc.

That's pretty much it. As soon as you have a grammar, this process is almost mechanical.
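
As a rough sketch of the constructor idea described above (an illustration, not nezachem's actual code): one generic constructor takes the node type, a name, and the child pointers that a semantic action would receive as $1, $2, and so on. The names node_new, node_free, child_count, and the NODE_* values are all made up for this example, and the Bison rule quoted in the comment uses hypothetical rule and token names.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Example node kinds (assumed); a real grammar would have one per entity. */
enum node_type_t { NODE_CLASS, NODE_METHOD, NODE_PROPERTY };

typedef struct node_s {
    enum node_type_t node_type;
    void *node_specific_data;   /* here: a heap copy of the entity's name */
    struct node_s **children;   /* array of child_count pointers */
    size_t child_count;         /* extra field assumed for this sketch */
} node_t;

/* Generic constructor: copies the name, allocates the children array, and
 * fills it from the trailing node_t* arguments. Returns NULL on failure.
 *
 * In a Bison action the call might look roughly like this
 * (hypothetical rule and token names):
 *     class_decl: T_CLASS T_IDENT '{' member member '}'
 *                   { $$ = node_new(NODE_CLASS, $2, 2, $4, $5); }
 */
node_t *node_new(enum node_type_t type, const char *name,
                 size_t child_count, ...)
{
    node_t *n = calloc(1, sizeof *n);
    if (n == NULL)
        return NULL;
    n->node_type = type;

    if (name != NULL) {
        size_t len = strlen(name) + 1;
        n->node_specific_data = malloc(len);
        if (n->node_specific_data == NULL) {
            free(n);
            return NULL;
        }
        memcpy(n->node_specific_data, name, len);
    }

    if (child_count > 0) {
        n->children = malloc(child_count * sizeof *n->children);
        if (n->children == NULL) {
            free(n->node_specific_data);
            free(n);
            return NULL;
        }
        va_list ap;
        va_start(ap, child_count);
        for (size_t i = 0; i < child_count; i++)
            n->children[i] = va_arg(ap, node_t *);
        va_end(ap);
        n->child_count = child_count;
    }
    return n;
}

/* Recursively releases a node, its payload, and all of its children. */
void node_free(node_t *n)
{
    if (n == NULL)
        return;
    for (size_t i = 0; i < n->child_count; i++)
        node_free(n->children[i]);
    free(n->children);
    free(n->node_specific_data);
    free(n);
}
```

A variadic constructor keeps each semantic action to a single line. For list-like rules (for example, a class with any number of members) a helper that appends one child at a time would probably fit a recursive grammar rule better.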

PS: tutorial

nezachem
Posting Shark
903 posts since Dec 2009
Reputation Points: 719
Solved Threads: 194
 

Thanks for the reply, nezachem.
I'm very much a newbie (actually just starting) with code analysis, so take my words with a grain of salt! That being said, let's say I want to get a class from a PHP file and all its children (members as well as attributes) and thereby build a tree I can access. How do I go about it?
I have read that tutorial and it was fun to learn how a lexical analyzer works, but to be frank, Bison gave me nightmares (it has a lot of complex stuff, which I'm still trying to grasp). Should I write a new grammar file or just use this one from Zend?
NB: I want to write a source code browser.

evstevemd
 

Thanks for the tutorial too. It was one of the first things I read in this field, and it did help me understand some things. I will check it again!

evstevemd
 
"Thanks for the reply, nezachem. I'm very much a newbie (actually just starting) with code analysis, so take my words with a grain of salt! That being said, let's say I want to get a class from a PHP file and all its children (members as well as attributes) and thereby build a tree I can access. How do I go about it?"

Feed the PHP code you want to analyze to yyparse(). That would build the tree (unfortunately, with the default yyparse() interface you'd be stuck with a global variable to build the tree from; that's a very annoying shortcoming of bison).

Sorry if I misunderstood the question.

"Should I write a new grammar file or just use this one from Zend?"

I do not know how authoritative Zend is (I am pretty far from the PHP world). If you trust it to reliably represent the grammar, then, by all means, use it. Of course, all the zend_* functions will have to be tailored to your needs.
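
To make the yyparse()-plus-global idea above a bit more concrete, here is a small sketch under heavy assumptions. fake_yyparse() is a hand-written stand-in for the parser Bison would generate; it builds the tree a parser might produce for a file containing one class with a property and a method. The names ast_root, fake_yyparse, print_outline, count_nodes, the NODE_* values, and the sample names "User", "name", "getName", and "user.php" are all invented for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

enum node_type_t { NODE_FILE, NODE_CLASS, NODE_METHOD, NODE_PROPERTY };

typedef struct node_s {
    enum node_type_t node_type;
    const char *name;           /* stands in for node_specific_data */
    struct node_s **children;
    size_t child_count;         /* extra field assumed for this sketch */
} node_t;

/* The global that the start rule's action would assign,
 * e.g. { ast_root = $1; } in a real grammar. */
node_t *ast_root = NULL;

/* Stand-in for the generated yyparse(): wires up a fixed tree by hand
 * and publishes its root through the global. Returns 0 like yyparse()
 * does on success. */
int fake_yyparse(void)
{
    static node_t prop = { NODE_PROPERTY, "name", NULL, 0 };
    static node_t meth = { NODE_METHOD, "getName", NULL, 0 };
    static node_t *members[] = { &prop, &meth };
    static node_t cls = { NODE_CLASS, "User", members, 2 };
    static node_t *classes[] = { &cls };
    static node_t file = { NODE_FILE, "user.php", classes, 1 };

    ast_root = &file;
    return 0;
}

/* Depth-first walk that writes an indented outline, roughly what a
 * source code browser would display. */
void print_outline(const node_t *n, int depth, FILE *out)
{
    static const char *kind[] = { "file", "class", "method", "property" };
    if (n == NULL)
        return;
    fprintf(out, "%*s%s %s\n", depth * 2, "", kind[n->node_type], n->name);
    for (size_t i = 0; i < n->child_count; i++)
        print_outline(n->children[i], depth + 1, out);
}

/* Counts the nodes of a given type anywhere in the subtree. */
size_t count_nodes(const node_t *n, enum node_type_t type)
{
    if (n == NULL)
        return 0;
    size_t total = (n->node_type == type) ? 1 : 0;
    for (size_t i = 0; i < n->child_count; i++)
        total += count_nodes(n->children[i], type);
    return total;
}
```

After fake_yyparse() returns 0, calling print_outline(ast_root, 0, stdout) walks the whole tree from the root, indenting each level by two spaces. If the global is a concern, Bison's %parse-param directive is one way to pass a pointer into yyparse() instead.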

nezachem
 

"Feed the PHP code you want to analyze to yyparse(). That would build the tree (unfortunately, with the default yyparse() interface you'd be stuck with a global variable to build the tree from; that's a very annoying shortcoming of bison).

Sorry if I misunderstood the question."


Thanks. Leaving PHP aside, suppose you want to parse C++ (which will be pretty much alike): how are you going to build the AST from that global variable? That is where I hit a wall. I can split the file into tokens using flex and I can feed those tokens to a parser, but I don't know how to build a tree out of those tokens!

evstevemd
 
