Hi friends,
I need to write a simple parser for PHP files (classes, functions, etc.), and I have read both the Flex and Bison manuals. I have read somewhere that I must build an Abstract Syntax Tree (AST) and work with that, but I cannot find any tutorial on how to build an AST with Bison.

Please help me with a link or explanations.
Thanks!

All 5 Replies

Define a generic node type, something like

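typedef struct node_s {
    enum node_type_t node_type;   /* which language entity this node represents */
    void *node_specific_data;     /* payload, e.g. an identifier name or literal value */
    struct node_s **children;     /* array of pointers to child nodes */
} node_t;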

You may need other fields as well.

Define "node constructors" for each language entity (terminals and nonterminals alike).

In the semantic action for each rule, call the relevant constructor, assign its result to $$, allocate the children array, and fill it with pointers to the child nodes, which are available as $1, $2, etc.

That's pretty much it. As soon as you have a grammar, this process is almost mechanical.
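A minimal sketch of the steps above, under stated assumptions: the node kinds, the child_count field, and the node_new/node_free helpers are all made up for illustration, and the grammar rule in the comment uses hypothetical token and rule names. It only shows the shape of a generic constructor that a semantic action could call.

```c
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node kinds; a real PHP grammar would need many more. */
enum node_type_t { NODE_CLASS, NODE_METHOD, NODE_PROPERTY };

typedef struct node_s {
    enum node_type_t node_type;
    void *node_specific_data;   /* here: a heap-allocated copy of a name, or NULL */
    size_t child_count;         /* assumed extra field: length of children[] */
    struct node_s **children;
} node_t;

/* Generic constructor: copies `name` and stores `nchildren` node_t* arguments. */
node_t *node_new(enum node_type_t type, const char *name, size_t nchildren, ...)
{
    node_t *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->node_type = type;
    n->node_specific_data = NULL;
    n->child_count = nchildren;
    n->children = NULL;

    if (name != NULL) {
        size_t len = strlen(name) + 1;
        char *copy = malloc(len);
        if (copy == NULL) {
            free(n);
            return NULL;
        }
        memcpy(copy, name, len);
        n->node_specific_data = copy;
    }

    if (nchildren > 0) {
        n->children = malloc(nchildren * sizeof *n->children);
        if (n->children == NULL) {
            free(n->node_specific_data);
            free(n);
            return NULL;
        }
        va_list ap;
        va_start(ap, nchildren);
        for (size_t i = 0; i < nchildren; i++)
            n->children[i] = va_arg(ap, node_t *);
        va_end(ap);
    }
    return n;
}

/* Recursively release a node, its payload, and all of its children. */
void node_free(node_t *n)
{
    if (n == NULL)
        return;
    for (size_t i = 0; i < n->child_count; i++)
        node_free(n->children[i]);
    free(n->children);
    free(n->node_specific_data);
    free(n);
}

/*
 * In the grammar file, an action could then look roughly like this
 * (assuming a %union with a char* member for T_STRING and a node_t*
 * member for the nonterminals):
 *
 *   class_decl: T_CLASS T_STRING '{' member_list '}'
 *                 { $$ = node_new(NODE_CLASS, $2, 1, $4); }
 *             ;
 */
```

Since Bison runs the action of a rule only after the actions of its right-hand-side symbols, the child nodes in $1, $2, etc. already exist when the parent is constructed, so the tree grows bottom-up without extra bookkeeping.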

PS: tutorial

Thanks for the reply, nezachem.
I'm very new to code analysis (actually just starting), so take my words with a grain of salt! That said, let's say I want to get a class from a PHP file along with all its children (members as well as attributes), and build a tree from them that I can access. How do I go about it?
I have read that tutorial, and it was fun to learn how a lexical analyzer works, but to be frank, Bison gave me nightmares (it has a lot of complex stuff that I'm still trying to grasp). Should I write a new grammar file or just use this one from Zend?
NB: I want to write a source code browser.

Thanks for the tutorial too. It was one of the first things I read in this field, and it helped me understand some things. I will check it again!

Thanks for the reply, nezachem.
I'm very new to code analysis (actually just starting), so take my words with a grain of salt! That said, let's say I want to get a class from a PHP file along with all its children (members as well as attributes), and build a tree from them that I can access. How do I go about it?

Feed the PHP code you want to analyze to yyparse(). That will build the tree (unfortunately, you'd be stuck with a global variable to build the tree from; that's a very annoying shortcoming of Bison).
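As a rough sketch of what happens after yyparse() returns, assuming the start rule's action stored the finished tree in a global: the names ast_root, find_node, and count_children are made up for illustration, as are the node kinds and the child_count field. Bison's %parse-param directive can pass an extra argument to yyparse() instead, which may be a way around the global.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node kinds for a tiny subset of PHP. */
enum node_type_t { NODE_FILE, NODE_CLASS, NODE_METHOD, NODE_PROPERTY };

typedef struct node_s {
    enum node_type_t node_type;
    void *node_specific_data;   /* assumed: NUL-terminated name, or NULL */
    size_t child_count;         /* assumed extra field: length of children[] */
    struct node_s **children;
} node_t;

/*
 * The action of the grammar's start rule would store the tree here, e.g.
 *   file: decl_list { ast_root = $1; } ;      (rule names are made up)
 */
node_t *ast_root = NULL;

/* Depth-first search for the first node of kind `type` named `name`. */
const node_t *find_node(const node_t *n, enum node_type_t type, const char *name)
{
    if (n == NULL)
        return NULL;
    if (n->node_type == type && n->node_specific_data != NULL &&
        strcmp((const char *)n->node_specific_data, name) == 0)
        return n;
    for (size_t i = 0; i < n->child_count; i++) {
        const node_t *hit = find_node(n->children[i], type, name);
        if (hit != NULL)
            return hit;
    }
    return NULL;
}

/* Count the direct children of `n` that have the given kind. */
size_t count_children(const node_t *n, enum node_type_t type)
{
    size_t count = 0;
    if (n == NULL)
        return 0;
    for (size_t i = 0; i < n->child_count; i++)
        if (n->children[i] != NULL && n->children[i]->node_type == type)
            count++;
    return count;
}
```

With helpers like these, a source code browser could call find_node(ast_root, NODE_CLASS, "User") after a successful parse and then iterate over the returned node's children to list its methods and properties.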

Sorry if I misunderstood the question.

Should I write a new grammar file or just use this one from Zend?

I do not know how authoritative Zend is (I am pretty far from the PHP world). If you trust it to represent the grammar reliably, then by all means use it. Of course, all the zend_* functions will have to be tailored to your needs.

Feed the PHP code you want to analyze to yyparse(). That will build the tree (unfortunately, you'd be stuck with a global variable to build the tree from; that's a very annoying shortcoming of Bison).

Sorry if I misunderstood the question.

Thanks. Leaving PHP aside, suppose you want to parse C++ (which would be pretty much alike): how would you build the AST through that global variable? That is where I hit a wall. I can split the file into tokens using Flex and feed those tokens to a parser, but I don't know how to build a tree from them!
