Help me with my parsing tree

Question

gparadox 0 Newbie Poster

16 Years Ago

These are the directions:
Recursive Descent Parsing
Consider the following BNF grammar:

E -> T + E | T - E | T
T -> P * T | P / T | P
P -> I | (E)
I -> a | b | ... | y | z

Using the technique described in class implement a recursive descent parser that recognizes strings in this language. Input should be from a file called input.dat and output should be to the console. An example session might look like this:
String read from file: a+b-c*d
The string "a+b-c*d" is in the language.
String read from file: a**b++c
The string "a**b++c" is not in the language.

You must implement the project in BOTH Java and C++! Implementations that do not include a solution in both languages will, at best, receive half credit. To simplify things you will not have to handle whitespace when parsing the string, i.e. " " and similiar are illegal characters in this language.

My output right now is saying everything is out of the language.
\My input data file is
a+b/c-d
a-b*c+d
1+a
a+1
aa
aa+1
a+-c

Please Dan Web help. I did a google search and found nothing to help me.

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

string c;
bool term();
bool paremeter();
bool identifier();
bool expression();
int place;
int main()
{
ifstream infile;
ofstream outfile;
infile.open("input.dat");
    while (infile) 
          {
         infile >>  c;

         
       place = 0;
         if ( expression() )
            {
             cout <<"The sentence " << c << " is in the language" << endl << endl;
            }//end of if
            else 
            cout <<"The sentence " << c << " is not in the language" << endl<<endl;;
         c = '0';
         }   //end of while 
infile.close();
system("pause");
return 0;
} //end of main
/*
E -> T + E | T - E | T
T -> P * T | P / T | P
P -> I | (E)
I -> a | b | ... | y | z
*/
 
//E -> T + E | T - E | T
bool expression()
{

  /*   if (term() )
     {if (c[place] == '+' || c[place] == '-'  )
         {place++; }
         return true; 
         */
         
         if (term() )
         {
                    if ( c[place] == '+' || c[place] == '-' )
                    {
                         if (term() )
                         {
                         return true; } } }
else
return false; }
//T -> P * T | P / T | P
bool term()
{      
     /*  if (paremeter() )
      {if (c[place] == '*' || c[place] == '/' )
         {place++; }
     return true; 
     }
             */
     if (paremeter() )
      {if (c[place] == '*' || c[place] == '/' )
         { if (paremeter() )
    {place++; return true; 
     }        } }
     else 
return false;
}
//P -> I | (E)
bool paremeter()
{       
if (identifier() )
{ return true;}

if (c[place] == '(' )
    { 
         place++;
         if (expression() )
         {
            if (c[place] == ')' )
            { place++; return true;
            }}}
 return false;
}
     //I -> a | b | ... | y | z   
bool identifier()
{
     if (  c[place] >= 'a' && c[place] <= 'z' )
        { place++; return true; }

    return false;
}

c++

3 Contributors
7 Replies
117 Views
2 Days Discussion Span
Latest Post 16 Years Ago Latest Post by gparadox

bugmenot 25 Posting Whiz in Training

16 Years Ago

I can point out a few things.

* You are relying on the default return values of your boolean functions being false. In the case of 'term', if the first 'perimeter' call returns false, you might still return a positive value. Same deal with 'expression'. If possible, configure your compiler to give you all the warnings it can.

* When a production fails with 'return false;', you need to make sure it rewinds any input it consumed. If an operator character is matched but, the right operand isn't, you'll need to restore 'place' to the value it had at the start of the function.

* After you've finished parsing, you need to check that all input has been consumed. Given that, you'll need to get the length of your input string so you can compare that with 'place'.

* 'perimeter' doesn't consume the parentheses when it sees them.

* I would recommend developing a layout style that uses more whitespace and is consistent with curly braces. For example,

bool term()
{
    if ( perimeter() )
    {
        if ( c[place] == '*' || c[place] == '/' )
        {
            place++;

            if ( perimeter() )
            {
                return true;
            }
            else
            {
                return false;
            }

            // NOTE: Because the code is spaced nicely, we see
            // that the following statement can never be reached.
            return true;
        }
    }
}

bool perimeter()
{
    if ( identifier() )
    {
        return true;
    }

    if ( c[place] == '(')
    {
        if ( expression() )
        {
            if ( c[place] ==')' )
            {
                return true;
            }
        }
    }
}

Is more readable than,

bool term()
 {
      if ( perimeter() )
      {
           if ( c[place] == '*' || c[place] == '/' )
           {
                place++;
                        if (perimeter() )
                        {return true;} else {return false;}
           return true;
           }
      }
}
  
  bool perimeter()
  {
       if (identifier() )
       {
                        return true;
       }
       if (c[place] == '(')
       { if (expression() )
         {if (c[place] ==')' )
         return true;
          }
       }
   }

Exactly what style you choose is a matter of personal preference but, make sure it's easy for you to read.

Narue 5,707 Bad Cop

16 Years Ago

>My output right now is saying everything is out of the language.
I'd start with the fact that you aren't actually parsing the given grammar. Consider two examples:

1) The grammar states that a term by itself is a valid expression, but your expression function doesn't implement this behavior. In fact, you ignore that production completely.

2) A valid expression is either a term by itself, or a term followed by "+" or "-" and an expression. Your code requires a term on the right hand side, which won't create the correct parse tree for strings such as "a+b+c". To get the same behavior, you'd have to rely on parentheses: "a+(b+c)". This is a serious problem with operators that aren't commutative. "a-(b-c)" is not the same as a-b-c, for example. You do the same thing in the term function.

Your code for a function should be directly transferable to a production in the grammar. For example:

// E -> T [('+' | '-') E]
bool expression ( lexer& lex )
{
  if ( !term ( lex ) )
    return false;
  else if ( lex.next() == "+" || lex.next() == "-" ) {
    lex.extract();
    return expression ( lex );
  }
  
  return true;
}

>My input data file is
Very incomplete. You don't even test the parenthesis rules for a parameter.

>When a production fails with 'return false;', you need
>to make sure it rewinds any input it consumed.
That's probably necessary when the result is nothing more than a message saying whether the string is valid or invalid. If you want detailed diagnostics on why the string failed, or need to backtrack for some reason, then it makes more sense.

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

gparadox 0 Newbie Poster · Answer 1 · 2008-10-12T06:30:41+00:00

Let me make it more readable

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

string c;
bool term();
bool perimeter();
bool identifier();
bool expression();
int place;
int main()
{
ifstream infile;
ofstream outfile;
infile.open("input.dat");
    while (infile) 
          {
         infile >>  c;

         
       place = 0;
        // if ( expression() )
          //if ( term() )
           if ( identifier() )
              // if ( paremeter() )
            {
             cout <<"The sentence " << c << " is in the language"  << endl;
            }//end of if
            else 
            cout <<"The sentence " << c << " is not in the language" <<endl;;
         }   //end of while 
infile.close();
system("pause");
return 0;
} //end of main
/*
E -> T + E | T - E | T
T -> P * T | P / T | P
P -> I | (E)
I -> a | b | ... | y | z
*/

bool expression()
{
     if ( term() )
     {
          if (c[place] == '+' || c[place] == '-')
          {
                       place++;
                       if ( term() )
                       { return true;} else {return false;}
          return true;
          }
     }
 }
 
 bool term()
 {
      if ( perimeter() )
      {
           if ( c[place] == '*' || c[place] == '/' )
           {
                place++;
                        if (perimeter() )
                        {return true;} else {return false;}
           return true;
           }
      }
}
  
  bool perimeter()
  {
       if (identifier() )
       {
                        return true;
       }
       if (c[place] == '(')
       { if (expression() )
         {if (c[place] ==')' )
         return true;
          }
       }
   }
   
   bool identifier()
   {
        if( c[place] >= 'a' && c[place] <= 'z') 
        {
            place++;
            return true;
        }
    }

gparadox 0 Newbie Poster · Answer 2 · 2008-10-13T06:58:31+00:00

I can point out a few things.
* You are relying on the default return values of your boolean functions being false. In the case of 'term', if the first 'perimeter' call returns false, you might still return a positive value. Same deal with 'expression'. If possible, configure your compiler to give you all the warnings it can.
* When a production fails with 'return false;', you need to make sure it rewinds any input it consumed. If an operator character is matched but, the right operand isn't, you'll need to restore 'place' to the value it had at the start of the function.
* After you've finished parsing, you need to check that all input has been consumed. Given that, you'll need to get the length of your input string so you can compare that with 'place'.
* 'perimeter' doesn't consume the parentheses when it sees them.
* I would recommend developing a layout style that uses more whitespace and is consistent with curly braces. For example,

Exactly what style you choose is a matter of personal preference but, make sure it's easy for you to read.

Ok. Thanks for your input. I corrected that I never returned false. I think I restored place back to where it goes if its false.
Maybe Ill code for a semi-colin check to see if thats the end of the expression.
But why is it still not working? Everything comes out false.

gparadox 0 Newbie Poster · Answer 3 · 2008-10-13T07:09:23+00:00

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

string c;
bool term();
bool perimeter();
bool identifier();
bool expression();
int place;
int main()
{
ifstream infile;
ofstream outfile;
infile.open("input.dat");
    while (infile) 
          {
         infile >>  c;

         
       place = 0;
        if ( expression() )
          //if ( term() )
           //if ( identifier() )
              //if ( perimeter() )
            {
             cout <<"The sentence " << c << " is in the language"  << endl;
            }//end of if
            else 
            cout <<"The sentence " << c << " is not in the language" <<endl;;
         }   //end of while 
infile.close();
system("pause");
return 0;
} //end of main
/*
E -> T + E | T - E | T
T -> P * T | P / T | P
P -> I | (E)
I -> a | b | ... | y | z
*/

bool expression()
{
     if ( term() )
     {
          if (c[place] == '+' || c[place] == '-')
          {
                       place++;
                       if ( term() )
                       { 
                            return true;
                       } 
                       else 
                            {
                                place--; 
                                return false;
                            }
          }
     }
return false;
 }
 
 bool term()
 {
      if ( perimeter() )
      {
           if ( c[place] == '*' || c[place] == '/' )
           {
                place++;
                        if (perimeter() )
                        {
                        return true;
                        } 
                        else 
                          {
                            place--; 
                            return false;
                          }
       }
      }
return false;
}
  
  bool perimeter()
  {
       if (identifier() )
       {
                        return true;
       }
       if (c[place] == '(')
       { place++;
                     if (expression() )
                       {
                        if (c[place] ==')' )
                         {
                          place++;               
                          return true;
                          }
                        } 
       place--;
       }
       return false;
   }
   
   bool identifier()
   {
        if( c[place] >= 'a' && c[place] <= 'z') 
        {
            place++;
            return true;
        }
       
        return false;
    }

bugmenot 25 Posting Whiz in Training · Answer 4 · 2008-10-13T18:35:47+00:00

A few more things.

* In expression, the case where a single term matches will fail because, after failing to match a '+' or '-', the code just falls through to returning false. Similar thing with term just matching one perimeter.

bool expression()
{
    if ( term() )
    {
        if (c[place] == '+' || c[place] == '-')
        {
            place++;
            if ( expression() )
            {
                return true;
            }
            else
            {
                place--;
                return false;
            }
        }
        else // Added so a single term can match
        {
            return true;
        }
    }
    return false;
}

* You haven't quite translated the grammar into the correct function calls. In expression, you're matching two terms but it should be a term and another expression. Same with term. It should match a perimeter and another term.

* You don't need to check for a semi-colon at the end.

As you parse, you're advancing the 'place' variable so, when parsing finishes, it will be one greater than the index of the furthest accepted character. If all input has been accepted, that value will be the same as the length of the string because, the length starts counting at 1 while the index starts counting at 0.

Your input is being read as a 'string' object so, you can get its length by calling the 'length' method. int length = c.length(); * Only the first two strings of your test data are in the language your grammar describes. This isn't a problem, I'm just making sure you know what the desired results are.

* In future, when you get stuck with a programming problem like this, a good way to work through it is to print some debugging output to understand your own code. You want to print out the value of variables at important points and the function you are in. This will make the problem much more obvious and, hopefully, you'll see where you made the mistake.

This kind of stuff,

cout << "Entered expression" << endl;
cout << "Inside expression: place = " << place << " char = " << c[place] << endl;
cout << "Parsing finished: place = " << place << " char = " << c[place] << endl << endl;

* Short of doing your homework for you, there isn't anymore help I can give. Good luck.

gparadox 0 Newbie Poster · Answer 5 · 2008-10-14T06:06:19+00:00

Hey thanks everyone for your help. The only thing im stuck with now is that if the setence ends with a ), even though it doesn't have a beginning parenthesis; it still comes out true.
a*b-a-b) comes out as in the language. Isnt this checked for? And its not between a and z.

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

string c;
bool term();
bool perimeter();
bool identifier();
bool expression();
int place;
int main()
{
    int length = c.length(); 
ifstream infile;
ofstream outfile;
infile.open("input.dat");
    while (infile) 
          {
         infile >>  c;

         
       place = 0;
        if ( expression() )
          //if ( term() )
           //if ( identifier() )
              //if ( perimeter() )
            {
             cout <<"The sentence " << c << " is in the language"  << endl;
            }//end of if
            else 
            cout <<"The sentence " << c << " is not in the language" <<endl;;
         }   //end of while 
infile.close();
system("pause");
return 0;
} //end of main
/*
E -> T + E | T - E | T
T -> P * T | P / T | P
P -> I | (E)
I -> a | b | ... | y | z
*/
bool expression()
{
    if ( term() )
    {
        if (c[place] == '+' || c[place] == '-')
        {
            place++;
            if ( expression() )
            {
                return true;
            }
            else
            {
                place--;
                return false;
            }
        }
        else // Added so a single term can match
        {
            return true;
        }
    }
    return false;
}
//T -> P * T | P / T | P
 bool term()
 {
      if ( perimeter() )
      {
           if ( c[place] == '*' || c[place] == '/' )
           {
                place++;
                        if (term() )
                        {
                        return true;
                        } 
                        else 
                          {
                            place--; 
                            return false;
                          }
       } else
       return true;
      }
return false;
}

//  P -> I | (E)
  bool perimeter()
  {
       if (identifier() )
       {
                        return true;
       }
       if (c[place] == '(')
       { place++;
                     if (expression() )
                       {
                         if (c[place] ==')' )
                         {
                          place++;               
                          return true;
                          }
                        } 
       place--;
       }
       return false;
   }
//I -> a | b | ... | y | z   
   bool identifier()
   {
        if( c[place] >= 'a' && c[place] <= 'z') 
        {
            place++;
            return true;
        }
       
        return false;
    }