Expression Parser

Question

mattloto 0 Light Poster

13 Years Ago

I'm trying to make a calculator that parses input like [TEX]( 3 + 2 ) * 5[/TEX]

I want to use formal grammar for this, and I'm wondering how to organize this. I was thinking having a class for Value, which would basically just be a number. Then I want to have another class called Expression, which is made up of 3 parts: the left, the middle (the operator) and the right. However, the left and right could be Expressions, or they could be Values. I was looking into polymorphism, but that was a little confusing. How would I structure this nicely?

And also, what about expressions that dont follow the value, operator, value scheme, like square root? Thanks in advance!

c++

4 Contributors
8 Replies
136 Views
1 Day Discussion Span
Latest Post 13 Years Ago Latest Post by mike_2000_17

rubberman 1,355 Nearly a Posting Virtuoso

13 Years Ago

A recursive descent parser is usually used for this, building up a binary tree structure where the root of each node is the operation, and each leaf of a node is either a value, or a link to another computational node. The Wikipedia has a lot on this stuff: http://en.wikipedia.org/wiki/Recursive_descent_parser

iamthwee

13 Years Ago

RPN would be good here, or binary trees.

See my code snippet for details?

http://www.daniweb.com/software-development/cpp/code/216831

Edited 13 Years Ago by iamthwee because: n/a

Reply to this topic

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.

mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster · Answer 1 · 2011-05-24T12:22:02+00:00

As for data structuring for this, you can use polymorphism, in which case you have to look for what is common to all things involved. In this case, I would suggest at least one function: "evaluate()". All expressions or values have to be able to be evaluated to yield a value. In the case of expressions, they have to evaluate their operands and then compute the operation. While values only have to output the value they hold.

As for handling the case of functions (like sqrt or exp), they are simply unitary operators (i.e. they have one operand). While operations like + - * / are binary operators (i.e. they have two operands, one on either side).

The parsing will certainly require some look-ahead to determine what the next appearing expression is between a value, a unitary operator, or a binary operator. And then, it will require that it be added to the data structure that represents the overall expression. You can naturally see that the data-structure generated will be a tree-structure with branching factors from 0 to 2 (i.e. values have 0 children, unitary operators have 1 child (operand), and binary operators have 2).

The one additional thing to worry about is priority of operation. You should associate, to each expression, a priority value. The idea is that expressions of highest priority should be the deepest in the tree (nearest to the leaf nodes), because the expressions will be evaluated by descending the tree to the leafs (values), evaluating those, and climbing back up, evaluating expressions along the way back to the top (called a depth-first visit of the tree). The priority value associated to expressions will determine whether new occurring expressions should be added as a child or a parent of the last encountered operation. Example: "1 + 2 * 3". When parsing this, you will first find a value expression (1) which has very high priority. Then, you encounter the addition, since the addition expects an operand on the left whatever expression you currently have (value 1) will become the first child of the addition expression. Once you parsed the value (2), you can say that it could be associated with the addition operator (with its priority value). Then, you see the multiply operator which has higher priority than the addition and also expects an operand on its left. So, at this point, you can say that the value (2) should be a child to the multiply operator, and that the result of the multiply operator should be a child of the addition operator. So, you see, it is all about maintaining an order in the priorities in the tree (i.e. the lowest-priority operator + ends up at the root, the mid-priority operator * ends up as an intermediate node, and all the high-priority value expressions end up as leaf nodes). This kind of parse is simple because you typically only have to keep only a few expressions under consideration at any given time (like in the example, you only need to keep track of the "under-construction" expression (addition), the "candidate" expression (value 2) and the next expression found (multiply)), and the rebranching needed to construct the tree is very local and simple.

Another little hint: parentheses () should be considered as a unitary operator that yields the identity (i.e. evaluates its child expression and forwards it as its own evaluation result).

mattloto 0 Light Poster · Answer 2 · 2011-05-25T03:04:31+00:00

Thanks for all the help guys! I'm probably going to go with the polymorphism, not because it's easier, but because it would be a better learning experience in OOP.

Anyway, time to write an interpreter!

mattloto 0 Light Poster · Answer 3 · 2011-05-25T04:41:04+00:00

Okay, well I started to write the evaluator before the interpreter, but I'm running into some issues. This is what I have so far:

#include <iostream>
#include <string>
#include <math.h>

using namespace std;

enum Operator {ADD, SUBTRACT, MULTIPLY, DIVIDE, POWER, SQRT};

float calculate(float left, Operator op, float right) {
  if (op == ADD) return left + right;
  else if (op == SUBTRACT) return left - right;
  else if (op == MULTIPLY) return left * right;
  else if (op == DIVIDE) return left / right;
  else if (op == POWER) return pow(left, right);
  else if (op == SQRT) return sqrt(left);
  else return 0;
}

class Expression {
  public:
    Expression();
    virtual float evaluate() = 0;
} ;

class Node : public Expression {
  public:
    Node(Expression left, Operator op, Expression right) {
      this->left = left;
      this->right = right;
      this->op = op;
    }
    Operator op;
    Expression *left, *right;
    float evaluate() {
      return calculate(left->evaluate(), op, right->evaluate());
    }
} ;

class Value : public Expression {
  public:
    Value(float value) { this->value = value; }
    float evaluate() { return value; }
  private:
    float value;
} ;

int main() {
  Node n(Value(3), ADD, Value(4));
  cout << n.evaluate() << endl;
}

The thing is, when I run this, I get an error on line 27 saying: "cannot declare parameter 'left' to be of abstract type 'Expression' because the following virtual functions are pure within 'Expression': virtual float Expression::evaluate()"

So I tried to change Line 27 to:

Node(Expression *left, Operator op, Expression *right) {

But then I get this error: "no matching function for call to 'Node::Node(Value, Operator, Value)', candidates are: Node::Node(Expression*, Operator, Expression*)"

I at least know what that one means (unlike the first error), but I don't get why it isn't working, since a Value is an Expression and this is the whole basis of polymorphism. Any Help?

mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster · Answer 4 · 2011-05-25T08:51:24+00:00

You need to give pointers to the constructor, not objects. So, line 48 could be replaced by:

Value v1(3), v2(4);
  Node n(&v1, ADD, &v2);

mattloto 0 Light Poster · Answer 5 · 2011-05-25T09:35:30+00:00

When I do that, all works well except I still get errors saying "no matching function for call to 'Node::Node(Value*, Operator, Value*)', candidates are: Node::Node(Expression, Operator, Expression)" Thats slightly different from the error I got before, but it's the same idea of polymorphism failing (or me failing to understand it :P).

I then tried and make the constructor read:

Node(Expression &left, Operator op, Expression &right) {

and put back the old definition of n (Without the &), and I get this:

)]+0xd)||undefined reference to `Expression::Expression()'|
)]+0x24)||undefined reference to `Expression::Expression()'|
)]+0x32)||undefined reference to `Expression::Expression()'|
C:\Users\Matt\Desktop\C++\Expressions.o:Expressions.cpp:(.text$_ZN5ValueC1Ef[Value::Value(float)]+0xd)||undefined reference to `Expression::Expression()'|
||=== Build finished: 4 errors, 0 warnings ===|

I can't tell if this even means something or if MinGW is being stupid.

mike_2000_17 2,669 21st Century Viking Team Colleague Featured Poster · Answer 6 · 2011-05-25T09:51:46+00:00

The right combination is:

//constructor:
Node(Expression* left, Operator op, Expression* right) {

//use:
  Value v1(3), v2(4);
  Node n(&v1, ADD, &v2);