# What is the algorithm for parsing expressions in infix notation?

algorithm expression-trees language-agnostic parsing php ### Question

I would like to parse boolean expressions in PHP. As in:

``````A and B or C and (D or F or not G)
``````

The terms can be considered simple identifiers. They will have a little structure, but the parser doesn't need to worry about that. It should just recognize the keywords `and or not ( )`. Everything else is a term.

I remember we wrote simple arithmetic expression evaluators at school, but I don't remember how it was done anymore. Nor do I know what keywords to look for in Google/SO.

A ready made library would be nice, but as I remember the algorithm was pretty simple so it might be fun and educational to re-implement it myself.

Recursive descent parsers are fun to write and easy to read. The first step is to write your grammar out.

Maybe this is the grammar you want.

``````expr        = and_expr ('or' and_expr)*
and_expr    = not_expr ('and' not_expr)*
not_expr    = simple_expr | 'not' not_expr
simple_expr = term | '(' expr ')'
``````

Turning this into a recursive descent parser is super easy. Just write one function per nonterminal.

``````def expr():
x = and_expr()
while peek() == 'or':
consume('or')
y = and_expr()
x = OR(x, y)
return x

def and_expr():
x = not_expr()
while peek() == 'and':
consume('and')
y = not_expr()
x = AND(x, y)
return x

def not_expr():
if peek() == 'not':
consume('not')
x = not_expr()
return NOT(x)
else:
return simple_expr()

def simple_expr():
t = peek()
if t == '(':
consume('(')
result = expr()
consume(')')
return result
elif is_term(t):
consume(t)
return TERM(t)
else:
raise SyntaxError("expected term or (")
``````

This isn't complete. You have to provide a little more code:

• Input functions. `consume`, `peek`, and `is_term` are functions you provide. They'll be easy to implement using regular expressions. `consume(s)` reads the next token of input and throws an error if it doesn't match `s`. `peek()` simply returns a peek at the next token without consuming it. `is_term(s)` returns true if `s` is a term.

• Output functions. `OR`, `AND`, `NOT`, and `TERM` are called each time a piece of the expression is successfully parsed. They can do whatever you want.

• Wrapper function. Instead of just calling `expr` directly, you'll want to write a little wrapper function that initializes the variables used by `consume` and `peek`, then calls `expr`, and finally checks to make sure there's no leftover input that didn't get consumed.

Even with all this, it's still a tiny amount of code. In Python, the complete program is 84 lines, and that includes a few tests.

Prime Library

More Projects...