Haskell - Reverse polish notation regular expression to expression tree

expression-trees haskell postfix-notation regex

Question

Im struggling with coming up with algorithm that converts a regular expression (in context of regular languages, there are only 3 operations '.' for concat, '+' for "or" and '*' for iteration) in reverse polish notation to the regular expression tree.

For example i have a regular expression

aa.bb.+

for which i need to construct following expression tree

      +
    /   \
   .     .
  / \   / \
 b   b a   a  

Would appreciate any help. Thanks!

Accepted Answer

This is not hard to do using an stack-based algorithm (which translates into a foldl in Haskell):

data Tree
  = Symbol Char
  | Op Char Tree Tree
  deriving Show

type Stack = [Tree]

step :: Stack -> Char -> Stack
step (r:l:s) '.' = (Op '.' l r):s
step (r:l:s) '+' = (Op '+' l r):s
step s c         = (Symbol c):s

parse :: String -> Stack
parse = foldl step []

which would yield something like this:

λ> parse "aa.bb.+"
[Op '+' (Op '.' (Symbol 'a') (Symbol 'a')) (Op '.' (Symbol 'b') (Symbol 'b'))]

you did not write anything about your data-structures so I took some liberties - it should be easy to adapt to your situation once you understood step and the foldl in parse


if you want to change left/right as in your example you just have to do it in step:

step :: Stack -> Char -> Stack
step (l:r:s) '.' = (Op '.' l r):s
step (l:r:s) '+' = (Op '+' l r):s
step s c         = (Symbol c):s

example:

λ> parse "aa.bb.+"
[Op '+' (Op '.' (Symbol 'b') (Symbol 'b')) (Op '.' (Symbol 'a') (Symbol 'a'))]

but I would expect the first one if the order would matter (here you have bb before aa which is strange ...


of course it's strange that the star operation is binary too so I would suggest:

data Tree
  = Symbol Char
  | Concat Tree Tree
  | Star Tree
  deriving Show

type Stack = [Tree]

step :: Stack -> Char -> Stack
step (r:l:s) '.' = (Concat r l):s
step (t:s)   '+' = (Star t):s
step s c         = (Symbol c):s

parse :: String -> Stack
parse = foldl step []

which would change your syntax a bit:

λ> parse "aa.bb..+"
[Star (Concat (Concat (Symbol 'b') (Symbol 'b')) (Concat (Symbol 'a') (Symbol 'a')))]

but which seems saner IMO


convert list to Maybe

of course you might not like the [Stack] return so you could do this:

parse :: String -> Maybe Tree
parse input =
  case result of
    [t] -> Just t
    []  -> Nothing
    _   -> error "syntax error"
  where result = foldl step [] input

examples:

λ> parse "aa.bb..+"
Just (Star (Concat (Concat (Symbol 'b') (Symbol 'b')) (Concat (Symbol 'a') (Symbol 'a'))))
λ> parse ""
Nothing
λ> parse "aa.bb.+"
*** Exception: syntax error


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow