I want to parse expressions like those in typical Haskell source. I get an input stream, which is already tokenized and annotated with fixity and precedence. The set of operators is not known at compile time and may be arbitrary. The output should be a tree representing the expression. Here's a bit of what I tried:
-- A single token of the input stream data Token a = Prefix a | Infix a Int Fixity -- The Int parameter represents the precedence | LBrace | RBrace deriving (Show,Eq) data Fixity = InfixL | InfixR | InfixC deriving (Show,Eq) data Expression a = Atom a | Apply a a deriving Show -- Wrapped into either, if expression is malformed. exprToTree :: [Token a] -> Either String (Expression a) exprToTree = undefined
For the sake of simpleness, I don't treat lambdas special, they are just atoms.
But now, I am completely lost. How can I convert the stream of atoms into a tree? Can somebody please point me to an algorithm or help me with finding one.
In a nutshell then, even though you have a token list you still need a parser.
Parsec can handle alternative token streams, but you'll probably have to refer to the manual - a PDF available at Daan Leijen's "legacy" home page - http://legacy.cs.uu.nl/daan/download/parsec/parsec.pdf. You can roll your own parser without using a combinator library but you will be re-implementing some fraction of Parsec. As far as I remember UU_parsing expects to work with a separate scanner so its another option.
Although it doesn't handle parsing you might find Lennart Augustsson's "Lambda calculus cooked four ways" helpful for other things - http://www.augustsson.net/Darcs/Lambda/top.pdf
Edit - here is a partly worked out plan of how you can go about it with Parsec, for details you'll have to consult section 2.11 of the manual.
Suppose you have this data type for concrete "internal" tokens:
data InternalTok = Ident String | BinOpPlus | BinOpMinus | UnaryOpNegate | IntLiteral Int deriving (Show)
Then you end get these types for the Parsec token and parse:
type MyToken = Token InternalTok type MyParser a = GenParser MyToken () a
Define a helper function as per the Parsec manual - this handles show and pos so individual definitions are shorter cf. the
mytoken function on page 19.
mytoken :: (MyToken -> Maybe a) -> MyParser a mytoken test = token showToken posToken testToken where showToken tok = show tok posToken tok = no_pos testToken tok = test tok
For the moment your token type does not track source position, so:
no_pos :: SourcePos no_pos = newPos "" 0 0 0
For each terminal you have to define a token function:
identifier :: MyParser MyToken identifier = mytoken (\tok -> case tok of a@(Prefix (Ident _)) -> Just a _ -> Nothing) intLiteral :: MyParser MyToken intLiteral = mytoken (\tok -> case tok of a@(Prefix (IntLiteral _)) -> Just a _ -> Nothing) binPlus :: MyParser MyToken binPlus = mytoken (\tok -> case tok of a@(Infix BinOpPlus _ _) -> Just a _ -> Nothing) binMinus :: MyParser MyToken binMinus = mytoken (\tok -> case tok of a@(Infix BinOpMinus _ _) -> Just a _ -> Nothing) unaryNegate :: MyParser MyToken unaryNegate = mytoken (\tok -> case tok of a@(Prefix UnaryNegate _ _) -> Just a _ -> Nothing)
Edit - to handle custom infix operators you'll need these token parsers:
tokInfixL :: Int -> MyParser MyToken tokInfixL n = mytoken $ \tok -> case tok of a@(Infix _ i InfixL) | i == n -> Just a _ -> Nothing) tokInfixR :: Int -> MyParser MyToken tokInfixR n = mytoken $ \tok -> case tok of a@(Infix _ i InfixR) | i == n -> Just a _ -> Nothing) tokInfixC :: Int -> MyParser MyToken tokInfixC n = mytoken $ \tok -> case tok of a@(Infix _ i InfixC) | i == n -> Just a _ -> Nothing) tokPrefix :: MyParser MyToken tokPrefix = mytoken (\tok -> case tok of a@(Prefix _) -> Just a _ -> Nothing)
Now you can define the parser - you need to fix the number of levels of precedence beforehand, there is no way around that fact as you need to code a parser for each level.
The top-level expression parse is simply calls the highest precedence parser
pExpression :: Parser Expersion pExpression = expression10
For each precendence level you need a parser roughly like this, you'll have to work out non-assoc for yourself. Also you might need to do some work on chainl / chainr - I've only written a parser in this style with UU_Parsing it might be slightly different for Parsec. Note Apply is usually at the precedence highest level.
expression10 :: Parser Expression expression10 = Apply <$> identifier <*> pExpression <|> Prefix <$> tokPrefix <*> pExpression <|> chainl (Infix <$> tokInfixL 10) expression9 <|> chainr (Infix <$> tokInfixR 10) expression9 expression9 :: Parser Expression expression9 = Prefix <$> tokPrefix <*> pExpression <|> chainl (Infix <$> tokInfixL 9) expression8 <|> chainr (Infix <$> tokInfixR 9) expression8 ...
You'll have to extend your syntax to handle IntLiterals and Identifiers which are at level 0 in precedence:
expression0 :: Parser Expression expression0 = IntLit <$> intLiteral <|> Ident <$> identifier <|> ...
Edit - for unlimited precedence - maybe if you only have application and Atom maybe something like this would work. Note you'll have to change the tokInfixL and tokInfixR parsers to no longer match assoc-level and you may have to experiment with the order of alternatives.
expression :: Parser Expression expression = Apply <$> identifier <*> expression <|> Prefix <$> tokPrefix <*> expression <|> chainl (Infix <$> tokInfixL) expression <|> chainr (Infix <$> tokInfixR) expression <|> intLiteral <|> identifier intLiteral :: Parser Expression intLiteral = Atom . convert <$> intLiteral where convert = ?? identifier :: Parser Expression identifier = Atom . convert <$> intLiteral where convert = ??
After searching the web for another topic, I found this nice piece of code to do exactly what I want. Have a look:
data Op = Op String Prec Fixity deriving Eq data Fixity = Leftfix | Rightfix | Nonfix deriving Eq data Exp = Var Var | OpApp Exp Op Exp deriving Eq type Prec = Int type Var = String data Tok = TVar Var | TOp Op parse :: [Tok] -> Exp parse (TVar x : rest) = fst (parse1 (Var x) (-1) Nonfix rest) parse1 :: Exp -> Int -> Fixity -> [Tok] -> (Exp, [Tok]) parse1 e p f  = (e, ) parse1 e p f inp@(TOp op@(Op _ p' f') : TVar x : rest) | p' == p && (f /= f' || f == Nonfix) = error "ambiguous infix expression" | p' < p || p' == p && (f == Leftfix || f' == Nonfix) = (e, inp) | otherwise = let (r,rest') = parse1 (Var x) p' f' rest in parse1 (OpApp e op r) p f rest' -- Printing instance Show Exp where showsPrec _ (Var x) = showString x showsPrec p e@(OpApp l (Op op _ _) r) = showParen (p > 0) $ showsPrec 9 l . showString op . showsPrec 9 r -- Testing plus = TOp (Op "+" 6 Leftfix) times = TOp (Op "*" 7 Leftfix) divide = TOp (Op "/" 7 Leftfix) gt = TOp (Op ">" 4 Nonfix) ex = TOp (Op "^" 8 Rightfix) lookupop '+' = plus lookupop '*' = times lookupop '/' = divide lookupop '>' = gt lookupop '^' = ex fromstr [x] = [TVar [x]] fromstr (x:y:z) = TVar [x] : lookupop y : fromstr z test1 = fromstr "a+b+c" test2 = fromstr "a+b+c*d" test3 = fromstr "a/b/c" test4 = fromstr "a/b+c" test5 = fromstr "a/b*c" test6 = fromstr "1^2^3+4" test7 = fromstr "a/1^2^3" test8 = fromstr "a*b/c"
(I took it from this page: http://hackage.haskell.org/trac/haskell-prime/attachment/wiki/FixityResolution/resolve.hs)