Plotting expression trees in R

expression-trees plot r

Question

I know that I can create an expression tree in R using the substitute function. Let's say that I generate the following expression tree:

expT <- substitute(a+(2*b+c))

Is it possible to visualize the expression tree in R, producing something like:

Expression Tree

I know that ( is also a function in R, but I would like to omit that in the plot.

Accepted Answer

Here is an approach taking advantage of the function utils::getParseData and borrowing from a function written for the parser package and using igraph for the visuals. The linked function almost does what you wanted, but the data returned by the getParseData function has blank nodes with the numerical values/symbols/operators etc. on the leaves. This makes sense if you try to parse functions or ternary expressions or more complicated things.

This function simply creates an edgelist from the parse data.

## https://github.com/halpo/parser/blob/master/R/plot.parser.R
## Modified slightly to return graph instead of print/add attr
parser2graph <- function(y, ...){
    y$new.id <- seq_along(y$id)
    h <- graph.tree(0) + vertices(id = y$id, label= y$text)
    for(i in 1:nrow(y)){
        if(y[i, 'parent'])
            h <- h + edge(c(y[y$id == y[i, 'parent'], 'new.id'], y[i, 'new.id']))
    }
    h <- set_edge_attr(h, 'color', value='black')
    return(h)
}

The next function collapses the parse tree by removing all the '(){}' and remaining gaps. The idea is to first move all the labels up one level in the tree, then clip the leaves. And finally all the gaps from nested expressions ('(){}') are removed by creating/destroying edges. I colored the edges blue where levels of nesting from brackets/braces were removed.

## Function to collapse the parse tree (removing () and {})
parseTree <- function(string, ignore=c('(',')','{','}'), ...) {
    dat <- utils::getParseData(parse(text=string))
    g <- parser2graph(dat[!(dat$text %in% ignore), ])
    leaves <- V(g)[!degree(g, mode='out')]                             # tree leaves
    preds <- sapply(leaves, neighbors, g=g, mode="in")                 # their predecessors
    vertex_attr(g, 'label', preds) <- vertex_attr(g, 'label', leaves)  # bump labels up a level
    g <- g - leaves                                                    # remove the leaves
    gaps <- V(g)[!nchar(vertex_attr(g, 'label'))]                      # gaps where ()/{} were
    nebs <- c(sapply(gaps, neighbors, graph=g, mode='all'))            # neighbors of gaps
    g <- add_edges(g, nebs, color='blue')                              # edges around the gaps
    g <- g - V(g)[!nchar(vertex_attr(g, 'label'))]                     # remove leaves/gaps
    plot(g, layout=layout.reingold.tilford, ...)
    title(string, cex.main=2.5)
}

An example, slightly more nested expression. The animation shows how original tree is collapsed.

## Example string
library(igraph)
string <- "(a/{5})+(2*b+c)"

parseTree(string,  # plus some graphing stuff
          vertex.color="#FCFDBFFF", vertex.frame.color=NA,
          vertex.label.font=2, vertex.label.cex=2.5,
          vertex.label.color="darkred", vertex.size=25,
          asp=.7, edge.width=3, margin=-.05)

enter image description here


Popular Answer

The following gets most of the way there. It mimics pryr:::tree to recursively examine the call tree, then assigns data.tree Nodes. I would have preferred igraph but it is intolerant of duplicate node names (e.g. + appearing twice). I also cannot get dendrogram to label any of the branches other than the root.

#install.packages("data.tree")
library(data.tree)

make_tree <- function(x) {
  if (is.atomic(x) && length(x) == 1) {
    as.character(deparse(x)[1])
  } else if (is.name(x)) {
    x <- as.character(x)
    if (x %in% c("(", ")")) {
      NULL
    } else {
      x
    }
  } else if (is.call(x)) {
    call_items <- as.list(x)
    node <- call_items[[1]]
    call_items <- call_items[-1]
    while (as.character(node) == "(" && length(call_items) > 0) {
      node <- call_items[[1]]
      call_items <- call_items[-1]
    }
    if (length(call_items) == 0) 
      return(make_tree(node))
    call_node <- Node$new(as.character(node))
    for (i in 1:length(call_items)) {
      res <- make_tree(call_items[[i]])
      if (is.environment(res))
        call_node$AddChildNode(res)
      else
        call_node$AddChild(res)
    }
    call_node
  } else
    typeof(x)
}

tree <- make_tree(quote(a+(2*b+c)))
print(tree)
plot(as.dendrogram(tree, edgetext = T), center = T, type = "triangle", yaxt = "n")

Which gives a reasonable text output:

      levelName
1 +            
2  ¦--a        
3  °--+        
4      ¦--*    
5      ¦   ¦--2
6      ¦   °--b
7      °--c    

and a graphic. The multiplication symbol doesn't appear in the mid-tree node (I can't figure out why) but otherwise, I think this does the job. call tree plot



Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why