Caching Compiled Expression tree

c# expression-trees

Question

How to efficiently cache methods compiled from an expression tree ?

public void SomeToStringCalls()
{
    ToString(i => (i + 1).ToString(), 1);
    ToString(i => (i + 1).ToString(), 2);
    ToString(i => (i + 2).ToString(), 3);
    ToString(i => (i + 2).ToString(), 4);
}

private string ToString<T>(Expression<Func<T, string>> expression, T input)
{
    var method = expression.Compile();
    return method.Invoke(input);
}

Above, every call is going to recompile each expression, even if some are identical. I can't have a Dictionary<Expression<Func<T, string>>, Func<T, string>>() caching the compiled method from the expression because the equals will fails.

1
24
8/27/2015 2:46:15 PM

Accepted Answer

I found quite some time ago this article, which is exposing pros & cons of expression caching (with constant extraction... which allows you to compile .Where(t=>t.prop==3) and .Where(t=>t.prop==5) to the same delegate).

5
3/25/2017 8:18:56 PM

Popular Answer

The problems with caching expression trees in a centralized cache are:

  1. You will need comprehensive equality comparison and hashing algorithms.
  2. You will need to implement these algorithms yourself, as the standard expression types do not provide them out of the box.

A comprehensive equality comparison will be expensive to perform, but the cost can be alleviated somewhat with a cheap hash function. Also, since expression trees are immutable, you can cache the hash code after you've calculated it for the first time. That may shave some time off lookups, but each time you check the cache using a newly created expression as the key (which, I would imagine, would be most of the time) you will at least need to hash the new expression.

Option 1: Local/Static Caching

An ideal solution would avoid all of this overhead. If it's feasible (i.e., if these expressions are not composed dynamically), your best bet is to simply cache the expression trees near their declaration sites. You should be able to store most (if not all) of these in static fields:

private static readonly Expression<Func<int, string>> _addOne =
    i => (i + 1).ToString();
private static readonly Expression<Func<int, string>> _addTwo =
    i => (i + 2).ToString();

public void SomeToStringCalls()
{
    ToString(_addOne, 1);
    ToString(_addOne, 2);
    ToString(_addTwo, 3);
    ToString(_addTwo, 4);
}

The downside is that you may end up with duplicate expressions across various types, but if you are not generating a very large number of expressions, this may not be an issue.

Option 2: Centralized Caching

If that's not an option for you, you'll have to implement a centralized cache and the hashing and equality algorithms required to do so. The hashing algorithm will help you narrow down your candidate set, so it's important that it work reasonably well, i.e., produce very few collisions in practice. However, given the complex nature of expression trees, you want to keep the costs down. Therefore, I would advise you not aim for a perfect hashing function, but one that is reasonably cheap, yet effective. Perhaps something like this:

internal static class ExpressionHasher
{
    private const int NullHashCode = 0x61E04917;

    [ThreadStatic]
    private static HashVisitor _visitor;

    private static HashVisitor Visitor
    {
        get
        {
            if (_visitor == null)
                _visitor = new HashVisitor();
            return _visitor;
        }
    }

    public static int GetHashCode(Expression e)
    {
        if (e == null)
            return NullHashCode;

        var visitor = Visitor;

        visitor.Reset();
        visitor.Visit(e);

        return visitor.Hash;
    }

    private sealed class HashVisitor : ExpressionVisitor
    {
        private int _hash;

        internal int Hash
        {
            get { return _hash; }
        }

        internal void Reset()
        {
            _hash = 0;
        }

        private void UpdateHash(int value)
        {
            _hash = (_hash * 397) ^ value;
        }

        private void UpdateHash(object component)
        {
            int componentHash;

            if (component == null)
            {
                componentHash = NullHashCode;
            }
            else
            {
                var member = component as MemberInfo;
                if (member != null)
                {
                    componentHash = member.Name.GetHashCode();

                    var declaringType = member.DeclaringType;
                    if (declaringType != null && declaringType.AssemblyQualifiedName != null)
                        componentHash = (componentHash * 397) ^ declaringType.AssemblyQualifiedName.GetHashCode();
                }
                else
                {
                    componentHash = component.GetHashCode();
                }
            }

            _hash = (_hash * 397) ^ componentHash;
        }

        public override Expression Visit(Expression node)
        {
            UpdateHash((int)node.NodeType);
            return base.Visit(node);
        }

        protected override Expression VisitConstant(ConstantExpression node)
        {
            UpdateHash(node.Value);
            return base.VisitConstant(node);
        }

        protected override Expression VisitMember(MemberExpression node)
        {
            UpdateHash(node.Member);
            return base.VisitMember(node);
        }

        protected override MemberAssignment VisitMemberAssignment(MemberAssignment node)
        {
            UpdateHash(node.Member);
            return base.VisitMemberAssignment(node);
        }

        protected override MemberBinding VisitMemberBinding(MemberBinding node)
        {
            UpdateHash((int)node.BindingType);
            UpdateHash(node.Member);
            return base.VisitMemberBinding(node);
        }

        protected override MemberListBinding VisitMemberListBinding(MemberListBinding node)
        {
            UpdateHash((int)node.BindingType);
            UpdateHash(node.Member);
            return base.VisitMemberListBinding(node);
        }

        protected override MemberMemberBinding VisitMemberMemberBinding(MemberMemberBinding node)
        {
            UpdateHash((int)node.BindingType);
            UpdateHash(node.Member);
            return base.VisitMemberMemberBinding(node);
        }

        protected override Expression VisitMethodCall(MethodCallExpression node)
        {
            UpdateHash(node.Method);
            return base.VisitMethodCall(node);
        }

        protected override Expression VisitNew(NewExpression node)
        {
            UpdateHash(node.Constructor);
            return base.VisitNew(node);
        }

        protected override Expression VisitNewArray(NewArrayExpression node)
        {
            UpdateHash(node.Type);
            return base.VisitNewArray(node);
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            UpdateHash(node.Type);
            return base.VisitParameter(node);
        }

        protected override Expression VisitTypeBinary(TypeBinaryExpression node)
        {
            UpdateHash(node.Type);
            return base.VisitTypeBinary(node);
        }
    }
}

It's not perfect, but it should get you pretty good results:

  1. It drills down and includes every expression in the tree.
  2. At minimum, the NodeType of every sub-expression is included in the hash. One obvious (but potentially costly) improvement would be to also include the Type; try it if you find you are getting too many collisions.
  3. Members and types referenced in the expression are included.
  4. Constant values appearing in the tree are included.
  5. It doesn't require a heap allocation to run, at the expense of being non-reentrant (since you're only analyzing a top-level expression, this is fine). You can run it concurrently on multiple threads.

Since you're not actually overriding GetHashCode() for any of the expression types, the imperfections of the hashing function will not leak out and affect external code. This gives us a degree of freedom in bending the rules of hash functions.

Your equality comparison will need to be more comprehensive. Whereas the hashing function is effectively an 'estimate' used to minimize your candidate set, the equality comparison performs the actual matching, and it needs to be perfect. You could certainly use my proposed hashing solution as a template for how to approach the problem, but bear in mind you must perform an exhaustive comparison of all an expression's attributes. One thing to bear in mind is that you probably don't want to compare the names of the ParameterExpression nodes in the tree, but you'll want to maintain a mapping of the parameters/variables in the two trees you are comparing to make sure they represent the "same" value in the context of their parent expression tree.

With the exception of ignoring parameter/variable names, do not bother attempting to resolve "semantic equivalence", i.e., expressions which produce the same results and side effects but are not structurally identical. It cannot be done efficiently, and it's not worth it to try.

Lastly, you might be able to speed things up by implementing a two-level lookup: first, choose the correct cache based on the parameter type(s), then look for a match within that cache. This approach would be most effective if it is guaranteed that each lambda expression will have exactly one argument. The benefits would degrade with more arguments.



Related Questions





Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow