Compiled C# lambda expression performance with imbrication

c# expression-trees lambda linq

Question

Considering this class:

/// <summary>
/// Dummy implementation of a parser for the purpose of the test
/// </summary>
class Parser
{
    public List<T> ReadList<T>(Func<T> readFunctor)
    {
        return Enumerable.Range(0, 10).Select(i => readFunctor()).ToList();
    }

    public int ReadInt32()
    {
        return 12;
    }

    public string ReadString()
    {
        return "string";
    }
}

I try to generate the following call with a compiled lambda expression tree:

Parser parser = new Parser();
List<int> list = parser.ReadList(parser.ReadInt32);

However, the peformance is not quite the same...

class Program
{
    private const int MAX = 1000000;

    static void Main(string[] args)
    {
        DirectCall();
        LambdaCall();
        CompiledLambdaCall();
    }

    static void DirectCall()
    {
        Parser parser = new Parser();
        var sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < MAX; i++)
        {
            List<int> list = parser.ReadList(parser.ReadInt32);
        }
        sw.Stop();
        Console.WriteLine("Direct call: {0} ms", sw.ElapsedMilliseconds);
    }

    static void LambdaCall()
    {
        Parser parser = new Parser();
        var sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < MAX; i++)
        {
            List<int> list = parser.ReadList(() => parser.ReadInt32());
        }
        sw.Stop();
        Console.WriteLine("Lambda call: {0} ms", sw.ElapsedMilliseconds);
    }

    static void CompiledLambdaCall()
    {
        var parserParameter = Expression.Parameter(typeof(Parser), "parser");

        var lambda = Expression.Lambda<Func<Parser, List<int>>>(
            Expression.Call(
                parserParameter,
                typeof(Parser).GetMethod("ReadList").MakeGenericMethod(typeof(int)),
                Expression.Lambda(
                    typeof(Func<int>),
                    Expression.Call(
                        parserParameter,
                        typeof(Parser).GetMethod("ReadInt32")))),
            parserParameter);
        Func<Parser, List<int>> func = lambda.Compile();

        Parser parser = new Parser();
        var sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < MAX; i++)
        {
            List<int> list = func(parser);
        }
        sw.Stop();
        Console.WriteLine("Compiled lambda call: {0} ms", sw.ElapsedMilliseconds);
    }
}

These are the results in milliseconds on my computer :

Direct call:          647 ms
Lambda call:          641 ms
Compiled lambda call: 5861 ms

I don't understand why the compiled lambda call is so slow.

And I forgot to say that my test is run in release mode with the "Optimize Code" option enabled.

Update: Changed benchmarking based on DateTime.Now to Stopwatch.

Does anyone know how to tweak the lambda expression to obtain a better performance in the compiled lambda call ?

Accepted Answer

The test is invalid for two reasons:

DateTime.Now isn't accurate enough for micro-benchmarking short tests.

Use the Stopwatch class instead. When I do so, I get the following results (using MAX = 100000), in milliseconds:

Lambda call: 86.3196
Direct call: 74.057
Compiled lambda call: 814.2178

Indeed, the "direct call" is faster than the "lambda call", which makes sense - the "direct call" involves calls to a delegate that refers directly to a method on a Parser object. The "lambda call" requires a call to a delegate that refers to a method on a compiler-generated closure object, which in turn calls the method on the Parser object. This extra indirection introduces a minor speed-bump.


The "Compiled lambda call" isn't the same as the "Lambda call"

The "Lambda" looks like this:

() => parser.ReadInt32()

whereas the "Compiled lambda" looks like this:

parser => parser.ReadList(() => parser.ReadInt32())

There's an extra step in there: To create the embedded delegate for the inner lambda. In a tight loop, this is expensive.

EDIT:

I went ahead and inspected the IL of the "lambda" vs the "compiled lambda" and decompiled them back to "simpler" C# (see: Viewing the IL code generated from a compiled expression).

For the "non compiled" lambda, it looks like this:

for (int i = 0; i < 100000; i++)
{
    if (CS$<>9__CachedAnonymousMethodDelegate1 == null)
    {
        CS$<>9__CachedAnonymousMethodDelegate1 = new Func<int>(CS$<>8__locals3.<LambdaCall>b__0);
    }

    CS$<>8__locals3.parser.ReadList<int>(CS$<>9__CachedAnonymousMethodDelegate1);
}

Note that a single delegate is created once and cached.

Whereas for the "compiled lambda", it looks like this:

Func<Parser, List<int>> func = lambda.Compile();
Parser parser = new Parser();
for (int i = 0; i < 100000; i++)
{
    func(parser);
}

Where the target of the delegate is:

public static List<int> Foo(Parser parser)
{
    object[] objArray = new object[] { new StrongBox<Parser>(parser) };
    return ((StrongBox<Parser>) objArray[0]).Value.ReadList<int>
      (new Func<int>(dyn_type.<ExpressionCompilerImplementationDetails>{1}lambda_method));
}

Note that although the "outer" delegate is created only once and cached, a new "inner" delegate is created on every iteration of the loop. Not to mention other allocations for the object array and the StrongBox<T> instance.


Popular Answer

  1. The primary reason the compiled lambda is slower is because the delegate is created over and over again. Anonymous delegates are a special breed: they are only used in one location. So the compiler can do some special optimizations, like caching the value the first time the delegate is called. This is what is happening here.

  2. I was not able to reproduce the large difference between the direct call and the lambda call. In fact, in my measurements the direct call is slightly faster.

When doing benchmarks like this, you may want to use a more accurate timer. The Stopwatch class in System.Diagnostics is ideal. You may also want to increase your number of iterations. The code as is only runs for a few milliseconds.

Also, the first of the three cases will incur a slight overhead from JIT'ing the Parser class. Try running the first case twice and see what happens. Or better still: use the number of iterations as a parameter in each method, and call each method first for 1 iteration, so they all start on a level playing field.




Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why