Performance

Aug 15, 2010 at 10:11 PM
Edited Aug 15, 2010 at 10:18 PM

I recently found your project and love it! So much nicer to have Linq in Javascript.

I was curious about the performance though and for the All method I discovered in my test it was approximately 308% slower than the standard for loop. I dug around some and discovered some areas for improvement. The ArrayEnumerable.GetEnumerator method I changed to this and in my test my query was only 144% slower than the for loop:

ArrayEnumerable.prototype.GetEnumerator = function ()
{
    var source = this.source;
    var index = -1;

    return {
        Current : function() { return source[index] },
        MoveNext : function () { return ++index < source.length; },
        Dispose : Functions.Blank
    };
}

I was able to get the All query down to only 60% slower by making it this:

All: function (predicate)
{
    predicate = Utils.CreateLambda(predicate);

    var result = true;
    if(this instanceof ArrayEnumerable)
    {
        for(var i = 0; i < this.Count(); i++)
        {
            if(!predicate(this.ElementAt(i)))
            {
                result = false;
                break;
            }
        }
    }
    else
    {
        this.ForEach(function (x)
        {
            if (!predicate(x))
            {
                result = false;
                return false; // break
            }
        });
    }
    return result;
},

I didn't spend any time examining the other methods for possible speed up, but thought I'd at least pass along what I had done to improve performance.

 

Coordinator
Aug 16, 2010 at 5:04 PM
Edited Aug 16, 2010 at 5:31 PM

Thank you!

your ArrayEnumerable.prototype.GetEnumerator code is good!
Dispose is not necessary to ArrayEnumerable, certainly.
and ArrayEnumerable can avoid some overheads.

All is not so good.
override ArrayEnumerable.prototype.All

// ArrayEnumerable inheritance Enumerable
// .prototype.Xxx is override
ArrayEnumerable.prototype.All = function (predicate)
{
    predicate = Utils.CreateLambda(predicate);
    var array = this.source;
    for (var i = 0, len = array.length; i < len; i++)
    {
        if (!predicate(array[i])) return false;
    }
    return true;
}

this code is called directly, not use GetEnumerator.

ArrayEnumerable should do overriding of all methods ideally.
But, balance with the enlargement of the cord size is hard.

//

performance...
I tested some benchmarks -> http://act.neue.cc/bench/

Aug 20, 2010 at 5:41 AM
Edited Aug 20, 2010 at 5:45 AM

Hey, glad you found some of my code useful. :) Your All method is better, admittedly when I did that I was still new to your code and where to put it; your way is cleaner.

I'm writing again because I was doing sorting in my app, a Firefox extension actually, and got curious about performance again! I'll say upfront it's not slow at all when doing one iteration of a query, but for my tests I did them 1000 or 10000 times to magnify any differences. So when I say the Linq way is x % slower that's how I'm getting my numbers. Going from 1ms to 1ms isn't a perceptible gain, but being a tool that'll be used a lot I think any performance gains are worth it.

For my test cases I did a Where().OrderBy() and I went from (times in milliseconds):

Loop: 193
Linq (old): 866 (349% slower)

to:

Loop: 194
Linq (new): 454 (134% slower)

I could really tell a difference when I just did an OrderBy.

Loop: 37
Linq (old): 770 (1981% slower)

to:

Loop: 36
Linq (new): 193 (436% slower)

The general gist of what I did was to modify the OrderedEnumerable.GetEnumerator method. Instead of making and sorting an index I sort the source array in place. I wanted to eliminate as many loops as possible. An array of keys also isn't generated for each sort field, instead I construct a function to pass to the Array.sort() method that contains the keySelector function. The advantage I saw is that you could eliminate generating the keys for secondary sort fields until you actually needed them.

I did some optimizations to this general idea based upon what I saw that the fewer number of statements the better the execution time. Probably the biggest optimization is based upon recognizing that a keySelector passed as a string can't have a closure and so I included the text of the keySelector into the function itself. The way I helped this process was to add some properties to the keySelector function at its creation time. I did it this way since the function arguments and body were already parsed and so was less work than trying to parse it out within OrderedEnumerable.GetEnumerator. Overall this optimization resulted in about 100ms improvement versus a keySelector passed directly as a function.

So enough talking, here's my code:

OrderedEnumerable.prototype.GetEnumerator = function ()
{
    var self = this;
    var buffer;
    var indexes;
    var index = -1;

    return {
        _Initialized : false,
        Current : function() { return buffer[index]; },
        MoveNext : function() {
            if(!this._Initialized) {
                this._Initialized = true;

                var sortContext = SortContext.Create(self, null);
                var func;
                var context;
                var isConstructedFunction = false;
                var isMultipleSort = sortContext.child != null;

                buffer = self.source instanceof ArrayEnumerable ? self.source.source : self.source.ToArray();

                context = sortContext;
                while(context != null) {
                    if(!context.keySelector.Constructed)
                        break;
                    context = context.child;
                    if(context == null)
                        isConstructedFunction = true;
                }

                context = sortContext;
                if(isConstructedFunction) {
                    func = isMultipleSort ? "var result = " : "return ";
                    while(context != null) {
                        func += "Enumerable.Utils.Compare("
                        + context.keySelector.Body.replace(context.keySelector.Arguments[0], "a")
                        + ", " + context.keySelector.Body.replace(context.keySelector.Arguments[0], "b")
                        + ")"
                        + (context.descending ? " * -1" : "")
                        + ";"
                        ;
                        if(context.child != null)
                            func += "if(result == 0) result = "
                        context = context.child;
                    }
                    if(isMultipleSort)
                        func += "return result;"

                    func = new Function("a", "b", func);
                }
                else {
                    if(isMultipleSort) {
                        func = function(a, b) {
                            var result;
                            while(sortContext != null) {
                                result = Enumerable.Utils.Compare(sortContext.keySelector(a), sortContext.keySelector(b)) * (sortContext.descending ? -1 : 1);
                                if(result != 0)
                                    break;
                                sortContext = sortContext.child;
                            }
                            return result;
                        }
                    }
                    else
                        func = function (a, b) { return Enumerable.Utils.Compare(sortContext.keySelector(a), sortContext.keySelector(b)) * (sortContext.descending ? -1 : 1) }
                }
                buffer.sort(func);
            }
            return ++index < buffer.length;
        },
        Dispose : Functions.Blank
    };
}

CreateLambda: function (expression) { if (expression == null) return Functions.Identity; if (typeof expression == Types.String) { if (expression == "") { return Functions.Identity; } else if (expression.indexOf("=>") == -1) { var func = new Function("$,$$,$$$,$$$$", "return " + expression); func.Constructed = true; func.Arguments = ["$", "$$", "$$$", "$$$$"]; func.Body = expression; return func; } else { var expr = expression.match(/^[(\s]*([^()]*?)[)\s]*=>(.*)/); var func = new Function(expr[1], "return " + expr[2]); func.Constructed = true; func.Arguments = expr[1].split(/,\s*/); func.Body = expr[2]; return func; } } return expression; },

I had to expose the Utils class to access the Compare method so I added the following:

Enumerable.Utils = Utils;

I put it directly after the Utils class definition for anyone else reading this.

 

Coordinator
Aug 23, 2010 at 5:03 PM
Edited Aug 23, 2010 at 5:18 PM

Thanks, this is interesting idea!

but several points to be worried about.
Linq's OrderBy must be Stable Sort.
Opera and Chrome's array.sort is not stable.
for example

var ar = Enumerable.Range(1, 100).ToArray();
ar.sort(function (x, y) { return 0 });
alert(ar); // 1,2,3,4,98,77,83,58,5,...(Opera/Chrome)

// check Opera/Chrome
Enumerable.Range(1, 100)
    .OrderBy(function (x) { return 0 })
    .WriteLine();

Therefor, indexes is necessary to OrderBy for stable sort.

and one.

buffer = self.source instanceof ArrayEnumerable ? self.source.source : self.source.ToArray();
// snip...
buffer.sort(func);

This line is not so good.
for example

var array = [5, 3, 9];
Enumerable.From(array).OrderBy().Force();
alert(array); // 3, 5, 9(side-effects!)

Linq always must be non-destructive to original source.

//

benchmark C# Linq - C# for loop.
http://ideone.com/Hzfzh
linq is slow:)
if pursue highest-performance then can't use linq, both C# and JavaScript.
but, perfomance is a matter in particular in the library level.
There are a lot of places that should renew.
so OrderBy

indexes = Enumerable.Range(0, buffer.length).ToArray();
= Enumerable.ToInfinity(0, 1).Take(buffer.length).ToArray()

// this line is so simple but much expensive costs...
// replace to

var indexes = new Array(buffer.length);
for (var i = 0, len = buffer.length; i < len; i++)
{
indexes[i] = buffer[i];
}

Even this seems to change considerably.

The measures of tricks do not extend to the evolution of the JavaScript engine after all.
e.g. Google Chrome vs IE8...
I sometimes think that every JS engine should be V8.

Aug 24, 2010 at 4:16 AM

Okay, fair points about stable non-destructive sorting. :) How about this:

OrderedEnumerable.prototype.GetEnumerator = function ()
{
    var self = this;
    var buffer;
    var indexes;
    var index = 0;

    return {
        _Initialized : false,
        Current : function() { return buffer[index].item; },
        MoveNext : function() {
            if(!this._Initialized) {
                this._Initialized = true;
                index = -1;

                var sortContext = SortContext.Create(self, null);
                var isConstructedFunction = false;
                var isMultipleSort = sortContext.child != null;
                var func;
                var context;

                buffer = [];
                self.source.ForEach(function(item, index) { buffer.push({ index : index, item : item }) });

                context = sortContext;
                while(context != null) {
                    if(!context.keySelector.Constructed)
                        break;
                    context = context.child;
                    if(context == null)
                        isConstructedFunction = true;
                }

                context = sortContext;
                if(isConstructedFunction) {
                    func = "var result = ";
                    while(context != null) {
                        func += "Enumerable.Utils.Compare("
                        + context.keySelector.Body.replace(context.keySelector.Arguments[0], "a.item")
                        + ", " + context.keySelector.Body.replace(context.keySelector.Arguments[0], "b.item")
                        + ")"
                        + (context.descending ? " * -1" : "")
                        + ";"
                        ;
                        if(context.child != null)
                            func += "if(result == 0) result = "
                        context = context.child;
                    }
                    func += "if(result == 0) result = Enumerable.Utils.Compare(a.index, b.index); return result;";
                    func = new Function("a", "b", func);
                }
                else {
                    func = function(a, b) {
                        var result;
                        while(sortContext != null) {
                            result = Enumerable.Utils.Compare(sortContext.keySelector(a.item), sortContext.keySelector(b.item)) * (sortContext.descending ? -1 : 1);
                            if(result != 0)
                                break;
                            sortContext = sortContext.child;
                        }
                        if(result == 0)
                            result = Enumerable.Utils.Compare(a.index, b.index);
                        return result;
                    }
                }
                buffer.sort(func);
            }
            return ++index < buffer.length;
        },
        Dispose : Functions.Blank
    };
}

There's still only one loop through the source collection but this time I'm adding an object to the buffer that has the index and item to be sorted, I modified the rest of the code appropriately. Sorting this way is stable and non-destructive to the source collection. For my tests it was about 110ms slower than my previously suggested way -- average times of 304ms -- but it still comes out faster than the original algorithm.

The speed of Linq is definitely dependent on whatever it is you're doing, even the results I've posted in an earlier query show this. I personally still prefer to use it in my programs, C#, because of the expressiveness. It is much much clearer to read "Where something, select some items" than deciphering a loop. So even though it's slower than a straight loop I'd like to help speed up your implementation to help adoption for JavaScript programs.

I'll tell you another thing too. Since discovering your work I've been preferring to sort via a keySelector than specifying a sort function. OrderBy("$.Name") is much clearer than sort(function(a, b) { return Utility.Compare(a.Name, b.Name) }). :P

Coordinator
Aug 31, 2010 at 7:12 PM

Sorry, one what I forgot to say.
OrderBy is like Schwartzian transform.
http://en.wikipedia.org/wiki/Schwartzian_transform

Schwartzian transform has some trade-off.
and more, there is some influence.
for example, in C# this is simply shuffle pattern.

var rand = new Random();
var shuffled = source.OrderBy(_ => rand.Next());

confirm it.

<script src="linq.js" type="text/javascript"></script>
<script type="text/javascript">
    // shuffle and extract first value 10000 times, and check clustered.
    var RandomizeTest = function (generator)
    {
        Enumerable.Generate(function () { return generator() }, 10000)
            .GroupBy()
            .OrderBy("$.Key()")
            .WriteLine(function (g) { return g.Key() + ":" + g.Count() });
    }

    // 0:6609, 1:967, 2:1458, 3:98, ...
    document.writeln("sort - math.random" + "<br />");
    RandomizeTest(function ()
    {
        var array = Enumerable.RangeTo(0, 9).ToArray();
        array.sort(function () { return Math.random() < 0.5 });
        return array[0];
    });

    // 0:968, 1:1007, 2:974, 3:1053, ...
    document.writeln("shuffle" + "<br />");
    RandomizeTest(function ()
    {
        return Enumerable.RangeTo(0, 9)
            .Shuffle()
            .First();
    });

    // 0:985, 1:1016, 2:946, 3:989, ...
    document.writeln("orderby rand" + "<br />");
    RandomizeTest(function ()
    {
        return Enumerable.RangeTo(0, 9)
            .OrderBy(function (x) { return Math.random() })
            .First();
    });
</script>
<script src="linq_mod_by_zzhumphreyt.js" type="text/javascript"></script>
<script type="text/javascript">
    // 0:500, 1:919, 2:880, 3:1367, ...
    document.writeln("modified:orderby rand" + "<br />");
    RandomizeTest(function ()
    {
        return Enumerable.RangeTo(0, 9)
            .OrderBy(function (x) { return Math.random() })
            .First();
    });
</script>

all appearance number of times must be around 1000.

//

I am worried about the slightly complicated thing.
The complexity is easy to produce a mistake.
I think that the expansion of adhoc Lambda only for OrderBy is not good thing.

However, your suggestion had many splendid points!
Next release, i will replace ArrayEnumerable.GetEnumerator
and OrderedEnumerable.GetEunmerator for improve performance.

... simply fixed(only replaced a part with a loop...)

OrderedEnumerable.prototype.GetEnumerator = function ()
{
    var self = this;
    var buffer;
    var indexes;
    var length;
    var initialized = false;
    var index = -1;

    return {
        Current: function () { return buffer[indexes[index]]; },
        MoveNext: function ()
        {
            if (!initialized)
            {
                initialized = true;
                buffer = [];
                indexes = [];
                // thanks idea!
                self.source.ForEach(function (item, index)
                {
                    indexes.push(index);
                    buffer.push(item);
                });
                length = buffer.length;
                sortContext = SortContext.Create(self, null);
                sortContext.GenerateKeys(buffer);

                indexes.sort(function (a, b) { return sortContext.Compare(a, b); });
            }
            return ++index < length;
        },
        Dispose: Functions.Blank
    };
}

// modified : source is always array
SortContext.prototype.GenerateKeys = function (source)
{
    var len = source.length;
    var keySelector = this.keySelector;
    var keys = new Array(len);
    for (var i = 0; i < len; i++) keys[i] = keySelector(source[i]);
    this.keys = keys;

    if (this.child != null) this.child.GenerateKeys(source);
}

and benchmark

<script src="linq.js" type="text/javascript"></script>
<script type="text/javascript">
    var bench = function (action)
    {
        var now = new Date();
        for (var i = 0; i < 100; i++) action();
        return (new Date() - now) + "ms";
    }

    var data = Enumerable.Range(1, 100)
        .Shuffle()
        .Select(function (x) { return { Key: x, Value: x + "aaa"} })
        .ToArray();

    var original = bench(function () { Enumerable.From(data).OrderBy("$.Key").Force() });
    document.writeln("Original:" + original + "<br />");
</script>
<script src="linq_mod_by_zzhumphreyt.js" type="text/javascript"></script>
<script type="text/javascript">
    var zzhumphreyt = bench(function () { Enumerable.From(data).OrderBy("$.Key").Force() });
    document.writeln("mod zzhumphreyt:" + zzhumphreyt + "<br />");   
</script>
<script src="linq_mod_by_neuecc.js" type="text/javascript"></script>
<script type="text/javascript">
    var neuecc = bench(function () { Enumerable.From(data).OrderBy("$.Key").Force() });
    var tes = Enumerable.From(data).OrderBy("$.Key").Select("$.Key").ToArray();
    document.writeln("mod neuecc:" + neuecc + "<br />");
</script>

result___

Original:433ms
mod zzhumphreyt:170ms
mod neuecc:253ms

This fix is not best at all, but the characteristic is the same as C#.
and faster than the original.

//

I thank you very much.
I am very glad that watched source code.
and useful advice!