Even faster String.prototype.trim() implementation in JavaScript.

I’ll begin with a quote:

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

Those are the words of one of my favorite hackers, Jamie Zawinski.

Most JavaScript trim() implementations you see around are based on regular expressions and work fairly well for infrequent use on short strings. Some of them are well-crafted, but I think plain old loops can do better. Back in 2007, Steve Levithan covered some really fast implementations of trim() followed by a few more articles by others, including Luca Guidi.

Many of these implementations were based on regular expressions and a few were exceptionally brilliant. But did they answer the question of speed? In one of the comments posted on Steve Levithan’s blog there was a little gem written by Michael Lee Finney, and truly that little function proves to be the fastest of all of the implementations mentioned in those articles. Finney claimed to better previous implementations by 20x, and he was right. His code totally smoked the regular-expression based implementations.

The Details

Arrays in JavaScript behave like hash-tables and hence exhibit the property of being sparse.  Finney’s implementation exploits this feature of JavaScript to its fullest.  Here is a lookup table he uses to mark characters as whitespace:

String.whiteSpace = [];
String.whiteSpace[0x0009] = true;
String.whiteSpace[0x000a] = true;
String.whiteSpace[0x000b] = true;
String.whiteSpace[0x000c] = true;
String.whiteSpace[0x000d] = true;
String.whiteSpace[0x0020] = true;
String.whiteSpace[0x0085] = true;
String.whiteSpace[0x00a0] = true;
String.whiteSpace[0x1680] = true;
String.whiteSpace[0x180e] = true;
String.whiteSpace[0x2000] = true;
String.whiteSpace[0x2001] = true;
String.whiteSpace[0x2002] = true;
String.whiteSpace[0x2003] = true;
String.whiteSpace[0x2004] = true;
String.whiteSpace[0x2005] = true;
String.whiteSpace[0x2006] = true;
String.whiteSpace[0x2007] = true;
String.whiteSpace[0x2008] = true;
String.whiteSpace[0x2009] = true;
String.whiteSpace[0x200a] = true;
String.whiteSpace[0x200b] = true;
String.whiteSpace[0x2028] = true;
String.whiteSpace[0x2029] = true;
String.whiteSpace[0x202f] = true;
String.whiteSpace[0x205f] = true;
String.whiteSpace[0x3000] = true;

The implementation of the function simply indexes into this array to check whether any character encountered while traversing the string is valid whitespace. The function runs noticeably faster. Here’s his code:

function trim14(str) {
    var len = str.length, whiteSpace, i;

    if (!len) {
        return str;
    }

    whiteSpace = String.whiteSpace;

    if (len && whiteSpace[str.charCodeAt(len-1)])
    {
        do{
            --len;
        }
        while (len && whiteSpace[str.charCodeAt(len - 1)]);

        if (len && whiteSpace[str.charCodeAt(0)]){
            i = 1;
            while (i < len && whiteSpace[str.charCodeAt(i)]){
                ++i;
            }
        }
        return str.substring(i, len);
    }
    if (len && whiteSpace[str.charCodeAt(0)]) {
        i = 1;
        while (i < len && whiteSpace[str.charAt(i)]){
            ++i;
        }
        return str.substring(i, len);
    }
    return str;
};

Neatly written. I did spot a few unnecessary checks and removed them for a retest. The following code shaved off a couple more milliseconds sometimes.

function trim16(str) {
    var len = str.length, whiteSpace = String.whiteSpace, i = 0;

    if (len) {
        if (whiteSpace[str.charCodeAt(len - 1)]) {
            // Remove from the end.
            while (--len && whiteSpace[str.charCodeAt(len - 1)]);

            // Remove from the beginning.
            if (len && whiteSpace[str.charCodeAt(0)]){
                i = 1;
                while (i < len && whiteSpace[str.charCodeAt(i)]){
                    ++i;  // Keep this here.
                }
            }
            return str.substring(i, len);
        }

        // Remove from the beginning.
        if (whiteSpace[str.charCodeAt(0)]) {
            i = 1;
            while (i < len && whiteSpace[str.charAt(i)]){
                ++i;
            }
            return str.substring(i, len);
        }
    }

    return str;
};

Can this be any better and faster?

I think it can. There’s repetition in the code that could be removed as well.  I rewrote the function to use Finney’s look up table and ended up with this:

function trim17(str){
    var len = str.length;
    if (len){
        var whiteSpace = String.whiteSpace, i = 0;
        while (whiteSpace[str.charCodeAt(--len)]);
        if (++len){
            while (whiteSpace[str.charCodeAt(i)]){ ++i; }
        }
        str = str.substring(i, len);
    }
    return str;
}

Guess what? Here are the benchmark results from a Chrome developer build running on Linux with 100000 iterations:

Original length: 27663
trim10: 363ms (length: 27656)
trim11: 4668ms (length: 27656)
trim12: 4671ms (length: 27656)
trim13: 322ms (length: 27656)
trim14: 197ms (length: 27656)
trim15: 195ms (length: 27656)
trim16: 197ms (length: 27656)
trim17: 187ms (length: 27656)

trim17 is indeed even faster and shorter.

Updates

Nikolay “MadRabbit” V. Nemshilov, the author of RightJS (a beautiful JavaScript library), provides another implementation of the function:

function trim19(str){
    var str = str.replace(/^\s\s*/, ''),
        ws = /\s/,
        i = str.length;
    while (ws.test(str.charAt(--i)));
    return str.slice(0, i + 1);
}

which is faster than many other regexp-based implementations.

I have included that in the benchmark as well and generated a couple graphs.

Average performance of all trim() implementations across major platforms and browsers.

Average performance of all trim() implementations across major platforms and browsers (Smaller is better).

Average performance of the latter half of the set of trim() implementations.

Average performance of the latter half of the set of trim() implementations (Smaller is better).

Feel free to use my code under the terms of the MIT license. As usual, suggestions and constructive criticism are most welcome.

About these ads

About this entry