Dec 27, 2012
 

In the previous post I described how to use the library and what its main characteristics are. There is certainly more to come, so stay tuned.

In this post I would like to describe the big performance improvements made in the meantime, as I was not fully happy with the serialization performance of the first version.

I have just submitted a new version of the source code as part of change-set 18219, so I would like to show some statistics, expressed in concrete numbers, for both the creation of objects and their serialization to JSON.

Let’s start with some statistics

The following three charts show the creation and serialization of 1 million rows with the previous (v1) and current (v2) versions of the Google DataTable .Net Wrapper library. I ran the same benchmark code 5 times in order to obtain a reasonably significant average. In all charts, the time for each run is expressed in milliseconds.
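The benchmark harness itself is not part of the library, but a minimal sketch of how such timings can be taken with System.Diagnostics.Stopwatch looks like this. CreateTableWithOneMillionRows is a hypothetical helper wrapping the object-creation code listed in the next section, and GetJson() is the library's serialization entry point (shown later in this post):

using System;
using System.Diagnostics;
using Google.DataTable.Net.Wrapper; // assumption: the library's namespace

class BenchmarkRunner
{
    static void Main()
    {
        const int RUNS = 5;
        for (int run = 1; run <= RUNS; run++)
        {
            var watch = Stopwatch.StartNew();
            DataTable dt = CreateTableWithOneMillionRows(); // hypothetical helper, see next section
            watch.Stop();
            Console.WriteLine("Run {0}: creation {1} ms", run, watch.ElapsedMilliseconds);

            watch.Restart();
            string json = dt.GetJson(); // serialize the whole table to JSON
            watch.Stop();
            Console.WriteLine("Run {0}: serialization {1} ms", run, watch.ElapsedMilliseconds);
        }
    }

    static DataTable CreateTableWithOneMillionRows()
    {
        // Body omitted here; it is exactly the object-creation code listed below.
        throw new NotImplementedException();
    }
}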

Object Creation

Values in the chart below (smaller is better) show that the new version improves object creation speed considerably. The average time spent creating the objects over the 5 runs was 3123 milliseconds in Version 1, reduced to 2209 milliseconds in Version 2, which is a speed improvement of about 41% ((3123 - 2209) / 2209).

[Chart: Google DataTable object creation time (ms) per run, v1 vs v2]

The code used for this benchmark:

// Define the table schema: two columns.
DataTable dt = new DataTable();
var columnYear = new Column(ColumnType.Number, "Year");
var columnCount = new Column(ColumnType.String, "Count");

dt.AddColumn(columnYear);
dt.AddColumn(columnCount);

const int ONE_MIO_ROWS = 1000000;

// Fill the table with one million rows, each holding two cells
// (a raw value plus its formatted representation).
for (int i = 0; i < ONE_MIO_ROWS; i++)
{
    var row = dt.NewRow();
    row.AddCellRange(new[]
    {
        new Cell { Value = 2012, Formatted = "2012" },
        new Cell { Value = 1, Formatted = "1" }
    });

    dt.AddRow(row);
}


Object Serialization

In the first version of the library, most of the time was spent shaping and obtaining the correct object model. This is still not perfect: according to the Google specification, the Property Map should be expressed as a name/value pair, while in the library it is expressed as a simple string. That is definitely one more thing to change in the future, but let's talk about the present :)

I'm really proud to say that the average speed improvement is about 182%: serializing 1 million rows to JSON now takes on average 2291 milliseconds, compared to the previous 6467 milliseconds.

[Chart: Google DataTable object serialization time (ms) per run, v1 vs v2]

Overall improvement

If we sum up both serialization and object creation, we get an overall improvement of about 113%: from an average of 9590 milliseconds (3123 + 6467) in the first version down to 4500 milliseconds (2209 + 2291), effectively doubling the speed, as you can see below.

[Chart: Google DataTable overall improvement (ms) per run, v1 vs v2]

What has been changed?

Now for the most interesting part: what has changed in order to achieve this (to me) impressive speed improvement?

Here is a list of code changes:

StringBuilder vs StreamWriter (MemoryStream)

In the first version, in order to concatenate the strings that would eventually become the serialized JSON, I used a StringBuilder with a few extension methods to "append" data in specific cases, for instance when a property was null.
I found that performance could be improved by using a MemoryStream and StreamWriter combination instead of the StringBuilder. I conducted some extra tests, and I have to say that in theory the improvement would have been minimal, but since the goal was to squeeze out as many milliseconds as possible, I chose this approach anyway.

So the code now looks something like this:

public string GetJson()
{
    string rowsJson;
    using (var ms = new MemoryStream())
    {
        // Note: the StreamWriter is intentionally not disposed here,
        // as disposing it would also close the underlying MemoryStream.
        var sw = new StreamWriter(ms);

        //... concatenating strings here.. omitted for brevity

        sw.Flush();
        ms.Position = 0; // rewind the stream before reading it back

        using (var sr = new StreamReader(ms))
        {
            rowsJson = sr.ReadToEnd();
        }
    }

    return "{" + rowsJson + "}";
}

In addition to this, I tried to call StreamWriter.Write() as little as possible, so whenever I had a predefined string I would emit it in a single Write() call.

Example:

var sw = new StreamWriter(stream); // note: StreamWriter, not StringWriter, wraps a Stream

//don't do this
sw.Write("rows");
sw.Write("[");
sw.Write("...");

//do this instead
sw.Write("rows [ ...");


Using for(...) rather than foreach

foreach is great syntactic sugar, and I admit that I tend to use it in almost every application I write. It is easy to write, easy to understand, and it makes the code prettier.
In general, "the for-loop is faster than the foreach-loop if the array must only be accessed once per iteration."
In the serialization code I usually access the element only once per iteration, so the for loop worked better for me; a sketch of the change follows.
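
A minimal sketch of the shape of that change (Cell is the library's cell type; the WriteCells helper and the IList assumption are illustrative, not the library's actual internals):

using System.Collections.Generic;
using System.IO;
using Google.DataTable.Net.Wrapper; // assumption: the library's namespace

static class LoopSketch
{
    // Illustrative stand-in for the serializer's inner loop.
    static void WriteCells(StreamWriter sw, IList<Cell> cells)
    {
        // Before: foreach enumerates through an IEnumerator behind the scenes.
        // foreach (var cell in cells) { sw.Write(cell.Value); }

        // After: a plain for-loop indexes the list directly,
        // accessing each element exactly once per iteration.
        for (int i = 0; i < cells.Count; i++)
        {
            sw.Write(cells[i].Value); // simplified; real code also emits the formatted value
        }
    }
}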

Inline Code

In general, I try to follow good object-oriented design, where every object has its own role and behavior. However, I noticed that instead of calling a method on the object itself, for instance Row.GetJson(), doing everything inline in one function improved the speed. I did not have much time to analyze exactly how big the difference was, but one thing is certain: the code is now much harder to read :) The sketch below illustrates the idea.
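
Here is a hedged sketch of what "inline" means in this context, emitting the Google wire format directly ("c" for the cell array, "v" for a value). The plain lists stand in for the library's rows and cells; this is not the library's actual code:

using System.Collections.Generic;
using System.IO;

static class InlineSketch
{
    // Before (conceptually): sw.Write(rows[i].GetJson()) for each row.
    // After: the per-row and per-cell formatting is written inline in one loop,
    // avoiding per-row method calls and intermediate strings.
    static void WriteRows(StreamWriter sw, IList<IList<object>> rows)
    {
        for (int i = 0; i < rows.Count; i++)
        {
            if (i > 0) sw.Write(",");
            sw.Write("{\"c\":[");
            var cells = rows[i];
            for (int j = 0; j < cells.Count; j++)
            {
                if (j > 0) sw.Write(",");
                sw.Write("{\"v\":");
                sw.Write(cells[j]); // simplified: string values would need quoting and escaping
                sw.Write("}");
            }
            sw.Write("]}");
        }
    }
}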

General small changes

Here and there I found several bottlenecks and code that could be improved, for example reusing cached values instead of recalculating values that are already known; a small sketch follows.
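
As one hypothetical example (the property and helper names here are illustrative, not the library's API), hoisting an invariant computation out of a hot loop:

using System.IO;
using Google.DataTable.Net.Wrapper; // assumption: the library's namespace

static class CachingSketch
{
    static void WriteColumnType(StreamWriter sw, Column column, int rowCount)
    {
        // Before: the same string was re-derived on every iteration:
        // for (int i = 0; i < rowCount; i++)
        //     sw.Write(column.ColumnType.ToString().ToLower());

        // After: compute the invariant once and reuse the cached value.
        string cachedColumnType = column.ColumnType.ToString().ToLower();
        for (int i = 0; i < rowCount; i++)
        {
            sw.Write(cachedColumnType);
        }
    }
}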

Final thoughts

It was a real joy working on these performance improvements. The number of things one can discover during the process is incredible: many things one would consider "fast" suddenly turn out to be slow once a deeper analysis is made.
One rule I always follow is to leave performance tuning as the very last task, because you never really know how long it will take, and it usually means reconsidering parts of the original design. I personally consider good design and maintainability among the most important qualities of code; "really fast software" is sometimes worth spending time on, and sometimes it is not.
Happy coding.


My name is Zoran Maksimovic, and I am a Software Developer and Solution Architect. I am interested in software development, object-oriented design and software architecture, especially on the Microsoft .NET platform. Feel free to contact me or find out more in the about section.
