Part 2: The Swiss Army Knife of Data Structures … in C#

Update: In a personal communication with Eric Lippert it was revealed that, in his words:
 
"I worked up a full implementation as well but I decided that it was too complicated to post in the blog. What I was really trying to get across was that immutable data structures were possible and not that hard; a full-on finger tree implementation was a bit too much for that purpose."
 
I am happy to correct my post and acknowledge that a previous C# implementation did exist. Hopefully, Eric will publish his implementation.
So, the work discussed here is most probably the first published Finger Tree  implementation in C#.
 
 
In Part 1 I described my experience in implementing the Finger Tree data structure in C#.
 
Here is a comparison of the performance of some common operations on sequences of length 100,000 as implemented by the C# Finger Tree and the .NET 3.5 Enumerable extension methods (applied on a List<UInt32> object).
 
Disclaimer: This comparison was carried out on my 4-year old PC, having CPU speed of 3GHz and RAM of 2GB and running Windows XP.  Results will largely vary on other configurations, depending on CPU and memory speed, available memory, cache size, …, etc. Another factor to consider is the time necessary for jitting the .NET IL code on initial method execution. However the observed ratios (speedup) should remain closely the same. Below is a table, summarizing the results:
 

Performance Comparison:
FSeq   vs.
 .NET Collection with 100,000 elements

IEnumerable<uint>         vs.                 FSeq

 

Method

.NET Av. Time

FSeq Av. time

FSeq SpeedUp

Concat()

5268 µs

56 µs

94.07

ElementAt(),

[ ]

1 µs

46 µs

Skip()

2022 µs

49 µs

41.27

Take()

1513 µs

50 µs

30.26

Reverse()

4859 µs

93 µs

52.25

 

 

 
As seen from this comparison, using a Finger Tree as a general sequence data structure in .NET may speed up sequence operations (roughly) 30 to 94 times (extreme cases to more than 100 times). The only exception is element access, and only in the case of an array (note that the List<T> class in .NET is actually an ArrayList). If the instance on which the Enumerable methods are called is of some other type, say LinkedList<T>, then the speedup for all operations would be even bigger. Because the tested methods of Enumerable are using deferred execution, it was necessary to add an additional operation that references an item located the end of the result-sequence. 
 
Another problem I had to deal with was how to test for "typical" or "average" operations. In other words, how to make the tests representative. I ended up with the decision to include in the Concat(), Insert(), Remove() and Substr() tests strings with a maximum variety of lengths. What in contrast I call extreme cases is when the sequences/strings to be inserted/concatenated/removed/extracted are all with the same maximum length (close to the length of the first collection/string, on which the operation is performed — in this case close to 100,000)
 
  

 

Another interesting comparison can be made if a string is represented as a sequence of characters. As the table below shows, for long strings the potential speedup of using a Finger-tree-based string implementation is 3 to 5 times (extreme cases 5 to 25 times):
 

Performance Comparison:
FString   vs.
 .NET string with 100,000 elements

string               vs.            FString

 

Method

.NET Av. Time

FString Av. time

FSeq SpeedUp

Concat(),  +

655 µs

129µs

5.08

Insert()

653 µs

220µs

2.97

Remove()

 

176 µs

55 µs

3.20

Substring()

167 µs

54 µs

3.09

 

 

 
The source code of the performance tests is here. Take notice that in its present state this code doesn’t produce any output at all. To see the timing results, run under the VS2008 Debugger and put breakpoints at suitable locations.
 
In the third part of my Finger-Tree series I will present a brief performance comparison between a Finger-Tree-based sequence and XPath sequences, as implemented in existing XSLT 2.0 processors.
Advertisements
This entry was posted in functional data structures. Bookmark the permalink.

3 Responses to Part 2: The Swiss Army Knife of Data Structures … in C#

  1. BTW, I ran your benchmarks on different data sizes (.NET 4 Release), and found that while FT outperforms Lists only after about 5000 elements, .NET strings are insanely fast, and up to 150000 chars they consistently outperform FSeq – and most strings are shorter than 150 KB :)

    • Alexander, My tests were performed 4 years ago when there was no .NET 4. As per the table above, they show,for string length 100 000 (not 150 000 as you claim) in typical cases FString outperforms the .NET strings 3 to 5 times on the following operations: concat(), insert(), remove() and substring.

      Nevertheless, I’d be surprized (pleasently) if this results are now substantially different for .NET 4.0. The different results are probably due to you and I using different tests and methodology. You may contact me via email so that we could compare our tests and methodologies.

      Anyway, the only difference in our results:
      According to you, one should prefer FString for lengths > 150K.
      According to me, one should prefer using FString for length > 100K.

      This isn’t a very big difference — is it?

      I included the string tests results not to claim that people should use an FString always. It is more of a know-it fact, that for large strings an FString is more performant.

      Where can this have effect? Clearly in text editors for medium to large documents. As we all know, inserting/deleting/appending are operations that are massively carried out in text editing — and there are certain editors that become extremely unresponsive after performing such operations repeatedly on even medium-length text.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s