Part 3: The Swiss Army Knife of Data Structures … in C#

In Part 1 of these series I described an implementation of the Finger Tree data structure in C#.

Part 2 reported the results of performance tests with the standard .NET Enumerable class and the finger-tree-based sequence

The new data provided here is the results of similar performance tests, this time between the finger-tree-based sequence and XPath 2.0 fuctions on sequences as implemented by a group of three XSLT 2.0 processors.

I apologize for not specifying the names of the XSLT 2.0 processors used in this experiment. This was done intentionally, since the purpose of the experiment was only to compare the current efficiency of the XPath sequences and strings as implemented by these XSLT processors to the efficiency of a Finger Tree – based implementation. This experiment did not have the goal of comparing the XSLT processors with each other. The results were consistent with and did not alter my prior perception of each XSLT processor’s efficiency.

Also, the usual warning:

Disclaimer: This comparison was carried out on my 4-year old PC, having CPU speed of 3GHz and RAM of 2GB and running Windows XP.  Results will largely vary on other configurations, depending on CPU and memory speed, available memory, cache size, …, etc. Another factor to consider is the time necessary for jitting the .NET IL code on initial method execution or the JVM startup time. However the observed ratios (speedup) should remain closely the same.

The following XPath 2.0 functions on sequences were tested:

fn:insert-before()

op:concatenate()

fn:remove()

fn:subsequence()

op:item-access()  (the position() function with integer argument)

fn:reverse()

The following XPath 2.0 functions on strings were tested:

fn:string-length()

fn:concat()     (with 2 arguments)

fn:substring()

The finger-tree-based sequence was implemented in such a way that an instance of it was instantiated as an extension object of the Saxon 9.1 (for .NET) XSLT processor and then the methods were referenced as extension functions.

Any of these extension functions was executed 100 000 times and the average time was recorded. Unfortunately, two of the three tested XSLT processors were very slow, which necessitated that during the tests of these XSLT processors many functions had to be executed 10 000 times or even only 1000 or, in a very few, exceptional circumstances, only 100 times.

Writing such code can be tricky, especially due to the very aggressive optimization performed by one of the tested XSLT processors.  Thus, in order to obscure from such a clever XSLT processor the fact that a variable had been defined outside of an <xsl:for-each> instruction, something like the following code had to be used:

<xsl:for-each select=”1 to 100000“><xsl:variable name=”vFSeq3
select=”if(. mod 2 eq 0)
then $vFSeq1
else $vFSeq2
“/>

<xsl:variable name=”vRem” as=”xs:integer
select=”. mod 10“/>

<xsl:variable name=”vFSeqY
select=”if($vRem eq 1)
then $vFSeqY1
else if($vRem eq 2)
then $vFSeqY2
else if($vRem eq 3)
then $vFSeqY3
else if($vRem eq 4)
then $vFSeqY4
else if($vRem eq 5)
then $vFSeqY5
else if($vRem eq 6)
then $vFSeqY6
else if($vRem eq 7)
then $vFSeqY7
else if($vRem eq 8)
then $vFSeqY8
else if($vRem eq 9)
then $vFSeqY9
else $vFSeqY10
“/>

<xsl:variable name=”vFSeq5” select=
ext:Merge($vFSeq3, $vFSeqY)“/>

<xsl:variable name=”vLen” select=
number(ext:Length($vFSeq5))“/>

<xsl:value-of select=”(1,2,3)[$vLen]“/>
</xsl:for-each>

Note how the last instruction within the <xsl:for-each> above causes a lazy-evaluating XSLT processor to finish its work and produce the complete sequence.

The sequences (and character strings) on which the above functions were performed had length of 10 000.

The code for these tests is here.

As in the .NET comparison test (reported in Part 2), there were two approaches:

  • Testing “the worst or extreme case”, when the argument to be inserted, concatenated, removed, used as index, … was a value equal or very close to 10000.
  • Testing “the average case”: with many different values for this argument, uniformly increasing in the interval [1000, 10000].

As can be seen from these tables (my weblog host allows only limited length for an entry and could not accommodate them inline), which summarize the results of the “uniform” or “average” testing of the sequence implementations, using a Finger Tree as a general sequence data structure in XPath 2.0 may speed up sequence operations (roughly) 5 to 40 times even when only the best of the three XSLT processors’ results are compared, 5 to 4500 times if all results are taken into account.

XPath Sequence Performance Comparison.html

The XPath string implementation of the three XSLT 2.0 processors was tested against a finger-tree based string object and the results from the “average” tests are summarized in these tables:

XPath String Performance Comparison.html

As can be seen, using a Finger Tree to represent a string in XPath 2.0 may speed up string operations (roughly) 3 to 9 times even when only the best of the three XSLT processors’ results are compared, 3 to 40 times if all results are taken into account.

I will conclude the series on finger trees with the following, quite

Obvious Conclusion:
Existing sequence and string implementations in .NET and in XSLT 2.0 processors can be speeded up significantly by using the Finger Tree data structure. Additionally, .NET will benefit from having a truly immutable sequence implementation.

Certainly, the question is not “if” such implementations will be produced, but only “when”.

About these ads
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s