Update: Minor code cleanup.
Prepending a list xs with a list ys to obtain the concatenation (of ys ++ xs ) of the two is usually a cheap operation which, when done non-destructively, requires only to copy the list ys to a new list y1s and link the last item of y1s to the head of xs.
Thus if we have to prepend K such lists:
y1s, y2s, …, yks
starting with the prepend of y1s to xs, and prepending each of the following lists above to the result of the last prepend operation, the result will be a list consisting of the items of:
yks, …, y2s, y1s, xs
in that order. Every list will be copied only once and the total operation of K prepends will require copying the items of all the lists. So, prepending a number of lists is a linear operation – proportional to the total number of items in those lists.
On the other side, continuous appending of lists may be O(N^2) when done in a naive way, because the growing list will have to be copied again and again in each append operation.
In the XSLT 2.0 processors available today, the cost of repetitive prepending and appending are different than the above. I have studied two XSLT 2.0 processors, P1 and P2.
P1 optimizes repetitive appends, by achieving linear complexity. Its time complexity for repetitive prepending is O(N^2) — square.
P2 doesn’t optimize either way.
The simple transformation below:
|
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:my="my:namespace" exclude-result-prefixes="my" > <xsl:output method="text"/>
<xsl:variable name="vRepetitions" as="xs:integer" select="100000"/>
<xsl:variable name="vAppends" as="element()*"> <my:node/><my:node/><my:node/><my:node/><my:node/> <my:node/><my:node/><my:node/><my:node/><my:node/> </xsl:variable>
<xsl:template match="/"> <xsl:variable name="vNodes" as="element()*"> <xsl:sequence select="my:iter($vRepetitions, ())"/> </xsl:variable>
<xsl:value-of select="name($vNodes[last()])"/> </xsl:template>
<xsl:function name="my:iter" as="element()*"> <xsl:param name="pNumTimes" as="xs:integer"/> <xsl:param name="pstartSeq" as="element()*"/>
<xsl:sequence select= "if($pNumTimes gt 0) then my:iter($pNumTimes -1, ($pstartSeq, $vAppends) ) else $pstartSeq "/> </xsl:function> </xsl:stylesheet>
|
when run with P1 with different number of repetitions, takes the following times:
|
Appends
|
Time, ms
|
Remarks
|
|
100
|
42
|
Hundred appends.
|
|
1000
|
47
|
Thousand appends.
|
|
10000
|
85
|
Ten thousand appends
|
|
100000
|
458
|
One hundred thousands appends.
|
|
1000000
|
7141
|
One million appends. |
We see that the claims of P1′s linear or better performance are true.
P2 behaves much worse and has O(N^2) results:
|
Appends
|
Time, ms
|
Remarks
|
|
100
|
21
|
Hundred appends.
|
|
1000
|
7420
|
Thousand appends.
|
|
5000
|
38100
|
Failed after 38 seconds. Five thousand appends
|
|
10000
|
——
|
Failed. Ten thousands appends. |
With repetitive prepends both P1 and P2 have square time complexity. This was tested with the following transformation:
<xsl:stylesheet version="2.0"
xmlns:xsl="
http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:my="my:namespace"
exclude-result-prefixes="my"
>
<xsl:output method="text"/>
<xsl:variable name="vRepetitions" as="xs:integer"
select="1000"/>
<xsl:variable name="vPrepends" as="element()*">
<my:node/><my:node/><my:node/><my:node/><my:node/>
<my:node/><my:node/><my:node/><my:node/><my:node/>
</xsl:variable>
<xsl:template match="/">
<xsl:variable name="vNodes" as="element()*">
<xsl:sequence select="my:iter($vRepetitions, ())"/>
</xsl:variable>
<xsl:value-of select="name($vNodes[last()])"/>
</xsl:template>
<xsl:function name="my:iter" as="element()*">
<xsl:param name="pNumTimes" as="xs:integer"/>
<xsl:param name="pstartSeq" as="element()*"/>
<xsl:sequence select=
"if($pNumTimes gt 0)
then
my:iter($pNumTimes -1, insert-before($pstartSeq, 1, $vPrepends))
else
$pstartSeq
"/>
</xsl:function>
</xsl:stylesheet>
P1′s prepend results were the following:
|
Prepends
|
Time, ms
|
Remarks
|
|
100
|
62
|
Hundred prepends.
|
|
1000
|
1729
|
Thousand prepends.
|
|
10000
|
204682
|
Ten thousand prepends
|
P2′s prepend results were again O(N^2) and much worse than P1′s.
The problem I am trying to solve here is the following:
Can we implement an algorithm for repetitive prepends that will be run by P1 in linear time, in addition to its excellent linear performance of repetitive appends?
P1′s O(N^2) repetitive prepends performance has stayed the same for years so maybe its developer thought it could not be improved or an improvement would be not too important.
At first, it may seem that repetitive prepends are not so important (as repetitive appends — the process via which we create every output). However, some of the most important data structures, such as the stack, often undergo a long series of prepends (the "push" operation). Another important use-case is any attempt to port an existing application that uses repetitive list prepends.
The answer to this question is contained in my next post.