Wide Finder in XSLT: deriving new requirements for efficiency in XSLT processors

With his twelve posts under the common title "The Wide Finder Project", Tim Bray formulated a problem that has since stirred quite a few people, each anxious to prove that their programming language of choice fares well in implementing solutions for this class of problems.

While I am not aware of any XSLT implementation that provides explicit or implicit support for parallel processing (with the obvious goal of taking advantage of the multi-core processors that have almost reached "prevalent" status today), I was curious to determine at least two things:

  • How well do a good XSLT 2.0 processor and a straightforward solution fare against other languages/solutions?
  • Where is the XSLT code on the scale of "beautiful code"?


Before going further, let me remind the reader that there is a popular (and, as we'll see, unfounded!) belief that any XSLT solution must be hugely inefficient and that any XSLT code can only be ugly. In fact, the following comment on one of Tim's posts reflects exactly this mindset:


"From: Rornan (Sep 25 2007, at 08:31)


If you had anything at all to do with creating XSLT, then you have no right at all to comment on any other language deficency ever ever again."


My initial XSLT solution to Tim's problem is below. No optimization has been attempted, not only because I don't have an XSLT processor that utilizes multiple cores, but also because there seems to be only limited room for optimization (adding more CPUs is not likely to speed up the reading of a huge file from a single drive).


<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xsl:output method="text"/>

 <xsl:variable name="vLines" as="xs:string*" select=
  "tokenize(unparsed-text('file:///C:/Log1000000.txt'), '\n')"/>

 <xsl:variable name="vRegEx" as="xs:string" select=
  "'^.*?GET /ongoing/When/[0-9]{3}x/([0-9]{4}/[0-9]{2}/[0-9]{2}/[^ .]+) .*$|^.+$'"/>

 <xsl:template match="/">
  <xsl:for-each-group group-by="." select=
   "for $line in $vLines return replace($line, $vRegEx, '$1')[.]">
   <xsl:sort select="count(current-group())" order="descending"/>
   <xsl:if test="not(position() > 10)">
    <xsl:value-of select=
     "concat(count(current-group()), ':', current-grouping-key(), '&#xA;')"/>
   </xsl:if>
  </xsl:for-each-group>
 </xsl:template>
</xsl:stylesheet>

This 22-line transformation is performed by Saxon 9.0 on a 3GHz DELL desktop. The file Log1000000.txt was constructed by copying the 10,000-line file provided by Tim into another file 100 times. The size of this file is about 200MB.

The results were produced in 36.175 seconds, using about 929MB of RAM. Saxon should be timed using the -repeat:3 command-line option, which performs the transformation three times. Using only the results from the first/single transformation would be misleading, as they include the time for the Java VM startup.
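For comparison with the other Wide Finder entries, the same algorithm can be sketched in Python. This is a hypothetical sketch, not a competition entry: the regular expression mirrors the capturing part of $vRegEx, except that unmatched lines are simply skipped instead of being replaced by themselves and filtered out.

```python
import re
from collections import Counter

# Mirrors the capturing alternative of $vRegEx from the stylesheet above.
ARTICLE_RE = re.compile(
    r"GET /ongoing/When/[0-9]{3}x/([0-9]{4}/[0-9]{2}/[0-9]{2}/[^ .]+) ")

def top_articles(lines, n=10):
    """Count article fetches and return the n most frequent entries,
    like <xsl:for-each-group> with a descending sort on count(current-group())."""
    counts = Counter()
    for line in lines:
        m = ARTICLE_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts.most_common(n)

# Tiny made-up log fragment for illustration:
sample = [
    "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1",
    "GET /ongoing/When/200x/2007/09/20/Wide-Finder HTTP/1.1",
    "GET /ongoing/When/200x/2007/06/17/Other-Post HTTP/1.1",
    "GET /favicon.ico HTTP/1.1",
]
print(top_articles(sample))
# → [('2007/09/20/Wide-Finder', 2), ('2007/06/17/Other-Post', 1)]
```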




To answer the questions:

1. How well do a good XSLT 2.0 processor and a straightforward solution fare against other languages/solutions?

The non-optimized, uniprocessor version of this solution takes 36 seconds to process an approximately 200MB log file. The timing for the full, five-times-bigger log file used by Tim Bray is likely to scale accordingly: about 180 seconds, or roughly 3 minutes. Three minutes for processing 1GB of data is not a bad time for an XSLT transformation, considering that even sizes of a few hundred megabytes are still considered monstrous.

XSLT could in fact have been one of the winners in the WF competition, had its creators gone just one step further in exploiting the natural, non-sequential XSLT processing model. Here I am talking about specifying a single f:parMap() function, which can be defined exactly as the current FXSL f:map() function:

   <xsl:function name="f:map" as="item()*">
      <xsl:param name="pFun" as="element()"/>
      <xsl:param name="pList1" as="item()*"/>

      <xsl:sequence select=
       "for $this in $pList1 return
          f:apply($pFun, $this)"/>
   </xsl:function>

The only difference is that f:parMap() adds to the semantics of f:map() a hint to use as many of the available CPU cores as appropriate when evaluating the "for" expression in the body of the function.
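The idea can be illustrated outside XSLT as well: a hypothetical parMap has exactly the semantics of map, plus permission to distribute the independent per-item evaluations across cores. A minimal Python sketch, assuming the mapped function is pure (the names seq_map/par_map are mine, not FXSL's):

```python
from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

def seq_map(fun, items):
    # f:map semantics: apply fun to each item, preserving order.
    return [fun(x) for x in items]

def par_map(fun, items, workers=4):
    # f:parMap semantics: identical observable result, but the
    # independent evaluations may run on several CPU cores at once.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fun, items))
```

Because every item's evaluation is side-effect-free, the two functions are interchangeable; the parallel one is only a performance hint, never a semantic change.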

Yes, this can be done for any "/" operator, or for any "for" expression such as the one used in lines 13–14 of our XSLT solution:

"for $line in $vLines
     return replace($line,$vRegEx,’$1′)[.]"


and for any <xsl:apply-templates …/> instruction, for any <xsl:for-each …/> instruction, for any <xsl:for-each-group …> instruction, and so on, and so on.


Judging from the published results, this straightforward, non-optimized uniprocessor XSLT solution comes not too far behind some of the solutions that were optimized for multi-processing. I hope that in the not-so-distant future there will be XSLT processors that exploit the inherent non-sequential nature of XSLT processing to implement highly optimized multi-processing solutions.

2. Where is the XSLT code on the scale of "beautiful code"?

By its compactness (22 lines) it is in 3rd place and rivals the 2nd place (17 lines), as we could easily remove 4 lines (3 of them blank) that were added only for readability.

One of Tim's criteria for "beautiful code" is avoiding awkwardness. He speaks about Ruby not needing ending semicolons or variable type declarations. However, Tim's solution still had to use two lines for defining a hash and setting its default to 0. By contrast, the XSLT solution above does not require the programmer to introduce and initialize a special data structure: the support for grouping is right there in the core of the language. There is simply no way the programmer can do something wrong in declaring or using a hash table.


What could the XSLT processor do better?

Both the timing and especially the amount of RAM used indicate how the XSLT processor did its work:

  • It read the whole text file into memory.
  • Then it produced all $vLines strings.
  • Then it produced the line replacements.
  • Then it did the grouping/sorting on the line replacements.
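These four eager steps correspond, roughly, to code like the following deliberately naive Python sketch (the file path is assumed; the regex mirrors the capturing part of $vRegEx). Each step materializes a complete intermediate result before the next begins, which is where figures like 929MB of RAM come from:

```python
import re
from collections import Counter

ARTICLE_RE = re.compile(
    r"GET /ongoing/When/[0-9]{3}x/([0-9]{4}/[0-9]{2}/[0-9]{2}/[^ .]+) ")

def eager_top10(path):
    text = open(path, encoding="utf-8").read()    # 1. whole file in memory
    lines = text.split("\n")                      # 2. every $vLines string at once
    matches = [ARTICLE_RE.search(line) for line in lines]
    keys = [m.group(1) for m in matches if m]     # 3. all the line replacements
    return Counter(keys).most_common(10)          # 4. grouping/sorting at the very end
```

For a 200MB input, the source text, the list of lines, and the list of replacements all coexist in memory at the same time.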

Imagine that the XSLT Processor is really lazy:

  • It will not read any contents of the file and will not do any tokenization, until absolutely necessary.
  • Whenever it is really necessary to use the next token (line), only the text necessary to determine that line shall be read.
  • Whenever a token/line is produced, all previously used / unused text shall be marked-for-garbage-collection/discarded/reused.
  • Whenever a string from $vLines is consumed and processed by the <xsl:for-each-group …/> instruction, this string is no-longer used and shall be marked-for-garbage-collection/discarded/reused.

Using these rules, a truly lazy XSLT processor will only need space large enough for the longest line, plus a hash table keeping the keys and counters for all the different articles being grouped. In this case there are just 562 unique values extracted from the strings of $vLines.

In this way, the processing — reading a line, finding the article it contains and feeding this to the hash table —  could be accomplished in streaming mode while reading the text file. Upon reaching the end of the file, there would be very little left to do, and thus almost nothing to be further optimized.
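The lazy rules above amount to a single streaming pass. In Python terms (again a hypothetical sketch with an assumed file path), iterating over the file lazily keeps in memory only the current line plus the table of keys and counters:

```python
import re
from collections import Counter

ARTICLE_RE = re.compile(
    r"GET /ongoing/When/[0-9]{3}x/([0-9]{4}/[0-9]{2}/[0-9]{2}/[^ .]+) ")

def streaming_top10(path):
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:                   # one line at a time; earlier lines are freed
            m = ARTICLE_RE.search(line)
            if m:
                counts[m.group(1)] += 1  # only keys and counters accumulate
    return counts.most_common(10)        # memory bound: longest line + one entry per article
```

The result is identical to the eager evaluation order; only the peak memory changes, from the size of the whole file to the size of one line plus the counter table.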



I truly believe that the described improvements can be implemented by at least some of the best XSLT 2.0 processors available today. For many years I have been using Saxon, and I am very grateful to its developer, Dr. Michael Kay. Its performance has been improving steadily, a very good example for all other XSLT vendors to follow; if they do, there will be competition, and competition is good for us all.
