Recursion with anonymous (inline) functions in XPath 3.0 — Part II

In my first post about implementing recursion with anonymous functions I provided the following example:

let $f := function($n as xs:integer,
                   $f1 as function(xs:integer,
                   function()) as xs:integer
                   ) as xs:integer
             {if($n eq 0)
                 then 1
                 else $n * $f1($n -1, $f1)
              },
    $F := function($n as xs:integer) as xs:integer
            {
                $f($n, $f)
            }

   return
           $F(5)

While this technique works well with relatively small values for n, we run into problems when n becomes bigger.

Let’s see this based on another example:

  let $f := function($nums as xs:double*,
                   $f1 as function(xs:double*,
                   function()) as xs:double
                   ) as xs:double
             {
              if(not($nums[1]))
                 then 0
                 else $nums[1] + $f1(subsequence($nums,2), $f1)
              },
    $F := function($nums as xs:double*) as xs:double
            {
                $f($nums, $f)
            }
   return
           $F(1 to 10)

This calculates correctly the sum of the numbers 1 to 10 – the result is:

55

However, if we try:

$F(1 to 100)

the result is the following Saxon 9.4.6EE exception:

Error on line 22
Too many nested function calls. May be due to infinite recursion.
Transformation failed: Run-time errors were reported

So, what happens here? Most readers would have guessed by now — our old Stack Overflow (not the site) exception.

Is there any way to avoid this exception?

One could rely on the smartness of the XSLT processor to do this. A slight fraction of XSLT processors recognize a limited kind of tail recursion and implement it using iteration, thus avoiding recursion.

The above code isn’t tail-recursive. Let us refactor it into the following tail-recursive code:

let $f := function($nums as xs:double*,
                   $accum as xs:double,
                   $f1 as  
                     function(xs:double*, xs:double, function())
                              as xs:double
                  ) as xs:double
             {
              if(not($nums[1]))
                 then $accum
                 else $f1(subsequence($nums,2), $accum+$nums[1], $f1)
              },

        $F := function($nums as xs:double*) as xs:double
              {
                     $f($nums, 0, $f)
              }
    return
      $F(1 to 100)

Saxon is one of the few intelligent processors that recognizes tail recursion, however it still raises the stack-overflow exception — even for the above tail-recursive code . Why?

Here is the Wikipedia definition of tail recursion:

In computer science, a tail call is a subroutine call that happens inside another procedure as its final action; it may produce a return value which is then immediately returned by the calling procedure. The call site is then said to be in tail position, i.e. at the end of the calling procedure. If any call that a subroutine performs, such that it might eventually lead to this same subroutine being called again down the call chain, is in tail position, such a subroutine is said to be tail-recursive.

At present, the XSLT/XPath processors that do recognize some kind of tail recursion only do so if a function/template calls itself by name.

None of them handles the case when the tail call is to another function (Michael Kay, author of Saxon, shared on the Saxon mailing list that Saxon correctly handles any tail calls for templates, but not for functions).

So, what can we do in this situation? One decision is to wait until some processor starts handling any type of tail call inside functions.

Another, is to use the DVC (Divide and Conquer) technique for minimizing the maximum depth of nested recursion calls.

The idea is to split the sequence into subsequences (usually two) of roughly the same length, recursively process each subsequence, and then combine the results of processing each individual subsequence.

Here is the above code, re-written to use DVC:

let $f := function($nums as xs:double*,
                   $f1 as function(xs:double*, function()) 
                             as xs:double
                   ) as xs:double
             {if(not($nums[1]))
                 then 0
                 else if(not($nums[2]))
                        then $nums[1]
                        else
                         let $half := count($nums) idiv 2
                          return
                            $f1(subsequence($nums,1, $half), $f1)
		           +
		            $f1(subsequence($nums, $half+1), $f1)
              },
    $F := function($nums as xs:double*) as xs:double
            {
                $f($nums, $f)
            }

   return
           $F(1 to 10000)

Sure enough, this time we get the result:

5.0005E7

Using this technique, the maximum recursion depth is Log2(N) — thus for processing a sequence with 1M (one million elements) the maximum recursion depth is just 19.

Conclusion: The DVC technique is a tool that can be immediately used to circumvent the lack of intelligence of current XPath 3.0 processors when dealing with tail-call optimization.

About these ads
This entry was posted in Higher Order Functions, Performance Optimization, XPath 3.0. Bookmark the permalink.

7 Responses to Recursion with anonymous (inline) functions in XPath 3.0 — Part II

  1. Your post stimulated me to take another look at this, in particular the question of whether there are any languages generating Java bytecode that implement cross-function tail call optimization. This post: http://stackoverflow.com/questions/105834/does-the-jvm-prevent-tail-call-optimizations is not encouraging, for example it appears Scala has the same problem of only handling self-calls. It can be done, of course, but at what cost? (We do it for template calls after all, but that’s possible because we implement our own stack for template calling rather than using the JVM stack).

    • From the comments to the answers of that SO question:

      ” Tail call elimination can and has been implemented on the JVM (Arnold Schaighofer did in in OpenJDK, and also LLVM) so there is no question as to whether or not it can be done. Microsoft’s CLR has, of course, supported tail call elimination for 10 years and the release of F# has demonstrated that it is a game changer. I think the answer is that the JVM has long since stagnated”

      “IBM has supported some form of TCO in their JVM implementation (as an optimization, so no guarantees). Maybe the authors of Java Performance tuning thought this feature would eventually be implemented by all JVMs. ibm.com/developerworks/java/library/j-diag8.html ”

      So, future developers may decide to use .NET (C#, F#), or IBM’s JVM implementation.

    • BaseX is also written in Java and they should have the same difficulties. However, almost a year ago, they implemented their support for full optimization of any tail-recursive calls.

  2. John Snelson says:

    Your function isn’t tail recursive – so no XSLT processor would tail call optimize it. To make a tail recursive version of that function, you’d need to use an accumulator argument, ie:

    let $f := function($n,$acc,$f1) {
    if($n eq 0) then $acc
    else $f1($n – 1, $n * $acc, $f1)
    }
    let $F function($n) { $f($n,1,$f) }
    return $F(5)

  3. guest says:

    Hi Dimitre, just wanted to drop a quick thanks for posting this

    http://stackoverflow.com/questions/3360017/why-does-xslt-output-all-text-by-default

    My head almost exploded yesterday trying to figure out where all that extra text was coming from.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s