The Binary Search Tree Data Structure–having fun with XPath 3.0

For a long time I have wanted to play with XSLT 3.0 and XPath 3.0. Despite these being in their WD status, the new features are so powerful and overwhelming.

Take just these: Higher Order Functions and the ability to create new anonymous functions in XPath.

In my quest to accomplish what no one has ever done before with XPath I was helped by the existence of an early Saxon implementation – Saxon 9.3.04 offers these in its XQuery 3.0 implementation and the XSLT 3.0 implementation will hopefully be soon fully usable, after fixing a few bugs. The beautifully high-lighted XQuery code below was copied from oXygen 12.1 and pasted onto Word, then to Windows Live Writer.

I decided to start with something not too-complex (are you dreaming of a finger tree written entirely in XPath ?) and the Binary Search tree persistent functional data structure fits well this requirement.

So, let’s start:

As always:

declare namespace tree =http://fxsl.sf.net;

A tree is a sequence (triple) of functions. Here is the definition of an empty tree:

declare functiontree:tree()
as (function() as item()*)+

{

(function() {()},   (: top() :)
    function() {()},   (: left() :)
    function() {()}    (: right() :)
)
};

The first function produces the top of the tree, which is the value ( item() ) of its top node.

The second function produces the left sub-tree, which is another sequence (triple) of functions.

The third function produces the right sub-tree, which is another sequence (triple) of functions.

Do you see the problem with the type of the tree:tree() function? What would happen if I accidentally returned only a pair of functions? Nothing! No static or runtime error would occur and I could spend a lot of time trying to find this simple error. Obviously, a tuple datatype would solve this problem completely.

Now, the three functions top(), left() and right() :

declare function

tree:top
         ($pTree as (function() as item()*)*)
         as item()?
{
if(empty($pTree))
   then ()
   else
     $pTree[1]()
};

declare function

tree:left
         ($pTree as (function() as item()*)*)
         as (function() as item()*)*
{
if(empty($pTree[2]))
then ()
else
   $pTree[2]()
};

declare function
tree:right
         ($pTree as (function() as item()*)*)
         as (function() as item()*)*
{
if(empty($pTree[3]))
   then ()
   else
     $pTree[3]()
};

A tree is empty when:

declare function tree:empty
($pTree as (function() as item()*)*)
as xs:boolean
{
empty($pTree) orempty($pTree[1]())
};

When does a tree contain an item?

declare function tree:contains
         ($pTree as (function() as item()*)*,
          $pItem as item()
         )
         as xs:boolean
{
if(tree:empty($pTree))
   then false()
   else
     let $top := tree:top($pTree)
      return
         ($pItem eq $top)
        or
         ($pItem lt $top
         and
          tree:contains(tree:left($pTree),$pItem)
          )
        or
         ($pItem gt $top
         and
          tree:contains(tree:right($pTree),$pItem)
          )
};

How to add a new node to a tree?

declare function tree:insert
         ($pTree as (function() as item()*)*,
          $pItem as item()
         )
         as (function() as item()*)+
{
   if(tree:empty($pTree))
      then
      (
       function() {$pItem},   (: top()   :)
       function() {tree:tree()},         (: left() :)
       function() {tree:tree()}          (: right() :)
       )
      else
       if($pItem lt tree:top($pTree))
         then
          (
           function() {tree:top($pTree)},
           function() {tree:insert(tree:left($pTree), $pItem)},
           function() {tree:right($pTree)}
           )
         else
          if($pItem gt tree:top($pTree))
           then
           (
            function() {tree:top($pTree)},
            function() {tree:left($pTree)},
            function() {tree:insert(tree:right($pTree), $pItem)}
           )
           else
            $pTree
};

How to present a tree?

declare function tree:print
        ($pTree as (function() as item()*)*)
        as element()?
{
  if(not(tree:empty($pTree)))
     then
      <treeNode>
           <value>{tree:top($pTree)}</value>
           {
            tree:print(tree:left($pTree)),
            tree:print(tree:right($pTree))
           }
     </treeNode>
     else ()
};

How to atomize a tree (and get a good sorting function as a side effect)?

(: tree:data()
Prints only the data — depth first.
   In effect this is sort() — quite good
   for random data.
:)
declare function tree:data
        ($pTree as (function() as item()*)*)
        as item()*
{
   if(not(tree:empty($pTree)))
     then
     (
      tree:data(tree:left($pTree)),
      data(tree:top($pTree)),
      tree:data(tree:right($pTree))
      )
     else ()
};

How to return a subtree and its depth?

declare function tree:findSubtree
         ($pTree as (function() as item()*)*,
          $pItem as item()
         )
         as (function() as item()*)*
{
if(tree:empty($pTree))
   then (tree:tree(), function() as item()* {–1})
   else
     let $depth := 1,
         $top := tree:top($pTree)
       return
         if($pItem eq $top)
           then ($pTree,function() as item()* {$depth})
           else
             if($pItem lt $top)
               then
                 let $lsubtree
                       := tree:findSubtree(tree:left($pTree),$pItem),
                     $ldepth := $lsubtree[4]()
                   return
                     if($ldepth eq –1)
                       then $lsubtree
                       else
                         (subsequence($lsubtree,1,3),
                          function() as item()* {1+$ldepth}
                          )
               else
                 let $rsubtree
                       := tree:findSubtree(tree:right($pTree),$pItem),
                     $rdepth := $rsubtree[4]()
                   return
                     if($rdepth eq –1)
                       then $rsubtree
                       else
                         (subsequence($rsubtree,1,3),
                          function() as item()* {1+$rdepth}
                          )

};

Finally, lets see how all this executes together:

let $vEmptyTree := tree:tree(),
    $vFilledTree := tree:insert($vEmptyTree,5) ,
    $vFilledTree2 := tree:insert($vFilledTree,3),
    $vFilledTree3 := tree:insert($vFilledTree2,7),
    $vFilledTree4 := tree:insert($vFilledTree3,1),
    $vFilledTree5 := tree:insert($vFilledTree4,9)
    return
      (tree:print($vFilledTree5),
       ‘ tree:contains($vFilledTree5,1): ‘,
       tree:contains($vFilledTree5,1),
       ‘ tree:contains($vFilledTree5,9): ‘,
       tree:contains($vFilledTree5,9),
       ‘ tree:contains($vFilledTree5,2): ‘,
       tree:contains($vFilledTree5,2),
       ‘ tree:contains($vFilledTree5,11): ‘,
       tree:contains($vFilledTree5,11),
       ‘ tree:findSubtree($vFilledTree5,9)[4](): ‘,
       tree:findSubtree($vFilledTree5,9)[4](),
       ‘ tree:findSubtree($vFilledTree5,1)[4](): ‘,
       tree:findSubtree($vFilledTree5,1)[4](),
       ‘ tree:findSubtree($vFilledTree5,3)[4](): ‘,
       tree:findSubtree($vFilledTree5,3)[4](),
       ‘ tree:findSubtree($vFilledTree5,7)[4](): ‘,
       tree:findSubtree($vFilledTree5,7)[4](),
       ‘ tree:findSubtree($vFilledTree5,5)[4](): ‘,
       tree:findSubtree($vFilledTree5,5)[4](),
       ‘ tree:findSubtree($vFilledTree5,12)[4](): ‘,
       tree:findSubtree($vFilledTree5,12)[4](),
       ‘ tree:data($vFilledTree5): ‘,
       tree:data($vFilledTree5)
      )

And the result:

<treeNode>
   <value>5</value>
   <treeNode>
      <value>3</value>
      <treeNode>
         <value>1</value>
      </treeNode>
   </treeNode>
   <treeNode>
      <value>7</value>
      <treeNode>
         <value>9</value>
      </treeNode>
   </treeNode>
</treeNode>
tree:contains($vFilledTree5,1): true
tree:contains($vFilledTree5,9): true
tree:contains($vFilledTree5,2): false
tree:contains($vFilledTree5,11): false
tree:findSubtree($vFilledTree5,9)[4](): 3
tree:findSubtree($vFilledTree5,1)[4](): 3
tree:findSubtree($vFilledTree5,3)[4](): 2
tree:findSubtree($vFilledTree5,7)[4](): 2
tree:findSubtree($vFilledTree5,5)[4](): 1
tree:findSubtree($vFilledTree5,12)[4](): -1
tree:data($vFilledTree5): 1 3 5 7 9

What next?

As you, my observant reader, probably have noticed, there is no tree:deleteItem() yet. This is a little bit more tricky and will be the topic for my next post.

Summary: A fully functional, persistent functional data structure – the Binary Search Tree has been implemented in pure XPath 3.0. We have observed how the new features of HOF and dynamically created anonymous function items make XPath 3.0 shine and how severely it lacks a simple yet most needed datatype – the tuple – to make our code more elegant and reliable.

15 Responses to The Binary Search Tree Data Structure–having fun with XPath 3.0

Pingback: Tweets that mention The Binary Search Tree Data Structure–having fun with XPath 3.0 | XSLT: Riding the challenge -- Topsy.com
Mads Hansen says:

February 8, 2011 at 8:29 am

Dimitre,

It looks as if the default settings for WordPress convert the end of XPATH comments into emoticons. It’s kind of distracting and messes the display of your code samples.

If you have access to the admin panel, it looks as if you can disable it:
http://codex.wordpress.org/Using_Smilies#To_Turn_off_Graphic_Smileys

- Dimitre Novatchev says:
  
  February 8, 2011 at 8:48 am
  
  Thank you, Mads, I am very new to wordpress and didn’t know how to turn this off. Done now.
  
  Dimitre.
  
Kurt Cagle says:

February 8, 2011 at 10:24 am

Dmitri,

And I thought I was on the bleeding edge!? Very cool! I need to spend some time working with the HOF in Saxon, been looking forward to using them for some time. What I find most intriguing here is how rapidly XQuery is converging into a truly powerful functional language.

Dimitre Novatchev says:

February 8, 2011 at 10:43 am

Kurt, this is actually pure XPath 3.0 — can be used both from XSLT or XQuery, or from any host of an XPath 3.0 engine.

Pingback: Tweets that mention The Binary Search Tree Data Structure–having fun with XPath 3.0 | XSLT: Riding the challenge -- Topsy.com
Alejandro says:

February 8, 2011 at 4:44 pm

Very good articule! For a functional language, tuples are an important missing feature… Although one could simulate tuple with mostly the same concept of this: think about a curryfied function of “n + 1” arguments for an “n” tuple, with last one argument for extraction.

- Dimitre Novatchev says:
  
  February 8, 2011 at 6:34 pm
  
  Alejandro, I think you mean a function of N arguments that returns as result a function of 0 arguments, that returns a sequence containing exactly the originally-specified N arguments. Yes, this can be done, but there is no way to specify that what the 0-argument function returns has exactly N members and the type of every member of the sequence. And this is essentially why we need a proper tuple type.
  
MIchel Hendriksen says:

February 9, 2011 at 2:30 am

This is totally cool. And very usefull for genetic programming. Tuples would be very nice there too! (And a good built in random number function that doesn’t need seeding all the time…)

Lars says:

February 10, 2011 at 4:03 am

Nice article… I was able to follow most of it, which is a testament to clear presentation, as I am no guru when it comes to functional programming. :-)

Another typing-related feature that seems to be missing is the ability to specify that e.g. the tree:tree() function returns a tree, not just any sequence of functions that return any type of items.

But I guess maybe the only way to declare and enforce that would be to add a system of classes or prototypes, which would be a huge addition to the current language design, and likely not worthwhile for the intended purposes of the language.

Lars

- Dimitre Novatchev says:
  
  February 10, 2011 at 5:37 am
  
  Excellent observation, Lars.
  
  Yes as our programming becomes more generic, we *badly* need “types as objects”. There is no way at the moment to specify that a tree is something more than just a seguence of functions — we even cannot say that it is actually a *tripple* of functions. And we cannot specify the exact type of the first of the second and third of these functions and are forced to use the type of the first function as the least common denominator… :(. Another issue is that the nodes of a binary search tree can only hold items from *ordered* sets — so that lt is definet on any two such values. There is no way currently in XPath to specify that the type of an item must be ORDERED.
  
Pingback: Part 2: The Binary Search Data Structure with XPath 3.0, Deleting a node | XSLT: Riding the challenge
cutlass2011 says:

January 9, 2012 at 11:14 am

not to steal your thunder, which is mighty indeed … John Snelson implemented red black tree
https://github.com/jpcs/rbtree.xq

a while ago in xquery 3.0 (xquery has maps now natively!)

both John and I have been playing with the fp fun n games in xquery/xpath 3.0 e.g. y combinators, etc and having a great time, though agreed xpath could make life easier I think there is just about enough to get by.

keep up the great work .. have you considered submitting a poster for XML Prague with this ?

http://www.xmlprague.cz/2012/call-for-papers.html

- Dimitre Novatchev says:
  
  January 9, 2012 at 11:35 am
  
  Thanks for the reminder, Jim.
  
  In fact this waork was done last summer and now I am refreshing the post to appear in XmlPlanet, after my blog address was actualized.
  
  As for native maps in XQuery 3.0 / XPath 3.0, that would be great, but I am afraid maps won’t be part of the final documents (judging from the XQuery 3.0 Last Call document).
  
cutlass2011 says:

January 9, 2012 at 1:00 pm

ah, very cool … I was probably on holidays when you did this, great stuff.

re maps oh well, at least having rbtree.xq means we don’t need native for now ;)

	Dimitre Novatchev on Fizz Buzz with XPath 2.0/…
	Vijay on Fizz Buzz with XPath 2.0/…
	Dimitre Novatchev on The Swiss Army Knife of Data S…
	Sven Grosen on The Swiss Army Knife of Data S…
	Dimitre Novatchev on Recursion with anonymous (inli…