The Binary Search Tree Data Structure–having fun with XPath 3.0

For a long time I have wanted to play with XSLT 3.0 and XPath 3.0. Despite these being in their WD status, the new features are so powerful and overwhelming.

Take just these: Higher Order Functions and the ability to create new anonymous functions in XPath.

In my quest to accomplish what no one has ever done before with XPath I was helped by the existence of an early Saxon implementation – Saxon 9.3.04 offers these in its XQuery 3.0 implementation and the XSLT 3.0 implementation will hopefully be soon fully usable, after fixing a few bugs. The beautifully high-lighted XQuery code below was copied from oXygen 12.1 and pasted onto Word, then to Windows Live Writer.

I decided to start with something not too-complex (are you dreaming of a finger tree written entirely in XPath ?) and the Binary Search tree persistent functional data structure fits well this requirement.

So, let’s start:

As always:

declare namespace tree =http://fxsl.sf.net;

A tree is a sequence (triple) of functions. Here is the definition of an empty tree:

declare functiontree:tree()
        
as (function() as item()*)+

{ 
  (function() {()},   (: top() :)
    function() {()},   (: left() :)
    function() {()}    (: right() :)
  )
};
 

The first function produces the top of the tree, which is the value ( item() ) of its top node.

The second function produces the left sub-tree, which is another sequence (triple) of functions.

The third function produces the right sub-tree, which is another sequence (triple) of functions.

Do you see the problem with the type of the tree:tree()  function? What would happen if I accidentally returned only a pair of functions? Nothing! No static or runtime error would occur and I could spend a lot of time trying to find this simple error. Obviously, a tuple datatype would solve this problem completely.

Now, the three functions top(), left() and right() :

declare function

 tree:top
         (
$pTree as (function() as item()*)*
)
         as
item()?

{
 
if(empty($pTree))
  
then
()
  
else

     $pTree[1]()
};

 

declare function
tree:left
         (
$pTree as (function() as item()*)*
)
        
as (function() as item()*)*

{
 
if(empty($pTree[2]))
 
then
()
 
else

   $pTree[2]()
};
  
 
declare function
tree:right
         (
$pTree as (function() as item()*)*
)
        
as (function() as item()*)*

{
 
if(empty($pTree[3]))
  
then
()
  
else

     $pTree[3]()
};

A tree is empty when:

declare function tree:empty
         (
$pTree as (function() as item()*)*
)
         as xs:boolean
{
 
empty($pTree) orempty($pTree[1
]())
};
 

When does a tree contain an item?

declare function tree:contains
         ($pTree as (function() as item()*)*
,
          $pItem as item
()
         )
         as xs:boolean
{
 if(tree:empty($
pTree))
   then false
()
   else

     let $top := tree:top($pTree)
      return

         ($pItem eq $top)
        or

         ($pItem lt $top
         and

          tree:contains(tree:left($pTree),$pItem)
          )
        or

         ($pItem gt $top
         and

          tree:contains(tree:right($pTree),$pItem)
          )
};

How to add a new node to a tree?

declare function tree:insert
         ($pTree as (function() as item()*)*
,
          $pItem as item
()
         )
         as (function() as item()*)+

{
  
if(tree:empty($pTree))
      then

      (
      
function() {$pItem},   (: top()   :)
       function() {tree:tree()},         (: left()  :)
       function() {tree:tree()}          (: right() :)
       )
     
else
       if($pItem lt tree:top($pTree))
         then

          (
          
function() {tree:top($pTree)},
           function() {tree:insert(tree:left($pTree), $
pItem)},
           function() {tree:right($
pTree)}
           )
         else

          if($pItem gt tree:top($pTree))
           then

           (
           
function() {tree:top($pTree)},
            function() {tree:left($
pTree)},
            function() {tree:insert(tree:right($pTree), $
pItem)}
           )
           else

            $pTree
};

How to present a tree?

declare function tree:print
        ($pTree as (function() as item()*)*
)
        as element()?

{
  if(not(tree:empty($pTree)))
     then

      <treeNode>
           <value>{tree:top($pTree)}</value>
           {
           
tree:print(tree:left($pTree)),
            tree:print(tree:right($
pTree))
           }
     </treeNode>

     else ()
};

How to atomize a tree (and get a good sorting function as a side effect)?

(: tree:data()
  Prints only the data — depth first.
   In effect this is sort() — quite good
   for random data.
 :)
declare function tree:data
        ($pTree as (function() as item()*)*
)
        as item()*

{
  
if(not(tree:empty($pTree)))
     then

     (
     
tree:data(tree:left($pTree)),
      data(tree:top($
pTree)),
      tree:data(tree:right($
pTree))
      )
     else
()
};

How to return a subtree and its depth?

declare function tree:findSubtree
         ($pTree as (function() as item()*)*
,
          $pItem as item
()
         )
         as (function() as item()*)*

{
 
if(tree:empty($pTree))
   then (tree:tree(), function() as item()* {1
})
   else

     let $depth := 1,
         $top := tree:top($
pTree)
       return

         if($pItem eq $top)
           then ($pTree,function() as item()* {$
depth})
           else

             if($pItem lt $top)
               then

                 let $lsubtree
                       := tree:findSubtree(tree:left($pTree),$
pItem),
                     $ldepth := $lsubtree[4
]()
                   return

                     if($ldepth eq 1)
                       then $
lsubtree
                       else

                         (subsequence($lsubtree,1,3),
                          function() as item()* {1+$
ldepth}
                          )
               else

                 let $rsubtree
                       := tree:findSubtree(tree:right($pTree),$
pItem),
                     $rdepth := $rsubtree[4
]()
                   return

                     if($rdepth eq 1)
                       then $
rsubtree
                       else

                         (subsequence($rsubtree,1,3),
                          function() as item()* {1+$
rdepth}
                          )

};

Finally, lets see how all this executes together:

let $vEmptyTree := tree:tree(),
    $vFilledTree := tree:insert($vEmptyTree,5
)  ,
    $vFilledTree2 := tree:insert($vFilledTree,3
),  
    $vFilledTree3 := tree:insert($vFilledTree2,7
),
    $vFilledTree4 := tree:insert($vFilledTree3,1
),
    $vFilledTree5 := tree:insert($vFilledTree4,9
)
    return

      (tree:print($vFilledTree5),
       ‘ tree:contains($vFilledTree5,1): ‘
,
       tree:contains($vFilledTree5,1
),
       ‘ tree:contains($vFilledTree5,9): ‘
,
       tree:contains($vFilledTree5,9
),
       ‘ tree:contains($vFilledTree5,2): ‘
,
       tree:contains($vFilledTree5,2
),
       ‘ tree:contains($vFilledTree5,11): ‘
,
       tree:contains($vFilledTree5,11
),
       ‘ tree:findSubtree($vFilledTree5,9)[4](): ‘
,
       tree:findSubtree($vFilledTree5,9)[4
](),
       ‘ tree:findSubtree($vFilledTree5,1)[4](): ‘
,
       tree:findSubtree($vFilledTree5,1)[4
](),
       ‘ tree:findSubtree($vFilledTree5,3)[4](): ‘
,
       tree:findSubtree($vFilledTree5,3)[4
](),
       ‘ tree:findSubtree($vFilledTree5,7)[4](): ‘
,
       tree:findSubtree($vFilledTree5,7)[4
](),
       ‘ tree:findSubtree($vFilledTree5,5)[4](): ‘
,
       tree:findSubtree($vFilledTree5,5)[4
](),
       ‘ tree:findSubtree($vFilledTree5,12)[4](): ‘
,
       tree:findSubtree($vFilledTree5,12)[4
](),
       ‘ tree:data($vFilledTree5): ‘
,
       tree:data($
vFilledTree5)
      )

And the result:

<treeNode>
   <value>5</value>
   <treeNode>
      <value>3</value>
      <treeNode>
         <value>1</value>
      </treeNode>
   </treeNode>
   <treeNode>
      <value>7</value>
      <treeNode>
         <value>9</value>
      </treeNode>
   </treeNode>
</treeNode>
tree:contains($vFilledTree5,1):  true
tree:contains($vFilledTree5,9):  true
tree:contains($vFilledTree5,2):  false
tree:contains($vFilledTree5,11):  false
tree:findSubtree($vFilledTree5,9)[4]():  3
tree:findSubtree($vFilledTree5,1)[4]():  3
tree:findSubtree($vFilledTree5,3)[4]():  2
tree:findSubtree($vFilledTree5,7)[4]():  2
tree:findSubtree($vFilledTree5,5)[4]():  1
tree:findSubtree($vFilledTree5,12)[4]():  -1
tree:data($vFilledTree5):  1 3 5 7 9

What next?

As you, my observant reader, probably have noticed, there is no tree:deleteItem() yet. This is a little bit more tricky and will be the topic for my next post.

Summary: A fully functional, persistent functional data structure – the Binary Search Tree has been implemented in pure XPath 3.0. We have observed how the new features of HOF and dynamically created anonymous function items make XPath 3.0 shine and how severely it lacks a simple yet most needed datatype – the tuple – to make our code more elegant and reliable.

This entry was posted in functional data structures, XPath, XPath 3.0, XSLT 3.0. Bookmark the permalink.

15 Responses to The Binary Search Tree Data Structure–having fun with XPath 3.0

  1. Pingback: Tweets that mention The Binary Search Tree Data Structure–having fun with XPath 3.0 | XSLT: Riding the challenge -- Topsy.com

  2. Mads Hansen says:

    Dimitre,

    It looks as if the default settings for WordPress convert the end of XPATH comments into emoticons. It’s kind of distracting and messes the display of your code samples.

    If you have access to the admin panel, it looks as if you can disable it:
    http://codex.wordpress.org/Using_Smilies#To_Turn_off_Graphic_Smileys

  3. Kurt Cagle says:

    Dmitri,

    And I thought I was on the bleeding edge!? Very cool! I need to spend some time working with the HOF in Saxon, been looking forward to using them for some time. What I find most intriguing here is how rapidly XQuery is converging into a truly powerful functional language.

  4. Kurt, this is actually pure XPath 3.0 — can be used both from XSLT or XQuery, or from any host of an XPath 3.0 engine.

  5. Pingback: Tweets that mention The Binary Search Tree Data Structure–having fun with XPath 3.0 | XSLT: Riding the challenge -- Topsy.com

  6. Alejandro says:

    Very good articule! For a functional language, tuples are an important missing feature… Although one could simulate tuple with mostly the same concept of this: think about a curryfied function of “n + 1” arguments for an “n” tuple, with last one argument for extraction.

    • Alejandro, I think you mean a function of N arguments that returns as result a function of 0 arguments, that returns a sequence containing exactly the originally-specified N arguments. Yes, this can be done, but there is no way to specify that what the 0-argument function returns has exactly N members and the type of every member of the sequence. And this is essentially why we need a proper tuple type.

  7. MIchel Hendriksen says:

    This is totally cool. And very usefull for genetic programming. Tuples would be very nice there too! (And a good built in random number function that doesn’t need seeding all the time…)

  8. Lars says:

    Nice article… I was able to follow most of it, which is a testament to clear presentation, as I am no guru when it comes to functional programming. :-)

    Another typing-related feature that seems to be missing is the ability to specify that e.g. the tree:tree() function returns a tree, not just any sequence of functions that return any type of items.

    But I guess maybe the only way to declare and enforce that would be to add a system of classes or prototypes, which would be a huge addition to the current language design, and likely not worthwhile for the intended purposes of the language.

    Lars

    • Excellent observation, Lars.

      Yes as our programming becomes more generic, we *badly* need “types as objects”. There is no way at the moment to specify that a tree is something more than just a seguence of functions — we even cannot say that it is actually a *tripple* of functions. And we cannot specify the exact type of the first of the second and third of these functions and are forced to use the type of the first function as the least common denominator… :(. Another issue is that the nodes of a binary search tree can only hold items from *ordered* sets — so that lt is definet on any two such values. There is no way currently in XPath to specify that the type of an item must be ORDERED.

  9. Pingback: Part 2: The Binary Search Data Structure with XPath 3.0, Deleting a node | XSLT: Riding the challenge

  10. cutlass2011 says:

    not to steal your thunder, which is mighty indeed … John Snelson implemented red black tree
    https://github.com/jpcs/rbtree.xq

    a while ago in xquery 3.0 (xquery has maps now natively!)

    both John and I have been playing with the fp fun n games in xquery/xpath 3.0 e.g. y combinators, etc and having a great time, though agreed xpath could make life easier I think there is just about enough to get by.

    keep up the great work .. have you considered submitting a poster for XML Prague with this ?

    http://www.xmlprague.cz/2012/call-for-papers.html

    • Thanks for the reminder, Jim.

      In fact this waork was done last summer and now I am refreshing the post to appear in XmlPlanet, after my blog address was actualized.

      As for native maps in XQuery 3.0 / XPath 3.0, that would be great, but I am afraid maps won’t be part of the final documents (judging from the XQuery 3.0 Last Call document).

  11. cutlass2011 says:

    ah, very cool … I was probably on holidays when you did this, great stuff.

    re maps oh well, at least having rbtree.xq means we don’t need native for now ;)

Leave a comment