The set datatype implemented in XPath 3.0

In my previous two posts I introduced the binary search tree datatype, implemented in XPath 3.0. In most cases the binary search tree operations find, insert and delete have efficiency of O(log(d)) and the print/serialize operation is O(N*log(d)), where d is the maximum depth of the tree and N is the total number of tree nodes.

Today, I will show a set implemented using a binary search tree. This implementation choice means that the set datatype operations have the same efficiency as the corresponding binary search tree operations. This implementation choice also results in the constraint that the set of values for any kind of item, that we want to put in such set, must have total ordering. In other words, the standard (or user-defined) lt operation must be applicable on any pair of items belonging to the set.

Here is the code:

import module namespace set-impl =http://fxsl.sf.net/data/set/implementation”
              at “implementations/set-implementation-as-binary-search-tree.xquery”;
             

declare namespace set =http://fxsl.sf.net/data/set

 

(:
  Create an empty set — (0x2205)
 :)
declare functionset:set()
         as
function() as item()*

{
  
set-impl:set()
};

declare functionset:empty($pSet as function() as item()*)
         as xs:boolean
{
  
set-impl:empty($
pSet)
};

declare functionset:size($pSet as function() as item()*)
         as xs:integer
{
  
set-impl:size($
pSet)
};

declare function set:equals
            (
$pSet1 as function() as item()*
,
            
$pSet2 as function() as item()*

            )
         as xs:boolean
{
   
set-impl:equals($pSet1, $pSet2)
};

declare function set:add
     (
$pSet as function() as item()*
,
     
$pItem as item
()
     )
         as
function() as item()*

{
  
set-impl:add($pSet, $pItem)
};

declare functionset:data($pSet as function() as item()*

)
         as
item()*

{
  
set-impl:data($pSet)
};

  The Set Theory belongs to (member of)   (0x22F2)
  operation
 :)
declare function set:member
     ($pItem as item()?,
     
$pSet as function() as item()*

      )
         as xs:boolean
{
  
set-impl:member($pItem, $pSet)
};

(:
 The Set Theory does not belong to (not member of)  (0x2209)
  operation
 :)

declare function set:not-member
     (
$pItem as item()?
,
     
$pSet as function() as item()*

      )
         as xs:boolean
{
  
not(set:member($pItem, $pSet))
};

declare function set:remove
     (
$pSet as function() as item()*
,
     
$pItem as item
()
     )
         as
function() as item()*

{
  
set-impl:remove($pSet[1], $pItem)
};

(:
  The classic set-theory U operation — union of two sets
:)
declare function set:U
     (
$pSet1 as function() as item()*
,
     
$pSet2 as function() as item()*

     )
         as
function() as item()*
{
  
set-impl:U($pSet1, $pSet2)
};

(:
  The classic set-theory set-difference operation \

:)
declare function set:diff
     (
$pSet1 as function() as item()*
,
     
$pSet2 as function() as item()*

     )
         as
function() as item()*
{
  
set-impl:diff($pSet1, $pSet2)
};

(: 
The classic set-theory ∩  (∩) /\
  set-intersection operation
:)
declare function set:intersect
     (
$pSet1 as function() as item()*
,
     
$pSet2 as function() as item()*

     )
         as
function() as item()*
{
 
set-impl:intersect($pSet1, $pSet2)
};
(:
The classic set-theory symmetric-difference operation
:)
declare function set:sym-diff
     (
$pSet1 as function() as item()*
,
     
$pSet2 as function() as item()*

     )
         as
function() as item()*
{
  
set-impl:sym-diff($pSet1, $pSet2)
};

(: 
The classic set-theory set (‘⊇’) operation
:)
declare function set:includes
     (
$pSet1 as function() as item()*
,
     
$pSet2 as function() as item()*

     )
         as xs:boolean
{
 
set-impl:includes($pSet1, $pSet2)
};

declare function set:print
     (
$pSet as function() as item()*
)
         as
element()?

{
 
set-impl:print($pSet)
};

declare function set:serialize
     (
$pSet as function() as item()*
)
         as
element()?

{
 
set-impl:serialize($pSet)
};

declare function set:deserialize
     (
$pSerialization as element()?
)
         as
function() as item()*

{
 
set-impl:deserialize($pSerialization)
};

Now, you would be right that this code shows very little, if anything, at all. However, it has at least two virtues:

  • Can serve as interface.
  • Is implementation independent – to use a different implementation just import another implementation module and bind the prefix set-impl to its namespace.

Having said that, here is the “true” implementation, which is contained in the file “implementations/set-implementation-as-binary-search-tree.xquery”.

This is a separate XQuery module with its own namespace. It imports and uses the module that implements the binary search tree:

module namespace set-impl =http://fxsl.sf.net/data/set/implementation”
              at “../../BinaryTree/bintreeModule.xquery”;

import module namespace tree =http://fxsl.sf.net/data/bintree;

The set implementation module “contains” a binary search tree. The “empty set” is nothing else than a zero-argument function that returns an empty tree:

(:
  Create an empty set — (0x2205)
 :)
declare functionset-impl:set()
         as
function() as item()*

{
  
function() {tree:tree()}   (: underlying bintree :)
};

Many of the basic set operations have their binary search tree counterparts. Here is what it means in our implementation for a set to be “empty” and how we compute the size of a set:

declare functionset-impl:empty($pSet as function() as item()*)
         as xs:boolean
{
  
tree:empty($pSet[1
]())
};

 

declare functionset-impl:size($pSet as function() as item()*)
         as xs:integer
{
  
tree:size($pSet[1
]())
};

Two sets are equal exactly when they have the same size and the sequences of their atomized values are deep-equal:

declare function set-impl:equals
            (
$pSet1 as function() as item()*
,
            
$pSet2 as function() as item()*

            )
         as xs:boolean
{
   
set-impl:size($pSet1) eqset-impl:size($pSet1)
  
and

    deep-equal(set-impl:data($pSet1), set-impl:data($pSet2))
};

Here is how we add an item to a set:

declare function set-impl:add
     (
$pSet as function() as item()*
,
     
$pItem as item
()
     )
         as
function() as item()*

{
  
function() {tree:insert($pSet[1](), $pItem)}
};

and how we atomize a set:

declare functionset-impl:data($pSet as function() as item()*)
         as
item()*

{
  
tree:data($pSet[1]())
};

When is an item member of a set:

(:
  The Set Theory belongs to (member of)   (0x22F2)
  operation
 :)
declare function set-impl:member
     (
$pItem as item()?
,
     
$pSet as function() as item()*

      )
         as xs:boolean
{
  
tree:contains($pSet[1](), $pItem)
};

and when it isn’t a member of a set:

(:
  The Set Theory does not belong to (not member of)   (0x2209)
  operation
 :)

declare function set-impl:not-member
     (
$pItem as item()?
,
     
$pSet as function() as item()*

      )
         as xs:boolean
{
  
not(set-impl:member($pItem, $pSet))
};

This is how we remove an item from a set:

declare function set-impl:remove
     (
$pSet as function() as item()*
,
     
$pItem as item
()
     )
         as
function() as item()*

{
  
function() {tree:deleteNode($pSet[1](), $pItem)}
};

The classic set theory U (union) operation returns the union of two sets. Making use of the new standard XPath 3.0 higher order function fold-left(), we add all items from the smaller of the two sets to the bigger, thus performing only the minimal possible number of add operations:

(:
  The classic set-theory U operation — union of two sets
:)
declare function set-impl:U
     (
$pSet1 as function() as item()*
,
     
$pSet2 as function() as item()*

     )
         as
function() as item()*
{
  
let$vBiggerSet :=
          
if(set-impl:size($pSet1) geset-impl:size($
pSet2))
            
then$
pSet1
            
else$
pSet2,
      
$vSmallerSet :=

           if(not(set-impl:size($pSet1) geset-impl:size($pSet2)))
            
then$
pSet1
            
else$
pSet2
   
return

       fold-left(set-impl:add#2, $vBiggerSet, set-impl:data($vSmallerSet))
};

Similarly, the classic set theory \ (set difference) operation:

(:
  The classic set-theory set-difference operation \
:)
declare function set-impl:diff
     (
$pSet1 as function() as item()*
,
     
$pSet2 as function() as item()*

     )
         as
function() as item()*
{
 
fold-left(set-impl:remove#2, $pSet1, set-impl:data($pSet2))
};

And the classic set theory  intersect operation:

(:
  The classic set-theory ∩  (∩) /\
  set-intersection operation
:)
declare function set-impl:intersect
     (
$pSet1 as function() as item()*
,
     
$pSet2 as function() as item()*

     )
         as
function() as item()*
{
  
let$vBiggerSet :=
          
if(set-impl:size($pSet1) geset-impl:size($
pSet2))
            
then$
pSet1
            
else$
pSet2,
      
$vSmallerSet :=

           if(not(set-impl:size($pSet1) geset-impl:size($pSet2)))
            
then$
pSet1
            
else$
pSet2,
      
$to-be-removed :=

          
filter(set-impl:not-member(?, $
vBiggerSet),
                     
set-impl:data($
vSmallerSet)
                     )
   
return

       fold-left(set-impl:remove#2, $vSmallerSet, $to-be-removed)
};

Two more set theory operations – symmetric difference:

(:
  The classic set-theory symmetric-difference operation
:)
declare function set-impl:sym-diff
     (
$pSet1 as function() as item()*
,
     
$pSet2 as function() as item()*

     )
         as
function() as item()*
{
 
set-impl:U(set-impl:diff($pSet1, $pSet2), set-impl:diff($pSet2, $pSet1))
};

and (non-strict) set inclusion:

(:
  The classic set-theory set (‘⊇’) operation
:)
declare function set-impl:includes
     (
$pSet1 as function() as item()*
,
     
$pSet2 as function() as item()*

     )
         as xs:boolean
{
 
empty(set-impl:data($pSet2)[set-impl:not-member(.,$pSet1)])
};

And finally, three infrastructural operations: print(), serialize() and deserialize() :

declare function set-impl:print
     (
$pSet as function() as item()*
)
         as
element()?

{
 
tree:print($pSet[1]())
};
 

declare function set-impl:serialize
     (
$pSet as function() as item()*
)
         as
element()?

{
 
set-impl:print($pSet)
};

declare function set-impl:deserialize
     (
$pSerialization as element()?
)
         as
function() as item()*

{
 
function() {tree:deserialize($pSerialization)}
};

Here is a test of our set implementation:

let$set0 :=set:add(set:set(), 2),
    
$set1 :=set:add(set:add(set:add(set:set(), 10), 5), 17
),
    
$set2 :=set:remove($set1, 5
),
    
$set3 :=set:remove($set1, 12
),
    
$set4 :=set:add(set:set(), 2
),
    
$set4 :=set:add(set:add(set:add(set:set(), 8), 4), 9
),
    
$set5 :=set:add(set:add(set:add(set:set(), 17), 10), 19
)
 
return

    (
           
‘ set:empty(set:set()): ‘,
             
set:empty(set:set
()),
             
‘ set:size(set:set()): ‘
,
             
set:size(set:set
()),
             
‘ set:size($set0): ‘
,
             
set:size($
set0),
             
‘ set:size($set1): ‘
,
             
set:size($
set1),
             
‘ set:data($set1): ‘
,
             
set:data($
set1),
             
‘ set:member(10,$set1): ‘
,
             
set:member(10,$
set1),
             
‘ set:member(17,$set1): ‘
,
             
set:member(17,$
set1),
             
‘ set:member(5,$set1): ‘
,
             
set:member(5,$
set1),
             
‘ set:member(2,$set1): ‘
,
             
set:member(2,$
set1),
             
‘ set:member(15,$set1): ‘
,
             
set:member(15,$
set1),
             
‘ set:data($set2): ‘
,
             
set:data($
set2),
             
‘ set:data($set3): ‘
,
             
set:data($
set3),
             
‘ set:data($set4): ‘
,
             
set:data($
set4),
             
‘ set:data(set:U($set1,$set4)): ‘
,
             
set:data(set:U($set1,$
set4)),
             
‘ set:print(set:U($set1,$set4)): ‘
,
             
set:print(set:U($set1,$
set4)),
             
‘ set:print(set:U($set4,$set1)): ‘
,
             
set:print(set:U($set4,$
set1)),
             
‘ set:print(set:U($set1,$set2)): ‘
,
             
set:print(set:U($set1,$
set2)),
             
‘ set:equals($set1, set:U($set1,$set2)): ‘
,
             
set:equals($set1, set:U($set1,$
set2)),
             
‘ set:data($set1): ‘
,
             
set:data($
set1),
             
‘ set:data($set4): ‘
,
             
set:data($
set4),
             
‘ set:data(set:U($set1,$set4)): ‘
,
             
set:data(set:U($set1,$
set4)),
             
‘ set:print(set:U($set1,$set4)): ‘
,
             
set:print(set:U($set1,$
set4)),
             
‘ set:data($set1): ‘
,
             
set:data($
set1),
             
‘ set:data($set2): ‘
,
             
set:data($
set2),
             
‘ set:data(set:diff($set1,$set2)): ‘
,
             
set:data(set:diff($set1,$
set2)),
             
‘ set:print(tree:print(set:diff($set1,$set2)): ‘
,
             
set:print(set:diff($set1,$
set2)),
             
‘ set:data($set1): ‘
,
             
set:data($
set1),
             
‘ set:data($set2): ‘
,
             
set:data($
set2),
             
‘ set:data(set:intersect($set1, $set2)): ‘
,
  
set:data(set:intersect($set1, $
set2)),
             
‘ set:data($set1): ‘
,
             
set:data($
set1),
             
‘ set:data($set5): ‘
,
             
set:data($
set5),
             
‘ set:data(set:sym-diff($set1, $set5)): ‘
,
  
set:data(set:sym-diff($set1, $
set5)),
             
‘ set:includes($set1, $set2): ‘
,
  
set:includes($set1, $
set2),
             
‘ set:includes($set1, $set5): ‘
,
  
set:includes($set1, $
set5),
             
‘ set:includes($set1, set:set()): ‘
,
  
set:includes($set1, set:set
()),
             
‘ set:member((), $set1): ‘
,
  
set:member((), $
set1),
             
‘ set:member((), set:set()): ‘
,
  
set:member((), set:set
()),
             
‘ set:serialize($set5): ‘
,
  
set:serialize($
set5),
             
‘ set:data(set:deserialize(set:serialize($set5))): ‘
,
  
set:data(set:deserialize(set:serialize($
set5))),
             
‘ set:size(set:deserialize(set:serialize($set5))): ‘
,
  
set:size(set:deserialize(set:serialize($
set5)))
             )

When this test is executed, the expected, correct results are produced:

set:empty(set:set()):  true
set:size(set:set()):  0
set:size($set0):  1
set:size($set1):  3
set:data($set1):  5 10 17
set:member(10,$set1):  true
set:member(17,$set1):  true
set:member(5,$set1):  true
set:member(2,$set1):  false
set:member(15,$set1):  false
set:data($set2):  10 17
set:data($set3):  5 10 17
set:data($set4):  4 8 9
set:data(set:U($set1,$set4)):  4 5 8 9 10 17
set:print(set:U($set1,$set4)):

<treeNode>

   <value>10</value>
   <treeNode>
      <value>5</value>
      <treeNode>
         <value>4</value>
      </treeNode>
      <treeNode>
         <value>8</value>
         <treeNode>
            <value>9</value>
         </treeNode>
      </treeNode>
   </treeNode>
   <treeNode>
      <value>17</value>
   </treeNode>
</treeNode>
set:print(set:U($set4,$set1)):
<treeNode>
   <value>8</value>
   <treeNode>
      <value>4</value>
      <treeNode>
         <value>5</value>
      </treeNode>
   </treeNode>
   <treeNode>
      <value>9</value>
      <treeNode>
         <value>10</value>
         <treeNode>
            <value>17</value>
         </treeNode>
      </treeNode>
   </treeNode>
</treeNode>
set:print(set:U($set1,$set2)):
<treeNode>
   <value>10</value>
   <treeNode>
      <value>5</value>
   </treeNode>
   <treeNode>
      <value>17</value>
   </treeNode>
</treeNode>
set:equals($set1, set:U($set1,$set2)):  true
set:data($set1):  5 10 17
set:data($set4):  4 8 9
set:data(set:U($set1,$set4)):  4 5 8 9 10 17
set:print(set:U($set1,$set4)):

<treeNode>
   <value>10</value>
   <treeNode>
      <value>5</value>
      <treeNode>
         <value>4</value>
      </treeNode>
      <treeNode>
         <value>8</value>
         <treeNode>
            <value>9</value>
         </treeNode>
      </treeNode>
   </treeNode>
   <treeNode>
      <value>17</value>
   </treeNode>
</treeNode>
set:data($set1):  5 10 17
set:data($set2):  10 17
set:data(set:diff($set1,$set2)):  5
set:print(tree:print(set:diff($set1,$set2)):

<treeNode>
   <value>5</value>
</treeNode>
set:data($set1):  5 10 17
set:data($set2):  10 17
set:data(set:intersect($set1, $set2)):  10 17
set:data($set1):  5 10 17
set:data($set5):  10 17 19
set:data(set:sym-diff($set1, $set5)):  5 19
set:includes($set1, $set2):  true
set:includes($set1, $set5):  false
set:includes($set1, set:set()):  true
set:member((), $set1):  true
set:member((), set:set()):  true
set:serialize($set5):
<treeNode>
   <value>17</value>
   <treeNode>
      <value>10</value>
   </treeNode>
   <treeNode>
      <value>19</value>
   </treeNode>
</treeNode>
set:data(set:deserialize(set:serialize($set5))): 10 17 19
set:size(set:deserialize(set:serialize($set5))):  3

In a few of my next posts I will show other interesting applications of the two new datatypes: the set and the binary search tree.

Useful utility: Zip Codes – Free zip code lookup and zip code database download.

This entry was posted in functional data structures, XPath, XPath 3.0. Bookmark the permalink.

2 Responses to The set datatype implemented in XPath 3.0

  1. dpawson says:

    Context Dimitre? You call this as xpath 3.0, which should be part of an XSLT script? How about showing a use for this in an XSLT context?

    regards DaveP

    • Dave, I wrote this in my blog one year ago, when Saxon’s XSLT 3.0 was still buggy to an extent that didn’t allow such code to run correctly. Therefore, I had only Saxon’s XQuery 3.0 implementation left.

      Now, I guess that Saxon’s XSLT 3.0 has been improved.

      It is extremely easy to convert my Set datatype to XSLT 3.0 — you may notice that I deliberately don’t use any XQuery specific feature at all — all the code is pure XPath 3.0 that should run equally-well in XSLT 3.0, too.

      Just make every declare function an <xsl:function> and make its code the value of the select attribute of a single <xsl:sequence> instruction.

      This raises an interesting use-case for the future:
      Code that is a set of pure XPath functions and can be used both from XSLT and XQuery.

      We need to be able to save each “XPath function” in a separate file and we need a pure XPath function-import clause.

Leave a comment