In my previous two posts I introduced the binary search tree datatype, implemented in XPath 3.0. In most cases the binary search tree operations find, insert and delete have efficiency of O(log(d)) and the print/serialize operation is O(N*log(d)), where d is the maximum depth of the tree and N is the total number of tree nodes.
Today, I will show a set implemented using a binary search tree. This implementation choice means that the set datatype operations have the same efficiency as the corresponding binary search tree operations. This implementation choice also results in the constraint that the set of values for any kind of item, that we want to put in such set, must have total ordering. In other words, the standard (or user-defined) lt operation must be applicable on any pair of items belonging to the set.
Here is the code:
import module namespace set-impl =“http://fxsl.sf.net/data/set/implementation”
(: declare functionset:empty($pSet as function() as item()*) declare functionset:size($pSet as function() as item()*) declare function set:equals declare function set:add declare functionset:data($pSet as function() as item()* ) (: declare function set:not-member declare function set:remove (: (: (: declare function set:print declare function set:serialize declare function set:deserialize |
Now, you would be right that this code shows very little, if anything, at all. However, it has at least two virtues:
- Can serve as interface.
- Is implementation independent – to use a different implementation just import another implementation module and bind the prefix set-impl to its namespace.
Having said that, here is the “true” implementation, which is contained in the file “implementations/set-implementation-as-binary-search-tree.xquery”.
This is a separate XQuery module with its own namespace. It imports and uses the module that implements the binary search tree:
module namespace set-impl =“http://fxsl.sf.net/data/set/implementation” import module namespace tree =http://fxsl.sf.net/data/bintree; |
The set implementation module “contains” a binary search tree. The “empty set” ∅ is nothing else than a zero-argument function that returns an empty tree:
(: |
Many of the basic set operations have their binary search tree counterparts. Here is what it means in our implementation for a set to be “empty” and how we compute the size of a set:
declare functionset-impl:empty($pSet as function() as item()*)
declare functionset-impl:size($pSet as function() as item()*) |
Two sets are equal exactly when they have the same size and the sequences of their atomized values are deep-equal:
declare function set-impl:equals |
Here is how we add an item to a set:
declare function set-impl:add |
and how we atomize a set:
declare functionset-impl:data($pSet as function() as item()*) |
When is an item member of a set:
(: |
and when it isn’t a member of a set:
(: declare function set-impl:not-member |
This is how we remove an item from a set:
declare function set-impl:remove |
The classic set theory U (union) operation returns the union of two sets. Making use of the new standard XPath 3.0 higher order function fold-left(), we add all items from the smaller of the two sets to the bigger, thus performing only the minimal possible number of add operations:
(: |
Similarly, the classic set theory \ (set difference) operation:
(: |
And the classic set theory ∩ intersect operation:
(: |
Two more set theory operations – symmetric difference:
(: |
and ⊇ (non-strict) set inclusion:
(: |
And finally, three infrastructural operations: print(), serialize() and deserialize() :
declare function set-impl:print declare function set-impl:serialize declare function set-impl:deserialize |
Here is a test of our set implementation:
let$set0 :=set:add(set:set(), 2), |
When this test is executed, the expected, correct results are produced:
set:empty(set:set()): true |
In a few of my next posts I will show other interesting applications of the two new datatypes: the set and the binary search tree.
Useful utility: Zip Codes – Free zip code lookup and zip code database download.
Context Dimitre? You call this as xpath 3.0, which should be part of an XSLT script? How about showing a use for this in an XSLT context?
regards DaveP
Dave, I wrote this in my blog one year ago, when Saxon’s XSLT 3.0 was still buggy to an extent that didn’t allow such code to run correctly. Therefore, I had only Saxon’s XQuery 3.0 implementation left.
Now, I guess that Saxon’s XSLT 3.0 has been improved.
It is extremely easy to convert my Set datatype to XSLT 3.0 — you may notice that I deliberately don’t use any XQuery specific feature at all — all the code is pure XPath 3.0 that should run equally-well in XSLT 3.0, too.
Just make every
declare function
an<xsl:function>
and make its code the value of the select attribute of a single<xsl:sequence>
instruction.This raises an interesting use-case for the future:
Code that is a set of pure XPath functions and can be used both from XSLT and XQuery.
We need to be able to save each “XPath function” in a separate file and we need a pure XPath
function-import
clause.