Not for XSLT? More Fun with Project Euler

 

Here is another Project Euler problem that seems exactly what XSLT was not intended for:

 

In the 5 by 5 matrix below, the minimal path sum from the top left to the bottom right, by only moving to the right and down, is indicated in bold red and is equal to 2427.

  

131

673

234

103

18

201

96

342

965

150

630

803

746

422

111

537

699

497

121

956

805

732

524

37

331

  

 

Find the minimal path sum, in matrix.txt (right click and ‘Save Link/Target As…’), a 31K text file containing a 80 by 80 matrix, from the top left to the bottom right by only moving right and down.

My solution:

 

<xsl:stylesheet version="2.0"

 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"

 xmlns:xs="http://www.w3.org/2001/XMLSchema"

 xmlns:saxon="http://saxon.sf.net/"

 xmlns:mx="my:my" exclude-result-prefixes="xs saxon mx"

 >

 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 

 <xsl:variable name="vMatrix" as="element()*">

  <xsl:variable name="vLines" as="xs:string*"

   select="tokenize(unparsed-text(‘matrix.txt’),’\s+’)[.]"/>

   <xsl:for-each select="$vLines">

     <line>

      <xsl:for-each select="tokenize(.,’,’)">

        <v><xsl:value-of select="."/></v>

      </xsl:for-each>

     </line>

   </xsl:for-each>

 </xsl:variable>

 

 <xsl:variable name="vDimension" as="xs:integer"

  select="count($vMatrix)"/>

 

 <xsl:template match="/">

  <xsl:sequence select="mx:path-minSum(1,1,0)"/>

 </xsl:template>

 

 <xsl:function name="mx:path-minSum" as="xs:integer"

  saxon:memo-function="yes">

  <xsl:param name="pX" as="xs:integer"/>

  <xsl:param name="pY" as="xs:integer"/>

  <xsl:param name="pcurSum" as="xs:integer"/>

 

  <xsl:variable name="curVal" as="xs:integer"

   select="mx:mtx($pX, $pY)"/>

 

  <xsl:sequence select=

   "if($pX eq $vDimension and $pY eq $vDimension)

      then $curVal

      else

        for $nextX in min(($vDimension, $pX+1)),

            $nextY in min(($vDimension, $pY+1)),

            $s1 in if($nextY gt $pY)

                    then

                     $pcurSum + $curVal

                     + mx:path-minSum($pX, $nextY, $pcurSum)

                    else 999999999999,

            $s2 in if($nextX gt $pX)

                    then

                     $pcurSum + $curVal

                     + mx:path-minSum($nextX, $pY, $pcurSum)

                    else 999999999999

          return

            min(($s1,$s2))

   "/>

 </xsl:function>

 

 <xsl:function name="mx:mtx" as="xs:integer"

  saxon:memo-function="yes">

  <xsl:param name="pX" as="xs:integer"/>

  <xsl:param name="pY" as="xs:integer"/>

 

  <xsl:sequence select="$vMatrix[$pY]/*[$pX]"/>

 </xsl:function>

</xsl:stylesheet>

 

 

Properties:

  • LOC:                                         65 – well-formatted code for readability.
  • Execution time:                  078 ms.
  • Time to write the code:  ~ 10 minutes.
Posted in Project Euler | Leave a comment

Fun with Project Euler

I have been having fun solving some Project Euler problems. Most of my 76 solutions were done in XSLT 2.0 last year.

For those who are busy arguing whether or not  XSLT is a programming language, here is one easy problem:

The 5-digit number, 16807=75, is also a fifth power. Similarly, the 9-digit number, 134217728=89, is a ninth power.

How many n-digit positive integers exist which are also an nth power?

Below is the solution, and it takes 20ms to run with Saxon 9.1.07:

<xsl:stylesheet version="2.0"
xmlns:xsl=http://www.w3.org/1999/XSL/Transform
xmlns:f=http://fxsl.sf.net/>

<xsl:import href="../../../f/func-exp.xsl"/>

<xsl:output method="text"/>

<xsl:template match="/">
  <xsl:sequence select=
   "
(: count( :)
       for $k in 1 to 9,
           $n in 1 to 23,
           $k-pwr-n in f:intppow($k,$n),
           $low in f:intppow(10,$n -1) -1,
           $high in f:intppow(10,$n)
        return
          if($k-pwr-n gt $low and $k-pwr-n lt $high)
           then ($k, $n, $k-pwr-n, ‘
‘)
           else()
   (: ) :)
   "/>
</xsl:template>
</
xsl:stylesheet>

 

f:intppow(k, n)  is an FXSL 2.x function with positive integer arguments that calculates kn.

And if all this is possible even now, imagine yourself working with XSLT 2.1 in the near future…

Posted in Project Euler | Leave a comment

Should XPath have virtual axes?

In his recent blog-article: “XPath needs virtual axes. Making XPath more XPathy?Rick Jelliffe sais:

I really like XPath2. I would never recommend anyone start with XPath1, unless you were doing very basic transformations with no text processing or data formatting.

But the niggle I have with XPath2 is that it is less XPath-y than XPath1. It does not significantly improve the central syntactical feature of XPaths: the location steps. (The only improvement that springs to mind is that XPath2 did improve the use of parentheses in location steps.) Instead, XPath2 provided much more conventional features like a for iterator. I think these significantly decrease the comprehensibility of an XPath, are anonymous and therefore require may comments to explain them, and fracture the line. To an extent, once you start to use nested syntaxes and iterators, why both using XPath at all?”

Rick gives this example:

find-rep( find-client(//manager/clients/client-ref)/rep-ref)/name

and proposes to improve the expression above by introducing “virtual axes” so that it could be written as:

//manager/clients/client-ref/find-client::rep-ref/find-rep::rep/name

So, curious reader, pause for a while, don’t read below, and think: has Rick uncovered a hole in XPath? 

Here is my answer:

What you are asking for can be expressed almost exactly in the same form (actually I prefer the current XPath 2.0 form of expression).

You are asking for:

//manager/clients/client-ref/find-client::rep-ref/find-rep::rep/name

One can write this in XPath 2.0 as:

//manager/clients/client-ref/find-client(.)/rep-ref/find-rep(.)/name

The current XPath way of expressing this is cleaner — no need for virtual axes.

This is a feature of XPath 2.0 that is not widely used and known: any function can be used as the current location step. The syntax rules that resolve its use are (at http://www.w3.org/TR/2007/REC-xpath20-20070123/#id-grammar):

[27] StepExpr ==> [38] FilterExpr ==> [41] PrimaryExpr

and the fact that a FunctionCall is a PrimaryExpr:

[41]   PrimaryExpr   ::=   Literal | VarRef | ParenthesizedExpr | ContextItemExpr | FunctionCall

 

Also, consider that the axes in XPath have been useful so far mainly because there are just a few of them. Imagine having to deal with zillions of axes and struggling to remember what they mean. And if everyone can introduce their own axes, then why bother with them at all?

Dear reader, it’s up to you to decide… as I have already done.

Posted in Uncategorized | 2 Comments

Optimized Repetitive Prepends, Part III: Understanding the Solution

 

In Part II of this post a solution was given to the problem presented in Part I:

"Can we implement an algorithm for repetitive prepends that will be run by P1 in linear time, in addition to its excellent linear performance of repetitive appends?"

The solution is implemented within the function my:prepend-iter() that is defined in the following way:

      <xsl:function name="my:prepend-iter">

        <xsl:param name="pNumTimes" as="xs:integer"/>

        <xsl:param name="pstartSeq" as="element()*"/>

        <xsl:param name="pInserts" as="element()*"/>

 

          <xsl:sequence select=

           "if($pNumTimes gt 0)

               then

                 my:prepend-iter

                        ($pNumTimes -1,

                         $pstartSeq,

                         ($pInserts, reverse($vPrepends) )

                         )

               else

                 (reverse($pInserts), $pstartSeq)

            "/>

      </xsl:function>

The first argument passed to this function, pNumTimes, is the number of times to perform the prepend operation on the initial sequence pstartSeq, which is passed as the second argument.

The third argument has an auxiliary purpose. As its name, pInserts suggests, it contains the accumulation of all sequences that are to be prepended to pstartSeq.

The idea is to produce the final result only when pNumTimes  is zero, using at this time the accumulated prepends in pInserts and the initial sequence pstartSeq.

The idea of the algorithm is to replace the repetitive prepend operations with repetitive append operations, which are carried out by the xslt processor with linear complexity.

We notice that:

y2s ++ y1s ++ xs   =

reverse(  reverse(y1s) ++ reverse(y2s) ) ++ xs       (1)

Or in general, if we have N sequences:

y1s, y2s, …, yNs 

to be prepended to an initial sequence xs in that order,

yNs ++ yN-1s ++ … y2s ++ y1s ++ xs =

reverse(reverse(y1s)++reverse(y2s) ++ …

        …    ++ reverse(yNs) )     ++xs                 (2)

 

Remarkably, the argument of the outermost reverse() is a repetitive appends operation and it has only linear complexity with P1!

We have added N+1 reverse() operations, but each of them is of linear complexity, so the total complexity of implementing (2) is O(N).

Certainly, the function my:prepend-iter() we have demonstrated is a simplified case of the many different real world scenarios we may encounter that will require repetitive prepends. Nevertheless its use is sufficient to prove and demonstrate how such an O(N) algorithm can be constructed.

We could use essentially the same algorithm, with a modification that accepts as argument a function
next-prepend(), which produces the next sequence to be prepended. To signal the end of the repetitive prepend operation, we can use a convention that next-prepend() will indicate this by producing some special sequence, such as the empty sequence.

In case you may be wondering how can a function be passed as an argument to another function in XSLT, do read about FXSL.

Posted in Performance Optimization | Leave a comment

Performance Feat: Eliminate a dimension of complexity in XSLT Processor’s repetitive prepends. Part II: The Solution.

Update: Minor code cleanup (using better names now).

In my previous post I defined the problem of improving the quadratical performance of an XSLT processor P1 when performing repetitive prepends to a sequence, while having an excellent linear repetitive appends performance.

The question asked was:

"Can we implement an algorithm for repetitive prepends that will be run by P1 in linear time, in addition to its excellent linear performance of repetitive appends?"

This transformation produces the result of repetitive prepends:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:my="my:namespace"
  exclude-result-prefixes="my"
>
      <xsl:output method="text"/> 

      <xsl:variable name="vRepetitions" as="xs:integer"
       select="1000000"/> 

      <xsl:variable name="vPrepends" as="element()*">
       <my:node/><my:node/><my:node/><my:node/><my:node/>
       <my:node/><my:node/><my:node/><my:node/><my:node/>
      </xsl:variable>

      <xsl:template match="/">
        <xsl:variable name="vNodes" as="element()*">

          <xsl:sequence select=
              "my:prepend-iter($vRepetitions, (), ())"/>
        </xsl:variable>

        <xsl:value-of select="name($vNodes[last()])"/>

        <xsl:text>, </xsl:text>

        <xsl:value-of select="count($vNodes)"/>
      </xsl:template>

      <xsl:function name="my:prepend-iter">
        <xsl:param name="pNumTimes" as="xs:integer"/>
        <xsl:param name="pstartSeq" as="element()*"/>
        <xsl:param name="pInserts" as="element()*"/>

          <xsl:sequence select=
           "if($pNumTimes gt 0)
               then
                 my:prepend-iter
                        ($pNumTimes -1,

                         $pstartSeq,
                         ($pInserts, reverse($vPrepends) )
                         )
               else
                 (reverse($pInserts), $pstartSeq)
            "/>
      </xsl:function>
</xsl:stylesheet>

When run with the P1 XSLT processor, the results are:

 

Prepends

Time, ms

Speedup
tNonOpt / tOpt

Remarks

100

42

1.47

Hundred prepends.

1000

67

25

Thousand prepends.

10000

145

1411

Ten thousand prepends

100000

932

~ 16000

Estimated speedup for one hundred thousand prepends.

1000000

11666

~ 160000

Estimated speedup for one million prepends.

So, we see that the repetitive prepends have been carried out with linear complexity! We achive a speedup that can reach thousands and even hundred of thousands times!

When carried out with P2, the time complexity remains quadratical, however the actual results are two times faster than with the non-optimized algorithm.

I will explain the algorithm implemented in the above transformation, in Part III of this post.

Posted in Performance Optimization | Leave a comment

Performance Feat: Eliminate a dimension of complexity in XSLT Processor’s repetitive prepends. Part I: The Problem.

Update: Minor code cleanup.

Prepending a list xs with a list ys to obtain the concatenation (of ys ++ xs )  of the two is usually a cheap operation which, when done non-destructively, requires only to copy the list ys to a new list y1s and link the last item of y1s to the head of xs.

Thus if we have to prepend K such lists:

y1s, y2s, …, yks

starting with the prepend of y1s to xs, and prepending each of the following lists above to the result of the last prepend operation, the result will be a list consisting of the items of:

yks, …, y2s, y1s, xs

in that order. Every list will be copied only once and the total operation of K prepends will require copying the items of all the lists. So, prepending a number of lists is a linear operation — proportional to the total number of items in those lists.

On the other side, continuous appending of lists may be O(N^2) when done in a naive way, because the growing list will have to be copied again and again in each append operation.

In the XSLT 2.0 processors available today, the cost of repetitive prepending and appending are different than the above. I have studied two XSLT 2.0 processors, P1 and P2.

P1 optimizes repetitive appends, by achieving linear complexity. Its time complexity for repetitive prepending is O(N^2) — square.

P2 doesn’t optimize either way.

The simple transformation below:

 

<xsl:stylesheet version="2.0"      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"      xmlns:xs="http://www.w3.org/2001/XMLSchema"      xmlns:my="my:namespace"
exclude-result-prefixes="my"
>
      <xsl:output method="text"/>

      <xsl:variable name="vRepetitions" as="xs:integer"
       select="100000"/>     

      <xsl:variable name="vAppends" as="element()*">
       <my:node/><my:node/><my:node/><my:node/><my:node/>
       <my:node/><my:node/><my:node/><my:node/><my:node/>
      </xsl:variable>

      <xsl:template match="/">
        <xsl:variable name="vNodes" as="element()*">
             <xsl:sequence select="my:iter($vRepetitions, ())"/>
        </xsl:variable>

        <xsl:value-of select="name($vNodes[last()])"/>
      </xsl:template>

      <xsl:function name="my:iter" as="element()*">
        <xsl:param name="pNumTimes" as="xs:integer"/>
        <xsl:param name="pstartSeq" as="element()*"/>

            <xsl:sequence select=
             "if($pNumTimes gt 0)
                 then
                   my:iter($pNumTimes -1,
                           ($pstartSeq, $vAppends)
                           )
                 else
                   $pstartSeq
             "/>
      </xsl:function>
</xsl:stylesheet> 

when run with P1 with different number of repetitions, takes the following times:

Appends

Time, ms

Remarks

100

42

Hundred appends.

1000

47

Thousand appends.

10000

85

Ten thousand appends

100000

458

One hundred thousands appends.

1000000

7141

One million appends.

We see that the claims of P1’s linear or better performance are true.

P2 behaves much worse and has O(N^2) results:

Appends

Time, ms

Remarks

100

21

Hundred appends.

1000

7420

Thousand appends.

5000

38100

Failed after 38 seconds. Five thousand appends

10000

——

Failed. Ten thousands appends.

 
With repetitive prepends both P1 and P2 have square time complexity. This was tested with the following transformation:


<xsl:stylesheet version="2.0"
 xmlns:xsl="
http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="
http://www.w3.org/2001/XMLSchema"
 xmlns:my="my:namespace"
  exclude-result-prefixes="my"
 >
 
 <xsl:output method="text"/>
 
 <xsl:variable name="vRepetitions" as="xs:integer"
  select="1000"/>

 

 <xsl:variable name="vPrepends" as="element()*">
  <my:node/><my:node/><my:node/><my:node/><my:node/>
  <my:node/><my:node/><my:node/><my:node/><my:node/>
 </xsl:variable>

 

 <xsl:template match="/">
   <xsl:variable name="vNodes" as="element()*">
      <xsl:sequence select="my:iter($vRepetitions, ())"/>
   </xsl:variable>
  
   <xsl:value-of select="name($vNodes[last()])"/>
 </xsl:template>

 

 <xsl:function name="my:iter" as="element()*">
   <xsl:param name="pNumTimes" as="xs:integer"/>
   <xsl:param name="pstartSeq" as="element()*"/>
 
       <xsl:sequence select=
        "if($pNumTimes gt 0)
            then
              my:iter($pNumTimes -1, insert-before($pstartSeq, 1, $vPrepends))
            else
              $pstartSeq
              "/>
 </xsl:function>
</xsl:stylesheet> 

  


 

P1’s prepend results were the following:

Prepends

Time, ms

Remarks

100

62

Hundred prepends.

1000

1729

Thousand prepends.

10000

204682

Ten thousand prepends

P2’s prepend results were again O(N^2) and much worse than P1’s.

The problem I am trying to solve here is the following:

Can we implement an algorithm for repetitive prepends that will be run by P1 in linear time, in addition to its excellent linear performance of repetitive appends?

P1’s O(N^2) repetitive prepends performance has stayed the same for years so maybe its developer thought it could not be improved or an improvement would be not too important.

At first, it may seem that repetitive prepends are not so important (as repetitive appends — the process via which we create every output). However, some of the most important data structures, such as the stack, often undergo a long series of prepends (the "push" operation). Another important use-case is any attempt to port an existing application that uses repetitive list prepends.

The answer to this question is contained in my next post.

 

Posted in Performance Optimization | Leave a comment

“Real World Haskell” is a JOLT Finalist

The book "Real World Haskell" has been nominated a JOLT Finalist.

I have been reading this book

in online form for the past month and got the hardcopy a few days ago. Two thirds through, I can confidently say that this is the best Haskell book I have read.

From the Jolt Awards Site:

"How are the winners selected?
Our judges not only examine the standard criteria of audience suitability, productivity, innovation, quality, ROI, risk and flexibility, but also seek to honor products that are ahead of the curve. Jolt-winning products are universally useful; are simple, yet rich in functionality; redefine their product space; and/or solve a nagging problem that has consistently eluded other products and books."

Posted in Haskell | Leave a comment

XPath 2.0 Gems: Find all duplicate values in a sequence (Part 2)

Update: Minor correction in the last two rows of the table — thanks to a comment by Michael Ludwig. I will talk about the efficiency of this and other related XPath expressions in my next post.

In my first post I provided a compact one-liner XPath expression that obtains all duplicate items in a given sequence.

Since then a number of people have contacted me, asking for an explanation what this expression means and how it is really evaluated by a compliant XPath 2.0 engine.

C. M. Sperberg-McQueen, a MIT professor and member of W3C, blogged about it:

"Dimitre’s solution is beautiful and concise: 21 characters long (longer if you use variables with names longer than singler characters), and well worth the five or ten minutes it took me to work out why it works. I won’t explain it to you, dear reader; you deserve the brief puzzlement and Aha-moment of understanding it."

"Dimitre gets my vote for this month’s best programming-language application of Strunk and White’s rule: “Omit needless words!” "

I have always admired any of Michael’s works and his high recognition is the best reward I would not have dared dreaming of.

Now that enough time has passed for anyone to try and understand how the solution works, let me explain it myself.

From the very start working on this problem I decided that a good solution must use the available standard functions on sequences.

While distinct-values() was mentioned by the OP, it is not directly useful here, as we are looking for any item that occurs more than once in the sequence.

We can eventually think of using exists() or empty(), however the one function that is most relevant to the task is index-of(). As explained in the F & O Spec.,

15.1.3 fn:index-of

fn:index-of(

$seqParam

as xs:anyAtomicType*,

$srchParam

as xs:anyAtomicType) as xs:integer*

fn:index-of(

$seqParam

as xs:anyAtomicType*,

$srchParam

as xs:anyAtomicType,

$collation

as xs:string) as xs:integer*

Summary: Returns a sequence of positive integers giving the positions within the sequence $seqParam of items that are equal to $srchParam.

The first step in the solution is to ask the question:

How are duplicate items different from unique items (in terms of index-of() ) ?

The answer comes naturally:

For any unique item vUi, the sequence that

  index-of($vSeq, $vUi)

produces, is just of length of one.

For any duplicate item vDup, the sequence that

    index-of($vSeq, $vDup)

produces, is of length of two or more.

In other words,

    index-of($vSeq, $vDup)

always has a second item, while

  index-of($vSeq, $vUi)

never has a second item.

To express this in XPath 2.0, the expression:

index-of($vSeq, $vItem)[2]

produces the empty sequence for any unique item $vItem, and it produces an  xs:integer  index for any $vItem that occurs two or more times in the sequence.

Then one way to find all duplicate items in the sequence is to evaluate:

$vSeq[exists(index-of($vSeq, .)[2])]

In XPath 2.0 the expression in the [] brackets that follow a sequence is called a FilteringExpression. It is evaluated for every item of the sequence and only items that satisfy this predicate are produced.

Let $vSeq is defined as:

   1, 2, 2, 3, 4, 5, 5, 6, 7

Then

$vSeq[exists(index-of($vSeq, .)[2])]

produces:

  2, 2, 5, 5

This is quite close to the result we need, we just need to dedup it:

distinct-values
      
($vSeq[exists(index-of($vSeq, .)[2])])

produces the wanted result:

  2, 5

This is fine, but the last expression seems already too long. Can we do even better?

What about this one:

$vSeq[index-of($vSeq,.)[2]]

It produces :

  2, 5

exactly the wanted result and it is really tight.

The only problem is to explain how this expression "works" :)

Let me admit that on the morning after discovering this solution I found I had completely forgotten the rational behind it. After failing again and again to explain it, I finally asked the Gods for help.

Here is the explanation provided by Dr. Michael Kay:

"This expression

  $vSeq[index-of($vSeq,.)[2]]

means

   $vSeq [position() = index-of($vSeq,.)[2]]

Let’s evaluate the predicate for each item in $vseq:

pos

value

index-of($vSeq,.)

index-of($vSeq,.)[2]

pos=index-of($vSeq,.)[2]

 1

1

1

()

false

 2

2

2, 3

3

false

 3

2

2, 3

3

true

 4

3

4

()

false

 5

4

5

()

false

 6

5

6, 7

7

false

 7

5

6, 7

7

true

 8

6

8

()

false

 9

7

9 

()

false

So the only items that match the predicate are those in positions 3 and 7, with values (2, 5)."

To summarize, the critical step in understanding how this works is to remember that if the type of the FilteringExpression is xs:integer, then

$someSequence[someFilteringExpression]

is simply an abbreviation for:

$someSequence[position() = someFilteringExpression]

Therefore,

  $vSeq[index-of($vSeq,.)[2]]

means :

All items from $vSeq such that their position is equal to the value of the index of their second occurence in the sequence.

As there is at most one item at any given position, this is how we get just one "2" and one  "5" in

2, 5

as contrasted to two of each in

2, 2, 5, 5

Concluding, here is one final note:

If you have ever used the Muenchian method of grouping, you may have spotted its resemblance to the current solution.

A typical expression used in Muenchian grouping is something like this:

<xsl:key match="person" name="kPersByName" use="@name"/>

<xsl:for-each select=
"*/person
     [generate-id()
     =
      generate-id(key(‘kPersByName’, @name)[1])
      ]">

<!– Do something –>

</xsl:for-each>

In the above, the role of the filtering expression is played by:

      generate-id()
     =
      generate-id(key(‘kPersByName’, @name)[1])

and this is the same as the role played by:

  index-of($vSeq,.)[2]

in finding the duplicate items.

The Muenchian method uses an index of "1" in:

  key(‘kPersByName’, @name)[1]

and in our solution:

   index-of($vSeq,.)[2]

a similar role is played by the index "2".

"1" in the Muenchian method means: Take one node from each group of nodes with unique keys. Because any group can consist just of only one node, it is only safe here to take the first node from any group.

In the case of finding all duplicate items in a sequence, we want to pick one index from the list of indexes of any duplicate item. Having a first index does not qualify an item as duplicate — it needs to have at least a second index in the sequence. On the other side there may be duplicate items that occur only twice (not thrice or more times) and they will not have a third index.

Therefore, the only safe way to take an existing index for any duplicate item is to take its second index.

If you have successfully read up to here, you would have little difficulty solving the following problems, provided as exercises:

Let $vSeq be (1,  2,2,   3,3,3,    4,4,4,4,   5,5,   6,   7)

P1. Produce all items in $vSeq that occur only once in it.

P2. Produce all items in $vSeq that occur exactly two times in it.

P3. Produce all items in $vSeq that occur exactly three times in it (triples).

P4. Produce all items in $vSeq that occur exactly four times in it (quadruples).

P5. Produce all items in $vSeq that occur more than once but less than four times.

Hint: You may consider using the last() function in your solution.

Posted in Uncategorized | 2 Comments

XPath 2.0 Gems: Find all duplicate values in a sequence

 

Someone recently asked the following question:

"I have an XPath expression which provides me a sequence of values like the one below:

1 2 2 3 4 5 5 6 7

It is easy to convert this to a set of unique values "1 2 3 4 5 6 7" using the distinct-values function. However, what I want to extract is the list of duplicate values = "2 5". I can’t think of an easy way to do this.
Can anyone help?"

One of the best XPath/XSLT experts, whom I greatly respect, provided the following answer:

"What about:

distinct-values(
  for $item in $seq
  return if (count($seq[. eq $item]) > 1)
         then $item
         else ())

This iterates through the items in the sequence, and returns the item if the number of items in the sequence that are equal to that item is greater than one. You then have to use distinct-values() to remove the duplicates from that list."

Can you do better?

My own solution is a 21 chars one-liner, and would only be slightly longer, if efficiency would need to be addressed.

Posted in XPath | Leave a comment

Michael Crichton dies

 

Michael Crichton died unexpectedly at an age of 66.

As with Asimov, will miss the continuation of his writing, which I had taken for granted.

Is this really the end of an epoch?

Posted in Books | Leave a comment