xslt - Nesterovsky bros

While migrating a big xslt 3 project to plain C# we ran into a case that was not obvious to resolve.

Documents we process range from tiny to moderate in size. Stored as xml they might take from virtually zero to, say, 10-20 MB.

In C# we may rewrite Xslt code virtually one-to-one using standard features like XDocument, LINQ, regular classes, built-in collections, and so on. Clearly C# has a richer repertoire, so the task is easy to solve; the difficulty is that there are several ways to solve it.

The simplest solution is to use the XDocument API to represent data at runtime, and to use LINQ to query it. All features like xslt keys, templates, functions, xpath sequences, arrays and maps, and primitive types map naturally onto the C# language and its APIs.

Taking several xslt transformations we could see that the xslt to C# rewrite is rather straightforward and produces recognizable functional programs whose C# source is close in size to the original Xslt. As a bonus C# lets you write asynchronous code, so C# wins both in runtime scalability and in design-time support.

But can you do it better in C#, especially when some data has well defined xml schemas?

The natural step, in our opinion, would be to produce a plain C# object model from the xml schema and use it for runtime processing. Fortunately .NET has xml serialization attributes and tools to produce classes from xml schemas. With little effort we have created a relevant class hierarchy for a rather big xml schema. XmlSerializer is used to convert the object model to and from xml through XmlReader and XmlWriter. So, we get a typed replacement of the generic XDocument that still supports the same LINQ API over collections of objects, and takes less memory at runtime.

The next step would be to run a simple test like:

read object model;
transform it;
write it back.

We have created such tests both for XDocument and for object model cases, and compared results from different perspectives.

Both solutions produce very similar code, which is also similar to the original xslt both in style and size.

The object model has static typing, which is much easier to support.

But the most unexpected outcome is that the object model was up to 20% slower due to serialization and deserialization, even with pregenerated XmlSerializer assemblies. The difference in transformation performance and memory consumption was so small that it can be neglected. These results were confirmed by multiple tests with multiple cycles, including warm-up cycles.

Here we run into a case where static typing harms more than it helps. Because of the nature of our processing pipeline, which is offline batch, this difference can translate into tens of minutes or even more.

Thus in this particular case we decided to stay with runtime typing as a more performant way of processing in C#.

Sunday, 08 January 2023 13:28:14 UTC

.NET | xslt

xml vs json

Xslt is often thought of as a tool that takes input xml and runs a transformation to get html or some xml as output. Our use case is more complex, and is closer to data mining of big data in batch. Our transformation pipelines often take an hour or more to run even with SSD disks and with CPU cores fully loaded with work.

So, we're looking for performance opportunities, and xml vs json might be promising.

Here are our hypotheses:

json is lighter than xml to serialize and deserialize;
json stored as map(*), array(*) and other item() values is lighter than node() at runtime, in particular subtree copy is zero cost in json;
templates with match patterns can be implemented efficiently with maps;
there is an incremental way forward from use of xml to use of json.

If it pays off we might switch from xml to json all over, even though it is a development effort.

But to proceed we need to run an experiment to measure processing speed of xml vs json in xslt.

Now our task is to find an isolated small representative sample to prove or reject our hypotheses.

Better to start off with some existing transformation, and change it from use of xml to json.

The question is whether there is such a candidate.

Saturday, 16 April 2022 19:03:04 UTC

Thinking aloud | xslt

Xslt Graph and Saxon processor

We are not sure what the use of our Xslt Graph exercises is, but what we are sure of is that they stress different parts of the Saxon Xslt engine and help to find and resolve different bugs.

While implementing the biconnected components algorithm we incidentally ran into an internal error in Saxon 10.1 with a rather simple xslt:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:array="http://www.w3.org/2005/xpath-functions/array"
  exclude-result-prefixes="xs array">

  <xsl:template match="/">
    <xsl:sequence select="
      array:fold-left
      (
        [8, 9], 
        (), 
        function($first as item(), $second as item()) 
        {  
          min(($first, $second))
        }
      )"/>
  </xsl:template>

</xsl:stylesheet>

More detail can be found at Saxon's issue tracker: Bug #4578: NullPointerException when array:fold-left|right $zero argument is an empty sequence.

The bug was promptly resolved.

Monday, 08 June 2020 05:58:32 UTC

Tips and tricks | xslt

Algorithm for Biconnected components

While working on an algorithm to trace biconnected components for the Graph API in XSLT we realized that we had implemented it unconventionally.

The pseudocode in Wikipedia is:

GetArticulationPoints(i, d)
    visited[i] := true
    depth[i] := d
    low[i] := d
    childCount := 0
    isArticulation := false

    for each ni in adj[i] do
        if not visited[ni] then
            parent[ni] := i
            GetArticulationPoints(ni, d + 1)
            childCount := childCount + 1
            if low[ni] ≥ depth[i] then
                isArticulation := true
            low[i] := Min (low[i], low[ni])
        else if ni ≠ parent[i] then
            low[i] := Min (low[i], depth[ni])
    if (parent[i] ≠ null and isArticulation) or (parent[i] = null and childCount > 1) then
        Output i as articulation point

That algorithm is based on the fact that a connected graph can be represented as a tree of biconnected components, which are attached to each other at shared vertices called articulation points. The implementation deals with the depth of each vertex, and with a lowpoint parameter that is also related to vertex depth during Depth-First-Search.

Out of interest we approached the problem from a different perspective. A vertex is an articulation point if it has neighbors that cannot be connected by a path that does not contain this vertex. Like the classical algorithm we use Depth-First-Search to navigate the graph, but in contrast we collect cycles that pass through each vertex. If during the back pass of Depth-First-Search we find no cycle from a "child" to an "ancestor" then the vertex is necessarily an articulation point.

Here is pseudocode:

GetArticulationPoints(v, p) -> result
    index = index + 1
    visited[v] = index 
    result = index
    articulation = p = null ? -1 : 0

    for each n in neighbors of v except p do
        if visited[n] = 0 then
            nresult = GetArticulationPoints(n, v)
            result = min(result, nresult)

            if nresult >= visited[v] then
                articulation = articulation + 1
        else
            result = min(result, visited[n])

    if articulation > 0 then
        Output v as articulation point

The complexity of the two algorithms is the same.

What is interesting is that we see no obvious way to transform one algorithm into the other, other than starting over from graph theory.
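To make our pseudocode concrete, here is a minimal Java sketch of it. The class name ArticulationPoints, the helper graph() and the adjacency-list representation are assumptions made for illustration only; the sketch also assumes a simple undirected graph (no parallel edges), because the parent is skipped by vertex number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class ArticulationPoints
{
  private final List<List<Integer>> adjacency;
  // Visit index of each vertex, 0 means "not visited yet".
  private final int[] visited;
  private final SortedSet<Integer> points = new TreeSet<>();
  private int index;

  private ArticulationPoints(List<List<Integer>> adjacency)
  {
    this.adjacency = adjacency;
    this.visited = new int[adjacency.size()];
  }

  // Returns articulation points of an undirected graph.
  static SortedSet<Integer> find(List<List<Integer>> adjacency)
  {
    ArticulationPoints state = new ArticulationPoints(adjacency);

    for(int v = 0; v < adjacency.size(); ++v)
    {
      if (state.visited[v] == 0)
      {
        state.visit(v, -1);
      }
    }

    return state.points;
  }

  // Returns the smallest visit index reachable from the subtree of v.
  private int visit(int v, int parent)
  {
    visited[v] = ++index;

    int result = visited[v];
    // A root needs two independent subtrees, any other vertex needs one.
    int articulation = parent < 0 ? -1 : 0;

    for(int n: adjacency.get(v))
    {
      if (n == parent)
      {
        continue;
      }

      if (visited[n] == 0)
      {
        int nresult = visit(n, v);

        result = Math.min(result, nresult);

        // No cycle from the subtree of n to a proper ancestor of v.
        if (nresult >= visited[v])
        {
          ++articulation;
        }
      }
      else
      {
        result = Math.min(result, visited[n]);
      }
    }

    if (articulation > 0)
    {
      points.add(v);
    }

    return result;
  }

  // Hypothetical helper: builds adjacency lists from an edge list.
  static List<List<Integer>> graph(int size, int[][] edges)
  {
    List<List<Integer>> adjacency = new ArrayList<>();

    for(int i = 0; i < size; ++i)
    {
      adjacency.add(new ArrayList<>());
    }

    for(int[] edge: edges)
    {
      adjacency.get(edge[0]).add(edge[1]);
      adjacency.get(edge[1]).add(edge[0]);
    }

    return adjacency;
  }

  public static void main(String[] args)
  {
    // Path 0 - 1 - 2: the middle vertex is an articulation point.
    System.out.println(find(graph(3, new int[][] { { 0, 1 }, { 1, 2 } })));
  }
}
```

For the path graph in main the sketch reports vertex 1, and for a triangle it reports nothing, which matches the definition.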

More is on Wiki.

Sunday, 24 May 2020 12:15:02 UTC

Thinking aloud | xslt

On XSLT 4

Michael Kay's "A Proposal for XSLT 4.0" has sparked our interest in what could be added or changed in XSLT. So we decided to implement a Graph API purely in xslt. Our goals were:

to prove that it's possible to provide an efficient implementation of different Graph Algorithms in XSLT;
to build the Graph API in such a way that an engine could provide native implementations of Graph Algorithms;
to find through experiments what could be added to XSLT as a language.

At present we may confirm that the first two goals are reachable; and experiments have shown that XSLT could provide more help to make programs better, e.g. we have seen that the language could simplify coding of loops.

Graph algorithms are often expressed with while loops, e.g. "Dijkstra's algorithm" has:

while Q is not empty:
    u ← vertex in Q with min dist[u]

The body is executed while the condition is satisfied, but the condition is affected by the body itself.

In xslt 3.0 we did this with simple recursion:

<xsl:template name="f:while" as="item()*">
  <xsl:param name="condition" as="function(item()*) as xs:boolean"/>
  <xsl:param name="action" as="function(item()*) as item()*"/>
  <xsl:param name="next" as="function(item()*, item()*) as item()*"/>
  <xsl:param name="state" as="item()*"/>

  <xsl:if test="$condition($state)">
    <xsl:variable name="items" as="item()*" select="$action($state)"/>

    <xsl:sequence select="$items"/>

    <xsl:call-template name="f:while">
      <xsl:with-param name="condition" select="$condition"/>
      <xsl:with-param name="action" select="$action"/>
      <xsl:with-param name="next" select="$next"/>
      <xsl:with-param name="state" select="$next($state, $items)"/>
    </xsl:call-template>
  </xsl:if>
</xsl:template>
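For comparison, the contract of f:while (condition, action, next, state) is what an ordinary loop gives in an imperative language. A minimal Java sketch under that reading follows; the class and method names are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;
import java.util.function.Predicate;

final class WhileLoop
{
  // Mirrors f:while: while condition(state) holds, emit action(state)
  // and continue with next(state, items).
  static <S, T> List<T> run(
    Predicate<S> condition,
    Function<S, List<T>> action,
    BiFunction<S, List<T>, S> next,
    S state)
  {
    List<T> result = new ArrayList<>();

    while(condition.test(state))
    {
      List<T> items = action.apply(state);

      result.addAll(items);
      state = next.apply(state, items);
    }

    return result;
  }

  public static void main(String[] args)
  {
    // Emits squares of 1, 2, 3: prints [1, 4, 9]
    System.out.println(
      WhileLoop.<Integer, Integer>run(
        i -> i <= 3,
        i -> List.of(i * i),
        (i, items) -> i + 1,
        1));
  }
}
```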

But here is the point. It could be done in a more comprehensible way, e.g. by letting xsl:iterate without select cycle until xsl:break is reached.

<xsl:iterate>
  <xsl:param name="name" as="..." select="..."/>
  
  <xsl:if test="...">
    <xsl:break/>
  </xsl:if>

  ...
</xsl:iterate>

So, what we propose is to make xsl:iterate/@select optional, and to change the behavior of the processor when the attribute is missing from a compilation error to a valid behavior. This should not affect any existing valid XSLT 3.0 program.

Tuesday, 19 May 2020 07:00:25 UTC

Thinking aloud | xslt

Graphs in XSLT

Recently we read the article "A Proposal for XSLT 4.0", and thought it worth suggesting one more idea. We have written a message to Michael Kay, the author of this proposal. Here it is:

A&V
Historically xslt, xquery and xpath were dealing with trees. Nowadays it has become much more common to process graphs. Many tasks can be formulated in terms of graphs, and in particular any task processing trees is also a graph task.

I suggest taking a deeper look in this direction.

As an inspiration I may suggest looking at "P1709R2: Graph Library" - the C++ proposal.

Michael Kay
I have for many years found it frustrating that XML is confined to hierarchic relationships (things like IDREF and XLink are clumsy workarounds); also the fact that the arbitrary division of data into "documents" plays such a decisive role: documents should only exist in the serialized representation of the model, not in the model itself.

I started my career working with the Codasyl-defined network data model. It's a fine and very flexible data model; its downfall was the (DOM-like) procedural navigation language. So I've often wondered what one could do trying to re-invent the Codasyl model in a more modern idiom, coupling it with an XPath-like declarative access language extended to handle networks (graphs) rather than hierarchies.

I've no idea how close a reinvention of Codasyl would be to some of the modern graph data models; it would be interesting to see. The other interesting aspect of this is whether you can make it work for schema-less data.

But I don't think that would be an incremental evolution of XSLT; I think it would be something completely new.

A&V
I was not so radical in my thoughts.

Even the C++ API is not so radical, as they do not impose hard requirements on the internal graph representation but rather define a template API that will work both with third party representations (they even mention Fortran) and with several built-in implementations that use standard vectors.

Their strong point is in the algorithms provided as part of the library and not in the graph's internal structure (I think the authors of that paper have not structured it in the best way). E.g. in the second part they list graph algorithms: Depth First Search (DFS); Breadth First Search (BFS); Topological Sort (TopoSort); Shortest Paths Algorithms; Dijkstra Algorithms; and so on.

If we try to map it to the xpath world then a graph on the API level might be represented as a user function or as a map of user functions.

On the storage level a user may implement a graph using a sequence of maps or a map of maps, or even using xdm elements.

So, my approach is evolutionary. In fact I suggest a pure API that could even be implemented now.

Michael Kay
Yes, there's certainly scope for graph-oriented functions such as closure($origin, $function) and is-reachable($origin, $function) and find-path($origin, $destination, $function) where we use the existing data model, treating any item as a node in a graph, and representing the arcs using functions. There are a few complications, e.g. what's the identity comparison between arbitrary items, but it can probably be done.

A&V
> There are a few complications, e.g. what's the identity comparison between arbitrary items, but it can probably be done.

One approach to address this is through definition of graph API. E.g. to define graph as a map (interface analogy) of functions, with equality functions, if required:
map
{
  vertices: function(),
  edges: function(),
  value: function(vertex),
  in-vertex: function(edge),
  out-vertex: function(edge),
  edges: function(vertex),
  is-in-vertex: function(edge, vertex),
  is-out-vertex: function(edge, vertex)
  ...
}

Not sure how far this will go but who knows.

Tuesday, 12 May 2020 06:08:51 UTC

Thinking aloud | xslt

Scheduling algorithm for xsl:for-each/@saxon:threads=N

This story started half a year ago when Michael Kay, the author of the Saxon XSLT processor, was dealing with performance in a multithreaded environment. See Bug #3958.

The problem is as follows.

Given XSLT:

<xsl:stylesheet exclude-result-prefixes="#all" 
  version="3.0" 
  xmlns:saxon="http://saxon.sf.net/"
  xmlns:xs="http://www.w3.org/2001/XMLSchema" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text" />

  <xsl:template name="main">
    <xsl:for-each saxon:threads="4" select="1 to 10">
      <xsl:choose>
        <xsl:when test=". eq 1">
          <!-- Will take 10 seconds -->
          <xsl:sequence select="
            json-doc('https://httpbin.org/delay/10')?url"/>
        </xsl:when>
        <xsl:when test=". eq 5">
          <!-- Will take 9 seconds -->
          <xsl:sequence select="
            json-doc('https://httpbin.org/delay/9')?url"/>
        </xsl:when>
        <xsl:when test=". eq 10">
          <!-- Will take 8 seconds -->
          <xsl:sequence select="
            json-doc('https://httpbin.org/delay/8')?url"/>
        </xsl:when>
      </xsl:choose>
    </xsl:for-each>
    <xsl:text>
</xsl:text>
  </xsl:template>
</xsl:stylesheet>

Implement the engine so as to achieve the best performance of a parallel for-each.

A naive implementation that distributes iterations evenly among threads will run into unfair load on threads, so some load balancing is required. That was the case in Saxon EE.

Michael Kay has been trying to find the most elegant way to implement this and has written the comment:

I can't help feeling that the answer to this must lie in using the Streams machinery, and Spliterators in particular. I've spent another hour or so reading all about Spliterators, and I have to confess I really don't understand the paradigm. If someone can enlighten me, please go ahead...

We have decided to take the challenge and to model the expected behavior using Streams. Here is our go:

import java.util.stream.IntStream;
import java.util.stream.Stream;
import java.util.function.Consumer;
import java.util.function.Function;

public class Streams
{
  public static class Item<T>
  {
    public Item(int index, T data)
    {
      this.index = index;
      this.data = data;
    }
    
    int index;
    T data;
  }

  public static void main(String[] args)
  {
    run(
      "Sequential",
      input(),
      Streams::action,
      Streams::output,
      true);
    
    run(
      "Parallel ordered", 
      input().parallel(),
      Streams::action,
      Streams::output,
      true);
    
    run(
      "Parallel unordered", 
      input().parallel(),
      Streams::action,
      Streams::output,
      false);    
  }
  
  private static void run(
    String description,
    Stream<Item<String>> input,
    Function<Item<String>, String[]> action,
    Consumer<String[]> output,
    boolean ordered)
  {
    System.out.println(description);
    
    long start = System.currentTimeMillis();
   
    if (ordered)
    {
      input.map(action).forEachOrdered(output);
    }
    else
    {
      input.map(action).forEach(output);
    }
    
    long end = System.currentTimeMillis();
    
    System.out.println("Execution time: " + (end - start) + "ms.");
    System.out.println();
  }
  
  private static Stream<Item<String>> input()
  {
    return IntStream.range(0, 10).
      mapToObj(i -> new Item<String>(i + 1, "Data " + (i + 1)));
  }
  
  private static String[] action(Item<String> item)
  {
    switch(item.index)
    {
      case 1:
      {
        sleep(10);
        
        break;
      }
      case 5:
      {
        sleep(9);
        
        break;
      }
      case 10:
      {
        sleep(8);
        
        break;
      }
      default:
      {
        sleep(1);
        
        break;
      }
    }
    
    String[] result = { "data:", item.data, "index:", item.index + "" };
    
    return result;
  }
  
  private synchronized static void output(String[] value)
  {
    boolean first = true;
    
    for(String item: value)
    {
      if (first)
      {
        first = false;
      }
      else
      {
        System.out.print(' ');
      }
    
      System.out.print(item);
    }

    System.out.println();
  }
  
  private static void sleep(int seconds)
  {
    try
    {
      Thread.sleep(seconds * 1000);
    }
    catch(InterruptedException e)
    {
      throw new IllegalStateException(e);
    }
  }
}

We model three cases:

"Sequential"

slowest, single threaded execution with output:

data: Data 1 index: 1
data: Data 2 index: 2
data: Data 3 index: 3
data: Data 4 index: 4
data: Data 5 index: 5
data: Data 6 index: 6
data: Data 7 index: 7
data: Data 8 index: 8
data: Data 9 index: 9
data: Data 10 index: 10
Execution time: 34009ms.

"Parallel ordered"

fast, multithreaded execution preserving order, with output:

data: Data 1 index: 1
data: Data 2 index: 2
data: Data 3 index: 3
data: Data 4 index: 4
data: Data 5 index: 5
data: Data 6 index: 6
data: Data 7 index: 7
data: Data 8 index: 8
data: Data 9 index: 9
data: Data 10 index: 10
Execution time: 10019ms.

"Parallel unordered"

fastest, multithreaded execution not preserving order, with output:

data: Data 6 index: 6
data: Data 2 index: 2
data: Data 4 index: 4
data: Data 3 index: 3
data: Data 9 index: 9
data: Data 7 index: 7
data: Data 8 index: 8
data: Data 5 index: 5
data: Data 10 index: 10
data: Data 1 index: 1
Execution time: 10001ms.

What we can add in conclusion is that an xslt engine could try to decide automatically which approach to use, as many SQL engines do, and not force the developer to go into low level engine details.

Sunday, 24 March 2019 07:52:02 UTC

Java | Thinking aloud | xslt

XPath through evolution

Recently we observed how we solved the same task in different versions of XPath: 2.0, 3.0, and 3.1.

Suppose you have a sequence $items, and you want to call some function on each item of the sequence, and to return the combined result.

In XPath 2.0 this was solved like this:

for $item in $items return
  f:func($item)

In XPath 3.0 this was solved like this:

$items!f:func(.)

And now with XPath 3.1 that defined an arrow operator => we attempted to write something as simple as:

$items=>f:func()

That is definitely not working, as it is the same as f:func($items).

Next attempt was:

$items!=>f:func()

That does not even compile.

So, finally, working expression using => looks like this:

$items!(.=>f:func())

This looks like a step back compared to the XPath 3.0 variant.

More than that, the XPath grammar of the arrow operator forbids following it with predicates, axis steps or mapping operators, so these won't compile:

$items!(.=>f:func()[1])

$items!(.=>f:func()!something)

Our conclusion is that the arrow operator is a rather confusing addition to XPath.
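The difference between $items!f:func(.) and $items=>f:func() is the difference between calling a function once per item and calling it once with the whole sequence. A rough analogy in Java, where the func method is a hypothetical stand-in for f:func that simply reports what it received:

```java
import java.util.List;
import java.util.stream.Collectors;

final class ArrowVersusMap
{
  // Hypothetical stand-in for f:func: shows the arguments it was given.
  static String func(List<String> items)
  {
    return "func(" + String.join(", ", items) + ")";
  }

  // Analogy of $items ! f:func(.): one call per item.
  static List<String> mapEach(List<String> items)
  {
    return items.stream().
      map(item -> func(List.of(item))).
      collect(Collectors.toList());
  }

  // Analogy of $items => f:func(): a single call with the whole sequence.
  static List<String> arrow(List<String> items)
  {
    return List.of(func(items));
  }

  public static void main(String[] args)
  {
    List<String> items = List.of("a", "b");

    System.out.println(mapEach(items)); // [func(a), func(b)]
    System.out.println(arrow(items));   // [func(a, b)]
  }
}
```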

Saturday, 03 November 2018 20:59:28 UTC

Thinking aloud | xslt

Xslt Streamability

Xslt 3.0 defines a feature called streamability: a technique to write xslt code that is able to handle arbitrarily sized inputs.

This contrasts with conventional xslt code (and xslt engines) where inputs are completely loaded in memory.

To make code streamable a developer should declare her code as such, and the code should pass Streamability analysis.

The goal is to define a subset of xslt/xpath operations that allow input to be processed in one pass.

In a simple case it's indeed easy to verify that code is streamable, but the more complex your code is, the harder it is to confirm that it is streamable.

On the forums we have seen a lot of discussions where experts were trying to figure out whether a particular xslt is streamable. At times it's a remarkably nontrivial task!

This, in our opinion, clearly shows that the feature is largely a failed attempt to inscribe an optimization technique into the xslt spec.

The place for such optimization is in the implementation space, and not in the spec. An engine should attempt such optimization and fall back to the traditional implementation when it cannot be applied.

The latest such example is: Getting SXST0060 "No streamable path found in expression" when trying to push a map with grounded nodes to a template of a streamable mode, where both the xslt developers and the engine developers are not sure that the code is streamable in the first place.

By the way, besides streamability there is another optimization technique that probably works in all SQL engines. When data does not fit into memory an engine may spill it to disk, thus trading memory pressure for disk access. So, why didn't such a technique find its way into the Xslt or SQL specs?

Tuesday, 02 October 2018 12:50:22 UTC

Thinking aloud | xslt

Saxon 9.9.0-1 is out

Saxon 9.9.0-1 is out!

Shortly afterwards we reported our first bug in the new version. See https://saxonica.plan.io/issues/3923.

Friday, 28 September 2018 17:47:37 UTC

xslt

Fail with ancestor-or-self

After 17 years of experience we still run into silly bugs in xslt (in xpath, in fact).

The latest one is related to the order of nodes produced by the ancestor-or-self axis.

Consider the code:

<xsl:stylesheet version="3.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:template match="/">
    <xsl:variable name="data" as="element()">
      <a>
        <b>
          <c/>
        </b>
      </a>
    </xsl:variable>

    <xsl:variable name="item" as="element()" select="($data//c)[1]"/>

    <xsl:message select="$item!ancestor-or-self::*!local-name()"/>
    <xsl:message select="$item!local-name(), $item!..!local-name(), $item!..!..!local-name()"/>
  </xsl:template>

</xsl:stylesheet>

We expected to have the following outcome:

c b a
c b a

But the correct one is:

a b c
c b a

Here is why:

ancestor-or-self::* is an AxisStep. From XPath §3.3.2:

[Definition: An axis step returns a sequence of nodes that are reachable from the context node via a specified axis. Such a step has two parts: an axis, which defines the "direction of movement" for the step, and a node test, which selects nodes based on their kind, name, and/or type annotation.] If the context item is a node, an axis step returns a sequence of zero or more nodes; otherwise, a type error is raised [err:XPTY0020]. The resulting node sequence is returned in document order.

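For some reason we were thinking that a reverse axis produces its result in reverse order. It turns out that the reverse order applies only to context positions within a predicate of such an axis step, e.g. ancestor-or-self::*[1] selects the nearest node.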

See more at https://saxonica.plan.io/boards/3/topics/7312

Thursday, 27 September 2018 05:52:58 UTC

xslt

|| in boolean expression in XPath

XPath 3 has introduced syntactic sugar for string concatenation, so the following:

concat($a, $b)

can be now written as:

$a || $b

This is a nice addition, except when you run into trouble. Being rooted in the C world we unintentionally wrote the following xslt code:

<xsl:if test="$a || $b">
...
</xsl:if>

Clearly, we intended to write $a or $b. In contrast $a || $b is evaluated as concat($a, $b). If both variables are false() we get the string 'falsefalse', which has the effective boolean value true(). This means that the test condition of xsl:if is always true().

What can be done to avoid such an unfortunate typo, which shows up neither during compilation nor at runtime?

The answer is to issue an informational message during compilation, e.g. if the result of the || operator is converted to a boolean, and its arguments are also booleans, then chances are high that this is a typo and not an intentional expression.

We advised implementing such a message in the Saxon processor (see https://saxonica.plan.io/boards/3/topics/7305).
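The effect can be modelled in a few lines of Java. This is a simplified sketch, not an XPath implementation: the concat and effectiveBooleanValue methods are hypothetical helpers that cover only boolean operands and string results.

```java
final class ConcatAsCondition
{
  // Models XPath "$a || $b" for boolean operands: both are turned
  // into strings and concatenated.
  static String concat(boolean a, boolean b)
  {
    return String.valueOf(a) + String.valueOf(b);
  }

  // Models the effective boolean value of a string: true unless empty.
  static boolean effectiveBooleanValue(String value)
  {
    return !value.isEmpty();
  }

  public static void main(String[] args)
  {
    String value = concat(false, false);

    System.out.println(value);                        // falsefalse
    System.out.println(effectiveBooleanValue(value)); // true
    System.out.println(false || false);               // false, the intended "or"
  }
}
```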

Tuesday, 18 September 2018 12:23:08 UTC

xslt

Bug with regex in Saxon xslt

It seems we've found a discrepancy in the regex implementation used during transformation in Saxon. Consider the following xslt:

<xsl:stylesheet version="3.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:template match="/">
    <xsl:variable name="text" as="xs:string" 
      select="'A = &quot;a&quot; OR B = &quot;b&quot;'"/>

    <xsl:analyze-string regex="&quot;(\\&quot;|.)*?&quot;" select="$text">
      <xsl:matching-substring>
        <xsl:message>
          <xsl:sequence select="regex-group(0)"/>
        </xsl:message>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>

vs javascript

<html>
<body>
  <script>
    var text = 'A = "a" OR B = "b"';
    var regex = /"(\\"|.)*?"/;
    var match = text.match(regex);

    alert(match[0]);
  </script>
</body>
</html>

xslt produces: "a" OR B = "b"

while javascript: "a"

What is interesting is that we're certain this was working correctly in Saxon several years ago.

You can track progress of the bug at: https://saxonica.plan.io/boards/3/topics/7300 and at https://saxonica.plan.io/issues/3902.
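As an additional cross-check that was not part of the original report, the same pattern can be tried with java.util.regex. In this sketch the class and method names are hypothetical; the first match is "a", the same as in javascript.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexCheck
{
  // The regex "(\\"|.)*?" written as a Java string literal.
  private static final Pattern QUOTED = Pattern.compile("\"(\\\\\"|.)*?\"");

  // Returns the first quoted fragment of the text, or null if none.
  static String firstMatch(String text)
  {
    Matcher matcher = QUOTED.matcher(text);

    return matcher.find() ? matcher.group() : null;
  }

  public static void main(String[] args)
  {
    // Prints "a" (with quotes), since *? is a reluctant quantifier.
    System.out.println(firstMatch("A = \"a\" OR B = \"b\""));
  }
}
```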

Thursday, 13 September 2018 06:29:05 UTC

xslt

Saxon 9.8.0-2 second attempt.

We've found that there is a Saxon HE update that was going to fix problems we mentioned in the previous post, and decided to give it a second chance.

Now Saxon fails with two other errors:

We shall be waiting for the fixes. Meanwhile we're back to version 9.7.

Tuesday, 27 June 2017 22:30:28 UTC

Announce | xslt

Saxon 9.8 is out

Finally, Saxon 9.8 is out!
This means that basic xslt 3 is available in the HE version.

Update: as usual, each new release has new bugs...
See https://saxonica.plan.io/boards/3/topics/6809

Wednesday, 14 June 2017 21:05:51 UTC

xslt

Saxon HE map and array types.

We have found that Saxon HE 9.7.0-18 has finally exposed partial support for map and array item types. So, now you can encapsulate your data in maps and arrays held in a sequence, rather than having a single flat sequence and treating odd and even elements specially.

Basic example is:

<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="t"
  xmlns:map="http://www.w3.org/2005/xpath-functions/map"
  exclude-result-prefixes="xs t map">

  <xsl:template match="/">
    <xsl:variable name="map" as="map(xs:string, xs:string)" select="
      map 
      {
        'Su': 'Sunday',
        'Mo': 'Monday',
        'Tu': 'Tuesday',
        'We': 'Wednesday',
        'Th': 'Thursday',
        'Fr': 'Friday',
        'Sa': 'Saturday'
      }"/>
      
     <xsl:message select="map:keys($map)"/>
  </xsl:template>  

</xsl:stylesheet>

A list of map functions can be found here http://www.w3.org/2005/xpath-functions/map/, though not all are available, as Saxon HE still does not allow inline functions.

P.S. From the development perspective it does great harm that Saxon HE is so limited: basically it is limited to xslt 2.0 plus some selected parts of 3.0.

Tuesday, 16 May 2017 06:20:48 UTC

Thinking aloud | xslt

View on tunnel parameters in XSLT

Lately we do not program in XSLT too often but rather in java, C#, SQL and javascript, but from time to time we have tasks in XSLT.

People claim that those languages are too different and use this argument to explain why XSLT is only a niche language. We, on the other hand, often spot similarities between them.

So, what is it in other languages that corresponds to tunnel parameters in XSLT?

To get an answer we reviewed how they work in XSLT, so, you:

define a template with parameters marked as tunnel="yes";
use these parameters the same way as regular parameters;
pass template parameters down to other templates marking them as tunnel="yes";

The important difference of regular template parameters from tunnel parameters is that the tunnel parameters are implicitly passed down the call chain of templates. This means that you:

define your API that is expected to receive some parameter;
pass these parameters somewhere high in the stack, or override them later in the stack chain;
do not bother to propagate them (you might not even know all of the tunnel parameters passed, so encapsulation is in action);

As a result we have a template with some parameters passed explicitly, and some others receiving values from somewhere else, usually not from the direct caller. It’s possible to say that these tunnel parameters are injected into a template call. This closely resembles injection APIs in other languages, where you configure some parameters to be prepared for you by a container rather than by the direct caller.

Now, when we have expressed this idea it seems so obvious but before we thought of this we did not realize that tunnel parameters in XSLT and Dependency Injection in other languages are the same thing.
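To illustrate the analogy, tunnel parameters can be emulated in Java with an ambient context that intermediate callers never mention. This is only a sketch of the idea, not of any particular injection container: the Tunnel class, its with and param methods and the three "template" methods are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class Tunnel
{
  // Ambient parameters visible to everything called on this thread.
  private static final ThreadLocal<Map<String, Object>> CURRENT =
    ThreadLocal.withInitial(HashMap::new);

  // Like xsl:with-param tunnel="yes": sets or overrides a value
  // for the duration of the body.
  static <T> T with(String name, Object value, Supplier<T> body)
  {
    Map<String, Object> previous = CURRENT.get();
    Map<String, Object> scope = new HashMap<>(previous);

    scope.put(name, value);
    CURRENT.set(scope);

    try
    {
      return body.get();
    }
    finally
    {
      CURRENT.set(previous);
    }
  }

  // Like xsl:param tunnel="yes": receives the value from the context.
  static Object param(String name)
  {
    return CURRENT.get().get(name);
  }

  // The outer and middle "templates" know nothing about "lang".
  static String outerTemplate()
  {
    return middleTemplate();
  }

  static String middleTemplate()
  {
    return "[" + innerTemplate() + "]";
  }

  // Only the innermost "template" declares that it needs "lang".
  static String innerTemplate()
  {
    return "lang=" + param("lang");
  }

  public static void main(String[] args)
  {
    // Prints [lang=en]
    System.out.println(with("lang", "en", Tunnel::outerTemplate));
  }
}
```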

Sunday, 26 March 2017 04:21:36 UTC

Thinking aloud | xslt

Languages XOM update

Recently we have found and fixed a bug in unreachable statement optimization in jxom.

Latest version of stylesheets can be found at github.com languages-xom.

Wednesday, 21 December 2016 22:10:06 UTC

xslt

Saxon-HE-9.7.0-5

Good, bad and good news.

Good: recently a new version of the Saxon XSLT processor was published: "12 May 2016, Saxon 9.7.0.5 maintenance release for Java and .NET."
Bad: we ran that release on our code base and found a bug. See "Internal error in Saxon-HE-9.7.0-5".
Good: Michael Kay has confirmed the problem and even fixed it. See Bug #2770.
The only missing ingredient is when the patch will be available to the public: "We tend to do a new maintenance release every 4-6 weeks. Can't commit to firm dates."

Friday, 03 June 2016 21:09:10 UTC

xslt

Pull visitor pattern

The Visitor pattern is often used to separate an operation from the object graph it operates on. Here we assume that the reader is familiar with the subject.

The idea is like this:

The operation over the object graph is implemented as a type called Visitor.
The Visitor defines a method for each type of object in the graph, which is called while traversing the graph.
Traversal of the graph is implemented by a type called Traverser, by the Visitor itself, or by each object type in the graph.

The implementation should collect, aggregate or perform other actions while visiting objects in the graph, so that by the end of the visit the purpose of the operation is fulfilled.

Such an implementation is push-like: you create an operation object and call a method that takes the object graph as input and returns the operation result as output.
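To make the push-like shape concrete, here is a minimal Java sketch of a visitor over a tiny boolean expression graph. All type and method names (BoolExpr, BoolVisitor, Evaluator and so on) are our assumptions for illustration and are not taken from any existing code; in this variant the traversal is driven by the visitor itself.

```java
import java.util.Map;

// Hypothetical expression graph; every node accepts a visitor.
interface BoolExpr {
    <R> R accept(BoolVisitor<R> visitor);
}

// One method per node type, called while the graph is traversed.
interface BoolVisitor<R> {
    R visitConst(boolean value);
    R visitVar(String name);
    R visitNot(BoolExpr operand);
    R visitAnd(BoolExpr left, BoolExpr right);
}

final class Const implements BoolExpr {
    final boolean value;
    Const(boolean value) { this.value = value; }
    public <R> R accept(BoolVisitor<R> v) { return v.visitConst(value); }
}

final class Var implements BoolExpr {
    final String name;
    Var(String name) { this.name = name; }
    public <R> R accept(BoolVisitor<R> v) { return v.visitVar(name); }
}

final class Not implements BoolExpr {
    final BoolExpr operand;
    Not(BoolExpr operand) { this.operand = operand; }
    public <R> R accept(BoolVisitor<R> v) { return v.visitNot(operand); }
}

final class And implements BoolExpr {
    final BoolExpr left;
    final BoolExpr right;
    And(BoolExpr left, BoolExpr right) { this.left = left; this.right = right; }
    public <R> R accept(BoolVisitor<R> v) { return v.visitAnd(left, right); }
}

// The operation: evaluates an expression against variable bindings.
// The visitor recurses into children, so it also plays the Traverser role.
final class Evaluator implements BoolVisitor<Boolean> {
    private final Map<String, Boolean> variables;
    Evaluator(Map<String, Boolean> variables) { this.variables = variables; }
    public Boolean visitConst(boolean value) { return value; }
    // Unbound variables are treated as false (an arbitrary choice for the sketch).
    public Boolean visitVar(String name) { return variables.getOrDefault(name, Boolean.FALSE); }
    public Boolean visitNot(BoolExpr operand) { return !operand.accept(this); }
    public Boolean visitAnd(BoolExpr left, BoolExpr right) {
        return left.accept(this) && right.accept(this);
    }
}

final class PushVisitorDemo {
    // Push-like entry point: graph in, result out.
    static boolean evaluate(BoolExpr expr, Map<String, Boolean> variables) {
        return expr.accept(new Evaluator(variables));
    }
}
```

The caller hands over the whole graph and gets the result back only when the traversal is complete; it has no control over the pace of the visit.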

In the past we often dealt with big graphs (usually these are virtual graphs backed by a database or by a file system).

Also, having strong experience in XSLT, we see that the visitor pattern in OOP maps directly onto the xsl:template and xsl:apply-templates technique.

Another thought was that in XML processing there are two camps:

SAX (push-like) - those who process xml in callbacks, which is very similar to the visitor pattern; and
XML Reader (pull-like) - those who pull xml components from a source, and then iterate and process them.

As with SAX vs XML Reader or, more generally, push vs pull processing models, there is no single best one. One or the other is preferable in particular circumstances. E.g. a pull-like component fits into a transformation pipeline where one pull component has another as its source; another example is when one needs to process two sources at once, which is nontrivial with a push-like model. On the other hand, push processing fits better into the Reduce part of the MapReduce pattern, where you need to accumulate results from a source.

So, our idea was to complement the classic push-like visitor pattern with an example of a pull-like implementation.

For the demonstration we have selected the Java language and the simplest boolean expression calculator.

Please follow GitHub nesterovsky-bros/VisitorPattern to see the detailed explanation.
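For readers who want a quick feel for the pull-like idea before opening the repository, here is a small Java sketch. It is not the code from the repository: the names Kind, Node, PostOrder and PullVisitorDemo are hypothetical, and the design (an Iterator that yields nodes in post-order, consumed by a loop that keeps its own value stack) is just one possible way to let the caller pull nodes from the graph.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

enum Kind { TRUE, FALSE, NOT, AND, OR }

// Hypothetical graph node: a kind plus ordered children.
final class Node {
    final Kind kind;
    final List<Node> children;
    Node(Kind kind, Node... children) {
        this.kind = kind;
        this.children = Arrays.asList(children);
    }
}

// Pull-like traversal: the caller asks for the next node when it is ready.
final class PostOrder implements Iterator<Node> {
    // A frame is a node together with the iterator over its unvisited children.
    private static final class Frame {
        final Node node;
        final Iterator<Node> rest;
        Frame(Node node) { this.node = node; this.rest = node.children.iterator(); }
    }

    private final Deque<Frame> stack = new ArrayDeque<>();

    PostOrder(Node root) { stack.push(new Frame(root)); }

    public boolean hasNext() { return !stack.isEmpty(); }

    public Node next() {
        if (stack.isEmpty()) throw new NoSuchElementException();
        // Descend to the deepest unvisited child before returning a node.
        while (stack.peek().rest.hasNext()) {
            stack.push(new Frame(stack.peek().rest.next()));
        }
        return stack.pop().node;
    }
}

final class PullVisitorDemo {
    // The operation is an ordinary loop that pulls nodes and keeps its own state.
    static boolean evaluate(Node root) {
        Deque<Boolean> values = new ArrayDeque<>();
        Iterator<Node> nodes = new PostOrder(root);
        while (nodes.hasNext()) {
            Node node = nodes.next();
            switch (node.kind) {
                case TRUE: values.push(true); break;
                case FALSE: values.push(false); break;
                case NOT: values.push(!values.pop()); break;
                case AND: { boolean r = values.pop(); boolean l = values.pop(); values.push(l && r); break; }
                case OR: { boolean r = values.pop(); boolean l = values.pop(); values.push(l || r); break; }
                default: throw new IllegalArgumentException("Unknown kind: " + node.kind);
            }
        }
        return values.pop();
    }
}
```

Because the traversal is an Iterator, it can be stopped early, chained into a pipeline, or advanced in step with a second source, which is exactly where pull-like processing is more convenient.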

Tuesday, 09 February 2016 12:37:10 UTC

Comments [0] -
Java | Thinking aloud | xslt

Error during transformation in Saxon 9.7 - Continue

Essence of the problem (see Error during transformation in Saxon 9.7, thread on forum):

An XPath engine may arbitrarily reorder predicates whose expressions do not depend on the context position.
While the XPath expression $N[@x castable as xs:date][xs:date(@x) gt xs:date("2000-01-01")] cannot raise an error if it is evaluated from left to right, an expression with reordered predicates $N[xs:date(@x) gt xs:date("2000-01-01")][@x castable as xs:date] may generate an error when @x is not an xs:date.

To avoid a potential problem one should rewrite the expression like this: $N[if (@x castable as xs:date) then xs:date(@x) gt xs:date("2000-01-01") else false()].

Please note that the following rewrite will not work: $N[(@x castable as xs:date) and (xs:date(@x) gt xs:date("2000-01-01"))], as the arguments of an and expression can be evaluated in any order, and an error that occurs during evaluation of either argument may be propagated.
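The effect can be imitated outside XPath. The Java sketch below is only an analogy under stated assumptions: Java streams never reorder filters by themselves, so the reordering is simulated by swapping the two predicates by hand, and LocalDate.parse stands in for the xs:date cast. The class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class PredicateOrderDemo {
    static final LocalDate LIMIT = LocalDate.parse("2000-01-01");

    // Plays the role of "@x castable as xs:date".
    static boolean castableAsDate(String s) {
        try {
            LocalDate.parse(s);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    static final Predicate<String> CASTABLE = PredicateOrderDemo::castableAsDate;

    // Plays the role of "xs:date(@x) gt xs:date('2000-01-01')"; throws on bad input.
    static final Predicate<String> AFTER_LIMIT = s -> LocalDate.parse(s).isAfter(LIMIT);

    // Applies the two predicates strictly in the order given.
    static List<String> filter(List<String> values, Predicate<String> first, Predicate<String> second) {
        return values.stream().filter(first).filter(second).collect(Collectors.toList());
    }

    // A single guarded predicate, like "if (...) then ... else false()":
    // there is nothing left to reorder, so the cast is always protected.
    static List<String> filterGuarded(List<String> values) {
        return values.stream()
            .filter(s -> castableAsDate(s) ? LocalDate.parse(s).isAfter(LIMIT) : false)
            .collect(Collectors.toList());
    }
}
```

With the values "2001-05-01", "not a date" and "1999-12-31", filter(values, CASTABLE, AFTER_LIMIT) keeps only the first value, filter(values, AFTER_LIMIT, CASTABLE) throws DateTimeParseException on the second value, and filterGuarded(values) again keeps only the first value.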

With these facts in hand we faced the task of checking our code base and fixing possible problems.

A search found ~450 instances of XPath expressions that use two or more consecutive predicates. Careful analysis narrowed this down to ~20 instances that should be rewritten. But then, all of a sudden, we decided to run an experiment: what if we split an XPath expression into two sub-expressions? Can the error still resurface?

Consider:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:variable name="elements" as="element()+"><a/><b value="c"/></xsl:variable>

  <xsl:template match="/">
    <xsl:variable name="a" as="element()*" select="$elements[self::d or self::e]"/>
    <xsl:variable name="b" as="element()*" select="$a[xs:integer(@value) = 1]"/>
    <xsl:sequence select="$b"/>
  </xsl:template>
</xsl:stylesheet>

As we expected, Saxon 9.7 internally assembles a final XPath with two predicates and reorders them. As a result we get an error:

Error at char 20 in xsl:variable/@select on line 8 column 81 of Saxon9.7-filter_speculation.xslt: FORG0001: Cannot convert string "c" to an integer

This turn of events greatly complicates the code review we have to carry out.

Michael Kay's answer to this example:

I think your argument that the reordering is inappropriate when the expression is written using variables is very powerful. I shall raise the question with my WG colleagues.

In fact we think that either reordering of predicates is inappropriate, or (a weaker requirement that still allows reordering) an error during evaluation of a predicate expression should be treated as false(). This is what is done in XSLT patterns. Other solutions make XPath less intuitive.

In other words, we should use XPath (the language) to express ideas, and the engine should implement them correctly and efficiently. We should not be forced to rewrite expressions to please an implementation.

Monday, 04 January 2016 10:07:12 UTC

Comments [0] -
Thinking aloud | xslt

Error during transformation in Saxon 9.7

On December 30 we opened a thread in the Saxon help forum that shows a stylesheet generating an error. This is the stylesheet:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:variable name="elements" as="element()+"><a/><b value="c"/></xsl:variable>

  <xsl:template match="/">
    <xsl:sequence select="$elements[self::d or self::e][xs:integer(@value) = 1]"/>
  </xsl:template>
</xsl:stylesheet>

We get an error:

Error at char 47 in xsl:sequence/@select on line 7 column 83 of Saxon9.7-filter_speculation.xslt:
  FORG0001: Cannot convert string "c" to an integer
Exception in thread "main" ; SystemID: .../Saxon9.7-filter_speculation.xslt; Line#: 7; Column#: 47
ValidationException: Cannot convert string "c" to an integer
  at ...

It's interesting that the error happens in Saxon 9.7 but not in earlier versions.

The answer we got was expected but disheartening:

The XPath specification (section 2.3.4, Errors and Optimization) explicitly allows the predicates of a filter expression to be reordered by an optimizer. See this example, which is very similar to yours:
The expression in the following example cannot raise a casting error if it is evaluated exactly as written (i.e., left to right). Since neither predicate depends on the context position, an implementation might choose to reorder the predicates to achieve better performance (for example, by taking advantage of an index). This reordering could cause the expression to raise an error.
$N[@x castable as xs:date][xs:date(@x) gt xs:date("2000-01-01")]

Following the spec, Michael Kay advises us to rewrite the XPath:

$elements[self::d or self::e][xs:integer(@value) = 1]

like this:

$elements[if (self::d or self::e) then xs:integer(@value) = 1 else false()]

Such subtleties make XPath hard to reason about and hard to teach. We doubt many people will spot the difference immediately.

We think that if such an optimization was so important to the spec writers, they should have changed the filter rules to treat failed predicates as false(). This would avoid any obscure differences between these two otherwise equal expressions. In fact something similar already exists with templates, where a failed evaluation of a pattern is treated as a non-match.

Saturday, 02 January 2016 21:32:16 UTC

Comments [0] -
Thinking aloud | xslt

Support C# 6.0 in languages xom

It's time to align csharpxom with the latest version of C#. The article New Language Features in C# 6 sums up what's being added.

Sources can be found at nesterovsky-bros/languages-xom, and the C# model is in the csharp folder.

In general we feel hostile to any new features until they prove they bring added value. So, here is our list of new features, from most to least useless:

String interpolation

var s = $"{p.Name} is {p.Age} year{{s}} old";

This is useless, as it does not account for resource localization.

Null-conditional operators

int? first = customers?[0].Orders?.Count();

They claim to reduce the clutter of null checks, but in our opinion the effect is the opposite. It's better to get a NullReferenceException if arguments are wrong.

Exception filters

private static bool Log(Exception e) { /* log it */ ; return false; }
…
try { … } catch (Exception e) when (Log(e)) {}

"It is also a common and accepted form of “abuse” to use exception filters for side effects; e.g. logging."

Designing a feature for abuse just does not taste good.

Expression-bodied function and property members

public Point Move(int dx, int dy) => new Point(x + dx, y + dy);
public string Name => First + " " + Last;

Not sure it's that useful.

Monday, 24 August 2015 10:52:07 UTC

Comments [0] -
.NET | Announce | Java | xslt

Internal Evaluation Error in Saxon 9.6

Taking into account that we have used Saxon for many years, it was strange to run into an error as simple as the following:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <xsl:variable name="doc" as="element()+"><a/><b/><c/></xsl:variable>   
    <xsl:sequence select="$doc = 3"/>
  </xsl:template>
</xsl:stylesheet>

This is a simplified case that should produce a dynamic error FORG0001 as per General Comparisons; the real code is more complex, as it uses SFINAE and continues.

This case crashes Saxon with an exception:

Exception in thread "main" java.lang.RuntimeException: 
      Internal error evaluating template  at line 3 in module ICE9.6.xslt
    at net.sf.saxon.expr.instruct.Template.applyLeavingTail()
    at net.sf.saxon.trans.Mode.applyTemplates()
    at net.sf.saxon.Controller.transformDocument()
    at net.sf.saxon.Controller.transform()
    at net.sf.saxon.s9api.XsltTransformer.transform()
    at net.sf.saxon.jaxp.TransformerImpl.transform()
    ...
Caused by: java.lang.NumberFormatException: For input string: "" 
    at java.lang.NumberFormatException.forInputString()
    at java.lang.Long.parseLong()
    at java.lang.Long.parseLong()
    at net.sf.saxon.expr.GeneralComparison.quickCompare()
    at net.sf.saxon.expr.GeneralComparison.compare()
    at net.sf.saxon.expr.GeneralComparison.evaluateManyToOne()
    at net.sf.saxon.expr.GeneralComparison.evaluateItem()
    at net.sf.saxon.expr.GeneralComparison.evaluateItem()
    at net.sf.saxon.expr.Expression.process()
    at net.sf.saxon.expr.instruct.Template.applyLeavingTail()
    ... 8 more

We have reported the problem on Saxon's forum, and as usual it was resolved shortly.

Friday, 31 July 2015 08:36:39 UTC

Comments [0] -
xslt

Java 8 Xml Object Model

After ECMAScript Xml Object Model we aligned JXOM to support Java 8. This includes support of:

Lambda Expressions - http://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html
Method References - http://docs.oracle.com/javase/tutorial/java/javaOO/methodreferences.html
Default Methods - http://docs.oracle.com/javase/tutorial/java/IandI/defaultmethods.html

As with ECMAScript, all sources are available at https://github.com/nesterovsky-bros/languages-xom

Thursday, 09 April 2015 19:46:22 UTC

Comments [0] -
Announce | Java | xslt

ECMAScript Xml Object Model

Much time has passed since we last fixed or extended Languages Xml Object Model. But now we needed to manipulate and generate javascript programs.

Though xslt today is not a language of choice but rather a niche language, it still fits code generation and transformation tasks very well.

So, we're pleased to announce ECMAScript Xml Object Model, which includes:

xml schema ecmasript.xsd, which is based on ECMAScript 6th Edition / Draft February 20, 2015;
a set of xslt to generate the text code, and perform some basic transformations over the tree;
a set of smoke tests to verify the generation.

All sources are available at github: https://github.com/nesterovsky-bros/languages-xom

Monday, 06 April 2015 12:17:04 UTC

Comments [0] -
Announce | javascript | xslt

Saxon 9.6 HE -xslt 3.0

After investigation we have found that Saxon 9.6 HE does not support xslt 3.0 as we assumed earlier.

On Saxonica site it's written: "Support for XQuery 3.0 and XPath 3.0 (now Recommendations) has been added to the open-source product."

As one can notice no xslt is mentioned.

More details are on open-source 3.0 support.

:-(

Monday, 06 October 2014 10:16:03 UTC

Comments [0] -
xslt

Error in SaxonHE9-6-0-1J

The new release of Saxon HE (version 9.6) claims basic support of xslt 3.0. So we were eager to test it but... errors happen. See the error report at Error in SaxonHE9-6-0-1J and Bug #2160.

As with the previous release (see Exception during execution in Saxon-HE-9.5.1-6) we have run into an internal error of the engine.

We expect to see an update very soon, and to continue with the long-awaited xslt 3.0.

Here is an argument for the discussion of open source vs commercial projects: open source projects with a rich community may benefit, as problems are detected promptly, while commercial projects risk living with more unnoticed bugs.

Sunday, 05 October 2014 08:10:14 UTC

Comments [0] -
xslt

Saxon 9.6

With Saxon 9.6 we can finally play with open source xslt 3.0

It's sad that it took so much time to make it available.

See Saxonica's home page to get details.

Friday, 03 October 2014 10:37:11 UTC

Comments [0] -
xslt

Exception during execution in Saxon-HE-9.5.1-6

These days we're not active xslt developers, though we still consider xslt and xquery an important part of our personal experience and self-education.

Besides, we have a pretty large xslt code base that is and will be in use. We think xslt/xquery is in use in many other big and small projects, thus they have a strong position as niche languages.

Thus we think it's important to help those who support xslt/xquery engines. That's what we do regularly (thanks to our code base).

Now, to the problem we have just found. Please consider the code:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="this"
  exclude-result-prefixes="xs t">

  <xsl:template match="/">
    <xsl:param name="new-line-text" as="xs:string" select="' '"/>
    <xsl:variable name="items" as="item()*" select="'select', $new-line-text"/>
    <xsl:message select="t:string-join($items)"/>
  </xsl:template>

  <xsl:function name="t:string-join" as="item()*">
    <xsl:param name="items" as="item()*"/>
    <xsl:variable name="indices" as="xs:integer*" select="
      0,
      index-of
      (
        (for $item in $items return $item instance of xs:string),
        false()
      ),
      count($items) + 1"/>
    <xsl:sequence select="
      for $i in 1 to count($indices) - 1 return
      (
        $items[$indices[$i]],
        string-join
        (
          subsequence
          (
            $items,
            $indices[$i] + 1,
            $indices[$i + 1] - $indices[$i] - 1
          ),
          ''
        )
      )"/>
  </xsl:function>
</xsl:stylesheet>

The output is:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
    at net.sf.saxon.om.Chain.itemAt(Chain.java:161)
    at net.sf.saxon.om.SequenceTool.itemAt(SequenceTool.java:130)
    at net.sf.saxon.expr.FilterExpression.iterate(FilterExpression.java:1143)
    at net.sf.saxon.expr.LetExpression.iterate(LetExpression.java:365)
    at net.sf.saxon.expr.instruct.BlockIterator.next(BlockIterator.java:49)
    at net.sf.saxon.expr.MappingIterator.next(MappingIterator.java:70)
    at net.sf.saxon.expr.instruct.Message.processLeavingTail(Message.java:264)
    at net.sf.saxon.expr.instruct.Block.processLeavingTail(Block.java:660)
    at net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:239)
    at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1057)
    at net.sf.saxon.Controller.transformDocument(Controller.java:2088)
    at net.sf.saxon.Controller.transform(Controller.java:1911)
    ...

The problem is reported at: Exception during execution in Saxon-HE-9.5.1-6 and also tracked at https://saxonica.plan.io/issues/2104.

Update: according to Michael Kay the issue is fixed. See note #4:

I have committed a patch to Chain.itemAt() on the 9.5 and 9.6 branches to check for index<0

Monday, 14 July 2014 08:08:01 UTC

Comments [0] -
xslt

New features in XPath 3.1

Among the proposed new features (other than Maps and Arrays) in XPath 3.1 we like the arrow operator (=>).

It's defined like this:

[Definition: An arrow operator is a postfix operator that applies a function to an item, using the item as the first argument to the function.] If $i is an item and f() is a function, then $i=>f() is equivalent to f($i), and $i=>f($j) is equivalent to f($i, $j).

This syntax is particularly helpful when conventional function call syntax is unreadable, e.g. when applying multiple functions to an item. For instance, the following expression is difficult to read due to the nesting of parentheses, and invites syntax errors due to unbalanced parentheses:

tokenize((normalize-unicode(upper-case($string))),"\s+")

Many people consider the following expression easier to read, and it is much easier to see that the parentheses are balanced:

$string=>upper-case()=>normalize-unicode()=>tokenize("\s+")

What does it look like?

Right! It's like extension methods in C#.
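Java has neither an arrow operator nor extension methods, but function composition gives a similar left-to-right reading. The sketch below is only an approximation under our own assumptions: the helper names upperCase, normalizeUnicode and tokenize are hypothetical stand-ins for the XPath functions, implemented with String.toUpperCase, java.text.Normalizer and String.split.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

final class ArrowDemo {
    static String upperCase(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    static String normalizeUnicode(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    static List<String> tokenize(String s, String regex) {
        return Arrays.asList(s.split(regex));
    }

    // Conventional nested calls, read from the inside out.
    static List<String> nested(String s) {
        return tokenize(normalizeUnicode(upperCase(s)), "\\s+");
    }

    // Composition with andThen, read left to right like $s=>f()=>g()=>h(...).
    static List<String> chained(String s) {
        Function<String, String> upper = ArrowDemo::upperCase;
        return upper
            .andThen(ArrowDemo::normalizeUnicode)
            .andThen(x -> tokenize(x, "\\s+"))
            .apply(s);
    }
}
```

Both methods return the same list; only the reading order of the source differs, which is the whole point of the arrow operator.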

Monday, 28 April 2014 06:20:27 UTC

Comments [0] -
xslt

Derived xml schemas

A while ago we created a set of xml schemas and xslt to represent different languages as xml, and to generate source from that xml. This way we can represent and generate java, c#, cobol, and several sql dialects (read about languages xom on this site).

Here, we'd like to describe a nuisance we had with the sql dialects schema.

Our goal was to define a basic sql schema, and dialect extensions. This way we expected to express both general and dialect-specific constructs. So, let's consider an example.

General:

-- Select one row
select * from A

DB2:

select * from A fetch first row only

T-SQL:

select top 1 * from A

Oracle:

select * from A where rownum = 1

All these queries share a common core syntax, while each dialect has its own means to express the intention to return the first row only.
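The same observation can be phrased as code. The Java sketch below is purely illustrative (the SelectFirstRow class and its Dialect enum are hypothetical and unrelated to the xml schemas): it renders the queries listed above from one core statement, and shows that each dialect attaches its extension at a different position.

```java
final class SelectFirstRow {
    enum Dialect { GENERIC, DB2, TSQL, ORACLE }

    // Renders "select * from <table>" with a dialect-specific first-row clause.
    // The table name is concatenated as-is, so this is not for untrusted input.
    static String render(Dialect dialect, String table) {
        StringBuilder sql = new StringBuilder("select ");
        if (dialect == Dialect.TSQL) {
            sql.append("top 1 ");                    // extension right after "select"
        }
        sql.append("* from ").append(table);         // common core
        if (dialect == Dialect.ORACLE) {
            sql.append(" where rownum = 1");         // extension as a predicate
        }
        if (dialect == Dialect.DB2) {
            sql.append(" fetch first row only");     // extension at the very end
        }
        return sql.toString();
    }
}
```

For example, SelectFirstRow.render(SelectFirstRow.Dialect.TSQL, "A") returns "select top 1 * from A".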

Down in the xml schema, the basic select statement looks like this:

<xs:complexType name="select-statement">
  <xs:complexContent>
    <xs:extension base="full-select-statement">
      <xs:sequence>
        <xs:element name="columns" type="columns-clause"/>
        <xs:element name="from" type="from-clause" minOccurs="0"/>
        <xs:element name="where" type="unary-expression" minOccurs="0"/>
        <xs:element name="group-by" type="expression-list" minOccurs="0"/>
        <xs:element name="having" type="unary-expression" minOccurs="0"/>
        <xs:element name="order-by" type="order-by-clause" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="specification" type="query-specification"
        use="optional" default="all"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

Here all is relatively clear. The generic select looks like:

<sql:select>
  <sql:columns>
    <sql:column wildcard="true"/>
  </sql:columns>
  <sql:from>
    <sql:table name="A"/>
  </sql:from>
</sql:select>

But how would you define dialect specifics?

E.g. for T-SQL we would like to see a markup:

<sql:select>
  <tsql:top>
    <sql:number value="1"/>
  </tsql:top>
  <sql:columns>
    <sql:column wildcard="true"/>
  </sql:columns>
  <sql:from>
    <sql:table name="A"/>
  </sql:from>
</sql:select>

While for DB2 there should be:

<sql:select>
  <sql:columns>
    <sql:column wildcard="true"/>
  </sql:columns>
  <sql:from>
    <sql:table name="A"/>
  </sql:from>
  <db2:fetch-first rows="1"/>
</sql:select>

So, again the questions are:

how to define a basic sql schema that can be extended towards DB2 or T-SQL?
how to define an xslt sql serializer that is extensible as well?

Though we have tried several solutions to that problem, none is satisfactory enough. To allow extensions we have defined all elements in the sql schema to be based on sql-element, which allows extensions:

<xs:complexType name="sql-element" abstract="true">
  <xs:sequence>
    <xs:element ref="extension" minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>

<xs:element name="extension" type="extension"/>

<xs:complexType name="extension" abstract="true">
  <xs:complexContent>
    <xs:extension base="sql-element"/>
  </xs:complexContent>
</xs:complexType>

...

<xs:element name="top" type="top-extension" substitutionGroup="sql:extension"/>

<xs:complexType name="top-extension">
  <xs:complexContent>
    <xs:extension base="sql:extension">
      <xs:sequence>
        <xs:element ref="sql:expression"/>
      </xs:sequence>
      <xs:attribute name="percent" type="xs:boolean" use="optional" default="false"/>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

Unfortunately, this makes the schema too weakly typed for extensions, so IntelliSense suggests too many options.

Wednesday, 03 July 2013 05:50:43 UTC

Comments [0] -
Thinking aloud | xslt

Export data to Excel

If you deal with web applications you have probably already dealt with exporting data to Excel. There are several options to prepare data for Excel:

generate CSV;
generate HTML that Excel understands;
generate XML in Spreadsheet 2003 format;
generate data using Open XML SDK or some other 3rd party library;
generate data in XLSX format, according to the Open XML specification.

You may find a good article with pros and cons of each solution here. We, in our turn, would like to share our experience in this field. Let's start with the requirements:

Often we have to export huge data sets.
We should be able to format, parametrize and apply different styles to the exported data.
There are cases when exported data may contain more than one table per sheet or even more than one sheet.
Some exported data have to be illustrated with charts.

All these requirements led us to a solution based on XSLT processing of streamed data. The advantage of this solution is that the result is forwarded to a client as soon as XSLT starts to generate output. Such an approach is much more productive than generating XLSX using Open XML SDK or any other third party library, since it avoids keeping huge data sets in memory on the server side.

Another advantage is simple maintenance, as we achieve a clear separation of data and presentation layers. On each request to change formatting or apply another style to a cell you just have to modify the xslt file(s) that generate variable parts of XLSX.

As a result, our clients get XLSX files that conform to the Open XML specification. We will describe the details of our implementation in our next posts.

Monday, 29 October 2012 15:34:38 UTC

Comments [0] -
.NET | ASP.NET | Thinking aloud | xslt

Stream xslt transformation through WCF

Earlier we have shown how to build a streaming xml reader from business data, and have recalled ForwardXPathNavigator, which helps to create a streaming xslt transformation. Now we want to show how to stream content produced with xslt out of a WCF service.

To achieve streaming in WCF one needs:

1. To configure the service to use streaming. A description of how to do this can be found on the internet. See web.config of the sample Streaming.zip for the details.

2. Create a service with a method returning Stream:

[ServiceContract(Namespace = "http://www.nesterovsky-bros.com")]
[AspNetCompatibilityRequirements(
  RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)]
public class Service
{
  [OperationContract]
  [WebGet(RequestFormat = WebMessageFormat.Json)]
  public Stream GetPeopleHtml(int count, int seed)
  {
    ...
  }
}

3. Return a Stream from xsl transformation.

Unfortunately (we mentioned it already), XslCompiledTransform generates its output into an XmlWriter (or into an output Stream) rather than exposing the result as an XmlReader, while WCF takes an input Stream and passes it to a client.

We could generate xslt output into a file or a memory Stream and then return that content as an input Stream, but this would defeat the goal of streaming, as the client would start to get data no earlier than the xslt completed its work. What we need instead is a pipe that connects the xslt output Stream to an input Stream returned from WCF.

.NET implements pipe streams, so our task is trivial. We have defined a utility method that creates an input Stream from a generator populating an output Stream:

public static Stream GetPipedStream(Action<Stream> generator)
{
  var output = new AnonymousPipeServerStream();
  var input = new AnonymousPipeClientStream(
    output.GetClientHandleAsString());

  Task.Factory.StartNew(
    () =>
    {
      using(output)
      {
        generator(output);
        output.WaitForPipeDrain();
      }
    },
    TaskCreationOptions.LongRunning);

  return input;
}
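The same technique exists outside .NET. As a hedged illustration only, here is how a similar helper might look in Java with PipedInputStream and PipedOutputStream; the PipedStreams class and its Generator callback are hypothetical names introduced for this sketch, and error handling is deliberately minimal.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

final class PipedStreams {
    // Hypothetical callback type: writes generated content into the given stream.
    interface Generator {
        void generate(OutputStream output) throws IOException;
    }

    // Returns an input stream fed by a generator that runs on its own thread,
    // so the reader receives data while the generator is still producing it.
    static InputStream getPipedStream(Generator generator) throws IOException {
        PipedOutputStream output = new PipedOutputStream();
        PipedInputStream input = new PipedInputStream(output);

        Thread producer = new Thread(() -> {
            // Closing the output signals end of stream to the reader.
            try (OutputStream out = output) {
                generator.generate(out);
            } catch (IOException e) {
                // The reader went away or the generator failed.
                // In this sketch the reader simply sees a truncated stream.
            }
        }, "pipe-producer");

        producer.setDaemon(true);
        producer.start();

        return input;
    }
}
```

The pipe has a bounded buffer, so a slow reader automatically throttles the generator instead of letting the output accumulate in memory.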

We wrapped xsl transformation as such a generator:

[OperationContract]
[WebGet(RequestFormat = WebMessageFormat.Json)]
public Stream GetPeopleHtml(int count, int seed)
{
  var context = WebOperationContext.Current;

  context.OutgoingResponse.ContentType = "text/html";
  context.OutgoingResponse.Headers["Content-Disposition"] =
    "attachment;filename=reports.html";

  var cache = HttpRuntime.Cache;
  var path = HttpContext.Current.Server.MapPath("~/People.xslt");
  var transform = cache[path] as XslCompiledTransform;

  if (transform == null)
  {
    transform = new XslCompiledTransform();
    transform.Load(path);
    cache.Insert(path, transform, new CacheDependency(path));
  }

  return Extensions.GetPipedStream(
    output =>
    {
      // We have a streamed business data.
      var people = Data.CreateRandomData(count, seed, 0, count);

      // We want to see it as streamed xml data.
      using(var stream =
        people.ToXmlStream("people", "http://www.nesterovsky-bros.com"))
      using(var reader = XmlReader.Create(stream))
      {
        // XPath forward navigator is used as an input source.
        transform.Transform(
          new ForwardXPathNavigator(reader),
          new XsltArgumentList(),
          output);
      }
    });
}

This way we have built code that streams data directly from business data to a client in the form of a report. A set of utility functions and classes helped us to overcome .NET's limitations and to build simple code that is easy to support.

The sources can be found at Streaming.zip.

Friday, 03 August 2012 22:32:49 UTC

Comments [0] -
.NET | ASP.NET | Thinking aloud | Tips and tricks | xslt

Streaming xslt transformation with ForwardXPathNavigator

In the previous post about streaming we stopped at the point where we have an XmlReader in hand, which continuously gets data from an IEnumerable<Person> source. Now we shall recall ForwardXPathNavigator, a class we built back in 2002, which adds streaming transformations to .NET's xslt processor.

Although XslCompiledTransform is hopelessly obsolete, and no upgrade is likely to follow, it is still among the fastest xslt 1.0 processors. With ForwardXPathNavigator we give this processor the ability to transform input data of arbitrary size.

We find it interesting that xslt 3.0 Working Draft defines streaming processing in a way that closely matches rules for ForwardXPathNavigator:

Streaming achieves two important objectives: it allows large documents to be transformed without requiring correspondingly large amounts of memory; and it allows the processor to start producing output before it has finished receiving its input, thus reducing latency.

The rules for streamability, which are defined in detail in 19.3 Streamability Analysis, impose two main constraints:

The only nodes reachable from the node that is currently being processed are its attributes and namespaces, its ancestors and their attributes and namespaces, and its descendants and their attributes and namespaces. The siblings of the node, and the siblings of its ancestors, are not reachable in the tree, and any attempt to use their values is a static error. However, constructs (for example, simple forms of xsl:number, and simple positional patterns) that require knowledge of the number of preceding elements by name are permitted.
When processing a given node in the tree, each descendant node can only be visited once. Essentially this allows two styles of processing: either visit each of the children once, and then process that child with the same restrictions applied; or process all the descendants in a single pass, in which case it is not possible while processing a descendant to make any further downward selection.

The only significant difference between ForwardXPathNavigator and xslt 3.0 streaming is that we report violations of the streamability rules at runtime, while xslt 3.0 attempts to perform this analysis at compile time.

Here is the C# code for the streamed xslt transformation:

var transform = new XslCompiledTransform();

transform.Load("People.xslt");

// We have a streamed business data.
var people = Data.CreateRandomData(10000, 0, 0, 10000);

// We want to see it as streamed xml data.
using(var stream =
  people.ToXmlStream("people", "http://www.nesterovsky-bros.com"))
using(var reader = XmlReader.Create(stream))
using(var output = File.Create("people.html"))
{
  // XPath forward navigator is used as an input source.
  transform.Transform(
    new ForwardXPathNavigator(reader),
    new XsltArgumentList(),
    output);
}

Notice how XmlReader is wrapped into ForwardXPathNavigator.

To complete the picture we need xslt that follows the streaming rules:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:msxsl="urn:schemas-microsoft-com:xslt"
  xmlns:d="http://www.nesterovsky-bros.com"
  exclude-result-prefixes="msxsl d">

  <xsl:output method="html" indent="yes"/>

  <xsl:template match="/d:people">
    <html>
      <head>
        <title>List of persons</title>
        <style type="text/css">
          .even { }
          .odd { background: #d0d0d0; }
        </style>
      </head>
      <body>
        <table border="1">
          <tr>
            <th>ID</th>
            <th>First name</th>
            <th>Last name</th>
            <th>City</th>
            <th>Title</th>
            <th>Age</th>
          </tr>
          <xsl:for-each select="d:person">
            <xsl:variable name="person">
              <xsl:copy-of select="."/>
            </xsl:variable>
            <xsl:variable name="position" select="position()"/>
            <xsl:apply-templates mode="snapshot"
              select="msxsl:node-set($person)/d:person">
              <xsl:with-param name="position" select="$position"/>
            </xsl:apply-templates>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>

  <xsl:template mode="snapshot" match="d:person">
    <xsl:param name="position"/>
    <tr>
      <xsl:attribute name="class">
        <xsl:choose>
          <xsl:when test="$position mod 2 = 1">
            <xsl:text>odd</xsl:text>
          </xsl:when>
          <xsl:otherwise>
            <xsl:text>even</xsl:text>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:attribute>
      <td>
        <xsl:value-of select="d:Id"/>
      </td>
      <td>
        <xsl:value-of select="d:FirstName"/>
      </td>
      <td>
        <xsl:value-of select="d:LastName"/>
      </td>
      <td>
        <xsl:value-of select="d:City"/>
      </td>
      <td>
        <xsl:value-of select="d:Title"/>
      </td>
      <td>
        <xsl:value-of select="d:Age"/>
      </td>
    </tr>
  </xsl:template>
</xsl:stylesheet>

So, we started with streamed entity data, proceeded to a streamed XmlReader and arrived at a streamed xslt transformation.

In the final post about streaming we shall recall a simple way of building a WCF service returning an html stream from our xslt transformation.

The sources can be found at Streaming.zip.

Thursday, 26 July 2012 18:49:51 UTC

Comments [0] -
.NET | Thinking aloud | Tips and tricks | xslt

Streaming entity data

For some reason neither .NET's XmlSerializer nor DataContractSerializer allows object data to be read through an XmlReader. These APIs work the other way round, writing data into an XmlWriter. To get data through an XmlReader one needs to write it to some destination like a file or a memory stream, and then read it using an XmlReader. This complicates streaming design considerably.

In fact the very same happens with other .NET APIs.

We think the reason why .NET designers preferred XmlWriter to XmlReader in those APIs is that an XmlReader implementation is state-machine-like, while an XmlWriter implementation looks like a regular procedure. It's much harder to manually write and support correct state machine logic than a procedure.

If history had gone a slightly different way, and yield return, lambdas, and the Enumerable API had appeared before XmlReader and XmlWriter, then, we think, both these classes would have looked different. An xml source would have been described with an IEnumerable<XmlEvent> instead of XmlReader, and XmlWriter would have looked like a function receiving an IEnumerable<XmlEvent>. Implementing an XmlReader would have meant creating an enumerator. Yield return and the Enumerable API would have helped to implement it in a procedural way.

But in our present we have to deal with the fact that DataContractSerializer writes data into an XmlWriter. So let's assume we have a project that uses Entity Framework to access the database, and that it has a data class Person and a data access method GetPeople():

[DataContract(Name = "person", Namespace = "http://www.nesterovsky-bros.com")]
public class Person
{
  [DataMember] public int Id { get; set; }
  [DataMember] public string FirstName { get; set; }
  [DataMember] public string LastName { get; set; }
  [DataMember] public string City { get; set; }
  [DataMember] public string Title { get; set; }
  [DataMember] public DateTime BirthDate { get; set; }
  [DataMember] public int Age { get; set; }
}

public static IEnumerable<Person> GetPeople() { ... }

And your goal is to expose the result of GetPeople() as an XmlReader. We achieve this in three simple steps:

Define JoinedStream - an input Stream implementation that reads data from an enumeration of streams (IEnumerable<Stream>).
Build xml parts in the form of IEnumerable<Stream>.
Combine parts into final xml stream.

The code is rather simple, so here we quote its essential part:

public static class Extensions
{
  public static Stream JoinStreams(this IEnumerable<Stream> streams, bool closeStreams = true)
  {
    return new JoinedStream(streams, closeStreams);
  }

  public static Stream ToXmlStream<T>(
    this IEnumerable<T> items,
    string rootName = null,
    string rootNamespace = null)
  {
    return items.ToXmlStreamParts<T>(rootName, rootNamespace).
      JoinStreams(false);
  }

  private static IEnumerable<Stream> ToXmlStreamParts<T>(
    this IEnumerable<T> items,
    string rootName = null,
    string rootNamespace = null)
  {
    if (rootName == null)
    {
      rootName = "ArrayOfItems";
    }

    if (rootNamespace == null)
    {
      rootNamespace = "";
    }

    var serializer = new DataContractSerializer(typeof(T));
    var stream = new MemoryStream();
    var writer = XmlDictionaryWriter.CreateTextWriter(stream);

    writer.WriteStartDocument();
    writer.WriteStartElement(rootName, rootNamespace);
    writer.WriteXmlnsAttribute("s", XmlSchema.Namespace);
    writer.WriteXmlnsAttribute("i", XmlSchema.InstanceNamespace);

    foreach(var item in items)
    {
      serializer.WriteObject(writer, item);
      writer.WriteString(" ");
      writer.Flush();

      stream.Position = 0;

      yield return stream;

      stream.Position = 0;
      stream.SetLength(0);
    }

    writer.WriteEndElement();
    writer.WriteEndDocument();
    writer.Flush();

    stream.Position = 0;

    yield return stream;
  }

  private class JoinedStream: Stream
  {
    public JoinedStream(IEnumerable<Stream> streams, bool closeStreams = true)
    ...
  }
}

The usage is even simpler:

// We have a streamed business data.
var people = GetPeople();

// We want to see it as streamed xml data.
using(var stream = people.ToXmlStream("persons", "http://www.nesterovsky-bros.com"))
using(var reader = XmlReader.Create(stream))
{
  ...
}
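The body of JoinedStream is elided in the quote above. For orientation only, a read-only stream over a sequence of parts might be sketched like this. It is our assumption of one possible shape, named JoinedStreamSketch to keep it apart from the class in the sample project, and it is not the code from Streaming.zip.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch: reads the parts one after another, pulling the next
// part from the enumeration only when the current one is exhausted.
public sealed class JoinedStreamSketch : Stream
{
    private readonly IEnumerator<Stream> parts;
    private readonly bool closeStreams;
    private Stream current;

    public JoinedStreamSketch(IEnumerable<Stream> streams, bool closeStreams = true)
    {
        parts = streams.GetEnumerator();
        this.closeStreams = closeStreams;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        while (count > 0)
        {
            if (current == null)
            {
                // No more parts means end of the joined stream.
                if (!parts.MoveNext()) return 0;
                current = parts.Current;
            }

            int read = current.Read(buffer, offset, count);
            if (read > 0) return read;

            // Current part is exhausted; move on to the next one.
            if (closeStreams) current.Dispose();
            current = null;
        }

        return 0;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();

    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override void Flush() { }

    public override long Seek(long offset, SeekOrigin origin) =>
        throw new NotSupportedException();

    public override void SetLength(long value) =>
        throw new NotSupportedException();

    public override void Write(byte[] buffer, int offset, int count) =>
        throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            if (closeStreams) current?.Dispose();
            parts.Dispose();
        }

        base.Dispose(disposing);
    }
}
```

Because the next part is requested only after the previous one has been fully read, a producer that reuses a single buffer for every part, as ToXmlStreamParts does, stays safe as long as closeStreams is false.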

We have packed the sample into the project Streaming.zip.

In the next post we're going to recall streaming processing in xslt.

Sunday, 22 July 2012 20:38:29 UTC

Comments [2] -
.NET | Thinking aloud | Tips and tricks | xslt

xslt/xquery

Some time ago we took part in a project where 95% of all sources were xslt 2.0. It was a great experience for us.

The interesting part is that we used xslt in areas where we would never have expected it in the early 2000s. It crunched gigabytes of data offline, while earlier we generally saw xslt applied in a browser or on a server as an engine to render data.

Web applications (both .NET and java) are our focus today, and it has become hard to find an application for xslt or xquery.

Indeed, the client side now has very strong APIs: jquery, jqueryui, jsview, jqgrid, kendoui, and so on. These libraries and today's browsers cover a developer's needs in building manageable applications. In contrast, native support for xslt (at least v2) does not exist in browsers.

The server side at present is seen as a set of web services. These services support both xml and json formats, and implement business logic only. It would be torture to try to write such a frontend in xslt/xquery. The server logic itself often deals with a diversity of data sources like databases, files (including xml files), and others.

As for the database (we primarily work with SQL Server 2008 R2), we think that all communication should go through stored procedures, which implement all data logic. Clearly, this is no place for xslt. However, those who know sql beyond its basics can confirm that sql is very similar to xquery. Moreover, SQL Server (and other databases) integrates xquery to work with xml data, and we do use it extensively.

Server logic itself uses APIs like LINQ to manipulate different data sources. In fact, we think that one can build a compiler from xquery 3.0 to C# with LINQ. A compiler going the other way round would be a whole different story.

The net result is that we see little place for xslt and xquery. Well, after all it's only a personal perspective on the subject. A similar thing has happened to us with C++. As with xslt/xquery, we love C++ very much, and we are fond of C++11, but at present we have no place for C++ in our current projects. That's a pity.

P.S. Among other things that play against xslt/xquery is the shortage of people who know these languages, and thus can support such projects.
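As a small illustration of how close the two are, here is a FLWOR expression next to a LINQ to XML query that we would expect such a compiler to emit. The element names (person, Age, LastName) and the FlworSketch class are made up for the example, and the query assumes that every person has both child elements.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class FlworSketch
{
    // XQuery counterpart:
    //   for $p in $people/person
    //   where xs:int($p/Age) > 30
    //   order by $p/LastName
    //   return <name>{ string($p/LastName) }</name>
    public static IEnumerable<XElement> Names(XElement people) =>
        from p in people.Elements("person")
        where (int)p.Element("Age") > 30
        orderby (string)p.Element("LastName")
        select new XElement("name", (string)p.Element("LastName"));
}
```

Each FLWOR clause has a direct LINQ clause to map to, which is why this direction looks feasible to us.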

Tuesday, 08 May 2012 20:28:51 UTC

Comments [0] -
Thinking aloud | xslt

Languages XOM update

This time we update csharpxom to adjust it to C# 5.0 (.NET 4.5). The additions are the async modifier and the await operator.

They are used to simplify asynchronous programming.

The following example from the msdn:

private async Task<byte[]> GetURLContentsAsync(string url)
{
  var content = new MemoryStream();
  var request = (HttpWebRequest)WebRequest.Create(url);

  using(var response = await request.GetResponseAsync())
  using(var responseStream = response.GetResponseStream())
  {
    await responseStream.CopyToAsync(content);
  }

  return content.ToArray();
}

looks like this in csharpxom:

Friday, 23 March 2012 00:07:35 UTC

Comments [0] -
.NET | Announce | xslt

Saxon 9.4 is out

@michaelhkay Saxon 9.4 is out.

But why does the author not state that the HE version is still xslt/xpath 2.0, as neither xslt maps nor function items are supported?

Saturday, 10 December 2011 12:16:28 UTC

Comments [0] -
Thinking aloud | xslt

jQuery

It so happened that we have never worked with jQuery, although we were aware of it.

In the early 2000s we developed a web application that contained rich javascript APIs, including UI components. Later, we actively practiced ASP.NET, and later still JSF.

At present, looking at jQuery more closely we regret that we have failed to start using it earlier.

Separation of business logic and presentation is remarkable when one uses JSON web services. In fact the server part can be seen as a set of web services representing business logic, and a set of resources: html, styles, scripts, and others. Neither ASP.NET nor JSF approaches such a consistent separation.

The only trouble, in our opinion, is that jQuery has no standard data binding: a way to bind JSON data to (and from) html controls. The technique that will probably be standardized is called jQuery Templates or JsViews.

Unfortunately, after reading about this binding API, and being in love with Xslt and XQuery, we just want to cry. We don't know what would be the best solution for the task, but what we see looks uncomfortable to us.

Friday, 28 October 2011 22:59:23 UTC

Comments [0] -
ASP.NET | JSF and Facelets | Thinking aloud | Tips and tricks | xslt

An XPath enumerator function

A couple of weeks ago, we suggested introducing an enumerator function into XPath (see [F+O30] A enumerator function):

I would like the WG to consider an addition of a function that turns a sequence into an enumeration of values.

Consider a function like this: fn:enumerator($items as item()*) as function() as item()?;

alternatively, signature could be:

fn:enumerator($items as function() as item()*) as function() as item()?;

This function receives a sequence, and returns a function item, which upon the N-th call shall return the N-th element of the original sequence. This way, a sequence of items is turned into a function providing an enumeration of items of the sequence.

As an example consider two functions:

a) t:rand($seed as xs:double) as xs:double* - a function producing a random number sequence;
b) t:work($input as element()) as element() - a function that generates output from its input, and that needs random numbers in the course of the execution.

t:work() may contain a code like this:
let $rand := fn:enumerator(t:rand($seed)),

and later it can call $rand() to get random numbers.

Enumerators will help to compose algorithms where one algorithm communicates with other independent algorithms, thus making code simpler. The most obvious class of enumerators are generators: ordered numbers, unique identifiers, random numbers.

Technically, the function returned from fn:enumerator() is nondeterministic, but its "side effect" is similar to a "side effect" of a function generate-id() from a newly created node (see bug #13747, and bug #13494).

The idea is inspired by a generator function, which returns a new value upon each call.

Such a function can be seen as a stateful object. But our goal is to look at it in a more functional way. So, we look at the algorithm as a function that produces a sequence of output, which is purely functional, and at an enumerator that allows iterating over the algorithm's output.

This way, the function that implements an algorithm and the function that uses it can be seen as two threads of functional programs that use messaging to communicate with each other.
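For readers more at home in C#, the proposed semantics can be approximated as below. This is only an analogy under our own assumptions: EnumeratorSketch, Enumerator and Rand are illustrative names, and default(T) stands in for the empty sequence that item()? allows once the input is exhausted.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorSketch
{
    // Analogue of fn:enumerator(): the N-th call of the returned function
    // yields the N-th item; after the end it keeps returning default(T).
    public static Func<T> Enumerator<T>(IEnumerable<T> items)
    {
        var enumerator = items.GetEnumerator();
        var done = false;

        return () =>
        {
            if (done) return default(T);
            if (enumerator.MoveNext()) return enumerator.Current;

            done = true;
            enumerator.Dispose();

            return default(T);
        };
    }

    // Stand-in for t:rand(): a lazy, unbounded sequence of random numbers.
    public static IEnumerable<double> Rand(int seed)
    {
        var random = new Random(seed);

        while (true)
        {
            yield return random.NextDouble();
        }
    }
}
```

A consumer such as t:work() would then hold var rand = EnumeratorSketch.Enumerator(EnumeratorSketch.Rand(seed)) and call rand() whenever it needs the next number, while the producer remains a plain sequence.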

Honestly, we doubt that WG will accept it, but it's interesting to watch the discussion.

Thursday, 29 September 2011 11:56:05 UTC

Comments [0] -
Thinking aloud | xslt

Resolution of the Saxon optimizer bug

More than a month has passed since we reported a problem to the saxon forum (see Saxon optimizer bug and Saxon 9.2 generate-id() bug).

The essence of the problem is that we constructed an argumentless function that returns a unique identifier each time it is called. To achieve the effect we created a temporary node and returned its generate-id() value.

Such a function is nondeterministic, as we cannot state that its result depends on arguments only. This means that the engine's optimizer is not free to reorder calls to such a function. Yet that is exactly what happens in Saxon 9.2 and Saxon 9.3, where the engine hoists the function call out of a loop, thus producing invalid results.

Michael Kay, the author of the Saxon engine, argued that this is "a gray area of the xslt spec":

If the spec were stricter about defining exactly when you can rely on identity-dependent operations then I would be obliged to follow it, but I think it's probably deliberate that it currently allows implementations some latitude, effectively signalling to users that they should avoid depending on this aspect of the behaviour.

He advised raising a bug in the w3c bugzilla to resolve the issue. In the end two related bugs were raised:

Bug 13494 - Node uniqueness returned from XSLT function;
Bug 13747 - [XPath 3.0] Determinism of expressions returning constructed nodes.

Yesterday, the WG has resolved the issue:

The Working Group agreed that default behavior should continue to require these nodes to be constructed with unique IDs. We believe that this is the kind of thing implementations can do with annotations or declaration options, and it would be best to get implementation experience with this before standardizing.

This means that the technique we used to generate unique identifiers is correct and the behaviour is well defined.

The only thing left is to wait until Saxon fixes its behaviour accordingly.

Wednesday, 14 September 2011 05:54:56 UTC

Comments [0] -
Thinking aloud | xslt

Yield return update

Recently one of the users of the java yield return annotation kindly informed us about a problem that happened in his environment (see Java's @Yield return annotation update).

Incidentally, we had never noticed the problem earlier. Along with this issue we found that the eclipse compiler has changed in Indigo in such a way that we had to recompile the source. Well, that's the price you have to pay when you access an internal API.

Updated sources can be found at Yield.zip, and compiled jars at Yield.jar (pre-Indigo), and Yield.3.7.jar (Indigo and probably higher).