# Friday, 04 April 2008

The type system of xslt 2.0 is not complete (see Sequence of sequences in xslt 2.0). You cannot manipulate items as freely as you could. The reason is the lack of set-based constructs: xslt 2.0 supports sequences, but not associative maps of items.

If you think that xml can be used as a good approximation of a map, I shan't agree with you. Xml is applicable in very specific cases only. The maps I'm thinking of would associate items by reference, like sequences do.

This opens the prospect of creating state objects, managing sequences of sequences, creating cyclic graphs of items, and so on. Such maps are richer than what the key() function provides right now, and would allow implementing for-each-group in xquery.

Such maps can be modeled with several functions; however, I wish they were built in:

f:map($items as item()*) as item()
Returns a map from a sequence $items of pairs (key, value).

f:map-items($map as item()) as item()*
Returns a sequence of pairs (key, value) for a map $map.

f:map-keys($map as item()) as item()*
Returns a sequence of keys contained in a map $map.

f:map-values($map as item()) as item()*
Returns a sequence of values contained in a map $map.

f:map-value($map as item(), $key as item()) as item()*
Returns a sequence of values corresponding to a specified key $key contained in a specified map $map.

The other thing I would add is a tuple of items. It's like a sequence; however, a sequence of tuples is never flattened into a single sequence, but stays a sequence of tuples.

Fortunately it's possible to implement such extension functions.

Friday, 04 April 2008 13:49:56 UTC  #    Comments [0] -
xslt
# Wednesday, 02 April 2008

xslt 2.0 is a beautiful language, and at the same time it allows constructs that may puzzle anyone.

Look at this valid stylesheet:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:template match="/">
    <xsl:variable name="x" as="node()" select="."/>
    <xsl:variable name="x" as="xs:int" select="***"/>

    <xsl:sequence select="$x"/>
  </xsl:template>

</xsl:stylesheet>

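Fun, isn't it? :-) The second $x shadows the first, which xslt 2.0 permits for local variables, and *** parses as * * *: a wildcard step, the multiplication operator, and another wildcard step.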

Wednesday, 02 April 2008 05:45:28 UTC  #    Comments [0] -
xslt
# Monday, 31 March 2008

Earlier I was thinking about the difference between named templates and functions in xslt 2.0, and did not find a satisfactory criterion for deciding which to use in each case. I was not the first one troubled by this, see stylesheet functions or named templates.

To feel at ease I deliberately decided to use functions whenever possible, to avoid named templates completely, and to use matching templates to apply logic depending on context (something like a virtual function). I had forgotten about the issue until yesterday. To realize the difference one should stop thinking about it in the abstract and instead start solving practical xslt tasks; if there is any difference other than a syntactic one, it will manifest itself somehow.

To make things obvious to those whose programming roots are in a language like C++, I shall compare xsl:function with a free-standing (or static) C++ function, and a named xsl:template with a C++ member function. In C++ you can use free-standing and member functions interchangeably; however, if there is one argument (among others) whose state transition the function represents, then it's preferable to define it as a member function. The most important difference between these two types of functions is that a member function has a hidden argument "this", and is able to access its private state.

Please do not think I'm going to compare the template context item in xslt 2.0 with "this" in C++; quite the opposite, I consider the context item a part of the state. I'm talking, however, about private state that can be passed through a template call chain with tunnel parameters. Think of a call tunneling some state (like options, flags, values), and that state being accessed several levels deep in the call hierarchy, wherever one needs it. You cannot do this with xsl:function: you cannot pass all the private state through the function call, because you just do not know about it.
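As an illustration of the tunneling described above, here is a minimal sketch. The template names outer and inner and the parameter verbose are made up for this example; only the tunnel="yes" mechanism itself comes from xslt 2.0.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- The option is set once, at the top of the call chain. -->
    <xsl:call-template name="outer">
      <xsl:with-param name="verbose" tunnel="yes" select="true()"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical intermediate template: it neither declares nor forwards $verbose. -->
  <xsl:template name="outer">
    <xsl:call-template name="inner"/>
  </xsl:template>

  <!-- Hypothetical leaf template: it picks the tunneled value up two levels down. -->
  <xsl:template name="inner">
    <xsl:param name="verbose" tunnel="yes" as="xs:boolean" select="false()"/>

    <xsl:sequence select="if ($verbose) then 'verbose' else 'quiet'"/>
  </xsl:template>

</xsl:stylesheet>
```

The intermediate template stays unaware of the option, which is exactly what an xsl:function in the middle of the chain could not offer: a function sees only its declared parameters.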

This way my answer to the tacit question is:

  •  use xsl:function to perform an independent unit of logic;
  •  use a named xsl:template when functionality is achieved cooperatively, and when you may need to share state between different implementation blocks.

After thinking this through, I've noticed that such a distinction does not exist in XQuery 1.0. There is no tunneling there. :-)

Monday, 31 March 2008 06:54:22 UTC  #    Comments [0] -
xslt
# Tuesday, 25 March 2008

In the xslt world there is no widely used custom of thinking of stylesheet members as public or private, in contrast to other programming languages like C++/java/c#, where access modifiers are essential. The reason is the low complexity of typical stylesheets: the smaller the code, the easier it is for a developer to keep all the details in mind. Whenever an xslt program grows you should modularize it to keep it manageable.

At the point where modules are introduced one starts thinking of the public interface of a module and of its implementation details. This separation is especially important for template matching, as you probably won't want a private template to match just because you've forgotten about some template in the implementation of some module.

To distinguish public and private members you can introduce two namespaces in your stylesheet: one for public members and one for private members (bound to the prefixes t and p in the example below).

For the private namespace you can use a unique name, e.g. the stylesheet name as part of the uri.

The following example is based on jxom. This stylesheet builds an expression from an expression tree. The public part consists only of the t:get-expression function; the other members are private:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com/public"
  xmlns:p="http://www.nesterovsky-bros.com/private/expression.xslt"
  xmlns="http://www.nesterovsky-bros.com/download/jxom.zip"
  xpath-default-namespace="http://www.nesterovsky-bros.com/download/jxom.zip"
  exclude-result-prefixes="xs t p">

  <xsl:output method="text" indent="yes"/>

  <!-- Entry point. -->
  <xsl:template match="/">
    <xsl:variable name="expression" as="element()">
      <lt>
        <sub>
          <mul>
            <var name="b"/>
            <var name="b"/>
          </mul>
          <mul>
            <mul>
              <int>4</int>
              <var name="a"/>
            </mul>
            <var name="c"/>
          </mul>
        </sub>
        <double>0</double>
      </lt>
    </xsl:variable>

    <xsl:value-of select="t:get-expression($expression)" separator=""/>
  </xsl:template>

  <!--
    Gets expression.
      $element - expression element.
      Returns expression tokens.
  -->
  <xsl:function name="t:get-expression" as="item()*">
    <xsl:param name="element" as="element()"/>

    <xsl:apply-templates mode="p:expression" select="$element"/>
  </xsl:function>

  <!--
    Gets binary expression.
      $element - assignment expression.
      $type - expression type.
      Returns expression token sequence.
  -->
  <xsl:function name="p:get-binary-expression" as="item()*">
    <xsl:param name="element" as="element()"/>
    <xsl:param name="type" as="xs:string"/>

    <xsl:sequence select="t:get-expression($element/*[1])"/>
    <xsl:sequence select="' '"/>
    <xsl:sequence select="$type"/>
    <xsl:sequence select="' '"/>
    <xsl:sequence select="t:get-expression($element/*[2])"/>
  </xsl:function>

  <!-- Mode "expression". Empty match. -->
  <xsl:template mode="p:expression" match="@*|node()">
    <xsl:sequence select="error(xs:QName('invalid-expression'), name())"/>
  </xsl:template>

  <!-- Mode "expression". or. -->
  <xsl:template mode="p:expression" match="or">
    <xsl:sequence select="p:get-binary-expression(., '||')"/>
  </xsl:template>

  <!-- Mode "expression". and. -->
  <xsl:template mode="p:expression" match="and">
    <xsl:sequence select="p:get-binary-expression(., '&amp;&amp;')"/>
  </xsl:template>

  <!-- Mode "expression". eq. -->
  <xsl:template mode="p:expression" match="eq">
    <xsl:sequence select="p:get-binary-expression(., '==')"/>
  </xsl:template>

  <!-- Mode "expression". ne. -->
  <xsl:template mode="p:expression" match="ne">
    <xsl:sequence select="p:get-binary-expression(., '!=')"/>
  </xsl:template>

  <!-- Mode "expression". le. -->
  <xsl:template mode="p:expression" match="le">
    <xsl:sequence select="p:get-binary-expression(., '&lt;=')"/>
  </xsl:template>

  <!-- Mode "expression". ge. -->
  <xsl:template mode="p:expression" match="ge">
    <xsl:sequence select="p:get-binary-expression(., '>=')"/>
  </xsl:template>

  <!-- Mode "expression". lt. -->
  <xsl:template mode="p:expression" match="lt">
    <xsl:sequence select="p:get-binary-expression(., '&lt;')"/>
  </xsl:template>

  <!-- Mode "expression". gt. -->
  <xsl:template mode="p:expression" match="gt">
    <xsl:sequence select="p:get-binary-expression(., '>')"/>
  </xsl:template>

  <!-- Mode "expression". add. -->
  <xsl:template mode="p:expression" match="add">
    <xsl:sequence select="p:get-binary-expression(., '+')"/>
  </xsl:template>

  <!-- Mode "expression". sub. -->
  <xsl:template mode="p:expression" match="sub">
    <xsl:sequence select="p:get-binary-expression(., '-')"/>
  </xsl:template>

  <!-- Mode "expression". mul. -->
  <xsl:template mode="p:expression" match="mul">
    <xsl:sequence select="p:get-binary-expression(., '*')"/>
  </xsl:template>

  <!-- Mode "expression". div. -->
  <xsl:template mode="p:expression" match="div">
    <xsl:sequence select="p:get-binary-expression(., '/')"/>
  </xsl:template>

  <!-- Mode "expression". neg. -->
  <xsl:template mode="p:expression" match="neg">
    <xsl:sequence select="'-'"/>
    <xsl:sequence select="t:get-expression(*[1])"/>
  </xsl:template>

  <!-- Mode "expression". not. -->
  <xsl:template mode="p:expression" match="not">
    <xsl:sequence select="'!'"/>
    <xsl:sequence select="t:get-expression(*[1])"/>
  </xsl:template>

  <!-- Mode "expression". parens. -->
  <xsl:template mode="p:expression" match="parens">
    <xsl:sequence select="'('"/>
    <xsl:sequence select="t:get-expression(*[1])"/>
    <xsl:sequence select="')'"/>
  </xsl:template>

  <!-- Mode "expression". var. -->
  <xsl:template mode="p:expression" match="var">
    <xsl:sequence select="@name"/>
  </xsl:template>

  <!-- Mode "expression". int, short, byte, long, float, double. -->
  <xsl:template mode="p:expression"
    match="int | short | byte | long | float | double">
    <xsl:sequence select="."/>
  </xsl:template>

 </xsl:stylesheet>
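To show the public side of this convention, here is a small sketch of a client stylesheet. It assumes the stylesheet above is saved as expression.xslt (a file name guessed from the private namespace uri); the sample expression tree is made up. The client declares only the public namespace, so it has no prefix with which to reach the p: members by accident.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:t="http://www.nesterovsky-bros.com/public"
  xmlns="http://www.nesterovsky-bros.com/download/jxom.zip"
  xpath-default-namespace="http://www.nesterovsky-bros.com/download/jxom.zip"
  exclude-result-prefixes="t">

  <!-- Assumed file name of the stylesheet shown above. -->
  <xsl:import href="expression.xslt"/>

  <xsl:output method="text"/>

  <!-- Has higher import precedence than the imported entry point. -->
  <xsl:template match="/">
    <xsl:variable name="expression" as="element()">
      <add>
        <var name="x"/>
        <int>1</int>
      </add>
    </xsl:variable>

    <!-- Only the public function is used; under these assumptions it should yield "x + 1". -->
    <xsl:value-of select="t:get-expression($expression)" separator=""/>
  </xsl:template>

</xsl:stylesheet>
```

Note that this is a convention rather than enforcement: nothing stops a client from declaring the private namespace too, but doing so has to be a deliberate act.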

Tuesday, 25 March 2008 06:23:30 UTC  #    Comments [0] -
Tips and tricks | xslt
# Monday, 03 March 2008

I often find that whenever I think of something, the idea has already been implemented somewhere.

A good example is xslt/xquery -> java code.

Well, the world is full of smart guys. :-)

Monday, 03 March 2008 18:08:21 UTC  #    Comments [0] -
xslt
# Thursday, 28 February 2008

Wow, I've found an article, Code generation in XSLT 2.0. The article is dated 2005.

Well, I was reinventing the bicycle. This is a good lesson for me.

I'm going to study the part about SQL Code Generation very carefully, as this is exactly the task I'm facing now.

Thursday, 28 February 2008 04:35:24 UTC  #    Comments [0] -
xslt
# Wednesday, 27 February 2008

I've updated jxom.zip.

There are minor fixes there. The most important addition is a line breaker. The purpose of the line breaker is to split long lines.

Long lines appear when there are verbose comments, or when there is a very long expression that was not categorized as multiline.

It's not perfect, but it looks acceptable.

Now I'm facing the next problem: I need to do for sql a job similar to the one I'm doing for java. Moreover, I need to support several dialects of sql. I'm not sure whether it's possible (or worthwhile) to define a single sql-xom.xsd, or whether I should define sql-db2-v9-xom.xsd, sql-sqlserver-2005-xom.xsd, ...

The bad news is that sql grammar is much more complex than that of java. Probably I'll start from some sql subset. In any case I do not consider generating sql "directly", as jxom fits its role remarkably well.

Wednesday, 27 February 2008 13:30:47 UTC  #    Comments [0] -
xslt
# Wednesday, 20 February 2008

While building the jxom stylesheets I've learned what "good" and "bad" recursion are from saxon's perspective.

I'm using control tokens $t:indent and $t:unindent to control indentation in the sequence of tokens defining java output. To build output lines I need to calculate the total indentation for each line. This can be done using a cumulative sum, considering $t:indent as +1 and $t:unindent as -1.

This task can be formalized as "calculate a cumulative integer sum".
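The mapping step can be sketched as follows. This is only an assumption about how tokens might be turned into +1/-1 values, not the actual jxom code: the function name t:indent-deltas is made up, and the control tokens are assumed to be xs:QName values.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com"
  exclude-result-prefixes="xs t">

  <xsl:output method="text"/>

  <!-- Control tokens (assumed here to be xs:QName values). -->
  <xsl:variable name="t:indent" as="xs:QName" select="xs:QName('t:indent')"/>
  <xsl:variable name="t:unindent" as="xs:QName"
    select="xs:QName('t:unindent')"/>

  <xsl:template match="/">
    <xsl:variable name="tokens" as="item()*"
      select="'{', $t:indent, 'int a;', $t:unindent, '}'"/>

    <!-- The deltas are the input for a cumulative integer sum. -->
    <xsl:value-of select="t:indent-deltas($tokens)"/>
  </xsl:template>

  <!--
    Hypothetical helper: maps control tokens to indentation deltas.
      $tokens - a sequence of literal and control tokens.
      Returns +1 for $t:indent, -1 for $t:unindent, 0 for other control tokens.
  -->
  <xsl:function name="t:indent-deltas" as="xs:integer*">
    <xsl:param name="tokens" as="item()*"/>

    <xsl:sequence select="
      for $token in $tokens[. instance of xs:QName] return
        if ($token eq $t:indent) then 1
        else if ($token eq $t:unindent) then -1
        else 0"/>
  </xsl:function>

</xsl:stylesheet>
```

Literal tokens are filtered out before the comparison because eq is not defined between a string and a QName. A real streamer also has to remember which line each delta belongs to, which this sketch leaves out.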

The first approach I've tested is non recursive: "for $i in 1 to count($items) return sum(subsequence($items, 1, $i))".
It is incredibly slow, as it sums the whole prefix again for every item.

The next try was recursive: calculate and emit results as they are calculated.
This is the "crash fast" method. Saxon does implement this as recursion and hits the stack limit early.

The last approach employs saxon's ability to detect a particular flavour of tail calls. When a function contains a tail call, and the output on the tail call code path consists of this tail call only, then saxon transforms such a construct into a loop. Thus I need to accumulate the result, pass it down the tail call chain, and output it only at the last opportunity.

The following sample shows this technique:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com"
  exclude-result-prefixes="xs t">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <xsl:variable name="values" as="xs:integer*" select="1 to 10000"/>

    <result>
      <sum>
        <xsl:value-of select="t:cumulative-integer-sum($values)"/>

        <!-- This call crashes with stack overflow. -->
        <!-- <xsl:value-of select="t:bad-cumulative-integer-sum($values)"/> -->

        <!-- To compare speed uncomment following lines. -->
        <!--<xsl:value-of select="sum(t:cumulative-integer-sum($values))"/>-->
        <!--<xsl:value-of select="sum(t:slow-cumulative-integer-sum($values))"/>-->
      </sum>
    </result>
  </xsl:template>

  <!--
    Calculates cumulative sum of integer sequence.
      $items - input integer sequence.
      Returns an integer sequence that is a cumulative sum of original sequence.
  -->
  <xsl:function name="t:cumulative-integer-sum" as="xs:integer*">
    <xsl:param name="items" as="xs:integer*"/>

    <xsl:sequence select="t:cumulative-integer-sum-impl($items, 1, 0, ())"/>
  </xsl:function>

  <!--
    Implementation of the t:cumulative-integer-sum.
      $items - input integer sequence.
      $index - current iteration index.
      $sum - base sum.
      $result - collected result.
      Returns an integer sequence that is a cumulative sum of original sequence.
  -->
  <xsl:function name="t:cumulative-integer-sum-impl" as="xs:integer*">
    <xsl:param name="items" as="xs:integer*"/>
    <xsl:param name="index" as="xs:integer"/>
    <xsl:param name="sum" as="xs:integer"/>
    <xsl:param name="result" as="xs:integer*"/>

    <xsl:variable name="item" as="xs:integer?" select="$items[$index]"/>

    <xsl:choose>
      <xsl:when test="empty($item)">
        <xsl:sequence select="$result"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="value" as="xs:integer" select="$item + $sum"/>
        <xsl:variable name="next" as="xs:integer+" select="$result, $value"/>

        <xsl:sequence select="
          t:cumulative-integer-sum-impl($items, $index + 1, $value, $next)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <!-- "Bad" implementation of the cumulative-integer-sum. -->
  <xsl:function name="t:bad-cumulative-integer-sum" as="xs:integer*">
    <xsl:param name="items" as="xs:integer*"/>

    <xsl:sequence select="t:bad-cumulative-integer-sum-impl($items, 1, 0)"/>
  </xsl:function>

  <!-- "Bad" implementation of the cumulative-integer-sum. -->
  <xsl:function name="t:bad-cumulative-integer-sum-impl" as="xs:integer*">
    <xsl:param name="items" as="xs:integer*"/>
    <xsl:param name="index" as="xs:integer"/>
    <xsl:param name="sum" as="xs:integer"/>

    <xsl:variable name="item" as="xs:integer?" select="$items[$index]"/>

    <xsl:if test="exists($item)">
      <xsl:variable name="value" as="xs:integer" select="$item + $sum"/>
 
      <xsl:sequence select="$value"/>
      <xsl:sequence select="
        t:bad-cumulative-integer-sum-impl($items, $index + 1, $value)"/>
    </xsl:if>
  </xsl:function>

 <!-- Non recursive implementation of the cumulative-integer-sum. -->
 <xsl:function name="t:slow-cumulative-integer-sum" as="xs:integer*">
   <xsl:param name="items" as="xs:integer*"/>

   <xsl:sequence select="
     for $i in 1 to count($items) return
       sum(subsequence($items, 1, $i))"/>
 </xsl:function>

</xsl:stylesheet>

Wednesday, 20 February 2008 08:59:22 UTC  #    Comments [0] -
xslt
# Tuesday, 19 February 2008

Comparing xslt 2.0 with its predecessor I see a great evolution of the language. There are, however, parts of the language that are not as good as they could be.

Look at manipulations of a sequence of sequences of items. The xpath 2.0/xquery 1.0 type system treats type quantifiers separately from the type itself. One can declare a variable of type "xs:string", or a variable of type sequence of strings, "xs:string*". Unfortunately it's not possible to declare a sequence of sequences of strings, "xs:string**", as a type can have only one quantifier.

I think this is wrong. People use different tricks to remedy the problem. Typically one builds nodes that contain copies of the items of the sequences. Clearly this is a heavy way to achieve a simple result; moreover, it does not preserve item identity.

In jxom I'm using a different solution to store a sequence of sequences, namely storing all sequences in one, separated with a terminator.

A typical sample is in the java serializer. After building a method's parameters I should format them one way (compact) or the other (verbose), depending on a decision that can be made only when all parameters are already built.

To see how it works please look at the following xslt:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com"
  exclude-result-prefixes="xs t">

  <xsl:output method="xml" indent="yes"/>

  <!-- Terminator token. -->
  <xsl:variable name="t:terminator" as="xs:QName"
    select="xs:QName('t:terminator')"/>

  <!-- New line. -->
  <xsl:variable name="t:crlf" as="xs:string" select="'&#10;'"/>

  <xsl:template match="/">
    <!--
      We need to manipulate a sequence of sequence of tokens.
      To do this we use $t:terminator to separate sequences.
    -->
    <xsl:variable name="short-items" as="item()*">
      <xsl:sequence select="t:get-param('int', 'a')"/>
      <xsl:sequence select="$t:terminator"/>

      <xsl:sequence select="t:get-param('int', 'b')"/>
      <xsl:sequence select="$t:terminator"/>

      <xsl:sequence select="t:get-param('int', 'c')"/>
      <xsl:sequence select="$t:terminator"/>
    </xsl:variable>

    <xsl:variable name="long-items" as="item()*">
      <xsl:sequence select="t:get-param('int', 'a')"/>
      <xsl:sequence select="$t:terminator"/>

      <xsl:sequence select="t:get-param('int', 'b')"/>
      <xsl:sequence select="$t:terminator"/>

      <xsl:sequence select="t:get-param('int', 'c')"/>
      <xsl:sequence select="$t:terminator"/>

      <xsl:sequence select="t:get-param('int', 'd')"/>
      <xsl:sequence select="$t:terminator"/>
    </xsl:variable>

    <result>
      <short>
        <xsl:value-of select="t:format($short-items)" separator=""/>
      </short>
      <long>
        <xsl:value-of select="t:format($long-items)" separator=""/>
      </long>
    </result>
  </xsl:template>

  <!--
    Returns a sequence of tokens that defines a parameter.
      $type - parameter type.
      $name - parameter name.
      Returns sequence of parameter tokens.
  -->
  <xsl:function name="t:get-param" as="item()*">
    <xsl:param name="type" as="xs:string"/>
    <xsl:param name="name" as="xs:string"/>

    <xsl:sequence select="$type"/>
    <xsl:sequence select="' '"/>
    <xsl:sequence select="$name"/>
  </xsl:function>

  <!--
    Format sequence of sequence of tokens separated with $t:terminator.
      $tokens - sequence of sequence of tokens to format.
      Returns formatted sequence of tokens.
  -->
  <xsl:function name="t:format" as="item()*">
    <xsl:param name="tokens" as="item()*"/>

    <xsl:variable name="terminators" as="xs:integer+"
      select="0, index-of($tokens, $t:terminator)"/>
    <xsl:variable name="count" as="xs:integer"
      select="count($terminators) - 1"/>
    <xsl:variable name="verbose" as="xs:boolean"
      select="$count > 3"/>

    <xsl:sequence select="
      for $i in 1 to $count return
      (
        subsequence
        (
          $tokens,
          $terminators[$i] + 1,
          $terminators[$i + 1] - $terminators[$i] - 1
        ),
        if ($i = $count) then ()
        else
        (
          ',',
          if ($verbose) then $t:crlf else ' '
        )
      )"/>
  </xsl:function>

</xsl:stylesheet>

Tuesday, 19 February 2008 07:54:11 UTC  #    Comments [0] -
xslt
# Monday, 18 February 2008

I've updated jxom.zip. Now it supports qualified type name optimization.

I need to mention that this optimization is only possible when imports do not contain wildcard declarations like:

import a.b.*;

The only important thing left to do is a good line breaker.

Monday, 18 February 2008 09:28:34 UTC  #    Comments [0] -
xslt

Is it possible to call a function indirectly in xslt 2.0?

The answer is yes; however, the implementation uses a dull trick of template matching to select a function handler. Template matching is a beautiful thing. Definitely it was not devised to make this trick possible.

The following example defines two functions, t:sum and t:count, to be called indirectly by t:test.
A function id (a.k.a. function pointer) is defined by the $t:sum and $t:count variables.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com"
  exclude-result-prefixes="xs t">

<xsl:output method="xml" indent="yes"/>

<xsl:template match="/">
  <xsl:variable name="items" as="element()*">
    <value>1</value>
    <value>2</value>
    <value>3</value>
    <value>4</value>
    <value>5</value>
  </xsl:variable>

  <root>
    <sum>
      <xsl:sequence select="t:test($items, $t:sum)"/>
    </sum>
    <count>
      <xsl:sequence select="t:test($items, $t:count)"/>
    </count>
  </root>
</xsl:template>

<!-- Mode "t:function-call". Default match. -->
<xsl:template mode="t:function-call" match="@* | node()">
 <xsl:sequence select="
   error
   (
     xs:QName('invalid-call'),
     concat('Unbound function call. Id: ', name())
   )"/>
</xsl:template>

<!-- Id of the function t:sum. -->
<xsl:variable name="t:sum" as="item()">
 <t:sum/>
</xsl:variable>

<!-- Mode "t:function-call". t:sum handler. -->
<xsl:template mode="t:function-call" match="t:sum">
  <xsl:param name="items" as="element()*"/>

  <xsl:sequence select="t:sum($items)"/>
</xsl:template>

<!--
  Calculates a sum of elements.
    $items - items to sum.
    Returns sum of element values.
-->
<xsl:function name="t:sum" as="xs:integer">
  <xsl:param name="items" as="element()*"/>

  <xsl:sequence select="sum($items/xs:integer(.))"/>
</xsl:function>

<!-- Id of the function t:count. -->
<xsl:variable name="t:count" as="item()">
  <t:count/>
</xsl:variable>

<!-- Mode "t:function-call". t:count handler. -->
<xsl:template mode="t:function-call" match="t:count">
  <xsl:param name="items" as="element()*"/>

  <xsl:sequence select="t:count($items)"/>
</xsl:template>

<!--
  Calculates the number of elements in a sequence.
    $items - items to count.
    Returns count of element values.
-->
<xsl:function name="t:count" as="xs:integer">
 <xsl:param name="items" as="element()*"/>

 <xsl:sequence select="count($items)"/>
</xsl:function>

<!--
  A function that performs indirect call.
    $items - items to pass to an indirect call.
    $function-id - a function id.
    Returns a value calculated in the indirect function.
-->
<xsl:function name="t:test" as="xs:integer">
 <xsl:param name="items" as="element()*"/>
 <xsl:param name="function-id" as="item()"/>

 <xsl:variable name="result" as="xs:integer">
   <xsl:apply-templates mode="t:function-call" select="$function-id">
     <xsl:with-param name="items" select="$items"/>
   </xsl:apply-templates>
 </xsl:variable>

 <xsl:sequence select="$result"/>
</xsl:function>

</xsl:stylesheet>

Monday, 18 February 2008 05:53:46 UTC  #    Comments [0] -
xslt
# Saturday, 16 February 2008

Hello again!

To see the first part about jxom, please read the earlier post.

I'm back with jxom (Java xml object model). I've finally managed to create an xslt that generates java code from a jxom document.

You may ask why it took as long as a week to produce it.

There are two answers:
1. My poor talents.
2. I've virtually created two implementations.

My first approach was to generate java text directly from xml. I was a true believer that this was the way. I screwed things up on that path: once you start dealing with indentation, formatting and reformatting of the text you're generating, you see that things are not that simple. Well, it was a naive approach.

I could have finished it; however, at some point I realized that its complexity is not composed of the complexity of its parts, but grows more and more. This is not acceptable for such a simple task. The approach is bad. Period.

The alternative I've devised is simple and in fact more natural than the naive approach. This is a two-stage generation:
  a) generate sequence of tokens - serializer;
  b) generate and then print a sequence of lines - streamer.

Tokens (item()*) are either control words (xs:QName), or literals (xs:string).

I've defined following control tokens:

| Token | Description |
| --- | --- |
| t:indent | indents following content. |
| t:unindent | unindents following content. |
| t:line-indent | resets indentation for one line. |
| t:new-line | new line token. |
| t:terminator | separates token sequences. |
| t:code | marks line as code (default line type). |
| t:doc | marks line as documentation comment. |
| t:begin-doc | marks line as begin of documentation comment. |
| t:end-doc | marks line as end of documentation comment. |
| t:comment | marks line as comment. |

Thus an input for the streamer looks like:

<xsl:sequence select="'public'"/>
<xsl:sequence select="' '"/>
<xsl:sequence select="'class'"/>
<xsl:sequence select="' '"/>
<xsl:sequence select="'A'"/>
<xsl:sequence select="$t:new-line"/>
<xsl:sequence select="'{'"/>
<xsl:sequence select="$t:new-line"/>
<xsl:sequence select="$t:indent"/>
<xsl:sequence select="'public'"/>
<xsl:sequence select="' '"/>
<xsl:sequence select="'int'"/>
<xsl:sequence select="' '"/>
<xsl:sequence select="'a'"/>
<xsl:sequence select="';'"/>
<xsl:sequence select="$t:unindent"/>
<xsl:sequence select="$t:new-line"/>
<xsl:sequence select="'}'"/>
<xsl:sequence select="$t:new-line"/>

The streamer receives a sequence of tokens and transforms it into a sequence of lines.

One beautiful thing about tokens is that the streamer can easily perform line breaks in order to keep the page width. Another convenient thing is that the code generating tokens need not track the indentation level, as it just uses the t:indent and t:unindent control tokens to increase and decrease the current indentation.

The way the code is built allows mimicking any code style. I've followed my favorite one. In the future I'll probably add options controlling code style. In my todo list there are still several features I want to implement, such as a line breaker to preserve page width, and a type qualification optimizer (an optional feature) to reduce unnecessary type qualifications.

Current implementation can be found at jxom.zip. It contains:

| File | Description |
| --- | --- |
| java.xsd | jxom xml schema. |
| java-serializer-main.xslt | transformation entry point. |
| java-serializer.xslt | generates tokens for top level constructs. |
| java-serializer-statements.xslt | generates tokens for statements. |
| java-serializer-expressions.xslt | generates tokens for expressions. |
| java-streamer.xslt | converts tokens into lines. |
| DataAdapter.xml | sample jxom document. |

This was my first experience with xslt 2.0. I feel very pleased with what it can do. The only missing feature is an indirect function call (which I do not want to model with the dull template matching approach).

Note that although the xslt I've built is platform independent, I was experimenting with saxon 9. Several times I've relied on its efficient tail call implementation (see t:cumulative-integer-sum); without it the xslt would run into stack overflow.

I shall be pleased to see your feedback on the subject.

Saturday, 16 February 2008 10:42:16 UTC  #    Comments [7] -
Tips and tricks | xslt
# Saturday, 09 February 2008

Hello,

I have not written for a long time. IMHO: if you have nothing to say, do not make noise!

Nowadays I'm busy with xslt.

Should I be pleased that the w3c committee has finally delivered xpath 2.0/xslt 2.0/xquery? There possibly were people who did not live to see it happen. Be grateful to fate that we have survived!

I'm working now with saxon 9. It's a good implementation, though too interpreter-like in my opinion. I think these languages could be compiled down to machine/vm code the same way as c++/java/c# are.

To the point.
I need to generate java code in xslt. I've done this earlier; that time I dealt with relatively simple templates like beans or interfaces. Now I need to generate beans, interfaces, and classes with logic. In fact I should cover almost all java 6 features.

Immediately I started thinking in terms of a java xml object model (jxom). Thus there will be an xml schema of jxom (am I inventing a bicycle? I pray you to point me to an existing schema!): java grammar as xml. There will be xslts that generate code according to this schema, and an xslt that will serialize jxom documents directly into java.

This two-stage generation is important, as there are essentially two different tasks: generating java code, and serializing it down to a text format. Moreover, whenever I have a jxom document I can manipulate it! And finally this will allow our team to concentrate its efforts, as one only has to generate a jxom document.

Yesterday I found a java ANTLR grammar and converted it into an xml schema: java.xsd. It is important to have this xml schema defined, even if no one uses it except in an editor, as it makes jxom generation more formal.

The next step is to create an xslt serializer, which is in the todo list.

To get a feel for how jxom looks, I've created it manually for a simple java file:

// $Id: DataAdapter.java 1122 2007-12-31 12:43:47Z arthurn $
package com.bphx.coolgen.data;

import java.util.List;

/**
* Encapsulates encyclopedia database access.
*/

public interface DataAdapter
{
  /**
   * Starts data access session for a specified model.
   * @param modelId - a model to open.
   */

  void open(int modelId)
    throws Exception;

  /**
   * Ends data access session.
   */

  void close()
   throws Exception;

  /**
   * Gets current model id.
   * @return current model id.
   */

  int getModelId();

  /**
   * Gets data objects for a specified object type for the current model.
   * @param type - an object type to get data objects for.
   * @return list of data objects.
   */

  List<DataObject> getObjectsForType(short type)
    throws Exception;

  /**
   * Gets a list of data associations for an object id.
   * @param id - object id.
   * @return list of data associations.
   */

  List<DataAssociation> getAssociations(int id)
    throws Exception;

  /**
   * Gets a list of data properties for an object id.
   * @param id - object id.
   * @return list of data properties.
   */

  List<DataProperty> getProperties(int id)
    throws Exception;
}

jxom:

<unit xmlns="http://www.bphx.com/java-1.5/2008-02-07" package="com.bphx.coolgen.data">
  <comment>$Id: DataAdapter.java 1122 2007-12-31 12:43:47Z arthurn $</comment>
  <import package="java.util.List"/>
  <interface access="public" name="DataAdapter">
    <comment doc="true">Encapsulates encyclopedia database access.</comment>
    <method name="open">
      <comment doc="true">
        Starts data access session for a specified model.
        <para type="param" name="modelId">a model to open.</para>
      </comment>
      <parameters>
        <parameter name="modelId"><type name="int"/></parameter>
      </parameters>
      <throws><type name="Exception"/></throws>
    </method>
    <method name="close">
      <comment doc="true">Ends data access session.</comment>
      <throws><type name="Exception"/></throws>
    </method>
    <method name="getModelId">
      <comment doc="true">
        Gets current model id.
        <para type="return">current model id.</para>
      </comment>
      <returns><type name="int"/></returns>
    </method>
    <method name="getObjectsForType">
      <comment doc="true">
        Gets data objects for a specified object type for the current model.
        <para type="param" name="type">
          an object type to get data objects for.
        </para>
        <para type="return">list of data objects.</para>
      </comment>
      <returns>
        <type>
          <part name="List">
            <typeArgument><type name="DataObject"/></typeArgument>
          </part>
        </type>
      </returns>
      <parameters>
        <parameter name="type"><type name="short"/></parameter>
      </parameters>
      <throws><type name="Exception"/></throws>
    </method>
    <method name="getAssociations">
      <comment doc="true">
        Gets a list of data associations for an object id.
        <para type="param" name="id">object id.</para>
        <para type="return">list of data associations.</para>
      </comment>
      <returns>
        <type>
          <part name="List">
            <typeArgument><type name="DataAssociation"/></typeArgument>
          </part>
        </type>
      </returns>
      <parameters>
        <parameter name="id"><type name="int"/></parameter>
      </parameters>
      <throws><type name="Exception"/></throws>
    </method>
    <method name="getProperties">
      <comment doc="true">
        Gets a list of data properties for an object id.
        <para type="param" name="id">object id.</para>
        <para type="return">list of data properties.</para>
      </comment>
      <returns>
        <!-- Compact form of generic type. -->
        <type name="List&lt;DataProperty>"/>
      </returns>
      <parameters>
        <parameter name="id"><type name="int"/></parameter>
      </parameters>
      <throws><type name="Exception"/></throws>
    </method>
  </interface>
</unit>
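
The split between producing a model and serializing it can also be sketched outside xslt. The following is only an illustration in C++, not the jxom serializer: the structs `Parameter`, `Method`, `Interface` and the functions `serialize` and `make_data_adapter` are hypothetical names, and the model covers just a tiny part of what jxom describes (no comments, imports or generics).

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a jxom document.
struct Parameter {
    std::string type;
    std::string name;
};

struct Method {
    std::string returns;                  // empty means void
    std::string name;
    std::vector<Parameter> parameters;
    std::vector<std::string> exceptions;  // types listed after "throws"
};

struct Interface {
    std::string package;
    std::string name;
    std::vector<Method> methods;
};

// Stage two: turn the model into java text.
// It knows nothing about how the model was produced.
std::string serialize(const Interface& unit) {
    std::ostringstream out;
    out << "package " << unit.package << ";\n\n";
    out << "public interface " << unit.name << "\n{\n";
    for (const Method& method : unit.methods) {
        out << "  " << (method.returns.empty() ? "void" : method.returns)
            << " " << method.name << "(";
        for (std::size_t i = 0; i < method.parameters.size(); ++i) {
            if (i != 0) out << ", ";
            out << method.parameters[i].type << " " << method.parameters[i].name;
        }
        out << ")";
        for (std::size_t i = 0; i < method.exceptions.size(); ++i) {
            out << (i == 0 ? " throws " : ", ") << method.exceptions[i];
        }
        out << ";\n";
    }
    out << "}\n";
    return out.str();
}

// Stage one: a generator only has to build the model.
Interface make_data_adapter() {
    Interface unit{"com.bphx.coolgen.data", "DataAdapter", {}};
    unit.methods.push_back(
        Method{"", "open", {Parameter{"int", "modelId"}}, {"Exception"}});
    unit.methods.push_back(Method{"int", "getModelId", {}, {}});
    return unit;
}
```

Because `serialize` depends only on the model, the model can be inspected or changed between the two stages, which is the point of keeping them separate.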

To read about xslt for jxom please follow this link.

Saturday, 09 February 2008 17:56:45 UTC  #    Comments [3] -
Tips and tricks | xslt
# Sunday, 13 January 2008

We have set up a chess blog that our brother Aleksandr, who is a chess fan, will maintain.

Sunday, 13 January 2008 07:17:23 UTC  #    Comments [0] -

# Monday, 12 March 2007
C++ Standard Library Issues List, Issue 254

I have been tracking this issue for several years, and have my own humble opinion. To make my arguments clear I'll quote the issue description here.

254. Exception types in clause 19 are constructed from std::string

Section: 19.1 [std.exceptions], 27.4.2.1.1 [ios::failure] Status: Tentatively Ready Submitter: Dave Abrahams Date: 2000-08-01

Discussion:

Many of the standard exception types which implementations are required to throw are constructed with a const std::string& parameter. For example:

     19.1.5  Class out_of_range                          [lib.out.of.range]
     namespace std {
       class out_of_range : public logic_error {
       public:
         explicit out_of_range(const string& what_arg);
       };
     }

   1 The class out_of_range defines the type of objects  thrown  as  excep-
     tions to report an argument value not in its expected range.

     out_of_range(const string& what_arg);

     Effects:
       Constructs an object of class out_of_range.
     Postcondition:
       strcmp(what(), what_arg.c_str()) == 0.

There are at least two problems with this:

  1. A program which is low on memory may end up throwing std::bad_alloc instead of out_of_range because memory runs out while constructing the exception object.
  2. An obvious implementation which stores a std::string data member may end up invoking terminate() during exception unwinding because the exception object allocates memory (or rather fails to) as it is being copied.

There may be no cure for (1) other than changing the interface to out_of_range, though one could reasonably argue that (1) is not a defect. Personally I don't care that much if out-of-memory is reported when I only have 20 bytes left, in the case when out_of_range would have been reported. People who use exception-specifications might care a lot, though.

There is a cure for (2), but it isn't completely obvious. I think a note for implementors should be made in the standard. Avoiding possible termination in this case shouldn't be left up to chance. The cure is to use a reference-counted "string" implementation in the exception object. I am not necessarily referring to a std::string here; any simple reference-counting scheme for a NTBS would do.

Further discussion, in email:

...I'm not so concerned about (1). After all, a library implementation can add const char* constructors as an extension, and users don't need to avail themselves of the standard exceptions, though this is a lame position to be forced into. FWIW, std::exception and std::bad_alloc don't require a temporary basic_string.

...I don't think the fixed-size buffer is a solution to the problem, strictly speaking, because you can't satisfy the postcondition
  strcmp(what(), what_arg.c_str()) == 0
For all values of what_arg (i.e. very long values). That means that the only truly conforming solution requires a dynamic allocation.

Further discussion, from Redmond:

The most important progress we made at the Redmond meeting was realizing that there are two separable issues here: the const string& constructor, and the copy constructor. If a user writes something like throw std::out_of_range("foo"), the const string& constructor is invoked before anything gets thrown. The copy constructor is potentially invoked during stack unwinding.

The copy constructor is a more serious problem, because failure during stack unwinding invokes terminate. The copy constructor must be nothrow. Curaçao: Howard thinks this requirement may already be present.

The fundamental problem is that it's difficult to get the nothrow requirement to work well with the requirement that the exception objects store a string of unbounded size, particularly if you also try to make the const string& constructor nothrow. Options discussed include:

  • Limit the size of a string that exception objects are required to throw: change the postconditions of 19.1.2 [domain.error] paragraph 3 and 19.1.6 [runtime.error] paragraph 3 to something like this: "strncmp(what(), what_arg.c_str(), N) == 0, where N is an implementation defined constant no smaller than 256".
  • Allow the const string& constructor to throw, but not the copy constructor. It's the implementor's responsibility to get it right. (An implementor might use a simple refcount class.)
  • Compromise between the two: an implementation is not allowed to throw if the string's length is less than some N, but, if it doesn't throw, the string must compare equal to the argument.
  • Add a new constructor that takes a const char*

(Not all of these options are mutually exclusive.)

...

To be honest, I do not understand the committee members' decisions. It seems they are trying to hide from the problem by, in effect, proposing to store a character buffer in the exception object. In fact the problem is more general: it applies to any exception type that stores data and can throw during copy construction. How can problems during copy construction be avoided? Well, do not perform any activity that can lead to an exception. If copying data can throw, then do not copy it! Thus we have to share the data between exception objects.

This logic brought me to a safe exception type design: an exception object should keep a refcounted handle to a data object that is shared between copies of the exception.
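
A minimal sketch of such a design follows. It assumes C++11's `std::shared_ptr` as the refcounted handle (which postdates this post; a hand-written refcount or a library smart pointer would play the same role), and `shared_message_error` is a hypothetical name, not a standard type. The constructor taking `const std::string&` may still throw `std::bad_alloc`, but it runs before anything is thrown; copying only increments a reference count.

```cpp
#include <cassert>
#include <cstring>
#include <exception>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical exception type: the message lives in a shared, immutable
// buffer, so copying the exception never allocates.
class shared_message_error : public std::exception {
public:
    // May throw std::bad_alloc, but this happens before the throw itself.
    explicit shared_message_error(const std::string& what_arg)
        : message_(std::make_shared<const std::string>(what_arg)) {}

    // The string is stored whole, so the postcondition
    // strcmp(what(), what_arg.c_str()) == 0 holds for any length.
    const char* what() const noexcept override { return message_->c_str(); }

private:
    std::shared_ptr<const std::string> message_;
};

// The implicit copy constructor copies only the handle and cannot throw.
static_assert(std::is_nothrow_copy_constructible<shared_message_error>::value,
              "copying must not throw");

// Usage: both objects refer to the same buffer after a copy.
void demo() {
    const std::string text = "index 7 out of range";
    shared_message_error original(text);
    shared_message_error copy(original);
    assert(std::strcmp(copy.what(), text.c_str()) == 0);
    assert(original.what() == copy.what());
}
```

The same idea extends to exception types carrying data other than a message: the payload goes behind the shared handle, and the exception object itself stays trivially cheap to copy.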

The only question is: why didn't they even consider this way?

Monday, 12 March 2007 09:52:09 UTC  #    Comments [0] -
Tips and tricks
Disclaimer
The opinions expressed herein are our own personal opinions and do not represent our employer's view in anyway.

© 2024, Nesterovsky bros