# Thursday, 18 November 2010

Michael Kay, author of the Saxon xslt processor, inspired by the ideas behind GWT, has decided to compile Saxon HE into javascript. See Compiling Saxon using GWT.

The resulting script is about 1MB in size.

But lately we have been thinking that it's overkill to bring a whole xslt engine to the client, when it's possible to generate javascript from xslt the same way as he's building java from xquery. This will probably require some runtime, but of a much smaller size.

Thursday, 18 November 2010 16:19:52 UTC  #    Comments [0] -
Tips and tricks | xslt
# Tuesday, 09 November 2010

Search at www.google.fr: An empty sequence is not allowed as the @select attribute of xsl:analyze-string

That's a known issue. See Bug 7976.

In xslt 2.0 you should either check the value before using xsl:analyze-string, or wrap it in a string() call.

The problem is addressed in xslt 3.0.

Tuesday, 09 November 2010 10:11:45 UTC  #    Comments [0] -
Tips and tricks | xslt
# Sunday, 07 November 2010

michaelhkay: Saxon 9.3 has been out for 8 days: only two bugs so far, one found by me. I think that's a record.

Not necessarily. We, for example, who use Saxon HE, have found nothing new in Saxon 9.3, while we expected to see xslt 3.0. Disappointed. No actual reason to migrate.

P.S. We were among the first to find early bugs in previous releases.

Sunday, 07 November 2010 09:07:11 UTC  #    Comments [0] -
Thinking aloud | xslt
# Tuesday, 02 November 2010

Reading the individual papers of the C++ WG, you can find the following one:

N3174 (10-0164): To move or not to move, Bjarne Stroustrup, 2010-10-17, mailing 2010-10, Core

There, Bjarne Stroustrup thinks about issues with implicitly generated copy and move operations in C++.

It's always a pleasure to see how one can deal with a problem burdened with antagonisms. To advance his position, Bjarne skilfully uses not only rational but also emotional argumentation:

...We may deem this “bad code that deserves to be broken” or “unrealistic”, but this example demonstrates that the problem with a generated move has an exact counterpart for copy (which we have lived with for 27 years)...

...In 1984, I missed the chance to protect us against copy and we have lived with the problems ever since. I should have instituted some rule along the lines “if a class has a destructor, no copy operations are generated” or “if a class has a pointer member, no copy operations are generated.”...

It's impossible to recall these numbers without shivering. :-)

Tuesday, 02 November 2010 10:16:08 UTC  #    Comments [0] -
Thinking aloud

We're following w3's "Bug 9069 - Function to invoke an XSLT transformation".

There, people argue about an xpath API to invoke xslt transformations. The function should look roughly like this:

transform
(
  $node-tree as node()?,
  $stylesheet as item(),
  $parameters as XXX
) as node()

The discussion revolves around the last argument: $parameters as XXX. Should it be an xml element describing parameters, a function returning values for parameter names, or some new type modelling an immutable map?

What is most interesting in this discussion is the leak about plans to introduce a map type:

Comment 7 Michael Kay, 2010-09-14 22:46:58 UTC

We're currently talking about adding an immutable map to XSLT as a new data type (the put operation would return a new map). There appear to be a number of possible efficient implementations. It would be ideally suited for this purpose, because unlike the mechanism used for serialization parameters, the values can be any data type (including nodes), not only strings.

There is hope that a map will finally appear in xslt!
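To illustrate what "the put operation would return a new map" means, here is a minimal Java sketch. The class name ImmutableMap and the copy-on-write approach are our assumptions for illustration only; an efficient implementation would share structure between versions (for example, a balanced tree) rather than copy all entries.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a map whose put() never changes the receiver.
final class ImmutableMap<K, V> {
  private final Map<K, V> entries;

  private ImmutableMap(Map<K, V> entries) {
    this.entries = Collections.unmodifiableMap(entries);
  }

  static <K, V> ImmutableMap<K, V> empty() {
    return new ImmutableMap<K, V>(new HashMap<K, V>());
  }

  // Returns a new map containing the extra entry.
  // Naive O(n) copy, chosen for clarity rather than efficiency.
  ImmutableMap<K, V> put(K key, V value) {
    Map<K, V> copy = new HashMap<K, V>(entries);
    copy.put(key, value);
    return new ImmutableMap<K, V>(copy);
  }

  V get(K key) {
    return entries.get(key);
  }

  int size() {
    return entries.size();
  }
}
```

Since values are plain objects here, a parameter value can be of any type, which is the property the quoted comment stresses.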

See also:
Bug 5630 - [DM] Tuples and maps,
Tuples and maps - Status: CLOSED, WONTFIX,
Map, based on immutable trees,
Maps in exslt2?

Tuesday, 02 November 2010 08:34:52 UTC  #    Comments [0] -
Thinking aloud | xslt
# Monday, 01 November 2010

Historically jxom was developed first, and as such exhibited some imperfections in its xml schema. csharpxom has taken jxom's problems into account.

Unfortunately we could not easily fix jxom as a great amount of code already uses it. In this refactoring we tried to be conservative, and have changed only "type" and "import" xml schema elements in java.xsd.

Consider type reference and package import constructs in the old schema:

<!-- import java.util.ArrayList; -->
<import name="java.util.ArrayList"/>

<!-- java.util.ArrayList<java.math.BigDecimal> -->
<type package="java.util">
  <part name="ArrayList">
    <argument>
      <type name="BigDecimal" package="java.math"/>
    </argument>
  </part>
</type>

<!-- my.Parent.Nested -->
<type package="my">
  <part name="Parent"/>
  <part name="Nested"/>
</type>

Here we can observe that:

  • a type is referred to by a qualified name in the import element;
  • a type has two forms: a simple one (see BigDecimal), and another one for nested or generic types (see ArrayList).

We have made it more consistent in the updated jxom:

<!-- import java.util.ArrayList; -->
<import>
  <type name="ArrayList" package="java.util"/>
</import>

<!-- java.util.ArrayList<java.math.BigDecimal> -->
<type name="ArrayList" package="java.util">
  <argument>
    <type name="BigDecimal" package="java.math"/>
  </argument>
</type>

<!-- my.Parent.Nested -->
<type name="Nested">
  <type name="Parent" package="my"/>
</type>

We hope that you will not be impacted very much by this fix.

Please refresh Languages XOM from languages-xom.zip.

P.S. We have also included an xml schema and an xslt api to generate ASPX (see Xslt serializer for ASPX output). In fact, in our projects we generate aspx documents with embedded csharpxom, and then pass them through a two-stage transformation.

Monday, 01 November 2010 15:48:19 UTC  #    Comments [0] -
Announce | xslt
# Friday, 22 October 2010

In the previous post we announced an API to parse a COBOL source into cobolxom.

We exploited the incremental parser to build a grammar xml tree and then were planning to create an xslt transformation to generate cobolxom.

Now, we would like to declare that such xslt is ready.

At present all standard COBOL constructs are supported, but more tests are required. Preprocessor support is still in the todo list.

You may peek at examples of COBOL:

Cobol grammar:

And cobolxom:

While we were building the grammar-to-cobolxom stylesheet, we asked ourselves whether the COBOL parsing could be done entirely in xslt. The answer is yes, so who knows, we might turn this task into a pure xslt one. :-)

Friday, 22 October 2010 13:24:31 UTC  #    Comments [0] -
Announce | Incremental Parser | Thinking aloud | xslt
# Monday, 18 October 2010

Recently we've seen code like this:

<xsl:variable name="a" as="element()?" select="..."/>
<xsl:variable name="b" as="element()?" select="..."/>

<xsl:apply-templates select="$a">
  <xsl:with-param name="b" tunnel="yes" as="element()" select="$b"/>
</xsl:apply-templates>

It fails with an error: "An empty sequence is not allowed as the value of parameter $b".

What is interesting is that the value of $a is an empty sequence, so the code could potentially work, provided the processor evaluated $a first and decided not to evaluate xsl:with-param.

Is the order of evaluation of @select and xsl:with-param specified by the standard, or is it implementation-defined?

We asked this question on xslt forum, and got the following answer:

The specification leaves this implementation-defined. Since the values of the parameters are the same for every node processed, it's a reasonably strategy for the processor to evaluate the parameters before knowing how many selected nodes there are, though I guess an even better strategy would be to do it lazily when the first selected node is found.

Well, that's the expected answer. This question will probably induce Michael Kay to introduce a small optimization into Saxon.
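The difference between the two strategies mentioned in the answer can be sketched in Java. This is only an analogy under our own assumptions: ParamEvaluation, applyEager, applyLazy and required are hypothetical names, a list of strings stands in for the selected nodes, and a Supplier stands in for the xsl:with-param expression.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical model of the two evaluation strategies.
final class ParamEvaluation {
  // Eager: the parameter is computed before the selection is inspected,
  // so a failing parameter fails even when nothing is selected.
  static <P> int applyEager(List<String> selected, Supplier<P> param) {
    P value = param.get();
    int processed = 0;
    for (String node : selected) {
      processed++; // a template would receive node and value here
    }
    return processed;
  }

  // Lazy: the parameter is computed once, when the first node is found.
  static <P> int applyLazy(List<String> selected, Supplier<P> param) {
    P value = null;
    boolean evaluated = false;
    int processed = 0;
    for (String node : selected) {
      if (!evaluated) {
        value = param.get();
        evaluated = true;
      }
      processed++; // a template would receive node and value here
    }
    return processed;
  }

  // Mimics as="element()": an empty value is a type error.
  static String required(String value) {
    if (value == null) {
      throw new IllegalStateException(
        "An empty sequence is not allowed as the value of parameter $b");
    }
    return value;
  }
}
```

With an empty selection, applyLazy never calls the supplier and returns 0, while applyEager throws. Both behaviours are allowed by the answer above, so the stylesheet should not rely on either.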

Monday, 18 October 2010 17:58:51 UTC  #    Comments [0] -
Tips and tricks | xslt
# Saturday, 09 October 2010

Some time ago we created an incremental parser, and now that we have decided to load COBOL sources directly into cobolxom (XML Object Model for COBOL), the parser did the job perfectly.

The good point about the incremental parser is that it easily handles COBOL's grammar.

The whole process looks like this:

  1. incremental parser having a COBOL grammar builds a grammar tree;
  2. we stream this tree into xml;
  3. an xslt transforms the xml from the previous step into cobolxom (TODO).

This is an example of COBOL:

IDENTIFICATION DIVISION.
PROGRAM-ID. FACTORIAL RECURSIVE.

DATA DIVISION.
WORKING-STORAGE SECTION.
01 NUMB PIC 9(4) VALUE IS 5.
01 FACT PIC 9(8) VALUE IS 0.

LOCAL-STORAGE SECTION.
01 NUM PIC 9(4).

PROCEDURE DIVISION.
  MOVE 'X' TO XXX
  MOVE NUMB TO NUM

  IF NUMB = 0 THEN
    MOVE 1 TO FACT
  ELSE
    SUBTRACT 1 FROM NUMB
    CALL 'FACTORIAL'
    MULTIPLY NUM BY FACT
  END-IF

  DISPLAY NUM '! = ' FACT

  GOBACK.
END PROGRAM FACTORIAL.

And a grammar tree:

<Program>
  <Name data="FACTORIAL"/>
  <Recursive/>
  <DataDivision>
    <WorkingStorageSection>
      <Data>
        <Level data="01"/>
        <Name data="NUMB"/>
        <Picture data="9(4)"/>
        <Value>
          <Numeric data="5"/>
        </Value>
      </Data>
      <Data>
        <Level data="01"/>
        <Name data="FACT"/>
        <Picture data="9(8)"/>
        <Value>
          <Numeric data="0"/>
        </Value>
      </Data>
    </WorkingStorageSection>
    <LocalStorageSection>
      <Data>
        <Level data="01"/>
        <Name data="NUM"/>
        <Picture data="9(4)"/>
      </Data>
    </LocalStorageSection>
  </DataDivision>
  <ProcedureDivision>
    <Sentence>
      <MoveStatement>
        <From>
          <String data="'X'"/>
        </From>
        <To>
          <Identifier>
            <DataName data="XXX"/>
          </Identifier>
        </To>
      </MoveStatement>
      <MoveStatement>
        <From>
          <Identifier>
            <DataName data="NUMB"/>
          </Identifier>
        </From>
        <To>
          <Identifier>
            <DataName data="NUM"/>
          </Identifier>
        </To>
      </MoveStatement>
      <IfStatement>
        <Condition>
          <Relation>
            <Identifier>
              <DataName data="NUMB"/>
            </Identifier>
            <Equal/>
            <Numeric data="0"/>
          </Relation>
        </Condition>
        <Then>
          <MoveStatement>
            <From>
              <Numeric data="1"/>
            </From>
            <To>
              <Identifier>
                <DataName data="FACT"/>
              </Identifier>
            </To>
          </MoveStatement>
        </Then>
        <Else>
          <SubtractStatement>
            <Value>
              <Numeric data="1"/>
            </Value>
            <From>
              <Identifier>
                <DataName data="NUMB"/>
              </Identifier>
            </From>
          </SubtractStatement>
          <CallStatement>
            <Name>
              <String data="'FACTORIAL'"/>
            </Name>
          </CallStatement>
          <MultiplyStatement>
            <Value>
              <Identifier>
                <DataName data="NUM"/>
              </Identifier>
            </Value>
            <By>
              <Identifier>
                <DataName data="FACT"/>
              </Identifier>
            </By>
          </MultiplyStatement>
        </Else>
      </IfStatement>
      <DisplayStatement>
        <Values>
          <Identifier>
            <DataName data="NUM"/>
          </Identifier>
          <String data="'! = '"/>
          <Identifier>
            <DataName data="FACT"/>
          </Identifier>
        </Values>
      </DisplayStatement>
      <GobackStatement/>
    </Sentence>
  </ProcedureDivision>
  <EndName data="FACTORIAL"/>
</Program>

The last step, transforming the tree into cobolxom, is on the TODO list.

We have committed the COBOL grammar to the same place at SourceForge as the XQuery grammar. The solution is now under VS 2010.

Saturday, 09 October 2010 08:26:23 UTC  #    Comments [0] -
Announce | Incremental Parser | xslt
# Friday, 08 October 2010

Suppose you have a timestamp string, and want to check whether it fits one of the following formats, possibly with leading and trailing spaces:

  • YYYY-MM-DD-HH.MM.SS.NNNNNN
  • YYYY-MM-DD-HH.MM.SS
  • YYYY-MM-DD

We decided to use a regex and its capture groups to extract the timestamp parts. This left us with only one solution: the xsl:analyze-string instruction. It took a couple more minutes to reach a final solution:

<xsl:variable name="parts" as="xs:string*">
  <xsl:analyze-string select="$value"
    regex="
      ^\s*(\d\d\d\d)-(\d\d)-(\d\d)
      (-(\d\d)\.(\d\d)\.(\d\d)(\.(\d\d\d\d\d\d))?)?\s*$"
    flags="x">
    <xsl:matching-substring>
      <xsl:sequence select="regex-group(1)"/>
      <xsl:sequence select="regex-group(2)"/>
      <xsl:sequence select="regex-group(3)"/>

      <xsl:sequence select="regex-group(5)"/>
      <xsl:sequence select="regex-group(6)"/>
      <xsl:sequence select="regex-group(7)"/>

      <xsl:sequence select="regex-group(9)"/>
    </xsl:matching-substring>
  </xsl:analyze-string>
</xsl:variable>

<xsl:choose>
  <xsl:when test="exists($parts)">
    ...
  </xsl:when>
  <xsl:otherwise>
    ...
  </xsl:otherwise>
</xsl:choose>

How would you solve the problem? Is it the best solution?
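For comparison, the same check can be sketched outside xslt. The following Java version is only an illustration: the class name TimestampParts and the method parse are hypothetical, and non-capturing groups are used so that the seven parts are numbered 1 to 7 instead of 1, 2, 3, 5, 6, 7, 9.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper mirroring the xsl:analyze-string solution.
final class TimestampParts {
  private static final Pattern TIMESTAMP = Pattern.compile(
    "\\s*(\\d{4})-(\\d\\d)-(\\d\\d)" +
    "(?:-(\\d\\d)\\.(\\d\\d)\\.(\\d\\d)(?:\\.(\\d{6}))?)?\\s*");

  // Returns the seven captured parts (absent optional parts as ""),
  // or an empty list when the value does not match any format.
  static List<String> parse(String value) {
    List<String> parts = new ArrayList<String>();
    Matcher matcher = TIMESTAMP.matcher(value);

    // matches() requires the whole input to match, like ^...$ in the xslt.
    if (matcher.matches()) {
      for (int i = 1; i <= matcher.groupCount(); i++) {
        String group = matcher.group(i);

        // regex-group() in xslt yields "" for an unmatched group.
        parts.add(group == null ? "" : group);
      }
    }

    return parts;
  }
}
```

As in the stylesheet, an empty result means "no match", so the caller can branch on parse(value).isEmpty().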

Friday, 08 October 2010 17:37:44 UTC  #    Comments [0] -
Tips and tricks | xslt
# Sunday, 05 September 2010

One of our latest tasks was the conversion of data received from a mainframe as an EBCDIC flat file into an XML file in UTF-8 encoding for further processing.

The solution was rather straightforward:

  • read the source flat file, record-by-record;
  • serialize each record as an element into target XML file using JAXB.

To read data from the EBCDIC-encoded flat file, a good old tool named eXperanto was used. It allows one to define C# and/or Java classes that match the records in the source flat file. Thus we were able to read and convert records from EBCDIC to UTF-8.

The next sub-task was to serialize a Java bean to an XML element. JAXB marshaller was used for this.

Everything was ok until we started to test the implementation on real data.

We realized that some decimal values (BigDecimal fields in Java classes) were serialized in scientific exponential notation. For example, 0.000000365 was serialized as 3.65E-7.
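The notation comes from BigDecimal itself and can be reproduced without JAXB. A minimal sketch (DecimalLexical is a hypothetical class name): toString() switches to scientific notation for small values, while toPlainString() always produces a form without an exponent.

```java
import java.math.BigDecimal;

// Hypothetical helper contrasting the two string forms of BigDecimal.
final class DecimalLexical {
  // May use scientific notation, which is not a valid xs:decimal.
  static String scientific(BigDecimal value) {
    return value.toString();
  }

  // Never uses an exponent, so the result fits the xs:decimal lexical form.
  static String plain(BigDecimal value) {
    return value.toPlainString();
  }

  public static void main(String[] args) {
    BigDecimal value = new BigDecimal("0.000000365");

    System.out.println(scientific(value)); // prints 3.65E-7
    System.out.println(plain(value));      // prints 0.000000365
  }
}
```

Where the code that produces the lexical form is under your control (for instance in a custom JAXB XmlAdapter, which we mention only as an assumption and did not use here), toPlainString() is the call that yields a valid xs:decimal.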

On the other hand, the target XML was used by another (non Java) application, which expected to receive decimal data, as it was defined in XSD schema (the field types were specified as xs:decimal).

According to the W3C datatypes specification:

"...decimal has a lexical representation consisting of a finite-length sequence of decimal digits (#x30-#x39) separated by a period as a decimal indicator. An optional leading sign is allowed. If the sign is omitted, "+" is assumed. Leading and trailing zeroes are optional. If the fractional part is zero, the period and following zero(es) can be omitted. For example: -1.23, 12678967.543233, 100000.00, 210..."

So the result was predictable: the consumer application failed.

A Google search reveals that we are dealing with a well-known bug: "JAXB marshaller returns BigDecimal with scientific notation in JDK 6". It has remained open for a year and a half, since May 2009, marked as "Fix in progress". We've tested our application with Java version 1.6.0_21-b07, JAXB 2.1.

Although this is a rather critical bug that may affect the interoperability of Java applications (e.g. Java web services etc.), its priority was set to just "4-Low".

P.S. As a temporary workaround for this case only(!) we've replaced xs:decimal with xs:double in the XSD schema for the target application.

Sunday, 05 September 2010 12:58:23 UTC  #    Comments [0] -
Java | Tips and tricks
# Wednesday, 25 August 2010

By accident we have found that the implementations of String and StringBuilder have been considerably revised, while the public interface has remained the same.

public sealed class String
{
  private int m_arrayLength;
  private int m_stringLength;
  private char m_firstChar;
}

This layout dates back to .NET 1.0.

The VM, in fact, allocates more memory than is defined in the C# class, as &m_firstChar refers to an inline char buffer.

This way the string's buffer length and the string's length were two different values. StringBuilder exploited this fact and stored its content in a private string, which it modified in place.

In .NET 4, string is different:

public sealed class String
{
  private int m_stringLength;
  private char m_firstChar;
}

The memory footprint of such a structure is smaller, but the string's length must always be the same as its buffer length. In fact, the layout of string is now the same as the layout of char[].

This modification led to a redesign of the StringBuilder implementation.

Earlier, StringBuilder looked like the following:

public sealed class StringBuilder
{
  internal IntPtr m_currentThread;
  internal int m_MaxCapacity;
  internal volatile string m_StringValue;
}

Notice that m_StringValue is used as the storage, and m_currentThread is used to preserve the thread affinity of the internal string value.

Now the guys at Microsoft have decided to implement StringBuilder very differently:

public sealed class StringBuilder
{
  internal int m_MaxCapacity;
  internal int m_ChunkLength;
  internal int m_ChunkOffset;
  internal char[] m_ChunkChars;
  internal StringBuilder m_ChunkPrevious;
}

Inspection of this layout immediately reveals the implementation technique: it's a linked list of chunks. The instance itself holds the last chunk (the most recently appended one) and references the previous chunks.

Characteristics of this design are:

  • while Length is small, performance is almost the same as it was earlier;
  • there are no more thread affinity checks;
  • Append() and ToString() work as fast as in the old version;
  • Insert() in the middle works faster, as only a chunk has to be split and probably reallocated (copied), instead of the whole string;
  • random access is fast at the end, O(1), and slows down as you approach the start, O(chunk-count).
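The chunk list and its random access cost can be sketched as follows. This is a simplified Java model written for illustration, not the .NET code: ChunkedBuilder is a hypothetical name, only append, charAt and toString are modelled, and every new chunk simply reuses the initial capacity.

```java
// Hypothetical model of a builder that is the head of a list of chunks.
final class ChunkedBuilder {
  private char[] chars;            // buffer of the current (last) chunk
  private int length;              // used part of the buffer
  private int offset;              // total length of all previous chunks
  private ChunkedBuilder previous; // previous chunk, or null

  ChunkedBuilder(int capacity) {
    chars = new char[Math.max(1, capacity)];
  }

  // Snapshot of a full chunk, pushed down the list.
  private ChunkedBuilder(ChunkedBuilder other) {
    chars = other.chars;
    length = other.length;
    offset = other.offset;
    previous = other.previous;
  }

  ChunkedBuilder append(String value) {
    for (int i = 0; i < value.length(); i++) {
      if (length == chars.length) {
        // The current chunk is full: keep it as "previous", start a new one.
        previous = new ChunkedBuilder(this);
        offset += length;
        chars = new char[chars.length];
        length = 0;
      }
      chars[length++] = value.charAt(i);
    }
    return this;
  }

  int length() {
    return offset + length;
  }

  // O(1) for the last chunk, O(chunk count) towards the start.
  char charAt(int index) {
    if (index < 0 || index >= length()) {
      throw new IndexOutOfBoundsException("index: " + index);
    }
    ChunkedBuilder chunk = this;
    while (index < chunk.offset) {
      chunk = chunk.previous;
    }
    return chunk.chars[index - chunk.offset];
  }

  @Override
  public String toString() {
    char[] result = new char[length()];
    for (ChunkedBuilder c = this; c != null; c = c.previous) {
      System.arraycopy(c.chars, 0, result, c.offset, c.length);
    }
    return new String(result);
  }
}
```

Appending never copies earlier chunks, which is why Append() stays cheap, while charAt has to walk back from the last chunk.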

Personally, we would select a slightly different design:

public sealed class StringBuilder
{
  private struct Chunk
  {
    public int length; // Chunk length.
    public int offset; // Chunk offset.
    public char[] buffer; 
  }

  private int m_MaxCapacity;

  // Alternatively, one can use
  // private List<Chunk> chunks;
  private int chunkCount; // Number of used chunks.
  private Chunk[] chunks; // Array of chunks except last.

  private Chunk last; // Last chunk.
  private bool nonHomogenous; // false if all chunks are of the same size.
}

This design has a better memory footprint, and random access time is O(1) when there were no inserts in the middle (nonHomogenous=false), and O(log(chunkCount)) after such inserts. All other characteristics are the same.
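The random access claim can be sketched in Java under simplifying assumptions: ChunkIndex and addChunk are hypothetical names, chunks are added whole, and a chunk of a different size stands in for the effect of an insert in the middle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of random access over an array of chunks.
final class ChunkIndex {
  private final List<char[]> buffers = new ArrayList<char[]>();
  private final List<Integer> offsets = new ArrayList<Integer>(); // chunk starts
  private int total;              // total number of characters
  private int chunkSize = -1;     // common chunk size while homogeneous
  private boolean nonHomogeneous; // true once chunk sizes differ

  // Stand-in for Append/Insert: adds one whole chunk.
  void addChunk(String content) {
    if (content.isEmpty()) {
      return;
    }
    if (buffers.isEmpty()) {
      chunkSize = content.length();
    } else if (content.length() != chunkSize) {
      nonHomogeneous = true;
    }
    buffers.add(content.toCharArray());
    offsets.add(total);
    total += content.length();
  }

  boolean isHomogeneous() {
    return !nonHomogeneous;
  }

  char charAt(int index) {
    if (index < 0 || index >= total) {
      throw new IndexOutOfBoundsException("index: " + index);
    }
    int chunk;
    if (!nonHomogeneous) {
      chunk = index / chunkSize; // O(1): all chunks have the same size
    } else {
      // O(log(chunkCount)): find the last chunk whose offset <= index.
      int low = 0;
      int high = buffers.size() - 1;
      while (low < high) {
        int middle = (low + high + 1) >>> 1;
        if (offsets.get(middle) <= index) {
          low = middle;
        } else {
          high = middle - 1;
        }
      }
      chunk = low;
    }
    return buffers.get(chunk)[index - offsets.get(chunk)];
  }
}
```

The binary search is always correct, so the homogeneous case is purely a fast path.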

Wednesday, 25 August 2010 09:36:55 UTC  #    Comments [0] -
Thinking aloud | Tips and tricks

Earlier, there was a lot of hype about how good VS 2010 is.

When we tried the beta and found that it was noticeably slower than VS 2008, we assumed that the release would do better.

Unfortunately, that was an optimistic assumption.

Comparing VS 2008 and VS 2010, we can confirm that the latter:

  • eats more memory;
  • is slower with C# projects (often hangs for long periods and even crashes);
  • is incapable of working with xslt 2.0 files;
  • has removed Shift+Enter key stroke to insert <br/> in html editor (why?);
  • has removed visualizer of the StringBuilder (in debugger).

Are we using hardware that is too outdated (Lenovo T60 laptops, 2GHz Core Duo/2GB RAM)? Or is there another reason?

Wednesday, 25 August 2010 06:52:39 UTC  #    Comments [0] -

# Wednesday, 04 August 2010

We have updated C# XOM (csharpxom) to support C# 4.0 (in fact there are very few changes).

From the grammar perspective this includes:

  • Dynamic types;
  • Named and optional arguments;
  • Covariance and contravariance of generic parameters for interfaces and delegates.

Dynamic type, C#:

dynamic dyn = 1;

C# XOM:

<var name="dyn">
  <type name="dynamic"/>
  <initialize>
    <int value="1"/>
  </initialize>
</var>

Named and Optional Arguments, C#:

int Increment(int value, int increment = 1)
{
  return value + increment;
}

void Test()
{
  // Regular call.
  Increment(7, 1);

  // Call with named parameter.
  Increment(value: 7, increment: 1);
 
  // Call with default.
  Increment(7);
}

C# XOM:

<method name="Increment">
  <returns>
    <type name="int"/>
  </returns>
  <parameters>
    <parameter name="value">
      <type name="int"/>
    </parameter>
    <parameter name="increment">
      <type name="int"/>
      <initialize>
        <int value="1"/>
      </initialize>
    </parameter>
  </parameters>
  <block>
    <return>
      <add>
        <var-ref name="value"/>
        <var-ref name="increment"/>
      </add>
    </return>
  </block>
</method>

<method name="Test">
  <block>
    <expression>
      <comment>Regular call.</comment>
      <invoke>
        <method-ref name="Increment"/>
        <arguments>
          <int value="7"/>
          <int value="1"/>
        </arguments>
      </invoke>
    </expression>

    <expression>
      <comment>Call with named parameter.</comment>
      <invoke>
        <method-ref name="Increment"/>
        <arguments>
          <argument name="value">
            <int value="7"/>
          </argument>
          <argument name="increment">
            <int value="1"/>
          </argument>
        </arguments>
      </invoke>
    </expression>

    <expression>
      <comment>Call with default.</comment>
      <invoke>
        <method-ref name="Increment"/>
        <arguments>
          <int value="7"/>
        </arguments>
      </invoke>
    </expression>
  </block>
</method>

Covariance and contravariance, C#:

public interface Variance<in T, out P, Q>
{
  P X(T t);
}

C# XOM:

<interface access="public" name="Variance">
  <type-parameters>
    <type-parameter name="T" variance="in"/>
    <type-parameter name="P" variance="out"/>
    <type-parameter name="Q"/>
  </type-parameters>
  <method name="X">
    <returns>
      <type name="P"/>
    </returns>
    <parameters>
      <parameter name="t">
        <type name="T"/>
      </parameter>
    </parameters>
  </method>
</interface>

Other cosmetic fixes were also introduced into Java XOM (jxom), COBOL XOM (cobolxom), and into sql XOM (sqlxom).

The new version can be found at languages-xom.zip.

See also: What's New in Visual C# 2010

Wednesday, 04 August 2010 14:00:26 UTC  #    Comments [0] -
Announce | xslt
# Thursday, 15 July 2010

We have run into another xslt bug, which depends on several independent circumstances and often behaves differently when observed. That's clearly a Heisenbug.

Xslt designers failed to realize that the syntactic sugar they introduce into xpath can turn into obscure bugs. Well, it's easy to be wise afterwards...

To the point.

Suppose you have a sequence consisting of text nodes and elements, and you want to "normalize" this sequence by wrapping adjacent text nodes into separate elements. The following stylesheet is supposed to do the work:

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com/xslt/this"
  exclude-result-prefixes="xs t">

  <xsl:template match="/">
    <xsl:variable name="nodes" as="node()*">
      <xsl:text>Hello, </xsl:text>
      <string value="World"/>
      <xsl:text>! </xsl:text>
      <xsl:text>Well, </xsl:text>
      <string value="hello"/>
      <xsl:text>, if not joking!</xsl:text>
    </xsl:variable>
 
    <result>
      <xsl:sequence select="t:normalize($nodes)"/>
    </result>
  </xsl:template>

  <xsl:function name="t:normalize" as="node()*">
    <xsl:param name="nodes" as="node()*"/>

    <xsl:for-each-group select="$nodes" group-starting-with="*">
      <xsl:variable name="string" as="element()?" select="self::string"/>
      <xsl:variable name="texts" as="node()*"
        select="current-group() except $string"/>

      <xsl:sequence select="$string"/>

      <xsl:if test="exists($texts)">
        <string value="{string-join($texts, '')}"/>
      </xsl:if>
    </xsl:for-each-group>
  </xsl:function>

</xsl:stylesheet>

We're expecting the following output:

<result>
  <string value="Hello, "/>
  <string value="World"/>
  <string value="! Well, "/>
  <string value="hello"/>
  <string value=", if not joking!"/>
</result>

But often we're getting other results, like:

<result>
  <string value="Hello, "/>
  <string value="World"/>
  <string value="Well, ! "/>
  <string value="hello"/>
  <string value=", if not joking!"/>
</result>

Such output may be seriously confusing, unless you recall the rule for the xpath except operator:

The except operator takes two node sequences as operands and returns a sequence containing all the nodes that occur in the first operand but not in the second operand.

... these operators eliminate duplicate nodes from their result sequences based on node identity. The resulting sequence is returned in document order..

...
The relative order of nodes in distinct trees is stable but implementation-dependent

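These words mean that the result sequence may be ordered very differently from the original sequence. Here each text node in $nodes is a parentless node, i.e. a tree of its own, so their relative document order is implementation-dependent.

In contrast, if we change the $texts definition to: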

<xsl:variable name="texts" as="node()*"
  select="current-group()[not(. is $string)]"/>

then the result becomes stable, but the code is less clear.
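The difference between the two expressions can be modelled in Java. This is an analogy under our own assumptions, not a description of any xslt processor: Node is a hypothetical stand-in for a parentless node, and its docOrder field plays the role of the implementation-dependent order of distinct trees.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of "except" versus a filtering predicate.
final class ExceptVersusFilter {
  static final class Node {
    final String text;
    final int docOrder; // arbitrary position among distinct trees

    Node(String text, int docOrder) {
      this.text = text;
      this.docOrder = docOrder;
    }
  }

  // Like "$seq except $excluded": the result is sorted in document order.
  // (Duplicate elimination is omitted in this sketch.)
  static List<Node> except(List<Node> seq, Node excluded) {
    List<Node> result = filter(seq, excluded);
    result.sort(Comparator.comparingInt((Node n) -> n.docOrder));
    return result;
  }

  // Like "$seq[not(. is $excluded)]": the sequence order is preserved.
  static List<Node> filter(List<Node> seq, Node excluded) {
    List<Node> result = new ArrayList<Node>();
    for (Node node : seq) {
      if (node != excluded) { // identity comparison, like "is"
        result.add(node);
      }
    }
    return result;
  }

  static String join(List<Node> nodes) {
    StringBuilder builder = new StringBuilder();
    for (Node node : nodes) {
      builder.append(node.text);
    }
    return builder.toString();
  }
}
```

If the processor happens to order the tree of "Well, " before the tree of "! ", except yields "Well, ! ", exactly the unexpected output shown earlier, while the filter keeps "! Well, ".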

See also Xslt Heisenbug

Thursday, 15 July 2010 08:22:13 UTC  #    Comments [0] -
Thinking aloud | Tips and tricks | xslt
Disclaimer
The opinions expressed herein are our own personal opinions and do not represent our employer's view in anyway.

© 2024, Nesterovsky bros