michaelhkay: Saxon 9.3 has been out for 8 days: only two bugs so far, one found by me. I think that's a record.
Not necessarily. We, for example, use Saxon HE and have found nothing new in Saxon 9.3, though we expected to see xslt 3.0. Disappointed. There is no actual reason to migrate.
P.S. We were among the first to find early bugs in previous releases.
Reading the individual papers of the C++ WG, you can find the following one:
N3174 | 10-0164 | To move or not to move | Bjarne Stroustrup | 2010-10-17 | 2010-10 | Core
There, Bjarne Stroustrup considers the problems with implicitly generated copy and
move operations in C++.
It's always a pleasure to see how one can deal with a problem burdened with
conflicting demands. To make his case, Bjarne skilfully uses not only rational but
also emotional arguments:
...We may deem this “bad code that deserves to be broken” or “unrealistic”, but
this example demonstrates that the problem with a generated move has an exact counterpart
for copy (which we have lived with for 27 years)...
...In 1984, I missed the chance to protect us against copy and we have lived with
the problems ever since. I should have instituted some rule along the lines “if
a class has a destructor, no copy operations are generated” or “if a class has a
pointer member, no copy operations are generated.”...
It's impossible to recall these numbers without shivering. :-)
We're following w3's "Bug 9069 - Function to invoke an XSLT transformation".
There, people argue about an xpath API to invoke xslt transformations. The function should
look roughly like this:
transform
(
$node-tree as node()?,
$stylesheet as item(),
$parameters as XXX
) as node()
The discussion revolves around the last argument: $parameters as XXX.
Should it be an xml element describing parameters, a function returning values for parameter names, or some new type modelling an immutable
map?
What is most interesting in this discussion is the leak about plans to introduce
a map type:
Comment 7 Michael Kay, 2010-09-14 22:46:58 UTC
We're currently talking about adding an immutable map to XSLT as a new data
type (the put operation would return a new map). There appear to be a number of
possible efficient implementations. It would be ideally suited for this purpose,
because unlike the mechanism used for serialization parameters, the values can be
any data type (including nodes), not only strings.
There is hope that a map will finally appear in xslt!
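To make "the put operation would return a new map" concrete, here is a minimal sketch in Java of an immutable map built on a path-copying binary search tree. It is only an illustration under our own assumptions: the name PersistentMap and its methods are hypothetical, the tree is not balanced, and nothing here reflects how an xslt processor would actually implement the proposed type.

```java
// Hypothetical illustration: an immutable map where put() returns a new map.
// Unbalanced binary search tree with path copying; untouched subtrees are shared.
final class PersistentMap<K extends Comparable<K>, V> {
    private static final class Node<K, V> {
        final K key;
        final V value;
        final Node<K, V> left;
        final Node<K, V> right;

        Node(K key, V value, Node<K, V> left, Node<K, V> right) {
            this.key = key;
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    private final Node<K, V> root;

    private PersistentMap(Node<K, V> root) {
        this.root = root;
    }

    static <K extends Comparable<K>, V> PersistentMap<K, V> empty() {
        return new PersistentMap<>(null);
    }

    // Returns a new map; this map is left unchanged.
    PersistentMap<K, V> put(K key, V value) {
        return new PersistentMap<>(insert(root, key, value));
    }

    // Returns the value for the key, or null when the key is absent.
    V get(K key) {
        Node<K, V> node = root;
        while (node != null) {
            int order = key.compareTo(node.key);
            if (order == 0) {
                return node.value;
            }
            node = order < 0 ? node.left : node.right;
        }
        return null;
    }

    // Copies only the nodes on the path from the root to the changed key.
    private static <K extends Comparable<K>, V> Node<K, V> insert(
            Node<K, V> node, K key, V value) {
        if (node == null) {
            return new Node<>(key, value, null, null);
        }
        int order = key.compareTo(node.key);
        if (order == 0) {
            return new Node<>(key, value, node.left, node.right);
        }
        return order < 0
            ? new Node<>(node.key, node.value, insert(node.left, key, value), node.right)
            : new Node<>(node.key, node.value, node.left, insert(node.right, key, value));
    }
}
```

Since every put copies only one root-to-leaf path, older versions of the map stay valid and share most of their structure with newer ones.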
See also:
Bug 5630
- [DM] Tuples and maps,
Tuples and maps - Status: CLOSED, WONTFIX,
Map, based on immutable trees,
Maps in exslt2?
Historically, jxom was developed first, and as such it exhibited some imperfections in its
xml schema.
csharpxom has taken jxom's problems into account.
Unfortunately, we could not easily fix jxom, as a great amount of code already
uses it. In this refactoring we tried to be conservative, and have changed only the
"type" and "import" xml schema elements in java.xsd.
Consider type reference and package import constructs in the old schema:
<!-- import java.util.ArrayList; -->
<import name="java.util.ArrayList"/>
<!-- java.util.ArrayList<java.math.BigDecimal> -->
<type package="java.util">
<part name="ArrayList">
<argument>
<type name="BigDecimal" package="java.math"/>
</argument>
</part>
</type>
<!-- my.Parent.Nested -->
<type package="my">
<part name="Parent"/>
<part name="Nested"/>
</type>
Here we can observe that:
- a type is referred to by a qualified name in the import element;
- a type has two forms: a simple one (see BigDecimal), and another for a nested or generic
type (see ArrayList).
We have made it more consistent in the updated jxom:
<!-- import java.util.ArrayList; -->
<import>
<type name="ArrayList" package="java.util"/>
</import>
<!-- java.util.ArrayList<java.math.BigDecimal> -->
<type name="ArrayList" package="java.util">
<argument>
<type name="BigDecimal" package="java.math"/>
</argument>
</type>
<!-- my.Parent.Nested -->
<type name="Nested">
<type name="Parent" package="my"/>
</type>
We hope that you will not be impacted very much by this fix.
Please refresh Languages XOM from
languages-xom.zip.
P.S. We have also included an xml schema and an xslt api to generate ASPX (see
Xslt serializer for ASPX output). In our projects we, in fact, generate aspx documents with
embedded csharpxom, and then pass them through a two-stage transformation.
In the previous post we announced an
API to parse a COBOL source into cobolxom.
We exploited the
incremental parser to build a grammar xml tree, and were then planning to
create an xslt transformation to generate
cobolxom.
Now we would like to declare that such an xslt is ready.
At present all standard COBOL constructs are supported, but more tests
are required. Preprocessor support is still on the todo list.
You may peek into examples of COBOL, the COBOL grammar, and cobolxom.
While we were building the grammar-to-cobolxom stylesheet, we asked ourselves
whether the COBOL parsing could be done entirely in xslt. The answer is yes, so
who knows, we might yet turn this task into a pure xslt one. :-)
Recently we've seen code like this:
<xsl:variable name="a" as="element()?" select="..."/>
<xsl:variable name="b" as="element()?" select="..."/>
<xsl:apply-templates select="$a">
<xsl:with-param name="b" tunnel="yes" as="element()" select="$b"/>
</xsl:apply-templates>
It fails with an error:
"An empty sequence is not allowed as the value of parameter $b".
What is interesting is that the value of $a is an empty sequence,
so the code could potentially work, provided the processor evaluated $a first
and decided not to evaluate xsl:with-param.
Is the order of evaluation of @select and xsl:with-param specified
by the standard, or is it implementation-defined?
We asked this question on the
xslt forum, and got the following answer:
The specification leaves this implementation-defined. Since the values
of the parameters are the same for every node processed, it's a
reasonably strategy for the processor to evaluate the parameters before
knowing how many selected nodes there are, though I guess an even better
strategy would be to do it lazily when the first selected node is found.
Well, that's the expected answer. This question will probably induce Michael Kay
to introduce a small optimization into Saxon.
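The lazy strategy mentioned in the answer can be sketched outside xslt. The Java below is only an illustration, not a description of Saxon or any other processor: LazyParameter, applyTemplates and requireOne are hypothetical names, strings stand in for nodes, and a Supplier stands in for the xsl:with-param expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical model: the parameter is evaluated only when the first
// selected node is found, and then reused for all other nodes.
final class LazyParameter {
    // Models the as="element()" check on the parameter value.
    static String requireOne(String value) {
        if (value == null) {
            throw new IllegalStateException(
                "An empty sequence is not allowed as the value of parameter $b");
        }
        return value;
    }

    // Models xsl:apply-templates over the selected nodes.
    static List<String> applyTemplates(List<String> selected, Supplier<String> param) {
        List<String> output = new ArrayList<>();
        String value = null;
        boolean evaluated = false;
        for (String node : selected) {
            if (!evaluated) {
                value = param.get(); // evaluated at most once, on first use
                evaluated = true;
            }
            output.add(node + ":" + value);
        }
        return output;
    }
}
```

With an empty selection the supplier is never called, so the type error never surfaces; an eager processor would call it before looking at the selection and fail.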
Some time ago we created an
incremental parser, and now that we have decided to load COBOL sources
directly into
cobolxom (XML Object Model for COBOL), the parser has done the job perfectly.
The good point about the incremental parser is that it easily handles COBOL's
grammar.
The whole process looks like this:
- the incremental parser, given a COBOL grammar, builds a grammar tree;
- we stream this tree into xml;
- an xslt transforms the xml from the previous step into
cobolxom (TODO).
This is an example of COBOL:
IDENTIFICATION DIVISION.
PROGRAM-ID. FACTORIAL RECURSIVE.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 NUMB PIC 9(4) VALUE IS 5.
01 FACT PIC 9(8) VALUE IS 0.
LOCAL-STORAGE SECTION.
01 NUM PIC 9(4).
PROCEDURE DIVISION.
MOVE 'X' TO XXX
MOVE NUMB TO NUM
IF NUMB = 0 THEN
MOVE 1 TO FACT
ELSE
SUBTRACT 1 FROM NUMB
CALL 'FACTORIAL'
MULTIPLY NUM BY FACT
END-IF
DISPLAY NUM '! = ' FACT
GOBACK.
END PROGRAM FACTORIAL.
And a grammar tree:
<Program>
<Name data="FACTORIAL"/>
<Recursive/>
<DataDivision>
<WorkingStorageSection>
<Data>
<Level data="01"/>
<Name data="NUMB"/>
<Picture data="9(4)"/>
<Value>
<Numeric data="5"/>
</Value>
</Data>
<Data>
<Level data="01"/>
<Name data="FACT"/>
<Picture data="9(8)"/>
<Value>
<Numeric data="0"/>
</Value>
</Data>
</WorkingStorageSection>
<LocalStorageSection>
<Data>
<Level data="01"/>
<Name data="NUM"/>
<Picture data="9(4)"/>
</Data>
</LocalStorageSection>
</DataDivision>
<ProcedureDivision>
<Sentence>
<MoveStatement>
<From>
<String data="'X'"/>
</From>
<To>
<Identifier>
<DataName data="XXX"/>
</Identifier>
</To>
</MoveStatement>
<MoveStatement>
<From>
<Identifier>
<DataName data="NUMB"/>
</Identifier>
</From>
<To>
<Identifier>
<DataName data="NUM"/>
</Identifier>
</To>
</MoveStatement>
<IfStatement>
<Condition>
<Relation>
<Identifier>
<DataName data="NUMB"/>
</Identifier>
<Equal/>
<Numeric data="0"/>
</Relation>
</Condition>
<Then>
<MoveStatement>
<From>
<Numeric data="1"/>
</From>
<To>
<Identifier>
<DataName data="FACT"/>
</Identifier>
</To>
</MoveStatement>
</Then>
<Else>
<SubtractStatement>
<Value>
<Numeric data="1"/>
</Value>
<From>
<Identifier>
<DataName data="NUMB"/>
</Identifier>
</From>
</SubtractStatement>
<CallStatement>
<Name>
<String data="'FACTORIAL'"/>
</Name>
</CallStatement>
<MultiplyStatement>
<Value>
<Identifier>
<DataName data="NUM"/>
</Identifier>
</Value>
<By>
<Identifier>
<DataName data="FACT"/>
</Identifier>
</By>
</MultiplyStatement>
</Else>
</IfStatement>
<DisplayStatement>
<Values>
<Identifier>
<DataName data="NUM"/>
</Identifier>
<String data="'! = '"/>
<Identifier>
<DataName data="FACT"/>
</Identifier>
</Values>
</DisplayStatement>
<GobackStatement/>
</Sentence>
</ProcedureDivision>
<EndName data="FACTORIAL"/>
</Program>
The last step, transforming the tree into cobolxom, is in the TODO list.
We have committed the COBOL grammar to the same place at
SourceForge as the XQuery grammar. The solution now targets VS
2010.
Suppose you have a timestamp string, and want to check whether it matches one of the
following formats, with optional leading and trailing spaces:
- YYYY-MM-DD-HH.MM.SS.NNNNNN
- YYYY-MM-DD-HH.MM.SS
- YYYY-MM-DD
We decided to use a regex and its capture groups to extract the timestamp parts. This
left us with only one option: the xsl:analyze-string instruction. It took
a couple more minutes to reach the final solution:
<xsl:variable name="parts" as="xs:string*">
<xsl:analyze-string select="$value"
regex="
^\s*(\d\d\d\d)-(\d\d)-(\d\d)
(-(\d\d)\.(\d\d)\.(\d\d)(\.(\d\d\d\d\d\d))?)?\s*$"
flags="x">
<xsl:matching-substring>
<xsl:sequence select="regex-group(1)"/>
<xsl:sequence select="regex-group(2)"/>
<xsl:sequence select="regex-group(3)"/>
<xsl:sequence select="regex-group(5)"/>
<xsl:sequence select="regex-group(6)"/>
<xsl:sequence select="regex-group(7)"/>
<xsl:sequence select="regex-group(9)"/>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:variable>
<xsl:choose>
<xsl:when test="exists($parts)">
...
</xsl:when>
<xsl:otherwise>
...
</xsl:otherwise>
</xsl:choose>
How would you solve the problem? Is it the best solution?
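For comparison, here is how the same check might look in Java. This is a minimal sketch, not a replacement for the xslt above: TimestampParts and parse are names made up for the illustration. Java's non-capturing groups (?:...) keep the group numbers contiguous, and an unmatched optional part is returned as an empty string.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits a timestamp into up to seven parts
// (year, month, day, hours, minutes, seconds, microseconds).
final class TimestampParts {
    private static final Pattern TIMESTAMP = Pattern.compile(
        "^\\s*(\\d{4})-(\\d{2})-(\\d{2})"
        + "(?:-(\\d{2})\\.(\\d{2})\\.(\\d{2})(?:\\.(\\d{6}))?)?\\s*$");

    // Returns the seven parts, or an empty Optional when the value
    // matches none of the three formats.
    static Optional<String[]> parse(String value) {
        Matcher matcher = TIMESTAMP.matcher(value);
        if (!matcher.matches()) {
            return Optional.empty();
        }
        String[] parts = new String[7];
        for (int i = 0; i < parts.length; i++) {
            String group = matcher.group(i + 1);
            parts[i] = group == null ? "" : group; // absent optional part
        }
        return Optional.of(parts);
    }
}
```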
One of our latest tasks was the conversion of data received from a mainframe as an EBCDIC flat file into an XML file in UTF-8 encoding for further processing.
The solution was rather straightforward:
- read the source flat file, record by record;
- serialize each record as an element into the target XML file using JAXB.
For reading data from the EBCDIC-encoded flat file, we used a good old tool named eXperanto. It allows defining C# and/or Java classes that match the records in the source flat file. Thus we were able to read records and convert them from EBCDIC to UTF-8.
The next sub-task was to serialize a Java bean to an XML element. The JAXB marshaller was used for this.
Everything was ok until we started to test the implementation on real data.
We realized that some decimal values (BigDecimal fields in Java classes) were serialized in scientific (exponential) notation. For example, 0.000000365 was serialized as 3.65E-7, and so on.
On the other hand, the target XML was consumed by another (non-Java) application, which expected to receive decimal data as defined in the XSD schema (the field types were specified as xs:decimal).
According to the W3C datatypes specification:
"...decimal has a lexical representation consisting of a finite-length sequence of decimal digits (#x30-#x39) separated by a period as a decimal indicator. An optional leading sign is allowed. If the sign is omitted, "+" is assumed. Leading and trailing zeroes are optional. If the fractional part is zero, the period and following zero(es) can be omitted. For example: -1.23, 12678967.543233, 100000.00, 210..."
So the result was predictable: the consumer application failed.
A Google search reveals that we are dealing with a well-known bug: "JAXB marshaller returns BigDecimal with scientific notation in JDK 6". It has remained open for a year and a half, since May 2009, marked as "Fix in progress". We've tested our application with Java version 1.6.0_21-b07, JAXB 2.1.
Although this is a rather critical bug that may affect the interoperability of Java applications (e.g. Java web services), its priority was set to just "4-Low".
P.S. As a temporary workaround, for this case only(!), we've replaced xs:decimal with xs:double in the XSD schema for the target application.
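The difference in notation is easy to reproduce outside JAXB. Below is a minimal sketch that relies only on java.math.BigDecimal; the class and method names (DecimalLexicalForm, toXsDecimal) are ours for illustration, and wiring such a conversion into the marshaller (for instance through a JAXB XmlAdapter) is not shown here.

```java
import java.math.BigDecimal;

// Hypothetical helper: produces a lexical form without an exponent,
// which is what an xs:decimal consumer expects.
final class DecimalLexicalForm {
    static String toXsDecimal(BigDecimal value) {
        // toPlainString() never uses scientific notation,
        // while toString() switches to it for small values.
        return value.toPlainString();
    }

    public static void main(String[] args) {
        BigDecimal value = new BigDecimal("0.000000365");
        System.out.println(value.toString());   // prints 3.65E-7
        System.out.println(toXsDecimal(value)); // prints 0.000000365
    }
}
```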
By accident we have found that in .NET 4 the implementations of String and StringBuilder
have been considerably revised, while the public interface has remained the
same.
Earlier, String was declared like this:
public sealed class String
{
private int m_arrayLength;
private int m_stringLength;
private char m_firstChar;
}
This layout dates back to .NET 1.0.
The VM, in fact, allocates more memory than is defined in the C# class, as
&m_firstChar refers to an inline char buffer.
This way a string's buffer length and the string's length were two different
values, and StringBuilder exploited this fact by storing its content in a private string,
which it modified in place.
In .NET 4, string is different:
public sealed class String
{
private int m_stringLength;
private char m_firstChar;
}
The memory footprint of such a structure is smaller, but the string's length must
always be the same as its buffer's length. In fact, the layout of string is now the same as
the layout of char[].
This modification led to a redesign of the StringBuilder implementation.
Earlier, StringBuilder looked like the following:
public sealed class StringBuilder
{
internal IntPtr m_currentThread;
internal int m_MaxCapacity;
internal volatile string m_StringValue;
}
Notice that m_StringValue is used as the storage, and
m_currentThread is used to preserve thread affinity of the internal
string value.
Now, the guys at Microsoft have decided to implement StringBuilder very differently:
public sealed class StringBuilder
{
internal int m_MaxCapacity;
internal int m_ChunkLength;
internal int m_ChunkOffset;
internal char[] m_ChunkChars;
internal StringBuilder m_ChunkPrevious;
}
Inspection of this layout immediately reveals the implementation technique. It's a
linked list of chunks. The instance itself holds the last chunk (the most recently
appended) and references the previous chunks.
Characteristics of this design are:
- while Length is small, performance is almost the same as it was earlier;
- there are no more thread affinity checks;
- Append() and ToString() work as fast as in the old version;
- Insert() in the middle works faster, as only one chunk has to be split and
possibly reallocated (copied), instead of the whole string;
- random access is fast at the end, O(1), and slows down as you approach the start,
O(chunk-count).
Personally, we would select a slightly different design:
public sealed class StringBuilder
{
private struct Chunk
{
public int length; // Chunk length.
public int offset; // Chunk offset.
public char[] buffer;
}
private int m_MaxCapacity;
// Alternatively, one can use
// private List<Chunk> chunks;
private int chunkCount; // Number of used chunks.
private Chunk[] chunks; // Array of chunks except last.
private Chunk last; // Last chunk.
private bool nonHomogenous; // false if all chunks are of the same size.
}
This design has a better memory footprint, and random access time is O(1) when there have been no
inserts in the middle (nonHomogenous = false), and
O(log(chunkCount)) after such inserts. All other characteristics are the
same.
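The random access claim can be sketched as follows. This is a simplified illustration in Java rather than C#, under our own assumptions: ChunkedChars, appendChunk and charAt are hypothetical names, chunks are only appended (an insert in the middle is modelled by appending a chunk of irregular size), and the separate "last chunk" field of the design above is folded into the list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of chunked storage with O(1) access while all chunks
// but the last have the same size, and O(log(chunkCount)) access otherwise.
final class ChunkedChars {
    private final int chunkSize;
    private final List<char[]> chunks = new ArrayList<>();
    private final List<Integer> offsets = new ArrayList<>(); // start of each chunk
    private boolean nonHomogenous;
    private int length;

    ChunkedChars(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    void appendChunk(String text) {
        if (text.isEmpty()) {
            return;
        }
        // The previous last chunk stops being last: it must be exactly chunkSize.
        boolean previousIrregular = !chunks.isEmpty()
            && chunks.get(chunks.size() - 1).length != chunkSize;
        if (previousIrregular || text.length() > chunkSize) {
            nonHomogenous = true;
        }
        chunks.add(text.toCharArray());
        offsets.add(length);
        length += text.length();
    }

    boolean isNonHomogenous() {
        return nonHomogenous;
    }

    char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        if (!nonHomogenous) {
            // O(1): the chunk number is computed directly.
            return chunks.get(index / chunkSize)[index % chunkSize];
        }
        // O(log(chunkCount)): find the last chunk whose offset is <= index.
        int low = 0;
        int high = chunks.size() - 1;
        while (low < high) {
            int mid = (low + high + 1) >>> 1;
            if (offsets.get(mid) <= index) {
                low = mid;
            } else {
                high = mid - 1;
            }
        }
        return chunks.get(low)[index - offsets.get(low)];
    }
}
```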
Earlier, there was hype about how good VS 2010 is.
When we tried the beta and found that it was noticeably slower than VS 2008, we assumed that the release would do better.
Unfortunately, that was an optimistic assumption.
Comparing VS 2008 and VS 2010, we can confirm that the latter:
- eats more memory;
- is slower with C# projects (often hangs for long periods and even crashes);
- is incapable of working with xslt 2.0 files;
- has removed the Shift+Enter key stroke to insert
<br/> in the html editor (why?);
- has removed the visualizer of the StringBuilder (in the debugger).
Are we using hardware that is too outdated (Lenovo T60 laptops, 2GHz Core Duo/2GB RAM)? Is there another reason?
We have updated C# XOM (csharpxom) to support C# 4.0 (in fact there are very few
changes).
From the grammar perspective this includes:
- Dynamic types;
- Named and optional arguments;
- Covariance and contravariance of generic parameters for interfaces and
delegates.
Dynamic type, C#:
dynamic dyn = 1;
C# XOM:
<var name="dyn">
<type name="dynamic"/>
<initialize>
<int value="1"/>
</initialize>
</var>
Named and Optional Arguments, C#:
int Increment(int value, int increment = 1)
{
return value + increment;
}
void Test()
{
// Regular call.
Increment(7, 1);
// Call with named parameter.
Increment(value: 7, increment: 1);
// Call with default.
Increment(7);
}
C# XOM:
<method name="Increment">
<returns>
<type name="int"/>
</returns>
<parameters>
<parameter name="value">
<type name="int"/>
</parameter>
<parameter
name="increment">
<type name="int"/>
<initialize>
<int value="1"/>
</initialize>
</parameter>
</parameters>
<block>
<return>
<add>
<var-ref name="value"/>
<var-ref name="increment"/>
</add>
</return>
</block>
</method>
<method
name="Test">
<block>
<expression>
<comment>Regular call.</comment>
<invoke>
<method-ref name="Increment"/>
<arguments>
<int value="7"/>
<int value="1"/>
</arguments>
</invoke>
</expression>
<expression>
<comment>Call with named parameter.</comment>
<invoke>
<method-ref name="Increment"/>
<arguments>
<argument name="value">
<int value="7"/>
</argument>
<argument name="increment">
<int value="1"/>
</argument>
</arguments>
</invoke>
</expression>
<expression>
<comment>Call with default.</comment>
<invoke>
<method-ref name="Increment"/>
<arguments>
<int value="7"/>
</arguments>
</invoke>
</expression>
</block>
</method>
Covariance and contravariance, C#:
public interface Variance<in T, out P, Q>
{
P X(T t);
}
C# XOM:
<interface access="public" name="Variance">
<type-parameters>
<type-parameter
name="T" variance="in"/>
<type-parameter name="P" variance="out"/>
<type-parameter name="Q"/>
</type-parameters>
<method name="X">
<returns>
<type name="P"/>
</returns>
<parameters>
<parameter name="t">
<type name="T"/>
</parameter>
</parameters>
</method>
</interface>
Other cosmetic fixes were also introduced into Java XOM (jxom), COBOL XOM
(cobolxom), and into sql XOM (sqlxom).
The new version is found at
languages-xom.zip.
See also: What's
New in Visual C# 2010
We have run into another xslt bug, which depends on several independent
circumstances and often behaves differently when observed. That's clearly a
Heisenbug.
Xslt designers failed to realize that the syntactic sugar they introduced into
xpath can turn into obscure bugs. Well, it's easy to be wise after the event...
To the point.
Suppose you have a sequence consisting of text nodes and
elements, and you want to "normalize" this sequence by wrapping
adjacent text nodes into
separate elements. The following stylesheet is supposed to do the work:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:t="http://www.nesterovsky-bros.com/xslt/this"
exclude-result-prefixes="xs t">
<xsl:template match="/">
<xsl:variable
name="nodes" as="node()*">
<xsl:text>Hello, </xsl:text>
<string value="World"/>
<xsl:text>! </xsl:text>
<xsl:text>Well, </xsl:text>
<string value="hello"/>
<xsl:text>, if not joking!</xsl:text>
</xsl:variable>
<result>
<xsl:sequence
select="t:normalize($nodes)"/>
</result>
</xsl:template>
<xsl:function
name="t:normalize" as="node()*">
<xsl:param name="nodes" as="node()*"/>
<xsl:for-each-group select="$nodes" group-starting-with="*">
<xsl:variable
name="string" as="element()?" select="self::string"/>
<xsl:variable name="texts"
as="node()*"
select="current-group() except $string"/>
<xsl:sequence
select="$string"/>
<xsl:if test="exists($texts)">
<string
value="{string-join($texts, '')}"/>
</xsl:if>
</xsl:for-each-group>
</xsl:function>
</xsl:stylesheet>
We're expecting the following output:
<result>
<string value="Hello, "/>
<string value="World"/>
<string value="! Well, "/>
<string value="hello"/>
<string value=", if not joking!"/>
</result>
But often we're getting other results, like:
<result>
<string value="Hello, "/>
<string value="World"/>
<string value="Well, ! "/>
<string value="hello"/>
<string value=", if not joking!"/>
</result>
Such output may be seriously confusing, unless you recall the rule for the
xpath except operator:
The except operator takes two node sequences as operands and returns a sequence containing all the nodes that occur in the first operand but not in the second operand.
... these operators eliminate duplicate nodes from their result sequences based
on node identity. The resulting sequence is returned in document order..
...
The relative order of nodes in distinct trees is stable but implementation-dependent
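These words mean that the result sequence may be ordered very differently from the original
sequence. Here every text node in $nodes is parentless, that is, a tree of its own, so the relative order of these nodes after except is up to the processor.
In contrast, if we change the $texts definition to: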
<xsl:variable name="texts"
as="node()*"
select="current-group()[not(. is $string)]"/>
then the result becomes stable, but less clear.
See also
Xslt Heisenbug
It does not matter that DataBindExtender looks unusual in ASP.NET. It turns out to be so handy that built-in data binding is no longer considered an option.
After a short try, you understand that people have tried very hard and invented many controls and methods, like ObjectDataSource, FormView, Eval(), and Bind(), with an outcome that is very specific and limited.
In contrast, DataBindExtender:
- performs two-way or one-way data binding of any business data property to any control property;
- converts the value before it's passed to the control or into the business data;
- validates the value.
See an example:
<asp:TextBox id=Field8 EnableViewState="false" runat="server"></asp:TextBox>
<bphx:DataBindExtender runat='server' EnableViewState='false'
TargetControlID='Field8' ControlProperty='Text'
DataSource='<%# Import.ClearingMemberFirm %>' DataMember='Id'
Converter='<%# Converters.AsString("XXXXX", false) %>'
Validator='<%# (extender, value) => Functions.CheckID(value as string) %>'/>
Here, besides a regular two-way data binding of the property Import.ClearingMemberFirm.Id to the property Field8.Text, we format (parse) the value with Converters.AsString("XXXXX", false), and finally validate the input value with the lambda function (extender, value) => Functions.CheckID(value as string).
DataBindExtender also works well in template controls like asp:Repeater, asp:GridView, and so on. Having your business data available, you may reduce the size of the ViewState with EnableViewState='false'. This way DataBindExtender brings page development closer to the pattern called MVC.
Recently, we have found that it's also useful to have a way to run javascript during the page load (e.g. when you want to attach some client-side event, or register a component). DataBindExtender provides this with the OnClientInit property, which is javascript to run on the client, where this refers to a DOM element:
... OnClientInit='$addHandler(this, "change", function() { handleEvent(event, "Field8"); } );'/>
allows us to attach an onchange javascript event handler to the asp:TextBox.
So, for the time being, we're very satisfied with what we can achieve with DataBindExtender. It does more than JSF allows, and is much stronger and neater than what ASP.NET provides.
The sources can be found at DataBindExtender.cs
Lately, we have found that we've become accustomed to declaring C#'s local variables using var:
var exitStateName = exitState == null ? "" : exitState.Name;
var rules = Environment.NavigationRules;
var rule = rules[caller.Name];
var flow = rule.NavigationCases[procedure.OriginExitState];
This makes the code cleaner and, in the presence of a good IDE, still allows one to figure out
types very easily.
We have, however, found that var tends to have exceptions in its
use. E.g. for some reason most boolean locals in our code tend to remain explicit
(a matter of taste?):
bool succeed = false;
try
{
...
succeed = true;
}
finally
{
if (!succeed)
{
...
}
}
Also, the type often survives in for, but not in foreach:
for(int i = 0; i < sourceDataMapping.Length; ++i)
{
...
}
foreach(var property in properties)
{
...
}
In addition, var has some limitations, as one cannot easily
initialize such a local with null. Of the following, we prefer the first approach:
IWindowContext context = null;
var context = (IWindowContext)null;
var context = null as IWindowContext;
var context = default(IWindowContext);
We might need to figure out a consistent code style for var. It
might be like this:
- Numeric, boolean and string locals should use an explicit type;
- Try to avoid locals initialized with null or without an initializer, or use an explicit type
if such a variable cannot be avoided;
- Use var in all other cases.
Another code style could be like this:
- For consistency, completely avoid the use of the keyword var.