Today I tried to upgrade our projects to Saxon 9.2. We have a rather big set
of stylesheets grinding gigabytes of information, so naturally we expected
at least the same performance from the new version.
But to my puzzlement a pipeline of transformations failed almost immediately
with an error message:
XPTY0018: Cannot mix nodes and atomic values in the result of a path expression
We agree with this statement in general, but what did it have to do with our
stylesheets? And how did everything work in 9.1?
To find the root of the problem I created a minimal reproduction:
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="this"
  exclude-result-prefixes="xs t">

  <!-- Entry point. -->
  <xsl:template match="/">
    <xsl:variable name="p" as="element()">
      <p l="1"/>
    </xsl:variable>

    <o l="{$p/t:d(.)}"/>
  </xsl:template>

  <xsl:function name="t:d" as="item()*">
    <xsl:param name="p" as="element()"/>

    <xsl:apply-templates mode="p" select="$p"/>
  </xsl:function>

  <xsl:template match="*" mode="p">
    <xsl:sequence select="concat('0', @l)"/>
  </xsl:template>

</xsl:stylesheet>
Really simple, isn't it? The problem is in a new optimization of the
concat() function, introduced in version 9.2. It tries to eliminate
string concatenation and, in certain cases, emits its arguments directly into the
output as text nodes, separating the output with stopper strings. The
only problem is that no such optimization is allowed in this particular case
(which is rather common, and surely legal, in our stylesheets): the result of
<xsl:template match="*" mode="p"> should not be a node, but a value of type
xs:string.
Saxon 9.2 has been out for at least three months! So how come such a bug was not
discovered earlier?
Update: the fix was committed into the svn the next day. That's prompt!
We've added a new language to the set of Xml Object Model schemas and stylesheets.
The newcomer is COBOL! No jokes. It's not a whim, really. Believe it or
not, COBOL is still alive, and we need to generate it (mainly different sorts of
proxies).
We've used the VS COBOL II grammar, Version 1.0.3, as a reference. The implemented grammar
is complete except for preprocessor statements; on the other hand, it does define the COPY and EXEC SQL constructs.
It will definitely take some time for the xml schema and the xslt implementation to
mature.
The language XOM set now includes:
- jxom - for Java;
- csharpxom - for C#;
- cobolxom - for COBOL.
Sources can be found at
languages-xom.
Given:
- an xml defining elements and groups;
- each element belongs to one or more groups;
- a group may belong to another group.
Find:
- the groups a given element directly or indirectly belongs to;
- a function checking whether an element belongs to a group.
Example:
<groups>
  <group name="g1">
    <element ref="e1"/>
    <element ref="e2"/>
    <element ref="e3"/>
    <group ref="g2"/>
  </group>
  <group name="g2">
    <element ref="e5"/>
  </group>
  <group name="g3">
    <element ref="e1"/>
    <element ref="e4"/>
  </group>
</groups>
There are several solutions, depending on the aggressiveness of optimization. A
moderate one is done through xsl:key. All this is reminiscent of recursive common
table expressions in SQL.
Anyone?
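Not a complete XSLT answer, but here is a sketch, in Java, of the transitive closure any solution (xsl:key-based or otherwise) has to compute; the class and method names are ours, invented for illustration, and the data mirrors the example above.

import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class Groups
{
  // element name -> groups it directly belongs to
  private final Map<String, Set<String>> direct = new HashMap<>();
  // group name -> groups it directly belongs to
  private final Map<String, Set<String>> parents = new HashMap<>();

  public void element(String element, String group)
  {
    direct.computeIfAbsent(element, k -> new HashSet<>()).add(group);
  }

  public void subgroup(String child, String parent)
  {
    parents.computeIfAbsent(child, k -> new HashSet<>()).add(parent);
  }

  // All groups the element belongs to, directly or indirectly.
  public Set<String> groupsOf(String element)
  {
    Set<String> result = new HashSet<>();
    Deque<String> pending =
      new ArrayDeque<>(direct.getOrDefault(element, Collections.emptySet()));

    while(!pending.isEmpty())
    {
      String group = pending.pop();

      // Each newly discovered group pulls in the groups it belongs to.
      if (result.add(group))
      {
        pending.addAll(parents.getOrDefault(group, Collections.emptySet()));
      }
    }

    return result;
  }

  public boolean belongsTo(String element, String group)
  {
    return groupsOf(element).contains(group);
  }

  public static void main(String[] args)
  {
    Groups groups = new Groups();

    groups.element("e1", "g1");
    groups.element("e2", "g1");
    groups.element("e3", "g1");
    groups.element("e5", "g2");
    groups.element("e1", "g3");
    groups.element("e4", "g3");
    groups.subgroup("g2", "g1");

    // e5 belongs to g2 directly and to g1 indirectly.
    System.out.println(groups.groupsOf("e5"));
  }
}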
In spite of the fact that our latest projects are being developed in Java, .NET is definitely our favorite platform.
On twitter I saw the phrase: "Java the language is a stagnant mess". It was said in favour of C#. It's true that C# now significantly influences even Java (recall generics, jaxb, web services, etc.), but in my opinion C# won't become the leading language for worldwide enterprise applications in the near future.
One of the causes is that the main platform for .NET is still Windows. The situation could be changed by the Mono project, but I think there are still not enough projects on platforms other than Windows.
My guess is confirmed by some observations I've made as a software engineer at an IT company. Our company performs various software porting projects from legacy programming languages like COBOL, ADSO, Natural, etc. into up-to-date languages like Java, C#, etc. It's worth saying that clients rarely choose to migrate to .NET, despite our advice.
The main reason for such a choice, according to most of our clients, is that they want to be platform independent, and only Java gives them this choice.
It would be worthwhile for Microsoft to think about cooperation with Mono in order to make .NET really platform independent; otherwise C# will always be a step behind Java, despite its apparent advantages as a programming language.
A client asked us to produce Excel reports in an ASP.NET
application. They've given us Excel templates and also defined what they want to show.
What are our options?
- Work with the Office COM API;
- Use the Office Open XML SDK (a set of pure .NET APIs);
- Try to apply xslt somehow;
- Macros, something else?
For us, biased towards xslt, it's hard to make a fair choice. To judge, we've
tried to formalize the client's request and to look into future support.
So, we have defined sql stored procedures to provide the data. This way the data can be
represented as an ADO.NET DataSet, a set of classes, xml, or any other reasonable format. We do not
foresee any considerable problem with data representation if the client decides
to modify the reports in the future.
It's not so easy when we think about Excel generation.
Out of ignorance we thought that Excel is much like xslt in some regard, and
that it's possible to provide tabular data in some form and create an Excel
template that consumes the data to produce the final output. To some extent
that's possible, indeed, but you have to start writing macros or vb scripts to
achieve acceptable results.
When we mentioned macros to the client, they immediately stated that
such a solution won't work for security reasons.
Comparing the COM API and the Open XML SDK, we can see that both provide almost the same
level of service, except that the latter is much lighter and supports only the Open XML format, while the former is a heavy
API exposing MS Office and also supports earlier versions.
Both solutions have a considerable drawback: it's not easy to create an Excel
report in C#, and it will be a pain to support such a solution if the client asks,
say in half a year, to modify something in the Excel template or to create one more
report.
Thus we've arrived at xslt. There we've found two more directions:
- generate data for Office Open XML;
- generate xml in the MS Office 2003 format.
It turned out to be a rather nontrivial task to generate data for Open XML,
and not because of the format itself, which is not xml at all but a zipped folder
containing xmls. The problem is in the complex schemas and in the many relations
between the files constituting an Open XML document. In contrast, the MS
Office 2003 format allows us to create a single xml file for the spreadsheet.
Choosing between the standard, up-to-date format and the older proprietary one, the
latter looks more attractive for development and support.
At present we're inclined to use xslt and to generate files in the MS Office
2003 format. Are there better options?
Have you ever heard that double numbers may cause rounding errors, and that
many financial institutions are very sensitive to those roundings?
Sure you have! We're also aware of this kind of problem, and we thought we had
taken care of it. But things are not that simple, as you don't always
know what impact the problem can have.
To understand the context it's enough to say that we're converting (using xslt, by the way) programs
written in a CASE tool called
Cool:GEN into java and into C#. Originally, Cool:GEN generated COBOL and C
programs as deliverables. Formally, clients compare the COBOL results with the java or C#
results, and they want them to be as close as possible.
For one particular client it was crucial to have correct results when
manipulating numbers with 20-25 digits in total and 10 digits after the decimal point.
The client is definitely right, and we introduced generation options to control
how numbers are represented in the java and C# worlds: either as double, or as
BigDecimal (in java) and decimal (in C#).
That was our first implementation. Reasonable and clean. Was it enough? Not at
all!
The client reported that java's results (they use java and BigDecimal
for every number with a decimal point) are too precise compared to the Mainframe's
(MF) COBOL. This rather unusual complaint puzzled us a little, but the client
confirmed that they want results no more precise than those the MF produces.
The reason for the difference is that both C#, and especially java, may
store many more decimal digits than are defined for the particular result on the MF.
There, whenever you define a field storing 5 digits after the decimal point, you can be
sure that exactly 5 digits will be stored. This contrasts very much with the results
we had in java and C#, as both multiplication and division can produce many more
digits after the decimal point. The solution was to truncate(!) (not to round) the
numbers to the specific precision in the property setters.
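A minimal sketch of the idea (the class and the scale of 5 are ours, picked for illustration, not our generator's actual output): a property whose COBOL picture keeps 5 digits after the decimal point truncates, rather than rounds, in its setter.

import java.math.BigDecimal;
import java.math.RoundingMode;

public class Amount
{
  private BigDecimal value = BigDecimal.ZERO;

  public BigDecimal getValue()
  {
    return value;
  }

  public void setValue(BigDecimal newValue)
  {
    // RoundingMode.DOWN discards extra digits instead of rounding them,
    // mimicking the fixed COBOL picture of the field.
    value = newValue.setScale(5, RoundingMode.DOWN);
  }
}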
So, has this resolved the problem? No, still not!
The client reported that the results are now much better (they coincide with the MF's, in fact),
but there are still several instances where they observe differences in the 9th and
10th digits after the decimal point, and again java's results are more accurate.
No astonishment this time, just an analysis of the reason for the difference.
It turned out that the previous solution was partial. We were doing a final truncation,
but there were still intermediate results, as in a/(b * c) or a * (b/c).
For intermediate results, MF's COBOL has its own, rather nontrivial, formulas (and
options) per operation that define the number of digits to keep after a
decimal point. After we added similar options to the generator, several
truncations appeared in the code to adjust the intermediate results. This way
we reached the same accuracy as the MF.
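A sketch of this second step (the scale of 10 is made up; the real scales come from the COBOL intermediate-result rules): not only the final assignment, but every intermediate result of a/(b * c) is truncated to a fixed number of digits.

import java.math.BigDecimal;
import java.math.RoundingMode;

public class Intermediate
{
  // Truncate an intermediate result to a given scale.
  private static BigDecimal chop(BigDecimal value, int scale)
  {
    return value.setScale(scale, RoundingMode.DOWN);
  }

  public static void main(String[] args)
  {
    BigDecimal a = new BigDecimal("1");
    BigDecimal b = new BigDecimal("3");
    BigDecimal c = new BigDecimal("7");

    // Truncate the product, and then the quotient, to 10 digits after the point.
    BigDecimal product = chop(b.multiply(c), 10);
    BigDecimal result = chop(a.divide(product, 10, RoundingMode.DOWN), 10);

    System.out.println(result); // 0.0476190476
  }
}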
What have we learned (reiterated)?
- A simple problem may have a far-reaching impact.
- More precise is not always better. A client often prefers compatible results over
more accurate ones.
Recently we visited Ukraine: the capital city and the
town we come from.
Today's Ukraine makes a twofold impression.
On the one hand, there are the places of our childhood and our relatives, the enormous
pleasure of meeting university and school friends, and the pleasant surprise of meeting
the university chancellor, who was already hoary with age when we were studying.
On the other hand, it's already a very different country from the one memory
draws. I may be wrong, but my impression was that it's a country of traders and
endless political battles.
It's neither bad nor good, just a point in history. Unfortunately, we cannot imagine
ourselves living in Ukraine now.
To the question of where our home is now, we have only one answer: it's in Israel.
For some reason C# lacks a decimal truncation function
limiting the result to a specified number of digits after the decimal point. We don't
know the reasoning behind this, but it stimulates thought. The internet
is plentiful with workarounds. A typical answer looks like this:
Math.Truncate(2.22977777 * 1000) / 1000; // Returns 2.229
So, we also want to provide our solution to this problem.
public static decimal Truncate(decimal value, byte decimals)
{
  decimal result = decimal.Round(value, decimals);
  int c = decimal.Compare(value, result);
  bool negative = decimal.Compare(value, 0) < 0;

  // If rounding did not move the value away from zero,
  // it already equals the truncated value.
  if (negative ? c <= 0 : c >= 0)
  {
    return result;
  }

  // Otherwise step back towards zero by one unit of the last kept digit.
  return result - new decimal(1, 0, 0, negative, decimals);
}
Definitely, if the function were implemented by the framework, it would be much more efficient. We assume, however, that the above is the best implementation that can be done externally.
Natural curiosity led us to the implementation of connection
pooling in Apache Tomcat (org.apache.commons.dbcp).
And what are the results, you ask?
Uneasiness... Uneasiness for all those who use it. Uneasiness due to the
difference between our expectations and the real implementation.
Briefly, the design is the following:
- wrap every jdbc object;
- cache prepared statement wrappers;
- look up prepared statement wrappers in the cache before
asking the original driver;
- upon close, return the wrappers to the cache.
It took us a couple of minutes to see that this is a very problematic design, as
it does not properly handle a double close of a statement (jdbc states that it is
safe to call close() on an already closed jdbc object). With Apache's design, it's only safe
not to touch the object at all after the close() call, as it has been returned to the pool and
possibly already given to another client who requested it.
The correct design would be:
- wrap every jdbc object;
- cache the original prepared statements;
- look up the original prepared statement in the cache before asking the original
driver, and return a wrapper;
- upon close, detach the wrapper from the original object, and put the original object
into the cache.
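A minimal sketch of this corrected design (not the dbcp code; the class is ours, keyed by sql text only for brevity): the pool caches the original statements, each client gets a throwaway dynamic proxy, and close() detaches the proxy and returns the original to the cache, so a repeated close() is harmless.

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StatementPool
{
  private final Map<String, PreparedStatement> cache = new ConcurrentHashMap<>();

  public PreparedStatement prepare(Connection connection, final String sql)
    throws SQLException
  {
    PreparedStatement cached = cache.remove(sql);
    final PreparedStatement target =
      cached != null ? cached : connection.prepareStatement(sql);

    InvocationHandler handler = new InvocationHandler()
    {
      private boolean closed;

      public Object invoke(Object proxy, Method method, Object[] args)
        throws Throwable
      {
        if ("close".equals(method.getName()))
        {
          if (!closed)
          {
            // Detach the wrapper and return the original to the cache.
            closed = true;
            cache.put(sql, target);
          }

          // Subsequent close() calls are no-ops, as jdbc requires.
          return null;
        }

        if (closed)
        {
          throw new SQLException("Statement is closed.");
        }

        try
        {
          return method.invoke(target, args);
        }
        catch(InvocationTargetException e)
        {
          throw e.getCause();
        }
      }
    };

    return (PreparedStatement)Proxy.newProxyInstance(
      PreparedStatement.class.getClassLoader(),
      new Class<?>[] { PreparedStatement.class },
      handler);
  }
}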
A bit later: we've found a confirmation of our doubts on the Apache site: see "JNDI Datasource HOW-TO", chapter "Common Problems".
On twitter I found a conversation:
michaelhkay: @fgeorges [XSLT 2.1 - Still nothing public?] Afraid not. Why don't you join the WG and help to speed things up?
I think it's a tendency.
WGs are a very different world. There, people think of eternity... There, the
pace of time matters less than the final word. But developers who are busy
with their projects, and don't have enough spare time to help the WGs, cannot wait
years for good specs and implementations. On the other hand, good designers
succeed with existing technologies.
Looking into the future, I see different prospects for the WGs. The one I
consider most likely is that eventually a WG runs out of enthusiasm, which I
suspect happens after the second generation of members, and the technology goes
either to a museum, becomes a legacy (but still used) one, or, hopefully, passes to a
university community.
At present I'm calm about the Xslt/XQuery WGs, as they're only approaching
the second generation. My fears are about the C++ WG, which is in its third decade...
It's not a secret that we don't like JSF (something is very
wrong with its whole design); however, we have no choice but to work with it. At
times, though, throwing up our hands is the only wish we have while working with it.
The latest pearl concerns the check box control,
selectBooleanCheckbox. It turns out that when you disable the control on the
client and assume that its value won't be databound on the server, you're wrong.
The browser does not send the value, as you would expect, but JSF (the reference
implementation, at least) works like this:
private static String isChecked(String value)
{
return Boolean.toString("on".equalsIgnoreCase(value)
|| "yes".equalsIgnoreCase(value)
|| "true".equalsIgnoreCase(value));
}
where value is null, which means that JSF thinks the checkbox is unchecked.
Our experience with facelets shows that when you're designing
composition components, you often want to add a level of customization, e.g.
generate an element with or without an id, or define a class/style only if a value is specified.
Consider, for simplicity, that you want to encapsulate a check box and pass
several attributes to it. The first version you will probably think of is something like
this:
<html xmlns="http://www.w3.org/1999/xhtml"
  xmlns:ui="http://java.sun.com/jsf/facelets"
  xmlns:c="http://java.sun.com/jstl/core"
  xmlns:h="http://java.sun.com/jsf/html"
  xmlns:ex="http://www.nesterovsky-bros.com/jsf">
  <body>
    <!--
      Attributes:
        id - an optional id;
        value - a data binding;
        class - an optional element class;
        style - an optional element inline style;
        onclick - an optional script event handler for the onclick event;
        onchange - an optional script event handler for the onchange event.
    -->
    <ui:component>
      <h:selectBooleanCheckbox
        id="#{id}"
        value="#{value}"
        style="#{style}"
        class="#{class}"
        onchange="#{onchange}"
        onclick="#{onclick}"/>
    </ui:component>
  </body>
</html>
Rest assured, this is not what you expected. The output will contain all the mentioned
attributes, even those that weren't passed into the component (they will have empty
values). More than that, if you omit "id", you will get an error like: "empty
string is not a valid id".
The reason is in the EL! The attributes used in
this example are of type String, thus the result of evaluating a value expression is coerced to String.
The values of attributes that weren't passed in evaluate to null, and EL returns ""
when coercing null to String. The interesting thing
is that if EL did not change null, those omitted attributes would not appear in the output.
The second attempt would probably be:
<h:selectBooleanCheckbox value="#{value}">
  <c:if test="#{!empty id}">
    <f:attribute name="id" value="#{id}"/>
  </c:if>
  <c:if test="#{!empty onclick}">
    <f:attribute name="onclick" value="#{onclick}"/>
  </c:if>
  <c:if test="#{!empty onchange}">
    <f:attribute name="onchange" value="#{onchange}"/>
  </c:if>
  <c:if test="#{!empty class}">
    <f:attribute name="class" value="#{class}"/>
  </c:if>
  <c:if test="#{!empty style}">
    <f:attribute name="style" value="#{style}"/>
  </c:if>
</h:selectBooleanCheckbox>
Rest assured, this won't work either (it may work, but not as you would expect). The c:if
instruction is evaluated at the stage of building the component tree, not at the
rendering stage.
To work around the problem you should prevent the null to "" conversion in the EL.
That's, in fact, rather trivial to achieve: the value expression should evaluate to
an object different from String, whose toString() method returns the required
value.
The final component may look like this:
<h:selectBooleanCheckbox
  id="#{ex:object(id)}"
  value="#{value}"
  style="#{ex:object(style)}"
  class="#{ex:object(class)}"
  onchange="#{ex:object(onchange)}"
  onclick="#{ex:object(onclick)}"/>
where ex:object() is a function defined like this:
public static Object object(final Object value)
{
  return new Object()
  {
    public String toString()
    {
      return value == null ? null : value.toString();
    }
  };
}
A bit later: not everything works as we expected. This approach doesn't work with the validator attribute, whereas it works with the converter attribute. The difference between them is that the first attribute takes a MethodExpression value, while the second takes a ValueExpression value. Again, we suffer from the ugly JSF implementation of the UIOutput component.
Suppose you have a library which is out in production and is used by many clients. At the same time the library evolves: the API is extended, bugs are being fixed, the code becomes faster and cleaner, bla bla bla...
At some point you fix an important bug that had been hiding for a long time in the bowels of your library. You're happy that you've spotted it before the clients got into trouble. You notify all the clients that there is an important fix and that they need to update the library.
What do you think you hear in return?
Well, we're not perfect; there are bugs in our software. We and our clients realize this. Nothing will prevent bugs from creeping into the code from time to time.
Here is the train of thought of one particular client:
We agree that there is a bug and that it has to be fixed. We, however, want to touch the library in a minimal way, as who knows what other new bugs they have introduced, so let's ask them to fix this particular bug in our version of the library.
That's fair from the client's perspective. They don't want better code, they just want that particular bug fixed!
For us, however, this means branching some old version of the library, fixing the bug, and supporting this branch for that particular client. It's fair to expect a similar position from every client; should we then create and support a library branch per client, and branch the main line only for new clients?
To us (Arthur and Vladimir) this looks like an enormous waste of resources. We (our company) would either have to hire more and more skilled people or face a gradual slowdown of support and development.
Our answer would be obvious if not for the position of the top managers, who value client relations so much that they easily promise whatever the client wishes. No arguments that the latest version is better tested, more conforming to the specifications, more reliable, faster and so on are accepted. The main argument against our position is that the client's applications run in production, and no new potential bugs are acceptable.
Here is our dilemma: we can neither convince the client (more precisely, our managers) that we're right, nor are we convinced by their arguments...
Recently we saw a blog entry, "JSF: IDs and clientIds in Facelets", which provided a wrong implementation of the feature.
I'm not sure how useful it is, but here is our approach to the same problem.
At the core is ScopeComponent. The example uses a couple of utility functions defined in Functions. The example itself can be found in window.xhtml:
<html xmlns="http://www.w3.org/1999/xhtml"
  xmlns:ui="http://java.sun.com/jsf/facelets"
  xmlns:c="http://java.sun.com/jstl/core"
  xmlns:h="http://java.sun.com/jsf/html"
  xmlns:f="http://java.sun.com/jsf/core"
  xmlns:fn="http://java.sun.com/jsp/jstl/functions"
  xmlns:ex="http://www.nesterovsky-bros.com/jsf">
  <body>
    <h:form>
      <ui:repeat value="#{ex:sequence(5)}">
        <f:subview id="scope" binding="#{ex:scope().value}">
          #{scope.id}, #{scope.clientId}
        </f:subview>
        <f:subview id="script" uniqueId="my-script"
          binding="#{ex:scope().value}" myValue="#{2 + 2}">
          , #{script.id}, #{script.clientId},
          #{script.bindings.myValue.expressionString},
          #{ex:value(script.bindings.myValue)},
          #{script.attributes.myValue}
        </f:subview>
        <br/>
      </ui:repeat>
    </h:form>
  </body>
</html>
Update: ex:scope() is made to return a simple bean with a property "value".
Another useful example:
<f:subview id="group" binding="#{ex:scope().value}">
<h:inputText id="input" value="#{bean.property}"/>
<script type="text/javascript">
var element = document.getElementById('#{group.clientId}:input');
</script>
</f:subview>
In the section about AJAX, the JSF 2.0 spec (final draft) talks about partial requests...
This sounds rather strange. My perception was that AJAX is about partial responses. What's the sense of sending partial requests? Requests are comparatively small anyway! Besides, a partial request may complicate restoring the component tree on the server and make things fragile, but this largely depends on what they mean by those words.