We are designing a rather complex XSLT 2.0 application that deals with semistructured
data. We must tolerate errors during processing, as there are cases where the
input is not perfectly valid (or the program is not designed or ready to handle
such input).
The most typical error is an unsatisfied expectation about the tree structure, such as:
<xsl:variable name="element" as="element()" select="some-element"/>
Obviously, a dynamic error occurs if the specified element is not present. To
concentrate on the primary logic, and to avoid the burden of recovering from illegal
(unexpected) cases, we have created a try/catch API. The goals of this API are:
- to be able to continue processing in case of an error;
- to report as much useful information about the error as possible.
Alternatives:
Do not think it was arrogance that led us to create a custom API. No, we
were looking for alternatives! Please see the
[xsl] saxon:try() discussion:
- saxon:try() function - is a kind of pseudo function, which explicitly relies on lazy
evaluation of its arguments, and ... it's not available in SaxonB;
- ex:error-safe extension instruction - is far from perfect in its implementation quality, and provides no error location.
We had no other way except to design this feature ourselves. In our defence, one
can say that we use an innovative approach that encapsulates the details of the
implementation behind a template and calls handlers indirectly.
Use:
Try/catch API is designed as a template
<xsl:template name="t:try-block"/> calling a "try" handler, and, if
required, a "catch" handler using the
<xsl:apply-templates mode="t:call"/> instruction. The caller passes any
information to these handlers by means of tunnel parameters.
Handlers must be in the "t:call" mode. The "catch" handler
may receive the following error info parameters:
<xsl:param name="error" as="xs:QName"/>
<xsl:param name="error-description" as="xs:string"/>
<xsl:param name="error-location" as="item()*"/>
where $error-location is a sequence of pairs (location as
xs:string, context as item())* .
A sample:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:t="http://www.nesterovsky-bros.com/xslt/public/"
exclude-result-prefixes="xs t">
<xsl:include href="try-block.xslt"/>
  <xsl:template match="/">
    <result>
      <xsl:for-each select="1 to 10">
        <xsl:call-template name="t:try-block">
          <xsl:with-param name="value" tunnel="yes" select=". - 5"/>
          <xsl:with-param name="try" as="element()">
            <try/>
          </xsl:with-param>
          <xsl:with-param name="catch" as="element()">
            <t:error-handler/>
          </xsl:with-param>
        </xsl:call-template>
      </xsl:for-each>
    </result>
  </xsl:template>
  <xsl:template mode="t:call" match="try">
    <xsl:param name="value" tunnel="yes" as="xs:decimal"/>
    <value>
      <xsl:sequence select="1 div $value"/>
    </value>
  </xsl:template>
</xsl:stylesheet>
The sample prints values according to the formula "1/(i - 5)", where "i"
varies from 1 to 10. Clearly, division by zero occurs when "i" is equal
to 5.
Note how the try/catch API is brought in through
<xsl:include href="try-block.xslt"/> . The main logic is
executed in
<xsl:template mode="t:call" match="try"/> , which
receives parameters using tunneling. A default error handler
<t:error-handler/> is used to report errors.
Error report:
Error: FOAR0001
Description:
Decimal divide by zero
Location:
1. systemID: "file:///D:/style/try-block-test.xslt", line: 34
2. template mode="t:call"
match="element(try, xs:anyType)"
systemID: "file:///D:/style/try-block-test.xslt", line: 30
context node:
/*[1][local-name() = 'try']
3. template mode="t:call"
match="element({http://www.nesterovsky-bros.com/xslt/private/try-block}try, xs:anyType)"
systemID: "file:///D:/style/try-block.xslt", line: 53
context node:
/*[1][local-name() = 'try']
4. systemID: "file:///D:/style/try-block.xslt", line: 40
5. call-template name="t:try-block"
systemID: "file:///D:/style/try-block-test.xslt", line: 17
6. for-each
systemID: "file:///D:/style/try-block-test.xslt", line: 16
context item: 5
7. template mode="saxon:_defaultMode"
match="document-node()"
systemID: "file:///D:/style/try-block-test.xslt", line: 14
context node:
/
Implementation details:
You were not expecting this API to be pure XSLT, were you?
Well, you're right, there is an extension function. Its pseudo code looks like
this:
function tryBlock(tryItems, catchItems)
{
try
{
execute xsl:apply-templates for tryItems.
}
catch
{
execute xsl:apply-templates for catchItems.
}
}
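To make the pseudo code a bit more tangible, here is a minimal Java sketch of the same control flow. It is only an illustration: the class TryBlockSketch and its methods are hypothetical names, the handlers are plain lambdas rather than xsl:apply-templates calls, and it is not the code shipped in the extension. It repeats the "1/(i - 5)" sample with integer arithmetic, so most results are truncated to 0.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical illustration of the try/catch control flow; not the real extension function. */
final class TryBlockSketch {
    /** Runs the "try" handler; if it throws, runs the "catch" handler with the error. */
    static <T> T tryBlock(Supplier<T> tryHandler, Function<RuntimeException, T> catchHandler) {
        try {
            return tryHandler.get();
        } catch (RuntimeException e) {
            return catchHandler.apply(e);
        }
    }

    /** Evaluates 1 / (i - 5) with integer division and reports a failure instead of aborting. */
    static String evaluate(int i) {
        return tryBlock(
            () -> String.valueOf(1 / (i - 5)),
            e -> "error: " + e.getMessage());
    }

    public static void main(String[] args) {
        // Processing continues past i = 5, where the division fails.
        for (int i = 1; i <= 10; i++) {
            System.out.println(i + " -> " + evaluate(i));
        }
    }
}
```

The point of the shape is the same as in the template version: the caller supplies both handlers, and the loop keeps going after the failing item.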
One last thing: please get the implementation,
saxon.extensions.zip. There you will find the sources of the try/catch and
tuples/maps APIs.
Right now we live in the Java world, thus all our tasks are (in)directly
related to this environment.
We want to store stylesheets as resources of a Java application, and at
the same time to refer to these stylesheets without jar qualification. In .NET this idea would not
arise at all, as there are well defined boundaries between assemblies, but Java uses
a rather different approach. Whenever you have a resource name, it's up to
the ClassLoader to find this resource. To exploit this feature we've created
a URI resolver for the stylesheet
transformation. The protocol we use has the following format: "resource:/resource-path".
For example, to store stylesheets in the
META-INF/stylesheets folder we use the URI "resource:/META-INF/stylesheets/java/main.xslt".
Relative paths are resolved naturally. A path "../jxom/java-serializer.xslt"
in the previously mentioned stylesheet is resolved to "resource:/META-INF/stylesheets/jxom/java-serializer.xslt".
We've created a small class, ResourceURIResolver. You need to
supply an instance of TransformerFactory with this resolver:
transformerFactory.setURIResolver(new ResourceURIResolver());
The class itself is so small that we quote it here:
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;
/**
* This class implements an interface that can be called by the processor
* to turn a URI used in document(), xsl:import, or xsl:include into a
* Source object.
*/
public class ResourceURIResolver implements URIResolver
{
/**
* Called by the processor when it encounters
* an xsl:include, xsl:import, or document() function.
*
* This resolver supports protocol "resource:".
* Format of uri is: "resource:/resource-path", where "resource-path" is an
* argument of a {@link ClassLoader#getResourceAsStream(String)} call.
* @param href - an href attribute, which may be relative or absolute.
* @param base - a base URI against which the first argument will be made
* absolute if the absolute URI is required.
* @return a Source object, or null if the href cannot be resolved, and
* the processor should try to resolve the URI itself.
*/
public Source resolve(String href, String base)
throws TransformerException
{
if (href == null)
{
return null;
}
URI uri;
try
{
if (base == null)
{
uri = new URI(href);
}
else
{
uri = new URI(base).resolve(href);
}
}
catch(URISyntaxException e)
{
// Unsupported uri.
return null;
}
if (!"resource".equals(uri.getScheme()))
{
return null;
}
String resourceName = uri.getPath();
if ((resourceName == null) || (resourceName.length() == 0))
{
return null;
}
if (resourceName.charAt(0) == '/')
{
resourceName = resourceName.substring(1);
}
ClassLoader classLoader =
Thread.currentThread().getContextClassLoader();
InputStream stream =
classLoader.getResourceAsStream(resourceName);
if (stream == null)
{
return null;
}
return new StreamSource(stream, uri.toString());
}
}
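The relative path resolution mentioned above relies only on java.net.URI, so it can be checked in isolation. The following is a small sketch under that assumption; ResourceUriDemo and its two helper methods are hypothetical names introduced for the illustration, they are not part of the resolver.

```java
import java.net.URI;

/** Hypothetical helper that mirrors the URI handling done inside the resolver. */
final class ResourceUriDemo {
    /** Resolves href against base, as the resolver does for xsl:include or xsl:import. */
    static String resolve(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }

    /** Turns a "resource:" URI into a name suitable for ClassLoader.getResourceAsStream(). */
    static String resourceName(String uri) {
        String path = URI.create(uri).getPath();
        if (path == null) {
            return null; // opaque URI, nothing to look up
        }
        return path.startsWith("/") ? path.substring(1) : path;
    }

    public static void main(String[] args) {
        String resolved = resolve(
            "resource:/META-INF/stylesheets/java/main.xslt",
            "../jxom/java-serializer.xslt");
        // prints resource:/META-INF/stylesheets/jxom/java-serializer.xslt
        System.out.println(resolved);
        // prints META-INF/stylesheets/jxom/java-serializer.xslt
        System.out.println(resourceName(resolved));
    }
}
```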
We've uploaded an update for jxom.
It has turned out that the jxom schema is so powerful that you can perform a great number of manipulations over the xml representation of a java program.
In our case this is an optimization of unreachable code, as defined in
Sun's spec. We're facing this problem as a result of translation from another, ancient language, which also has a well defined xml schema.
We have also introduced the ability to annotate jxom elements (see the meta element), which in practice we use to annotate expressions with their types and perform "compile time" expression evaluation.
You may download this jxom version at the usual place.
See also: Java Xml Object Model.
The project we're working on requires us to generate a Java web application from an ancient language. The code being converted is transformed into Java classes
(thanks to
jxom), and
the presentation is converted into JSF (facelets) pages.
By the way, long before the Java (.NET) platform was conceived, there were
languages and environments worked out so well that contemporary client-server
paradigms (like JSF, ASP.NET, and so on) are just their isomorphisms.
The problem we were dealing with recently is JSF databinding for bean properties
of types java.sql.Date, java.sql.Time, java.sql.Timestamp .
At some point in the design we decided that these types are the most natural
representation of data in the original language, as the program's activity is
tightly connected to the database. Later on it became clear that JSF
databinding does not like these types at all. We had to decide either to fall
back and use java.util.Date as the bean property type, or to do something with
databinding.
It was not clear what the best way was until we found an elegant solution,
namely: to create an ELResolver to handle bean properties of these types. The solution
works because custom EL resolvers are applied before standard resolvers (except
the implicit one).
The class
DateELResolver is a rather simple extension of
BeanELResolver. To use it you only need to register it in faces-config.xml:
<faces-config version="1.2"
xmlns="http://java.sun.com/xml/ns/javaee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
http://java.sun.com/xml/ns/javaee/web-facesconfig_1_2.xsd">
<application>
<el-resolver>com.nesterovskyBros.jsf.DateELResolver</el-resolver>
</application>
</faces-config>
Recently I've proposed to add two new atomic types,
tuple and map, to the xpath/xslt/xquery type system (see "Tuples and maps").
Later I've implemented a pure xslt approximation of
tuple and map. Now I want to present a
java
implementation for Saxon.
I've created TupleValue and MapValue atomic types, and a Collections class
exposing the extension functions api. It's easy to use this api. I'll repeat an
example that I showed earlier:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:f="http://www.nesterovsky-bros.com/xslt/functions/public"
xmlns:p="http://www.nesterovsky-bros.com/xslt/functions/private"
xmlns:c="java:com.nesterovskyBros.saxon.Functions"
exclude-result-prefixes="xs f p c">
<xsl:template match="/">
<root>
<xsl:variable name="tuples" as="item()*" select="
for $i in 1 to 20
return c:tuple(1 to $i)"/>
<total-items>
<xsl:sequence select="
sum
(
for $tuple in $tuples return
count(c:tuple-items($tuple))
)"/>
</total-items>
<tuples-size>
<xsl:sequence select="count($tuples)"/>
</tuples-size>
<sums-per-tuples>
<xsl:for-each select="$tuples">
<xsl:variable name="index"
as="xs:integer" select="position()"/>
<sum index="{$index}"
value="{sum(c:tuple-items(.))}"/>
</xsl:for-each>
</sums-per-tuples>
<xsl:variable name="cities" as="element()*">
<city name="Jerusalem" country="Israel"/>
<city name="London" country="Great Britain"/>
<city name="Paris" country="France"/>
<city name="New York" country="USA"/>
<city name="Moscow" country="Russia"/>
<city name="Tel Aviv" country="Israel"/>
<city name="St. Petersburg" country="Russia"/>
</xsl:variable>
<xsl:variable name="map" as="item()" select="
c:map
(
for $city in $cities return
(
$city/string(@country),
$city
)
)"/>
<xsl:for-each select="c:map-keys($map)">
<xsl:variable name="key" as="xs:string" select="."/>
<country name="{$key}">
<xsl:sequence select="c:map-value($map,
$key)"/>
</country>
</xsl:for-each>
</root>
</xsl:template>
</xsl:stylesheet>
Download java source.
P.S. I wish this api were integrated into Saxon, as at present java
extension functions are called through reflection.
Many times I was tempted to create my best books list.
No, I dare not create it. There are so many brilliant masterpieces; I leave
making such lists to someone wiser than myself. However, I cannot be
silent about a book I've read recently. This is
Ulysses by
James Joyce.
It's hard reading: you need to know all the creative work of Joyce,
Homer, the state of affairs in the world in 1904, the map of the city of
Dublin in that year, and much, much other information. That is the reason why you
must read an annotated version.
The labour of reading is compensated by the intellectual pleasure it gives, because
it's not the story that catches you, but the narration. Having imagination, you
immerse yourself in a world created by the author.
I can believe that there are people who have lived for years in the world created by Joyce.
Today I've found another new language (working draft in fact). It's
an XML Pipeline Language.
XProc: An XML Pipeline Language, a language for
describing operations to be performed on XML documents.
An XML Pipeline specifies a sequence of operations to be performed on zero or
more XML documents. Pipelines generally accept zero or more XML documents as
input and produce zero or more XML documents as output. Pipelines are made up of
simple steps which perform atomic operations on XML documents and constructs
similar to conditionals, iteration, and exception handlers which control which
steps are executed.
Experience shows that language invention has been an essential part of the
computer industry from the very beginning, however...
I must confess I may be too reluctant towards any new language: I was happy with
C++, but then all these new languages like Delphi, Java, C#, and so many others
started to appear. It's correct to say that there is no efficient universal
language; however, I think it's wrong to say that a domain specific language is
required to solve a particular problem in the most efficient way.
And now a question to the point: why do you need a new language for describing
operations to be performed on XML documents?
A project I'm currently working on requires me to manipulate a large number
of documents. This includes accessing these documents with the key()
function.
I never thought this task posed any problem, until I discovered that Saxon
caches documents loaded using the document() function to preserve their identities:
By default, this function is ·stable·.
Two calls on this function return the same document node if the same
URI Reference (after resolution to an absolute URI Reference)
is supplied to both calls. Thus, the following expression
(if it does not raise an error) will always be true:
doc("foo.xml") is doc("foo.xml")
However, for performance reasons, implementations may provide a user option to
evaluate the function without a guarantee of stability. The manner in which any
such option is provided is implementation-defined. If the user has not selected
such an option, a call of the function must either return a stable result or
must raise an error: [err:FODC0003].
Saxon provides a saxon:discard-document() function to release documents from
cache. The use case is like this:
<xsl:variable name="document" as="document-node()"
select="saxon:discard-document(document(...))"/>
As you can see, saxon:discard-document() is bound to the place where a document is
loaded. In my case this is inefficient, as my code repeatedly accesses documents
from different places. To release loaded documents I need to collect them after
the main processing.
Another issue in Saxon is that the processor may keep document references through
xsl:key, thus saxon:discard-document() provides no guarantee that documents will be
garbage collected.
To deal with this, I've designed a (Saxon specific) api to manage document pools:
t:begin-document-pool-scope() as item()
  Begins document pool scope.
  Returns scope id.
t:end-document-pool-scope(scope as item())
  Terminates document pool scope.
  $scope - scope id.
t:put-document-in-pool(document as document-node()) as document-node()
  Puts a document into the current scope of the document pool.
  $document - a document to put into the document pool.
  Returns the same document node.
The use case is:
<xsl:variable name="scope" select="t:begin-document-pool-scope()"/>
<xsl:sequence select="t:assert($scope)"/>
...
<xsl:variable name="document" as="document-node()"
select="t:put-document-in-pool(...)"/>
...
<xsl:sequence
select="t:end-document-pool-scope($scope)"/>
Download
document-pool.xslt to use this api.
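For readers who think in Java, the scope idea can be sketched as a stack of lists. This is only a rough model under my own assumptions: DocumentPoolSketch and its method names are hypothetical, documents are plain objects, and "releasing" simply drops the references, whereas the real api has to deal with Saxon's document cache.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of nested document pool scopes; not the code behind document-pool.xslt. */
final class DocumentPoolSketch {
    private final Deque<List<Object>> scopes = new ArrayDeque<>();

    /** Begins a scope and returns its id (here simply the nesting depth). */
    int beginScope() {
        scopes.push(new ArrayList<>());
        return scopes.size();
    }

    /** Registers a document in the current scope and returns the same document. */
    <T> T put(T document) {
        if (scopes.isEmpty()) {
            throw new IllegalStateException("no open scope");
        }
        scopes.peek().add(document);
        return document;
    }

    /** Ends the given scope, drops its references and returns how many documents it held. */
    int endScope(int scope) {
        if (scope != scopes.size()) {
            throw new IllegalStateException("scopes must be closed in reverse order");
        }
        return scopes.pop().size();
    }

    public static void main(String[] args) {
        DocumentPoolSketch pool = new DocumentPoolSketch();
        int scope = pool.beginScope();
        Object document = pool.put(new Object()); // stands in for a loaded document
        System.out.println("released: " + pool.endScope(scope));
    }
}
```

The design choice worth noting is that the document is registered where it is loaded, but released where the scope ends, which is what saxon:discard-document() alone does not allow.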
I have already written about the logical
difference between templates and functions. This time I've realized another,
technical one. It's related to lazy evaluation, permitted by the language
specification.
I was arguing as follows:
- suppose you define a function returning a sequence;
- this function at its final step constructs a document using
xsl:result-document;
- the caller invokes this function and uses only the first item of the sequence;
- lazy evaluation allows the xslt processor to calculate the first item only, and thus
to avoid creating the output document altogether.
This conclusion looked ridiculous to me, as it means that I cannot reliably
expect creation of documents built with the xsl:result-document instruction.
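The same effect is easy to observe outside xslt. Here is a loose analogy in Java streams, which are also lazy; it is only an analogy, and the class LazySideEffectDemo and the "written" list (standing in for created documents) are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical demo: side effects attached to lazily computed items may never happen. */
final class LazySideEffectDemo {
    /** Returns the names of the "documents" actually written when only the first item is used. */
    static List<String> writtenWhenOnlyFirstIsUsed() {
        List<String> written = new ArrayList<>();
        Stream<String> results = Stream.of("a", "b", "c")
            .map(name -> {
                written.add(name + ".xml"); // the side effect, like writing a result document
                return name;
            });
        results.findFirst(); // the caller needs the first item only
        return written;
    }

    public static void main(String[] args) {
        // prints [a.xml]: the side effects for "b" and "c" never ran
        System.out.println(writtenWhenOnlyFirstIsUsed());
    }
}
```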
To resolve the issue I've checked the specification. Someone has already thought of
this. This is what the specification says:
[Definition: Each instruction in the stylesheet is evaluated in one of two possible output states: final output state or temporary output state].
[Definition: The first of the two output states is called final output state. This state applies when instructions are writing to a final result tree.]
[Definition: The second of the two output states is called temporary output state. This state applies when instructions are writing to a temporary tree or any other non-final destination.]
The instructions in the initial template are evaluated in final output state. An instruction is evaluated in the same output state as its calling instruction, except that xsl:variable, xsl:param, xsl:with-param, xsl:attribute, xsl:comment, xsl:processing-instruction, xsl:namespace, xsl:value-of, xsl:function, xsl:key, xsl:sort, and xsl:message always evaluate the instructions in their contained sequence constructor in temporary output state.
[ERR XTDE1480] It is a non-recoverable dynamic error to evaluate the xsl:result-document instruction in temporary output state.
As you can see, xsl:function is always evaluated in temporary output state, and
cannot contain xsl:result-document, in contrast to xsl:template, which may be
evaluated in final output state. This difference dictates the role of templates as
"top level functions" and functions as standalone algorithms.
You can find more on the subject at "Lazy evaluation and predicted results".
In the era of parallel processing it's so natural to enroll your favorite programming language in the league of "Multithreading supporters". I've seen such appeals before: "Wide Finder in XSLT --> deriving new requirements for efficiency in XSLT processors."
... I am not aware of any XSLT implementation that provides explicit or implicit support for parallel processing (with the obvious goal to take advantage of the multi-core processors that have almost reached a "prevalent" status today) ...
I think both xslt and xquery are well suited for parallel processing in terms of their type system. This is because of the "immutable" nature (until recent additions) of the execution state, which prevents many race conditions. The only missing ingredients are an indirect function call, and a couple of core functions to queue parallel tasks.
Suppose there is a type to encapsulate a function call (say function-id), and a function accepting a sequence and a function-id. This function calls function-id for each item of the sequence in parallel, and then combines the final result, as if it were implemented serially.
Pretty simple, isn't it?
<!--
This function runs $id function for each item in a sequence.
$items - items to process.
$id - function id.
Returns a sequence of results of calls to $id function.
-->
<xsl:function name="x:queue-tasks" as="item()*">
<xsl:param name="items" as="item()*"/>
<xsl:param name="id" as="x:function-id"/>
<!-- The pseudo code. -->
<xsl:sequence select="$items/call $id (.)"/>
</xsl:function>
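Java already has both ingredients, so the idea can be tried there. The sketch below is my own illustration, not a proposal for the xslt function library: QueueTasksSketch is a hypothetical name, a java.util.function.Function plays the role of function-id, and a parallel stream does the scheduling. The results come back in the order of the input items, as if the calls were made serially.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical Java counterpart of x:queue-tasks. */
final class QueueTasksSketch {
    /** Applies the task to every item in parallel and returns the results in input order. */
    static <T, R> List<R> queueTasks(List<T> items, Function<? super T, ? extends R> task) {
        return items.parallelStream()
            .map(task)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> squares = queueTasks(List.of(1, 2, 3, 4), x -> x * x);
        // prints [1, 4, 9, 16]
        System.out.println(squares);
    }
}
```

This works safely only because the task has no shared mutable state, which is exactly the property that xslt and xquery functions have by construction.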
For the last several weeks I was on military duty. We were patrolling the Israeli border
near Egypt. It was a completely different world, a world of guns, Hummers,
heat sensors...
There I met my army friends. It was fun to listen to the stories they were telling. At
some point I started to realize that I'm growing older. Most of my friends
are married and have two or three children.
It seems this was the genuine world, and my own one is fictitious.
Does the WebSphere MQ library for .NET support a connection pool? This is the question that many .NET developers ask when they deal with IBM WebSphere MQ and write multithreaded applications. The answer to this question, unfortunately, is NO… The .NET version supports only individual connection types.
I have compared the two MQ libraries, the one for Java and the one for .NET, and I’ve found that most of the classes have the same declarations, except for one difference that is crucial for me. As opposed to .NET, the Java MQ library provides several classes implementing MQ connection pooling. There is nothing similar in the .NET library.
There are a few common workarounds for this annoying restriction. One such workaround (recommended by IBM in their “MQ using .NET”) is to keep one MQ connection open per thread. Unfortunately, this approach does not work for ASP.NET applications (including web services).
The good news is that starting from service pack 5 for MQ 5.3, and of course for MQ 6.xx, they support sharing MQ connections in blocked mode:
“The implementation of WebSphere MQ .NET ensures that, for a given connection (MQQueueManager object instance), all access to the target WebSphere MQ queue manager is synchronized. The default behavior is that a thread that wants to issue a call to a queue manager is blocked until all other calls in progress for that connection are complete.”
This allows creating an MQ connection (note that the MQQueueManager object is a wrapper for an MQ connection) in one thread and using it exclusively in another thread without side effects caused by multithreading.
Taking this feature into account, I’ve created a simple MQ connection pool. It’s easy to use. The main class MQPoolManager has only two static methods:
public static MQQueueManager Get(string QueueManagerName, string ChannelName, string ConnectionName);
and
public static void Release(ref MQQueueManager queueManager);
The method Get returns an MQ queue manager (either an existing one from the pool or a newly created one), and Release returns it to the connection pool. Internally, MQPoolManager tracks expired connections and performs finalization if needed.
So, you may use one MQ connection pool per application domain without additional effort or big changes in existing applications.
By the way, this approach has allowed us to considerably optimize the performance of the MQ part in one of our projects.
Later on...
To clarify the use of MQPoolManager I've decided to show the following code snippet:
MQQueueManager queueManager = MQPoolManager.Get(QueueManagerName, ChannelName, ConnectionName);
try
{
// TODO: some work with MQ here
}
finally
{
MQPoolManager.Release(ref queueManager);
}
// at this point the queueManager is null
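The Get/Release contract itself is not specific to MQ or to .NET. As a rough illustration only, here is how the same idea might look in Java with a generic connection type. ConnectionPoolSketch, its key format and the factory function are all assumptions made for this sketch; it is not the MQPoolManager code, and it leaves out the tracking of expired connections.

```java
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.function.Function;

/** Hypothetical keyed pool: get() reuses an idle connection or creates one, release() returns it. */
final class ConnectionPoolSketch<C> {
    private final Map<String, Deque<C>> idle = new ConcurrentHashMap<>();
    private final Function<String, C> factory;

    /** The factory creates a new connection for a key when no idle one is available. */
    ConnectionPoolSketch(Function<String, C> factory) {
        this.factory = factory;
    }

    /** Returns an idle connection for the key, or a newly created one. */
    C get(String key) {
        Deque<C> queue = idle.get(key);
        C pooled = (queue == null) ? null : queue.pollFirst();
        return (pooled != null) ? pooled : factory.apply(key);
    }

    /** Puts the connection back so that a later get() with the same key can reuse it. */
    void release(String key, C connection) {
        idle.computeIfAbsent(key, k -> new ConcurrentLinkedDeque<>()).addFirst(connection);
    }

    public static void main(String[] args) {
        ConnectionPoolSketch<Object> pool = new ConnectionPoolSketch<>(key -> new Object());
        String key = "QM1/CHANNEL1/host(1414)"; // made-up key: queue manager, channel, connection
        Object first = pool.get(key);
        pool.release(key, first);
        System.out.println(pool.get(key) == first); // prints true: the connection was reused
    }
}
```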
Yesterday's idea has inspired me so much that I've created a prototype implementation of map and tuple in xslt 2.0.
Definitely, I wish these were built-in types, and were considered atomic values for the purposes of comparison and iteration. This way it would be possible to create highly efficient grouping by several fields at once.
This pure implementation (xslt-tuple.zip) is rather sketchy; however, it lets you feel what can be done with tuples and maps. I guess a good example says more than many words, so enjoy:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:f="http://www.nesterovsky-bros.com/xslt/functions"
exclude-result-prefixes="xs f">
<xsl:include href="tuple.xslt"/>
<xsl:include href="map.xslt"/>
<xsl:template match="/">
<root>
<xsl:variable name="tuples" as="item()*" select="
f:tuple
(
for $i in 1 to 10
return
f:tuple(1 to $i)
)"/>
<total-items>
<xsl:sequence select="count($tuples)"/>
</total-items>
<tuples-size>
<xsl:sequence select="f:tuple-size($tuples)"/>
</tuples-size>
<sums-per-tuples>
<xsl:for-each select="1 to f:tuple-size($tuples)">
<xsl:variable name="index" as="xs:integer" select="position()"/>
<sum
index="{$index}"
value="{sum(f:tuple-items(f:tuple-item($tuples,
$index)))}"/>
</xsl:for-each>
</sums-per-tuples>
<xsl:variable name="cities" as="element()*">
<city name="Jerusalem" country="Israel"/>
<city name="London" country="Great Britain"/>
<city name="Paris" country="France"/>
<city name="New York" country="USA"/>
<city name="Moscow" country="Russia"/>
<city name="Tel Aviv" country="Israel"/>
<city name="St. Petersburg" country="Russia"/>
</xsl:variable>
<xsl:variable name="map" as="item()*" select="
f:map
(
for $city in $cities
return
($city/string(@country), $city)
)"/>
<xsl:for-each select="f:map-keys($map)">
<xsl:variable name="key" as="xs:string" select="."/>
<country name="{$key}">
<xsl:sequence select="f:map-value($map, $key)"/>
</country>
</xsl:for-each>
</root>
</xsl:template>
</xsl:stylesheet>
The type system of xslt 2.0 is not complete (see
Sequence of sequences in xslt 2.0).
You cannot manipulate items the way you otherwise could. The reason is
the lack of set based constructs: xslt 2.0 supports sequences, but not
associative maps of items.
If you think that xml can be used as a good approximation of a map, I shan't agree
with you. Xml is applicable in very specific cases only. The maps I'm
thinking of would allow associating items by reference, like sequences do.
This opens the prospect of creating state objects, managing sequences of sequences,
creating cyclic graphs of items, and so on. These maps are richer than what the
key() function provides right now, and allow implementing for-each-group in
xquery.
Such maps can be modeled with several functions; however, I wish they were
built in:
f:map($items as item()*) as item()
Returns a map from a sequence $items of pairs (key, value).
f:map-items($map as item()) as item()*
Returns a sequence of pairs (key, value) for a map $map.
f:map-keys($map as item()) as item()*
Returns a sequence of keys contained in a map $map.
f:map-values($map as item()) as item()*
Returns a sequence of values contained in a map $map.
f:map-value($map as item(), $key as item()) as item()*
Returns a sequence of values corresponding to a specified key $key contained in a
specified map $map.
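To show the intended semantics of f:map and f:map-value outside xslt, here is a small Java model. It is a sketch under stated assumptions: PairMapSketch and its methods are hypothetical, keys are compared with equals(), and keys are kept in order of first appearance, which the function descriptions above do not promise.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of f:map and f:map-value built on a flat list of (key, value) pairs. */
final class PairMapSketch {
    /** Builds a map from a flat list key1, value1, key2, value2, ...; a key may repeat. */
    static Map<Object, List<Object>> map(List<?> pairs) {
        if (pairs.size() % 2 != 0) {
            throw new IllegalArgumentException("expected an even number of items");
        }
        Map<Object, List<Object>> result = new LinkedHashMap<>();
        for (int i = 0; i < pairs.size(); i += 2) {
            result.computeIfAbsent(pairs.get(i), k -> new ArrayList<>()).add(pairs.get(i + 1));
        }
        return result;
    }

    /** Returns all values stored under the key, or an empty list if the key is absent. */
    static List<Object> mapValue(Map<Object, List<Object>> map, Object key) {
        return map.getOrDefault(key, List.of());
    }

    public static void main(String[] args) {
        Map<Object, List<Object>> cities = map(List.of(
            "Israel", "Jerusalem", "France", "Paris", "Israel", "Tel Aviv"));
        System.out.println(mapValue(cities, "Israel")); // prints [Jerusalem, Tel Aviv]
    }
}
```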
The other thing I would add is a tuple of items. It's like a sequence; however, a sequence of tuples is never flattened into a single sequence, but stays a sequence of tuples.
Fortunately, it's possible to implement such extension functions.