We're facing the task of converting a java method into a state machine.
This is like converting a SAX parser, which pushes data, into an Xml Reader, which
pulls data.
The task is formalized as:
- for a given method containing split markers, create a class permitting
iteration;
- each iteration performs a part of the method's logic.
We have defined rules converting all statements into a state machine, except
for the synchronized statement. In fact the logic is rather linear; the least trivial conversion is the one for the try statement.
Consider an example:
public class Test
{
void method()
throws Exception
{
try
{
A();
B();
}
catch(Exception e)
{
C(e);
}
finally
{
D();
}
E();
}
private void A()
throws Exception
{
// logic A
}
private void B()
throws Exception
{
// logic B
}
private void C(Exception e)
throws Exception
{
// logic C
}
private void D()
throws Exception
{
// logic D
}
private void E()
throws Exception
{
// logic E
}
}
Suppose we want to see method() as a state machine, with split markers placed after the calls to
methods A(), B(), C(), D(), E(). This is how it looks as a state machine:
Callable<Boolean> methodAsStateMachine()
throws Exception
{
return new Callable<Boolean>()
{
public Boolean call()
throws Exception
{
do
{
try
{
switch(state)
{
case 0:
{
A();
state = 1;
return true;
}
case 1:
{
B();
state = 3;
return true;
}
case 2:
{
C(ex);
state = 3;
return true;
}
case 3:
{
D();
if (currentException != null)
{
throw currentException;
}
state = 4;
return true;
}
case 4:
{
E();
state = -1;
return false;
}
}
// No case has matched: the state machine is called after completion or failure.
if (currentException == null)
{
currentException = new IllegalStateException();
}
break;
}
catch(Throwable e)
{
currentException = null;
switch(state)
{
case 0:
case 1:
{
if (e instanceof Exception)
{
ex = (Exception)e;
state = 2;
}
else
{
currentException = e;
state = 3;
}
continue;
}
case 2:
{
currentException = e;
state = 3;
continue;
}
}
currentException = e;
state = -1;
break;
}
}
// "continue" re-enters the switch with a new state; "break" leaves the loop to rethrow.
while(true);
return this.<Exception>error();
}
@SuppressWarnings("unchecked")
private <T extends Throwable> boolean error()
throws T
{
throw (T)currentException;
}
private int state = 0;
private Throwable currentException = null;
private Exception ex = null;
};
}
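To show how such a state machine is consumed, here is a minimal sketch, not the generated code. A hypothetical class StepDemo turns a shorter method, try { A(); } finally { D(); } E();, into a Callable<Boolean> and pulls it step by step. The names StepDemo, steps() and drive() are ours and serve illustration only. Unlike the machine above, a failure in A() is kept pending and rethrown on the next call, after the finally step has run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

class StepDemo
{
  /** Trace of executed steps, filled by the hypothetical step methods. */
  final List<String> log = new ArrayList<>();
  /** When true, step A fails; this exercises the finally path. */
  boolean failInA;

  void A() throws Exception
  {
    log.add("A");
    if (failInA) throw new Exception("A failed");
  }

  void D() { log.add("D"); }
  void E() { log.add("E"); }

  /** A state machine for: try { A(); } finally { D(); } E(); */
  Callable<Boolean> steps()
  {
    return new Callable<Boolean>()
    {
      private int state = 0;
      private Exception pending; // rethrown after the finally step

      public Boolean call() throws Exception
      {
        switch(state)
        {
          case 0: // try body
          {
            try { A(); } catch(Exception e) { pending = e; }
            state = 1;
            return true;
          }
          case 1: // finally body
          {
            state = -1;
            D();
            if (pending != null) throw pending;
            state = 2;
            return true;
          }
          case 2: // the statement after the try
          {
            state = -1;
            E();
            return false;
          }
          default:
          {
            throw new IllegalStateException("No more steps.");
          }
        }
      }
    };
  }

  /** Pulls steps until the machine reports completion; returns the number of calls. */
  static int drive(Callable<Boolean> machine) throws Exception
  {
    int count = 1;
    while(machine.call()) count++;
    return count;
  }
}
```

A caller decides when to pull the next step, which is exactly what distinguishes the pulling reader from the pushing parser.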
Believe it or not, this transformation can be done purely in xslt 2.0 with the help of
jxom (Java xml object model). We shall update
jxom.zip once
this module is implemented and tested.
In xslt one can express logically the same thing in different words, like:
exists($x)
and
every $y in $x satisfies exists($y)
newbie> Really the same?
expert> Oops... You're right, these are different things!
What's the difference?
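A hint for those who think in java: a rough analogy, assuming that a List stands in for an xpath sequence (the class and method names are made up for illustration). A quantified expression over an empty sequence is vacuously true, and every item that is present certainly exists, so the second expression holds for any $x, while exists($x) is false for the empty sequence.

```java
import java.util.List;

class ExistsVersusEvery
{
  /** Analog of exists($x): true when the sequence has at least one item. */
  static boolean exists(List<?> x)
  {
    return !x.isEmpty();
  }

  /** Analog of: every $y in $x satisfies exists($y). */
  static boolean everyExists(List<?> x)
  {
    // allMatch() returns true for an empty stream, like "every" over ().
    // "y != null" approximates exists($y) for a single item.
    return x.stream().allMatch(y -> y != null);
  }
}
```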
I have already written about tuples and maps in xslt (see
Tuples and maps - Status: CLOSED, WONTFIX, and
Tuples and maps in Saxon).
Now I want to discuss a use case, and how an xslt processor can detect such a
use case and implement it as a map. This way, under certain conditions, sequences
could be treated as maps (or as sets).
Use case.
There are two stages:
- collect nodes/values satisfying some criteria;
- process data, and take a special action whenever a node/value was collected at
the previous stage.
When we are talking about nodes, the result of the first stage is
a sequence $set as node()*. This sequence plays the role of a
set of nodes (order is not important).
The second stage is usually an xsl:for-each, an xsl:apply-templates,
or something of this kind, which repeatedly verifies whether some $node as
node()? belongs to the $set, like the following: $node intersect
$set, or $node except $set.
Even though we are still using regular xpath 2.0, we have managed to express
a set based operation. It's up to the xslt processor's optimizer to detect
such a use case and to treat the sequence as a set. In fact the detection rule is
rather simple.
For expressions $node except $set and $node intersect $set:
- $set can be considered as a set, as the order of elements is not important;
- chances are good that a
$set implemented as a set outperforms an implementation
using a list or an array.
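The expected gain is easy to picture in java. The sketch below is only an analogy and says nothing about how a real xslt processor is built: the class SetMembership and its methods are hypothetical, and items are compared by equals() rather than by node identity. The first stage collects items into a HashSet, so each membership test at the second stage is an expected constant time lookup instead of a linear scan of a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SetMembership
{
  /** Analog of "$items intersect $set": items that were collected. */
  static <T> List<T> intersect(List<T> items, Set<T> set)
  {
    return filter(items, set, true);
  }

  /** Analog of "$items except $set": items that were not collected. */
  static <T> List<T> except(List<T> items, Set<T> set)
  {
    return filter(items, set, false);
  }

  private static <T> List<T> filter(List<T> items, Set<T> set, boolean member)
  {
    List<T> result = new ArrayList<>();

    for(T item: items)
    {
      // With a HashSet this is a hash lookup, not a scan over the collection.
      if (set.contains(item) == member)
      {
        result.add(item);
      }
    }

    return result;
  }
}
```

For example, with Set<String> collected = new HashSet<>(firstStageResult), the call SetMembership.except(allItems, collected) keeps the input order of allItems.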
So what to do? Well, I do not think I'm the smartest kid, quite the opposite...
however it is worth hinting this idea to xslt implementers (see
Suggest optimization). I still do not know whether it was fruitful...
P.S. A very similar use case exists for the function index-of($collection, $item).
I know we're not the first to create a parser in xslt.
However I still want to share our implementation, as I think it's beautiful.
In our project, which is a conversion from a legacy language to java, we're
dealing with dynamic expressions. For example, in the legacy language one can
filter a collection using an expression defined by a string:
collection.filter("a > 0 and b = 7");
When the expression string is calculated, there is nothing to do except to parse the
string at runtime and perform filtering dynamically. On the other hand, we have
found that in the majority of cases literal strings are used. Thus we have decided to
optimize this route like this:
collection.filter(
new Filter<T>()
{
boolean filter(T value)
{
return (value.getA() > 0) && (value.getB() == 7);
}
});
This means that we convert the expression string into java code at the
generation stage.
In xslt, our generator engine, this means that we have to convert a string
into an expression tree, like this:
(a > 7 or a= 3) and c * d = 2.2
to
<and>
<or>
<gt>
<identifier>a</identifier>
<integer>7</integer>
</gt>
<eq>
<identifier>a</identifier>
<integer>3</integer>
</eq>
</or>
<eq>
<mul>
<identifier>c</identifier>
<identifier>d</identifier>
</mul>
<decimal>2.2</decimal>
</eq>
</and>
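Our parser is written in xslt, but the idea is easier to show in a few lines of java. The following is a sketch under our own assumptions, not a port of expression-parser.xslt: the class ExpressionParser, its tiny grammar (or, and, comparison, product, primary) and the element names lt and mul for "<" and "*" are chosen for this example only. It tokenizes with one regular expression and then parses by recursive descent, returning the tree as an xml string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpressionParser
{
  // Pattern.COMMENTS lets us format the regex with spaces, much like flags="x".
  private static final Pattern TOKEN = Pattern.compile(
    "\\s* (?: (\\d+\\.\\d+) | (\\d+) | (\\p{L}\\w*) | ([()*<>=]) )",
    Pattern.COMMENTS);
  private static final String[] TYPES =
    { "decimal", "integer", "identifier", "operator" };

  private final List<String[]> tokens = new ArrayList<>(); // { type, text }
  private int index = 0;

  /** Parses an expression and returns its tree serialized as xml. */
  static String parse(String text)
  {
    ExpressionParser parser = new ExpressionParser();
    String input = text.trim();
    Matcher matcher = TOKEN.matcher(input);
    for(int pos = 0; pos < input.length(); pos = matcher.end())
    {
      if (!matcher.find(pos) || (matcher.start() != pos))
      {
        throw new IllegalArgumentException("Unexpected input at " + pos);
      }
      for(int group = 1; group <= TYPES.length; group++)
      {
        String value = matcher.group(group);
        if (value != null)
        {
          // Words "and" and "or" are operators, not identifiers.
          boolean keyword = value.equals("and") || value.equals("or");
          parser.tokens.add(
            new String[] { keyword ? "operator" : TYPES[group - 1], value });
        }
      }
    }
    String result = parser.or();
    if (parser.index < parser.tokens.size())
    {
      throw new IllegalArgumentException("Unexpected token at " + parser.index);
    }
    return result;
  }

  // or := and ("or" and)*
  private String or()
  {
    String left = and();
    while(accept("or")) left = node("or", left + and());
    return left;
  }

  // and := comparison ("and" comparison)*
  private String and()
  {
    String left = comparison();
    while(accept("and")) left = node("and", left + comparison());
    return left;
  }

  // comparison := product ((">" | "<" | "=") product)?
  private String comparison()
  {
    String left = product();
    if (accept(">")) return node("gt", left + product());
    if (accept("<")) return node("lt", left + product());
    if (accept("=")) return node("eq", left + product());
    return left;
  }

  // product := primary ("*" primary)*
  private String product()
  {
    String left = primary();
    while(accept("*")) left = node("mul", left + primary());
    return left;
  }

  // primary := "(" or ")" | decimal | integer | identifier
  private String primary()
  {
    if (accept("("))
    {
      String inner = or();
      if (!accept(")")) throw new IllegalArgumentException("Expected ')'");
      return inner;
    }
    if ((index >= tokens.size()) || tokens.get(index)[0].equals("operator"))
    {
      throw new IllegalArgumentException("Expected an operand at " + index);
    }
    String[] token = tokens.get(index++);
    return node(token[0], token[1]);
  }

  // Consumes the current token if it is the given operator.
  private boolean accept(String operator)
  {
    boolean found = (index < tokens.size()) &&
      tokens.get(index)[0].equals("operator") &&
      tokens.get(index)[1].equals(operator);
    if (found) index++;
    return found;
  }

  private static String node(String name, String content)
  {
    return "<" + name + ">" + content + "</" + name + ">";
  }
}
```

Calling ExpressionParser.parse("(a > 7 or a= 3) and c * d = 2.2") yields the tree shown above, serialized without whitespace.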
Our parser fits naturally into the world of parsers: it uses the xsl:analyze-string instruction to tokenize input, and
parses tokens according to an expression grammar. During implementation I've
found some things that were new to me. I think they are worth mentioning:
-
As the tokenizer is defined as one big regular expression, we have a rather verbose
regex attribute on xsl:analyze-string. It was hard
to edit such a big line until I found the flags="x" option, which solves
formatting problems:
The flags attribute may be used to control the interpretation of the regular expression... If it contains the letter x, then whitespace within the regular expression is ignored.
This means that I can use spaces to format the regular expression, and \s to specify a space as a part of the expression.
-
Saxon 9.1.0.1 has an inefficiency in the implementation of the
xsl:analyze-string
instruction: whenever the regex is a literal value that contains a '{' character
(e.g. "\p{{L}}"), Saxon considers the value to be an AVT and delays pattern
compilation until runtime, repeating it every time the instruction is executed.
Use the following link to see the xslt:
expression-parser.xslt.
To see how to generate java from xml, follow this link:
Xslt for the jxom (Java xml object model), jxom.zip.
Yesterday, incidentally, I ran into a problem of a dynamic error during evaluation of a template's match pattern.
This reminded me of
SFINAE in C++. There the principle is applied at compile time to find a
matching template.
I think people underestimate the meaning of this behaviour. The effect of
dynamic errors occurring during pattern evaluation is described in the
specification:
Any dynamic error or type error that occurs during the evaluation of a pattern against a particular node is treated as a recoverable error even if the error would not be recoverable under other circumstances. The optional recovery action is to treat the pattern as not matching that node.
This has far reaching consequences, like error recovery. To illustrate what I'm talking about, please look at this simple stylesheet that recovers from "Division by zero.":
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:template match="/">
<xsl:variable name="operator" as="element()+">
<div divident="10" divisor="0"/>
<div divident="10" divisor="2"/>
</xsl:variable>
<xsl:apply-templates select="$operator"/>
</xsl:template>
<xsl:param name="NaN" as="xs:double" select="xs:double('NaN')"/>
<xsl:template
match="div[(xs:integer(@divident) div xs:integer(@divisor)) ne $NaN]">
<xsl:message select="xs:integer(@divident) div xs:integer(@divisor)"/>
</xsl:template>
<xsl:template match="div">
<xsl:message select="'Division by zero.'"/>
</xsl:template>
</xsl:stylesheet>
Here, if there is a division by zero, the first template is not matched and the other
template is selected; thus the second template serves as an error handler for the
first one. Certainly, one may define much more complex constructions to be
handled this way.
I have never been a purist (meaning doing everything in xslt), however this example,
along with the
indirect function call, shows that xslt is a rather well equipped language. One just
needs to be smart enough to understand how to do things.
See also: Try/catch block in xslt 2.0 for Saxon 9.
Among other job activities, we're asked from time to time to check the technical skills of job applicants.
Several times we have interviewed people who were far below the
acceptable professional level. It's a torment for both sides, I should say.
To ease things we have designed a small
questionnaire (specific to our projects) for job applicants. It's sent to an applicant before the
meeting. Even partially answered, this
questionnaire constitutes a good filter against non-professionals:
<questionnaire>
  <item>
    <question>Please estimate your knowledge in XML Schema (xsd) as lacking, bad, good, or perfect.</question>
    <answer/>
  </item>
  <item>
    <question>Please estimate your knowledge in xslt 2.0/xquery 1.0 as lacking, bad, good, or perfect.</question>
    <answer/>
  </item>
  <item>
    <question>Please estimate your knowledge in xslt 1.0 as lacking, bad, good, or perfect.</question>
    <answer/>
  </item>
  <item>
    <question>Please estimate your knowledge in java as lacking, bad, good, or perfect.</question>
    <answer/>
  </item>
  <item>
    <question>Please estimate your knowledge in c# as lacking, bad, good, or perfect.</question>
    <answer/>
  </item>
  <item>
    <question>Please estimate your knowledge in sql as lacking, bad, good, or perfect.</question>
    <answer/>
  </item>
  <item>
    <question>For logical values A, B, please rewrite logical expression "A and B" using operator "or".</question>
    <answer/>
  </item>
  <item>
    <question>For logical values A, B, please rewrite logical expression "A = B" using operators "and" and "or".</question>
    <answer/>
  </item>
  <item>
    <question>There are eight balls, with only one heavier than the others.
      What is the minimum number of weighings that reveals the heavier ball?
      Please be suspicious about the "trivial" solution.</question>
    <answer/>
  </item>
  <item>
    <question>If A results in B, what can one say about the reason of B?</question>
    <answer/>
  </item>
  <item>
    <question>If only A or B result in C, what can one say about the reason of C?</question>
    <answer/>
  </item>
  <item>
    <question>Please define an xml schema for this questionnaire.</question>
    <answer/>
  </item>
  <item>
    <question>Please create a simple stylesheet creating an html table based on this questionnaire.</question>
    <answer/>
  </item>
  <item>
    <question>For a table A with columns B, C, and D, please create an sql query selecting B grouped by C and ordered by D.</question>
    <answer/>
  </item>
  <item>
    <question>For a sequence of xml elements A with attribute B, please write a stylesheet excerpt creating a
      sequence of elements D, grouping elements A with the same string value of
      attribute B, sorted in ascending order of B.</question>
    <answer/>
  </item>
  <item>
    <question>Having a java class A with properties B and C, please sort a collection of A by B in ascending, and C in
      descending order.</question>
    <answer/>
  </item>
  <item>
    <question>What does the following line mean in c#?
      int? x;</question>
    <answer/>
  </item>
  <item>
    <question>What is a parser?</question>
    <answer/>
  </item>
  <item>
    <question>How to issue an error in the xml stylesheet?</question>
    <answer/>
  </item>
  <item>
    <question>What is a lazy evaluation?</question>
    <answer/>
  </item>
  <item>
    <question>How do you understand the following sentence?
      For each line of code there should be a comment.</question>
    <answer/>
  </item>
  <item>
    <question>Have you used any supplemental information to answer these questions?</question>
    <answer/>
  </item>
  <item>
    <question>Have you independently answered these questions?</question>
    <answer/>
  </item>
</questionnaire>
I've found that the proposal to introduce tuples and maps into the xslt/xquery type system has not found support:
At the joint meeting of the XSL and XQuery Working groups 2008-06-23
it was decided that a change of this nature would be too large for the
next "point" release of the Recommendations. The request for new
functionality will be considered for a future "main" release.
Boor> *****!
Pessimist> Ah, there won't be tuples and maps in xslt/xquery...
Optimist> Wow, chances are good to see this addition by the year 2018!
We are designing a rather complex xslt 2.0 application dealing with semistructured
data. We must tolerate errors during processing, as there are cases where the
input is not perfectly valid (or the program is not designed or ready to get
such an input).
The most typical error is an unsatisfied expectation about the tree structure, like:
<xsl:variable name="element" as="element()" select="some-element"/>
Obviously, a dynamic error occurs if the specified element is not present. To
concentrate on the primary logic, and to avoid the burden of recovery from illegal (unexpected)
cases, we have created a try/catch API. The goals of such an API are:
- to be able to continue processing in case of an error;
- to report as much useful information related to an error as possible.
Alternatives:
Do not think that it was arrogance that made us create a custom API. No, we
were looking for alternatives! Please see the
[xsl] saxon:try() discussion:
- saxon:try()
function - is a kind of pseudo function, which explicitly relies on lazy
evaluation of its arguments, and ... it's not available in SaxonB;
- ex:error-safe
extension instruction - is far from perfect in its implementation quality, and provides no error location.
We had no other way except to design this feature by ourselves. In our defence one
can say that we are using an innovative approach that encapsulates details of the
implementation behind a template and calls handlers indirectly.
Use:
The try/catch API is designed as a template
<xsl:template name="t:try-block"/> calling a "try" handler, and, if
required, a "catch" handler using the
<xsl:apply-templates mode="t:call"/> instruction. The caller passes any
information to these handlers by means of tunnel parameters.
Handlers must be in the "t:call" mode. The "catch" handler
may receive the following error info parameters:
<xsl:param name="error" as="xs:QName"/>
<xsl:param name="error-description" as="xs:string"/>
<xsl:param name="error-location" as="item()*"/>
where $error-location is a sequence of pairs (location as
xs:string, context as item())* .
A sample:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:t="http://www.nesterovsky-bros.com/xslt/public/"
exclude-result-prefixes="xs t">
<xsl:include href="try-block.xslt"/>
<xsl:template match="/">
  <result>
    <xsl:for-each select="1 to 10">
      <xsl:call-template name="t:try-block">
        <xsl:with-param name="value" tunnel="yes" select=". - 5"/>
        <xsl:with-param name="try" as="element()">
          <try/>
        </xsl:with-param>
        <xsl:with-param name="catch" as="element()">
          <t:error-handler/>
        </xsl:with-param>
      </xsl:call-template>
    </xsl:for-each>
  </result>
</xsl:template>
<xsl:template mode="t:call" match="try">
  <xsl:param name="value" tunnel="yes" as="xs:decimal"/>
  <value>
    <xsl:sequence select="1 div $value"/>
  </value>
</xsl:template>
</xsl:stylesheet>
The sample prints values according to the formula "1/(i - 5)", where "i" is a
variable varying from 1 to 10. Clearly, division by zero occurs when "i" is equal
to 5.
Please notice how the try/catch API is accessed through
<xsl:include href="try-block.xslt"/>. The main logic is
executed in
<xsl:template mode="t:call" match="try"/>, which
receives parameters using tunneling. A default error handler
<t:error-handler/> is used to report errors.
Error report:
Error: FOAR0001
Description:
Decimal divide by zero
Location:
1. systemID: "file:///D:/style/try-block-test.xslt", line: 34
2. template mode="t:call"
match="element(try, xs:anyType)"
systemID: "file:///D:/style/try-block-test.xslt", line: 30
context node:
/*[1][local-name() = 'try']
3. template mode="t:call"
match="element({http://www.nesterovsky-bros.com/xslt/private/try-block}try, xs:anyType)"
systemID: "file:///D:/style/try-block.xslt", line: 53
context node:
/*[1][local-name() = 'try']
4. systemID: "file:///D:/style/try-block.xslt", line: 40
5. call-template name="t:try-block"
systemID: "file:///D:/style/try-block-test.xslt", line: 17
6. for-each
systemID: "file:///D:/style/try-block-test.xslt", line: 16
context item: 5
7. template mode="saxon:_defaultMode"
match="document-node()"
systemID: "file:///D:/style/try-block-test.xslt", line: 14
context node:
/
Implementation details:
You were not expecting this API to be pure xslt, were you?
Well, you're right, there is an extension function. Its pseudo code is like
this:
function tryBlock(tryItems, catchItems)
{
try
{
execute xsl:apply-templates for tryItems.
}
catch
{
execute xsl:apply-templates for catchItems.
}
}
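The same shape can be tried out in plain java. This is only an analogy of the pseudo code, not the Saxon extension function itself: the class TryBlock, its handlers passed as lambdas, and the sample() method are hypothetical names of ours. The sample repeats the formula "1/(i - 5)" for "i" from 1 to 10 and continues after the division by zero.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

class TryBlock
{
  /** Calls the "try" handler and, if it fails, the "catch" handler. */
  static <T> T tryBlock(
    Supplier<T> tryHandler,
    Function<RuntimeException, T> catchHandler)
  {
    try
    {
      return tryHandler.get();
    }
    catch(RuntimeException e)
    {
      return catchHandler.apply(e);
    }
  }

  /** Evaluates 1/(i - 5) for i from 1 to 10, reporting errors as strings. */
  static List<String> sample()
  {
    List<String> result = new ArrayList<>();

    for(int i = 1; i <= 10; i++)
    {
      BigDecimal value = BigDecimal.valueOf(i - 5);

      result.add(
        tryBlock(
          // Decimal division throws ArithmeticException for a zero divisor.
          () -> BigDecimal.ONE.divide(value, 4, RoundingMode.HALF_UP).
            toPlainString(),
          e -> "error: " + e.getMessage()));
    }

    return result;
  }
}
```

All ten values are produced; the fifth one is an error report instead of a number.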
One last thing. Please get the implementation:
saxon.extensions.zip. There you will find the sources of the try/catch and
tuples/maps APIs.
Right now we live in the java world, thus all our tasks are (in)directly
related to this environment.
We want to store stylesheets as resources of a java application, and at
the same time to point to these stylesheets without jar qualification. In .NET this idea would not
appear at all, as there are well defined boundaries between assemblies, but java uses
a rather different approach. Whenever you have a resource name, it's up to
the ClassLoader to find this resource. To exploit this feature we've created
a uri resolver for the stylesheet
transformation. The protocol we use has the following format: "resource:/resource-path".
For example, to store stylesheets in the
META-INF/stylesheets folder we use the uri "resource:/META-INF/stylesheets/java/main.xslt".
Relative paths are resolved naturally. A path "../jxom/java-serializer.xslt"
in the previously mentioned stylesheet is resolved to "resource:/META-INF/stylesheets/jxom/java-serializer.xslt".
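The resolution of relative paths comes for free from java.net.URI. Here is a small sketch of that step alone; the class ResourceUriDemo and its method toResourceName() are hypothetical helpers written for this illustration. It resolves an href against a base uri and returns the name to pass to ClassLoader.getResourceAsStream(), or null for other protocols.

```java
import java.net.URI;

class ResourceUriDemo
{
  /**
   * Resolves href against base, and returns a class loader resource name,
   * or null if the result is not a "resource:" uri.
   */
  static String toResourceName(String href, String base)
  {
    URI uri = (base == null) ?
      URI.create(href) : URI.create(base).resolve(href);

    if (!"resource".equals(uri.getScheme()))
    {
      return null;
    }

    String path = uri.getPath();

    if ((path == null) || path.isEmpty())
    {
      return null;
    }

    // Class loader resource names have no leading slash.
    return path.startsWith("/") ? path.substring(1) : path;
  }
}
```

For the base "resource:/META-INF/stylesheets/java/main.xslt" and the href "../jxom/java-serializer.xslt" this returns "META-INF/stylesheets/jxom/java-serializer.xslt".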
We've created a small class ResourceURIResolver. You need to
supply an instance of TransformerFactory with this resolver:
transformerFactory.setURIResolver(new ResourceURIResolver());
The class itself is so small that we quote it here:
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;
/**
* This class implements an interface that can be called by the processor
* to turn a URI used in document(), xsl:import, or xsl:include into a
* Source object.
*/
public class ResourceURIResolver implements URIResolver
{
/**
* Called by the processor when it encounters
* an xsl:include, xsl:import, or document() function.
*
* This resolver supports protocol "resource:".
* Format of uri is: "resource:/resource-path", where "resource-path" is an
* argument of a {@link ClassLoader#getResourceAsStream(String)} call.
* @param href - an href attribute, which may be relative or absolute.
* @param base - a base URI against which the first argument will be made
* absolute if the absolute URI is required.
* @return a Source object, or null if the href cannot be resolved, and
* the processor should try to resolve the URI itself.
*/
public Source resolve(String href, String base)
throws TransformerException
{
if (href == null)
{
return null;
}
URI uri;
try
{
if (base == null)
{
uri = new URI(href);
}
else
{
uri = new URI(base).resolve(href);
}
}
catch(URISyntaxException e)
{
// Unsupported uri.
return null;
}
if (!"resource".equals(uri.getScheme()))
{
return null;
}
String resourceName = uri.getPath();
if ((resourceName == null) || (resourceName.length() == 0))
{
return null;
}
if (resourceName.charAt(0) == '/')
{
resourceName = resourceName.substring(1);
}
ClassLoader classLoader =
Thread.currentThread().getContextClassLoader();
InputStream stream =
classLoader.getResourceAsStream(resourceName);
if (stream == null)
{
return null;
}
return new StreamSource(stream, uri.toString());
}
}
We've uploaded an update for jxom.
It has turned out that the jxom schema is so powerful that you can do a great number of manipulations over the xml representation of a java program.
In our case this is an optimization of unreachable code, as defined in
Sun's spec. We're facing this problem as a result of translation from another ancient language, which also has a well defined xml schema.
We have also introduced an ability to annotate jxom elements (see the meta element), which in practice we use to annotate expressions with their types and to perform "compile time" expression evaluation.
You may download the jxom version at the usual place.
See also: Java Xml Object Model.
The project we're working on requires us to generate a java web application from an ancient language. The code being converted is transformed into java classes
(thanks to
jxom), and
the presentation is converted into JSF (facelets) pages.
By the way, long before the java (.net) platform was conceived, there were
languages and environments worked out so well that contemporary client - server
paradigms (like JSF, ASP.NET, and so on) are just their isomorphisms.
The problem we were dealing with recently is JSF databinding for bean properties
of types java.sql.Date, java.sql.Time, java.sql.Timestamp.
At some point of the design we decided that these types are the most natural
representation of data in the original language, as the program's activity is
tightly connected to the database. Later on it became clear that JSF
databinding does not like these types at all. We had to decide either to fall
back and use java.util.Date as bean property types, or to do something with
databinding.
It was not clear what the best way was until we found an elegant solution,
namely: to create an ELResolver to handle bean properties of these types. The solution
works because custom el resolvers are applied before standard resolvers (except
the implicit one).
The class
DateELResolver is a rather simple extension of the
BeanELResolver. To use it you only need to register it in the faces-config.xml:
<faces-config version="1.2"
xmlns="http://java.sun.com/xml/ns/javaee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
http://java.sun.com/xml/ns/javaee/web-facesconfig_1_2.xsd">
<application>
<el-resolver>com.nesterovskyBros.jsf.DateELResolver</el-resolver>
</application>
</faces-config>
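We do not reproduce DateELResolver here. The sketch below only illustrates the kind of conversion a resolver of this sort would presumably perform before setting a bean property, and it is an assumption of ours rather than an excerpt of the class: the helper SqlDateConversion and its method convert() are hypothetical, and only standard java.util and java.sql types are used.

```java
import java.util.Date;

class SqlDateConversion
{
  /**
   * Converts a java.util.Date into the java.sql type expected by
   * a bean property; other target types get the value unchanged.
   */
  static Object convert(Date value, Class<?> targetType)
  {
    if (value == null)
    {
      return null;
    }

    if (targetType == java.sql.Date.class)
    {
      return new java.sql.Date(value.getTime());
    }

    if (targetType == java.sql.Time.class)
    {
      return new java.sql.Time(value.getTime());
    }

    if (targetType == java.sql.Timestamp.class)
    {
      return new java.sql.Timestamp(value.getTime());
    }

    return value;
  }
}
```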
Recently I've proposed to add two new atomic types,
tuple and map, to the xpath/xslt/xquery type system (see "Tuples and maps").
Later I've implemented a pure xslt approximation of
tuple and map. Now I want to present
a java
implementation for Saxon.
I've created TupleValue and MapValue atomic types, and a Collections class
exposing the extension functions api. It's easy to use this api. I'll repeat an
example that I showed earlier:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:f="http://www.nesterovsky-bros.com/xslt/functions/public"
xmlns:p="http://www.nesterovsky-bros.com/xslt/functions/private"
xmlns:c="java:com.nesterovskyBros.saxon.Functions"
exclude-result-prefixes="xs f p c">
<xsl:template match="/">
<root>
<xsl:variable name="tuples" as="item()*" select="
for $i in 1 to 20
return c:tuple(1 to $i)"/>
<total-items>
<xsl:sequence select="
sum
(
for $tuple in $tuples return
count(c:tuple-items($tuple))
)"/>
</total-items>
<tuples-size>
<xsl:sequence select="count($tuples)"/>
</tuples-size>
<sums-per-tuples>
<xsl:for-each select="$tuples">
<xsl:variable name="index"
as="xs:integer" select="position()"/>
<sum index="{$index}"
value="{sum(c:tuple-items(.))}"/>
</xsl:for-each>
</sums-per-tuples>
<xsl:variable name="cities" as="element()*">
<city name="Jerusalem" country="Israel"/>
<city name="London" country="Great Britain"/>
<city name="Paris" country="France"/>
<city name="New York" country="USA"/>
<city name="Moscow" country="Russia"/>
<city name="Tel Aviv" country="Israel"/>
<city name="St. Petersburg" country="Russia"/>
</xsl:variable>
<xsl:variable name="map" as="item()" select="
c:map
(
for $city in $cities return
(
$city/string(@country),
$city
)
)"/>
<xsl:for-each select="c:map-keys($map)">
<xsl:variable name="key" as="xs:string" select="."/>
<country name="{$key}">
<xsl:sequence select="c:map-value($map,
$key)"/>
</country>
</xsl:for-each>
</root>
</xsl:template>
</xsl:stylesheet>
Download java source.
P.S. I wish this api were integrated into Saxon, as at present java
extension functions are called through reflection.
Many times I was tempted to create my best books list.
No, I dare not create it. There are so many brilliant masterpieces; I leave
making such lists to someone wiser than myself. However, I cannot be
silent about a book I've read recently. This is
Ulysses by
James Joyce.
It's hard reading: you need to know all of Joyce's creative work,
Homer, the state of affairs in the world in the year 1904, the map of the city of
Dublin in that year, and much other information. That is the reason why you
must read an annotated version.
The labour of reading is compensated by the intellectual pleasure from it, because
it's not the story that catches you, but the narration. Having imagination, you
immerse yourself in a world created by the author.
I can believe that there are people living for years in the world created by Joyce.
Today I've found another new language (a working draft, in fact). It's
an XML Pipeline Language.
XProc: An XML Pipeline Language, a language for
describing operations to be performed on XML documents.
An XML Pipeline specifies a sequence of operations to be performed on zero or
more XML documents. Pipelines generally accept zero or more XML documents as
input and produce zero or more XML documents as output. Pipelines are made up of
simple steps which perform atomic operations on XML documents and constructs
similar to conditionals, iteration, and exception handlers which control which
steps are executed.
Experience shows that language invention has been an essential part of the
computer industry from the very beginning, however...
I must confess I may be too reluctant towards any new language: I was happy with
C++, but then all these new languages like Delphi, Java, C#, and so many others
started to appear. It's correct to say that there is no efficient universal
language, however I think it's wrong to say that a domain specific language is
required to solve a particular problem in the most efficient way.
And now a question to the point: why do you need a new language for describing
operations to be performed on XML documents?