We have created the Java Xml Object Model (jxom) purely for the purposes of our project. In fact, jxom at present has siblings: xml models for sql dialects. There are also different APIs, such as name normalization, refactoring, and compile-time evaluation.
It turns out that jxom is also good enough for other developers.
The drawback of jxom, however, is its rather complex xml schema, which takes time to understand. To simplify things we have created (and plan to create more) a couple of examples that give a feel for what jxom xml looks like.
The latest version can be downloaded from jxom.zip.
We would be pleased to see more comments on the subject.
Although in our latest projects we're using mostly Java and XSLT, we always compare Java and .NET features. It's no secret that in most applications one finds caching used to improve performance. Unlike .NET, which provides a robust cache solution, Java doesn't provide anything standard. Of course, a Java adept may find a lot of caching frameworks, or just say: "use a HashMap (ArrayList, etc.) instead", but this is not the same.
Consider the options for Java:
1. Caching frameworks (caching systems). Yes, they do their work, and do it perfectly. Some of them are state of the art, but there are drawbacks. The crucial one is that for simple data caching one has to adopt a whole framework. This option requires too much effort for a simple problem.
2. Collection classes (HashMap, ArrayList, etc.) for caching data. This is a very straightforward and very productive solution. Everyone knows these classes, and there is nothing to configure: one declares an instance of such a class, takes care of data access synchronization, and everything starts working immediately. An admirable caching solution, but only for "toy applications", since it solves one problem and introduces another. If an application works for hours and there is a lot of data to cache, the amount of cached data only grows and never shrinks, which is why such caching is very quickly surrounded with all sorts of rules that somehow reduce its size at run time. The solution quickly loses its shine and becomes unportable, though it's still applicable for some applications.
3. Using Java reference objects for caching data. The class most suited to caching is java.util.WeakHashMap. WeakHashMap works exactly like a hash table but uses weak references internally. In practice, entries in a WeakHashMap are reclaimed at any time if they are not referred to outside of the map. This caching strategy depends on the GC's whims, is not entirely reliable, and may increase the number of cache misses (see the sketch below the list).
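A minimal sketch of this third option; the class and method names are ours:

import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakCache
{
  // Entries may be reclaimed by the GC once a key is no longer
  // strongly referenced anywhere outside of the map.
  private final Map<Object, String> cache =
    Collections.synchronizedMap(new WeakHashMap<Object, String>());

  public String describe(Object key)
  {
    String value = cache.get(key);

    if (value == null)
    {
      value = key.toString(); // stands for an expensive computation
      cache.put(key, value);
    }

    return value;
  }
}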
We've decided to create our own simple cache with sliding expiration of data.
One may create many cache instances, but there is only one global service that tracks expired objects across all these instances:
private Cache<String, Object> cache = new Cache<String, Object>();
There is a constructor that specifies an expiration interval in milliseconds for all cached objects:
private Cache<String, Object> cache = new Cache<String, Object>(15 * 60 * 1000);
Access is similar to HashMap:
instance = cache.get("key"); and cache.put("key", instance);
That's all one should know to start using it. Click here to download the Java source of this class. Feel free to use it in your applications.
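The downloadable source is not reproduced here; still, a minimal sketch of a cache with sliding expiration along these lines might look as follows (this is our illustration, not the actual class; in particular, the real class delegates expiration tracking to the single global service):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SlidingCache<K, V>
{
  private static final long DEFAULT_EXPIRATION = 15 * 60 * 1000;

  private final long expiration; // milliseconds
  private final Map<K, Entry<V>> map = new ConcurrentHashMap<K, Entry<V>>();

  private static class Entry<V>
  {
    volatile long touched;
    final V value;

    Entry(V value, long touched)
    {
      this.value = value;
      this.touched = touched;
    }
  }

  public SlidingCache()
  {
    this(DEFAULT_EXPIRATION);
  }

  public SlidingCache(long expiration)
  {
    this.expiration = expiration;
  }

  public V get(K key)
  {
    long now = System.currentTimeMillis();
    Entry<V> entry = map.get(key);

    if (entry == null)
    {
      return null;
    }

    if (now - entry.touched > expiration)
    {
      // Expired: in the real class a global service evicts such entries.
      map.remove(key);

      return null;
    }

    // Sliding expiration: each access renews the entry.
    entry.touched = now;

    return entry.value;
  }

  public void put(K key, V value)
  {
    map.put(key, new Entry<V>(value, System.currentTimeMillis()));
  }
}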
Recently, working on a completely different thing, I've realized that one may create a "generator": a function returning different values on each call. I was somewhat puzzled by this conclusion, as I thought xslt functions have no side effects, and that for the same arguments an xslt function returns the same result.
I've confirmed the conclusion at the forum. See
Scope of uniqueness of generate-id().
In short:
- each node has a unique identity;
- a function may, in the course of its work, create a temporary node and produce a result depending on the identity of that node.
Example:
<xsl:stylesheet version="2.0"
  xmlns:f="data:,f"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <xsl:message select="
      for $i in 1 to 8 return
        f:fun()"/>
  </xsl:template>

  <xsl:function name="f:fun" as="xs:string">
    <xsl:variable name="x">!</xsl:variable>

    <xsl:sequence select="generate-id($x)"/>
  </xsl:function>

</xsl:stylesheet>
The next thought was that if one may create a generator, then it's easy to create a good random number generator (that's a trivial math task).
Hey gurus, take a chance!
Yesterday I read about a new Garbage Collection implementation:
G1.
To be honest, I was not impressed.
I think Garbage Collection is an evil, or at least its present implementations are.
I do not believe in algorithms that at their very core assume centralized execution.
On the other hand, it's clear it's not in my power to change the status quo. My lot is to give advice, mostly incompetent and ignorable.
I'm waiting for the time when someone reaches the idea of bringing some parts of GC logic out of the runtime scope. This will require more VM intelligence; however, it will bear fruit.
A JIT or a compiler may prove, during static analysis, that collecting some object makes some of the objects it refers to unreachable, provided it can prove that those objects are not reachable by any other means (e.g. an object held in a private field and not stored anywhere else). This is close to the ideas expressed in
Muse on value types in java. It's possible to prepare a garbage graph in advance, before runtime.
In many cases it's also possible to prove that when a method's variable goes out of scope, it's not reachable by any other means and may be collected. This allows implementing a stage of automatic garbage collection where objects that are proven to be garbage are immediately added to a free memory set.
As an example I'm thinking of java's ArrayList, which stores a private array. When the ArrayList is reclaimed or resized, the reference to the private array is lost, and its memory can be added to the free set immediately.
This mechanics, integrated as the first stage of GC, will make GC less centralized, as I believe many objects will be collected this way. The ArrayList example is sketched below.
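Here is a simplified sketch of ours (not the real ArrayList code): after grow() the old array is provably unreachable, so under the proposed scheme it could be freed at once:

public class IntList
{
  private int[] data = new int[8]; // a private array that never escapes
  private int size;

  public void add(int value)
  {
    if (size == data.length)
    {
      grow();
    }

    data[size++] = value;
  }

  private void grow()
  {
    int[] old = data;

    data = new int[old.length * 2];
    System.arraycopy(old, 0, data, 0, old.length);
    // "old" is unreachable from this point on: a static analysis could
    // return its memory to the free set without a collection cycle.
  }
}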
Suppose you have constructed a sequence of attributes.
How do you access the value of attribute "a"?
Simple, isn't it? Still, it took a couple of minutes to find a solution!
<xsl:variable name="attributes" as="attribute()*">
<xsl:apply-templates mode="t:generate-attributes" select="."/>
</xsl:variable>
<xsl:variable name="value" as="xs:string?"
select="$attributes[self::attribute(a)]"/>
Problem
Our project, which contains many different xslt files, generates many different outputs (e.g. code that uses DB2 SQL, or Oracle SQL, or DAO, or some other flavor of code). This results in the use of indirect calls to handle different generation options; however, to allow the xslt to work we had to create a big main xslt that includes stylesheets for each kind of generation. This impacts compilation time.
Alternatives
- A big main xslt including everything.
- A big main xslt including everything and using the "use-when" attribute.
- Composing the main xslt on the fly.
We were eagerly inclined toward the second alternative. Unfortunately, only a limited set of information is available when "use-when" is evaluated. In particular, neither parameters nor documents are available. Using Saxon's extensions one may reach only static variables, or access System.getProperty(). This isn't flexible.
We've decided to try the third alternative.
Solution
We think we have found a nice solution: to create an XsltSource, which receives a list of includes upon construction and creates the xslt when getReader() is called.
import java.io.Reader;
import java.io.StringReader;

import javax.xml.transform.stream.StreamSource;

/**
 * A source to read a generated stylesheet, which includes other stylesheets.
 */
public class XsltSource extends StreamSource
{
  /**
   * Creates an {@link XsltSource} instance.
   */
  public XsltSource()
  {
  }

  /**
   * Creates an {@link XsltSource} instance.
   * @param systemId a system identifier for the root xslt.
   */
  public XsltSource(String systemId)
  {
    super(systemId);
  }

  /**
   * Creates an {@link XsltSource} instance.
   * @param systemId a system identifier for the root xslt.
   * @param includes a list of includes.
   */
  public XsltSource(String systemId, String[] includes)
  {
    super(systemId);
    this.includes = includes;
  }

  /**
   * Gets the stylesheet version.
   * @return a stylesheet version.
   */
  public String getVersion()
  {
    return version;
  }

  /**
   * Sets the stylesheet version.
   * @param value a stylesheet version.
   */
  public void setVersion(String value)
  {
    version = value;
  }

  /**
   * Gets the list of includes.
   * @return a list of includes.
   */
  public String[] getIncludes()
  {
    return includes;
  }

  /**
   * Sets the list of includes.
   * @param value a list of includes.
   */
  public void setIncludes(String[] value)
  {
    includes = value;
  }

  /**
   * Generates an xslt on the fly.
   */
  public Reader getReader()
  {
    String[] includes = getIncludes();

    if (includes == null)
    {
      return super.getReader();
    }

    String version = getVersion();

    if (version == null)
    {
      version = "2.0";
    }

    StringBuilder builder = new StringBuilder(1024);

    builder.append("<stylesheet version=\"");
    builder.append(version);
    builder.append("\" xmlns=\"http://www.w3.org/1999/XSL/Transform\">");

    for(String include: includes)
    {
      builder.append("<include href=\"");
      builder.append(include);
      builder.append("\"/>");
    }

    builder.append("</stylesheet>");

    return new StringReader(builder.toString());
  }

  /**
   * An xslt version. By default "2.0" is used.
   */
  private String version;

  /**
   * A list of includes.
   */
  private String[] includes;
}
To use it one just needs to write:
Source source = new XsltSource(base, stylesheets);
Templates templates = transformerFactory.newTemplates(source);
...
where:
- base is a base uri for the generated stylesheet; it's used to resolve relative includes;
- stylesheets is an array of hrefs.
Such an implementation resembles dynamic linking, where separate parts are bound at runtime. We would like to see dynamic modules in the next version of xslt.
Why have we turned our attention to the Saxon implementation?
A considerable part (~75%) of the project we're working on at present is creating xslt(s). These are not stylesheets for page presentation, but rather the project's business logic. To fulfill the project we needed an xslt 2.0 processor. In the current state of affairs I doubt anyone can point to a good alternative to the Saxon implementation.
The open source nature of the SaxonB project and intrinsic curiosity act like a hook for species like ourselves.
We should say that we're rather sceptical observers of code: the code should prove it has merits. Saxon looks consistent. It does not take too much time to grasp the implementation concepts, given that the code routinely follows the xpath/xslt/xquery specifications.
This code observation, and practice with live xslt tasks, helped us to form an opinion on Saxon itself. That's why we dare to critique it.
1. Compilation is fused with execution.
Before being executed, an xslt passes through several stages, including an xpath data model and a graph of expressions - objects implementing parts of the runtime logic.
The expression graph is optimized to achieve better runtime performance. The optimization logic is distributed throughout the code and, in particular, lives in the expression objects. This means that an expression fulfills two roles: runtime execution and optimization.
I would prefer to see smaller and cleaner runtime objects (expressions), with the optimization logic kept separate. On the other hand, I can guess why Michael Kay fused these roles: to ease lazy optimizations (at runtime).
2. Optimizations are xslt 1.0 by origin.
This is like a heritage. There are two main techniques: cached sequences, and global indices of rooted nodes.
This might have been enough in xslt 1.0, but not in 2.0, where there is a diverse set of types, where sequences extend node sets to other types, and where sequences may logically be grouped by pairs, triples, and so on.
The XPath data model operates with sequences only (in the mathematical sense). On the other hand, it defines many set-based functions (operators) like $a intersect $b, $a except $b, $a = $b, $a != $b. In these examples XPath sequences are better considered as sets, or maps of items.
Another example, for $i in index-of($names, $name) return $values[$i], where $names as xs:string* and $values as element()*, shows that the closure of ($names, $values) is in fact a map, and that $names might be implemented as a composition of a sequence and a map of strings to indices.
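A rough Java sketch of such a composition (our illustration, not Saxon code): the sequence keeps its order, while the map answers index-of() without scanning:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexedSequence
{
  private final List<String> items = new ArrayList<String>();
  private final Map<String, List<Integer>> positions =
    new HashMap<String, List<Integer>>();

  public void add(String value)
  {
    items.add(value);

    List<Integer> list = positions.get(value);

    if (list == null)
    {
      list = new ArrayList<Integer>();
      positions.put(value, list);
    }

    // xpath positions are 1-based.
    list.add(items.size());
  }

  // Emulates index-of($names, $name).
  public List<Integer> indexOf(String value)
  {
    List<Integer> list = positions.get(value);

    return list == null ? new ArrayList<Integer>() : list;
  }
}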
There are other use cases, which lead me to think that Saxon lacks set-based operators. Global indices are a poor substitute, which works for rooted trees only.
Again, I can guess why Michael Kay does not implement these operators: not everyone loads xslt with stressful tasks requiring such features. I think xslt is mostly used to render pages, and one rarely deviates from rooted trees.
In spite of these objections, we think that Saxon is a good xslt 2.0 implementation, which unfortunately lacks competitors.
We strongly object to persistence frameworks in their contemporary meaning. This includes a long row of names like Hibernate, Java Persistence API, LINQ, and others.
Consider how one of them describes itself:
...high performance object/relational persistence and query service... lets you
develop persistent classes following object-oriented idiom - including
association, inheritance, polymorphism, composition, and collections... allows you to express queries in its own portable SQL extension...
Sounds good, right?
We think not! The words "own" and "portable" regarding SQL sound almost like antonyms. When one creates a unified language (a noble rush, as opposed to a proprietary one?), one inevitably adds a peer, increasing plurality in the family of languages.
Attempts to create similar layers between data and business logic are not new. They have happened throughout computer history: IDMS, NATURAL, and COOL:GEN are 20-30 year old examples.
Our reasoning (nothing new).
One needs to approach a design (development and maintenance) from different perspectives; this way one understands the problem under design better, and can estimate the skills required to accomplish it. This leads to modularization, e.g. business layer, data layer, appearance, and to development (maintenance) roles: program developer, database specialist, appearance specialist. On a small scale several roles are often fulfilled by one person; this should not mean, however, that these roles are redundant; one just needs to try on different roles.
Why does one separate the business layer and the data layer?
A pragmatic perspective: there are databases, which accomplish most data storage tasks more efficiently than one may achieve without a database. There are two worlds: database specialists and program developers. These two layers and roles are facts of reality.
A designer's goal is to keep these roles separate:
- do not force a database specialist to know the business logic details;
- do not force a program developer to know how to organize storage more efficiently, or how to optimize a particular query.
Modularity helps here. Databases are well equipped to solve these tasks: the data
layer should expose a database API through stored procedures, functions, and
views, while the business layer should use this API to access the database.
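As a small illustration of this division (the procedure and class names here are hypothetical), the business layer calls the database API without knowing how the data is stored:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;

public class OrderReader
{
  // The stored procedure get_order_status() belongs to the data layer;
  // the database specialist is free to change the storage behind it.
  public static String readOrderStatus(Connection connection, long orderId)
    throws SQLException
  {
    CallableStatement statement =
      connection.prepareCall("{ call get_order_status(?) }");

    try
    {
      statement.setLong(1, orderId);

      ResultSet result = statement.executeQuery();

      return result.next() ? result.getString(1) : null;
    }
    finally
    {
      statement.close();
    }
  }
}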
With persistence frameworks there are two alternatives:
- still use a data layer API;
- rely on the persistence framework.
When the first case is selected, a framework provides almost no additional value compared to traditional database access (jdbc, ado.net, and so on).
When one relies on a framework, the data layer interface virtually disappears (in fact the framework substitutes for this interface). The database specialist has very little control over tuning the data structures and optimizing queries, unless she starts digging into the business code, and even then she cannot always control the queries sent to the database. Moreover, the database specialist must learn a proprietary query language.
The result is that a persistence framework erodes the division of responsibilities, complicating development and maintenance.
We often hear the following explanation of why one should use persistence frameworks: "it eases switching the database vendor". This is the most stupid reason to use a persistence framework! It sounds as if one plans to switch vendors once a day.
A design needs to focus on modularity. This makes code more robust, faster, and more maintainable. It also eases a potential migration, as only the data layer has to be migrated, with minimal (mostly configuration) changes in the business layer.
We are certain that xslt/xquery are the best fit for web application frameworks from the design perspective; or, in other words, that pipeline frameworks allowing the use of xslt/xquery are the preferable way to create web applications.
The advantages are obvious:
- clear separation of business logic, data, and presentation;
- richness of the languages, allowing one to implement simple presentation, complex components, and sophisticated data binding;
- built-in extensibility, allowing communication with business logic written in other languages and/or located at a different site.
It seems that agitating for such technologies is like forcing an open door. There are such frameworks out there:
Orbeon Forms, Cocoon, and others.
We're not qualified to judge their virtues, however...
Look at the current state of affairs. The main players in this area (well, we have a rather limited vision) push other technologies: JSP/JSF/Facelets and the like in the Java world, and ASP.NET in the .NET world. The closest thing they provide is an xslt servlet/component allowing one to generate output.
Their variants of syntax and their data binding techniques allude to similar paradigms in xslt/xquery:
<select>
  <c:forEach var="option" items="#{bean.options}">
    <option value="#{option.key}">#{option.value}</option>
  </c:forEach>
</select>
On the surface, however, we see much more limited (in design and in application) frameworks.
And here is a contradiction: how can it be that at present such a good design is not even as popular as its competitors?
Someone may say there is no such problem: you can use whatever you want; you have a choice! Well, he's lucky. From our perspective it's not that simple.
We're creating rather complex web applications. Their nature isn't important in this context; what is important is that there are customers. They are not thoroughly enlightened on the subject, and exactly because of this they prefer technologies proposed by the leaders. It seems everything convinces them: mainstream status, good support, many developers who know the technology.
There is not a single chance to promote anything else.
We believe the future may change this state of affairs, but we're creating at present, and cannot wait...
I've uploaded jxom.zip.
It now contains a state machine generator; see "What you can do with jxom".
The code is in java-state-machine-generator.xslt, and the test is in java-state-machine-test.xslt.
Java has no value types: objects allocated in place, in contrast to objects referred to by a pointer into the heap. This, in my opinion, has a negative impact both on program design and on performance.
Incidentally, I've thought of a use case which could be understood as a value type by jvm implementations. Consider an example:
class A
{
  private final B b = new B();
}
An implementation may lay out class A in such a way that field b is the content of an instance of class B itself, rather than a pointer to such an instance. This way we save a pointer and a heap allocation of the instance of B. Another example:
class C
{
  C(int size)
  {
    values = new D[size];

    for(int i = 0; i < values.length; i++)
    {
      values[i] = new D();
    }
  }

  private final D[] values;
}
Here the field values is never null, and each item of the array contains a non-null value. Assuming these conditions hold for the whole life cycle, and that values is not passed by reference, we can consider values as an array of value types.
The use case conditions are the following:
- the field contains a non-null value;
- the field value is an instance of the field's type, and not of a descendant type;
- if the field is an array, then all elements of the array are initialized with instances of the element type, and not of a descendant type;
- the field, or an element of the array, can be assigned only through the operator new (field = new T(), array[i] = new T());
- the array field is not passed by reference (Arrays.sort(array) never happens).
A JIT is allowed to interpret a field as a value type provided it proves these conditions.
Later...
There is another use case for detecting value types:
- a method variable contains no null value, and
- that variable is never stored in any field, and
- no synchronization is used on the instance the variable refers to, and
- a value is assigned to the variable only through the operator new.
Such a variable can be laid out directly on the stack, provided the preceding conditions are satisfied.
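A small example of this second use case (our illustration): the Point instances below never escape the method, so under the stated conditions they could be laid out on the stack:

public class Distance
{
  private static class Point
  {
    final double x;
    final double y;

    Point(double x, double y)
    {
      this.x = x;
      this.y = y;
    }
  }

  public static double distance(double x1, double y1, double x2, double y2)
  {
    // Assigned through "new" only, never stored in a field, never
    // synchronized on: all the conditions above are met.
    Point a = new Point(x1, y1);
    Point b = new Point(x2, y2);

    double dx = a.x - b.x;
    double dy = a.y - b.y;

    return Math.sqrt(dx * dx + dy * dy);
  }
}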
P.S. Despite the fact that .NET has built-in value types, it may use the very same technique to optimize reference types.
We're facing the task of converting a java method into a state machine. This is like converting a SAX parser, which pushes data, into an xml reader, which pulls data.
The task is formalized as follows:
- for a given method containing split markers, create a class permitting iteration;
- each iteration performs a part of the method's logic.
We have defined rules converting all statements into a state machine, except for the statement synchronized. In fact the logic is rather linear; the most nontrivial conversion, however, is for the try statement.
Consider an example:
public class Test
{
  void method()
    throws Exception
  {
    try
    {
      A();
      B();
    }
    catch(Exception e)
    {
      C(e);
    }
    finally
    {
      D();
    }

    E();
  }

  private void A()
    throws Exception
  {
    // logic A
  }

  private void B()
    throws Exception
  {
    // logic B
  }

  private void C(Exception e)
    throws Exception
  {
    // logic C
  }

  private void D()
    throws Exception
  {
    // logic D
  }

  private void E()
    throws Exception
  {
    // logic E
  }
}
Suppose we want to see method() as a state machine with split markers placed after the calls to the methods A(), B(), C(), D(), and E(). This is how it looks as a state machine:
Callable<Boolean> methodAsStateMachine()
  throws Exception
{
  return new Callable<Boolean>()
  {
    public Boolean call()
      throws Exception
    {
      // A dispatch loop: "continue" re-enters the switch with a new state,
      // while "break" leaves the loop to rethrow a pending exception.
      do
      {
        try
        {
          switch(state)
          {
            case 0:
            {
              A();
              state = 1;

              return true;
            }
            case 1:
            {
              B();
              state = 3;

              return true;
            }
            case 2:
            {
              C(ex);
              state = 3;

              return true;
            }
            case 3:
            {
              D();

              if (currentException != null)
              {
                throw currentException;
              }

              state = 4;

              return true;
            }
            case 4:
            {
              E();
              state = -1;

              return false;
            }
          }

          if (currentException == null)
          {
            currentException = new IllegalStateException();
          }

          break;
        }
        catch(Throwable e)
        {
          currentException = null;

          switch(state)
          {
            case 0:
            case 1:
            {
              if (e instanceof Exception)
              {
                ex = (Exception)e;
                state = 2;
              }
              else
              {
                currentException = e;
                state = 3;
              }

              continue;
            }
            case 2:
            {
              currentException = e;
              state = 3;

              continue;
            }
          }

          currentException = e;
          state = -1;
        }
      }
      while(true);

      return this.<Exception>error();
    }

    @SuppressWarnings("unchecked")
    private <T extends Throwable> boolean error()
      throws T
    {
      throw (T)currentException;
    }

    private int state = 0;
    private Throwable currentException = null;
    private Exception ex = null;
  };
}
Believe it or not, this transformation can be done purely in xslt 2.0 with the help of jxom (Java xml object model). We shall update jxom.zip once this module is implemented and tested.
In xslt one can express logically the same thing in different words, like:
exists($x)
and
every $y in $x satisfies exists($y)
newbie> Really the same?
expert> Oops... You're right, these are different things!
What's the difference?
I have already written about tuples and maps in xslt (see
Tuples and maps - Status: CLOSED, WONTFIX, and
Tuples and maps in Saxon).
Now I want to argue for a use case, and to discuss how an xslt processor can detect such a use case and implement it as a map. This way, under certain conditions, a sequence could be treated as a map (or as a set).
Use case.
There are two stages:
- logic collecting nodes/values that satisfy some criteria;
- processing of the data, taking a special action whenever a node/value collected at the first stage is encountered.
Whenever we're talking of nodes, the result of the first stage is a sequence $set as node()*. The role of this sequence is a set of nodes (order is not important).
The second stage is usually an xsl:for-each, an xsl:apply-templates, or something of this kind, which repeatedly verifies whether some $node as node()? belongs to the $set, like the following: $node intersect $set, or $node except $set.
Despite the fact that we're still using regular xpath 2.0, we have managed to express a set-based operation. It's a matter for the xslt processor's optimizer to detect such a use case and consider the sequence as a set. In fact the detection rule is rather simple (a rough sketch of a possible representation follows the rule).
For the expressions $node except $set and $node intersect $set:
- $set can be considered as a set, as the order of its elements is not important;
- chances are good that a $set implemented as a set outperforms an implementation using a list or an array.
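Here is that rough sketch, in Java (our illustration, not Saxon code): a sequence that also maintains a hash set, so a membership test like $node intersect $set costs O(1) per node:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SetBackedSequence<T>
{
  private final List<T> items = new ArrayList<T>(); // preserves order
  private final Set<T> index = new HashSet<T>();    // answers membership

  public void add(T item)
  {
    items.add(item);
    index.add(item);
  }

  // Emulates the membership test "exists($node intersect $set)".
  public boolean contains(T node)
  {
    return index.contains(node);
  }

  public List<T> asList()
  {
    return items;
  }
}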
So, what to do? Well, I do not think I'm the smartest child, quite the opposite... however, it's worth hinting at this idea to xslt implementers (see Suggest optimization). I still do not know whether it was fruitful...
P.S. A very similar use case exists for the function index-of($collection, $item).
I know we're not the first to create a parser in xslt. However, I still want to share our implementation, as I think it's beautiful.
In our project, which is a conversion from some legacy language to java, we're dealing with dynamic expressions. For example, in the legacy language one can filter a collection using an expression defined by a string:
collection.filter("a > 0 and b = 7");
Whenever the expression string is calculated at runtime, there is nothing to do except parse the string and perform the filtering dynamically. On the other hand, we have found that in the majority of cases literal strings are used. Thus we have decided to optimize this route like this:
collection.filter(
  new Filter<T>()
  {
    boolean filter(T value)
    {
      return (value.getA() > 0) && (value.getB() == 7);
    }
  });
This means that we're converting the expression string into java code at the generation stage.
In xslt - our generator engine - this means that we have to convert a string into an expression tree, like this:
(a > 7 or a = 3) and c * d = 2.2
to
<and>
  <or>
    <gt>
      <identifier>a</identifier>
      <integer>7</integer>
    </gt>
    <eq>
      <identifier>a</identifier>
      <integer>3</integer>
    </eq>
  </or>
  <eq>
    <mul>
      <identifier>c</identifier>
      <identifier>d</identifier>
    </mul>
    <decimal>2.2</decimal>
  </eq>
</and>
Our parser fits naturally into the world of parsers: it uses the xsl:analyze-string instruction to tokenize the input, and parses the tokens according to an expression grammar. During the implementation I've found some things that were new to me. I think they are worth mentioning:
- As the tokenizer is defined as one big regular expression, we get a rather verbose regex attribute on xsl:analyze-string. It was hard to edit such a long line until I found that there is a flags="x" option that solves the formatting problem:
"The flags attribute may be used to control the interpretation of the regular expression... If it contains the letter x, then whitespace within the regular expression is ignored."
This means that I can use spaces to format the regular expression, and \s to specify a space as a part of the expression (a Java parallel is sketched after this list).
- Saxon 9.1.0.1 has an inefficiency in the implementation of the xsl:analyze-string instruction: whenever the regex attribute is a literal value that contains the '{' character (e.g. "\p{{L}}"), Saxon considers the value to be an AVT and delays the pattern compilation until runtime, recompiling it every time the instruction is executed.
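For comparison, the same facility exists in Java regular expressions as Pattern.COMMENTS; a small sketch of ours:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizerDemo
{
  // Pattern.COMMENTS plays the role of flags="x": whitespace in the
  // pattern is ignored, and "#" starts a comment up to the end of line,
  // so a big tokenizer expression can be formatted and annotated.
  private static final Pattern TOKEN = Pattern.compile(
    "(\\d+\\.\\d+)      # a decimal number\n" +
    "| (\\d+)           # an integer\n" +
    "| ([A-Za-z]\\w*)   # an identifier\n",
    Pattern.COMMENTS);

  public static void main(String[] args)
  {
    Matcher matcher = TOKEN.matcher("a 7 2.2");

    while (matcher.find())
    {
      System.out.println(matcher.group()); // prints: a, 7, 2.2
    }
  }
}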
Use the following link to see the xslt:
expression-parser.xslt.
To see how to generate java from xml, follow this link:
Xslt for the jxom (Java xml object model), jxom.zip.