There are some complications with the streamed tree that we have implemented in Saxon. They stem from the fact that only a view of the input data is available at any time. Whenever you access an element that is not available, you get an exception.
Consider an example. We have a log created with Java logging. It looks like this:
<log>
  <record>
    <date>...</date>
    <millis>...</millis>
    <sequence>...</sequence>
    <logger>...</logger>
    <level>INFO</level>
    <class>...</class>
    <method>...</method>
    <thread>...</thread>
    <message>...</message>
  </record>
  <record>
    ...
  </record>
  ...
We would like to write an XSLT stylesheet that returns a page of the log as HTML:
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com/xslt/this"
  xmlns="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="xs t">

  <xsl:param name="start-page" as="xs:integer" select="1"/>
  <xsl:param name="page-size" as="xs:integer" select="50"/>

  <xsl:output method="xhtml" byte-order-mark="yes" indent="yes"/>

  <!-- Entry point. -->
  <xsl:template match="/log">
    <xsl:variable name="start" as="xs:integer"
      select="($start-page - 1) * $page-size + 1"/>
    <xsl:variable name="records" as="element()*"
      select="subsequence(record, $start, $page-size)"/>

    <html>
      <head>
        <title>
          <xsl:text>A log file. Page: </xsl:text>
          <xsl:value-of select="$start-page"/>
        </title>
      </head>
      <body>
        <table border="1">
          <thead>
            <tr>
              <th>Level</th>
              <th>Message</th>
            </tr>
          </thead>
          <tbody>
            <xsl:apply-templates mode="t:record" select="$records"/>
          </tbody>
        </table>
      </body>
    </html>
  </xsl:template>

  <xsl:template mode="t:record" match="record">
    <!-- Make a copy of record to avoid streaming access problems. -->
    <xsl:variable name="log">
      <xsl:copy-of select="."/>
    </xsl:variable>

    <xsl:variable name="level" as="xs:string" select="$log/record/level"/>
    <xsl:variable name="message" as="xs:string" select="$log/record/message"/>

    <tr>
      <td>
        <xsl:value-of select="$level"/>
      </td>
      <td>
        <xsl:value-of select="$message"/>
      </td>
    </tr>
  </xsl:template>

</xsl:stylesheet>
This code does not work. Guess why? Yes, it's subsequence(), which is too greedy: it always wants to know what the next node is, so it naturally skips the content of the current node. Algorithmically, that Saxon code could be rewritten, and it could possibly work better in modes other than streaming as well.
A viable workaround, which does not use subsequence(), looks rather nontrivial:
<!-- Entry point. -->
<xsl:template match="/log">
  <xsl:variable name="start" as="xs:integer"
    select="($start-page - 1) * $page-size + 1"/>
  <!-- Position of the last record on the page (inclusive bound). -->
  <xsl:variable name="end" as="xs:integer"
    select="$start + $page-size - 1"/>

  <html>
    <head>
      <title>
        <xsl:text>A log file. Page: </xsl:text>
        <xsl:value-of select="$start-page"/>
      </title>
    </head>
    <body>
      <table border="1">
        <thead>
          <tr>
            <th>Level</th>
            <th>Message</th>
          </tr>
        </thead>
        <tbody>
          <xsl:sequence select="
            t:generate-records(record, $start, $end, ())"/>
        </tbody>
      </table>
    </body>
  </html>
</xsl:template>

<!--
  Walks the records one position at a time, from $start up to and
  including $end, accumulating the generated rows in $result.
-->
<xsl:function name="t:generate-records" as="element()*">
  <xsl:param name="records" as="element()*"/>
  <xsl:param name="start" as="xs:integer"/>
  <xsl:param name="end" as="xs:integer?"/>
  <xsl:param name="result" as="element()*"/>

  <xsl:variable name="record" as="element()?" select="$records[$start]"/>

  <xsl:choose>
    <!-- Stop when the page is complete or the input is exhausted. -->
    <xsl:when test="(exists($end) and ($start > $end)) or empty($record)">
      <xsl:sequence select="$result"/>
    </xsl:when>
    <xsl:otherwise>
      <!-- Make a copy of record to avoid streaming access problems. -->
      <xsl:variable name="log">
        <xsl:copy-of select="$record"/>
      </xsl:variable>

      <xsl:variable name="level" as="xs:string" select="$log/record/level"/>
      <xsl:variable name="message" as="xs:string" select="$log/record/message"/>

      <xsl:variable name="next-result" as="element()*">
        <tr>
          <td>
            <xsl:value-of select="$level"/>
          </td>
          <td>
            <xsl:value-of select="$message"/>
          </td>
        </tr>
      </xsl:variable>

      <xsl:sequence select="
        t:generate-records
        (
          $records,
          $start + 1,
          $end,
          ($result, $next-result)
        )"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>
Here we observed the greediness of Saxon, which tried to consume more input than required, and did so too early. In other cases we have seen that it may defer the actual data access to a point when the data is no longer available.
So, without tuning the internal Saxon logic, it's possible but not easy to write stylesheets that exploit streaming features.
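The paging arithmetic shared by both versions of the entry-point template can be checked separately from streaming. The following is only a sketch for an ordinary, non-streamed XSLT 2.0 run: the function name t:page-range and the plain-text output are illustrative assumptions and not part of the original sources, and it says nothing about how the streamed tree would evaluate the same expressions. It lists the level and message of the records that page number $start-page is meant to contain, that is, positions ($start-page - 1) * $page-size + 1 through $start-page * $page-size.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:t="http://www.nesterovsky-bros.com/xslt/this"
  exclude-result-prefixes="xs t">

  <xsl:param name="start-page" as="xs:integer" select="1"/>
  <xsl:param name="page-size" as="xs:integer" select="50"/>

  <xsl:output method="text"/>

  <!--
    Hypothetical helper (not in the original sources):
    returns the record positions that belong to a page.
  -->
  <xsl:function name="t:page-range" as="xs:integer*">
    <xsl:param name="page" as="xs:integer"/>
    <xsl:param name="size" as="xs:integer"/>

    <xsl:variable name="start" as="xs:integer"
      select="($page - 1) * $size + 1"/>

    <!-- Inclusive range holding exactly $size positions. -->
    <xsl:sequence select="$start to $start + $size - 1"/>
  </xsl:function>

  <!-- Non-streaming check: print "level: message" for each record on the page. -->
  <xsl:template match="/log">
    <xsl:variable name="positions" as="xs:integer*"
      select="t:page-range($start-page, $page-size)"/>

    <xsl:for-each select="record[position() = $positions]">
      <xsl:value-of select="level, message" separator=": "/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With start-page set to 2 and page-size set to 3, t:page-range returns the positions 4, 5, 6, which is the same range that subsequence(record, 4, 3) selects. Comparing this output with the rows produced by a streaming stylesheet is one way to confirm that a workaround returns the intended page.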
P.S. Updated sources are at streamedtree.zip