# Tuesday, 12 August 2008

I know we're not the first to write a parser in XSLT. Still, I want to share our implementation, as I think it's beautiful.

In our project, which is a conversion from a legacy language to Java, we're dealing with dynamic expressions. For example, in the legacy language one can filter a collection using an expression defined by a string: collection.filter("a > 0 and b = 7");

Whenever the expression string is computed at runtime, there is nothing to do except parse the string at runtime and perform the filtering dynamically. On the other hand, we have found that in the majority of cases literal strings are used. Thus we have decided to optimize this route like this:

  collection.filter(
    new Filter<T>()
    {
      public boolean filter(T value)
      {
        return (value.getA() > 0) && (value.getB() == 7);
      }
    });

This means that we're converting the expression string into Java code at the generation stage.
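
To make the generated snippet concrete, here is a minimal, self-contained sketch of the types it relies on. The post does not show the real declarations, so the Filter interface, the Item class and the Filters helper below are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback type; the real declaration is not shown in the post.
interface Filter<T> {
    boolean filter(T value);
}

// Hypothetical element type standing in for a converted legacy record.
class Item {
    private final int a;
    private final int b;

    Item(int a, int b) {
        this.a = a;
        this.b = b;
    }

    int getA() { return a; }
    int getB() { return b; }
}

class Filters {
    // Returns the elements accepted by the filter, preserving their order.
    static <T> List<T> filter(List<T> source, Filter<T> filter) {
        List<T> result = new ArrayList<>();
        for (T value : source) {
            if (filter.filter(value)) {
                result.add(value);
            }
        }
        return result;
    }

    // What a generator could emit for filter("a > 0 and b = 7"),
    // assuming the legacy "and" and "=" map to Java's && and ==.
    static List<Item> example(List<Item> items) {
        return filter(items, new Filter<Item>() {
            public boolean filter(Item value) {
                return (value.getA() > 0) && (value.getB() == 7);
            }
        });
    }
}
```

The point of the optimization is that the predicate is compiled once by javac, so no expression parsing happens when the filter runs.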

In XSLT, our generator engine, this means that we have to convert a string into an expression tree, like this:

(a > 7 or a= 3) and c * d = 2.2

to

<and>
  <or>
    <gt>
      <identifier>a</identifier>
      <integer>7</integer>
    </gt>
    <eq>
      <identifier>a</identifier>
      <integer>3</integer>
    </eq>
  </or>
  <eq>
    <mul>
      <identifier>c</identifier>
      <identifier>d</identifier>
    </mul>
    <decimal>2.2</decimal>
  </eq>
</and>
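
For readers who want to experiment with this mapping outside XSLT, the following is a rough recursive-descent sketch in Java. It is not the XSLT implementation described in this post: the class name ExpressionParser is hypothetical, only the operators that appear in the example are handled, and the precedence order (or, then and, then comparison, then multiplication) is an assumption inferred from the tree above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpressionParser {
    // One capturing group per token kind; decimal must come before integer.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(\\d+\\.\\d+)|(\\d+)|(\\p{L}\\w*)|([()>=*]))");

    private final List<String[]> tokens = new ArrayList<>(); // {kind, text}
    private int pos = 0;

    private ExpressionParser(String input) {
        String text = input.trim();
        Matcher m = TOKEN.matcher(text);
        int end = 0;
        while (end < text.length()) {
            m.region(end, text.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected input at " + end);
            }
            if (m.group(1) != null) {
                tokens.add(new String[] {"decimal", m.group(1)});
            } else if (m.group(2) != null) {
                tokens.add(new String[] {"integer", m.group(2)});
            } else if (m.group(3) != null) {
                String word = m.group(3);
                boolean keyword = word.equals("and") || word.equals("or");
                tokens.add(new String[] {keyword ? "symbol" : "identifier", word});
            } else {
                tokens.add(new String[] {"symbol", m.group(4)});
            }
            end = m.end();
        }
    }

    // Parses the whole input and returns the tree as an XML string.
    static String parse(String input) {
        ExpressionParser p = new ExpressionParser(input);
        String tree = p.parseOr();
        if (p.pos < p.tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens.get(p.pos)[1]);
        }
        return tree;
    }

    // Consumes the next token if it is the given operator or keyword.
    private boolean accept(String symbol) {
        if (pos < tokens.size() && tokens.get(pos)[0].equals("symbol")
                && tokens.get(pos)[1].equals(symbol)) {
            pos++;
            return true;
        }
        return false;
    }

    private static String element(String name, String content) {
        return "<" + name + ">" + content + "</" + name + ">";
    }

    private String parseOr() {
        String left = parseAnd();
        while (accept("or")) left = element("or", left + parseAnd());
        return left;
    }

    private String parseAnd() {
        String left = parseComparison();
        while (accept("and")) left = element("and", left + parseComparison());
        return left;
    }

    private String parseComparison() {
        String left = parseMul();
        if (accept(">")) return element("gt", left + parseMul());
        if (accept("=")) return element("eq", left + parseMul());
        return left;
    }

    private String parseMul() {
        String left = parsePrimary();
        while (accept("*")) left = element("mul", left + parsePrimary());
        return left;
    }

    private String parsePrimary() {
        if (accept("(")) {
            String inner = parseOr();
            if (!accept(")")) throw new IllegalArgumentException("Expected )");
            return inner;
        }
        if (pos >= tokens.size() || tokens.get(pos)[0].equals("symbol")) {
            throw new IllegalArgumentException("Expected an operand");
        }
        String[] token = tokens.get(pos++);
        return element(token[0], token[1]); // identifier, integer or decimal
    }
}
```

Calling ExpressionParser.parse("(a > 7 or a= 3) and c * d = 2.2") yields the same tree as above, serialized without indentation.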

Our parser fits naturally into the world of parsers: it uses the xsl:analyze-string instruction to tokenize the input, and then parses the tokens according to an expression grammar. During implementation I found a few things that were new to me and that I think are worth mentioning:

  • As the tokenizer is defined as one big regular expression, the regex attribute of xsl:analyze-string becomes rather verbose. Such a long line was hard to edit until I found the flags="x" option, which solves the formatting problem:

    The flags attribute may be used to control the interpretation of the regular expression... If it contains the letter x, then whitespace within the regular expression is ignored.

    This means that I can use spaces to format the regular expression, and \s to specify whitespace as part of the expression.
  • Saxon 9.1.0.1 handles the xsl:analyze-string instruction inefficiently whenever the regex attribute is a literal value that contains a '{' character (e.g. "\p{{L}}"): it treats the value as an AVT (attribute value template) and delays pattern compilation until runtime, compiling the pattern every time the instruction is executed.
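
Java's regex engine offers a comparable option to the x flag, Pattern.COMMENTS, which is handy when prototyping such a tokenizer outside XSLT. The sketch below is only an illustration: the VerboseTokenizer class and its token rules are hypothetical and much smaller than the real tokenizer, and unlike the XSLT flag the Java option also allows # comments inside the pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VerboseTokenizer {
    // With Pattern.COMMENTS, whitespace in the pattern is ignored and '#'
    // starts a comment that runs to the end of the line, so whitespace that
    // should be matched has to be written as \s.
    private static final Pattern TOKEN = Pattern.compile(
          "  \\d+ \\. \\d+     # decimal literal\n"
        + "| \\d+              # integer literal\n"
        + "| \\p{L} \\w*       # identifier or keyword\n"
        + "| [()>=*]           # operators and parentheses\n"
        + "| \\s+              # whitespace between tokens\n",
        Pattern.COMMENTS);

    // Returns the non-whitespace tokens in order of appearance.
    // Note: find() silently skips characters that match no alternative.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (!m.group().trim().isEmpty()) {
                tokens.add(m.group());
            }
        }
        return tokens;
    }
}
```

Because the pattern is a compile-time constant held in a static field, it is compiled once, which is the behavior one would also want from the XSLT processor.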

Use the following link to see the XSLT: expression-parser.xslt.
To see how to generate Java from XML, follow this link: Xslt for the jxom (Java xml object model), jxom.zip.

Tuesday, 12 August 2008 14:45:54 UTC  #    Comments [2] -
xslt
Thursday, 09 October 2008 05:45:51 UTC
This is really cool.
Friday, 10 October 2008 04:02:45 UTC
Probably you'd be interested to read about the LR Parsing Framework of FXSL.

Using it, several parsers have been implemented in pure XSLT (a JSON parser: the f:json-document() function), an XPath 2.0 parser and more...


Cheers,
Dimitre Novatchev
© 2025, Nesterovsky bros