Update on Incremental Parser - Nesterovsky bros

Sunday, 05 April 2009

Praises: I dare not think how we could live without AnkhSVN.

At present we have:

a generic parser;
a fully functional XQuery parser;
detailed error reports and syntax suggestions;
high performance.

The idea of a runtime grammar tree and a reader-like parser results in high performance, as we are able to build lookup tables to probe tokens. This allows us to start parsing immediately from the most specific grammar chain. For example, consider the XQuery grammar:

[1] Module ::= VersionDecl? (LibraryModule | MainModule)
[2] VersionDecl ::= "xquery" "version" StringLiteral ("encoding" StringLiteral)? Separator
[3] MainModule ::= Prolog QueryBody
[4] LibraryModule ::= ModuleDecl Prolog
[5] ModuleDecl ::= "module" "namespace" NCName "=" URILiteral Separator
[6] Prolog ::= ((DefaultNamespaceDecl | Setter | NamespaceDecl | Import) Separator)* ((VarDecl | FunctionDecl | OptionDecl) Separator)*
...
[87] VarRef ::= "$" VarName

Formally, to parse the XQuery "$v" one needs to descend deep into the grammar hierarchy. That's what is usually done. In contrast, a lookup table for the grammar "Module", containing 80 different token runs, allows us to identify the grammar chain with just a couple of probes:

[0] "xquery" "version"
[1] "module" "namespace"
[2] "declare" "default" "element" "namespace"
[3] "declare" "default" "function" "namespace"
[4] "declare" "boundary-space"
[5] "declare" "default" "collation"
[6] "declare" "base-uri"
[7] "declare" "construction"
[8] "declare" "ordering"
[9] "declare" "default" "order" "empty"
[10] "declare" "copy-namespaces"
[11] "declare" "namespace"
[12] "declare" "schema"
[13] "import" "module"
[14] "declare" "variable" "$"
[15] "declare" "function"
[16] "declare" "option"
[17] "for" "$"
[18] "let" "$"
[19] "some" "$"
[20] "every" "$"
[21] "typeswitch" "("
[22] "if" "("
[23] "-"
[24] "+"
[25] "validate" "{"
[26] "validate" "lax"
[27] "validate" "strict"
[28] "/"
[29] "//"
[30] <integer>
[31] <decimal>
[32] <double>
[33] <string>
[34] "$"
[35] "("
[36] "."
[37] <functionname> "("
[38] "ordered" "{"
[39] "unordered" "{"
[40] "<" <qname>
[41]
[42] <?pi literal?>
[43] "document" "{"
[44] "element" <qname>
[45] "element" "{"
[46] "attribute" <qname>
[47] "attribute" "{"
[48] "text" "{"
[49] "comment" "{"
[50] "processing-instruction" <ncname>
[51] "processing-instruction" "{"
[52] "parent" "::"
[53] "ancestor" "::"
[54] "preceding-sibling" "::"
[55] "preceding" "::"
[56] "ancestor-or-self" "::"
[57] ".."
[58] "child" "::"
[59] "descendant" "::"
[60] "attribute" "::"
[61] "self" "::"
[62] "descendant-or-self" "::"
[63] "following-sibling" "::"
[64] "following" "::"
[65] "@"
[66] "document-node" "("
[67] "element" "("
[68] "attribute" "("
[69] "schema-element" "("
[70] "schema-attribute" "("
[71] "processing-instruction" "("
[72] "comment" "("
[73] "text" "("
[74] "node" "("
[75] <qname>
[76] "*"
[77] <ncname:*>
[78] <*:ncname>
[79] "(#"

This way, algorithmically, we outperform most conventional parsers.
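
To make the probing idea concrete, here is a minimal sketch in Python of how a lookup over token runs might work. It is only an illustration, not the project's actual code: the names ProbeNode, build_lookup and probe are hypothetical, tokens are plain strings (token classes such as <qname> are not modelled), and the grammar chains are abbreviated, with the intermediate expression productions between QueryBody and VarRef left out.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Chain = Tuple[str, ...]


class ProbeNode:
    """One step in a token run; `chain` is set where a complete run ends."""

    def __init__(self) -> None:
        self.children: Dict[str, "ProbeNode"] = {}
        self.chain: Optional[Chain] = None


def build_lookup(runs: Iterable[Tuple[Tuple[str, ...], Chain]]) -> ProbeNode:
    """Build a prefix tree from (token run, grammar chain) pairs."""
    root = ProbeNode()
    for tokens, chain in runs:
        node = root
        for token in tokens:
            node = node.children.setdefault(token, ProbeNode())
        node.chain = chain
    return root


def probe(root: ProbeNode, tokens: List[str]) -> Optional[Tuple[Chain, int]]:
    """Return the chain of the longest matching run and the number of
    tokens probed to reach it, or None when no run matches."""
    node = root
    best: Optional[Tuple[Chain, int]] = None
    for depth, token in enumerate(tokens, start=1):
        next_node = node.children.get(token)
        if next_node is None:
            break
        node = next_node
        if node.chain is not None:
            best = (node.chain, depth)
    return best


# Three of the 80 runs; the chains are abbreviated for illustration.
RUNS = [
    (("xquery", "version"), ("Module", "VersionDecl")),
    (("module", "namespace"), ("Module", "LibraryModule", "ModuleDecl")),
    (("$",), ("Module", "MainModule", "QueryBody", "VarRef")),
]
LOOKUP = build_lookup(RUNS)
```

With this table, probe(LOOKUP, ["$", "v"]) finds the chain ending in VarRef after a single probe, so the parser can start from VarRef instead of descending from Module one production at a time.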

On the other hand, the parse tree we're building has a compact representation. Each tree node is defined by two text bookmarks, a grammar chain, and grammar-specific data. What's important is that very little garbage memory is produced, as the parser's assumptions rarely fail.
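
A node of this kind could look roughly like the following Python sketch. The names Bookmark and TreeNode and their fields are assumptions made for illustration, and a bookmark is simplified here to a plain character offset; the real implementation may track positions differently.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class Bookmark:
    """A position in the source text (simplified to a character offset)."""
    offset: int


@dataclass
class TreeNode:
    start: Bookmark            # first bookmark: where the node's text begins
    end: Bookmark              # second bookmark: just past the node's text
    chain: Tuple[str, ...]     # grammar chain that produced this node
    data: Any = None           # grammar-specific data, e.g. a variable name
    children: List["TreeNode"] = field(default_factory=list)

    def text(self, source: str) -> str:
        """Recover the node's text from the source; the node stores no copy."""
        return source[self.start.offset:self.end.offset]


SOURCE = "let $v := 1 return $v"
# Hypothetical node for the first "$v", which occupies offsets 4..6.
VAR_REF = TreeNode(Bookmark(4), Bookmark(6), ("VarRef",), data="v")
```

Since a node holds only two positions, a chain and a small payload, VAR_REF.text(SOURCE) reads the text on demand rather than duplicating it in the tree.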

What should be done:

Attach events to the XQuery grammar to collect program constructs: variables, functions, namespaces in scope. This will provide auto-completion info.
Release inactive parsed subtrees. E.g. we can free the tree of a function body and preserve only its text range (two bookmarks).
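
Releasing a subtree, the second item above, might be sketched as follows in Python. This is an assumption about one possible design, not something already implemented: Node, release and ensure_parsed are hypothetical names, bookmarks are simplified to integer offsets, and reparse stands for whatever routine would parse a text range again starting from a given grammar chain.

```python
from typing import Callable, List, Optional, Tuple

Chain = Tuple[str, ...]


class Node:
    def __init__(self, start: int, end: int, chain: Chain,
                 children: Optional[List["Node"]] = None) -> None:
        self.start = start          # first bookmark (offset into the source)
        self.end = end              # second bookmark (offset past the text)
        self.chain = chain
        self.children: List["Node"] = children if children is not None else []
        self.released = False


def release(node: Node) -> None:
    """Drop the subtree but keep the text range and the grammar chain."""
    node.children = []
    node.released = True


def ensure_parsed(node: Node, source: str,
                  reparse: Callable[[str, Chain], List[Node]]) -> List[Node]:
    """Rebuild a released subtree from its text range when it is needed again."""
    if node.released:
        node.children = reparse(source[node.start:node.end], node.chain)
        node.released = False
    return node.children
```

The two bookmarks and the chain are enough to restore the subtree later, so an inactive function body costs only one small node while it is not being edited.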

Well, I'd like to think someone could understand something in all this mumbling. All sources are at the "Incremental parser" home.

Sunday, 05 April 2009 15:50:49 UTC
