Praise: I dare not think how we could live without AnkhSVN.
At present we have:
The idea of a runtime grammar tree combined with a reader-like parser results in high performance, as we are able to build lookup tables to probe tokens. This allows us to start parsing immediately from the most specific grammar chain. For example, consider the XQuery grammar:
[1] Module ::= VersionDecl? (LibraryModule | MainModule)
[2] VersionDecl ::= "xquery" "version" StringLiteral ("encoding" StringLiteral)? Separator
[3] MainModule ::= Prolog QueryBody
[4] LibraryModule ::= ModuleDecl Prolog
[5] ModuleDecl ::= "module" "namespace" NCName "=" URILiteral Separator
[6] Prolog ::= ((DefaultNamespaceDecl | Setter | NamespaceDecl | Import) Separator)* ((VarDecl | FunctionDecl | OptionDecl) Separator)*
...
[87] VarRef ::= "$" VarName
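To make "runtime grammar tree" concrete, here is a minimal sketch, assuming productions are held as plain data objects that the parser can walk at run time. The node classes (`Tok`, `Seq`, `Alt`, `Opt`, `Ref`), the `first_k` function and the cut-down `GRAMMAR` are all hypothetical illustrations, not the project's actual types. The walk is an ordinary FIRST-k computation: it collects the runs of up to k tokens that can start a production, which is one plausible way to derive token runs like those in the table below.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

Run = Tuple[str, ...]


# Hypothetical grammar node types: the grammar is data, walkable at run time.
@dataclass(frozen=True)
class Tok:
    text: str  # a literal such as "xquery", or a token class such as "<ncname>"


@dataclass(frozen=True)
class Seq:
    items: tuple  # sub-expressions matched one after another


@dataclass(frozen=True)
class Alt:
    alts: tuple  # alternatives, "a | b"


@dataclass(frozen=True)
class Opt:
    item: object  # optional sub-expression, "a?"


@dataclass(frozen=True)
class Ref:
    name: str  # reference to another production


def concat_k(left: Set[Run], right: Set[Run], k: int) -> Set[Run]:
    """Append each run of `right` to each run of `left`, keeping at most k tokens."""
    out: Set[Run] = set()
    for a in left:
        if len(a) >= k:
            out.add(a[:k])  # already k tokens long; `right` cannot change it
        else:
            for b in right:
                out.add((a + b)[:k])
    return out


def first_k(expr, grammar: Dict[str, object], k: int) -> Set[Run]:
    """Token runs (length <= k) that can start `expr`; assumes no recursion through Ref."""
    if isinstance(expr, Tok):
        return {(expr.text,)}
    if isinstance(expr, Ref):
        return first_k(grammar[expr.name], grammar, k)
    if isinstance(expr, Opt):
        return {()} | first_k(expr.item, grammar, k)  # () = the option is skipped
    if isinstance(expr, Alt):
        result: Set[Run] = set()
        for alt in expr.alts:
            result |= first_k(alt, grammar, k)
        return result
    if isinstance(expr, Seq):
        runs: Set[Run] = {()}
        for item in expr.items:
            if all(len(r) >= k for r in runs):
                break  # every run is complete; later items cannot contribute
            runs = concat_k(runs, first_k(item, grammar, k), k)
        return runs
    raise TypeError(f"unknown grammar node: {expr!r}")


# Hypothetical, drastically simplified slice of the productions above:
# LibraryModule is cut down to ModuleDecl and MainModule to a bare VarRef.
GRAMMAR = {
    "Module": Seq((Opt(Ref("VersionDecl")), Alt((Ref("ModuleDecl"), Ref("VarRef"))))),
    "VersionDecl": Seq((Tok("xquery"), Tok("version"), Tok("<string>"))),
    "ModuleDecl": Seq((Tok("module"), Tok("namespace"), Tok("<ncname>"))),
    "VarRef": Seq((Tok("$"), Tok("<varname>"))),
}

print(sorted(first_k(GRAMMAR["Module"], GRAMMAR, 2)))
# [('$', '<varname>'), ('module', 'namespace'), ('xquery', 'version')]
```

A real XQuery grammar is recursive, so a production-quality version would need a fixpoint iteration instead of the plain recursion used here.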
Formally, to parse the XQuery expression "$v" one needs to descend deep into the grammar hierarchy, and that is what is usually done. By contrast, a lookup table for the grammar "Module", containing 80 different token runs, allows us to identify the grammar chain with just a couple of probes:
[0] "xquery" "version"
[1] "module" "namespace"
[2] "declare" "default" "element" "namespace"
[3] "declare" "default" "function" "namespace"
[4] "declare" "boundary-space"
[5] "declare" "default" "collation"
[6] "declare" "base-uri"
[7] "declare" "construction"
[8] "declare" "ordering"
[9] "declare" "default" "order" "empty"
[10] "declare" "copy-namespaces"
[11] "declare" "namespace"
[12] "declare" "schema"
[13] "import" "module"
[14] "declare" "variable" "$"
[15] "declare" "function"
[16] "declare" "option"
[17] "for" "$"
[18] "let" "$"
[19] "some" "$"
[20] "every" "$"
[21] "typeswitch" "("
[22] "if" "("
[23] "-"
[24] "+"
[25] "validate" "{"
[26] "validate" "lax"
[27] "validate" "strict"
[28] "/"
[29] "//"
[30] <integer>
[31] <decimal>
[32] <double>
[33] <string>
[34] "$"
[35] "("
[36] "."
[37] <functionname> "("
[38] "ordered" "{"
[39] "unordered" "{"
[40] "<" <qname>
[41] <!--literal-->
[42] <?pi literal?>
[43] "document" "{"
[44] "element" <qname>
[45] "element" "{"
[46] "attribute" <qname>
[47] "attribute" "{"
[48] "text" "{"
[49] "comment" "{"
[50] "processing-instruction" <ncname>
[51] "processing-instruction" "{"
[52] "parent" "::"
[53] "ancestor" "::"
[54] "preceding-sibling" "::"
[55] "preceding" "::"
[56] "ancestor-or-self" "::"
[57] ".."
[58] "child" "::"
[59] "descendant" "::"
[60] "attribute" "::"
[61] "self" "::"
[62] "descendant-or-self" "::"
[63] "following-sibling" "::"
[64] "following" "::"
[65] "@"
[66] "document-node" "("
[67] "element" "("
[68] "attribute" "("
[69] "schema-element" "("
[70] "schema-attribute" "("
[71] "processing-instruction" "("
[72] "comment" "("
[73] "text" "("
[74] "node" "("
[75] <qname>
[76] "*"
[77] <ncname:*>
[78] <*:ncname>
[79] "(#"
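One way to probe such a table is a token trie with longest-match lookup. The sketch below is an assumption about how this could work, not the project's code: `TrieNode`, `build_table`, `probe` and the subset `RUNS` are hypothetical, each entry is identified only by its index from the table above (where the real table would presumably carry a grammar chain), and token classes like `"<qname>"` are assumed to be produced by the lexer.

```python
from typing import Dict, Optional, Sequence, Tuple


class TrieNode:
    __slots__ = ("children", "entry")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.entry: Optional[int] = None  # table index if a token run ends here


def build_table(runs: Dict[int, Tuple[str, ...]]) -> TrieNode:
    """Build a token trie from {table index: token run}."""
    root = TrieNode()
    for index, run in runs.items():
        node = root
        for token in run:
            node = node.children.setdefault(token, TrieNode())
        node.entry = index
    return root


def probe(root: TrieNode, tokens: Sequence[str]) -> Optional[Tuple[int, int]]:
    """Return (table index, tokens consumed) for the longest matching run, or None."""
    node = root
    best = None
    for depth, token in enumerate(tokens, start=1):
        node = node.children.get(token)
        if node is None:
            break  # no run continues with this token
        if node.entry is not None:
            best = (node.entry, depth)  # remember the longest run seen so far
    return best


# A few entries from the table above, keyed by their original index.
RUNS = {
    0: ("xquery", "version"),
    1: ("module", "namespace"),
    11: ("declare", "namespace"),
    14: ("declare", "variable", "$"),
    15: ("declare", "function"),
    17: ("for", "$"),
    34: ("$",),
    44: ("element", "<qname>"),
    45: ("element", "{"),
    67: ("element", "("),
}

table = build_table(RUNS)
print(probe(table, ["$", "<qname>"]))                         # (34, 1): the "$v" case
print(probe(table, ["declare", "variable", "$", "<qname>"]))  # (14, 3)
```

Each probe costs one dictionary lookup per token, so the "$v" case resolves after a single token instead of a descent through the grammar. A real implementation must also cope with tokens that fit more than one class (XQuery has no reserved words, so "element" is also a valid QName); this sketch ignores that.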
In this way we skip the descent through the grammar hierarchy and, algorithmically, outperform most conventional parsers.
In addition, the parse tree we build has a compact representation. Each tree node is defined by two text bookmarks, a grammar chain, and grammar-specific data. Importantly, very little garbage memory is produced, since the parser rarely makes an assumption that later fails.
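A minimal sketch of such a node follows, assuming a "bookmark" is an offset that the text buffer shifts whenever text is inserted before it, which is what lets a node keep its range across edits. `Bookmark`, `TextBuffer`, `ParseNode` and the abbreviated grammar chain are hypothetical names for illustration only.

```python
from typing import Any, List, Tuple


class Bookmark:
    """Hypothetical text bookmark: an offset kept up to date by its buffer."""
    __slots__ = ("offset",)

    def __init__(self, offset: int) -> None:
        self.offset = offset


class TextBuffer:
    def __init__(self, text: str) -> None:
        self.text = text
        self._marks: List[Bookmark] = []

    def bookmark(self, offset: int) -> Bookmark:
        mark = Bookmark(offset)
        self._marks.append(mark)
        return mark

    def insert(self, offset: int, s: str) -> None:
        self.text = self.text[:offset] + s + self.text[offset:]
        for mark in self._marks:
            # Marks strictly after the insertion point move right;
            # a mark exactly at `offset` stays where it is.
            if mark.offset > offset:
                mark.offset += len(s)


class ParseNode:
    """Compact node: two bookmarks, a grammar chain and grammar-specific data."""
    __slots__ = ("start", "end", "chain", "data")

    def __init__(self, start: Bookmark, end: Bookmark,
                 chain: Tuple[str, ...], data: Any = None) -> None:
        self.start = start  # first character of the node
        self.end = end      # one past the last character
        self.chain = chain  # grammar chain that produced the node
        self.data = data    # grammar-specific payload, e.g. a variable name

    def text(self, buffer: TextBuffer) -> str:
        return buffer.text[self.start.offset:self.end.offset]


buf = TextBuffer("let $v := 1 return $v")
# Hypothetical, abbreviated chain for the trailing "$v".
node = ParseNode(buf.bookmark(19), buf.bookmark(21), ("QueryBody", "VarRef"), data="v")
print(node.text(buf))                      # $v
buf.insert(0, "(: c :) ")                  # edit in front of the node
print(node.text(buf), node.start.offset)   # $v 27
```

`__slots__` keeps each instance free of a per-object dictionary, in the spirit of the compact representation described above.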
What should be done:
Attach events to the XQuery grammar to collect program constructs: variables, functions, and namespaces in scope. This will provide the information needed for auto-completion.
Release inactive parsed subtrees. For example, we can free the subtree of a function body and keep only its text range (two bookmarks).
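Both items can be sketched together, under stated assumptions: events are callbacks keyed by rule name that fire when a rule completes, and "releasing" a subtree means dropping its children while keeping its text range. `Node`, `Scope`, `EVENTS`, `fire_events` and `release` are hypothetical, plain integer offsets stand in for bookmarks, and the offsets in the example tree are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    rule: str                    # grammar rule that produced the node
    start: int                   # text range; plain offsets stand in for bookmarks
    end: int
    value: Optional[str] = None  # grammar-specific data, e.g. a declared name
    children: Optional[List["Node"]] = field(default_factory=list)  # None = released


@dataclass
class Scope:
    """Hypothetical collector of in-scope constructs for auto-completion."""
    variables: List[str] = field(default_factory=list)
    functions: List[str] = field(default_factory=list)
    namespaces: List[str] = field(default_factory=list)


# Hypothetical events attached to grammar rules; each fires when its rule completes.
EVENTS: Dict[str, Callable[[Node, Scope], None]] = {
    "VarDecl": lambda node, scope: scope.variables.append(node.value),
    "FunctionDecl": lambda node, scope: scope.functions.append(node.value),
    "NamespaceDecl": lambda node, scope: scope.namespaces.append(node.value),
}


def fire_events(node: Node, scope: Scope) -> None:
    """Replay rule-completion events bottom-up, as a parser would while building the tree."""
    for child in node.children or []:
        fire_events(child, scope)
    handler = EVENTS.get(node.rule)
    if handler is not None:
        handler(node, scope)


def release(node: Node) -> None:
    """Free an inactive subtree but keep its text range so it can be reparsed on demand."""
    node.children = None


body = Node("FunctionBody", 60, 90, children=[Node("VarRef", 70, 72, value="a")])
prolog = Node("Prolog", 0, 95, children=[
    Node("NamespaceDecl", 0, 30, value="p"),
    Node("VarDecl", 31, 58, value="x"),
    Node("FunctionDecl", 59, 95, value="local:f", children=[body]),
])

scope = Scope()
fire_events(prolog, scope)
print(scope)  # Scope(variables=['x'], functions=['local:f'], namespaces=['p'])
release(body)
print(body.children, body.start, body.end)  # None 60 90
```

Keeping the range means a released function body still occupies its place in the tree and can be reparsed when the user returns to it.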
Well, I'd like to think someone can make sense of all this mumbling. All sources are at the "Incremental parser" home page.