We have accidentally found that the implementation of String and StringBuilder
has been considerably revised, while the public interface has remained the
same.
public sealed class String
{
private int m_arrayLength;
private int m_stringLength;
private char m_firstChar;
}
This layout dates back to .NET 1.0.
The VM, in fact, allocates more memory than the C# class defines, as
&m_firstChar refers to an inline char buffer.
This way the string's buffer length and the string's length were two different
values. StringBuilder used this fact and stored its content in a private string,
which it modified in place.
In .NET 4, string is different:
public sealed class String
{
private int m_stringLength;
private char m_firstChar;
}
The memory footprint of such a structure is smaller, but the string's length must
now always be the same as its buffer length. In fact, the layout of a string is now
the same as the layout of char[].
This modification has led to a redesign of the StringBuilder implementation.
Earlier, StringBuilder looked like the following:
public sealed class StringBuilder
{
internal IntPtr m_currentThread;
internal int m_MaxCapacity;
internal volatile string m_StringValue;
}
Notice that m_StringValue is used as the storage, and
m_currentThread is used to preserve thread affinity of the internal
string value.
Now, the guys at Microsoft have decided to implement StringBuilder very differently:
public sealed class StringBuilder
{
internal int m_MaxCapacity;
internal int m_ChunkLength;
internal int m_ChunkOffset;
internal char[] m_ChunkChars;
internal StringBuilder m_ChunkPrevious;
}
Inspection of this layout immediately reveals the implementation technique: it's a
list of chunks. The instance itself is the last chunk (the most recently
appended one), and it refers to the previous chunks through m_ChunkPrevious.
Characteristics of this design are:
- while Length is small, performance is almost the same as it was earlier;
- there are no more thread affinity checks;
- Append() and ToString() work as fast as in the old version;
- Insert() in the middle works faster, as only one chunk has to be split and
probably reallocated (copied), instead of the whole string;
- random access is fast at the end, O(1), and slows down as you approach the start,
O(chunk-count).
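To make the random access cost concrete, here is a minimal sketch of a chunk list in C#. It is not the framework source: the class ChunkedBuilder and its member names are our own simplification, every chunk has the same capacity, and only Append, the indexer and ToString are modelled.

```csharp
using System;

// Hypothetical, simplified model of a chunked builder (not the BCL code).
public sealed class ChunkedBuilder
{
    private char[] chunkChars;            // characters of this (last) chunk
    private int chunkLength;              // used part of chunkChars
    private int chunkOffset;              // total length of all previous chunks
    private ChunkedBuilder chunkPrevious; // previous chunk, or null

    public ChunkedBuilder(int capacity = 16)
    {
        chunkChars = new char[Math.Max(capacity, 1)];
    }

    // Creates a node that takes over the current state of another instance.
    private ChunkedBuilder(ChunkedBuilder from)
    {
        chunkChars = from.chunkChars;
        chunkLength = from.chunkLength;
        chunkOffset = from.chunkOffset;
        chunkPrevious = from.chunkPrevious;
    }

    public int Length { get { return chunkOffset + chunkLength; } }

    public ChunkedBuilder Append(string value)
    {
        foreach (char c in value)
        {
            if (chunkLength == chunkChars.Length)
            {
                // The current chunk is full: push it into the chain
                // and continue with a fresh buffer.
                chunkPrevious = new ChunkedBuilder(this);
                chunkOffset += chunkLength;
                chunkChars = new char[chunkChars.Length];
                chunkLength = 0;
            }

            chunkChars[chunkLength++] = c;
        }

        return this;
    }

    public char this[int index]
    {
        get
        {
            if (index < 0 || index >= Length)
            {
                throw new ArgumentOutOfRangeException("index");
            }

            // O(1) for the last chunk, O(chunk-count) near the start.
            ChunkedBuilder chunk = this;

            while (index < chunk.chunkOffset)
            {
                chunk = chunk.chunkPrevious;
            }

            return chunk.chunkChars[index - chunk.chunkOffset];
        }
    }

    public override string ToString()
    {
        char[] result = new char[Length];

        for (ChunkedBuilder chunk = this; chunk != null; chunk = chunk.chunkPrevious)
        {
            Array.Copy(chunk.chunkChars, 0, result, chunk.chunkOffset, chunk.chunkLength);
        }

        return new string(result);
    }
}
```

The indexer walks from the last chunk backwards, which is why access near the end is cheap and access near the start pays for every chunk in between.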
Personally, we would select a slightly different design:
public sealed class StringBuilder
{
private struct Chunk
{
public int length; // Chunk length.
public int offset; // Chunk offset.
public char[] buffer;
}
private int m_MaxCapacity;
// Alternatively, one can use
// private List<Chunk> chunks;
private int chunkCount; // Number of used chunks.
private Chunk[] chunks; // Array of chunks except last.
private Chunk last; // Last chunk.
private bool nonHomogenous; // false if all chunks are of the same size.
}
This design has a better memory footprint, and random access time is O(1) when there were no
inserts in the middle (nonHomogenous = false), and
O(log(chunkCount)) after such inserts. All other characteristics are the
same.
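The lookup in such a design could be sketched as follows. This is only an illustration under our own assumptions: the class ChunkIndex and its AddChunk method are hypothetical, chunks are always completely filled, and an insert in the middle is modelled simply as a chunk of a different size appearing before the last one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of index lookup over an array of chunks.
public sealed class ChunkIndex
{
    private struct Chunk
    {
        public int Offset;    // position of the chunk's first character
        public char[] Buffer; // chunk content (completely used in this sketch)
    }

    private readonly List<Chunk> chunks = new List<Chunk>();
    private readonly int chunkSize;
    private bool nonHomogenous; // false while all chunks except the last have chunkSize chars
    private int length;

    public ChunkIndex(int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        this.chunkSize = chunkSize;
    }

    public int Length { get { return length; } }

    public bool IsHomogenous { get { return !nonHomogenous; } }

    public void AddChunk(string content)
    {
        if (string.IsNullOrEmpty(content)) throw new ArgumentException("content");

        // Only the last chunk may differ in size without breaking homogeneity.
        if (chunks.Count > 0 && chunks[chunks.Count - 1].Buffer.Length != chunkSize)
        {
            nonHomogenous = true;
        }

        chunks.Add(new Chunk { Offset = length, Buffer = content.ToCharArray() });
        length += content.Length;
    }

    public char this[int index]
    {
        get
        {
            if (index < 0 || index >= length)
            {
                throw new ArgumentOutOfRangeException("index");
            }

            int position;

            if (!nonHomogenous)
            {
                // O(1): all chunks but the last one have the same size.
                position = Math.Min(index / chunkSize, chunks.Count - 1);
            }
            else
            {
                // O(log(chunkCount)): find the last chunk with Offset <= index.
                int low = 0;
                int high = chunks.Count - 1;

                while (low < high)
                {
                    int middle = low + (high - low + 1) / 2;

                    if (chunks[middle].Offset <= index) low = middle;
                    else high = middle - 1;
                }

                position = low;
            }

            Chunk chunk = chunks[position];

            return chunk.Buffer[index - chunk.Offset];
        }
    }
}
```

Offsets grow strictly with the chunk position, which is what makes the binary search valid once the chunk sizes are no longer uniform.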
Earlier, there was a lot of hype about how good VS 2010 is.
When we tried the beta and found that it was noticeably slower than VS 2008, we assumed that the release would do better.
Unfortunately, that was an optimistic assumption.
Comparing VS 2008 and VS 2010, we can confirm that the latter:
- eats more memory;
- is slower with C# projects (it often hangs for long periods and even crashes);
- is unable to work with xslt 2.0 files;
- has removed the Shift+Enter keystroke that inserted <br/> in the html editor (why?);
- has removed the StringBuilder visualizer (in the debugger).
Are we using hardware that is too outdated (Lenovo T60 laptops, 2GHz Core Duo/2GB RAM)? Is there some other reason?
We have updated C# XOM (csharpxom) to support C# 4.0 (in fact there are very few
changes).
From the grammar perspective this includes:
- Dynamic types;
- Named and optional arguments;
- Covariance and contravariance of generic parameters for interfaces and
delegates.
Dynamic type, C#:
dynamic dyn = 1;
C# XOM:
<var name="dyn">
<type name="dynamic"/>
<initialize>
<int value="1"/>
</initialize>
</var>
Named and Optional Arguments, C#:
int Increment(int value, int increment = 1)
{
return value + increment;
}
void Test()
{
// Regular call.
Increment(7, 1);
// Call with named parameter.
Increment(value: 7, increment: 1);
// Call with default.
Increment(7);
}
C# XOM:
<method name="Increment">
<returns>
<type name="int"/>
</returns>
<parameters>
<parameter name="value">
<type name="int"/>
</parameter>
<parameter name="increment">
<type name="int"/>
<initialize>
<int value="1"/>
</initialize>
</parameter>
</parameters>
<block>
<return>
<add>
<var-ref name="value"/>
<var-ref name="increment"/>
</add>
</return>
</block>
</method>
<method name="Test">
<block>
<expression>
<comment>Regular call.</comment>
<invoke>
<method-ref name="Increment"/>
<arguments>
<int value="7"/>
<int value="1"/>
</arguments>
</invoke>
</expression>
<expression>
<comment>Call with named parameter.</comment>
<invoke>
<method-ref name="Increment"/>
<arguments>
<argument name="value">
<int value="7"/>
</argument>
<argument name="increment">
<int value="1"/>
</argument>
</arguments>
</invoke>
</expression>
<expression>
<comment>Call with default.</comment>
<invoke>
<method-ref name="Increment"/>
<arguments>
<int value="7"/>
</arguments>
</invoke>
</expression>
</block>
</method>
Covariance and contravariance, C#:
public interface Variance<in T, out P, Q>
{
P X(T t);
}
C# XOM:
<interface access="public" name="Variance">
<type-parameters>
<type-parameter name="T" variance="in"/>
<type-parameter name="P" variance="out"/>
<type-parameter name="Q"/>
</type-parameters>
<method name="X">
<returns>
<type name="P"/>
</returns>
<parameters>
<parameter name="t">
<type name="T"/>
</parameter>
</parameters>
</method>
</interface>
Other cosmetic fixes were also introduced into Java XOM (jxom), COBOL XOM
(cobolxom), and SQL XOM (sqlxom).
The new version can be found at languages-xom.zip.
See also: What's New in Visual C# 2010
We have run into another xslt bug, which depends on several independent
circumstances and often behaves differently when observed. That's clearly a
Heisenbug.
Xslt designers failed to realize that the syntactic sugar they introduced into
xpath can turn into obscure bugs. Well, it's easy to be wise after the event...
To the point.
Suppose you have a sequence consisting of text nodes and
elements, and you want to "normalize" this sequence by wrapping
adjacent text nodes into
separate elements. The following stylesheet is supposed to do the work:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:t="http://www.nesterovsky-bros.com/xslt/this"
exclude-result-prefixes="xs t">
<xsl:template match="/">
<xsl:variable name="nodes" as="node()*">
<xsl:text>Hello, </xsl:text>
<string value="World"/>
<xsl:text>! </xsl:text>
<xsl:text>Well, </xsl:text>
<string value="hello"/>
<xsl:text>, if not joking!</xsl:text>
</xsl:variable>
<result>
<xsl:sequence select="t:normalize($nodes)"/>
</result>
</xsl:template>
<xsl:function name="t:normalize" as="node()*">
<xsl:param name="nodes" as="node()*"/>
<xsl:for-each-group select="$nodes" group-starting-with="*">
<xsl:variable name="string" as="element()?" select="self::string"/>
<xsl:variable name="texts" as="node()*"
select="current-group() except $string"/>
<xsl:sequence select="$string"/>
<xsl:if test="exists($texts)">
<string value="{string-join($texts, '')}"/>
</xsl:if>
</xsl:for-each-group>
</xsl:function>
</xsl:stylesheet>
We're expecting the following output:
<result>
<string value="Hello, "/>
<string value="World"/>
<string value="! Well, "/>
<string value="hello"/>
<string value=", if not joking!"/>
</result>
But often we're getting other results, like:
<result>
<string value="Hello, "/>
<string value="World"/>
<string value="Well, ! "/>
<string value="hello"/>
<string value=", if not joking!"/>
</result>
Such output may be seriously confusing, unless you recall the rule for the
xpath except operator:
The except operator takes two node sequences as operands and returns a sequence containing all the nodes that occur in the first operand but not in the second operand.
... these operators eliminate duplicate nodes from their result sequences based
on node identity. The resulting sequence is returned in document order.
...
The relative order of nodes in distinct trees is stable but implementation-dependent
These words mean that the order of the result sequence may be very different from
that of the original sequence: the nodes in $nodes are parentless, so each of them
is a distinct tree, and their relative document order is up to the implementation.
In contrast, if we change the $texts definition to:
<xsl:variable name="texts" as="node()*"
select="current-group()[not(. is $string)]"/>
then the result becomes stable, but less clear.
See also: Xslt Heisenbug
It does not matter that DataBindExtender looks unusual in ASP.NET. It turns out to be so handy that the built-in data binding is no longer considered an option.
After a short try, you understand that people tried very hard and invented many controls and methods like ObjectDataSource, FormView, Eval(), and Bind(), yet the outcome is very specific and limited.
In contrast, DataBindExtender:
- performs two-way or one-way data binding of any business data property to any control property;
- converts a value before it's passed to the control, or into the business data;
- validates the value.
See an example:
<asp:TextBox id=Field8 EnableViewState="false" runat="server"></asp:TextBox>
<bphx:DataBindExtender runat='server' EnableViewState='false'
TargetControlID='Field8' ControlProperty='Text'
DataSource='<%# Import.ClearingMemberFirm %>' DataMember='Id'
Converter='<%# Converters.AsString("XXXXX", false) %>'
Validator='<%# (extender, value) => Functions.CheckID(value as string) %>'/>
Here, besides a regular two-way data binding of the property Import.ClearingMemberFirm.Id to the property Field8.Text, we format (parse) the value with Converters.AsString("XXXXX", false), and finally validate the input value with the lambda function (extender, value) => Functions.CheckID(value as string).
DataBindExtender also works well in template controls like asp:Repeater, asp:GridView, and so on. Having your business data available, you may reduce the size of the ViewState with EnableViewState='false'. This way DataBindExtender brings page development closer to the MVC pattern.
Recently, we have found that it's also useful to have a way to run javascript during the page load (e.g. when you want to attach some client side event, or register a component). DataBindExtender provides this with the OnClientInit property, which is a javascript to run on the client, where this refers to a DOM element. For example:
... OnClientInit='$addHandler(this, "change", function() { handleEvent(event, "Field8"); } );'/>
attaches an onchange javascript event handler to the asp:TextBox.
So, for the time being, we're very satisfied with what we can achieve with DataBindExtender. It's more than JSF allows, and much stronger and neater than what ASP.NET provides.
The sources can be found at DataBindExtender.cs
Lately, we have found that we've become accustomed to declaring C# local variables using var:
var exitStateName = exitState == null ? "" : exitState.Name;
var rules = Environment.NavigationRules;
var rule = rules[caller.Name];
var flow = rule.NavigationCases[procedure.OriginExitState];
This makes the code cleaner, and in the presence of a good IDE it still allows us to
figure out types very easily.
We, however, found that var tends to have exceptions in its
use. E.g. for some reason most boolean locals in our code tend to remain explicit
(a matter of taste?):
bool succeed = false;
try
{
...
succeed = true;
}
finally
{
if (!succeed)
{
...
}
}
Also, the type often survives in for, but not in foreach:
for(int i = 0; i < sourceDataMapping.Length; ++i)
{
...
}
foreach(var property in properties)
{
...
}
In addition, var has some limitations, as one cannot easily
initialize such a local with null. Of the following, we prefer the first approach:
IWindowContext context = null;
var context = (IWindowContext)null;
var context = null as IWindowContext;
var context = default(IWindowContext);
We might need to figure out a consistent code style for var. It
might be like this:
- Numeric, boolean and string locals should use an explicit type;
- Try to avoid locals initialized with null, or without an initializer, or use an
explicit type if such a variable cannot be avoided;
- Use var in all other cases.
Another code style could be like this:
- For consistency, completely avoid the use of the var keyword.
Recently we raised a question about serialization of ASPX output in xslt.
The question went like this:
What's the recommended way of ASPX page generation? E.g.:
------------------------
<%@ Page AutoEventWireup="true" CodeBehind="CurMainMenuP.aspx.cs" EnableSessionState="True" Inherits="Currency.CurMainMenuP" Language="C#" MaintainScrollPositionOnPostback="True" MasterPageFile="Screen.Master" %>
<asp:Content ID="Content1" runat="server" ContentPlaceHolderID="Title">CUR_MAIN_MENU_P</asp:Content>
<asp:Content ID="Content2" runat="server" ContentPlaceHolderID="Content">
<span id="id1222146581" runat="server" class="inputField system UpperCase" enableviewstate="false">
<%# Dialog.Global.TranCode %>
</span>
...
------------------------
Notice the aspx page directives, data binding expressions, and prefixed tag names without namespace declarations.
There was a whole range of expected answers. We, however, wanted to see whether somebody had already dealt with the task and had a ready solution at hand.
In general, it seems that the xslt community is very angry about ASPX: both the format and the technology. Well, let's put this aside.
The task of producing ASPX, which is almost xml, is not solvable when you stay with a pure xml serializer. Xslt's xsl:character-map does not help at all. In fact it looks like a childish attempt to address the problem, as it does not support character escapes but only grabs characters and substitutes them with strings.
We have decided to create an ASPX serializer API producing the required output text. This way you use <xsl:output method="text"/> to generate ASPX pages.
With this goal in mind we have defined a little xml schema to describe ASPX irregularities in xml form. These are:
<xs:element name="declared-prefix"> - to describe known prefixes, which should not be declared;
<xs:element name="directive"> - to describe directives like <%@ Page %>;
<xs:element name="content"> - a transparent content wrapper;
<xs:element name="entity"> - to issue xml entity;
<xs:element name="expression"> - to describe aspx expression like <%# Eval("A") %>;
<xs:element name="attribute"> - to describe an attribute of the parent element.
This approach has greatly simplified the ASPX generation process for us.
The API includes:
In previous posts we were crying about problems with JSF to ASP.NET migration. Let's point to another one.
Consider that you have an input field, whose value should be validated:
<input type="text" runat="server" ID="id1222146409" maxlength="4"/> <bphx:DataBindExtender runat="server" TargetControlID="id1222146409" ControlProperty="Value" DataSource="<%# Import.AaControlAttributes %>" DataMember="UserEnteredTrancode"/>
Here we have an input control, whose value is bound to the Import.AaControlAttributes.UserEnteredTrancode property. What is missing is validation of the value. Somewhere we have a function that can tell whether the value is valid. It should be called like this: Functions.IsTransactionCodeValid(value).
Staying within standard components we can use a custom validator on the page:
<asp:CustomValidator runat="server" ControlToValidate="id1222146409" OnServerValidate="ValidateTransaction" ErrorMessage="Invalid transaction code."/>
and add the following code-behind:
protected void ValidateTransaction(object source, ServerValidateEventArgs args)
{
args.IsValid = Functions.IsTransactionCodeValid(args.Value);
}
This approach works, however it pollutes the code-behind with many very similar methods. The problem is that the validation rules in most cases are not a property of the page but of the data model. That's why page validation methods just forward the check to somewhere else.
While thinking about how to simplify the code, we came up with a more concise way to express validators, namely using lambda functions. To that end we have introduced a Validator property of type ValueValidator on DataBindExtender, where:
/// <summary>A delegate to validate values.</summary>
/// <param name="extender">An extender instance.</param>
/// <param name="value">A value to validate.</param>
/// <returns>true for valid value, and false otherwise.</returns>
public delegate bool ValueValidator(DataBindExtender extender, object value);

/// <summary>An optional data member validator.</summary>
public virtual ValueValidator Validator { get; set; }
With this new property the page markup looks like this:
<input type="text" runat="server" ID="id1222146409" maxlength="4"/> <bphx:DataBindExtender runat="server" TargetControlID="id1222146409" ControlProperty="Value" DataSource="<%# Import.AaControlAttributes %>" DataMember="UserEnteredTrancode" Validator='<%# (extender, value) => Functions.IsTransactionCodeValid(value as string) %>' ErrorMessage="Invalid transaction code."/>
This is almost like an event handler; however, it allows us to call the data model validation logic without unnecessary code-behind.
The updated DataBindExtender can be found at DataBindExtender.cs.
Being well behind the latest news and traps of ASP.NET, we readily fall into each problem.
This time it's a script injection during data binding.
In JSF there is a component to output data called h:outputText. Its use is like this:
<span jsfc="h:outputText" value="#{myBean.myProperty}"/>
The output is a span element with the data bound value embedded into its content. The natural alternative in ASP.NET seems to be an asp:Label control:
<asp:Label runat="server" Text='<%# Eval("MyProperty") %>'/>
This almost works, except that h:outputText escapes data (you may override this by specifying the attribute escape="false"), while asp:Label never escapes the data.
This looks like a very serious omission in ASP.NET (in fact very close to a security hole). What are the chances that, when you create a new page that uses data binding, you will not forget to fix the code that the wizard created for you and change it to:
<asp:Label runat="server" Text='<%# Server.HtmlEncode(Convert.ToString(Eval("MyProperty"))) %>'/>
Eh? Think what will happen if MyProperty returns a text that looks like a script (e.g.: <script>alert(1)</script>), while you just wanted to output a label.
To address the issue we've also introduced an Escape property into DataBindExtender. So at present we have code like this:
<asp:Label runat="server" ID="MyLabel"/> <bphx:DataBindExtender runat="server" TargetControlID="MyLabel" ControlProperty="Text" ReadOnly="true" Escape="true" DataSource="<%# MyBean %>" DataMember="MyProperty"/>
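To illustrate what such escaping amounts to, here is a small sketch in C#. The helper EscapeDemo.Render is hypothetical and is not taken from DataBindExtender.cs; it relies on System.Net.WebUtility.HtmlEncode (available since .NET 4), while in a page you would typically call Server.HtmlEncode or HttpUtility.HtmlEncode instead.

```csharp
using System;
using System.Net;

// Hypothetical helper showing the effect of an "Escape" switch.
public static class EscapeDemo
{
    // Converts a bound value to text and optionally html-encodes it.
    public static string Render(object value, bool escape)
    {
        string text = Convert.ToString(value) ?? "";

        return escape ? WebUtility.HtmlEncode(text) : text;
    }
}
```

With escape set to true, the value <script>alert(1)</script> is rendered as &lt;script&gt;alert(1)&lt;/script&gt;, so the browser shows it as text instead of running it.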
See also: A DataBindExtender, Experience of JSF to ASP.NET migration
After struggling with ASP.NET data binding we found no other way but to introduce our little extender control to address the issue.
We were trying to be minimalistic: to introduce two-way data binding and to support data conversion. This way the extender control (called DataBindExtender) has the following page syntax:
<asp:TextBox id=TextBox1 runat="server"></asp:TextBox> <cc1:DataBindExtender runat="server" DataSource="<%# Data %>" DataMember="ID" TargetControlID="TextBox1" ControlProperty="Text" />
Two-way data binding is provided by a DataSource object (notice the data binding over this property) and a DataMember property on the one side, and TargetControlID and ControlProperty on the other side. DataBindExtender has a Converter property of type TypeConverter to support custom converters.
DataBindExtender is based on the AjaxControlToolkit.ExtenderControlBase class and implements System.Web.UI.IValidator. ExtenderControlBase makes implementation of extenders extremely easy, while IValidator plugs naturally into page validation (Validate method, Validators collection, ValidationSummary control).
The good point about extenders is that they are not visible in the designer, while they expose their properties in the extended control itself. The disadvantage is that they require the Ajax Control Toolkit, and also a ScriptManager component on the page.
To simplify the use, DataBindExtender gets data from the control and puts the value into the data source in the Validate method, and puts data into the control in the OnPreRender method; thus no specific action is required to perform data binding.
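The core of such a binding can be sketched with reflection and TypeConverter. This is only an illustration of the idea, not the code from DataBindExtender.cs: the PropertyBinder class and the sample Data and Box classes are hypothetical, and error handling is left out.

```csharp
using System;
using System.ComponentModel;
using System.Reflection;

// Hypothetical stand-ins for a business object and a control.
public class Data { public int ID { get; set; } }
public class Box { public string Text { get; set; } }

public static class PropertyBinder
{
    // Copies source.sourceProperty into target.targetProperty,
    // converting the value to the target property type when needed.
    public static void Copy(
        object source, string sourceProperty,
        object target, string targetProperty)
    {
        PropertyInfo from = source.GetType().GetProperty(sourceProperty);
        PropertyInfo to = target.GetType().GetProperty(targetProperty);
        object value = from.GetValue(source, null);

        to.SetValue(target, ConvertTo(value, to.PropertyType), null);
    }

    private static object ConvertTo(object value, Type type)
    {
        if (value == null || type.IsInstanceOfType(value))
        {
            return value;
        }

        // First ask the converter of the target type...
        TypeConverter converter = TypeDescriptor.GetConverter(type);

        if (converter.CanConvertFrom(value.GetType()))
        {
            return converter.ConvertFrom(value);
        }

        // ...then fall back to the converter of the source type.
        return TypeDescriptor.GetConverter(value.GetType()).ConvertTo(value, type);
    }
}
```

Calling Copy(data, "ID", box, "Text") corresponds to the OnPreRender direction (data into control), and Copy(box, "Text", data, "ID") to the Validate direction (control into data).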
Source for the DataBindExtender is DataBindExtender.cs.
We used to think that ASP.NET is far more powerful than JSF. It might still be true, but not when you are accustomed to JSF and spoiled by its coding practice...
Looking at both technologies from a greater distance, we now realize that they give almost the same level of comfort during development, but they are different. You can feel this after you have worked for some time with one technology and now have to implement a similar solution in the other one. That is where we have found ourselves at present.
The funny thing is that we did expect some problems but in a different place. Indeed, both ASP.NET and JSF are means to define a page layout and to map input and output of business data. While for the presentation (controls, their compositions, masters, styles and so on) you can find more or less equal analogies, the differences in the implementation of data binding are kind of a pain.
We have found that data binding in ASP.NET is somewhat awkward. Its Eval and Bind are bearable in simple cases but almost unusable when your business data is less trivial, or if you have to apply custom data formatting.
In JSF, with its Expression Language, we can perform two-way data binding for rather complex properties like ${data.items[index + 5].property}, or create property adapters ${my:asSomething(data.bean, "property").Value}, or add standard or custom property converters. In contrast, data binding in ASP.NET is limited to a simple property path (no expressions are supported), and custom formatters are not supported either (try to format a number as a telephone number).
Things work well when you're designing an ASP.NET application from scratch, as you naturally avoid pitfalls. However, when you have existing business logic and need to expose it to the web, you have no other way but to write a lot of code-behind just to smooth out the problems that ASP.NET exhibits.
Another solution would be to design something like an extender control that would attach more proper data binding and formatting facilities to control properties. That would allow us to write page definitions in a more declarative way, like what we have now in JSF.
While porting a solution from JSF to ASP.NET we have seen an issue with synchronization of access to data stored in a session from multiple requests.
Consider a case when you store a business object in a session.
Going through the request lifecycle we observe that this business object may be accessed at different stages: data binding, postback event handler, security filters, other.
Usually this business object is mutable and does not assume concurrent access. Browsers, however, may easily issue multiple requests to the same session at the same time. In fact, such behaviour is not even an exception, as browsers nowadays often send concurrent requests.
In JSF we use a sync object, which is part of the business object itself; we lock it at the beginning and unlock it at the end of a request. This works perfectly as JSF guarantees that:
- lock is released after it's acquired (we use request scope bean with
@PostConstruct and @PreDestroy annotations to lock and unlock);
- both lock and unlock take place in the same thread.
ASP.NET, in contrast, tries to be more asynchronous, and allows different stages of a request to take place in different threads. This can be seen indirectly in the documentation, which does not give any commitments in this regard, and through code inspection, where you can see that a request can begin in one thread, and the next stage can be queued for execution in another thread.
In addition, ASP.NET does not guarantee that if BeginRequest has been executed then EndRequest will also run.
The conclusion is that we should not use locks to synchronize access to the same session object, but rather try to invent other means to avoid data races.
Update: MSDN states:
Concurrent Requests and Session State
Access to ASP.NET session state is exclusive per session, which means that if two different users make concurrent requests, access to each separate session is granted concurrently. However, if two concurrent requests are made for the same session (by using the same SessionID value), the first request gets exclusive access to the session information. The second request executes only after the first request is finished. (The second session can also get access if the exclusive lock on the information is freed because the first request exceeds the lock time-out.)
This means that the required synchronization is already built into ASP.NET. That's good.
We have implemented a report parser in C#. Because things now revolve around C#, the
schema definition has changed.
We have started from classes defining a report definition tree, annotated these
classes for xml serialization, and, finally, produced an xml schema for such a tree.
So, at present, it is not an xml schema with annotations but a separate xml
schema.
In addition we have defined APIs:
- to enumerate report data (having a report definition and report data one can get
IEnumerable<ViewValue> to iterate report data in structured form);
- to read a report through XmlReader, which allows, for example, to
have a report as input for an xslt transformation;
- to write a report directly into XmlWriter.
An example of a report definition as C# code is:
MyReport.cs. The very same report definition but serialized into xml is
my-report.xml. A generated xml schema for a report definition is:
schema0.xsd.
The good point about this solution is that it's already flexible enough to
describe every report layout we have at hand, and it's extensible. Our
measurements show that report parsing is extremely fast and has a very small
memory footprint due to the forward-only nature of report definitions.
From the design point of view report definition is a view of original text data
with view info attached.
At present we have defined following views:
- Element - a named view to generate output from a content view;
- Content - a view to aggregate other views together;
- Choice - a view to produce output from one of content views;
- Sequence - a view to sequence input view by key expressions, and to attach an
index to each sequence item;
- Iterator - a view to generate output from input view while some condition is
true, and to attach an iteration index to each part of output view;
- Page - a view to remove page headers and footers in the input view, and to
attach an index to each page;
- Compute - a named view to produce result of evaluation of expression as output
view;
- Data - a named view to produce output value from some bounds of input view,
and optionally to convert, validate and format the value.
To specify details of definitions there are:
- expressions to deal with integers: Add, Div, Integer, MatchProperty, Max, Min,
Mod, Mul, Neg, Null, Sub, VariableRef, ViewProperty, Case;
- conditions to deal with booleans: And, EQ, GE, GT, IsMatch, LE, LT,
NE, Not, Or.
At present there is no specification of report definitions. Probably, creating
such a spec for a user without deep knowledge is the most complex part.
At present, our idea is that one should use the xml schema (we should polish
the generated schema) for the report definition and a schema-aware editor to build
report definitions. That's a very robust approach, which works perfectly with
languages xom.
C# sources can be found at:
ReportLayout.zip including report definition classes and a sample report.
We're facing the task of parsing reports produced by legacy applications and
converting them into a structured form, e.g. into xml. These xml files can be
processed further with up-to-date tools to produce good looking reports.
The reports at hand are of very different structure and size: from a couple of KB
to several GB. The good part is that they mostly have a tabular form, so it's
easy to think of a specific parser for each report type.
Our goal is to create an environment where a less qualified person could
create and manage such parsers, and only rarely engage someone to handle
less trivial cases.
Our analysis has shown that it's possible to write such a parser in almost any
language: xslt, C#, java.
Our approach was to create xml schema annotations
that on the one side define a data structure, and on the other map the report
layout. Then we're able to create an xslt that will generate either an xslt, C#, or
java parser according to the schema definitions. Because of languages
xom, which provides an XML Object Model and serialization stylesheets for C# and
java, it does not really matter whether we generate xslt or C#/java, as the
code will look the same.
The approach we're going to use to describe reports is
not as powerful as conventional parsers. Its virtue, however, is simplicity of
specification.
Consider a report sample (the data to extract is in bold):
1 TITLE ... PAGE: 1
BUSINESS DATE: 09/30/09 ... RUN DATE: 02/23/10
CYCLE : ITD RUN: 001 ... RUN TIME: 09:22:39
CM BUS ...
CO NBR FRM FUNC ...
----- ----- ----- -----
XXX 065 065 CLR ...
YYY ...
...
1 TITLE ... PAGE: 2
BUSINESS DATE: 09/30/09 ... RUN DATE: 02/23/10
CYCLE : ITD RUN: 001 ... RUN TIME: 09:22:39
CM BUS ...
CO NBR FRM FUNC ...
----- ----- ----- -----
AAA NNN MMM PPP ...
BBB ...
...
* * * * * E N D O F R E P O R T * * * * *
We approach the report through a sequence of views (filters) of this
report. Each view localizes some report data either for subsequent
filtering or for the extraction of final data.
Looking at the example, one can build the following views of the report:
- View of data before the "E N D O F R E P O R T" line.
- View of remaining data without page headers and footers.
- Views of table rows.
- Views of cells.
A sequence of filters allows us to build a pipeline of transformations of
original text. This also allows us to generate a clean xslt, C# or java code
to parse the data.
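A pipeline of such views can be sketched in C# with lazy enumerators, so that the report is processed in a forward-only manner. This is only an illustration under our own assumptions, not the generated parser: the class ReportViews is hypothetical, a page header is assumed to start at a line beginning with "1 " and to end at the dashed line, and cells are assumed to be separated by whitespace.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical views over report lines; each view is a lazy filter.
public static class ReportViews
{
    // View of data before the end-of-report line.
    public static IEnumerable<string> BeforeEnd(IEnumerable<string> lines)
    {
        return lines.TakeWhile(line => !line.Contains("E N D O F R E P O R T"));
    }

    // View of the remaining data without page headers.
    public static IEnumerable<string> WithoutHeaders(IEnumerable<string> lines)
    {
        bool inHeader = false;

        foreach (string line in lines)
        {
            if (line.StartsWith("1 ", StringComparison.Ordinal))
            {
                inHeader = true; // a new page starts here
            }

            if (!inHeader)
            {
                yield return line;
            }
            else if (line.TrimStart().StartsWith("-----", StringComparison.Ordinal))
            {
                inHeader = false; // the dashed line closes the header
            }
        }
    }

    // Views of table rows, each split into views of cells.
    public static IEnumerable<string[]> Rows(IEnumerable<string> lines)
    {
        return lines.
            Where(line => line.Trim().Length > 0).
            Select(line => line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
    }

    // The whole pipeline: report lines in, rows of cells out.
    public static IEnumerable<string[]> Parse(IEnumerable<string> lines)
    {
        return Rows(WithoutHeaders(BeforeEnd(lines)));
    }
}
```

Because every step is an iterator over the previous one, no step needs the whole report in memory; for a file one could pass File.ReadLines(path) as the input.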
At first, our favorite language for such a parser was xslt.
Unfortunately, we're dealing with the Saxon xslt implementation, which is not very
strong in streaming processing. Without a couple of extension functions to
prevent caching, it tends to cache the whole input in memory, which is not
acceptable.
At present we have decided to start from C# code, which is pure C# naturally.
The code is still in development, but at present we would like to share the xml
schema annotations describing the report layout:
report-mapping.xsd, and a sample report description:
test.xsd.
A few small changes in the streaming and name normalization algorithms in jxom
and csharpxom, and the generation speed has almost doubled (especially for big files).
We suspect, however, that our xslt code is tuned for the saxon engine.
It would be nice to know if anybody has used languages XOM with other engines. Is
anyone using it at all (well, at least there are downloads)?
Languages XOM (jxom, csharpxom, cobolxom, sqlxom) can be downloaded from:
languages-xom.zip