Being well behind the latest news and pitfalls of ASP.NET, we readily stumble over every problem.
This time it's script injection during data binding.
In JSF there is a component to output data called h:outputText. Its use is like this:
<span jsfc="h:outputText" value="#{myBean.myProperty}"/>
The output is a span element with the data-bound value embedded in its content. The natural alternative in ASP.NET seems to be an asp:Label control:
<asp:Label runat="server" Text='<%# Eval("MyProperty") %>'/>
This almost works, except that h:outputText escapes the data (you may override this with the attribute escape="false"), while asp:Label never escapes the data.
This looks like a very serious omission in ASP.NET (in fact, very close to a security hole). What are the chances that, when you create a new page that uses data binding, you will not forget to fix the code the wizard created for you and change it to:
<asp:Label runat="server" Text='<%# Server.HtmlEncode((string)Eval("MyProperty")) %>'/>
Eh? Think what will happen if MyProperty returns text that looks like a script (e.g. <script>alert(1)</script>) when you just wanted to output a label.
To address the issue we've also introduced an Escape property on the DataBindExtender. So at present we have code like this:
<asp:Label runat="server" ID="MyLabel"/> <bphx:DataBindExtender runat="server" TargetControlID="MyLabel" ControlProperty="Text" ReadOnly="true" Escape="true" DataSource="<%# MyBean %>" DataMember="MyProperty"/>
See also: A DataBindExtender, Experience of JSF to ASP.NET migration
After struggling with ASP.NET data binding we found no other way but to introduce a little extender control of our own to address the issue.
We tried to be minimalistic: to introduce two-way data binding and to support data conversion. The resulting extender control (called DataBindExtender) has the following page syntax:
<asp:TextBox id=TextBox1 runat="server"></asp:TextBox> <cc1:DataBindExtender runat="server" DataSource="<%# Data %>" DataMember="ID" TargetControlID="TextBox1" ControlProperty="Text" />
Two-way data binding is provided by a DataSource object (notice the data binding over this property) and a DataMember property on one side, and by TargetControlID and ControlProperty on the other. DataBindExtender also supports a Converter property of type TypeConverter for custom converters.
DataBindExtender is based on the AjaxControlToolkit.ExtenderControlBase class and implements System.Web.UI.IValidator. ExtenderControlBase makes the implementation of extenders extremely easy, while IValidator plugs naturally into page validation (the Validate method, the Validators collection, the ValidationSummary control).
The good point about extenders is that they are not visible in the designer but expose their properties on the extended control itself. The disadvantage is that this requires the Ajax Control Toolkit, and also a ScriptManager component on the page.
To simplify its use, DataBindExtender reads the value from the control and stores it into the data source in the Validate method, and puts the data back into the control in the OnPreRender method; thus no specific action is required to perform data binding.
The source for the DataBindExtender is DataBindExtender.cs.
We used to think that ASP.NET is way more powerful than JSF. That might still be true, but not when you are accustomed to JSF and spoiled by its coding practices...
Looking at both technologies from a greater distance, we now realize that they give almost the same level of comfort during development, but they are different. You can feel this after you have worked for some time with one technology and now have to implement a similar solution in the other one. That is where we find ourselves at present.
The funny thing is that we did expect problems, but in a different place. Indeed, both ASP.NET and JSF are means to define a page layout and to map input and output of business data. While for the presentation (controls, their composition, masters, styles and so on) you can find more or less equivalent analogies, the difference in the implementation of data binding is a kind of pain.
We have found that data binding in ASP.NET is somewhat awkward. Its Eval and Bind are bearable in simple cases but almost unusable when your business data is less trivial, or when you have to apply custom data formatting.
In JSF, with its Expression Language, we can perform two-way data binding for rather complex properties like ${data.items[index + 5].property}, create property adapters like ${my:asSomething(data.bean, "property").Value}, or add standard or custom property converters. In contrast, data binding in ASP.NET is limited to a simple property path (no expressions are supported), and custom formatters are not supported either (try to format a number as a telephone number).
Things work well when you design an ASP.NET application from scratch, as you naturally avoid the pitfalls; however, when you have existing business logic and need to expose it to the web, you have no other way but to write a lot of code-behind just to smooth out the problems that ASP.NET exhibits.
Another solution would be to design something like an extender control that attaches proper data binding and formatting facilities to control properties. That would allow page definitions to be written in a more declarative way, like what we have now in JSF.
While porting a solution from JSF to ASP.NET, we have run into an issue with synchronization of access to data stored in a session from multiple requests.
Consider a case when you store a business object in a session.
Going through the request lifecycle, we observe that this business object may be accessed at different stages: data binding, postback event handlers, security filters, and others.
Usually this business object is mutable and does not assume concurrent access. Browsers, however, may easily issue multiple requests to the same session at the same time. In fact, such behaviour is not even an exception, as browsers nowadays often send concurrent requests.
In JSF we use a sync object, which is part of the business object itself; we lock it at the beginning of a request and unlock it at the end. This works perfectly, as JSF guarantees that:
- the lock is released once it has been acquired (we use a request scope bean with @PostConstruct and @PreDestroy annotations to lock and unlock; see the sketch after this list);
- both lock and unlock take place in the same thread.
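For illustration, a minimal sketch of this pattern (the bean and property names here are ours, not from the actual project) could look like the following; note that it relies on both guarantees above, since a ReentrantLock must be unlocked by the thread that locked it:
import java.io.Serializable;
import java.util.concurrent.locks.ReentrantLock;
import javax.annotation.PostConstruct;
import javax.annotation.PreDestroy;
import javax.faces.bean.ManagedBean;
import javax.faces.bean.ManagedProperty;
import javax.faces.bean.RequestScoped;
import javax.faces.bean.SessionScoped;

// A session scoped business object that carries its own sync object.
@ManagedBean
@SessionScoped
public class MyBean implements Serializable
{
  private final ReentrantLock lock = new ReentrantLock();

  public ReentrantLock getLock()
  {
    return lock;
  }

  // ... business data ...
}

// A request scoped bean (a separate class) that locks the business
// object for the duration of the request.
@ManagedBean
@RequestScoped
public class RequestLock
{
  @ManagedProperty("#{myBean}")
  private MyBean myBean;

  public void setMyBean(MyBean value)
  {
    myBean = value;
  }

  @PostConstruct
  public void acquire()
  {
    myBean.getLock().lock();
  }

  @PreDestroy
  public void release()
  {
    myBean.getLock().unlock();
  }
}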
ASP.NET, in contrast, tries to be more asynchronous and allows different stages of a request to take place in different threads. This can be seen indirectly in the documentation, which does not give any commitments in this regard, and by code inspection, where you can see that a request can begin in one thread and the next stage can be queued for execution in another thread.
In addition, ASP.NET does not guarantee that if BeginRequest has been executed then EndRequest will also run.
The conclusion is that we should not use locks to synchronize access to the same session object, but rather try to invent other means to avoid data races.
Update: MSDN states:
Concurrent Requests and Session State
Access to ASP.NET session state is exclusive per session, which means that if two different users make concurrent requests, access to each separate session is granted concurrently. However, if two concurrent requests are made for the same session (by using the same SessionID value), the first request gets exclusive access to the session information. The second request executes only after the first request is finished. (The second session can also get access if the exclusive lock on the information is freed because the first request exceeds the lock time-out.)
This means that the required synchronization is already built into ASP.NET. That's good.
We have implemented the report parser in C#. Because things now revolve around C#, the schema definition has changed.
We started from classes defining a report definition tree, annotated these classes for xml serialization, and, finally, produced an xml schema for such a tree. So, at present, it is not an xml schema with annotations but a separate xml schema.
In addition we have defined APIs:
- to enumerate report data (having a report definition and report data, one can get an IEnumerable<ViewValue> to iterate the report data in structured form);
- to read a report through an XmlReader, which allows, for example, to use the report as input for an xslt transformation;
- to write a report directly into an XmlWriter.
An example of a report definition as C# code is MyReport.cs. The very same report definition serialized into xml is my-report.xml. A generated xml schema for a report definition is schema0.xsd.
The good point about this solution is that it's already flexible enough to describe every report layout we have at hand, and it's extensible. Our measurements show that report parsing is extremely fast and has a very small memory footprint due to the forward-only nature of report definitions.
From the design point of view, a report definition is a view of the original text data with view info attached.
At present we have defined the following views:
- Element - a named view to generate output from a content view;
- Content - a view to aggregate other views together;
- Choice - a view to produce output from one of its content views;
- Sequence - a view to sequence the input view by key expressions, and to attach an index to each sequence item;
- Iterator - a view to generate output from the input view while some condition is true, and to attach an iteration index to each part of the output view;
- Page - a view to remove page headers and footers from the input view, and to attach an index to each page;
- Compute - a named view to produce the result of an expression evaluation as the output view;
- Data - a named view to produce an output value from some bounds of the input view, and optionally to convert, validate and format the value.
To specify details of definitions there are:
- expressions to deal with integers: Add, Div, Integer, MatchProperty, Max, Min, Mod, Mul, Neg, Null, Sub, VariableRef, ViewProperty, Case;
- conditions to deal with booleans: And, EQ, GE, GT, IsMatch, LE, LT, NE, Not, Or.
At present there is no specification of report definitions. Creating such a spec for a user without deep knowledge is probably the most complex part. Our current idea is that one should use the xml schema (we should polish the generated schema) for the report definition, and a schema-aware editor to build report definitions. That's a very robust approach that works perfectly with languages xom.
C# sources can be found in ReportLayout.zip, including the report definition classes and a sample report.
We're facing a task of parsing reports produced by legacy applications and converting them into a structured form, e.g. into xml. These xml files can be processed further with up-to-date tools to produce good-looking reports.
The reports at hand are of very different structure and size: from a couple of KB to several GB. The good part is that they mostly have a tabular form, so it's easy to think of a specific parser for each report type.
Our goal is to create an environment where less qualified people could create and manage such parsers, and only rarely engage someone who can handle the less trivial cases.
Our analysis has shown that it's possible to write such a parser in almost any language: xslt, C#, java.
Our approach was to create xml schema annotations that, on one side, define a data structure and, on the other, map the report layout. Then we're able to create an xslt that generates either an xslt, C#, or java parser according to the schema definitions. Thanks to languages xom, which provides an XML Object Model and serialization stylesheets for C# and java, it does not really matter whether we generate xslt or C#/java, as the code will look the same.
The approach we're going to use to describe reports is not as powerful as conventional parsers. Its virtue, however, is simplicity of specification.
Consider a report sample (the data to extract is in bold):
1 TITLE ... PAGE: 1
BUSINESS DATE: 09/30/09 ... RUN DATE: 02/23/10
CYCLE : ITD RUN: 001 ... RUN TIME: 09:22:39
CM BUS ...
CO NBR FRM FUNC ...
----- ----- ----- -----
XXX 065 065 CLR ...
YYY ...
...
1 TITLE ... PAGE: 2
BUSINESS DATE: 09/30/09 ... RUN DATE: 02/23/10
CYCLE : ITD RUN: 001 ... RUN TIME: 09:22:39
CM BUS ...
CO NBR FRM FUNC ...
----- ----- ----- -----
AAA NNN MMM PPP ...
BBB ...
...
* * * * * E N D O F R E P O R T * * * * *
We approach the report through a sequence of views (filters) of this report. Each view localizes some report data either for subsequent filtering or for the extraction of the final data.
Looking at the example, one can build the following views of the report:
- View of data before the "E N D O F R E P O R T" line.
- View of remaining data without page headers and footers.
- Views of table rows.
- Views of cells.
A sequence of filters allows us to build a pipeline of transformations of the original text. This also allows us to generate clean xslt, C# or java code to parse the data.
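Just to convey the idea (this is only a sketch with simplified conditions, not our actual generated parser), such a pipeline over the sample above could look like this in java:
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReportPipelineSketch
{
  // Turns a stream of report lines into table rows split into cells.
  public static List<String[]> parse(Stream<String> lines)
  {
    return lines.
      // View of data before the "E N D  O F  R E P O R T" line.
      takeWhile(line -> !(line.contains("E N D") && line.contains("R E P O R T"))).
      // View of remaining data without page headers and footers.
      filter(line -> !isHeaderOrFooter(line)).
      // Views of table rows.
      filter(line -> !line.trim().isEmpty()).
      // Views of cells (a real report would use fixed column bounds).
      map(line -> line.trim().split("\\s+")).
      collect(Collectors.toList());
  }

  private static boolean isHeaderOrFooter(String line)
  {
    return line.startsWith("1 TITLE") ||
      line.startsWith("BUSINESS DATE:") ||
      line.startsWith("CYCLE") ||
      line.startsWith("CM ") ||
      line.startsWith("CO ") ||
      line.startsWith("-----");
  }
}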
At first, our favorite language for such a parser was xslt. Unfortunately, we're dealing with the Saxon xslt implementation, which is not very strong in streaming processing. Without a couple of extension functions to prevent caching, it tends to cache the whole input in memory, which is not acceptable.
At present we have decided to start with the C# parser, which naturally is pure C# code.
The code is still in development, but at present we would like to share the xml schema annotations describing the report layout: report-mapping.xsd, and a sample report description: test.xsd.
A few small changes to the streaming and name normalization algorithms in jxom and csharpxom, and the generation speed has almost doubled (especially for big files).
We suspect, however, that our xslt code is tuned for the saxon engine.
It would be nice to know if anybody has used languages XOM with other engines. Is anyone using it at all (well, at least there are downloads)?
Languages XOM (jxom, csharpxom, cobolxom, sqlxom) can be loaded from:
languages-xom.zip
At times a simple task in xslt looks like a puzzle. Today we have this one.
For a string and a regular expression, find the position and the length of the matched substring.
The problem looks so simple that you do not immediately realize that you are going to spend ten minutes trying to solve it in the best way.
Try it yourself before proceeding:
<xsl:variable name="match" as="xs:integer*">
<xsl:analyze-string select="$line"
regex="my-reg-ex">
<xsl:matching-substring>
<xsl:sequence select="1, string-length(.)"/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:sequence select="0, string-length(.)"/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:variable>
<xsl:choose>
<xsl:when test="$match[1]">
<xsl:sequence
select="1, $match[2]"/>
</xsl:when>
<xsl:when test="$match[3]">
<xsl:sequence select="$match[2], $match[4]"/>
</xsl:when>
</xsl:choose>
To see that the problem with Generator functions in xslt is a bit more complicated, compare two functions.
The first one is quoted from the earlier post:
<xsl:function name="t:generate" as="xs:integer*">
<xsl:param name="value" as="xs:integer"/>
<xsl:sequence select="$value"/>
<xsl:sequence select="t:generate($value * 2)"/>
</xsl:function>
It does not work in Saxon: it crashes with an out-of-memory error.
The second one is a slightly modified version of the same function:
<xsl:function name="t:generate" as="xs:integer*">
<xsl:param name="value" as="xs:integer"/>
<xsl:sequence select="$value + 0"/>
<xsl:sequence select="t:generate($value * 2)"/>
</xsl:function>
It works without problems. In the first case Saxon decides to cache all of the function's output; in the second case it decides to evaluate the data lazily, on demand.
It seems that the optimization algorithms implemented in Saxon are so plentiful and complex that at times they fool one another. :-)
See also:
Generator functions
There are some complications with the streamed tree that we have implemented in saxon. They are due to the fact that only a view of the input data is available at any time. Whenever you access some element that is not available, you get an exception.
Consider an example. We have a log created with java logging. It looks like this:
<log>
<record>
<date>...</date>
<millis>...</millis>
<sequence>...</sequence>
<logger>...</logger>
<level>INFO</level>
<class>...</class>
<method>...</method>
<thread>...</thread>
<message>...</message>
</record>
<record>
...
</record>
...
We would like to write an xslt that returns a page of the log as html:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:t="http://www.nesterovsky-bros.com/xslt/this"
xmlns="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="xs t">
<xsl:param name="start-page" as="xs:integer" select="1"/>
<xsl:param name="page-size" as="xs:integer" select="50"/>
<xsl:output method="xhtml" byte-order-mark="yes" indent="yes"/>
<!-- Entry point. -->
<xsl:template match="/log">
<xsl:variable name="start" as="xs:integer"
select="($start-page - 1) * $page-size + 1"/>
<xsl:variable name="records" as="element()*"
select="subsequence(record, $start, $page-size)"/>
<html>
<head>
<title>
<xsl:text>A log file. Page: </xsl:text>
<xsl:value-of select="$start-page"/>
</title>
</head>
<body>
<table border="1">
<thead>
<tr>
<th>Level</th>
<th>Message</th>
</tr>
</thead>
<tbody>
<xsl:apply-templates mode="t:record" select="$records"/>
</tbody>
</table>
</body>
</html>
</xsl:template>
<xsl:template mode="t:record" match="record">
<!-- Make a copy of record to avoid streaming access problems. -->
<xsl:variable name="log">
<xsl:copy-of select="."/>
</xsl:variable>
<xsl:variable name="level" as="xs:string"
select="$log/record/level"/>
<xsl:variable name="message" as="xs:string"
select="$log/record/message"/>
<tr>
<td>
<xsl:value-of select="$level"/>
</td>
<td>
<xsl:value-of select="$message"/>
</td>
</tr>
</xsl:template>
</xsl:stylesheet>
This code does not work. Guess why? Yes, it's subsequence(), which is too greedy. It always wants to know what the next node is, so it naturally skips the content of the current node. Algorithmically, such saxon code could be rewritten, and could possibly work better in modes other than streaming as well.
A viable workaround, which does not use subsequence, looks rather non-trivial:
<!-- Entry point. -->
<xsl:template match="/log">
<xsl:variable name="start" as="xs:integer"
select="($start-page - 1) * $page-size + 1"/>
<xsl:variable name="end" as="xs:integer"
select="$start + $page-size"/>
<html>
<head>
<title>
<xsl:text>A log file. Page: </xsl:text>
<xsl:value-of select="$start-page"/>
</title>
</head>
<body>
<table border="1">
<thead>
<tr>
<th>Level</th>
<th>Message</th>
</tr>
</thead>
<tbody>
<xsl:sequence select="
t:generate-records(record, $start, $end, ())"/>
</tbody>
</table>
</body>
</html>
</xsl:template>
<xsl:function name="t:generate-records" as="element()*">
<xsl:param name="records" as="element()*"/>
<xsl:param name="start" as="xs:integer"/>
<xsl:param name="end" as="xs:integer?"/>
<xsl:param name="result" as="element()*"/>
<xsl:variable name="record" as="element()?" select="$records[$start]"/>
<xsl:choose>
<xsl:when test="(exists($end) and ($start > $end)) or empty($record)">
<xsl:sequence select="$result"/>
</xsl:when>
<xsl:otherwise>
<!-- Make a copy of record to avoid streaming access problems. -->
<xsl:variable name="log">
<xsl:copy-of select="$record"/>
</xsl:variable>
<xsl:variable name="level" as="xs:string"
select="$log/record/level"/>
<xsl:variable name="message" as="xs:string"
select="$log/record/message"/>
<xsl:variable name="next-result" as="element()*">
<tr>
<td>
<xsl:value-of select="$level"/>
</td>
<td>
<xsl:value-of select="$message"/>
</td>
</tr>
</xsl:variable>
<xsl:sequence select="
t:generate-records
(
$records,
$start + 1,
$end,
($result, $next-result)
)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
Here we observed the greediness of saxon, which tried too early to consume more input than is required. In other cases we have seen that it may defer actual data access until the point when there is no data anymore.
So, without tuning saxon's internal logic, it's possible, but not easy, to write stylesheets that exploit streaming features.
P.S. Updated sources are at
streamedtree.zip
At some point we needed an array with volatile elements in java. We knew that such a beast is not found in the java world. So we searched the Internet and found answers that are so wrong, and introduce such obscure threading bugs, that the guys who provided them had better hide them and run to fix their buggy programs immediately...
The first one is Volatile arrays in Java. They suggest this solution:
volatile int[] arr = new int[...];
...
arr[4] = 100;
arr = arr;
The second one is What Volatile Means in Java. A guy assures that this code works:
Fields:
int answer = 0;
volatile boolean ready = false;
Thread1:
answer = 42;
ready = true;
Thread2:
if (ready)
{
print(answer);
}
They are very wrong! Non-volatile accesses can be reordered by the implementation.
See Java's
Threads and Locks:
The rules for volatile variables effectively require that main memory be touched exactly once for each use or assign of a volatile variable by a thread, and that main memory be touched in exactly the order dictated by the thread execution semantics. However, such memory actions are not ordered with respect to read and write actions on nonvolatile variables.
They were probably thinking of locks when they argued about volatiles:
a lock action acts as if it flushes all variables from the
thread's working memory; before use they must be assigned or loaded from main
memory.
P.S. They would do better to recommend AtomicReferenceArray.
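For instance, the first example above could be rewritten with an atomic array, whose get and set have volatile read/write semantics per element (for primitives there are also AtomicIntegerArray and AtomicLongArray); a sketch:
import java.util.concurrent.atomic.AtomicReferenceArray;

public class Example
{
  private final AtomicReferenceArray<Integer> arr =
    new AtomicReferenceArray<Integer>(10);

  // Writer thread: a volatile write of the element.
  public void writer()
  {
    arr.set(4, 100);
  }

  // Reader thread: a volatile read of the element; the value written
  // by set() is visible without any self-assignment tricks.
  public Integer reader()
  {
    return arr.get(4);
  }
}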
When the time came to process big xml log files, we decided to implement a streamable tree in saxon the very same way it was implemented in .net eight years ago (see How would we approach to streaming facility in xslt).
Interestingly enough, the implementation is similar to that of the composable tree. There a node never stores a reference to its parent, while in the streamed tree no references to children are stored. This way only a limited subview of the tree is available at any time. The implementation does not support the preceding and preceding-sibling axes. Also, one cannot navigate to a node that is out of scope.
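Just to illustrate the idea (a simplified sketch only, not the implementation from streamedtree.zip), such a node stores its parent but never its children, which are pulled from the reader on demand:
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// A streamed node knows its parent but never stores its children;
// they are built on demand from the reader, so only a limited window
// of the document is reachable at any time.
public class StreamedNode
{
  private final StreamedNode parent;
  private final String name;
  private final XMLStreamReader reader;
  private boolean outOfScope;

  public StreamedNode(StreamedNode parent, String name, XMLStreamReader reader)
  {
    this.parent = parent;
    this.name = name;
    this.reader = reader;
  }

  public StreamedNode getParent()
  {
    return parent;
  }

  public String getName()
  {
    return name;
  }

  // Returns the next child element, or null when this element ends.
  // Once the reader has moved past this node, navigation throws.
  // (For brevity, this sketch assumes the caller consumes each child
  // before asking for the next one.)
  public StreamedNode nextChild()
    throws XMLStreamException
  {
    if (outOfScope)
    {
      throw new IllegalStateException("Node is out of scope.");
    }

    while(reader.hasNext())
    {
      switch(reader.next())
      {
        case XMLStreamConstants.START_ELEMENT:
          return new StreamedNode(this, reader.getLocalName(), reader);
        case XMLStreamConstants.END_ELEMENT:
          outOfScope = true;

          return null;
      }
    }

    return null;
  }
}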
The implementation is external (there are no changes to saxon itself). To use it one needs to create an instance of DocumentInfo, which pulls data from an XMLStreamReader, and to pass it as an input to a transformation:
Controller controller =
(Controller)transformer;
XMLInputFactory factory = XMLInputFactory.newInstance();
StreamSource inputSource = new StreamSource(new File(input));
XMLStreamReader reader = factory.createXMLStreamReader(inputSource);
StaxBridge bridge = new StaxBridge();
bridge.setPipelineConfiguration(
controller.makePipelineConfiguration());
bridge.setXMLStreamReader(reader);
Source source = new DocumentImpl(bridge);
transformer.transform(source, new StreamResult(output));
This helped us to format an xml log file of arbitrary size. An xslt like this can do the work:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="xs">
<xsl:template match="/log">
<html>
<head>
<title>Log</title>
</head>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="message">
...
</xsl:template>
<xsl:template match="message[@error]">
...
</xsl:template>
...
</xsl:stylesheet>
Implementation can be found at:
streamedtree.zip
jxom else if (google search)
Google helps with many things, but not with retrospective support.
Probably the guy was trying to build nested if-then-else jxom elements.
We expected this and have defined a function t:generate-if-statement() in java-optimizer.xslt. Its signature is:
<!--
Generates if/then/else if ... statements.
$closure - a series of conditions and blocks.
$index - current index.
$result - collected result.
Returns if/then/else if ... statements.
-->
<xsl:function name="t:generate-if-statement" as="element()">
<xsl:param name="closure" as="element()*"/>
<xsl:param name="index" as="xs:integer"/>
<xsl:param name="result" as="element()?"/>
Usage is like this:
<!-- Generate a sequence of pairs: (condition, scope). -->
<xsl:variable name="branches" as="element()+">
<xsl:for-each select="...">
<!-- Generate condition. -->
<scope>
<!-- Generate statements. -->
</scope>
</xsl:for-each>
</xsl:variable>
<xsl:variable name="else" as="element()?">
<!-- Generate final else, if any. -->
</xsl:variable>
<!-- This generates if statement. -->
<xsl:sequence
select="t:generate-if-statement($branches, count($branches)
- 1, $else)"/>
P.S. By the way, we like that someone is looking into jxom.
By a generator we mean a function that produces an infinite output sequence for a particular input.
That's a rather theoretical question, as xslt does not allow infinite sequences, but look at the example:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:t="http://www.nesterovsky-bros.com/xslt"
exclude-result-prefixes="xs t">
<xsl:template match="/">
<xsl:variable name="value" as="xs:string" select="'10100101'"/>
<xsl:variable name="values" as="xs:integer+"
select="t:generate(1)"/>
<!--<xsl:variable name="values" as="xs:integer+">
<xsl:call-template name="t:generate">
<xsl:with-param name="value" select="1"/>
</xsl:call-template>
</xsl:variable>-->
<xsl:variable name="integer" as="xs:integer" select="
sum
(
for $index in 1 to string-length($value)
return
$values[$index][substring($value, $index, 1) = '1']
)"/>
<xsl:message select="$integer"/>
</xsl:template>
<xsl:function name="t:generate" as="xs:integer*">
<xsl:param name="value" as="xs:integer"/>
<xsl:sequence select="$value"/>
<xsl:sequence select="t:generate($value * 2)"/>
</xsl:function>
<!--<xsl:template name="t:generate" as="xs:integer*">
<xsl:param name="value" as="xs:integer"/>
<xsl:sequence select="$value"/>
<xsl:call-template name="t:generate">
<xsl:with-param name="value" select="$value * 2"/>
</xsl:call-template>
</xsl:template>-->
</xsl:stylesheet>
Here the logic uses such a generator and decides by itself where to break.
Should such code be valid?
From the algorithmic perspective it would be better if the example worked, as the generator logic and its use are two different things.
Lately, after playing a little with saxon tree models, we thought that the design would be cleaner and the implementation faster if NamePool were implemented differently.
Now, saxon is very pessimistic about java objects, thus it prefers to encode qualified names with integers. The encoding and decoding are done in the NamePool; other parts of the code use these integer values.
The operations done over these integers are:
- equality comparison of two such integers, in order to check whether two qualified or extended names are equal;
- getting different parts of a qualified name from the NamePool.
We would design this differently. We would:
- create a QualifiedName class to store all name parts;
- make NamePool create and cache QualifiedName instances.
This way:
- equality comparison would be a reference comparison of two instances;
- getting different parts of a qualified name would become a trivial getter;
- contention on such a name pool would be lower.
That's the implementation we would propose:
QualifiedName.java,
NameCache.java
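Just to convey the idea in code (a simplified sketch, not the contents of the linked files):
import java.util.concurrent.ConcurrentHashMap;

// An immutable value object storing the name parts
// (prefix handling is omitted in this sketch).
public final class QualifiedName
{
  private final String namespaceURI;
  private final String localName;

  QualifiedName(String namespaceURI, String localName)
  {
    this.namespaceURI = namespaceURI;
    this.localName = localName;
  }

  public String getNamespaceURI()
  {
    return namespaceURI;
  }

  public String getLocalName()
  {
    return localName;
  }
}

// Creates and caches QualifiedName instances: two requests for the same
// (namespace, local name) pair return the same instance, so an equality
// check is a reference comparison, and name parts are trivial getters.
class NameCache
{
  private final ConcurrentHashMap<String, QualifiedName> names =
    new ConcurrentHashMap<String, QualifiedName>();

  public QualifiedName get(String namespaceURI, String localName)
  {
    String key = namespaceURI + "|" + localName;
    QualifiedName name = names.get(key);

    if (name == null)
    {
      QualifiedName candidate = new QualifiedName(namespaceURI, localName);
      QualifiedName existing = names.putIfAbsent(key, candidate);

      name = existing == null ? candidate : existing;
    }

    return name;
  }
}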