REST is a well-known and established architecture. It does not need our approval or recommendation.
What we want to express is something different: we'd like to thank all the people who formulated these ideas.
While one can argue that REST has "always" existed along with the web, from our experience we can see that most of the web applications we created up to the late 2000s were stateful.
Many web applications have to support application (session) state, and here REST gives a rule:
the client stores the session state;
the server does not store a session and is represented as a set of services;
if there is state that cannot be stored on the client (e.g. for security reasons), the server should use caches and be able to reconstruct that state upon a cache miss.
Now is a good time for the proliferation of REST, as even the weakest clients (browsers) can implement its requirements.
REST allowed us to drastically simplify development and support, to offload the server, and to build better web applications.
We compare web applications we wrote a decade ago using ASP.NET with those written in the last 3-4 years using RESTful ideas, both in Java and in .NET. Clearly, the latter have better performance, but from the support standpoint the most appealing thing is that you can instantly upgrade the application without impacting users, as there are no sessions on the server. For the same reason you don't have to puzzle over whether to use in-process sessions with session stickiness behind a load balancer, or out-of-process sessions.
Things became simpler:
the server is now pure logic exposed through services (WCF, Web API, JAX-RS);
the client is the GUI - jQuery, KendoUI or other;
aspx/JSF pages are gone completely.
Earlier, in the article How To: Load KendoUI Templates from External Files, we talked about a way to combine a project's templates into a single file using Text Templates. Now we would like to suggest the next step.
KendoUI defines text templates that it knows how to transform into functions, at runtime obviously. Thus a template like this:
<tr>
<td data-bind="
text: name"></td>
<td>#: kendo.toString(get("price"), "C") #</td>
<td data-bind="text: unitsInStock"></td>
<td><button class="k-button" data-bind="click: deleteProduct">Delete</button></td>
</tr>
is transformed into a function:
function(data)
{
var o,e=kendo.htmlEncode;
with(data)
{
o='<tr><td data-bind="text: name"></td><td>'+
e( kendo.toString(get("price"), "C") )+
'</td><td data-bind="text: unitsInStock"></td>'+
'<td><button class="k-button" ' +
'data-bind="click: deleteProduct">Delete</button></td></tr>';
}
return o;
}
The transformation is done through a sequence of regex replaces.
Now, what's the fastest JavaScript template engine?
Right! The one that does no work at runtime.
So we thought: we can generate those functions at compile time rather than define templates that are compiled at runtime.
We have updated templates.tt to generate template functions, and optionally to generate <script> tags that call those functions. This way, for an input footer.tmpl.html:
<!DOCTYPE html>
<html>
<head>
<title>Test</title>
<base href="/" />
<link href="styles/kendo.common.min.css" rel="stylesheet" />
<link href="styles/kendo.default.min.css" rel="stylesheet" />
</head>
<body>
<table data-template-id="view">
<tr>
<td>Products count: #: total() #</td>
<td>Total price: #: totalPrice() #</td>
<td colspan="2">Units in stock: #: totalUnitsInStock() #</td>
</tr>
</table>
</body>
</html>
templates.js will look like this:
nesterovskyBros.templates=
{
...
"footer-view":function(data)
{
var o,e=kendo.htmlEncode;
with(data)
{
...
}
return o;
},
...
};
document.write('<script id="footer-view-template" type="text/x-kendo-template">#=nesterovskyBros.templates["footer-view"](data)#</script>');
To get a template function at runtime you simply refer to nesterovskyBros.templates["footer-view"].
templates.tt now allows you to specify:
scope - a javascript scope for the template functions, e.g. "nesterovskyBros.templates";
a data-script attribute on a template (default is true); set it to false to prevent generation of the <script> tag;
a data-with-block attribute (default is true); set it to false to prevent the with(data) {...} statement in the generated javascript.
See a sample application that shows how nicely KendoUI UserControls work with these compiled templates.
A while ago we created a set of xml schemas and xslt to represent different languages as xml and to generate sources from those xmls. This way we can represent and generate Java, C#, COBOL, and several SQL dialects (read about Languages XOM on this site).
Here we'd like to describe a nuisance we had with the SQL dialects schema.
Our goal was to define a basic SQL schema plus dialect extensions, so that we could express both general and dialect-specific constructs. Let's consider an example.
General:
-- Select one row
select * from A
DB2:
select * from A fetch first row only
T-SQL:
select top 1 * from A
Oracle:
select * from A where rownum = 1
All these queries share a common core syntax, while at the same time each has dialect-specific means to express the intention to return the first row only.
Down at the xml schema level, the basic select statement looks like this:
<xs:complexType name="select-statement">
<xs:complexContent>
<xs:extension base="full-select-statement">
<xs:sequence>
<xs:element name="columns" type="columns-clause">
<xs:element name="from" type="from-clause" minOccurs="0">
<xs:element name="where" type="unary-expression" minOccurs="0"/>
<xs:element name="group-by" type="expression-list" minOccurs="0"/>
<xs:element name="having" type="unary-expression" minOccurs="0"/>
<xs:element name="order-by" type="order-by-clause" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="specification" type="query-specification"
use="optional" default="all"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
Here everything is relatively clear. The generic select looks like:
<sql:select>
<sql:columns>
<sql:column wildcard="true"/>
</sql:columns>
<sql:from>
<sql:table name="A"/>
</sql:from>
</sql:select>
But how would you define dialect specifics?
E.g. for T-SQL we would like to see markup like this:
<sql:select>
<tsql:top>
<sql:number value="1"/>
</tsql:top>
<sql:columns>
<sql:column wildcard="true"/>
</sql:columns>
<sql:from>
<sql:table name="A"/>
</sql:from>
</sql:select>
While for DB2 it should be:
<sql:select>
<sql:columns>
<sql:column wildcard="true"/>
</sql:columns>
<sql:from>
<sql:table name="A"/>
</sql:from>
<db2:fetch-first rows="1"/>
</sql:select>
So, again, the question is how to express such dialect-specific extensions in the schema. Though we have tried several solutions to that problem, none is satisfactory enough.
To allow extensions we have defined that all elements in the sql schema are based on sql-element, which permits extensions:
<xs:complexType name="sql-element" abstract="true">
<xs:sequence>
<xs:element ref="extension" minOccurs="0"
maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:element name="extension" type="extension"/>
<xs:complexType name="extension" abstract="true">
<xs:complexContent>
<xs:extension base="sql-element"/>
</xs:complexContent>
</xs:complexType>
...
<xs:element name="top" type="top-extension"
substitutionGroup="sql:extension"/>
<xs:complexType name="top-extension">
<xs:complexContent>
<xs:extension base="sql:extension">
<xs:sequence>
<xs:element ref="sql:expression"/>
</xs:sequence>
<xs:attribute name="percent" type="xs:boolean"
use="optional" default="false"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
Unfortunately, this results in too weakly typed a schema for extensions, so IntelliSense suggests too many options.
On the way home we usually listen to audio books in the car; in most cases these are science fiction stories. One of our favorite authors is Ray Bradbury - you probably know his short story collection "The Martian Chronicles". We also like to glance over technology and science news to stay tuned to the latest innovations.
Recently we read an article stating that NASA is going to send a group of scientist-colonists to Mars in the foreseeable future. It sounds like something in Ray Bradbury's style.
Although we're not specialists in this field, we discussed what difficulties such an expedition may face.
As far as we understand, the following issues must be solved before such an expedition may start:
the speed of a spaceship is still too slow;
a shield from radiation doesn't exist;
a big cargo section is needed to hold a sufficient reserve of water, food and oxygen for the expedition to survive;
a way to produce enough energy to survive on Mars is needed.
Actually, the second issue depends on the first one. At present there is no reliable long-term protection from cosmic radiation that can be installed on a spacecraft - at least we haven't heard of such a thing. Thus a long stay in open space will seriously harm the colonists and may bring the whole mission to naught.
Considering these facts, we concluded that space travel beyond the Moon is not possible at present.
What could be done to solve these issues?
A solution for space colonization could be the following: the earthlings won't send people, but robots with the corresponding equipment and containers of human, animal and plant DNA or embryos. At the destination the robots will build a colony and will begin to grow people, animals and plants, to train them, and to serve them later (at least to serve the humans ;-) ).
How may all this solve the mentioned issues?
It's much easier to create a reliable radiation shield for small DNA or embryo containers.
The travel time and speed of the spacecraft will have less impact on the mission.
The weight of the spacecraft could be less, or it may carry more payload.
Thus, to bring on the era of cosmic expansion, humankind must invest in robotics, artificial intelligence, genetic engineering and rapid learning, among other scientific fields related to space exploration.
Right now you may see the beginning of this trend in science: DFKI's robot ape to colonize the Moon?
In the far future, after the beginning has been forgotten, all this may lead to the question: who came first, man or robot? ;-)
We heavily use the kendo.ui.Splitter widget. Unfortunately it has several drawbacks:
Although we don't like to patch widgets, in this case we found no better way but to patch two functions: kendo.ui.Splitter.fn._initPanes and kendo.ui.Splitter.fn._resize.
After the fix, splitter markup may look like the following:
<div style="height: 100%"
data-role="splitter"
data-orientation="vertical">
<div data-pane='{ size: "auto", resizable: false, scrollable: false }'>
Header with size depending on content.
</div>
<div data-pane='{ resizable: false, scrollable: true }'>
Body with size equal to a remaining area.
</div>
<div data-pane='{ size: "auto", resizable: false, scrollable: false }'>
Footer with size depending on content.
</div>
</div>
Each pane may define a data-pane attribute with pane parameters. A pane may specify size = "auto" to take space according to its content.
The code can be found at splitter.js. A test can be seen at splitter.html.
To simplify KendoUI development we have defined nesterovskyBros.data.Model, which extends the kendo.data.Model class.
Just as in kendo.data.Model, there is a fields Object - a set of key/value pairs to configure the model fields - but the fields have some more options:
fields.fieldName.serializable Boolean - indicates whether the field appears in the object returned by model.toJSON(). Default is true.
fields.fieldName.updateDirty Boolean - indicates whether a change of the property should trigger a dirty field change. Default is true.
When a property is changed through the model.set() method, a dirty change event is triggered (provided that fields.fieldName.updateDirty !== false). This helps to build a dependency graph on that property.
Here is an example of a model:
nesterovskyBros.data.ProductModel = nesterovskyBros.data.Model.define(
{
fields:
{
name: { type: "string", defaultValue: "Product Name" },
price: { type: "number", defaultValue: 10 },
unitsInStockValue: { type: "number", defaultValue: 10, serializable: false },
unitsInStock: { type: "string" }
},
unitsInStock: function(value)
{
if (value === undefined)
{
var count = this.get("unitsInStockValue");
return ["one", "two", "three", "four"][count] || (count + "");
}
else
{
this.set("unitsInStockValue", ({one: 1, two: 2, three: 3, four: 4 })[value] || value);
}
}
});
Notice that:
the unitsInStock property is implemented as a function - this helps to map model values to presentation values;
when you call model.toJSON() or JSON.stringify() you will see only the name, price and unitsInStock values in the result - this helps to get the model's state and to store it somewhere (e.g. in sessionStorage);
after var model = new nesterovskyBros.data.ProductModel({ price: "7", unitsInStock: "one" }); the following holds: (typeof(model.price) == "number") && (model.price == 7) && (model.name == "Product Name") && (model.unitsInStockValue == 1).
As with UserControl, the implementation is defined in controls.js. The sample page is the same index.html.
Two weeks ago we got new Lenovo 13" laptops (Yoga 13, with touch screens and Windows 8 Pro on board).
The first impression was: WOW! Touch screens! Windows 8! Now we'll try our hand at that new (for us) API. So new, so cool...
A day later: what a mess this new UI is. Where are my desktop, the "Start" button, all the programs... After googling we understood - we're not alone.
A few more days later we realized that our SSDs won't live a long life with our projects - we generate several GB of output a day. So we decided to buy external SD cards - an additional 64GB, class 10. That's enough for us. No sooner said than done. After several attempts to copy our projects from the hard drive to the SD card (~9GB of sources) we strongly believe that such a vigorous mix (Lenovo + Win 8 + external SD card) won't survive: Windows 8 hangs when the display turns off (in the middle of a data copy, after an hour of work). What a .... of .... this Windows 8, Lenovo and SD cards are all together.
Developing with KendoUI we try to formalize tasks. With this in mind we would like to have user controls.
We define user control as following:
It is a javascript class that extends Widget.
It offers a way to reuse UI.
It allows one to define a model and a template with UI and data binding.
Unfortunately, KendoUI does not have such an API, though one can easily define it; so we have defined our own version.
Here we review our solution. We have taken a KendoUI grid example and converted it into a user control.
See index.html
<!DOCTYPE html>
<html>
<head>
<title>Test</title>
<!-- (1) Include templates for controls. -->
<script src="scripts/templates.js"></script>
<script src="scripts/jquery/jquery.js"></script>
<script src="scripts/kendo/kendo.web.min.js"></script>
<!-- (2) UserControl definition. -->
<script src="scripts/controls.js"></script>
<!-- (3) Confirm dialog user control. -->
<script src="scripts/controls/confirm.js"></script>
<!-- (4) Products user control. -->
<script src="scripts/controls/products.js"></script>
<link href="styles/kendo.common.min.css" rel="stylesheet" />
<link href="styles/kendo.default.min.css" rel="stylesheet" />
<script>
$(function ()
{
// (5) Bind the page.
kendo.bind(
document.body,
// (6) Model as a datasource.
{ source: [new nesterovskyBros.data.ProductsModel] });
});
</script>
</head>
<body>
<!-- (7) User control and its binding. -->
<div data-role="products" data-bind="source: source"></div>
</body>
</html>
Here is what we see:
The declaration consists of a view and a model.
The view is html with data binding. See products.tmpl.html.
We build our project using Visual Studio, so template packaging is done with templates.tt. This transformation converts the products template into a tag:
<script id="products-template" type="text/x-kendo-template">
thus the template can be referred to through a utility function: nesterovskyBros.template("products-template").
The model inherits kendo.data.Model. Here is how it looks:
// (1) Define a ProductsModel class.
nesterovskyBros.data.ProductsModel = kendo.data.Model.define(
{
// (2) Model properties.
fields:
{
productName: { type: "string", defaultValue: "Product Name" },
productPrice: { type: "number", defaultValue: 10 },
productUnitsInStock: { type: "number", defaultValue: 10 },
products: { type: "default", defaultValue: [] }
},
// (3) Model methods.
addProduct: function ()
{
...
},
deleteProduct: function (e)
{
...
},
...
});
// (4) Register user control.
nesterovskyBros.ui.Products = nesterovskyBros.defineControl(
{
name: "Products",
model: nesterovskyBros.data.ProductsModel
});
Here is what we have:
A user control is defined with the nesterovskyBros.defineControl(proto) call, where:
proto.name - defines the user control name;
proto.model - defines the model type;
proto.template - defines an optional template. If not specified, the template is retrieved from $("#" + proto.name.toLowerCase() + "-template").html().
Now, what remains is the API for the UserControl. See controls.js.
The UserControl widget supports:
events: change - triggered when the data source is changed; dataBound - triggered when the widget is data bound; dataBinding - triggered before widget data binding; save - used to notify the user to save the model state.
options: autoBind (default false) - whether to automatically bind the data source; template (default $.noop) - the user control template.
a dataSource field and a setDataSource() method.
a rebind() method to manually rebuild the widget's view from the template and model; the model.refresh method is called, if defined.
A user control is registered with nesterovskyBros.defineControl(proto). See above.
A dialog can be defined with options, where: options.name - a user control name (used in the data-role); options.model - a model type; options.windowOptions - window options. Such a dialog (based on kendo.ui.Window) wraps the user control and exposes: a model() function that returns an instance of the model; a dialog() function that returns an instance of the dialog; a result() function that returns a deferred object used to track dialog completion.
For example, the confirm dialog is used like this:
deleteProduct: function(e)
{
var that = this;
return nesterovskyBros.dialog.confirm(
{
title: "Please confirm",
message: "Do you want to delete the record?",
confirm: "Yes",
cancel: "No"
}).
open().
center().
result().
then(
function(confirmed)
{
if (!confirmed)
{
return;
}
...
});
}
User controls, along with a technique to manage and cache templates, allow us to build robust web applications. As an added value, it has become a trivial task to build an SPA.
See also: Compile KendoUI templates.
Our goal is to generate reports in streaming mode.
At some point we need to deal with data streams (e.g. xml streams for xslt transformations). Often the nature of a report demands several passes through the data.
To increase performance we have defined a class named StreamResource. This class encapsulates input data, reads it once and caches it into a temp file; thus the data can be traversed many times. StreamResource can read data lazily or eagerly, thus releasing resources early.
This class can be used as a variation of a PipeStream that never blocks, as if the size of its buffer were unlimited, and that can be read many times.
The API looks like this:
public class StreamResource: IDisposable
{
/// <summary>
/// Creates a StreamSource instance.
/// </summary>
/// <param name="source">
/// A function that returns source as an input stream.
/// </param>
/// <param name="settings">Optional settings.</param>
public StreamResource(Func<Stream> source, Settings settings = null);
/// <summary>
/// Creates a StreamSource instance.
/// </summary>
/// <param name="source">
/// A function that writes source data into an output stream.
/// </param>
/// <param name="settings">Optional settings.</param>
public StreamResource(Action<Stream> source, Settings settings = null);
/// <summary>
/// Gets an input stream.
/// </summary>
/// <param name="shared">
/// Indicates that this StreamResource should be disposed when the returned
/// stream is closed and there are no more currently opened cache streams.
/// </param>
/// <returns>An input stream.</returns>
public Stream GetStream(bool shared = false);
}
The use pattern is as follows:
// Acquire resource.
using(var resource = new StreamResource(() =>
CallService(params...)))
{
// Read stream.
using(var stream = resource.GetStream())
{
...
}
...
// Read stream again.
using(var stream = resource.GetStream())
{
...
}
}
StreamResource is efficient even if you need to process the content only once, as it monitors the timing of reading the source data and compares it with the timing of data consumption. If the difference exceeds some threshold then StreamResource caches the source greedily; otherwise the source is pulled lazily. Thus input resources can be released promptly. This is important, for example, when the source depends on a database connection.
The use pattern is as follows:
// Acquire resource and get shared stream.
using(var stream = new StreamResource(() =>
CallService(params...)).GetStream(true))
{
...
}
Finally, StreamResource allows one to process data in pipe stream mode. This is when you have a generator function Action<Stream> that can write to a stream, and you want to read that data. The advantage of StreamResource over a real pipe stream is that it can work without blocking the generator, thus releasing resources early.
The use pattern is similar to the previous one:
using(var stream = new StreamResource(output =>
Generate(output, params...)).GetStream(true))
{
...
}
The source of the class can be found at Streaming.zip.
If you deal with web applications you have probably already dealt with exporting data to Excel. There are several options to prepare data for Excel:
You may find a good article with the pros and cons of each solution here. We, in our turn, would like to share our experience in this field. Let's start with the requirements:
All these requirements led us to a solution based on XSLT processing of streamed data. The advantage of this solution is that the result is forwarded to the client as soon as XSLT starts to generate output. Such an approach is much more productive than generating XLSX using the Open XML SDK or any other third-party library, since it avoids keeping huge data sets in memory on the server side.
Another advantage is simple maintenance, as we achieve a clear separation of the data and presentation layers. For each request to change formatting or apply another style to a cell, you just have to modify the xslt file(s) that generate the variable parts of the XLSX.
As a result, our clients get XLSX files that conform to the Open XML specifications. See the details of the implementation of our solution in our next posts.
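To make the approach more tangible, here is a minimal C# sketch of the idea (not our actual implementation): the variable sheet part of the XLSX is produced by an xslt transform directly into a zip entry written over the outgoing stream, so nothing is buffered in memory. The names sheet.xslt, WriteWorkbook and the static-parts step are illustrative assumptions only.
using System.IO;
using System.IO.Compression;
using System.Xml;
using System.Xml.Xsl;

public static class XlsxStreaming
{
  // A sketch: stream an xslt-generated worksheet straight into the xlsx package.
  public static void WriteWorkbook(
    Stream output, XmlReader data, XslCompiledTransform sheetTransform)
  {
    // XLSX is a zip package; ZipArchive can write to a non-seekable stream in Create mode.
    using(var package = new ZipArchive(output, ZipArchiveMode.Create, true))
    {
      // ... copy the static parts ([Content_Types].xml, workbook.xml, styles.xml) here ...

      var sheet = package.CreateEntry("xl/worksheets/sheet1.xml");

      using(var entryStream = sheet.Open())
      {
        // The client starts receiving data as soon as the transform produces output.
        sheetTransform.Transform(data, new XsltArgumentList(), entryStream);
      }
    }
  }
}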
We're implementing UDT changes in a big database. Earlier that User Defined Type was based on smallint, and now we have to use int as the base.
The impact here is manifold:
Now we're trying to address (3), (5) and to implement (4), while keeping the interface with clients that use the old types.
As for the database size impact, we have found that index fragmentation is the primary disk space waster (see Reorganize index in SQL Server). We have performed some partial index reorganization and can now see that we can gain back hundreds of GB of disk space. On the other hand we use page compression, so we expect that the change of types will not increase table sizes considerably. Indeed, our measurements show that tables will be only ~1-3% bigger.
Changing column types is a nontrivial task. The problem is that if you try to change a column's type (which is part of the clustered index) directly, then you have to temporarily remove foreign keys and rebuild all indices. This won't work, both due to the disk space required for the operation (a huge transaction log is required) and due to the availability of the tables (we're talking about days or even weeks to rebuild the indices).
To work around the problem we have selected another way. For each target table T we performed the following:
Note that:
After the transfer is complete we shall drop the T_old tables and the T views, and rename the T_new tables to T.
This will complete part (4) of the whole task. Our estimate is that it will take a month or even more to complete the transfer. Though the solution is rather slow, the database will stay online for the whole period, which is a required condition.
The next task is to deal with type changes in stored procedure parameters and in column types of output result sets. We're not sure yet what the best way to deal with it is, and we shall probably complain about it in the next posts.
We have a large table in the form:
create table dbo.Data
(
Date date not null,
Type int not null,
Value nvarchar(50) null,
primary key clustered(Date, Type)
);
create unique nonclustered index IX_Data on dbo.Data(Type, Date);
Among other queries we often need a snapshot of data for each Type for the latest Date available:
select
max(Date) Date,
Type
from
dbo.Data
group by
Type
We have found that the above select does not run well on our data set. In fact dbo.Data grows with time, while the snapshot we need stays more or less the same size. The best solution for such a query is to precalculate it. One way would be to create an indexed view, but SQL Server does not support the max() aggregate in indexed views. So we have decided to add an additional bit field dbo.Data.Last, indicating that a row belongs to the last-date snapshot, and to create a filtered index to access that snapshot:
create table dbo.Data
(
Date date not null,
Type int not null,
Value nvarchar(50) null,
Last bit not null default 0,
primary key clustered(Date, Type)
);
create unique nonclustered index IX_Data on dbo.Data(Type, Date);
create unique nonclustered index IX_Data_Last on dbo.Data(Type)
include(Date)
where Last = 1;
One way to support the Last indicator is to create a trigger that adjusts the Last value:
create trigger dbo.Data_Update on dbo.Data
after insert,delete,update
as
begin
if (trigger_nestlevel(@@procid) < 2)
begin
set nocount on;
with
D as
(
select Date, Type from deleted
union
select Date, Type from inserted
),
U as
(
select
V.Date, V.Type
from
D
inner join
dbo.Data V
on
(V.Last = 1) and
(V.Type = D.Type)
union
select
max(V.Date) Date,
V.Type
from
D
inner join
dbo.Data V
on
V.Type = D.Type
group by
V.Type
),
V as
(
select
rank()
over(partition by
V.Type
order by
V.Date desc) Row,
V.*
from
dbo.Data V
inner join
U
on
(V.Date = U.Date) and
(V.Type = U.Type)
)
update V
set
Last = 1 - cast(Row - 1 as bit);
end;
end;
With the Last indicator in action, our original query has been transformed to:
select Date, Type
from dbo.Data where Last = 1
The execution plan shows that the new filtered index IX_Data_Last is used. Execution speed has increased considerably.
As our actual table contains other bit fields, the Last indicator did not increase the table size, since SQL Server packs up to 8 bit fields into one byte.
Earlier we have shown how to build a streaming xml reader from business data and have reminded about ForwardXPathNavigator, which helps to create a streaming xslt transformation. Now we want to show how to stream content produced with xslt out of a WCF service.
To achieve streaming in WCF one needs:
1. To configure the service to use streaming. Descriptions of how to do this can be found on the internet; see web.config of the sample Streaming.zip for the details.
2. To create a service with a method returning a Stream:
[ServiceContract(Namespace = "http://www.nesterovsky-bros.com")]
[AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)]
public class Service
{
[OperationContract]
[WebGet(RequestFormat = WebMessageFormat.Json)]
public Stream GetPeopleHtml(int count,
int seed)
{
...
}
}
3. To return a Stream from the xsl transformation.
Unfortunately (we have mentioned it already), XslCompiledTransform generates its output into an XmlWriter (or into an output Stream) rather than exposing the result as an XmlReader, while WCF takes an input stream and passes it to the client.
We could generate the xslt output into a file or a memory Stream and then return that content as an input Stream, but this would defeat the goal of streaming, as the client would start to get data no earlier than the xslt completes its work. What we need instead is a pipe from the xslt output Stream to the input Stream returned from WCF.
.NET implements pipe streams, so our task is trivial.
We have defined a utility method that creates an input Stream from a generator populating an output Stream:
public static Stream GetPipedStream(Action<Stream> generator)
{
var output = new AnonymousPipeServerStream();
var input = new AnonymousPipeClientStream(
output.GetClientHandleAsString());
Task.Factory.StartNew(
() =>
{
using(output)
{
generator(output);
output.WaitForPipeDrain();
}
},
TaskCreationOptions.LongRunning);
return input;
}
We wrapped xsl transformation as such a generator:
[OperationContract]
[WebGet(RequestFormat = WebMessageFormat.Json)]
public Stream GetPeopleHtml(int count, int seed)
{
var context = WebOperationContext.Current;
context.OutgoingResponse.ContentType = "text/html";
context.OutgoingResponse.Headers["Content-Disposition"] =
"attachment;filename=reports.html";
var cache = HttpRuntime.Cache;
var path = HttpContext.Current.Server.MapPath("~/People.xslt");
var transform = cache[path] as XslCompiledTransform;
if (transform == null)
{
transform = new XslCompiledTransform();
transform.Load(path);
cache.Insert(path, transform, new CacheDependency(path));
}
return Extensions.GetPipedStream(
output =>
{
// We have a streamed business data.
var people = Data.CreateRandomData(count, seed, 0, count);
// We want to see it as streamed xml data.
using(var stream =
people.ToXmlStream("people", "http://www.nesterovsky-bros.com"))
using(var reader = XmlReader.Create(stream))
{
// XPath forward navigator is used as an input source.
transform.Transform(
new ForwardXPathNavigator(reader),
new XsltArgumentList(),
output);
}
});
}
This way we have built code that streams data directly from business data to a client in the form of a report. A set of utility functions and classes helped us to overcome .NET's limitations and to build simple code that one can easily support.
The sources can be found at Streaming.zip.
In the previous post about streaming we stopped at the point where we have an XmlReader in hand, which continuously gets data from an IEnumerable<Person> source.
Now we shall remind about ForwardXPathNavigator - a class we built back in 2002, which adds streaming transformations to .NET's xslt processor.
While XslCompiledTransform is desperately obsolete and no upgrade will likely follow, it is still among the fastest xslt 1.0 processors. With ForwardXPathNavigator we add to this processor the ability to transform input data of arbitrary size.
We find it interesting that the xslt 3.0 Working Draft defines streamed processing in a way that closely matches the rules for ForwardXPathNavigator:
Streaming achieves two important objectives: it allows large documents to be transformed without requiring correspondingly large amounts of memory; and it allows the processor to start producing output before it has finished receiving its input, thus reducing latency.
The rules for streamability, which are defined in detail in 19.3 Streamability Analysis, impose two main constraints:
The only nodes reachable from the node that is currently being processed are its attributes and namespaces, its ancestors and their attributes and namespaces, and its descendants and their attributes and namespaces. The siblings of the node, and the siblings of its ancestors, are not reachable in the tree, and any attempt to use their values is a static error. However, constructs (for example, simple forms of xsl:number, and simple positional patterns) that require knowledge of the number of preceding elements by name are permitted.
When processing a given node in the tree, each descendant node can only be visited once. Essentially this allows two styles of processing: either visit each of the children once, and then process that child with the same restrictions applied; or process all the descendants in a single pass, in which case it is not possible while processing a descendant to make any further downward selection.
The only significant difference between ForwardXPathNavigator and xslt 3.0 streaming is that we report violations of the streamability rules at runtime, while xslt 3.0 attempts to perform this analysis at compile time.
Here is the C# code for the streamed xslt transformation:
var transform = new XslCompiledTransform();
transform.Load("People.xslt");
// We have a streamed business data.
var people = Data.CreateRandomData(10000, 0, 0, 10000);
// We want to see it as streamed xml data.
using(var stream =
people.ToXmlStream("people", "http://www.nesterovsky-bros.com"))
using(var reader = XmlReader.Create(stream))
using(var output = File.Create("people.html"))
{
// XPath forward navigator is used as an input source.
transform.Transform(
new ForwardXPathNavigator(reader),
new XsltArgumentList(),
output);
}
Notice how the XmlReader is wrapped into a ForwardXPathNavigator.
To complete the picture we need xslt that follows the streaming rules:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:d="http://www.nesterovsky-bros.com"
exclude-result-prefixes="msxsl d">
<xsl:output method="html" indent="yes"/>
<!-- Root template processed in the streaming mode. -->
<xsl:template match="/d:people">
<html>
<head>
<title>List of persons</title>
<style type="text/css">
.even
{
}
.odd
{
background: #d0d0d0;
}
</style>
</head>
<body>
<table border="1">
<tr>
<th>ID</th>
<th>First name</th>
<th>Last name</th>
<th>City</th>
<th>Title</th>
<th>Age</th>
</tr>
<xsl:for-each select="d:person">
<!--
  Get element snapshot.
  A snapshot allows arbitrary access to the element's content.
-->
<xsl:variable name="person">
<xsl:copy-of select="."/>
</xsl:variable>
<xsl:variable name="position" select="position()"/>
<xsl:apply-templates mode="snapshot" select="msxsl:node-set($person)/d:person">
<xsl:with-param name="position" select="$position"/>
</xsl:apply-templates>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
<xsl:template mode="snapshot" match="d:person">
<xsl:param name="position"/>
<tr>
<xsl:attribute name="class">
<xsl:choose>
<xsl:when test="$position mod 2 = 1">
<xsl:text>odd</xsl:text>
</xsl:when>
<xsl:otherwise>
<xsl:text>even</xsl:text>
</xsl:otherwise>
</xsl:choose>
</xsl:attribute>
<td>
<xsl:value-of select="d:Id"/>
</td>
<td>
<xsl:value-of select="d:FirstName"/>
</td>
<td>
<xsl:value-of select="d:LastName"/>
</td>
<td>
<xsl:value-of select="d:City"/>
</td>
<td>
<xsl:value-of select="d:Title"/>
</td>
<td>
<xsl:value-of select="d:Age"/>
</td>
</tr>
</xsl:template>
</xsl:stylesheet>
So, we started with streamed entity data, proceeded to a streamed XmlReader, and arrived at a streamed xslt transformation.
In the final post about streaming we shall show a simple way of building a WCF service that returns an html stream from our xslt transformation.
The sources can be found at Streaming.zip.
For some reason, neither .NET's XmlSerializer nor DataContractSerializer allows reading data through an XmlReader. These APIs work the other way round, writing data into an XmlWriter. To get data through an XmlReader one needs to write it to some destination like a file or memory stream, and then to read it using an XmlReader. This complicates streaming design considerably.
In fact the very same happens with other .NET APIs.
We think the reason .NET designers preferred XmlWriter to XmlReader in those APIs is that XmlReader's implementation is state-machine-like, while XmlWriter's implementation looks like a regular procedure. It's much harder to manually write and support correct state machine logic than a procedure.
If history had gone a slightly different way, and if yield return, lambdas, and the Enumerable API had appeared before XmlReader and XmlWriter, then, we think, both of these classes would have looked different. An xml source would have been described with an IEnumerable<XmlEvent> instead of an XmlReader, and XmlWriter would have looked like a function receiving an IEnumerable<XmlEvent>. Implementing an XmlReader would have meant creating an enumerator, and yield return and the Enumerable API would have helped to implement it in a procedural way.
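Just to illustrate the thought, here is a hypothetical C# sketch (no such API exists in .NET) of what an event-based xml source and sink could look like; all names below are invented for the example.
using System.Collections.Generic;
using System.Xml;

public enum XmlEventKind { StartElement, Attribute, Text, EndElement }

public struct XmlEvent
{
  public XmlEventKind Kind;
  public string Name;
  public string Value;
}

public static class XmlEvents
{
  // A "reader" becomes a plain iterator built with yield return.
  public static IEnumerable<XmlEvent> Person(int id, string firstName)
  {
    yield return new XmlEvent { Kind = XmlEventKind.StartElement, Name = "person" };
    yield return new XmlEvent { Kind = XmlEventKind.Attribute, Name = "id", Value = id.ToString() };
    yield return new XmlEvent { Kind = XmlEventKind.Text, Value = firstName };
    yield return new XmlEvent { Kind = XmlEventKind.EndElement };
  }

  // ...and a "writer" becomes a function consuming that enumeration.
  public static void Write(XmlWriter writer, IEnumerable<XmlEvent> events)
  {
    foreach(var e in events)
    {
      switch(e.Kind)
      {
        case XmlEventKind.StartElement: writer.WriteStartElement(e.Name); break;
        case XmlEventKind.Attribute: writer.WriteAttributeString(e.Name, e.Value); break;
        case XmlEventKind.Text: writer.WriteString(e.Value); break;
        case XmlEventKind.EndElement: writer.WriteEndElement(); break;
      }
    }
  }
}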
But in the present we have to deal with the fact that DataContractSerializer writes data into an XmlWriter. So let's assume we have a project that uses Entity Framework to access the database, and that you have a data class Person and a data access method GetPeople():
[DataContract(Name = "person", Namespace = "http://www.nesterovsky-bros.com")]
public class Person
{
[DataMember] public int Id { get; set; }
[DataMember] public string FirstName { get; set; }
[DataMember] public string LastName { get; set; }
[DataMember] public string City { get; set; }
[DataMember] public string Title { get; set; }
[DataMember] public DateTime BirthDate { get; set; }
[DataMember] public int Age { get; set; }
}
public static IEnumerable<Person> GetPeople() { ... }
And your goal is to expose the result of GetPeople() as an XmlReader.
We achieve the result with three simple steps:
define JoinedStream - an input Stream implementation that reads data from an enumeration of streams (IEnumerable<Stream>);
serialize the items one by one, exposing the serialized parts as an IEnumerable<Stream>;
join those stream parts into a single input stream and read the xml from it.
The code is rather simple, so here we quote its essential part:
public static class Extensions
{
public static Stream JoinStreams(this IEnumerable<Stream> streams, bool closeStreams = true)
{
return new JoinedStream(streams, closeStreams);
}
public static Stream ToXmlStream<T>(
this IEnumerable<T> items,
string rootName = null,
string rootNamespace = null)
{
return items.ToXmlStreamParts<T>(rootName, rootNamespace).
JoinStreams(false);
}
private static IEnumerable<Stream> ToXmlStreamParts<T>(
this IEnumerable<T> items,
string rootName = null,
string rootNamespace = null)
{
if (rootName == null)
{
rootName = "ArrayOfItems";
}
if (rootNamespace == null)
{
rootNamespace = "";
}
var serializer = new DataContractSerializer(typeof(T));
var stream = new MemoryStream();
var writer = XmlDictionaryWriter.CreateTextWriter(stream);
writer.WriteStartDocument();
writer.WriteStartElement(rootName, rootNamespace);
writer.WriteXmlnsAttribute("s", XmlSchema.Namespace);
writer.WriteXmlnsAttribute("i", XmlSchema.InstanceNamespace);
foreach(var item in items)
{
serializer.WriteObject(writer, item);
writer.WriteString(" ");
writer.Flush();
stream.Position = 0;
yield return stream;
stream.Position = 0;
stream.SetLength(0);
}
writer.WriteEndElement();
writer.WriteEndDocument();
writer.Flush();
stream.Position = 0;
yield return stream;
}
private class JoinedStream: Stream
{
public JoinedStream(IEnumerable<Stream> streams, bool closeStreams = true)
...
}
}
The usage is even simpler:
// We have a streamed business data.
var people = GetPeople();
// We want to see it as streamed xml data.
using(var stream = people.ToXmlStream("persons", "http://www.nesterovsky-bros.com"))
using(var reader = XmlReader.Create(stream))
{
...
}
We have packed the sample into the project Streaming.zip.
In the next post we're going to remind about streaming processing in xslt.
We're pleased to work with Kendo UI. Its design is good; however, here and there we find things we wish were done better. Here is a list of problems, in no particular order, that we would like to be addressed in the next release:
(... with statement?)
the parseExact(value, format, culture) method should be rewritten, as it has nothing in common with parsing a date string according to an exact format;
(... the parseOption() method) works neither for the string "1,2", nor for json " { x: 0 } ", nor for a date.
Some time ago we took part in a project where 95% of all the sources were xslt 2.0. It was a great experience for us.
The interesting part is that we used xslt in areas where we would never have expected it in the early 2000s. It crunched gigabytes of data offline, while earlier we generally saw xslt applied in a browser or on a server as an engine to render data.
Web applications (both .NET and Java) are our focus today, and it has become hard to find an application for xslt or xquery.
Indeed, the client side now has very strong APIs: jquery, jqueryui, jsviews, jqgrid, kendoui, and so on. These libraries and today's browsers cover a developer's needs in building manageable applications. In contrast, native support of xslt (at least v2) does not exist in browsers.
The server side at present is seen as a set of web services. These services support both xml and json formats and implement business logic only. It would be torture to try to write such a frontend in xslt/xquery. The server logic itself often deals with a diversity of data sources like databases, files (including xml files) and others.
As for the database (we primarily work with SQL Server 2008 R2), we think that all communication should go through stored procedures, which implement all the data logic. Clearly this is no place for xslt. However, those who know sql beyond its basics can confirm that sql is very similar to xquery. More than that, SQL Server (and other databases) integrates xquery for working with xml data, and we do use it extensively.
The server logic itself uses APIs like LINQ to manipulate different data sources. In fact, we think that one could build a compiler from xquery 3.0 to C# with LINQ. A compiler the other way round would be a whole different story.
The net result is that we see little place for xslt and xquery. Well, after all, it's only a personal perspective on the subject. A similar thing has happened to us with C++. As with xslt/xquery, we love C++ very much and are fond of C++11, but at present there is no place for C++ in our current projects. That's a pity.
P.S. Among other things that play against xslt/xquery is a shortage of people who know these languages and thus can support such projects.
Let's start from a distance.
We support a busy database for a customer. The customer's requirement (in fact, the state's requirement) is that the database should have audit logs. This means that all important requests should be logged. These logs help both with offline security analysis and with database health monitoring.
Until the end of last year we used SQL Server 2005, and then the customer upgraded to SQL Server 2008 R2.
As, by design, the database is accessed through stored procedures only, the logging was done using a small SP that traced input parameters and execution time. Calls to that SP were inserted throughout the code of the other SPs.
We expected SQL Server 2008 R2 to simplify the task and to allow us to switch the audit on and off at a fine-grained level without the need to change SPs in production (see Understanding SQL Server Audit for details).
Unfortunately, we almost immediately found that the current audit implementation traces SP calls but does not store parameter values. This way you can see that there was a call "execute X @param1, @param2", but you have no idea what values were passed. An internet search shows that this is a known problem (see SQL Server 2008 Database Audit on INSERT UPDATE and DELETE actual SQL and not parameter values), which renders SQL Server Audit useless.
Nevertheless, looking at how we can simplify our hand-made audit, we have found a brilliant solution: "Light weight SQL Server procedure auditing without using SQL Server auditing". It's so simple that it's a shame we did not invent it ourselves! The approach is to insert or remove the tracing code automatically. Indeed, there is nothing but data in the database; even the text of an SP is only data.
To automate it even more, we have defined a small table with the names of procedures and their log levels, and a procedure "Log.SetLevel @level" to configure all logging in one go. In addition we have simplified the logging procedures and tables, and started to store parameters in xml columns rather than in pipe-concatenated strings.
Now, to the negative SP execution times.
The logging code, among other things, measures current_timestamp at the beginning and at the end of the execution of an SP. This helps us (as developers) to monitor how the database performs on a day-to-day basis, and to build many useful statistics.
For example, we can see that the duration of about 10% of nontrivial selects is 0ms (execution time is under 1ms). This means that SQL Server is good at data caching. But what is most interesting is that about 0.1% of requests have a negative duration!
You could speculate about parallel or out-of-order execution, but the paradox is resolved when you look closely at the value of the duration. It's always around -7,200,000ms. No one would assume that execution ended two hours before it started. So, what does -2 hours mean? Well, we live in the (UTC+02:00) Jerusalem time zone. We think the UTC offset somehow crawls into the result. To prove our hypothesis we would like to change the time zone on the sql servers, but the customer won't agree to such an experiment.
This effect probably means that there is some hidden bug in SQL Server 2008 R2 that we cannot reliably reproduce, but we can see that datediff(ms, start_timestamp, end_timestamp) may return a negative value when it's known that start_timestamp is acquired before end_timestamp.
Update: what a shame. During tuning of the original logging procedures we changed the type from datetime to datetime2, and calls from GETUTCDATE() to current_timestamp, except in one place (a default value in the table definition) where GETUTCDATE() remained.
So, the negative durations meant an operation timeout (in our case, a duration greater than 30 secs).
After the C++11 revision was approved, a new cycle of C++ design has begun:
N3370: The C++ standards committee is soliciting proposals for additional library components. Such proposals can range from small (addition of a single signature to an existing library) to large (something bigger than any current standard library component).
At this stage it's interesting to read the papers, as the authors try to express ideas rather than to formulate the sentences that will go into the spec, as was the case lately.
Here are just several papers that we've found interesting:
N3322 | 12-0012 | A Preliminary Proposal for a Static if | Walter E. Brown |
N3329 | 12-0019 | Proposal: static if declaration | H. Sutter, W. Bright, A. Alexandrescu |
These proposals argue for a compile-time "if statement". The feature can replace the #if preprocessor directive, SFINAE, or in some cases template specializations.
A static if declaration can appear wherever a declaration or a statement is legal.
The authors also propose to add a static if clause to class and function declarations to conditionally exclude them from the scope.
Examples:
// Compile time factorial.
template <unsigned n>
struct factorial
{
static if (n <= 1)
{
enum : unsigned { value = 1 };
}
else
{
enum : unsigned { value = factorial<n - 1>::value * n };
}
};
// Declare class provided a condition is true.
class Internals if (sizeof(void*) == sizeof(int));
The paper presents a strong rationale for why this addition helps to build better programs; however, questions arise about the relation between static if and concepts, and between the static if clause and error diagnostics.
N3327 | 12-0017 | A Standard Programmatic Interface for Asynchronous Operations | N. Gustafsson, A. Laksberg |
N3328 | 12-0018 | Resumable Functions | Niklas Gustafsson |
That's our favorite.
The authors propose an API and language extensions to make asynchronous programs simpler. In fact, an asynchronous function will look very much like a regular one, but with small additions. It's similar to yield return in C# (a construct that has been available in C# for many years and is well vetted), and to the async/await feature in .NET 4.5. The compiler will rewrite such a function into a state machine, so the function can suspend its execution, wait for data, and resume when the data is available.
Example:
// Read data asynchronously from an input and
// write it into an output.
int cnt = 0;
do
{
cnt = await streamR.read(512, buf);
if (cnt == 0)
{
break;
}
cnt = await streamW.write(cnt, buf);
}
while(cnt > 0);
It's interesting to see how the authors will address yield return: either with an additional keyword, or in terms of resumable functions.
N3340 | 12-0030 | Rich Pointers | D. M. Berris, M. Austern, L. Crowl |
Here the authors try to justify rich type-info but mask it under the name "rich pointers". To make things even more obscure they argue about dynamic code generation.
If you want rich type-info then you should talk about that, and not about a thousand other things.
We would rather appeal for a standard API to access the post-compile object model, which could be used to produce different type-infos or other source derivatives.
This paper is our outsider. :-)
N3341 | 12-0031 | Transactional Language Constructs for C++ | M. Wong, H. Boehm, J. Gottschlich, T. Shpeisman, et al. |
Here people try to generalize (steer you away from) locking and replace it with another word: "transaction".
It seems it's not a viable proposition. It's better to teach a functional style of programming with its immutable objects.
N3347 | 12-0037 | Modules in C++ (Revision 6) | Daveed Vandevoorde |
The author argues against C-style source composition with the #include directive, and proposes an alternative called "modules".
We think that many C++ developers would agree that the C preprocessor is a legacy that ideally would never have existed, but for the same reason (legacy and compatibility) it should stay.
In our opinion the current proposal is just immature; at the least, it's not intuitive. In other words, there should be something to replace the C preprocessor (and #include as part of it), but we don't like this paper from an aesthetic perspective.
N3365 | 12-0055 | Filesystem Library Proposal (Revision 2) |
This proposal says not a word about the asynchronous nature of file access, while it should be designed around it.
For a long time we were developing web applications with ASP.NET and JSF. At present we prefer rich clients and a server with page templates and RESTful web services.
This transition brings technical questions. Consider this one.
Browsers allow storing session state entirely on the client, so should we maintain a session on the server?
Since the server is just a set of web services, we may supply all required arguments on each call.
At first glance we can assume that no session is required on the server. However, looking further we see that we still have to deal with data validation (security) on the server.
Think about a classic ASP.NET application where a user can select a value from a dropdown. Either ASP.NET itself or your program (against a list from the session) verifies that the value received is valid for the user. That list of values, and possibly other parameters, constitutes a user profile, which we stored in the session. The user profile played an important role (often indirectly) in the validation of input data.
When the server is just a set of web services, we have to validate all parameters manually. There are two sources we can rely on: (a) a session, (b) a user principal.
Case (a) is very similar to a classic ASP.NET application, except that with EnableEventValidation="true" the runtime did it for us most of the time. Case (b) requires reconstruction of the user profile for a user principal, and then we proceed with validation of the parameters.
We may cache the user profile in the session, in which case we reduce (b) to (a); on the other hand we may cache the user profile in the Cache, which is also similar to (a) but might be lighter than (at least not heavier than) the solution with the session.
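For illustration only, here is a minimal sketch of caching the user profile per principal in HttpRuntime.Cache, which reduces case (b) to case (a); UserProfile and LoadUserProfile() are hypothetical placeholders for whatever the application reconstructs from the database.
using System;
using System.Web;
using System.Web.Caching;

public class UserProfile { /* allowed values, permissions, etc. */ }

public static class Profiles
{
  public static UserProfile GetProfile(string userName)
  {
    var cache = HttpRuntime.Cache;
    var key = "profile:" + userName;
    var profile = cache[key] as UserProfile;

    if (profile == null)
    {
      // Reconstruct the profile from the database (hypothetical call).
      profile = LoadUserProfile(userName);

      // Keep it for a while; web service parameter validation can then use
      // the cached profile instead of a server session.
      cache.Insert(
        key,
        profile,
        null,
        Cache.NoAbsoluteExpiration,
        TimeSpan.FromMinutes(20));
    }

    return profile;
  }

  private static UserProfile LoadUserProfile(string userName)
  {
    // A placeholder for the real reconstruction logic.
    return new UserProfile();
  }
}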
What we see is that a client session does not free us from a server session (or its alternative).
We were dealing with a data source of (int? id, string value) pairs in LINQ. The data originated from a database where id is a unique field. In the program this data source had to be seen as a dictionary, so we wrote code like this:
var dictionary = CreateIDValuePairs().ToDictionary(item => item.ID, item => item.Value);
That was too simple-minded. This code compiles but crashes at runtime when there is an id == null.
Well, the documentation warns about this behaviour, but that does not make the pain any easier.
In our opinion this restriction is not justified and just complicates the use of Dictionary.
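For completeness, here is a small workaround sketch in the spirit of the code above (CreateIDValuePairs() is the same source): either filter the null id out, or map it to a sentinel key before calling ToDictionary().
// Option 1: drop the row with the null id (if it's not needed in the lookup).
var dictionary = CreateIDValuePairs().
  Where(item => item.ID != null).
  ToDictionary(item => item.ID.Value, item => item.Value);

// Option 2: keep it under a sentinel key (assuming int.MinValue is never a real id).
var dictionaryWithNull = CreateIDValuePairs().
  ToDictionary(item => item.ID ?? int.MinValue, item => item.Value);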
A database we support for a client contains multi-billion-row tables. Many users query the data from that database, and it's constantly populated with new data.
Every day we load several million rows of new data. Such loads can lock tables for a considerable time, so our loading procedures collect new data into intermediate tables and insert it into the final destination in chunks, usually after work hours.
SQL Server (since 2005) offers the READ_COMMITTED_SNAPSHOT database option. This feature trades locks for an increased tempdb size (to store row versions) and possible performance degradation during a transaction.
When we switched the database to that option we did not notice any considerable performance change. Encouraged, we decided to increase the size of the chunks of data we insert at once.
Earlier we had found that when we insert no more than 1000 rows at once, users don't notice any impact, but for bigger chunk sizes users start to complain about performance degradation. This probably happened due to lock escalations.
Now, with chunks of 10000 or even 100000 rows, we have found that no queries became slower, but the load process became several times faster.
We were ready to pay with increased tempdb and transaction log sizes for increased performance, but in our case we didn't approach the limits assigned by the DBA. Another gain is that we can easily load data at any time. This makes the data we store more up to date.
Yesterday, by accident, we came across an article about some design principles of the V8 JavaScript Engine. It made clearer what techniques are used in today's script implementations.
In particular, the V8 engine optimizes property access using "dynamically created hidden classes". These are structures that store an object's layout; they are derived when a new property is created (or deleted) on the object. When code accesses a property, and a cached dynamic hidden class of the object is available at that code point, the access time is comparable to that of native fields.
In our opinion this tactic might lead to a proliferation of such dynamic hidden classes, which requires considerable housekeeping and also slows property write access, especially when a property is written for the first time.
We would like to suggest a slightly different strategy, which exploits cache hits and does not require dynamic hidden classes.
Consider an implementation data type with the following characteristics:
Pseudo code for the property access looks like this:
pair = object.values[cachedIndex];
if (pair.ID == propertyID)
{
value = pair.Value;
}
else
{
// Cache miss.
cachedIndex = object.index(propertyID);
value = object.values[cachedIndex].Value;
}
This approach brings us back to a dictionary-like implementation, but with the important optimization of array-speed access when the property index is cached, and with no dynamic hidden classes.
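For concreteness, here is a rough C# analogy of the scheme described above (a real engine would, of course, work on unboxed data and generate the fast path inline): properties are kept sorted by id, and each access site carries its own cached index.
public struct Property
{
  public int ID;
  public object Value;
}

public sealed class PropertyBag
{
  // Kept sorted by Property.ID.
  private Property[] values = new Property[0];

  // Fast path: try the index cached at the call site first.
  public object Get(int propertyID, ref int cachedIndex)
  {
    if ((cachedIndex >= 0) && (cachedIndex < values.Length) &&
      (values[cachedIndex].ID == propertyID))
    {
      return values[cachedIndex].Value;
    }

    // Cache miss: binary search over the sorted property array.
    cachedIndex = Index(propertyID);

    return cachedIndex < 0 ? null : values[cachedIndex].Value;
  }

  private int Index(int propertyID)
  {
    int low = 0, high = values.Length - 1;

    while(low <= high)
    {
      int mid = (low + high) / 2;

      if (values[mid].ID == propertyID)
      {
        return mid;
      }

      if (values[mid].ID < propertyID)
      {
        low = mid + 1;
      }
      else
      {
        high = mid - 1;
      }
    }

    return -1;
  }
}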
@michaelhkay Saxon 9.4 is out. But why doesn't the author state that the HE version is still xslt/xpath 2.0, as neither xslt maps nor function items are supported?
Recently we found and reported a bug in SQL Server 2008 (see SQL Server 2008 with(recompile), and also Microsoft Connect).
The persons responsible for the bug evaluation have closed it as "By Design". This strange resolution, in our opinion, speaks only about those persons.
Well, we shall try once more (see Microsoft Connect). We have posted another trivial demonstration of the bug, where we show that option(recompile) is not used, which leads to a table scan (nothing worse can happen for a huge table).
A bit of history: the first release of this solution was about 9.5 years ago...
Today we ran into a strange situation. One of our clients asked us about automatic conversion of mainframe data (defined as COBOL copybooks) into XML or Java/.NET objects. On our suggestion to use eXperanto, which is well known to him, he stated that he wouldn't like to use a tool from a company that no longer exists...
The situation, in our opinion, becomes even stranger when you consider the following:
It so happened that we had never worked with jQuery, though we were aware of it.
In the early 2000s we developed a web application that contained rich javascript APIs, including UI components. Later we actively practiced ASP.NET, and later still JSF.
At present, looking at jQuery more closely, we regret that we failed to start using it earlier.
The separation of business logic and presentation is remarkable when one uses JSON web services. In fact, the server part can be seen as a set of web services representing the business logic, plus a set of resources: html, styles, scripts and others. Neither ASP.NET nor JSF approaches such a consistent separation.
The only trouble, in our opinion, is that jQuery has no standard data binding: a way to bind JSON data to (and from) html controls. The technique that will probably be standardized is called jQuery Templates or JsViews.
Unfortunately, after reading about this binding API, and being in love with Xslt and XQuery, we just want to cry. We don't know what the best solution for the task would be, but what we see looks uncomfortable to us.
Incidentally, we have found a new implementation of yield return in java that is in the development stage. Sources can be found at https://github.com/peichhorn/lombok-pg/zipball/master. Just to be sure we have copied those sources to another place: peichhorn-lombok-pg-0.10.0-39-g384fb7b.zip (you may search for "yield" in the archive).
It's broken according to the source tracker, but the funny thing is that the sources, however different, still resemble our yield return implementation (Yield.jar, Yield.3.7.jar - Indigo, Yield.zip - sources) very much: variable names, error messages, algorithmic structure.
Those programmers have probably forgotten good manners: to reference a base work, at least.
Well, we generously forgive them this blunder.
P.S. Our implementation, in contrast, works without bugs.
P.P.S. misunderstanding is resolved. See comments.
A couple of weeks ago, we suggested introducing an enumerator function into XPath (see [F+O30] A enumerator function):
I would like the WG to consider an addition of a function that turns a sequence into an enumeration of values.
Consider a function like this:
fn:enumerator($items as item()*) as function() as item()?;
alternatively, signature could be:
fn:enumerator($items as function() as item()*) as function() as item()?;
This function receives a sequence, and returns a function item, which upon the Nth call shall return the Nth element of the original sequence. This way, a sequence of items is turned into a function providing an enumeration of items of the sequence.
As an example consider two functions:
a) t:rand($seed as xs:double) as xs:double* - a function producing a random number sequence;
b) t:work($input as element()) as element() - a function that generates output from its input, and that needs random numbers in the course of the execution.
t:work() may contain a code like this:
let $rand := fn:enumerator(t:rand($seed)),
and later it can call $rand() to get random numbers.
Enumerators will help to compose algorithms where one algorithm communicates with other independent algorithms, thus making code simpler. The most obvious class of enumerators are generators: ordered numbers, unique identifiers, random numbers.
Technically, the function returned from fn:enumerator() is nondeterministic, but its "side effect" is similar to the "side effect" of the function generate-id() applied to a newly created node (see bug #13747, and bug #13494).
The idea is inspired by a generator function, which returns a new value upon each call. Such a function can be seen as a stateful object. But our goal is to look at it in a more functional way. So, we look at the algorithm as a function that produces a sequence of output, which is purely functional, and at an enumerator that allows to iterate over the algorithm's output.
This way, the function that implements an algorithm and the function that uses it can be seen as two threads of functional programs that use messaging to communicate with each other.
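Just to illustrate the idea outside of XPath, the same shape can be sketched in Java, where an enumerator is simply a function that returns the next item of an underlying (possibly lazy) sequence on each call (the names below are ours and serve illustration only):
import java.util.Iterator;
import java.util.function.Supplier;
import java.util.stream.Stream;

public class Enumerators
{
  // Turns a sequence into an enumerator: a function that upon the Nth call
  // returns the Nth item of the sequence, or null when it is exhausted.
  public static <T> Supplier<T> enumerator(Iterable<T> items)
  {
    final Iterator<T> iterator = items.iterator();

    return () -> iterator.hasNext() ? iterator.next() : null;
  }

  public static void main(String[] args)
  {
    // An analogue of t:work() consuming an analogue of t:rand() on demand.
    Supplier<Double> rand =
      enumerator(Stream.generate(Math::random).limit(100)::iterator);

    System.out.println(rand.get()); // first random number
    System.out.println(rand.get()); // second random number
  }
}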
Honestly, we doubt that the WG will accept it, but it's interesting to watch the discussion.
More than a month has passed since we reported a problem to the saxon forum (see Saxon optimizer bug and Saxon 9.2 generate-id() bug).
The essence of the problem is that we have constructed an argumentless function that returns a unique identifier each time the function is called. To achieve the effect we have created a temporary node and returned its generate-id() value.
Such a function is nondeterministic, as we cannot state that its result depends on its arguments only. This means that the engine's optimizer is not free to reorder calls to such a function. That's what happens in Saxon 9.2 and Saxon 9.3, where the engine hoists the function call out of a loop, thus producing invalid results.
Michael Kay, the author of the Saxon engine, argued that this is "a gray area of the xslt spec":
If the spec were stricter about defining exactly when you can rely on identity-dependent operations then I would be obliged to follow it, but I think it's probably deliberate that it currently allows implementations some latitude, effectively signalling to users that they should avoid depending on this aspect of the behaviour.
He advised to raise a bug in the w3c bugzilla to resolve the issue. In the end two related bugs have been raised:
Yesterday, the WG has resolved the issue:
The Working Group agreed that default behavior should continue to require these nodes to be constructed with unique IDs. We believe that this is the kind of thing implementations can do with annotations or declaration options, and it would be best to get implementation experience with this before standardizing.
This means that the technique we used to generate unique identifiers is correct and the behaviour is well defined.
The only problem is to wait until Saxon fixes its behaviour accordingly.
We're not big fans of Entity Framework, as we don't expose the database structure directly to the client program but rather through stored procedures and functions. So, EF for us is a tool to expose those stored procedures as .NET wrappers. This limited use of EF still greatly automates the data access code.
But what we have lately found is that EF has a problem with char parameters. Namely, if you import a procedure, say MyProc, that accepts char(1), and then call it through the generated wrapper, then you will see in the sql profiler that the char(1) parameter is passed with many trailing spaces, as if it were char(8000). Needless to say, this is highly inefficient.
We can see that the problem happens in the VS 2010 designer rather than in the EF runtime, as the SP's parameters are not attributed with a length; see the model xml (*.edmx):
<Function Name="MyProc" Schema="Data">
...
<Parameter Name="recipientType" Type="char" Mode="In"
/>
...
</Function>
while if we set:
<Parameter Name="recipientType" Type="char" MaxLength="1"
Mode="In" />
the runtime starts working as expected. So the workaround is to fix the model file manually.
See also: Stored Proc and Char parm
1. query.dll vs tquery.dll
We have installed Windows Search 4 on a Windows 2003 server. The goal was to index huge compressed xml files (see Windows Search Notifications). But for some reason it did not want to index the content.
No "select System.ItemUrl from SystemIndex where contains('...')" has ever returned a row.
We thought that the problem was in our protocol handler, and tried to localize it, but finally discovered that Windows Search was not able to find anything within text files.
A registry comparison has shown that the *.txt extension was indexed by the IFilter defined in query.dll, while on the other computers, where everything worked, the implementation was in tquery.dll.
Both libraries were present on the Windows 2003 server, so we have corrected the registry and everything has started to work.
As far as we understand, query.dll is part of the legacy Indexing Service, and tquery.dll is the up-to-date implementation.
2. Search index size
We have to index a considerable amount of data. But before we can do it we have to estimate the size of the index.
In the past we seem to have seen somewhere a statement that a search index needs storage of about 10% of the original data. Unfortunately, we cannot find this estimate at present, nor can we find any other estimate. This complicates our planning.
To get an empirical estimate we've indexed several thousand *.xml-gz files, which are gz'ed big xmls. The total size of these files is about 4.5GB. The total uncompressed size of the xmls is ~50GB. The xmls contained about 10 million pages of data.
According to the 10% criterion we should have arrived at a ~5GB search index.
But what we have discovered is that the index has grown to more than 50GB. That's very disappointing. We cannot afford such an expense, as we've conducted the test on a tiny part of the data, which grows over time.
So, the solution is to find out what's wrong and how it can be cured, or to fulltext index only the most recent subset of the data.
P.S. We have tried to mark the folder with the search index as compressed, but it did not work.
P.P.S. We have found the reference to the Windows Search 4 index size estimation. It is in Windows Search Frequently Asked Questions; see the answer to the "What is average size of a user's index?" question.
We needed to track a stream position during the creation of an xml file. This is to allow random access to a huge xml file (the task is related to WindowsSearch).
This is a simplified form of the xml:
<data>
<item>...</item>
...
<item>...</item>
</data>
The goal was to have the stream position of each item element. With this in mind, we've decided to call the Flush() method of the xml writer before measuring the stream offset. That's a code sample:
var stream = new MemoryStream();
var writer = XmlWriter.Create(stream);
writer.WriteStartDocument();
writer.WriteStartElement("data");
for(var i = 0; i < 10; ++i)
{
writer.Flush();
Console.WriteLine("Flush offset: {0}, char: {1}",
stream.Position,
(char)stream.GetBuffer()[stream.Position - 1]);
writer.WriteStartElement("item");
writer.WriteValue("item " + i);
writer.WriteEndElement();
}
writer.WriteEndElement();
writer.WriteEndDocument();
That's the output:
Flush offset: 46, char: a
Flush offset: 66, char: >
Flush offset: 85, char: >
Flush offset: 104, char: >
Flush offset: 123, char: >
Flush offset: 142, char: >
Flush offset: 161, char: >
Flush offset: 180, char: >
Flush offset: 199, char: >
Flush offset: 218, char: >
Funny, isn't it?
After feeding the start tag <data>, and flushing the xml writer, we observe that only "<data" has been written down to the stream. Well, Flush() has never promised anything particular about the content of the stream, so we cannot claim any violation; however, we expected to see the whole start tag.
Inspection of the implementation of the xml writer reveals laziness in writing data down to the stream. In particular, a start tag is closed only when one starts the content. This is probably to support empty tags: <data/>.
To do the trick we had to emit empty content; moreover, we had to call a particular method of the xml writer with particular parameters. So the code after the fix looks like this:
var stream = new MemoryStream();
var writer = XmlWriter.Create(stream);
writer.WriteStartDocument();
writer.WriteStartElement("data");
char[] empty = { ' ' };
for(var i = 0; i < 10; ++i)
{
writer.WriteChars(empty, 0, 0);
writer.Flush();
Console.WriteLine("Flush offset: {0}, char: {1}",
stream.Position,
(char)stream.GetBuffer()[stream.Position - 1]);
writer.WriteStartElement("item");
writer.WriteValue("item " + i);
writer.WriteEndElement();
}
writer.WriteEndElement();
writer.WriteEndDocument();
And output is:
Flush offset: 47, char: >
Flush offset: 66, char: >
Flush offset: 85, char: >
Flush offset: 104, char: >
Flush offset: 123, char: >
Flush offset: 142, char: >
Flush offset: 161, char: >
Flush offset: 180, char: >
Flush offset: 199, char: >
Flush offset: 218, char: >
While this code works, we feel uneasy with it.
What's a better way to solve the task?
Update: further analysis shows that this is the only possible behaviour, as after the call to write a start element you can write either attributes, content or the end of the element, so the writer may write either a space, '>' or '/>'. The only question is why it takes WriteChars(empty, 0, 0) into account while it ignores WriteValue("").
As you probably know, we have implemented our custom Protocol Handler for Windows Search.
It's called .xml-gz, and its goal is to index compressed xml files and to produce search results with subtree precision. So, for the xml:
<data>
<item>...</item>
<item>...</item>
...
</data>
the search finds results within item elements and returns the xml's url and the stream offset of the item. Using the ZLIB API we can compress data with stream bookmarks, so fast random access to the data is possible.
The only problem we have is about notification of changes (create, delete, update) of such files.
The spec describes several techniques (nothing has worked for us):
1. Call catalogManager.ReindexMatchingURLs() - it just returns without any impact.
2. Call changeSink.OnItemsChanged() - returns an error.
3. Implement an .xml-gz IFilter and call IGatherNotifyInline (see "have your .zip urls indexed when they are created or modified") - that's a mystery, as:
4. Implement a root url in the form .xml-gz:/// and perform a Windows Search:
SELECT
System.ItemUrl, System.DateModified
FROM
SystemIndex WHERE System.FileExtension='.xml-gz'
to find all .xml-gz sources. This is not reliable, as your protocol handler can be (and is) called before the file is indexed.
So, the only reliable way to index your data is to (re-)add an indexing rule for the protocol handler, which in most cases reindexes everything.
The only bearable solution we found is to define an indexing rule in the form:
.xml-gz://file:d:/data/...
and to use the IShellFolder(2) interfaces to discover sub items and their modification times. This technique allows a minimal data scan when you (re-)add an indexing rule.
Being inexperienced with Windows Search, we tried to build queries to find data in the huge storage. We needed to find a document that matches some name pattern and contains some text.
Our naive query was like this:
select top 1000
System.ItemUrl
from
SystemIndex
where
scope = '...' and
System.ItemName like '...%' and
contains('...')
In most cases this query returns nothing and runs very long. It's interesting to note that it may start returning data if the "top" clause is missing or uses a bigger number, but in these cases the query is even slower.
Next try was like this:
select top 1000
System.ItemUrl
from
SystemIndex
where
scope = '...' and
System.ItemName >= '...' and System.ItemName < '...' and
contains('...')
This query is also slow, but at least it returns some results.
At some point we started to question the utility of Windows Search if it's so slow, but then we found that there is a property System.ItemNameDisplay, which in our case coincides with the value of the property System.ItemName, so we tried the query:
select top 1000
System.ItemUrl
from
SystemIndex
where
scope = '...' and
System.ItemNameDisplay like '...%' and
contains('...')
This query worked fast, and produced good results. This hints that the search engine has an index on System.ItemNameDisplay, in contrast to the System.ItemName property.
We've looked at property definitions:
The display name in "most complete" form. It is the unique representation of the item name most appropriate for end users.
propertyDescription
name = System.ItemNameDisplay
shellPKey = PKEY_ItemNameDisplay
formatID = B725F130-47EF-101A-A5F1-02608C9EEBAC
propID = 10
searchInfo
inInvertedIndex = true
isColumn = true
isColumnSparse = false
columnIndexType = OnDisk
maxSize = 128
The base name of the System.ItemNameDisplay property.
propertyDescription
name = System.ItemName
shellPKey = PKEY_ItemName
formatID = 6B8DA074-3B5C-43BC-886F-0A2CDCE00B6F
propID = 100
searchInfo
inInvertedIndex = false
isColumn = true
isColumnSparse = false
columnIndexType = OnDisk
maxSize = 128
Indeed, one property is indexed, while the other is not.
As with other databases, a query is fast when the engine uses indices rather than performs a data scan. This is also correct for Windows Search.
The differences in results produced by variations of the query also show that Windows Search is nevertheless very different from a relational database.
We have developed our custom Windows Search Protocol Handler. The role of this component is to expose items of complex content (or unusual storage) to Windows Search.
You can think of some virtual folder, so a Protocol Handler allows enumerating its files, file properties, and contents.
The goal of our Protocol Handler is to represent some data structure as a set of xml files. We expected that if we could find data within a folder containing these files, then a search within the Protocol Handler's scope would bring the same (or almost the same) results.
Reality is different.
For some reason the .xml IFilter (a component to extract text data for the index) works differently with the file system and with our storage. We cannot state that it does not work, but for some reason many words that Windows Search finds within a file are never found within the Protocol Handler scope.
We have observed that if, for the purpose of indexing, we represent content xml items as .txt files, then the search works as expected. So, our workaround was to present only the xml's text data for indexing, and to use the .txt IFilter (this is in fact roughly what the .xml IFilter does by itself).
Is there a conclusion?
Well, Windows Search is a black box probably containing bugs. Its behaviour is not always obvious.
Let's put it blatantly: Windows Search 4 has design and implementation problems.
You discover this immediately when you start implementing indexing of a custom file format.
If you want to index a simple file format then you need to implement the IFilter interface. But if it has happened that you want to index compound data then you should invent your own protocol.
Once you figure out how to implement your protocol to index that compound data, you will most probably get stuck on the problem of how to notify the indexer about changes.
The problem is that Windows Search 4 has an API to reindex urls, which simply does not work, and an API to notify the indexer about changes, which throws an error (returns an error HRESULT) for custom protocols. At least, we were not able to make it work.
Do you know that the best JSF/Facelets visual editor, in our opinion, is ... Microsoft Visual Studio 2008? Another rather good JSF editor is present in IBM RAD 7.xx. The most popular open source Java IDE, Eclipse, contains an ugly implementation of such a useful thing.
For a couple of days already we have been trying to create a UserControl containing a TabContainer. To achieve the goal we have created a page with a ToolkitScriptManager and the user control itself.
Page:
<form runat="server">
<ajax:ToolkitScriptManager ID="ToolkitScriptManager1" runat="server"/>
<uc1:WebUserControl1 ID="WebUserControl11" runat="server" />
</form>
User control:
<%@ Control Language="C#" %>
<%@ Register
Assembly="AjaxControlToolkit"
Namespace="AjaxControlToolkit"
TagPrefix="ajax" %>
<ajax:TabContainer ID="Tab" runat="server" Width="100%">
<ajax:TabPanel runat="server" HeaderText="Tab1" ID="Tab1">
<ContentTemplate>Panel 1</ContentTemplate>
</ajax:TabPanel>
</ajax:TabContainer>
What could be simpler?
But no, there is a problem. At run-time the control works perfectly, but in the designer it shows an error instead of a normal design view:
Error Rendering Control - TabContainer1
An unhandled exception has occurred.
Could not find any resources appropriate for the specified culture or the neutral culture. Make sure "AjaxControlToolkit.Properties.Resources.NET4.resources" was correctly embedded or linked into assembly "AjaxControlToolkit" at compile time, or that all the satellite assemblies required are loadable and fully signed.
That's a stupid error, which says nothing about the real reason of the problem. We had to attach a debugger to Visual Studio just to realize what the problem is.
So, the error occurs at the following code of AjaxControlToolkit.ScriptControlBase:
private void EnsureScriptManager()
{
if (this._scriptManager == null)
{
this._scriptManager = ScriptManager.GetCurrent(this.Page);
if (this._scriptManager == null)
{
throw new HttpException(Resources.E_NoScriptManager);
}
}
}
Originally, the problem is due to the fact that a ScriptManager is not found, and the code wants to report an HttpException, but the fun is that we receive a different exception, which is related to a missing resource text for the message Resources.E_NoScriptManager.
It turns out that the E_NoScriptManager text is found neither in the primary nor in the resource assemblies.
As for the original problem, it's hard to say why the ScriptManager is not available at design time. We, however, observed that a ScriptManager registers itself for the ScriptManager.GetCurrent() method at run-time only:
protected internal override void OnInit(EventArgs e)
{
...
if (!base.DesignMode)
{
...
iPage.Items[typeof(ScriptManager)] = this;
...
}
}
So, it's not clear what they (the toolkit's developers) expected to get at design time.
These observations leave uneasiness regarding the quality of the library.
Earlier, we have described an approach to call Windows Search from SQL Server 2008. But it has turned out that our problem is more complicated...
It all started from the initial task:
Later we shall describe how we have solved this task; for now it's enough to say that we have implemented a Protocol Handler for Windows Search named '.xml-gz:'. This way an original file stored, say, at 'file:///c:/store/data.xml-gz' is seen as a container by Windows Search.
This way the search query should look like this:
select System.ItemUrl from SystemIndex where scope='.xml-gz:' and contains(...)
Everything worked during tests: we have succeeded to issue Windows Search selects from SQL Server and to join the results with other sql queries.
But later on, when we considered the runtime environment, we have seen that our design won't work. The reason is simple: Windows Search will run on a computer different from those where the SQL Servers run. So, the search query should look like this:
select System.ItemUrl from Computer.SystemIndex where scope='.xml-gz:' and contains(...)
Here we have realized the limitation of the current (Windows Search 4) implementation: remote search works for shared folders only, thus a query may only look like:
select System.ItemUrl from Computer.SystemIndex where scope='file://Computer/share/' and contains(...)
Notice that the search restricts the scope to the file protocol, so a remote search will never return our results. The only way to search in our scope is to perform a local search.
We have considered the following approaches to resolve the issue.
The simplest one would be to access the Search provider on the remote computer using a connection string "Provider=Search.CollatorDSO;Data Source=Computer" and use local queries. This does not work, as the provider simply disregards the Data Source parameter.
The other try was to use the MS Remote OLEDB provider. We tried hard to configure it, but it always returns an obscure error; more than that, it's deprecated (Microsoft plans to remove it in the future).
So, we decided to forward the request manually.
Here we considered WCF Data Services and a custom web service.
The advantage of WCF Data Services is that it's a technology with ambitions of a standard, but it's a rather complex task to create an implementation that will talk the Windows Search SQL dialect, so we have decided to build a primitive http handler that accepts the query as a parameter. That's trivial and also has the virtue of simple implementation and high streamability.
So, that's our http handler (WindowsSearch.ashx):
<%@ WebHandler Language="C#" Class="WindowsSearch" %>
using System;
using System.Web;
using System.Xml;
using System.Text;
using System.Data.OleDb;
/// <summary>
/// A Windows Search request handler.
/// </summary>
public class WindowsSearch: IHttpHandler
{
/// <summary>
/// Handles the request.
/// </summary>
/// <param name="context">A request context.</param>
public void ProcessRequest(HttpContext context)
{
var request = context.Request;
var query = request.Params["query"];
var response = context.Response;
response.ContentType = "text/xml";
response.ContentEncoding = Encoding.UTF8;
var writer = XmlWriter.Create(response.Output);
writer.WriteStartDocument();
writer.WriteStartElement("resultset");
if (!string.IsNullOrEmpty(query))
{
using(var connection = new OleDbConnection(provider))
using(var command = new OleDbCommand(query, connection))
{
connection.Open();
using(var reader = command.ExecuteReader())
{
string[] names = null;
while(reader.Read())
{
if (names == null)
{
names = new string[reader.FieldCount];
for (int i = 0; i < names.Length; ++i)
{
names[i] = XmlConvert.EncodeLocalName(reader.GetName(i));
}
}
writer.WriteStartElement("row");
for(int i = 0; i < names.Length; ++i)
{
writer.WriteElementString(
names[i],
Convert.ToString(reader[i]));
}
writer.WriteEndElement();
}
}
}
}
writer.WriteEndElement();
writer.WriteEndDocument();
writer.Flush();
}
/// <summary>
/// Indicates that a handler is reusable.
/// </summary>
public bool IsReusable { get { return true; } }
/// <summary>
/// A connection string.
/// </summary>
private const string provider =
"Provider=Search.CollatorDSO;" +
"Extended Properties='Application=Windows';" +
"OLE DB Services=-4";
}
And a SQL CLR function looks like this:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Net;
using System.IO;
using System.Xml;
/// <summary>
/// A user defined function.
/// </summary>
public class UserDefinedFunctions
{
/// <summary>
/// A Windows Search returning result as xml strings.
/// </summary>
/// <param name="url">A search url.</param>
/// <param name="userName">A user name for a web request.</param>
/// <param name="password">A password for a web request.</param>
/// <param name="query">A Windows Search SQL.</param>
/// <returns>A result rows.</returns>
[SqlFunction(
IsDeterministic = false,
Name = "WindowsSearch",
FillRowMethodName = "FillWindowsSearch",
TableDefinition = "value nvarchar(max)")]
public static IEnumerable Search(
string url,
string userName,
string password,
string query)
{
return SearchEnumerator(url, userName, password, query);
}
/// <summary>
/// A filler of WindowsSearch function.
/// </summary>
/// <param name="value">A value returned from the enumerator.</param>
/// <param name="row">An output value.</param>
public static void FillWindowsSearch(object value, out string row)
{
row = (string)value;
}
/// <summary>
/// Gets a search row enumerator.
/// </summary>
/// <param name="url">A search url.</param>
/// <param name="userName">A user name for a web request.</param>
/// <param name="password">A password for a web request.</param>
/// <param name="query">A Windows Search SQL.</param>
/// <returns>A result rows.</returns>
private static IEnumerable<string> SearchEnumerator(
string url,
string userName,
string password,
string query)
{
if (string.IsNullOrEmpty(url))
{
throw new ArgumentException("url");
}
if (string.IsNullOrEmpty(query))
{
throw new ArgumentException("query");
}
var requestUrl = url + "?query=" + Uri.EscapeDataString(query);
var request = WebRequest.Create(requestUrl);
request.Credentials = string.IsNullOrEmpty(userName) ?
CredentialCache.DefaultCredentials :
new NetworkCredential(userName, password);
using(var response = request.GetResponse())
using(var stream = response.GetResponseStream())
using(var reader = XmlReader.Create(stream))
{
bool read = true;
while(!read || reader.Read())
{
if ((reader.Depth == 1) && reader.IsStartElement())
{
// Note that ReadInnerXml() advances the reader similar to Read().
yield return reader.ReadInnerXml();
read = false;
}
else
{
read = true;
}
}
}
}
}
And, finally, when you call this service from SQL Server you write a query like this:
with search as
(
select
cast(value as xml) value
from
dbo.WindowsSearch
(
N'http://machine/WindowsSearchService/WindowsSearch.ashx',
null,
null,
N'
select
"System.ItemUrl"
from
SystemIndex
where
scope=''.xml-gz:'' and contains(''...'')'
)
)
select
value.value('/System.ItemUrl[1]', 'nvarchar(max)')
from
search
The design is not trivial, but it works somehow.
After dealing with all these problems, some questions remain unanswered:
We were trying to query Windows Search from SQL Server 2008.
Documentation states that Windows Search is exposed as an OLE DB datasource. This meant that we could just query results like this:
SELECT
*
FROM
OPENROWSET(
'Search.CollatorDSO.1',
'Application=Windows',
'SELECT "System.ItemName", "System.FileName" FROM SystemIndex');
But no, such a select never works. Instead it returns obscure error messages:
OLE DB provider "Search.CollatorDSO.1" for linked server "(null)" returned message "Command was not prepared.".
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "Search.CollatorDSO.1" for linked server "(null)" reported an error. Command was not prepared.
Msg 7350, Level 16, State 2, Line 1
Cannot get the column information from OLE DB provider "Search.CollatorDSO.1" for linked server "(null)".
Microsoft is silent about the reasons of such behaviour. People came to the conclusion that the problem is in SQL Server, as one can query search results through an OleDbConnection without problems.
This is very unfortunate, as it bans many use cases.
As a workaround we have defined a CLR function wrapping the Windows Search call and returning rows as xml fragments. So now the query looks like this:
select
value.value('System.ItemName[1]', 'nvarchar(max)') ItemName,
value.value('System.FileName[1]', 'nvarchar(max)') FileName
from
dbo.WindowsSearch('SELECT "System.ItemName", "System.FileName" FROM SystemIndex')
Notice how we decompose the xml fragment back into fields with the value() function.
The C# function looks like this:
using System;
using System.Collections;
using System.IO;
using System.Xml;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.Data.OleDb;
using Microsoft.SqlServer.Server;
public class UserDefinedFunctions
{
[SqlFunction(
FillRowMethodName = "FillSearch",
TableDefinition="value xml")]
public static IEnumerator WindowsSearch(SqlString query)
{
const string provider =
"Provider=Search.CollatorDSO;" +
"Extended Properties='Application=Windows';" +
"OLE DB Services=-4";
var settings = new XmlWriterSettings
{
Indent = false,
CloseOutput = false,
ConformanceLevel = ConformanceLevel.Fragment,
OmitXmlDeclaration = true
};
string[] names = null;
using(var connection = new OleDbConnection(provider))
using(var command = new OleDbCommand(query.Value, connection))
{
connection.Open();
using(var reader = command.ExecuteReader())
{
while(reader.Read())
{
if (names == null)
{
names = new string[reader.FieldCount];
for (int i = 0; i < names.Length; ++i)
{
names[i] = XmlConvert.EncodeLocalName(reader.GetName(i));
}
}
var stream = new MemoryStream();
var writer = XmlWriter.Create(stream, settings);
for(int i = 0; i < names.Length; ++i)
{
writer.WriteElementString(names[i], Convert.ToString(reader[i]));
}
writer.Close();
yield return new SqlXml(stream);
}
}
}
}
public static void FillSearch(object value, out SqlXml row)
{
row = (SqlXml)value;
}
}
Notes: "OLE DB Services=-4" is used in the provider string to avoid transaction enlistment (required in SQL Server 2008).
In the last few days we were testing Java web-applications that expose web-services. During these tests we've found a few interesting features.
The first feature allows retrieving info about all endpoints supported by the web-application on a GET request. The feature works at least for Metro, which implements the JAX-WS API v2.x. In order to get such info, a client requests any endpoint's URL from the server. The result is an HTML page with a table. Each row of the table contains an endpoint's data for each supported web-service method. This feature may be used as a web-services discovery mechanism.
The second feature is bad rather than good. The JAX-WS API supposes that a developer annotates classes and methods that he/she wants to expose as web-services. Then, an implementation generates an additional bridge layer between the developer's code and the API that does all the routine work behind the scenes. Maybe that was a good idea, but Metro's implementation is imperfect. Metro dynamically generates such classes at run-time when a web-application starts. Moreover, Metro does such generation for all classes at once. So, in our case, when the web-application contains dozens or even hundreds of web-services, the application's startup takes a lot of time.
Probably, Metro developers didn't want to deal with the implementation of lazy algorithms, where a web-service is generated and cached on demand. We hope this issue will be solved in future releases.
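A lazy, build-on-first-use registry of the kind we would prefer can be sketched with plain Java (this is only an illustration of the approach; it has nothing to do with Metro's actual internals):
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Generates and caches an expensive per-service artifact on first use
// instead of generating everything at application startup.
public class LazyRegistry<K, V>
{
  private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<K, V>();
  private final Function<K, V> generator;

  public LazyRegistry(Function<K, V> generator)
  {
    this.generator = generator;
  }

  public V get(K key)
  {
    // The generator runs only for services that are actually called.
    return cache.computeIfAbsent(key, generator);
  }
}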
A while ago we have created a simple cache for a Java application. It was modelled like a Map<K, V>: it cached values for keys.
Use cases were:
Cache<String, Object> cache = new Cache<String, Object>();
...
instance = cache.get("key");
cache.put("key", instance);
But now we have thought of a different implementation, modelled like a WeakReference<V>, with map access as additional utility methods.
Consider the examples:
1. A free-standing CachedReference<V> instance.
CachedReference<Data> ref = new CachedReference<Data>(1000, true);
...
ref.set(data);
...
data = ref.get();
2. A map of CachedReference<V> instances.
ConcurrentHashMap<String, CachedReference<Data>> cache =
new
ConcurrentHashMap<String, CachedReference<Data>>();
CachedReference.put(cache, "key", data, 1000, true);
...
data = CachedReference.get(cache, "key");
The first case is faster than the original Cache<K, V>, as it does not use any hash map at all. The latter case provides the same performance as Cache<K, V> but gives better control over the storage. Incidentally, CachedReference<V> is more compact than Cache<K, V>.
The new implementation is CachedReference.java, the old one Cache.java.
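The real sources are linked above; just to convey the shape of the API used in the examples, here is a rough sketch of how such a CachedReference<V> could look (this is our guess for illustration, not the actual CachedReference.java):
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentMap;

// A value held through a soft reference and considered expired after a
// timeout since the last set (or, optionally, the last successful get).
public class CachedReference<V>
{
  private final long timeout; // milliseconds
  private final boolean renewOnAccess;
  private SoftReference<V> reference;
  private long expiration;

  public CachedReference(long timeout, boolean renewOnAccess)
  {
    this.timeout = timeout;
    this.renewOnAccess = renewOnAccess;
  }

  public synchronized void set(V value)
  {
    reference = value == null ? null : new SoftReference<V>(value);
    expiration = System.currentTimeMillis() + timeout;
  }

  public synchronized V get()
  {
    long now = System.currentTimeMillis();

    if ((reference == null) || (now > expiration))
    {
      reference = null;

      return null;
    }

    if (renewOnAccess)
    {
      expiration = now + timeout;
    }

    return reference.get();
  }

  // Map access as utility methods, as in the second example above.
  public static <K, V> void put(
    ConcurrentMap<K, CachedReference<V>> cache,
    K key,
    V value,
    long timeout,
    boolean renewOnAccess)
  {
    CachedReference<V> ref = new CachedReference<V>(timeout, renewOnAccess);

    ref.set(value);
    cache.put(key, ref);
  }

  public static <K, V> V get(
    ConcurrentMap<K, CachedReference<V>> cache,
    K key)
  {
    CachedReference<V> ref = cache.get(key);

    return ref == null ? null : ref.get();
  }
}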
michaelhkay: Just posted a new internal draft of XSLT 3.0. Moving forward on maps, nested sequences, and JSON support.
Hope they will finally appear there!
See also: Tuples and maps - next try, Tuples and maps - Status: CLOSED, WONTFIX, Tuples and maps in Saxon and other blog posts on our site about immutable maps.
A method pattern we have suggested to use along with the @Yield annotation brought funny questions like: "why should I mark my method with the @Yield annotation at all?"
Well, in many cases you may live with an ArrayList populated with data, and then perform the iteration. But in some cases this approach is not practical, either due to the amount of data or due to the time required to get the first item.
In the latter case you usually want to build an iterator that calculates items on demand. The @Yield annotation is designed as a marker of such methods. They are refactored into state machines at compilation time, where each addition to the result list is transformed into a new item yielded by the iterator.
So, if you have decided to use the @Yield annotation, then at some point you will ask yourself what happens with resources acquired during the iteration. Will resources be released if the iteration is interrupted in the middle due to an exception or a break statement?
To address the problem the yield iterator implements the Closeable interface.
This way when you call close() before the iteration has reached the end, the state machine works as if a break statement were injected after the yield point. Thus all finally blocks of the original method are executed and resources are released.
Consider an example of data iterator:
@Yield
public Iterable<Data> getData(final Connection connection)
throws Exception
{
ArrayList<Data> result = new ArrayList<Data>();
PreparedStatement statement =
connection.prepareStatement("select key, value from table");
try
{
ResultSet resultSet = statement.executeQuery();
try
{
while(resultSet.next())
{
Data data = new Data();
data.key = resultSet.getInt(1);
data.value = resultSet.getString(2);
result.add(data); // yield point
}
}
finally
{
resultSet.close();
}
}
finally
{
statement.close();
}
return result;
}
private static void close(Object value)
throws IOException
{
if (value instanceof Closeable)
{
Closeable closeable = (Closeable)value;
closeable.close();
}
}
public void daoAction(Connection connection)
throws Exception
{
Iterable<Data> items = getData(connection);
try
{
for(Data data: items)
{
// do something that potentially throws exception.
}
}
finally
{
close(items);
}
}
getData() iterates over sql data. During its lifecycle it creates and releases a PreparedStatement and a ResultSet.
daoAction() iterates over the results provided by getData() and performs some actions that potentially throw an exception. The goal of close() is to release the opened sql resources in case of such an exception.
Here you can inspect how the state machine is implemented for such a method:
@Yield()
public static Iterable<Data> getData(final Connection connection)
throws Exception
{
assert (java.util.ArrayList<Data>)(ArrayList<Data>)null == null;
class $state implements java.lang.Iterable<Data>, java.util.Iterator<Data>, java.io.Closeable
{
public java.util.Iterator<Data> iterator() {
if ($state$id == 0) {
$state$id = 1;
return this;
} else return new $state();
}
public boolean hasNext() {
if (!$state$nextDefined) {
$state$hasNext = $state$next();
$state$nextDefined = true;
}
return $state$hasNext;
}
public Data next() {
if (!hasNext()) throw new java.util.NoSuchElementException();
$state$nextDefined = false;
return $state$next;
}
public void remove() {
throw new java.lang.UnsupportedOperationException();
}
public void close() {
do switch ($state$id) {
case 3:
$state$id2 = 8;
$state$id = 5;
continue;
default:
$state$id = 8;
continue;
} while ($state$next());
}
private boolean $state$next() {
java.lang.Throwable $state$exception;
while (true) {
try {
switch ($state$id) {
case 0:
$state$id = 1;
case 1:
statement = connection.prepareStatement("select key, value from table");
$state$exception1 = null;
$state$id1 = 8;
$state$id = 2;
case 2:
resultSet = statement.executeQuery();
$state$exception2 = null;
$state$id2 = 6;
$state$id = 3;
case 3:
if (!resultSet.next()) {
$state$id = 4;
continue;
}
data = new Data();
data.key = resultSet.getInt(1);
data.value = resultSet.getString(2);
$state$next = data;
$state$id = 3;
return true;
case 4:
$state$id = 5;
case 5:
{
resultSet.close();
}
if ($state$exception2 != null) {
$state$exception = $state$exception2;
break;
}
if ($state$id2 > 7) {
$state$id1 = $state$id2;
$state$id = 7;
} else $state$id = $state$id2;
continue;
case 6:
$state$id = 7;
case 7:
{
statement.close();
}
if ($state$exception1 != null) {
$state$exception = $state$exception1;
break;
}
$state$id = $state$id1;
continue;
case 8:
default:
return false;
}
} catch (java.lang.Throwable e) {
$state$exception = e;
}
switch ($state$id) {
case 3:
case 4:
$state$exception2 = $state$exception;
$state$id = 5;
continue;
case 2:
case 5:
case 6:
$state$exception1 = $state$exception;
$state$id = 7;
continue;
default:
$state$id = 8;
java.util.ConcurrentModificationException ce = new java.util.ConcurrentModificationException();
ce.initCause($state$exception);
throw ce;
}
}
}
private PreparedStatement statement;
private ResultSet resultSet;
private Data data;
private int $state$id;
private boolean $state$hasNext;
private boolean $state$nextDefined;
private Data $state$next;
private java.lang.Throwable $state$exception1;
private int $state$id1;
private java.lang.Throwable $state$exception2;
private int $state$id2;
}
return new $state();
}
Now, you can estimate what it would cost to write such an algorithm as a sound state machine by hand, compared to the conventional implementation.
Yield annotation processor can be downloaded from Yield.zip or Yield.jar
See also Yield return feature in java.
We're happy to announce that we have implemented the @Yield annotation both in the javac and in the eclipse compilers.
This way you get built-in IDE support for the feature!
To download the yield annotation processor please use the following link: Yield.zip
It contains both the yield annotation processor and a test project.
If you do not want to compile the sources, you can download Yield.jar
We would like to reiterate how the @Yield annotation works:
1. A developer writes a method that returns an Iterator<T> or Iterable<T> instance and marks it with the @Yield annotation.
2. The method accumulates its results in a list (ArrayList<T> items = new ArrayList<T>();), adds items with items.add(...);, and uses return items; or return items.iterator(); to return the result.
3. YieldProcessor rewrites the method into a state machine at compilation time.
The following is an example of such a method:
@Yield
public static Iterable<Integer> generate(final int from, final int to)
{
ArrayList<Integer> items = new ArrayList<Integer>();
for(int i = from; i < to; ++i)
{
items.add(i);
}
return items;
}
The use is like this:
for(int value: generate(7, 20))
{
System.out.println("generator: " + value);
}
Notice that the method's implementation will still be correct in the absence of YieldProcessor.
Another important feature is that the state machine returned by the yield processor is closeable.
This means that if you're breaking the iteration before the end is reached, you can release resources acquired during the iteration.
Consider the example where break exits iteration:
@Yield
public static Iterable<String> resourceIteration()
{
ArrayList<String> items = new ArrayList<String>();
acquire();
try
{
for(int i = 0; i < 100; ++i)
{
items.add(String.valueOf(i));
}
}
finally
{
release();
}
return items;
}
and the use
int i = 0;
Iterable<String> iterator = resourceIteration();
try
{
for(String item: iterator)
{
System.out.println("item " + i + ":" + item);
if (i++ > 30)
{
break;
}
}
}
finally
{
close(iterator);
}
...
private static <T> void close(T value)
throws IOException
{
if (value instanceof Closeable)
{
Closeable closeable = (Closeable)value;
closeable.close();
}
}
Close will execute all required finally blocks. This way resources will be released.
To configure the yield processor a developer needs to reference Yield.jar in the build path, as it contains the @Yield annotation. For javac that's enough, as the compiler will find the annotation processor automatically.
Eclipse users need to open project properties and:
At the end we want to point out that the @Yield annotation is syntactic sugar, but it's important the way the foreach statement is important, as it helps to write concise and error free code.
See also
Yield feature in java implemented!
Yield feature in java
We could not stand the temptation to implement the @Yield annotation that we described earlier.
The idea is rather clear, but people were saying that it's not an easy task to update the sources.
They were right!
The implementation has its price, as we were forced to access JDK classes of the javac compiler. As a result, at present, we don't support other compilers such as the EclipseCompiler. We shall look later at what can be done in this area.
At present, the annotation processor works perfectly when you run javac either from the command line, from ant, or from another build tool.
Here is an example of how a method is refactored:
@Yield
public static Iterable<Long> fibonachi()
{
ArrayList<Long> items = new ArrayList<Long>();
long Ti = 0;
long Ti1 = 1;
while(true)
{
items.add(Ti);
long value = Ti + Ti1;
Ti = Ti1;
Ti1 = value;
}
}
And that's how we transform it:
@Yield()
public static Iterable<Long> fibonachi() {
assert (java.util.ArrayList<Long>)(ArrayList<Long>)null == null : null;
class $state$ implements java.lang.Iterable<Long>, java.util.Iterator<Long>, java.io.Closeable {
public java.util.Iterator<Long> iterator() {
if ($state$id == 0) {
$state$id = 1;
return this;
} else return new $state$();
}
public boolean hasNext() {
if (!$state$nextDefined) {
$state$hasNext = $state$next();
$state$nextDefined = true;
}
return $state$hasNext;
}
public Long next() {
if (!hasNext()) throw new java.util.NoSuchElementException();
$state$nextDefined = false;
return $state$next;
}
public void remove() {
throw new java.lang.UnsupportedOperationException();
}
public void close() {
$state$id = 5;
}
private boolean $state$next() {
while (true) switch ($state$id) {
case 0:
$state$id = 1;
case 1:
Ti = 0;
Ti1 = 1;
case 2:
if (!true) {
$state$id = 4;
break;
}
$state$next = Ti;
$state$id = 3;
return true;
case 3:
value = Ti + Ti1;
Ti = Ti1;
Ti1 = value;
$state$id = 2;
break;
case 4:
case 5:
default:
$state$id = 5;
return false;
}
}
private long Ti;
private long Ti1;
private long value;
private int $state$id;
private boolean $state$hasNext;
private boolean $state$nextDefined;
private Long $state$next;
}
return new $state$();
}
Formatting is automatic, sorry, but anyway it's for diagnostics only. You will never see this code.
It's interesting to note that this implementation very precisely mimics the xslt state machine implementation we have done back in 2008.
You can download YieldProcessor here. We hope that someone will find our solution very interesting.
Several times we have already wished to see the yield feature in java, and every time we came to the same implementation: infomancers-collections. And every time we turned away with dissatisfaction and continued with regular iterators.
Why? Well, in spite of the fact that it's the best implementation of the feature we have seen, it's still too heavy, as it's playing with java byte code at run-time.
We never grasped why it's done this way, while there is post-compile time annotation processing in java.
If we were to implement the yield feature in java, we would create a @Yield annotation and demand a well defined code pattern like this:
@Yield
Iteratable<String> iterator()
{
// This is part of pattern.
ArrayList<String> list = new ArrayList<String>();
for(int i = 0; i < 10; ++i)
{
// list.add() plays the role of yield return.
list.add(String.valueOf(i));
}
// This is part of pattern.
return list;
}
or
@Yield
Iterator<String> iterator()
{
// This is part of pattern.
ArrayList<String> list = new ArrayList<String>();
for(int i = 0; i < 10; ++i)
{
// list.add() plays the role of yield return.
list.add(String.valueOf(i));
}
// This is part of pattern.
return list.iterator();
}
Note that the code will work correctly even if, by mischance, the post-compile-time processing does not take place.
At post-compile time we would do all the required refactoring to turn these implementations into state machines, thus the runtime would not contain any third party components.
It's interesting to recall that we have also implemented a similar refactoring in pure xslt.
See What you can do with jxom.
Update: implementation can be found at Yield.zip
We have a class Beans used to serialize a list of generic objects into xml. This is done like this:
public class Call
{
public Beans input;
public Beans output;
...
}
@XmlJavaTypeAdapter(value = BeanAdapter.class)
public class Beans
{
public List<Object> bean;
}
Thanks to @XmlJavaTypeAdapter, we're able to write the xml in whatever form we want.
When we're serializing a Call instance:
Call call = ...
Beans beans = ...;
call.setInput(beans);
JAXBContext context = ...;
Marshaller marshaler = context.createMarshaller();
ObjectFactory factory = ...;
marshaler.marshal(factory.createCall(call),
result);
things work as expected, meaning that BeanAdapter is used during the xml serialization. But if it happens that you want to serialize a Beans instance itself, you start getting problems with the serialization of unknown objects. That's because JAXB does not use BeanAdapter.
We have found a similar case "How to assign an adapter to the root element?", unfortunately with no satisfactory explanation.
That is strange.
michaelhkay: Saxon 9.3 has been out for 8 days: only two bugs so far, one found by me. I think that's a record.
Not necessarily. We, for example, who use Saxon HE, have found nothing new in Saxon 9.3, while we expected to see xslt 3.0. Disappointed. No actual reason to migrate.
P.S. We were among the first who were finding early bugs in previous releases.
Reading individual papers of C++ WG, you can find the following one:
N3174 | 10-0164 | To move or not to move | Bjarne Stroustrup | 2010-10-17 | 2010-10 | Core |
There, Bjarne Stroustrup thinks about issues with implicitly generated copy and move operations in C++.
It's always a pleasure to see how one can deal with a problem burdened with antagonisms. To support his position Bjarne skilfully uses not only rational but also emotional argumentation:
...We may deem this “bad code that deserves to be broken” or “unrealistic”, but this example demonstrates that the problem with a generated move has an exact counterpart for copy (which we have lived with for 27 years)...
...In 1984, I missed the chance to protect us against copy and we have lived with the problems ever since. I should have instituted some rule along the lines “if a class has a destructor, no copy operations are generated” or “if a class has a pointer member, no copy operations are generated.”...
It's impossible to recall these numbers without shivering. :-)
We're following w3's "Bug 9069 - Function to invoke an XSLT transformation".
There, people argue about xpath API to invoke xslt transformations. Function should look roughly like this:
transform
(
$node-tree as node()?,
$stylesheet as item(),
$parameters as XXX
) as node()
The discussion is spinning around the last argument: $parameters as XXX. Should it be an xml element describing parameters, a function returning values for parameter names, or some new type modelling an immutable map?
What is most interesting in this discussion is the leak about plans to introduce a map type.
There is a hope that maps will finally appear in xslt!
See also:
Bug 5630
- [DM] Tuples and maps,
Tuples and maps - Status: CLOSED, WONTFIX,
Map, based on immutable trees,
Maps in exslt2?
In the previous post we have announced an API to parse a COBOL source into the cobolxom.
We exploited the incremental parser to build a grammar xml tree and then were planning to create an xslt transformation to generate cobolxom.
Now, we would like to announce that such an xslt is ready.
At present all standard COBOL constructs are supported, but more tests are required. Preprocessor support is still on the todo list.
You may peek into examples of COBOL:
And cobolxom:
While we were building the grammar-to-cobolxom stylesheet we asked ourselves whether the COBOL parsing could be done entirely in xslt. The answer is yes, so who knows, we might turn this task into a pure xslt one. :-)
Accidentally we have found that the implementations of String and StringBuilder have been considerably revised, while the public interface has remained the same.
public sealed class String
{
  private int m_arrayLength;
  private int m_stringLength;
  private char m_firstChar;
}
This layout dates back to .NET 1.0.
The VM, in fact, allocates more memory than defined in the C# class, as &m_firstChar refers to an inline char buffer.
This way the string's buffer length and the string's length were two different values, so StringBuilder used this fact and stored its content in a private string, which it modified in place.
In .NET 4, string is different:
public sealed class String
{
  private int m_stringLength;
  private char m_firstChar;
}
The memory footprint of such a structure is smaller, but the string's length should always be the same as its buffer length. In fact the layout of string is now the same as the layout of char[].
This modification leads to a redesign of the StringBuilder.
Earlier, StringBuilder looked like the following:
public sealed class StringBuilder
{
  internal IntPtr m_currentThread;
  internal int m_MaxCapacity;
  internal volatile string m_StringValue;
}
Notice that m_StringValue is used as the storage, and m_currentThread is used to preserve the thread affinity of the internal string value.
Now, the guys at Microsoft have decided to implement StringBuilder very differently:
public sealed class StringBuilder
{
internal int m_MaxCapacity;
internal int m_ChunkLength;
internal int m_ChunkOffset;
internal char[] m_ChunkChars;
internal StringBuilder m_ChunkPrevious;
}
Inspection of this layout immediately reveals implementation technique. It's a list of chunks. Instance itself references the last chunk (most recently appended), and the previous chunks.
Characteristics of this design are:
while Length is small, performance is almost the same as it was earlier;
Append() and ToString() work as fast as in the old version;
Insert() in the middle works faster, as only a chunk should be split and probably reallocated (copied), instead of the whole string.
Personally, we would select a slightly different design:
public sealed class StringBuilder
{
private struct Chunk
{
public int length; // Chunk length.
public int offset; // Chunk offset.
public char[] buffer;
}
private int m_MaxCapacity;
// Alternatively, one can use
// private List<Chunk> chunks;
private int chunkCount; // Number of used chunks.
private Chunk[] chunks; // Array of chunks except last.
private Chunk last; // Last chunk.
private bool nonHomogenous; // false if all chunks are of the same size.
}
This design has a better memory footprint, and random access time is O(1) when there were no inserts in the middle (nonHomogenous=false), and O(log(chunkCount)) after such inserts. All other characteristics are the same.
We have run into another xslt bug, which depends on several independent circumstances and often behaves differently when being observed. That's clearly a Heisenbug.
Xslt designers failed to realize that the syntactic sugar they introduced into xpath can turn into obscure bugs. Well, it's easy to be wise afterwards...
To the point.
Consider that you have a sequence consisting of text nodes and elements, and now you want to "normalize" this sequence, wrapping adjacent text nodes into separate elements. The following stylesheet is supposed to do the work:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:t="http://www.nesterovsky-bros.com/xslt/this"
exclude-result-prefixes="xs t">
<xsl:template match="/">
<xsl:variable
name="nodes" as="node()*">
<xsl:text>Hello, </xsl:text>
<string value="World"/>
<xsl:text>! </xsl:text>
<xsl:text>Well, </xsl:text>
<string value="hello"/>
<xsl:text>, if not joking!</xsl:text>
</xsl:variable>
<result>
<xsl:sequence
select="t:normalize($nodes)"/>
</result>
</xsl:template>
<xsl:function
name="t:normalize" as="node()*">
<xsl:param name="nodes" as="node()*"/>
<xsl:for-each-group select="$nodes" group-starting-with="*">
<xsl:variable
name="string" as="element()?" select="self::string"/>
<xsl:variable name="texts"
as="node()*"
select="current-group() except $string"/>
<xsl:sequence
select="$string"/>
<xsl:if test="exists($texts)">
<string
value="{string-join($texts, '')}"/>
</xsl:if>
</xsl:for-each-group>
</xsl:function>
</xsl:stylesheet>
We're expecting the following output:
<result>
<string value="Hello, "/>
<string value="World"/>
<string value="! Well, "/>
<string value="hello"/>
<string value=", if not joking!"/>
</result>
But often we're getting other results, like:
<result>
<string value="Hello, "/>
<string value="World"/>
<string value="Well, ! "/>
<string value="hello"/>
<string value=", if not joking!"/>
</result>
Such output may seriously confuse, unless you recall the rule for the xpath except operator:
The except operator takes two node sequences as operands and returns a sequence containing all the nodes that occur in the first operand but not in the second operand.
... these operators eliminate duplicate nodes from their result sequences based
on node identity. The resulting sequence is returned in document order..
...
The relative order of nodes in distinct trees is stable but implementation-dependent
These words mean that result sequence may be very different from original sequence.
In contrast, if we change $text
definition to:
<xsl:variable name="texts"
as="node()*"
select="current-group()[not(. is $string)]"/>
then the result becomes stable, but less clear.
See also Xslt Heisenbug
It does not matter that DataBindExtender looks unusual in ASP.NET. It turns out to be so handy that the built-in data binding is not considered to be an option.
After a short try, you understand that people tried very hard and have invented many controls and methods like ObjectDataSource, FormView, Eval(), and Bind(), with an outcome that is very specific and limited.
In contrast DataBindExtender performs:
See an example:
<asp:TextBox id=Field8 EnableViewState="false" runat="server"></asp:TextBox>
<bphx:DataBindExtender runat='server'
EnableViewState='false'
TargetControlID='Field8'
ControlProperty='Text'
DataSource='<%# Import.ClearingMemberFirm %>'
DataMember='Id'
Converter='<%# Converters.AsString("XXXXX", false) %>'
Validator='<%# (extender, value) => Functions.CheckID(value as string) %>'/>
Here, besides a regular two-way data binding of the property Import.ClearingMemberFirm.Id to the property Field8.Text, we format (parse) with Converters.AsString("XXXXX", false), and finally validate the input value with a lambda function (extender, value) => Functions.CheckID(value as string).
DataBindExtender also works well in template controls like asp:Repeater, asp:GridView, and so on. Having your business data available, you may reduce the size of the ViewState with EnableViewState='false'. This way DataBindExtender brings page development closer to a pattern called MVC.
Recently, we have found that it's also useful to have a way to run javascript during the page load (e.g. you want to attach some client side event, or register a component). DataBindExtender provides this with the OnClientInit property, which is javascript to run on the client, where this refers to the extended DOM element:
... OnClientInit='$addHandler(this, "change", function() { handleEvent(event, "Field8"); } );'/>
attaches an onchange javascript event handler to the asp:TextBox.
So, for the time being we're very satisfied with what we can achieve with DataBindExtender. It's more than JSF allows, and much stronger and neater than what ASP.NET provides out of the box.
The sources can be found at DataBindExtender.cs
Lately, we have found that we've grown accustomed to declaring C# local variables using var:
var exitStateName = exitState == null ? "" : exitState.Name;
var rules = Environment.NavigationRules;
var rule = rules[caller.Name];
var flow = rule.NavigationCases[procedure.OriginExitState];
This makes the code cleaner and, in the presence of a good IDE, still allows one to figure out the types very easily.
We, however, found that var tends to have exceptions in its use. E.g. for some reason most boolean locals in our code tend to remain explicit (a matter of taste?):
bool succeed = false;
try
{
...
succeed = true;
}
finally
{
if (!succeed)
{
...
}
}
Also, the explicit type often survives in for, but not in foreach:
for(int i = 0; i < sourceDataMapping.Length;
++i)
{
...
}
foreach(var property in properties)
{
...
}
In addition var has some limitations, as one cannot easily initialize such a local with null. Of the following options we prefer the first one:
IWindowContext context = null;
var context = (IWindowContext)null;
var context = null as IWindowContext;
var context = default(IWindowContext);
We might need to figure out a consistent code style for the use of var.
Recently we raised a question about the serialization of ASPX output in xslt.
The question went like this:
What's the recommended way of ASPX page generation?
E.g.:
------------------------
<%@ Page AutoEventWireup="true"
CodeBehind="CurMainMenuP.aspx.cs"
EnableSessionState="True"
Inherits="Currency.CurMainMenuP"
Language="C#"
MaintainScrollPositionOnPostback="True"
MasterPageFile="Screen.Master" %>
<asp:Content ID="Content1" runat="server" ContentPlaceHolderID="Title">CUR_MAIN_MENU_P</asp:Content>
<asp:Content ID="Content2" runat="server" ContentPlaceHolderID="Content">
<span id="id1222146581" runat="server"
class="inputField system UpperCase" enableviewstate="false">
<%# Dialog.Global.TranCode %>
</span>
...
------------------------
Notice the aspx page directives, data binding expressions, and prefixed tag names without namespace declarations.
There was a whole range of expected answers. We, however, looked at whether somebody had already dealt with the task and had a ready solution at hand.
In general it seems that the xslt community is very angry about ASPX: both the format and the technology. Well, let's put this aside.
The task of producing ASPX, which is almost xml, is not solvable while you stay with a pure xml serializer. Xslt's xsl:character-map does not work at all. In fact it looks like a childish attempt to address the problem, as it does not support character escapes but only grabs characters and substitutes them with strings.
We have decided to create an ASPX serializer API producing the required output text. This way you use <xsl:output method="text"/> to generate ASPX pages.
With this goal in mind we have defined a little xml schema to describe ASPX irregularities in xml form. These are:
<xs:element name="declared-prefix">
- to describe known prefixes, which should not be declared;
<xs:element name="directive">
- to describe directives like <%@ Page %>;
<xs:element name="content">
- a transparent content wrapper;
<xs:element name="entity">
- to issue xml entity;
<xs:element name="expression">
- to describe aspx expression like <%# Eval("A") %>;
<xs:element name="attribute">
- to describe an attribute of the parent element. This approach greately simplified for us an ASPX generation process.
The API includes:
In previous posts we were complaining about problems with the JSF to ASP.NET migration. Let's point out another one.
Consider that you have an input field, whose value should be validated:
<input type="text" runat="server" ID="id1222146409" maxlength="4"/>
<bphx:DataBindExtender runat="server"
TargetControlID="id1222146409" ControlProperty="Value"
DataSource="<%# Import.AaControlAttributes %>"
DataMember="UserEnteredTrancode"/>
Here we have an input control, whose value is bound to the Import.AaControlAttributes.UserEnteredTrancode property. But what is missing is value validation. Somewhere we have a function that can answer whether the value is valid. It should be called like this: Functions.IsTransactionCodeValid(value).
Staying within standard components we can use a custom validator on the page:
<asp:CustomValidator runat="server"
ControlToValidate="id1222146409"
OnServerValidate="ValidateTransaction"
ErrorMessage="Invalid transaction code."/>
and add the following code-behind:
protected void ValidateTransaction(object source, ServerValidateEventArgs args)
{
args.IsValid = Functions.IsTransactionCodeValid(args.Value);
}
This approach works, however it pollutes the code-behind with many very similar methods. The problem is that the validation rules in most cases are not a property of the page but of the data model. That's why page validation methods just forward the check elsewhere.
While thinking about how to simplify the code, we came up with a more conscious and concise way to express validators, namely using lambda functions. To that end we have introduced a Validator property of type ValueValidator on DataBindExtender, where:
/// <summary>A delegate to validate values.</summary>
/// <param name="extender">An extender instance.</param>
/// <param name="value">A value to validate.</param>
/// <returns>true for valid value, and false otherwise.</returns>
public delegate bool ValueValidator(DataBindExtender extender, object value);
/// <summary>An optional data member validator.</summary>
public virtual ValueValidator Validator { get; set; }
With this new property the page markup looks like this:
<input type="text" runat="server" ID="id1222146409" maxlength="4"/>
<bphx:DataBindExtender runat="server"
TargetControlID="id1222146409" ControlProperty="Value"
DataSource="<%# Import.AaControlAttributes %>"
DataMember="UserEnteredTrancode"
Validator='<%# (extender, value) => Functions.IsTransactionCodeValid(value as string) %>'
ErrorMessage="Invalid transaction code."/>
This is almost like an event handler, however it allows us to call the data model validation logic without unnecessary code-behind.
The updated DataBindExtender
can be found at DataBindExtender.cs.
Being well behind the latest news and traps of ASP.NET, we readily stumble on every problem.
This time it's a script injection during data binding.
In JSF there is a component to output data called h:outputText. Its use is like this:
<span jsfc="h:outputText" value="#{myBean.myProperty}"/>
The output is a span element with the data bound value embedded into its content. The natural alternative in ASP.NET seems to be an asp:Label
control:
<asp:Label runat="server" Text="<%# Eval("MyProperty") %>"/>
This almost works except that the h:outputText
escapes data (you may override this and specify attribute escape="false"
), and asp:Label
never escapes the data.
This looks like a very serious omission in ASP.NET (in fact very close to a security hole). What are the chances that, when you're creating a new page that uses data binding, you will not forget to fix the code the wizard created for you and change it to:
<asp:Label runat="server" Text="<%# Server.HtmlEncode(Eval("MyProperty")) %>"/>
Eh? Think what will happen if MyProperty returns text that looks like a script (e.g. <script>alert(1)</script>), while you just wanted to output a label?
To address the issue we've also introduced an Escape property on DataBindExtender. So at present we have code like this:
<asp:Label runat="server" ID="MyLabel"/>
<bphx:DataBindExtender runat="server" TargetControlID="MyLabel"
ControlProperty="Text" ReadOnly="true" Escape="true"
DataSource="<%# MyBean %>" DataMember="MyProperty"/>
See also: A DataBindExtender, Experience of JSF to ASP.NET migration
After struggling with ASP.NET data binding we found no other way but to introduce our little extender control to address the issue.
We were trying to be minimalistic: to introduce two-way data binding and to support data conversion. The extender control (called DataBindExtender) has the following page syntax:
<asp:TextBox id=TextBox1 runat="server"></asp:TextBox>
<cc1:DataBindExtender runat="server"
DataSource="<%# Data %>"
DataMember="ID"
TargetControlID="TextBox1"
ControlProperty="Text" />
Two-way data binding is provided with a DataSource object (notice the data binding over this property) and a DataMember property on the one side, and TargetControlID and ControlProperty on the other side. DataBindExtender supports a Converter property of type TypeConverter to allow custom converters.
DataBindExtender is based on the AjaxControlToolkit.ExtenderControlBase class and implements System.Web.UI.IValidator. ExtenderControlBase makes the implementation of extenders extremely easy, while IValidator plugs naturally into page validation (the Validate method, the Validators collection, the ValidationSummary control).
The good point about extenders is that they are not visible in the designer, while their properties are exposed on the extended control itself. The disadvantage is that they require the Ajax Control Toolkit, and also a ScriptManager component on the page.
To simplify the use, DataBindExtender reads the value from the control and puts it into the data source in the Validate method, and puts data from the data source into the control in the OnPreRender method; thus no specific action is required to perform data binding.
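To illustrate these two phases, here is a simplified sketch only (not the actual DataBindExtender sources: it derives from Control instead of ExtenderControlBase, and conversion, validation and escaping are omitted; property access is naive reflection):
using System;
using System.Web.UI;

// A simplified sketch of the binding phases described above.
public class DataBindSketch: Control, IValidator
{
  public string TargetControlID { get; set; }
  public string ControlProperty { get; set; }
  public object DataSource { get; set; }
  public string DataMember { get; set; }

  public bool IsValid { get; set; }
  public string ErrorMessage { get; set; }

  protected override void OnInit(EventArgs e)
  {
    base.OnInit(e);
    IsValid = true;
    Page.Validators.Add(this); // plug into the page validation
  }

  // Control -> data source: called during page validation.
  public void Validate()
  {
    Control control = NamingContainer.FindControl(TargetControlID);

    Set(DataSource, DataMember, Get(control, ControlProperty));
  }

  // Data source -> control: called before rendering.
  protected override void OnPreRender(EventArgs e)
  {
    base.OnPreRender(e);

    Control control = NamingContainer.FindControl(TargetControlID);

    Set(control, ControlProperty, Get(DataSource, DataMember));
  }

  private static object Get(object instance, string name)
  {
    return instance.GetType().GetProperty(name).GetValue(instance, null);
  }

  private static void Set(object instance, string name, object value)
  {
    instance.GetType().GetProperty(name).SetValue(instance, value, null);
  }
}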
Source for the DataBindExtender
is DataBindExtender.cs.
We used to think that ASP.NET is way more powerful than JSF. It might still be true, but not when you are accustomed to JSF and spoiled by its coding practices...
Looking at both technologies from a greater distance, we now realize that they give almost the same level of comfort during development, but they are different. You can feel this after you have worked for some time with one technology and now have to implement a similar solution in the opposite one. That is where we find ourselves at present.
The funny thing is that we did expect some problems, but in a different place. Indeed, both ASP.NET and JSF are means to define a page layout and to map input and output of business data. While with the presentation (controls, their compositions, masters, styles and so on) you can find more or less equal analogies, the difference in the implementation of data binding is a kind of pain.
We have found that data binding in ASP.NET is somewhat awkward. Its Eval and Bind are bearable in simple cases but almost unusable when your business data is less trivial, or when you have to apply custom data formatting.
In JSF, with its Expression Language, we can perform two way data binding for rather complex properties like ${data.items[index + 5].property}
, or to create property adapters ${my:asSomething(data.bean, "property").Value}
, or add standard or custom property converters. In contrast data binding in ASP.NET is limited to simple property path (no expressions are supported), neither custom formatters are supported (try to format number as a telephone number).
Things work well when you're designing ASP.NET application from scratch, as you naturally avoid pitfalls, however when you got existing business logic and need to expose it to the web, you have no other way but to write a lot of code behind just to smooth out the problems that ASP.NET exhibits.
Another solution would be to design something like extender control that would attach more proper data binding and formatting facilities to control properties. That would allow to make page definitions in more declarative way, like what we have now in JSF.
While porting a solution from JSF to ASP.NET we have seen an issue with synchronization of access to data stored in a session from multiple requests.
Consider a case when you store a business object in a session.
Going through the request lifecycle we observe that this business object may be accessed at different stages: data binding, postback event handlers, security filters, and others.
Usually this business object is mutable and does not assume concurrent access. Browsers, however, may easily issue multiple requests to the same session at the same time. In fact, such behaviour is not even an exception, as browsers nowadays often send concurrent requests.
In JSF we're using a sync object, which is part of the business object itself; we lock it at the beginning and unlock it at the end of a request. This works perfectly, as JSF gives the required guarantees about request processing (we use @PostConstruct and @PreDestroy annotations to lock and unlock).
ASP.NET, in contrast, tries to be more asynchronous, and allows different stages of a request to take place in different threads. This can be seen indirectly in the documentation, which does not give any commitments in this regard, and with code inspection, where you can see that a request can begin in one thread, and the next stage can be queued for execution in another thread.
In addition, ASP.NET does not guarantee that if BeginRequest
has been executed then EndRequest
will also run.
The conclusion is that we should not use locks to synchronize access to the same session object, but rather try to invent other means to avoid data races.
Update: msdn states:
Concurrent Requests and Session State
Access to ASP.NET session state is exclusive per session, which means that if two different users make concurrent requests, access to each separate session is granted concurrently. However, if two concurrent requests are made for the same session (by using the same SessionID value), the first request gets exclusive access to the session information. The second request executes only after the first request is finished. (The second session can also get access if the exclusive lock on the information is freed because the first request exceeds the lock time-out.)
This means that the required synchronization is already built into ASP.NET. That's good.
We have implemented the report parser in C#. Because things now spin around C#, the schema definition has changed.
We have started from classes defining a report definition tree, annotated these classes for xml serialization, and, finally, produced an xml schema for such a tree. So, at present, it is not an xml schema with annotations but a separate xml schema.
In addition we have defined APIs:
to iterate report data in a structured form (as an IEnumerable<ViewValue>);
to expose a report through an XmlReader, which allows, for example, to have a report as input for an xslt transformation (see the sketch below);
to write a report through an XmlWriter.
An example of a report definition as C# code is: MyReport.cs. The very same report definition but serialized into xml is my-report.xml. A generated xml schema for a report definition is: schema0.xsd.
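For example, exposing a report through XmlReader lets one feed it directly into a standard .NET transformation. The sketch below only illustrates the intended use: ReportParser and its CreateReader method are hypothetical names standing in for our API, while XslCompiledTransform is the standard .NET class.
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class ReportToHtml
{
  public static void Run(string reportDefinition, string reportText)
  {
    XslCompiledTransform xslt = new XslCompiledTransform();

    xslt.Load("report-to-html.xslt");

    // ReportParser.CreateReader() is a hypothetical API that parses the text
    // report according to the report definition and exposes the result as a
    // forward-only XmlReader.
    using(XmlReader report =
      ReportParser.CreateReader(reportDefinition, reportText))
    using(Stream output = File.Create("report.html"))
    {
      xslt.Transform(report, null, output);
    }
  }
}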
The good point about this solution is that it's already flexible enough to describe every report layout we have at hand, and it's extensible. Our measurements show that report parsing is extremely fast and has a very small memory footprint due to the forward-only nature of report definitions.
From the design point of view a report definition is a view of the original text data with view info attached.
At present we have defined a number of views (the full list can be found in the sources referenced below).
To specify details of the definitions there are: Add, Div, Integer, MatchProperty, Max, Min, Mod, Mul, Neg, Null, Sub, VariableRef, ViewProperty, Case; and And, EQ, GE, GT, IsMatch, LE, LT, NE, Not, Or.
At present there is no specification of report definitions. Probably it's the most complex part: to create such a spec for a user without deep knowledge. At present, our idea is that one should use the xml schema (we should polish the generated schema) for the report definition and a schema-aware editor to build report definitions. That's a very robust approach, working perfectly with languages xom.
C# sources can be found at: ReportLayout.zip including report definition classes and a sample report.
We're facing a task of parsing reports produced by legacy applications and converting them into a structured form, e.g. into xml. These xml files can be processed further with up to date tools to produce good looking reports.
The reports at hand are of very different structure and size: from a couple of KB to several GB. The good part is that they mostly have a tabular form, so it's easy to think of a specific parser for each report type.
Our goal is to create an environment where less qualified people could create and manage such parsers, and only rarely engage someone to handle the nontrivial cases.
Our analysis has shown that it's possible to write such a parser in almost any language: xslt, C#, java.
Our approach was to create xml schema annotations that on the one side define a data structure, and on the other map the report layout. Then we're able to create an xslt that will generate either an xslt, C#, or java parser according to the schema definitions. Thanks to languages xom, which provides an XML Object Model and serialization stylesheets for C# and java, it does not really matter whether we generate xslt or C#/java, as the code will look the same.
The approach we're going to use to describe reports is not as powerful as conventional parsers. Its virtue, however, is the simplicity of specification.
Consider a report sample (the data to extract is in bold):
1 TITLE ...                              PAGE:     1
  BUSINESS DATE: 09/30/09 ...            RUN DATE: 02/23/10
  CYCLE : ITD   RUN: 001 ...             RUN TIME: 09:22:39
  CM    BUS   ...
  CO    NBR   FRM   FUNC  ...
  ----- ----- ----- -----
  XXX   065   065   CLR   ...
  YYY   ...
  ...
1 TITLE ...                              PAGE:     2
  BUSINESS DATE: 09/30/09 ...            RUN DATE: 02/23/10
  CYCLE : ITD   RUN: 001 ...             RUN TIME: 09:22:39
  CM    BUS   ...
  CO    NBR   FRM   FUNC  ...
  ----- ----- ----- -----
  AAA   NNN   MMM   PPP   ...
  BBB   ...
  ...
  * * * * *   E N D   O F   R E P O R T   * * * * *
We're approaching the report through a sequence of views (filters) of this report. Each view localizes some report data either for subsequent filtering or for the extraction of the final data.
Looking into the example one can build a number of such views of the report.
A sequence of filters allows us to build a pipeline of transformations of the original text. This also allows us to generate clean xslt, C#, or java code to parse the data.
At first, our favorite language for such a parser was xslt. Unfortunately, we're dealing with the Saxon xslt implementation, which is not very strong in streamed processing. Without a couple of extension functions to prevent caching, it tends to cache the whole input in memory, which is not acceptable.
At present we have decided to start with the C# code, which is naturally pure C#.
The code is still in development, but at present we would like to share the xml schema annotations describing the report layout: report-mapping.xsd, and a sample of a report description: test.xsd.
To see that the problem with generator functions in xslt is a bit more complicated, compare two functions.
The first one is quoted from the earlier post:
<xsl:function name="t:generate" as="xs:integer*">
<xsl:param name="value" as="xs:integer"/>
<xsl:sequence select="$value"/>
<xsl:sequence select="t:generate($value * 2)"/>
</xsl:function>
It does not work in Saxon: it crashes with an out of memory error.
The second one is a slightly modified version of the same function:
<xsl:function name="t:generate" as="xs:integer*">
<xsl:param name="value" as="xs:integer"/>
<xsl:sequence select="$value + 0"/>
<xsl:sequence select="t:generate($value * 2)"/>
</xsl:function>
It works without problems. In the first case Saxon decides to cache all of the function's output; in the second case it decides to evaluate the data lazily, on demand.
It seems that the optimization algorithms implemented in Saxon are so plentiful and complex that at times they fool one another. :-)
See also: Generator functions
There are some complications with the streamed tree that we have implemented in saxon. They are due to the fact that only a limited view of the input data is available at any time. Whenever you access an element that is not available, you get an exception.
Consider an example. We have a log created with java logging. It looks like this:
<log>
<record>
<date>...</date>
<millis>...</millis>
<sequence>...</sequence>
<logger>...</logger>
<level>INFO</level>
<class>...</class>
<method>...</method>
<thread>...</thread>
<message>...</message>
</record>
<record>
...
</record>
...
We would like to write an xslt that returns a page of log as html:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:t="http://www.nesterovsky-bros.com/xslt/this"
xmlns="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="xs t">
<xsl:param name="start-page" as="xs:integer" select="1"/>
<xsl:param name="page-size" as="xs:integer" select="50"/>
<xsl:output method="xhtml" byte-order-mark="yes" indent="yes"/>
<!-- Entry point. -->
<xsl:template match="/log">
<xsl:variable name="start" as="xs:integer"
select="($start-page - 1) * $page-size + 1"/>
<xsl:variable name="records" as="element()*"
select="subsequence(record, $start, $page-size)"/>
<html>
<head>
<title>
<xsl:text>A log file. Page: </xsl:text>
<xsl:value-of select="$start-page"/>
</title>
</head>
<body>
<table border="1">
<thead>
<tr>
<th>Level</th>
<th>Message</th>
</tr>
</thead>
<tbody>
<xsl:apply-templates mode="t:record" select="$records"/>
</tbody>
</table>
</body>
</html>
</xsl:template>
<xsl:template mode="t:record" match="record">
<!-- Make a copy of record to avoid streaming access problems. -->
<xsl:variable name="log">
<xsl:copy-of select="."/>
</xsl:variable>
<xsl:variable name="level" as="xs:string"
select="$log/record/level"/>
<xsl:variable name="message" as="xs:string"
select="$log/record/message"/>
<tr>
<td>
<xsl:value-of select="$level"/>
</td>
<td>
<xsl:value-of select="$message"/>
</td>
</tr>
</xsl:template>
</xsl:stylesheet>
This code does not work. Guess why? Yes, it's subsequence(), which is too greedy.
It always wants to know what the next node is, so it naturally skips the content of the current node. Algorithmically, such saxon code could be rewritten, and could possibly work better also in modes other than streaming.
A viable workaround, which does not use subsequence(), looks rather nontrivial:
<!-- Entry point. -->
<xsl:template match="/log">
<xsl:variable name="start" as="xs:integer"
select="($start-page - 1) * $page-size + 1"/>
<xsl:variable name="end" as="xs:integer"
select="$start + $page-size"/>
<html>
<head>
<title>
<xsl:text>A log file. Page: </xsl:text>
<xsl:value-of select="$start-page"/>
</title>
</head>
<body>
<table border="1">
<thead>
<tr>
<th>Level</th>
<th>Message</th>
</tr>
</thead>
<tbody>
<xsl:sequence select="
t:generate-records(record, $start, $end, ())"/>
</tbody>
</table>
</body>
</html>
</xsl:template>
<xsl:function name="t:generate-records" as="element()*">
<xsl:param name="records" as="element()*"/>
<xsl:param name="start" as="xs:integer"/>
<xsl:param name="end" as="xs:integer?"/>
<xsl:param name="result" as="element()*"/>
<xsl:variable name="record" as="element()?" select="$records[$start]"/>
<xsl:choose>
<xsl:when test="(exists($end) and ($start > $end)) or empty($record)">
<xsl:sequence select="$result"/>
</xsl:when>
<xsl:otherwise>
<!-- Make a copy of record to avoid streaming access problems. -->
<xsl:variable name="log">
<xsl:copy-of select="$record"/>
</xsl:variable>
<xsl:variable name="level" as="xs:string"
select="$log/record/level"/>
<xsl:variable name="message" as="xs:string"
select="$log/record/message"/>
<xsl:variable name="next-result" as="element()*">
<tr>
<td>
<xsl:value-of select="$level"/>
</td>
<td>
<xsl:value-of select="$message"/>
</td>
</tr>
</xsl:variable>
<xsl:sequence select="
t:generate-records
(
$records,
$start + 1,
$end,
($result, $next-result)
)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
Here we observed the greediness of saxon, which tries too early to consume more input than is required. In other cases we have seen that it may defer actual data access to the point when there is no data anymore.
So, without tuning the internal saxon logic it's possible, but not easy, to write stylesheets that exploit the streaming features.
P.S. Updated sources are at streamedtree.zip
At some point we needed to have an array with volatile elements in java.
We knew that such a beast is not found in the java world. So we searched the Internet and found answers that are so wrong, and introduce such obscure threading bugs, that the guys who provided them had better hide them and run immediately to fix their buggy programs...
The first one is Volatile arrays in Java. They suggest a solution like this:
volatile int[] arr = new int[...];
...
arr[4] = 100;
arr = arr;
The number two: What Volatile Means in Java
A guy assures that this code works:
Fields:
int answer = 0;
volatile boolean ready = false;
Thread1:
answer = 42;
ready = true;
Thread2:
if (ready)
{
print(answer);
}
They are very wrong! Non-volatile accesses can be reordered by the implementation. See Java's Threads and Locks:
The rules for volatile variables effectively require that main memory be touched exactly once for each use or assign of a volatile variable by a thread, and that main memory be touched in exactly the order dictated by the thread execution semantics. However, such memory actions are not ordered with respect to read and write actions on nonvolatile variables.
They probably thought of locks when they argued about volatiles:
a lock action acts as if it flushes all variables from the thread's working memory; before use they must be assigned or loaded from main memory.
P.S. They had better recommend AtomicReferenceArray.
When the time came to process big xml log files, we decided to implement a streamable tree in saxon the very same way it was implemented in .net eight years ago (see How would we approach to streaming facility in xslt).
Interestingly enough, the implementation is similar to that of the composable tree. There a node never stores a reference to its parent, while in the streamed tree no references to children are stored. This way only a limited subview of the tree is available at any time. The implementation does not support the preceding and preceding-sibling axes. Also, one cannot navigate to a node that is out of scope.
The implementation is external (there are no changes to saxon itself). To use it one needs to create an instance of DocumentInfo, which pulls data from an XMLStreamReader, and to pass it as an input to a transformation:
Controller controller =
(Controller)transformer;
XMLInputFactory factory = XMLInputFactory.newInstance();
Source inputSource = new StreamSource(new File(input));
XMLStreamReader reader = factory.createXMLStreamReader(inputSource);
StaxBridge bridge = new StaxBridge();
bridge.setPipelineConfiguration(
controller.makePipelineConfiguration());
bridge.setXMLStreamReader(reader);
inputSource = new DocumentImpl(bridge);
transformer.transform(inputSource, new StreamResult(output));
This helped us to format an xml log file of arbitrary size. An xslt like this can do the work:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="http://www.w3.org/1999/xhtml"
exclude-result-prefixes="xs">
<xsl:template match="/log">
<html>
<head>
<title>Log</title>
</head>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:template>
<xsl:template match="message">
...
</xsl:template>
<xsl:template match="message[@error]">
...
</xsl:template>
...
</xsl:stylesheet>
Implementation can be found at: streamedtree.zip
By a generator we mean a function that produces an infinite output sequence for a particular input.
That's a rather theoretical question, as xslt does not allow infinite sequences, but look at the example:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:t="http://www.nesterovsky-bros.com/xslt"
exclude-result-prefixes="xs t">
<xsl:template match="/">
<xsl:variable name="value" as="xs:string" select="'10100101'"/>
<xsl:variable name="values" as="xs:integer+"
select="t:generate(1)"/>
<!--<xsl:variable name="values" as="xs:integer+">
<xsl:call-template name="t:generate">
<xsl:with-param name="value" select="1"/>
</xsl:call-template>
</xsl:variable>-->
<xsl:variable name="integer" as="xs:integer" select="
sum
(
for $index in 1 to string-length($value)
return
$values[$index][substring($value, $index, 1) = '1']
)"/>
<xsl:message select="$integer"/>
</xsl:template>
<xsl:function name="t:generate" as="xs:integer*">
<xsl:param name="value" as="xs:integer"/>
<xsl:sequence select="$value"/>
<xsl:sequence select="t:generate($value * 2)"/>
</xsl:function>
<!--<xsl:template name="t:generate" as="xs:integer*">
<xsl:param name="value" as="xs:integer"/>
<xsl:sequence select="$value"/>
<xsl:call-template name="t:generate">
<xsl:with-param name="value" select="$value * 2"/>
</xsl:call-template>
</xsl:template>-->
</xsl:stylesheet>
Here the logic uses such a generator and decides by itself where to break.
Should such code be valid?
From the algorithmic perspective the example had better work, as the generator logic and its use are two different concerns that deserve separation.
Lately, after playing a little with saxon tree models, we thought that the design would be cleaner and the implementation faster if NamePool were implemented differently.
Now, saxon is very pessimistic about java objects, thus it prefers to encode qualified names with integers. The encoding and decoding are done in the NamePool. Other parts of the code use these integer values.
The operations done over these integers are comparisons and decoding back through the NamePool.
We would design this differently. We would:
introduce a QualifiedName class to store all name parts;
use the NamePool only to create and cache QualifiedName instances.
This way the design would, in our opinion, be both cleaner and faster.
That's the implementation we would propose: QualifiedName.java, NameCache.java
Earlier, in the entry "Inline functions in xslt 2.1", we described an implementation of an xml tree model that may share subtrees among different trees.
This way, in code like:
<xsl:variable name="elements" as="element()*" select="..."/>
<xsl:variable name="result" as="element()">
<result>
<xsl:sequence select="$elements"/>
</result>
</xsl:variable>
the implementation shares the internal representation between $elements and the subtree of $result. From the perspective of xslt they look like completely different subtrees with different node identities, which is in accordance with its view of the world.
After a short study we've decided to create a research implementation of this tree model in saxon. It took only a couple of days to introduce minimal changes to the engine, to refactor the linked tree into a new composable tree, and to perform some tests.
In many cases saxon has benefited immediately from this new tree model; in some other cases more tuning is required.
Our tests have shown that this new tree performs better than the linked tree, but a little bit worse than the tiny tree. On the other hand, it's obvious that conventional code patterns avoid subtree copying, assuming it's an expensive operation, thus one should rethink some code practices to benefit from the composable tree.
Implementation can be downloaded at: saxon.composabletree.zip
From the web we know that the xslt WG is now thinking about how to make xslt friendlier to huge documents. They will probably introduce some xslt syntax to allow an implementation to be ready for such processing.
They will probably introduce an indicator marking a specific mode for streaming. XPath in this mode will probably be restricted to some subset of it.
The funny part is that we implemented a similar feature back in 2002 in .net. It was called XPathForwardOnlyNavigator.
The implementation stored only several nodes at a time (the context node and its ancestors), and read data from an XmlReader as needed. Thus one could navigate to ancestor elements, to children, and to the following siblings, but never to any previous node. When one tried to reach a node that was no longer available, we threw an exception.
It was simple, and not perfect (too restrictive), but it was pluggable into .net's xslt and allowed processing of files of any size.
That old implementation looks very attractive even now in 2010. We hope that the WG's decisions will not rule out such or similar solutions, and will not force implementers to write an alternative engine for xslt streaming.
Xslt 1.0 was designed with the best intentions. Xslt 2.0 got some legacy baggage.
If you're not fully concentrated while translating your algorithms into xslt 2.0, you can fall into a trap, as we did.
Consider a code snippet:
<xsl:variable name="elements" as="element()+">
<xsl:apply-templates/>
</xsl:variable>
<xsl:variable name="converted-elements" as="element()+"
select="$elements/t:convert(.)"/>
Looks simple, doesn't it?
Our intention was to get converted elements, which result from some xsl:apply-templates logic.
Well, this code works... but rather sporadically, as the results are often in the wrong order! This bug is very close to what is called a Heisenbug.
So, where is the problem?
Elementary, my dear Watson:
xsl:apply-templates constructs a sequence of rootless elements;
$elements/t:convert(.) converts the elements and returns the results in document order.
Here is the tricky part:
The relative order of nodes in distinct trees is stable but implementation-dependent...
Clearly each rootless element belongs to a unique tree.
Once we realized what the problem was, the code was immediately rewritten as:
<xsl:variable name="elements" as="element()+">
<xsl:apply-templates/>
</xsl:variable>
<xsl:variable name="converted-elements" as="element()+" select="
for $element in $elements return
t:convert($element)"/>
P.S. Taking into account the size of our xslt code base, it took half an hour to localize the problem. Now we're in a position to review all uses of slashes in our xslt. How do you like it?
Inline functions in xslt 2.1 often look like some strange aberration. Sure, there are very useful cases where they are delegates of program logic (e.g. comparators and filters), but often (probably more often) we can see that their use is to model data structures.
As an example, suppose you want to model a structure with three properties, say a, b, and c. You implement this by creating functions that wrap and unwrap the data:
function make-data($a as item(), $b as item(), $c
as item()) as function() as item()+
{
function() { $a, $b, $c }
}
function a($data as function() as item()+) as item()
{
$data()[1]
}
function b($data as function() as item()+) as item()
{
$data()[2]
}
function c($data as function() as item()+) as item()
{
$data()[3]
}
Clever?
Sure, it is! Here we have modeled a structure with the help of a sequence, which we have wrapped into a function item.
Alas, clever is not always good (often it's a sign of something bad). We just wanted to define a simple structure. What does it have in common with a function?
There is a distance between what we want to express, designing an algorithm, and what we see looking at the code. The greater the distance, the more effort is required to write, and to read, the code.
It would be so good to have a simpler way to express such a concept as a structure. Let's dream a little. Suppose you already have a structure, and just want to access its members. An idea we can think about is an xpath-like access method:
$data/a, $data/b, $data/c
But wait a second, doesn't $data look very much like an xml element, with its accessors being just node tests?
That's right, so a data constructor may coincide with an element constructor.
Then what are the pros and cons of using xml elements to model structures?
Pros: the existing xml type system, and sensible-looking code (you just understand that here we're constructing a structure).
Cons: xml trees are implemented in a way that does not assume fast (from the performance perspective) composition, as when you construct a structure a copy of the data is made.
But here we observe that "implemented" is a very important word in this context. If the xml tree implementation did not store a reference to the parent node, then subtrees could be composed very efficiently (note that the tree is assumed to be immutable). The parent node could be made available through a tree navigator, which would contain a reference to the node itself and to a parent tree navigator (or a child-parent map could be stored somewhere near the root).
Such tree structure would probably help not only in this particular case but also with other conventional xslt code patterns.
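A minimal sketch of this idea (the names are ours, purely illustrative, and not Saxon's): nodes store only their children, so any subtree can be shared between trees, while the parent axis is recovered by a navigator that remembers how the node was reached.
using System.Collections.Generic;

// Immutable, parentless node: subtrees can be shared freely.
public sealed class Node
{
  public readonly string Name;
  public readonly IList<Node> Children;

  public Node(string name, params Node[] children)
  {
    Name = name;
    Children = children;
  }
}

// A navigator supplies the parent axis without storing it in the node.
public sealed class Navigator
{
  public readonly Node Node;
  public readonly Navigator Parent; // null at the root

  public Navigator(Node node, Navigator parent)
  {
    Node = node;
    Parent = parent;
  }

  public IEnumerable<Navigator> Children()
  {
    foreach(Node child in Node.Children)
    {
      yield return new Navigator(child, this);
    }
  }
}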
P.S. Saxon probably could implement its NodeInfo
this way.
Update: see also Custom tree model.
A while ago we proposed to introduce maps as built-in types in the xpath/xquery type system: Tuples and maps.
The suggestion was declined (probably our arguments were not convincing). We, however, still think that maps, along with sequences, are primitives required to build sensible (not obscure) algorithms. To see that a map is at times the best way to solve a problem, we shall refer to a utility function that allocates names in a scope. Its signature looks like this:
<!--
Allocates unique names in the form $prefix{number}?.
Note that prefixes may coincide.
$prefixes - a name prefixes.
$names - allocated names pool.
$name-max-length - a longest allowable name length.
Returns unique names.
-->
<xsl:function name="t:allocate-names" as="xs:string*">
<xsl:param name="prefixes" as="xs:string*"/>
<xsl:param name="names" as="xs:string*"/>
<xsl:param name="name-max-length" as="xs:integer?"/>
Just try to program it and you will find yourself coding something like the one defined at cobolxom.
To be efficient such maps should provide, at worst, logarithmic operation complexity: both a key lookup and the construction of an updated map (an insertion or a removal of an entry) should cost no more than O(LogN).
These performance metrics are found in many functional and procedural implementations of maps. Typical RB and AVL tree based maps satisfy these restrictions.
What we suggest is to introduce a map implementation into exslt2 (assuming inline functions are present). As a sketch we have implemented a pure AVL tree in Xslt 2.0:
We do not assume that an implementation like this should be used as is, but rather think of some extension function(s) that provide better performance.
What do you think?
The story about the immutable tree would not be complete without an xslt implementation. To make it possible one needs something to approximate tree nodes. You cannot implement such a construct efficiently in pure xslt 2.0 (it would be either inefficient or not pure).
To isolate the problem we have used a tuple interface:
tuple:ref($items as item()*) as item() - to wrap items into a tuple;
tuple:deref($tuple as item()?) as item()* - to unwrap items from a tuple;
tuple:is-same($first as item(), $second as item()) as xs:boolean - to test whether two tuples are the same;
and defined an inefficient implementation based on xml elements. Every other part of the code is a regular AVL algorithm implementation.
We want to stress again that, even assuming there is a good tuple implementation, we would prefer a built-in associative container implementation. Why the heck do you need to include about 1000 lines of code just to use a map?
Source code is:
We like Visual Studio very much, and try to adopt new versions early.
Lately our VS use pattern has centered around xml and xslt. In our opinion VS 2008 is the best xslt 2 editor we have ever seen, even with its lack of xslt 2.0 debugging support.
Unfortunately, that is still true while VS 2010 is almost out. VS 2008 is just 2-3 times faster. You can observe this working with xslt files like those in languages-xom.zip (1000-2000 rows). Things just become slow.
We still hope that VS 2010 will make a final effort to outdo what VS 2008 has already delivered.
Why do we return to this theme again?
Well, it's itching!
In cobolxom there is a utility function to allocate names in a scope. Its signature looks like this:
<!--
Allocates unique names in the form $prefix{number}?.
Note that prefixes may coincide.
$prefixes - a name prefixes.
$names - allocated names pool.
$name-max-length - a longest allowable name length.
Returns unique names.
-->
<xsl:function name="t:allocate-names" as="xs:string*">
<xsl:param name="prefixes" as="xs:string*"/>
<xsl:param name="names" as="xs:string*"/>
<xsl:param name="name-max-length" as="xs:integer?"/>
We have created several different implementations (all use recursion). Every implementation works fairly well for relatively small input sequences, say N < 100, but we have cases where there are about 10000 items on input. The algorithm's worst case complexity, in the absence of associative containers, is O(N*N), and be sure it's an O-o-o-oh... due to the xslt engine implementation.
If there were associative containers with efficient access (complexity O(LogN)) and efficient construction of an updated container (complexity also O(LogN)), then the implementation would be straightforward and would have complexity O(N*LogN), as the sketch below shows.
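For illustration only, here is how the same algorithm looks in C# when an efficient associative container is at hand (this is our sketch, not the cobolxom code, which is xslt; a mutable HashSet stands in for the immutable map we would like to have, and degenerate length limits are ignored):
using System.Collections.Generic;

public static class Names
{
  // Allocates unique names in the form prefix{number}?; prefixes may coincide.
  public static IList<string> AllocateNames(
    IEnumerable<string> prefixes,
    IEnumerable<string> names,
    int? nameMaxLength)
  {
    HashSet<string> used = new HashSet<string>(names);
    List<string> result = new List<string>();

    foreach(string prefix in prefixes)
    {
      for(int i = 0; ; ++i)
      {
        // First try the bare prefix, then prefix1, prefix2, ...
        string suffix = i == 0 ? "" : i.ToString();
        string name = prefix;

        if (nameMaxLength != null)
        {
          int max = nameMaxLength.Value - suffix.Length;

          if (name.Length > max)
          {
            name = name.Substring(0, max);
          }
        }

        name += suffix;

        // Each membership test and insertion is cheap, so the whole
        // allocation stays close to O(N*LogN) (or better with hashing).
        if (used.Add(name))
        {
          result.Add(name);

          break;
        }
      }
    }

    return result;
  }
}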
Given:
public class N
{
public readonly N next;
}
What needs to be done to construct a ring of N: n1 refers to n2, n2 to n3, ..., nk to n1? Is it possible?
To end with immutable trees, at least for now, we've implemented IDictionary<K, V>. It's named Map<K, V>. Functionally it looks very much like SortedDictionary<K, V>. There are some differences, however:
Map, in contrast to SortedDictionary, is very cheap on copy;
Map is based on an AVL tree, which is more rigorously balanced than an RB tree, so it's a little bit faster asymptotically for lookup than SortedDictionary, and a little bit slower on modification;
Map consumes less memory than SortedDictionary, and is probably cheaper for GC (simple garbage graphs).
Sources are:
Update:
It was impossible to withstand the temptation to commit some primitive performance comparison. Map outperforms SortedDictionary both in population and in access. This does not agree with pure algorithm theory, but there might be other unaccounted factors: memory consumption, quality of implementation, and so on. Program.cs is updated with the measurements.
Update 2:
More accurate tests show that for some key types Map is faster, for others SortedDictionary is faster. Usually Map is slower during population (a mutable AVL tree navigator may fix this). The odd thing is that Map<string, int> is faster than SortedDictionary<string, int> both for allocation and for access. See the excel report.
Update 3:
An interesting observation. The following table shows maximal and average tree heights for different numbers of nodes in AVL and RB trees after a random population:
Size    | AVL Max | AVL Avg | RB Max | RB Avg
10      | 4       | 2.90    | 5      | 3.00
50      | 7       | 4.94    | 8      | 4.94
100     | 8       | 5.84    | 9      | 5.86
500     | 11      | 8.14    | 14     | 8.39
1000    | 12      | 9.14    | 16     | 9.38
5000    | 15      | 11.51   | 18     | 11.47
10000   | 16      | 12.53   | 20     | 12.47
50000   | 19      | 14.89   | 23     | 14.72
100000  | 20      | 15.90   | 25     | 15.72
500000  | 25      | 18.26   | 28     | 18.27
1000000 | 25      | 19.28   | 30     | 19.27
Here, in accordance with theory, the height of the AVL tree is shorter than the height of the RB tree. But what is most interesting is the depth of an "average node". This value describes the number of steps required to find a random key. The RB tree is very close to, and often better than, the AVL tree in this regard.
It was obvious as hell from day one of generics that obscure long names would appear once you start to parametrize your types. It was the easiest thing in the world to take care of this in advance. Alas, C# inherits C++'s bad practices.
Read Associative containers in a functional languages and Program.cs to see what we're talking about.
Briefly, there is a pair (string, int), which in C# should be declared as:
System.Collections.Generic.KeyValuePair<string, int>
Obviously we would like to write it in a short way. These are our attempts, which fail:
1. Introduce generic alias Pair<K, V>:
using System.Collections.Generic;
using Pair<K, V> = KeyValuePair<K, V>;
2. Introduce type alias for a generic type with specific types.
using System.Collections.Generic;
using Pair = KeyValuePair<string, int>;
And this is the only one that works:
using Pair = System.Collections.Generic.KeyValuePair<string, int>;
Do you think it is bearable? Well, consider the following:
ValueNode<T>, where T should be Pair;
TreeNavigator<N>, where N should be ValueNode<Pair>.
The declarations look like this:
using Pair = System.Collections.Generic.KeyValuePair<string, int>;
using Node = NesterovskyBros.Collections.AVL.ValueNode<
System.Collections.Generic.KeyValuePair<string, int>>;
using Navigator = NesterovskyBros.Collections.AVL.TreeNavigator<
NesterovskyBros.Collections.AVL.ValueNode<
System.Collections.Generic.KeyValuePair<string, int>>>;
Do you still think it is acceptable?
P.S. Legacy thinking led C#'s and java's designers to use the word "new" for object construction. It is not required at all. Consider new Pair("A", 1) vs Pair("A", 1). C++ prefers the second form; C# and java always use the first one.
Continuing with the post "Ongoing xslt/xquery spec update", we would like to articulate what options regarding associative containers we have in functional languages (e.g. xslt, xquery), assuming that variables are immutable and the implementation is efficient (in some sense).
There are three common implementation techniques.
The implementation choice considerably depends on the operations performed over the container. Usually these are a lookup by key, and a modification: an insertion or a removal of an entry.
Note that a modification in functional programming means the creation of a new container, and this is where implementations divide.
On the other hand, to deal with the container's modifications one can build an implementation which uses "top-down" RB or AVL trees. To see the difference, consider a classical tree structure and its functional variant:
                | Classical                                        | Functional
Node structure: | node                                             | node
Node reference: | the node itself                                  | a node path from the root of the tree
Modification:   | either mutable or requires a completely new tree | O(LogN) nodes are created
Here we observe that a modification of a functional tree creates only O(LogN) new nodes along the path from the root to the modified node, while the rest of the tree is shared between the old and the new versions.
Here we provide a C# implementation of a functional AVL tree, which also supports element indexing:
Our intention was to show that the usual algorithms for associative containers apply in functional programming; thus a feature complete functional language must support associative containers to make development more conscious, and to free a developer from reinventing basic things that have existed for almost half a century.
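To make the "Functional" column of the table above concrete, here is a minimal sketch of path copying (our illustration, much simpler than the AVL tree referenced above: the tree is unbalanced and rebalancing is omitted); an insertion allocates new nodes only along the search path, while the rest of the tree is shared.
// A sketch of path copying in an immutable binary search tree.
public sealed class Tree
{
  public readonly int Key;
  public readonly Tree Left;
  public readonly Tree Right;

  public Tree(int key, Tree left, Tree right)
  {
    Key = key;
    Left = left;
    Right = right;
  }

  // Returns a new tree that contains the key; the old tree stays intact,
  // and only O(depth) nodes are allocated.
  public static Tree Insert(Tree node, int key)
  {
    if (node == null)
    {
      return new Tree(key, null, null);
    }

    if (key < node.Key)
    {
      return new Tree(node.Key, Insert(node.Left, key), node.Right);
    }

    if (key > node.Key)
    {
      return new Tree(node.Key, node.Left, Insert(node.Right, key));
    }

    return node; // the key is already present
  }
}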
Several years ago we started a new project. We neither like nor hate any particular language, thus the decision of what language to use was pragmatic: xslt 2.0 fitted perfectly.
At present it's a solid volume of xslt code. It exhibits all the virtues of any other good project in any other language: clean design, modularity, documentation, lack of over-sophistication (good code should not be clever).
The runtime profile of the project is that it deals with xml documents with sizes from a few dozen bytes to several megabytes, and with xml schemas from simple ones, like tabular data, to rich ones, like xhtml, and untyped. A pipeline of stylesheets processes gigabytes of data stored in the database and in files.
All the bragging above is needed here to introduce the context for the following couple of lines.
The diversity of load conditions and a big code base have exposed the xslt engine of choice to a good proof test. The victim is Saxon. In the course of the project we have found and reported many bugs. Some of them are nasty and important, others are bearable. To Michael Kay's credit (he's the owner of Saxon), all bugs are being fixed promptly (see the last one).
Such a project helps us to understand the weak sides of xslt (it seems that sometimes they, in the WG, lack such experience, which should guide them).
Well, it has happened that we're helping the Saxon project. Unintentionally, however!
P.S. About language preferences.
Nowadays we're polishing the COBOL generation. To this end we have familiarized ourselves with this language. It's a beautiful language. Its straightforwardness helps to see the evolution of computer languages and to understand what today's languages timidly try to hide, and why.
In spite of the fact that our latest projects are being developed in Java, .NET is definitely our favorite platform.
On twitter I saw the phrase: "Java the language is a stagnant mess". It was said in favour of C#. It's true that C# now significantly influences even Java (let's remember generics, jaxb, web services, etc.), but in my opinion C# won't be the leading language for worldwide enterprise applications in the near future.
One of the causes is that the main platform for .NET is still Windows. The situation could be changed by the Mono project, but I think there are not yet enough projects on platforms other than Windows.
My guess is confirmed by some observations I have made as a software engineer at an IT company. Our company performs various software porting projects from legacy programming languages like COBOL, ADSO, Natural, etc. into up to date languages like Java, C#, etc. It is worth saying that clients rarely choose to migrate to .NET despite our advice.
The main reason for such a choice, according to most of our clients, is that they want to be platform independent, and only Java gives them this option.
It would be worth it for Microsoft to think about cooperation with Mono in order to make .NET really platform independent, otherwise C# will always be a step behind Java despite the apparent advantages of C# as a programming language.
Suppose you have a library, which is out in production and is used by many clients. At the same time the library evolves: the API is extended, bugs are being fixed, the code becomes faster and cleaner, bla bla bla...
At some point you fix some important bug that has been hiding for a long time in the bowels of your library. You're happy that you've spotted it before clients got into trouble. You notify all the clients that there is an important fix, and that they need to update the library.
What do you think you hear in return?
Well, we're not perfect; there are bugs in our software. We and our clients realize this. Nothing will prevent bugs from creeping into the code from time to time.
Here is the train of thought of a particular client:
We agree that there is a bug and that it has to be fixed. We, however, want to touch the library in a minimal way, as who knows what other new bugs they have introduced, so let's ask them to fix this particular bug in our version of the library.
That's fair from the client's perspective. They don't want better code, they just want that particular bug fixed!
For us, however, this means branching some old version of the library, fixing the bug, and supporting this branch for the particular client. It's fair to expect a similar position from each client; thus should we create and support library branches per client, and branch off the main branch for new clients only?
For us (Arthur and Vladimir) this looks like an enormous waste of resources. We (our company) should either hire more and more people or experience a gradual slowdown of support and development.
Our answer would be obvious if not for the position of top managers, who value client relations so much that they easily promise whatever the client wishes. No arguments that the latest version is better tested, more conforming to specifications, more reliable, faster, and so on are accepted. The main argument against our position is that the client's applications run in production, and no new potential bugs are acceptable.
Here is our dilemma: we can neither convince the client (more precisely, our managers) that we're right, nor are we convinced by their arguments...
From time to time I like to tell about (or even to brag about) what keeps our minds busy. Sometimes it's even useful: you unexpectedly discover you are on the edge of community interests; other times you are an outsider.
So, what are we doing now? Well, for several years already we have been working at Multiconn. One of the goals of our company is to help expose mainframe functionality. Many different solutions to this task are available, and Multiconn itself provides several different approaches. Now let's go right to our business: a web service as a delegate of the mainframe.
To be honest, I think mainframes are legacy monsters, dinosaurs. I realize it's only a biased perspective, but the first experience is often very strong. For us as developers the mainframe is a source of data in different formats (sure, the formats are legacy also): COBOL records, terminal messages, and so on. Our idea is to allow the mainframe to rest in peace and to gracefully consume its data. Combining the consumed data in xml form with operation bindings that describe the mainframe application flow, we arrive at a view of a mainframe application as a web service.
To achieve this goal we have worked up an extension to xml schema that allows mapping schema elements to the data. It should be pointed out that it's perfectly legal to extend xml schema. One way to extend it is to use elements in a custom namespace inside the appinfo element.
The next step was to create an importer of schemas with our extensions. It's somewhat similar to the xsd.exe tool. Our tool generates classes annotated with XmlXxxAttribute for xml serialization and with LayoutXxxAttribute (these are our custom attributes) for serializing and deserializing instances to and from the mainframe data.
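To give an idea of what the generated classes look like, here is a hypothetical sketch: LayoutFieldAttribute and its parameters are made-up illustrations of the LayoutXxxAttribute family; only XmlElementAttribute is the standard .NET attribute.
using System;
using System.Xml.Serialization;

// A made-up stand-in for our LayoutXxxAttribute family.
[AttributeUsage(AttributeTargets.Field | AttributeTargets.Property)]
public class LayoutFieldAttribute: Attribute
{
  public int Offset { get; set; }
  public int Length { get; set; }
  public string Picture { get; set; }
}

// A class as it might be produced by the schema importer: xml attributes
// drive XmlSerializer, layout attributes drive the mainframe data serializer.
public class CustomerRecord
{
  [XmlElement("name")]
  [LayoutField(Offset = 0, Length = 30)]
  public string Name;

  [XmlElement("balance")]
  [LayoutField(Offset = 30, Length = 9, Picture = "S9(7)V99")]
  public decimal Balance;
}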
The next stage was to create the data serializer. This is a counterpart to XmlSerializer. It inspects class metadata and creates a plan for serialization and deserialization.
At the next stage we worked up wsdl bindings for our technology.
After that we created a tool to import wsdl (similar to wsdl.exe) that generates a web service, which passes input messages to the communication layer; that layer serializes and forwards the data to the mainframe, accepts and deserializes the response, and returns it to the web service, which in its turn returns the result to the client.
Next came the communication layer that interacts with the mainframe. This layer consists of an abstract (general) sublayer and specializations that support different patterns of communication.
As a result we have a web service implemented in .NET. This web service represents some mainframe application.
There are plans to generate similar web services in Java.