Saturday, 03 December 2011 - Nesterovsky bros

Saturday, 03 December 2011

SQL Server 2008 with(recompile) Next

Recently, we have found and reported the bug in the SQL Server 2008 (see SQL Server 2008 with(recompile), and also Microsoft Connect).

Persons, who's responsible for the bug evaluation has closed it, as if "By Design". This strange resolution, in our opinion, says about those persons only.

Well, we shall try once more (see Microsoft Connect). We have posted another trivial demonstartion of the bug, where we show that option(recompile) is not used, which leads to table scan (nothing worse can happen for a huge table).

Saturday, 03 December 2011 15:06:44 UTC

Comments [0] -
SQL Server puzzle | Thinking aloud

Friday, 18 November 2011

SQL Server 2008 with(recompile)

Recently we have introduced some stored procedure in the production and have found that it performs incredibly slow.

Our reasoning and tests in the development environment did not manifest any problem at all.

In essence that procedure executes some SELECT and returns a status as a signle output variable. Procedure recieves several input parameters, and the SELECT statement uses with(recompile) execution hint to optimize the performance for a specific parameters.

We have analyzed the execution plan of that procedure and have found that it works as if with(recompile) hint was not specified. Without that hint SELECT failed to use index seek but rather used index scan.

What we have lately found is that the same SELECT that produces result set instead of reading result into a variable performs very well.

We think that this is a bug in SQL Server 2008 R2 (and in SQL Server 2008).

To demonstrate the problem you can run this test:

-- Setup create table dbo.Items ( Item int not null primary key ); go insert into dbo.Items select 1 union all select 2 union all select 3 union all select 4 union all select 5 go create procedure dbo.GetMaxItem ( @odd bit = null, @result int output ) as begin set nocount on; with Items as ( select * from dbo.Items where @odd is null union all select * from dbo.Items where (@odd = 1) and ((Item & 1) = 1) union all select * from dbo.Items where (@odd = 0) and ((Item & 1) = 0) ) select @result = max(Item) from Items option(recompile); end; go create procedure dbo.GetMaxItem2 ( @odd bit = null, @result int output ) as begin set nocount on; declare @results table ( Item int ); with Items as ( select * from dbo.Items where @odd is null union all select * from dbo.Items where (@odd = 1) and ((Item & 1) = 1) union all select * from dbo.Items where (@odd = 0) and ((Item & 1) = 0) ) insert into @results select max(Item) from Items option(recompile); select @result = Item from @results; end; go

Test with output into a variable:

declare @result1 int; execute dbo.GetMaxItem @odd = null, @result = @result1 output

Execution plan of dbo.GetMaxItem

Test without output directly into a variable:

declare @result2 int; execute dbo.GetMaxItem2 @odd = null, @result = @result2 output

Execution plan of dbo.GetMaxItem2

Now, you can see the difference: the first execution plan uses startup expressions, while the second optimizes execution branches, which are not really used. In our case it was crucial, as the execition time difference was minutes (and more in future) vs a split of second.

See also Microsoft Connect Entry.

Friday, 18 November 2011 14:49:50 UTC

Comments [0] -
SQL Server puzzle | Tips and tricks

Thursday, 10 November 2011

eXperanto project actual yet...

A bit history: the first release of this solution was about 9.5 years ago...

Today we've run into a strange situation. One of our clients ask us about automatic conversion of data from mainframe (that were defined as COBOL copybooks) into XML or Java/.NET objects. On our suggestion to use eXperanto, which is well known to him, he stated that he wouldn't like to use a tool of a company that is no more exists...

The situation, in our opinion, become more strange when you consider the following:

eXperanto (the design-time tool and run-time libraries for Java and .NET) were developed, well tested, and delivered by us to production already several years ago.
the client bought this set (the tool and libraries).
the set is in production yet already in another big company, and is used time to time by our company in different migration projects.
the client talks with developers of this tool and run-time libraries, and he knows about this fact.
the client uses widely open source solutions even without dedicated vendors or support warranties.

Thursday, 10 November 2011 22:14:09 UTC

Comments [2] -
.NET | Java | Thinking aloud

Friday, 28 October 2011

jQuery

It has happened so, that we have never worked with jQuery, however were aware of it.

In early 2000 we have developed a web application that contained rich javascript APIs, including UI components. Later, we were actively practicing in ASP.NET, and later in JSF.

At present, looking at jQuery more closely we regret that we have failed to start using it earlier.

Separation of business logic and presentation is remarkable when one uses JSON web services. In fact server part can be seen as a set of web services representing a business logic and a set of resources: html, styles, scripts, others. Nor ASP.NET or JSF approach such a consistent separation.

The only trouble, in our opinion, is that jQuery has no standard data binding: a way to bind JSON data to (and from) html controls. The technique that will probably be standardized is called jQuery Templates or JsViews .

Unfortunatelly after reading about this binding API, and being in love with Xslt and XQuery we just want to cry. We don't know what would be the best solution for the task, but what we see looks uncomfortable to us.

Friday, 28 October 2011 22:59:23 UTC

Comments [0] -
ASP.NET | JSF and Facelets | Thinking aloud | Tips and tricks | xslt

Sunday, 16 October 2011

Funny, about yield return in java

Incidentally, we have found one new implementation of yield return in java that is in the development stage. Sources can be found at https://github.com/peichhorn/lombok-pg/zipball/master. Just to be sure we have copied those sources at other place peichhorn-lombok-pg-0.10.0-39-g384fb7b.zip (you may search "yield" in the archive).

It's broken according to source tracker, but the funny thing is that sources, however different, still resemble our yield return implementation (Yield.jar, Yield.3.7.jar - Indigo, Yield.zip - sources) very much: variable names, error messages, algorithmic structure.

Those programmers probably have forgotten good manners: to reference a base work, at least.

Well, we generously forgive them this blunder.

P.S. our implementation, in contrast, works without bugs.

P.P.S. misunderstanding is resolved. See comments.

Sunday, 16 October 2011 13:49:33 UTC

Comments [3] -
Java | Thinking aloud

Thursday, 29 September 2011

An XPath enumerator function

A couple of weeks ago, we have suggested to introduce a enumerator function into the XPath (see [F+O30] A enumerator function):

I would like the WG to consider an addition of a function that turns a sequence into a enumeration of values.

Consider a function like this: fn:enumerator($items as item()*) as function() as item()?;

alternatively, signature could be:

fn:enumerator($items as function() as item()*) as function() as item()?;

This function receives a sequence, and returns a function item, which upon N's call shall return N's element of the original sequence. This way, a sequence of items is turned into a function providing a enumeration of items of the sequence.

As an example consider two functions:

a) t:rand($seed as xs:double) as xs:double* - a function producing a random number sequence;
b) t:work($input as element()) as element() - a function that generates output from it's input, and that needs random numbers in the course of the execution.

t:work() may contain a code like this:
let $rand := fn:enumerator(t:rand($seed)),

and later it can call $rand() to get a random numbers.

Enumerators will help to compose algorithms where one algorithm communicate with other independant algorithms, thus making code simpler. The most obvious class of enumerators are generators: ordered numbers, unique identifiers, random numbers.

Technically, function returned from fn:enumerator() is nondetermenistic, but its "side effect" is similar to a "side effect" of a function generate-id() from a newly created node (see bug #13747, and bug #13494).

The idea is inspired by a generator function, which returns a new value upon each call.

Such function can be seen as a stateful object. But our goal is to look at it in a more functional way. So, we look at the algorithm as a function that produces a sequence of output, which is pure functional; and an enumerator that allows to iterate over algorithm's output.

This way, we see the function that implements an algorithm and the function that uses it can be seen as two thread of functional programs that use messaging to communicate to each other.

Honestly, we doubt that WG will accept it, but it's interesting to watch the discussion.

Thursday, 29 September 2011 11:56:05 UTC

Comments [0] -
Thinking aloud | xslt

Wednesday, 14 September 2011

Resolution of the Saxon optimizer bug

More than month has passed since we have reported a problem to the saxon forum (see Saxon optimizer bug and Saxon 9.2 generate-id() bug).

The essence of the problem is that we have constructed argumentless function to return a unique identifiers each time function is called. To achieve the effect we have created a temporary node and returned its generate-id() value.

Such a function is nondetermenistic, as we cannot state that its result depends on arguments only. This means that engine's optimizer is not free to reorder calls to such a function. That's what happens in Saxon 9.2, and Saxon 9.3 where engine elevates function call out of cycle thus producing invalid results.

Michael Kay, the author of the Saxon engine, argued that this is "a gray area of the xslt spec":

If the spec were stricter about defining exactly when you can rely on identity-dependent operations then I would be obliged to follow it, but I think it's probably deliberate that it currently allows implementations some latitude, effectively signalling to users that they should avoid depending on this aspect of the behaviour.

He adviced to raise a bug in the w3c bugzilla to resolve the issue. In the end two related bugs have been raised:

Bug 13494 - Node uniqueness returned from XSLT function;
Bug 13747 - [XPath 3.0] Determinism of expressions returning constructed nodes.

Yesterday, the WG has resolved the issue:

The Working Group agreed that default behavior should continue to require these nodes to be constructed with unique IDs. We believe that this is the kind of thing implementations can do with annotations or declaration options, and it would be best to get implementation experience with this before standardizing.

This means that the technique we used to generate unique identifiers is correct and the behaviour is well defined.

The only problem is to wait when Saxon will fix its behaviour accordingly.

Wednesday, 14 September 2011 05:54:56 UTC

Comments [0] -
Thinking aloud | xslt

Tuesday, 06 September 2011

Entity Framework and char parameters

We're not big fans of Entity Framework, as we don't directly expose the database structure to the client program but rather through stored procedures and functions. So, EF for us is a tool to expose those stored procedures as .NET wrappers. This limited use of EF still greatly automates the data access code.

But what we have lately found is that the EF has a problem with char parameters. Namely, if you import a procedure say MyProc that accepts char(1), and then will call it through the generated wrapper, the you will see in sql profiler that char(1) parameter is passed with many trailing spaces as if it were char(8000). There isn't necessity to prove that this is highly ineffective.

We can see that the problem happens in VS 2010 designer rather than in the EF runtime, as SP's parameters are not attributed with length, see model xml (*.edmx):

<Function Name="MyProc" Schema="Data"> ... <Parameter Name="recipientType" Type="char" Mode="In" /> ... </Function>

while if we set:

<Parameter Name="recipientType" Type="char" MaxLength="1" Mode="In" />

the runtime starts working as expected. So the workaround is to fix model file manually.

Sunday, 28 August 2011 19:11:45 UTC

Comments [0] -
Announce | Java | xslt

Friday, 12 August 2011

Windows Search complications

1. query.dll vs tquery.dll

We have installed Windows Search 4 on a Windows 2003 server. The goal was to index huge compressed xml files (see Windows Search Notifications). But for some reason it did not want to index content.

No "select System.ItemUrl from SystemIndex where contains('...')" has ever returned a row.

We thought that the problem was in our protocol handler, and tried to localize it, but finally have discovered that Windows Search is not able to find anything within text files.

Registry comparision has shown that *.txt extension was indexed by the IFilter defined in the query.dll, while on the other computers, where everything worked, the implementation was in the tquery.dll.

Both libraries were present on the Windows 2003 server, so we have corrected the registry and everything has started to work.

As far as we understand query.dll is part of legacy Indexing Service, and tquery.dll is up to date implementation.

2. Search index size

We have to index a considerable amout of data. But before we can do it we have to estimate the size of index.

In the past it seems we saw somewhere a statement that search index needs a storage that's about 10% of original data for its purposes. Unfortunatelly we cannot find this estimation at present, neither we cannot find any other estimation. This complicates our planning.

To get empirical estimate we've indexed several thousands *.xml-gz files, which are gz'ed big xmls. The total size of this files is about 4.5GB. Total uncompressed size of xmls ~50GB. Xml contained about 10 millions pages of data.

According to 10% criteria we had to arrive to ~5GB search index.

But what we have discovered is that the index has grown to more than 50GB. That's very disappointing. We cannot afford such expense, as we've commited test on a tiny part of data, which increases over time.

So, the solution is to find out what's wrong, and how can it be cured, or to fulltext index only most recent subset of data.

P.S. We have tried to mark folder with search index as compressed, but it did not work.

P.P.S. We have found the reference to Windows Search 4 index size estimation. It is in Windows Search Frequently Asked Questions, see answer on "What is average size of a user's index?" question.

Friday, 12 August 2011 09:20:18 UTC

Comments [0] -
Thinking aloud | Tips and tricks | Window Search

Monday, 01 August 2011

Farewell to CME!

Yesterday (2011-07-31) we have finished the project (development and support) of the modernization of Cool:GEN code base to java for the Chicago Mercantile Exchange.

It wasn't the first such project but definitely most interesting. We have migrated and tested about 300 MB of source code. In the process of translation we have identified many bugs that were present in the original code. Thanks to languages-xom that task turned to be pure xslt.

We hope that CME's developers are pleased with results.

If you by chance is looking for Cool:GEN conversion to java, C#, or even COBOL (don't understand why people still asking for COBOL) then you can start at bphx site.

Monday, 01 August 2011 06:32:15 UTC

Comments [0] -

Wednesday, 27 July 2011

Saxon optimizer bug

An xslt code that worked in the production for several years failed unexpectedly. That's unusual, unfortunate but it happens.

We started to analyze the problem, limited the code block and recreated it in the simpe form. That's it:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:t="http://www.nesterovsky-bros.com/xslt/public" exclude-result-prefixes="t xs"> <xsl:template match="/" name="main"> <xsl:variable name="content"> <root> <xsl:for-each select="1 to 3"> <item/> </xsl:for-each> </root> </xsl:variable> <xsl:variable name="result"> <root> <xsl:for-each select="$content/root/item"> <section-ref name-ref="{t:generate-id()}.s"/>  </xsl:for-each> </root> </xsl:variable> <xsl:message select="$result"/> </xsl:template> <xsl:function name="t:generate-id" as="xs:string"> <xsl:variable name="element" as="element()"> <element/> </xsl:variable> <xsl:sequence select="generate-id($element)"/> </xsl:function> </xsl:stylesheet>

This code performs some transformation and assigns unique values to name-ref attributes. Values generated with t:generate-id() function are guaranteed to be unique, as spec claims that every node has its unique generate-id() value.

Imagine, what was our surprise to find that generated elements all have the same name-ref's. We studied code all over, and found no holes in our reasoning and implementation, so our conlusion was: it's Saxon's bug!

It's interesting enough that if we rewrite code a little (see commented part), it starts to work properly, thus we suspect Saxon's optimizer.

Well, in the course of development we have found and reported many Saxon bugs, but how come that this little beetle was hiding so long.

We've verified that the bug exists in the versions 9.2 and 9.3. Here is the bug report: Saxon 9.2 generate-id() bug.

Unfortunatelly, it's there already for three days (2011-07-25 to 2011-07-27) without any reaction. We hope this will change soon.

Wednesday, 27 July 2011 20:02:38 UTC

Comments [0] -
Tips and tricks | xslt

Friday, 22 July 2011

Track stream position while writing xml

We needed to track a stream position during creation of xml file. This is to allow random access to a huge xml file (the task is related to WindowsSearch).

This is a simplified form of the xml:

<data> <item>...</item> ... <item>...</item> </data>

The goal was to have stream position of each item element. With this in mind, we've decided to:

open a stream, and then xml writer over it;
write data into xml writer;
call Flush() method of the xml writer before measuring stream offset;

That's a code sample:

var stream = new MemoryStream(); var writer = XmlWriter.Create(stream); writer.WriteStartDocument(); writer.WriteStartElement("data"); for(var i = 0; i < 10; ++i) { writer.Flush(); Console.WriteLine("Flush offset: {0}, char: {1}", stream.Position, (char)stream.GetBuffer()[stream.Position - 1]); writer.WriteStartElement("item"); writer.WriteValue("item " + i); writer.WriteEndElement(); } writer.WriteEndElement(); writer.WriteEndDocument();

That's the output:

Flush offset: 46, char: a Flush offset: 66, char: > Flush offset: 85, char: > Flush offset: 104, char: > Flush offset: 123, char: > Flush offset: 142, char: > Flush offset: 161, char: > Flush offset: 180, char: > Flush offset: 199, char: > Flush offset: 218, char: >

Funny, isn't it?

After feeding the start tag <data>, and flushing xml writer we observe that only "<data" has been written down to the stream. Well, Flush() have never promissed anything particular about the content of the stream, so we cannot claim any violation, however we expected to see whole start tag.

Inspection of the implementation of xml writer reveals laziness during writting data down the stream. In particular start tag is closed when one starts the content. This is probably to implement empty tags: <data/>.

To do the trick we had to issue empty content, moreover, to call a particular method with particular parameters of the xml writer. So the code after the fix looks like this:

And output is:

Flush offset: 47, char: > Flush offset: 66, char: > Flush offset: 85, char: > Flush offset: 104, char: > Flush offset: 123, char: > Flush offset: 142, char: > Flush offset: 161, char: > Flush offset: 180, char: > Flush offset: 199, char: > Flush offset: 218, char: >

While this code works, we feel uneasy with it.

What's the better way to solve the task?

Update: further analysis shows that it's only possible behaviour, as after the call to write srart element, you either can write attributes, content or end of element, so writer may write either space, '>' or '/>'. The only question is why it takes WriteChars(empty, 0, 0) into account and WriteValue("") it doesn't.

Friday, 22 July 2011 21:08:36 UTC

Comments [0] -
Thinking aloud | Tips and tricks | Window Search

Navigation