Introduction
We migrate code from mainframes (MF) to Azure.
The tool we use produces plain, functionally equivalent C# code.
But it turns out that's not enough!
So, what's the problem?
The converted code is very slow, especially for batch processing:
where the MF completes a job in, say, 30 minutes, the converted code
finishes in 8 hours.
At this point someone usually appears and whispers in your ear:
"Look, those old technologies are time-proven. It's worth sticking to good old COBOL, or better yet Assembler, if you want to do the real thing."
We're curious though: why is there a difference?
It turns out the issue lies in the difference in network topology between the MF and Azure solutions.
On the MF, all programs, the database, and file storage virtually sit in a single box, so network latency is negligible.
It's quite usual to see chatty SQL programs on the MF that issue a lot of small SQL queries.
In Azure, programs, database, and file storage are different services, most certainly sitting in different physical boxes.
You should be thankful if they are at least co-located in a single datacenter.
So, network latency immediately becomes a factor.
Even if it adds just 1 millisecond per SQL round trip, it adds up in loops and turns into a showstopper.
There is no easy workaround on the hardware level.
People advise writing programs differently: "Tune applications and databases for performance in Azure SQL Database".
That's good advice for new development, but discouraging for a migration done by a tool.
So, what is the way forward?
Well, there is one. While accepting Azure's weak sides, we can exploit its strong sides.
Parallel refactoring
Before continuing, let's consider code demonstrating the problem:
public void CreateReport(StringWriter writer)
{
var index = 0;
foreach(var transaction in dataService.
GetTransactions().
OrderBy(item => (item.At, item.SourceAccountId)))
{
var sourceAccount = dataService.GetAccount(transaction.SourceAccountId);
var targetAccount = transaction.TargetAccountId != null ?
dataService.GetAccount(transaction.TargetAccountId) : null;
++index;
if (index % 100 == 0)
{
Console.WriteLine(index);
}
writer.WriteLine($"{index},{transaction.Id},{
transaction.At},{transaction.Type},{transaction.Amount},{
transaction.SourceAccountId},{sourceAccount?.Name},{
transaction.TargetAccountId},{targetAccount?.Name}");
}
}
This loop queries transactions and, for each transaction, issues two more small queries to get the source and target accounts. Results are printed into a report.
If we assume a query latency of just 1 millisecond and run such code for 100K transactions, we easily come to 200+ seconds of execution: the two per-iteration account queries alone cost 100,000 × 2 × 1 ms = 200 seconds.
Reality turns out to be much worse: the program spends most of its lifetime waiting for database results, and an iteration doesn't advance until all work of the previous iteration is complete.
We could do better even without trying to rewrite all code!
Let's articulate our goals:
- To make code fast.
- To leave code recognizable.
The idea is to form two processing pipelines:
- (a) one that processes data in parallel out of order;
- (b) other that processes data serially, in original order;
Each pipeline may post sub-tasks to the other, so (a) runs its tasks in parallel unordered, while (b) runs its tasks as if everything was running serially.
So, the parallel plan would be like this:
- Queue a parallel sub-task in (a) for each transaction.
- A parallel sub-task in (a) reads the source and target accounts, and queues a serial sub-task in (b), passing the transaction and accounts.
- A serial sub-task in (b) increments the index and writes a report record.
- Wait for all tasks to complete.
To reduce the burden of managing task pipelines we use Dataflow (Task Parallel Library), and encapsulate everything in a small wrapper.
Consider the refactored code:
public void CreateReport(StringWriter writer)
{
using var parallel = new Parallel(options.Value.Parallelism);
var index = 0;
parallel.ForEachAsync(
dataService.
GetTransactions().
OrderBy(item => (item.At, item.SourceAccountId)),
transaction =>
{
var sourceAccount = dataService.GetAccount(transaction.SourceAccountId);
var targetAccount = transaction.TargetAccountId != null ?
dataService.GetAccount(transaction.TargetAccountId) : null;
parallel.PostSync(
(transaction, sourceAccount, targetAccount),
data =>
{
var (transaction, sourceAccount, targetAccount) = data;
++index;
if (index % 100 == 0)
{
Console.WriteLine(index);
}
writer.WriteLine($"{index},{transaction.Id},{
transaction.At},{transaction.Type},{transaction.Amount},{
transaction.SourceAccountId},{sourceAccount?.Name},{
transaction.TargetAccountId},{targetAccount?.Name}");
});
});
}
Consider the following points:
- We create a Parallel utility class, passing the requested degree of parallelism.
- We iterate transactions using parallel.ForEachAsync(), which queues a parallel sub-task for each transaction, and then waits until all tasks are complete.
- Each parallel sub-task receives a transaction. It may be called from a different thread.
- Having received the required accounts, we queue a sub-task for synchronous execution using parallel.PostSync(), and pass it the data collected in the parallel sub-task: the transaction and accounts.
- We deconstruct the passed data into variables, and then proceed with the serial logic.
What we achieve with this refactoring:
- The top-level query that brings transactions is executed and iterated serially.
- But each iteration body is run in parallel. By default we set it up to allow up to 100 parallel executions. All those parallel sub-tasks do not wait on each other, so their waits do not add up.
- Sync sub-tasks are queued and executed in the order of their serial appearance, so increments and report records are subject neither to race conditions nor to reordering of the output records.
We think that such refactored code is still recognizable.
As for performance, this is what the log shows:
Serial test
100
...
Execution time: 00:01:33.8152540
Parallel test
100
...
Execution time: 00:00:05.8705468
Reference
Please take a look at the project to understand implementation details, and in particular at the Parallel class, which implements the API to post parallel and serial tasks, run loops, and more.
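As an illustration of the approach, here is a minimal sketch of how such a wrapper could be built on TPL Dataflow blocks. This is our simplified assumption for this post, not the project's actual Parallel class: instead of PostSync, the per-item body returns the serial continuation to run.
```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ParallelPipeline
{
  // Runs parallelBody for each item with the given degree of parallelism,
  // then executes the returned continuations serially, in the input order.
  public static async Task ForEachAsync<T>(
    IEnumerable<T> items,
    int parallelism,
    Func<T, Action> parallelBody)
  {
    // (a) parallel pipeline: runs bodies concurrently; Dataflow blocks
    // preserve input order by default, so results are emitted in order.
    var parallel = new TransformBlock<T, Action>(
      parallelBody,
      new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = parallelism });

    // (b) serial pipeline: runs the queued continuations one by one.
    var serial = new ActionBlock<Action>(action => action());

    parallel.LinkTo(serial, new DataflowLinkOptions { PropagateCompletion = true });

    foreach(var item in items)
    {
      parallel.Post(item);
    }

    parallel.Complete();
    await serial.Completion;
  }
}
```
The key design point is that a TransformBlock with MaxDegreeOfParallelism greater than one still hands its results downstream in input order, so the serial part (index increments, report writing) stays race-free and correctly ordered without extra bookkeeping.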
Please continue reading on GitHub.
As we wrote some time ago, we started to use AI and ML APIs in our applications, but this time we've tried to use Bing chat as a tool.
The initial task was to take a sequence of integers and return a list of integer pairs, where the first number is a range's start and the second is the count of values in the range. For example, for the sequence 1, 2, 3, 5, 8, 9 we'd like to get the following results:
1, 3
5, 1
8, 2
We both started to solve this task. My brother started to play with SSMS using his deep knowledge of SQL and analytic functions, whereas I started to play with Bing chat. Looking ahead, I can state that we both solved the task with almost the same SQL statement in almost the same time.
Here are my prompts and Bing's answers:
- Prompt:
Act as database analyst and software engineer. Given the table (T) with a numeric field (N), which values are orderd ranges of integers. In a range each next value is greater than previous one on 1. When value differs from the previous one more than on 1 then in belongs to another range. Generate select statement that returns rows with two fields. The first field (R) is a number from which a range starts and the second field (C) is count elements of that range:
- Answer:
Sure, I can help with that. Here's a SQL query that should do the trick. This query uses a self-join to
identify the start of each range, and then groups by the start of the range to count the number of elements.
```sql
SELECT
start_of_range AS R,
COUNT(*) AS C
FROM (
SELECT
N,
N - ROW_NUMBER() OVER (ORDER BY N) AS start_of_range
FROM T
) subquery
GROUP BY start_of_range
ORDER BY start_of_range;
```
In this query, `start_of_range` is calculated as the difference between the current value of `N` and its row
number (when ordered by `N`). This difference will be the same for all `N` in the same range. Then we group
by `start_of_range` to get the count of elements in each range. The outer query selects the start of each
range (`R`) and the count of elements in that range (`C`).
- Me, scratching my head, hmmm...
- Question:
This is a bit incorrect SQL select. For example for values 1, 2, 3, 5, 8, 9 I'd like to get the following results:
1, 3
5, 1
8, 2
What a SQL select shall I use to get such results?
- Answer:
I apologize for the confusion earlier. You're right, the previous query doesn't handle the scenario you've described correctly.
Here's a corrected SQL query that should give you the desired output. This query uses a window function to compare each row with the previous one and identify the start of a new range. Then it groups by the start of the range to count the elements.
```sql
WITH ranges AS (
SELECT
N,
CASE
WHEN N - LAG(N, 1, N) OVER (ORDER BY N) > 1 THEN 1
ELSE 0
END AS is_start_of_range
FROM T
),
range_groups AS (
SELECT
N,
SUM(is_start_of_range) OVER (ORDER BY N) AS range_group
FROM ranges
)
SELECT
MIN(N) AS R,
COUNT(*) AS C
FROM range_groups
GROUP BY range_group
ORDER BY R;
```
In this query, `is_start_of_range` is a flag that indicates the start of a new range (when the difference between the current and previous `N` is more than 1). `range_group` is a running total of `is_start_of_range` that serves as a unique identifier for each range. The outer query then groups by `range_group` to get the start (`R`) and count (`C`) of each range. I hope this helps! Let me know if you have any other questions.
That answer satisfied me more, and I checked it with the following example:
with T as
(
select 10 N
union all
select 11 N
union all
select 30 N
union all
select 31 N
union all
select 32 N
union all
select 50 N
union all
select 80 N
union all
select 81 N
union all
select 82 N
union all
select 83 N
union all
select 90 N
),
ranges AS (
SELECT
N,
CASE
WHEN N - LAG(N, 1, N) OVER (ORDER BY N) > 1 THEN 1
ELSE 0
END AS is_start_of_range
FROM T
),
range_groups AS (
SELECT
N,
SUM(is_start_of_range) OVER (ORDER BY N) AS range_group
FROM ranges
)
SELECT
MIN(N) AS R,
COUNT(*) AS C
FROM range_groups
GROUP BY range_group
ORDER BY R;
The result was:
R C
------------
10 2
30 3
50 1
80 4
90 1
Conclusion:
Nowadays, using AI as a tool, you may improve your performance to the expert level and, maybe, learn something new...
P.S. Don't fool yourself into thinking that AI can replace a developer (see the first answer of the chat), but together they can be much stronger than separately.
P.P.S. Our colleague gave us another interesting solution. She used an additional field to avoid using analytic functions, and reached the same result:
with T as
(
select 10 ID, 1 N
union all
select 11 ID, 1 N
union all
select 30 ID, 1 N
union all
select 31 ID, 1 N
union all
select 32 ID, 1 N
union all
select 50 ID, 1 N
union all
select 80 ID, 1 N
union all
select 81 ID, 1 N
union all
select 82 ID, 1 N
union all
select 83 ID, 1 N
union all
select 90 ID, 1 N
),
Groups AS (
SELECT
ID,
N,
ROW_NUMBER() OVER (ORDER BY ID) - ID AS GroupNumber
FROM
T
)
SELECT
MIN(ID) AS R,
SUM(N) AS C
FROM
Groups
GROUP BY
GroupNumber
ORDER BY
R;
Earlier we wrote that recently we've gotten a few tasks related to Machine Learning.
The prerequisite of such a task is to collect and prepare the input data.
Usually the required data is scattered across public sites; some of them are in plain text format (or close to it),
but others are accessible only as the output of public web applications. To obtain the required data from such sites
you have to navigate through pages, which often requires keeping state between navigations.
To implement this you need some kind of crawler/scraper for the websites.
Fortunately, there are a lot of frameworks, libraries, and tools in C# (and in other languages too) that allow doing this (visit this or this site to see the most popular of them), for example:
- ScrapySharp
- ABot
- HtmlAgilityPack
- DotnetSpider
There are pros and cons to using these libraries. The most crucial con is the lack of support for rich UIs based on heavy client-side scripts and client-side state:
not all such libraries implement full browser emulation, and some of them do not support Javascript execution at all.
So they suit gathering information from simple web pages, but no library makes it easy to navigate to some page of a web application
that keeps rich client-side state. Even the best of them, like ScrapySharp, require heavy programming to achieve the result.
Then, suddenly, we recalled that for several years already we have been using Selenium and web drivers to automate web tests for AngularJS/Angular projects.
After a short discussion we came to the conclusion that there is no big difference between testing a web application and collecting data from it, since one of the testing stages is collecting the actual results (data)
from the tested page, and usually our tests consist of chains of actions performed on consecutively visited pages.
This way we came to the idea of using the WebDriver API implemented by the Selenium project.
There are implementations of this API in different languages, and in C# too.
Using WebDriver we easily implement the cumbersome navigation of a complex web application and can collect the required data. Moreover, it allows running the browser in headless (screenless) mode.
Some of its features allow creating snapshots of the virtual screen and storing the HTML source that results from Javascript execution. These features are very
useful during run-time troubleshooting. To script navigation of a complex web application we need only a bit more knowledge than a usual user of the web application: we need
to identify pages' elements somehow, for example by CSS selectors or by ids of HTML elements (as we do for tests). All the rest, like cookies, view state (if any),
values of hidden fields, and some Javascript events, is transparent in this case.
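To make this concrete, here is a minimal sketch of such a headless crawl in C#. It assumes the Selenium.WebDriver and Selenium.WebDriver.ChromeDriver NuGet packages; the URL and the CSS selector are placeholders, not taken from a real project.
```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class CrawlerSketch
{
  static void Main()
  {
    var options = new ChromeOptions();
    options.AddArgument("--headless"); // run without a visible browser window

    using var driver = new ChromeDriver(options);

    // Navigate as a regular user would; cookies, hidden fields and Javascript
    // execution are handled by the real browser behind the driver.
    driver.Navigate().GoToUrl("https://example.com/app");

    // Identify page elements by CSS selector, exactly as we do in web tests.
    foreach(var row in driver.FindElements(By.CssSelector("table.data tr")))
    {
      Console.WriteLine(row.Text);
    }

    // PageSource contains the HTML after Javascript has run; together with
    // a screenshot it is handy for run-time troubleshooting.
    var html = driver.PageSource;
  }
}
```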
Although one may say that the approach with Selenium is rather heavyweight, it ought to be mentioned that it is quite scalable:
you may either run several threads with a different WebDriver instance in each thread, or run several processes simultaneously.
However, besides the pros there are cons to the solution with Selenium. They appear when you decide to publish it, e.g. to an Azure environment.
Note that the approach with Selenium requires a browser on the server; there is also a problem with Azure itself, as it's Microsoft's platform
and Selenium is a product of their main competitor, Google... So, some issues aren't technical. The only possible solution is to use a PaaS approach
instead of SaaS, but in that case you have to support everything by yourself...
The other problem is that if your application implements rather aggressive crawling, either the servers you gather data from or your own host might ban it.
So, be gentle, play nice, and implement delays between requests.
Also, take into account that when you're implementing any crawler, some problems may appear at the legal level, since not all web sites allow pulling anything you want.
Many sites have terms & conditions that define rules for the site's users (which your crawler should follow), otherwise legal action may
be taken against it (or its owners). There is a
very interesting article that describes many
pitfalls when you implement your own crawler.
To summarize everything we said earlier, the Selenium project can be used in many scenarios, and one of them is to create a powerful crawler.
Eventually we started to deal with tasks that require machine learning. Thus, a good tutorial for ML.NET was needed, and we found this one, which comes along with good, simple code samples. Thanks to Jeff Prosise. We hope this may be helpful to you too.
Recently our colleague turned to us and asked for help with some complex query.
It turned out that the complex part was to understand what he wanted to achieve.
After listening to him we formulated the task in our own words and confirmed that this was indeed what he wanted.
So, here is the task in our formulation:
- Assume you have events.
- Each event acts upon one or more accounts.
- Find all events that act on the same set of accounts.
- Note that we deal with multiple millions of events and accounts.
Data is defined like this:
create table dbo.Event
(
EventID bigint not null,
AccountID varchar(18) not null,
primary key(EventID, AccountID)
);
The requested query turned out to be very simple, yet not as simple as one might think when accounting for the big amount of data:
with D as
(
select * from dbo.Event
),
S as
(
select
EventID,
count(*) Items,
checksum_agg(checksum(AccountID)) Hash
from
D
group by
EventID
)
select
S1.EventID, S2.EventID
from
S S1
inner join
S S2
on
S1.EventID < S2.EventID and
S1.Items = S2.Items and
S1.Hash = S2.Hash and
not exists
(
select AccountID from D where EventID = S1.EventID
except
select AccountID from D where EventID = S2.EventID
);
The idea is to:
- calculate a hash derived from list of accounts for each group;
- join groups with the same hash;
- verify that matched groups fit perfectly.
An even simpler solution that does not use hashes is not scalable, as its running time grows at least as O(N^2), where N is the number of events. It shows unacceptable times with N ~1e4, to say nothing of N ~1e7.
At this point our colleague was already satisfied, as he got the result in a couple of minutes for a task that he could not even formalize in SQL.
But we felt it could be even better.
We looked at statistics:
with D as
(
select * from dbo.Event
),
S as
(
select
EventID,
count(*) Items
from
D
group by
EventID
)
select
Items, count(*) EventCount
from
S
group by
Items
order by
EventCount desc;
and saw that most of the events, about 90%, deal with a single account,
and all others with two or more (some of them act upon a big number of accounts).
The nature of the dataset gave us a hint for a more verbose but faster query:
with D as
(
select * from dbo.Event
),
S as
(
select
EventID,
min(AccountID) AccountID,
count(*) Items,
checksum_agg(checksum(AccountID)) Hash
from
D
group by
EventID
)
select
S1.EventID, S2.EventID
from
S S1
inner join
S S2
on
S1.EventID < S2.EventID and
S1.Items = 1 and
S2.Items = 1 and
S1.AccountID = S2.AccountID
union all
select
S1.EventID, S2.EventID
from
S S1
inner join
S S2
on
S1.EventID < S2.EventID and
S1.Items > 1 and
S2.Items > 1 and
S1.Items = S2.Items and
S1.Hash = S2.Hash and
not exists
(
select AccountID from D where EventID = S1.EventID
except
select AccountID from D where EventID = S2.EventID
);
This query produced results in twenty seconds instead of a couple of minutes for a dataset with ~1e7 rows.
We are not sure what the use of our XSLT graph exercises is, but what we are sure of is that they stress different parts of the Saxon XSLT engine and help to find and resolve various bugs.
While implementing the biconnected components algorithm we incidentally ran into an internal error in Saxon 10.1 with a rather simple XSLT:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:array="http://www.w3.org/2005/xpath-functions/array"
exclude-result-prefixes="xs array">
<xsl:template match="/">
<xsl:sequence select="
array:fold-left
(
[8, 9],
(),
function($first as item(), $second as item())
{
min(($first, $second))
}
)"/>
</xsl:template>
</xsl:stylesheet>
More detail can be found in Saxon's issue tracker: Bug #4578: NullPointerException when array:fold-left|right $zero argument is an empty sequence.
The bug was promptly resolved.
People compare these two technologies, and it seems an established fact that Angular is the evolutionarily more advanced framework. We're not going to contradict that; on the contrary, we agree with it, but it's better for an opinion to be grounded in facts that one can evaluate and verify.
Fortunately we got a chance to make such a comparison.
We support conversions of Cool:GEN (a legacy CASE tool with roots in the '80s) to Java or C#. In its time Cool:GEN allowed greatly automating enterprise development, using mainframes on the server side and mainframe terminals or Win32 GUIs as clients.
The legacy of this tool is probably hundreds of business and database models, and millions of programs generated in COBOL on mainframes and in C or Java on Windows and Linux. All this still runs to this day in many economic sectors.
Usually the client is some enterprise that invested a lot into the design, development, and support of their business model using Cool:GEN, but now most such clients are trying not to lose this legacy and to convert it into something that goes along with today's technologies.
As the original technology is sound, it is possible to map it to today's Java or C# on the server, REST or SOAP as a transport, and Angular, AngularJS, or something else on the client. Such automatic conversion is the essence of our conversion efforts.
To understand the scope, consider a typical enterprise client that has 2-3 thousand windows backed by 20-30 thousand programs.
Now, consider that the conversion is done. Among other things, we produce as output a clean Java or C# web application with REST and SOAP interfaces, and an Angular or AngularJS web client that encapsulates those 2-3 thousand windows.
Each window definition is rather small, 5-10 KB in HTML form, but the whole mass of windows takes 10-30 MB, which is not small anymore.
For AngularJS we generate just those HTML templates, but for Angular we need to generate a separate component for each window, which includes a TypeScript class, template, and style.
While the amount of generated resources for AngularJS stays within those 10-30 MB, the generated Angular code takes at least 5-10 MB more.
The next step is the build.
AngularJS builds a distribution that includes all used libraries and the set of templates, and it takes something like a minute of CPU time. The produced output is about 300 KB of minified script plus those 10-30 MB of templates (multiple files, 5-10 KB each).
Angular (here we talk about version 9) builds a distribution that includes all used libraries and a set of compiled components that are to be loaded lazily on demand. With the default Angular builder, which performs tree shaking, the build takes days; with tree shaking off it takes 40 minutes. This is the first notable difference. The produced output for ES2015 (the latest JavaScript) is about 1 MB, plus 15-100 KB per compiled component. This is the second notable difference, one that already impacts the end user rather than the developer.
The third difference is in the end user experience. Though we have built equivalent Angular and AngularJS frontends, we observe that the load time of the Angular one is higher. This cannot be ascribed only to bigger file sizes; it seems internal initialization also takes more time in Angular.
So, our experience in this particular test shows that Angular has more room to improve; in particular: compile time, bundle size, runtime speed, and simplicity of dynamic loading (we have strong cases where template compilation is not the best approach).
We were asked to help with a search service in one enterprise. We were told that their SharePoint portal does not serve their needs. The main complaints were about the quality of the search results.
They decided to implement an external index of the SharePoint content using Elastic, and to expose a custom search API within the enterprise.
We questioned their conclusions: we asked why they thought Elastic would give much better results, and whether they had tried to figure out why SharePoint gave no desired results.
The answers did not convince us, though we joined the project.
What do you think?
Elastic did not help at all, though they very much hoped that its query language would help to rank results in a way that the matching documents would be found.
After all, they thought it was a problem of ranking the results.
Here we started our analysis. We took a specific document that must be found but is never returned from search.
It turned out to be a well-known problem; at least, we dealt with a closely related one in the past. There are two ingredients here:
- the documents that have low chances of being found are PDFs;
- we live in Israel, so most texts are in Hebrew, which means words are written from right to left, while some other texts run from left to right. See Bi-directional text.
Traditionally, PDF documents store content in a way that only distantly resembles the logical structure of the original text. E.g., paragraphs are often represented as unrelated runs of text lines, or as sets of text runs representing single words or even independent characters. Needless to say, an additional complication comes from the fact that Hebrew text is often represented visually (from left to right, as if "hello" were stored as "olleh" and just printed from right to left). Another common feature of PDFs is custom fonts with non-canonical mappings, or images with glyphs of letters.
You can implement these tricks in other document formats, but for some reason PDF is the only format we have seen that regularly and intensively uses these techniques.
At this point we realized that it's not the fault of the search engine that it cannot find the document, but a feature of the PDF, which exposes its text to a crawler in a form that cannot be used for search.
In fact, such documents cannot even be searched within a PDF viewer: when you try to find some text that you clearly see in the opened document, you often find nothing.
A question: what should you do when no PDF text extractor can give you the correct text, but the text is there when you look at the document in a PDF viewer?
We decided it was time to go in the direction of image recognition. Thankfully, nowadays it's just a matter of available processing resources.
Our goal was:
- Produce an image of each PDF page. This task is immediately solved with Apache PDFBox (a Java PDF library) - it's time to say this is a Java project.
- Run Optical Character Recognition (OCR) over the images and get the extracted texts. This is perfectly done by tesseract-ocr/tesseract, and thanks to its Java wrapper bytedeco/javacpp-presets we can immediately call this C++ API from Java.
The only small nuisance of tesseract is that it does not expose table recognition info, but we can easily overcome this (we solved that task in the past), as along with each text run tesseract exposes its position.
What are the results of running such a program?
- Full success! It works with high recognition quality. Indeed, there is no physical noise that impacts quality.
- Slow speed: up to several seconds of recognition per page.
- A scalable solution: the slow speed can be compensated by almost unlimited theoretical scalability.
So, what is the lesson we have taken from this experience?
Well, you should question yourself, test, and verify ideas on the ground before building theories that can lead you in a completely wrong direction. In the end, people started to realize there was no need to complain about SharePoint, to throw it away, and to spend a great deal of time and money just to prove that the problem was in a different place.
Sample source code can be found at App.java.
In some code we needed to perform a circular shift of a part of an array, like in the following picture:
It's clear what to do, especially in the case of a one-element shift, but think about an "optimal" algorithm that does the minimal number of data movements.
Here is what we came up with in C#: an algorithm doing a single pass over the data.
/// <summary>
/// <para>
/// Moves content of list within open range <code>[start, end)</code>.
/// <code>to</code> should belong to that range.
/// </para>
/// <para>
/// <code>list[start + (to - start + i) mod (end - start)] =
/// list[start + i]</code>,
/// where i is in range <code>[0, end - start)</code>.
/// </para>
/// </summary>
/// <typeparam name="T">An element type.</typeparam>
/// <param name="list">A list to move data withing.</param>
/// <param name="start">Start position, including.</param>
/// <param name="end">End position, not incuding.</param>
/// <param name="to">Target position.</param>
public static void CircularMove<T>(IList<T> list, int start, int end, int to)
{
var size = end - start;
var step = to - start;
var anchor = start;
var pos = start;
var item = list[pos];
for(int i = 0; i < size; ++i)
{
pos += step;
if (pos >= end)
{
pos -= size;
}
var next = list[pos];
list[pos] = item;
item = next;
if (pos == anchor)
{
pos = ++anchor;
if (pos >= end)
{
break;
}
item = list[pos];
}
}
}
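As a quick sanity check, here is a small usage fragment (our own illustration, assuming the method above is in scope) that rotates the middle of a list so that the element at index 1 lands at index 3:
```csharp
var list = new List<char> { 'a', 'b', 'c', 'd', 'e', 'f' };

// Rotate the range [1, 5) so that the element at 'start' moves to index 3.
CircularMove(list, start: 1, end: 5, to: 3);

// Prints: a, d, e, b, c, f
Console.WriteLine(string.Join(", ", list));
```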
J2SE has become so large that its different parts don't always play well together.
That is a pity, but there is nothing to be done; there is probably a lack of resources at Oracle to fill the gaps.
So, to the point.
There is a relatively new API to work with time, defined in the package java.time. There is the older JAXB API to serialize and deserialize beans to and from XML (and often to JSON). For JAXB to be viable, it should be able to deal with basic primitive types. The problem is that JAXB does not handle LocalDate, LocalTime, LocalDateTime, and ZonedDateTime out of the box.
We do understand that:
- JAXB is the older and java.time is the newer API; and that
- JAXB has no built-in plugin to handle the new types.
But this does not help, and we have to define serialization adapters using some drop-in code or third-party libraries. Here are these convenience adapters:
LocalDateAdapter.java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZonedDateTime;
import javax.xml.bind.annotation.adapters.XmlAdapter;
/**
* An adapter for the bean properties of {@link LocalDate} type.
*/
public class LocalDateAdapter extends XmlAdapter<String, LocalDate>
{
/**
* Converts {@link LocalDate} into a string value.
* @param value a value to convert. Can be null.
* @return a string value.
*/
@Override
public String marshal(LocalDate value)
throws Exception
{
return value == null ? null : value.toString();
}
/**
* Converts a string value into a {@link LocalDate}
* instance.
* @param value a value to convert. Can be null.
* @return a {@link LocalDate} instance.
*/
@Override
public LocalDate unmarshal(String value)
throws Exception
{
if (value == null)
{
return null;
}
int p = value.indexOf('T');
if (p < 0)
{
return LocalDate.parse(value);
}
while(++p < value.length())
{
switch(value.charAt(p))
{
case '+':
case '-':
case 'Z':
{
return ZonedDateTime.parse(value).toLocalDate();
}
}
}
return LocalDateTime.parse(value).toLocalDate();
}
}
LocalDateTimeAdapter.java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZonedDateTime;
import javax.xml.bind.annotation.adapters.XmlAdapter;
/**
* An adapter for the bean properties of {@link LocalDateTime} type.
*/
public class LocalDateTimeAdapter extends XmlAdapter<String, LocalDateTime>
{
/**
* Converts {@link LocalDateTime} into a string value.
* @param value a value to convert. Can be null.
* @return a string value.
*/
@Override
public String marshal(LocalDateTime value)
throws Exception
{
return value == null ? null : value.toString();
}
/**
* Converts a string value into a {@link LocalDateTime} instance.
* @param value a value to convert. Can be null.
* @return a {@link LocalDateTime} instance.
*/
@Override
public LocalDateTime unmarshal(String value)
throws Exception
{
if (value == null)
{
return null;
}
int p = value.indexOf('T');
if (p < 0)
{
return LocalDateTime.of(LocalDate.parse(value), LocalTime.MIN);
}
while(++p < value.length())
{
switch(value.charAt(p))
{
case '+':
case '-':
case 'Z':
{
return ZonedDateTime.parse(value).toLocalDateTime();
}
}
}
return LocalDateTime.parse(value);
}
}
LocalTimeAdapter.java
import java.time.LocalTime;
import javax.xml.bind.annotation.adapters.XmlAdapter;
/**
* An adapter for the bean properties of {@link LocalTime} type.
*/
public class LocalTimeAdapter extends XmlAdapter<String, LocalTime>
{
/**
* Converts {@link LocalTime} into string value.
* @param value a value to convert. Can be null.
* @return a string value
*/
@Override
public String marshal(LocalTime value)
throws Exception
{
return value == null ? null : value.toString();
}
/**
* Converts a string value into a {@link LocalTime} instance.
* @param value a value to convert. Can be null.
* @return a {@link LocalTime} instance.
*/
@Override
public LocalTime unmarshal(String value)
throws Exception
{
return value == null ? null : LocalTime.parse(value);
}
}
To make them work, either the fields/properties or the package should be annotated with the JAXB XML adapters.
The simplest way is to annotate at the package level, like this:
package-info.java
@XmlJavaTypeAdapters(
{
@XmlJavaTypeAdapter(value = LocalDateAdapter.class, type = LocalDate.class),
@XmlJavaTypeAdapter(value = LocalTimeAdapter.class, type = LocalTime.class),
@XmlJavaTypeAdapter(value = LocalDateTimeAdapter.class, type = LocalDateTime.class)
})
package com.nesterovskyBros.demo.entities;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import javax.xml.bind.annotation.adapters.XmlJavaTypeAdapter;
import javax.xml.bind.annotation.adapters.XmlJavaTypeAdapters;
We've run into the following Java function that models some variation of NULL arithmetic:
public static Long add(Long value1, Long value2)
{
return value1 == null ? value2 : value2 == null ? value1 : value1 + value2;
}
When this function runs, the outcome is different from what we expected.
Here is a quiz.
What is the outcome of add(1, 2)?
- 3;
- null;
- other.
What is the outcome of add(3, null)?
- 3;
- null;
- other.
What is the outcome of add(null, 4)?
- null;
- 4;
- other.
What is the outcome of add(null, null)?
- null;
- 0;
- other.
Our assumptions were:
- add(1, 2) == 3;
- add(3, null) == 3;
- add(null, 4) == 4;
- add(null, null) == null.
Java works differently:
- add(1, 2) == 3;
- add(3, null) throws NullPointerException;
- add(null, 4) throws NullPointerException;
- add(null, null) throws NullPointerException.
The problem is with the compile-time type of the ternary ?: operator. The compiler decides it's long, while we intuitively expected Long. Java unboxes the null Long to long (which results in an NPE), and then boxes the result back to Long.
Correct code would be:
public static Long add(Long value1, Long value2)
{
if (value1 == null)
{
return value2;
}
else if (value2 == null)
{
return value1;
}
else
{
return value1 + value2;
}
}
This version does not convert anything to long, and works as we originally expected.
Honestly, we're a bit anxious about this subtle difference between if-then-else and the ?: operator.
A recent task required us to find all sets of non-intersecting rectangles in a rectangle list.
At first glance it did not look like a trivial task. Just consider that from a list of N rectangles you can form
2^N different subsets, so even the result list, theoretically, can be enormous.
Fortunately, we knew that our result would be manageable in size. But nevertheless, suppose you have a list of a
couple of hundred rectangles; how would you enumerate all the different sets of rectangles?
By the way, this task sounds like one of the Google interview questions, so you may try to solve it yourself before checking our solution.
We didn't even dare to think of a brute-force solution: to enumerate all sets and then check whether each one fits our needs.
Instead we used induction:
- Suppose S(N) is the solution of our task for N rectangles R(n), where S(N) is a set of sets of rectangles;
- Then the solution S(N+1) will contain the whole of S(N), the set consisting of the single rectangle R(N+1), and
those sets of rectangles from S(N) combined with R(N+1) that fit the condition;
- S(0) is an empty set.
The algorithm was implemented in Java, and at first it was using
streaming and recursion.
Then we figured out that we could use
Stream.reduce or Stream.collect to implement
the same algorithm. That second implementation was a little bit longer but probably faster, and besides, it used standard idioms.
But then, at the last step, we reformulated the algorithm in terms of
collections.
Though the final implementation is the least similar to the original induction algorithm,
it's straightforward and definitely the fastest among all the implementations we tried.
So, here is the code:
/**
* For a sequence of items builds a list of matching groups.
* @param identity an identity instance used for the group.
* @param items original sequence of items.
* @param matcher a group matcher of item against a group.
* @param combiner creates a new group from a group (optional) and an item.
* @return a list of matching groups.
*/
public static <T, G> List<G> matchingGroups(
G identity,
Iterable<T> items,
BiPredicate<G, T> matcher,
BiFunction<G, T, G> combiner)
{
ArrayList<G> result = new ArrayList<>();
for(T item: items)
{
int size = result.size();
result.add(combiner.apply(identity, item));
for(int i = 0; i < size; ++i)
{
G group = result.get(i);
if (matcher.test(group, item))
{
result.add(combiner.apply(group, item));
}
}
}
return result;
}
The sample project on GitHub contains implementation and a tests of this algorithm.
It's a very old theme...
Many years ago we defined a .NET wrapper around the Windows Uniscribe API.
The Uniscribe API is used to render bidirectional languages like Hebrew, so it's important mainly here in Israel.
Once in a while we get requests from people for that API, so we published it on GitHub at https://github.com/nesterovsky-bros/BidiVisualConverter.
You're welcome to use it!
We solved this problem years ago, but have run into it once again.
So, we shall log the solution here.
The problem: to minify the payload of JAXB-serialized beans.
Java beans have many properties, and most of them contain default values: zero ints, empty strings, and so on.
JAXB never tries to omit a default value from the marshalled XML; the only thing it can remove from the output is null values. So, our approach is to define XML adapters that map default values to nulls.
Here we refer to the StackOverflow question: Prevent writing default attribute values JAXB, and to our answer.
Though it's not as terse as one would wish, one can create XmlAdapters to avoid marshalling the default values.
The use case is like this:
@XmlRootElement(name = "FIELD")
public class TestLayoutNode
{
@XmlAttribute(name = "num")
@XmlJavaTypeAdapter(value = IntegerZero.class, type = int.class)
public int number;
@XmlAttribute(name = "str")
@XmlJavaTypeAdapter(StringDefault.class)
public String str = "default";
}
And here are adapters.
IntegerZero:
public class IntegerZero extends DefaultValue<Integer>
{
public Integer defaultValue() { return 0; }
}
StringDefault:
public class StringDefault extends DefaultValue<String>
{
public String defaultValue() { return "default"; }
}
DefaultValue:
public class DefaultValue<T> extends XmlAdapter<T, T>
{
public T defaultValue() { return null; }
public T marshal(T value) throws Exception
{
return (value == null) || value.equals(defaultValue()) ? null : value;
}
public T unmarshal(T value) throws Exception
{
return value;
}
}
With a small number of different default values this approach works well.
Though ADO.NET and other ORM frameworks like EntityFramework and Dapper support the async pattern, you should remember that database drivers (at least all we know about) do not support concurrent DB commands running against a single connection.
To see what we mean, consider a bug we have recently identified. Consider the code:
await Task.WhenAll(
newImages.
Select(
async image =>
{
// Load data from url.
image.Content = await HttpUtils.ReadData(image.Url);
// Insert image into the database.
image.ImageID = await context.InsertImage(image);
}));
The code runs multiple tasks to read images and to write them into a database.
The framework decides to run all these tasks in parallel. HttpUtils.ReadData() has no problem with parallel execution, while context.InsertImage() does not run well in parallel and is subject to race conditions.
To work around the problem we had to use an async variant of a critical section, so the fixed code looks like this:
using(var semaphore = new SemaphoreSlim(1))
{
await Task.WhenAll(
newImages.
Select(
async image =>
{
// Load data from url.
image.Content = await HttpUtils.ReadData(image.Url);
await semaphore.WaitAsync();
try
{
// Insert image into the database.
image.ImageID = await context.InsertImage(image);
}
finally
{
semaphore.Release();
}
}));
}
So, in the async world we should still care about race conditions.
In our angularjs projects we often deal with existing models that do not always fit angularjs expectations.
Here is an example.
There is a model consisting of two arrays: one for data, and one for associated data. How do we create an ng-repeat that displays data from both sources?
Consider a test controller (see the github sources, and a rawgit working sample):
model.controller(
"Test",
function()
{
this.records =
[
{ name: "record 1", state: "Draft" },
{ name: "record 2", state: "Public" },
{ name: "record 3", state: "Disabled" },
{ name: "record 4", state: "Public" },
{ name: "record 5", state: "Public" }
];
this.more =
[
{ value: 1, selected: true, visible: true },
{ value: 2, selected: false, visible: true },
{ value: 3, selected: true, visible: true },
{ value: 4, selected: false, visible: false },
{ value: 5, selected: false, visible: true }
];
this.delete = function(index)
{
this.records.splice(index, 1);
this.more.splice(index, 1);
};
});
Basically there are three approaches here:
- Change the model.
- Adapt the model to a single collection.
- ng-repeat over the first array and access the second array using the $index scope variable.
We argued like this:
- It is often not an option to change a model, as it's how the business data is described.
- A model adaptation, where we build a single collection from the original two collections and synchronize it back (if required), may unnecessarily complicate things.
- Thus, let's get the associated items by the $index variable.
This is an example of ng-repeat use:
<table border="1">
<tr>
<th>[x]</th>
<th>Name</th>
<th>Value</th>
<th>State</th>
<th>Actions</th>
</tr>
<tr ng-repeat="item in test.records track by $index"
ng-if="test.more[$index].visible">
<td>
<input type="checkbox" ng-model="test.more[$index].selected"/>
</td>
<td>{{item.name}}</td>
<td>{{test.more[$index].value}}</td>
<td>{{item.state}}</td>
<td>
<a href="#" ng-click="test.delete($index)">Delete</a>
</td>
</tr>
</table>
Look at how the associated data is accessed: test.more[$index]... Our goal was to optimize those repeating parts, so we looked at the ng-init directive.
Though the docs warn about its use ("the only appropriate use of ngInit is for aliasing special properties of ngRepeat"), we thought that our use of ng-init is rather close to what the docs state, so we tried the following:
...
<tr ng-repeat="item in test.records track by $index"
ng-init="more = test.more[$index]"
ng-if="more.visible">
<td>
<input type="checkbox" ng-model="more.selected"/>
</td>
<td>{{item.name}}</td>
<td>{{more.value}}</td>
<td>{{item.state}}</td>
<td>
<a href="#" ng-click="test.delete($index)">Delete</a>
</td>
</tr>
...
This code just does not work: it shows an empty table, as if ng-if always evaluated to false. From the docs we found the reason:
- the priority of the ng-if directive is higher than the priority of ng-init, and besides, ng-if is a terminal directive;
- as a result the ng-if directive is bound, and ng-init is not;
- when ng-if is evaluated no $scope.more is defined, so more.visible evaluates to false.
To work around the ng-init/ng-if problem we refactored ng-if as ng-if-start/ng-if-end:
...
<tr ng-repeat="item in test.records track by $index"
ng-init="more = test.more[$index]">
<td ng-if-start="more.visible">
<input type="checkbox" ng-model="more.selected"/>
</td>
<td>{{item.name}}</td>
<td>{{more.value}}</td>
<td>{{item.state}}</td>
<td ng-if-end>
<a href="#" ng-click="test.delete($index)">Delete</a>
</td>
</tr>
...
This code works much better and shows the correct content. But then click "Delete" for the row with name "record 2", and you will find that the updated table is out of sync for all data that comes from the test.more array.
So, why does the data go out of sync? The reason is in the way ng-init is implemented: its expression is evaluated just once, at the directive's pre-link phase. So, the value of $scope.more persists for the whole of ng-init's life cycle, and it does not matter that test.more[$index] may have changed at some point.
At this point we decided to introduce a small directive named ui-eval that acts in a way similar to ng-init, but:
- runs before ng-if;
- is re-evaluated when its value changes.
This is it:
module.directive(
"uiEval",
function()
{
var directive =
{
restrict: 'A',
priority: 700,
link:
{
pre: function(scope, element, attr)
{
scope.$watch(attr["uiEval"]);
}
}
};
return directive;
});
The ui-eval version of the markup is:
...
<tr ng-repeat="item in test.records track by $index"
ui-eval="more = test.more[$index]"
ng-if="more.visible">
<td>
<input type="checkbox" ng-model="more.selected"/>
</td>
<td>{{item.name}}</td>
<td>{{more.value}}</td>
<td>{{item.state}}</td>
<td>
<a href="#" ng-click="test.delete($index)">Delete</a>
</td>
</tr>
...
It works as expected both during initial rendering and when the model is updated.
We consider ui-eval a "better" ng-init, as it solves ng-init's silent limitations. On the other hand, it should not be used to evaluate complex logic, as it can be re-evaluated often, so its use case is to alias a sub-expression. It can be used in any context and is not limited to items of ng-repeat.
Source code can be found at github, and a working sample at rawgit.
Stackoverflow shows that people are searching for how to intercept $resource requests.
Recently we wrote about a way to cancel angularjs $resource requests (see Cancel angularjs resource request).
Here we apply the same technique to intercept a resource request.
Consider a sample (nesterovsky-bros/angularjs-api/master/angularjs/transform-request.html):
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<title>Intercept resource request</title>
<style type="text/css">.ng-cloak { display: none; }</style>
<script src="angular.js"></script>
<script src="angular-resource.js"></script>
<script>
angular.module("app", ["ngResource"]).
factory(
"services",
["$resource", function ($resource)
{
return $resource(
"http://md5.jsontest.com/",
{},
{
MD5:
{
method: "GET",
params: { text: null },
then: function (resolve)
{
this.params.text = "***" + this.params.text + "***";
this.then = null;
resolve(this);
}
},
});
}]).
controller(
"Test",
["services", function (services)
{
this.value = "Sample text";
this.call = function()
{
this.result = services.MD5({ text: this.value });
}
}]);
</script>
</head>
<body ng-app="app" ng-controller="Test as test">
<label>Text: <input type="text" ng-model="test.value" /></label>
<input type="button" value="call" ng-click="test.call()"/>
<div ng-bind="test.result.md5"></div>
</body>
</html>
How it works:
- $resource merges the action definition, request params, and data to build a config parameter for an $http request;
- the config parameter passed into an $http request is treated as a promise-like object, so it may contain a then function to initialize the config;
- the action's then function may transform the request as it wishes.
The demo can be found at transform-request.html
Often we need to keep client session state in our angularjs application.
This state should survive page refreshes and navigation within the application.
Earlier we used the ngStorage module, but lately we have changed our opinion, as we think it's over-engineered and too heavy at runtime.
We have replaced it with a simple service that synchronizes with sessionStorage once during initialization, and once before page unload.
Look at an example (session.html):
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<title>Session</title>
<style type="text/css">
.ng-cloak { display: none; }
</style>
<script src="angular.js"></script>
<script>
angular.module("app", []).
factory(
"session",
["$window", function($window)
{
var session =
angular.fromJson($window.sessionStorage.getItem("app")) || {};
$window.addEventListener(
"beforeunload",
function()
{
$window.sessionStorage.setItem("app", angular.toJson(session));
})
return session;
}]).
controller(
"Test",
["session",
function(session)
{
this.state = session;
}]);
</script>
</head>
<body ng-app="app" ng-controller="Test as test">
<input type="text" ng-model="test.state.value"/>
<a href="session.html?p=1">Page 1</a>
<a href="session.html?p=2">Page 2</a>
</body>
</html>
Source can be found at nesterovsky-bros/angularjs-api/services/session.html.
Earlier this year Mike Wasson published a post, "Dependency Injection in ASP.NET Web API 2", that describes Web API's approach to the Dependency Injection design pattern.
In short it goes like this:
- Web API provides a primary integration point through the HttpConfiguration.DependencyResolver property, and tries to obtain many services through this resolver;
- Web API suggests using your favorite Dependency Injection library through this integration point. The author lists the following libraries: Unity (by Microsoft), Castle Windsor, Spring.Net, Autofac, Ninject, and StructureMap.
The Unity Container (Unity) is a lightweight, extensible dependency injection container. There are NuGets both for the Unity library and for Web API integration.
Now to the point of this post.
Unity defines a hierarchy of injection scopes. In Web API they are usually mapped to the application and request scopes. This way a developer can inject application singletons, and create request-level or transient objects.
Everything looks reasonable. The only problem we found is that there is no way for you to inject Web API objects like HttpConfiguration, HttpControllerContext, or the request's CancellationToken, as they are never registered for injection.
To work around this we have created a small class called UnityControllerActivator that performs the required registration:
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using System.Web.Http.Controllers;
using System.Web.Http.Dispatcher;
using Microsoft.Practices.Unity;
/// <summary>
/// Unity controller activator.
/// </summary>
public class UnityControllerActivator: IHttpControllerActivator
{
/// <summary>
/// Creates an UnityControllerActivator instance.
/// </summary>
/// <param name="activator">Base activator.</param>
public UnityControllerActivator(IHttpControllerActivator activator)
{
if (activator == null)
{
throw new ArgumentException("activator");
}
this.activator = activator;
}
/// <summary>
/// Creates a controller wrapper.
/// </summary>
/// <param name="request">A http request.</param>
/// <param name="controllerDescriptor">Controller descriptor.</param>
/// <param name="controllerType">Controller type.</param>
/// <returns>A controller wrapper.</returns>
public IHttpController Create(
HttpRequestMessage request,
HttpControllerDescriptor controllerDescriptor,
Type controllerType)
{
return new Controller
{
activator = activator,
controllerType = controllerType
};
}
/// <summary>
/// Base controller activator.
/// </summary>
private readonly IHttpControllerActivator activator;
/// <summary>
/// A controller wrapper.
/// </summary>
private class Controller: IHttpController, IDisposable
{
/// <summary>
/// Base controller activator.
/// </summary>
public IHttpControllerActivator activator;
/// <summary>
/// Controller type.
/// </summary>
public Type controllerType;
/// <summary>
/// A controller instance.
/// </summary>
public IHttpController controller;
/// <summary>
/// Disposes controller.
/// </summary>
public void Dispose()
{
var disposable = controller as IDisposable;
if (disposable != null)
{
disposable.Dispose();
}
}
/// <summary>
/// Executes an action.
/// </summary>
/// <param name="controllerContext">Controller context.</param>
/// <param name="cancellationToken">Cancellation token.</param>
/// <returns>Response message.</returns>
public Task<HttpResponseMessage> ExecuteAsync(
HttpControllerContext controllerContext,
CancellationToken cancellationToken)
{
if (controller == null)
{
var request = controllerContext.Request;
var container = request.GetDependencyScope().
GetService(typeof(IUnityContainer)) as IUnityContainer;
if (container != null)
{
container.RegisterInstance<HttpControllerContext>(controllerContext);
container.RegisterInstance<HttpRequestMessage>(request);
container.RegisterInstance<CancellationToken>(cancellationToken);
}
controller = activator.Create(
request,
controllerContext.ControllerDescriptor,
controllerType);
}
controllerContext.Controller = controller;
return controller.ExecuteAsync(controllerContext, cancellationToken);
}
}
}
A note on how it works:
- IHttpControllerActivator is a controller factory, which Web API uses to create new controller instances via IHttpControllerActivator.Create(). Later, the controller's IHttpController.ExecuteAsync() is called to run the logic.
- UnityControllerActivator replaces the original controller activator with a wrapper that delays creation (injection) of the real controller until the request objects are registered in the scope.
To register this class one needs to update the code in UnityWebApiActivator.cs (a file added with the nuget Unity.AspNet.WebApi):
public static class UnityWebApiActivator
{
/// <summary>Integrates Unity when the application starts.</summary>
public static void Start()
{
var config = GlobalConfiguration.Configuration;
var container = UnityConfig.GetConfiguredContainer();
container.RegisterInstance<HttpConfiguration>(config);
container.RegisterInstance<IHttpControllerActivator>(
new UnityControllerActivator(config.Services.GetHttpControllerActivator()));
config.DependencyResolver = new UnityHierarchicalDependencyResolver(container);
}
...
}
With this addition we have simplified the boring problem of passing CancellationToken all around the code, as a controller (and other classes) just declares a property to inject:
public class MyController: ApiController
{
[Dependency]
public CancellationToken CancellationToken { get; set; }
[Dependency]
public IModelContext Model { get; set; }
public async Task<IEnumerable<Products>> GetProducts(...)
{
...
}
public async Task<IEnumerable<Customer>> GetCustomer(...)
{
...
}
...
}
...
public class ModelContext: IModelContext
{
[Dependency]
public CancellationToken CancellationToken { get; set; }
...
}
And finally, to perform unit tests for controllers with Dependency Injection you can use code like this:
using System.Threading;
using System.Threading.Tasks;
using System.Web.Http;
using System.Web.Http.Controllers;
using System.Web.Http.Dependencies;
using System.Net.Http;
using Microsoft.Practices.Unity;
using Microsoft.Practices.Unity.WebApi;
using Microsoft.VisualStudio.TestTools.UnitTesting;
[TestClass]
public class MyControllerTest
{
[ClassInitialize]
public static void Initialize(TestContext context)
{
config = new HttpConfiguration();
Register(config);
}
[ClassCleanup]
public static void Cleanup()
{
config.Dispose();
}
[TestMethod]
public async Task GetProducts()
{
var controller = CreateController<MyController>();
//...
}
public static T CreateController<T>(HttpRequestMessage request = null)
where T: ApiController
{
if (request == null)
{
request = new HttpRequestMessage();
}
request.SetConfiguration(config);
var controllerContext = new HttpControllerContext()
{
Configuration = config,
Request = request
};
var scope = request.GetDependencyScope();
var container = scope.GetService(typeof(IUnityContainer))
as IUnityContainer;
if (container != null)
{
container.RegisterInstance<HttpControllerContext>(controllerContext);
container.RegisterInstance<HttpRequestMessage>(request);
container.RegisterInstance<CancellationToken>(CancellationToken.None);
}
T controller = scope.GetService(typeof(T)) as T;
controller.Configuration = config;
controller.Request = request;
controller.ControllerContext = controllerContext;
return controller;
}
public static void Register(HttpConfiguration config)
{
config.DependencyResolver = CreateDependencyResolver(config);
}
public static IDependencyResolver CreateDependencyResolver(HttpConfiguration config)
{
var container = new UnityContainer();
container.RegisterInstance<HttpConfiguration>(config);
// TODO: configure Unity container.
return new UnityHierarchicalDependencyResolver(container);
}
public static HttpConfiguration config;
}
P.S. To those who think Dependency Injection is a universal tool, please read the article: Dependency Injection is Evil.
Looking at Guava Cache we think its API is more convenient than .NET's Cache API.
Just consider:
- .NET has getters and setters of objects by string keys; you should provide a caching policy with each setter.
- Guava's cache operates with a typed storage of Key to Value, and takes a value factory and a caching policy in advance, at cache construction.
Guava's advantage is based on the idea that homogeneous storage assumes a uniform way of creating values and a uniform caching policy. Thus a great part of the logic is factored out into cache initialization.
We have decided to create a simple adapter of the MemoryCache to achieve the same goal. Here is a result of such an experiment:
public class Cache<K, V>
where V: class
{
/// <summary>
/// A cache builder.
/// </summary>
public struct Builder
{
/// <summary>
/// A memory cache. If not specified then MemoryCache.Default is used.
/// </summary>
public MemoryCache MemoryCache;
/// <summary>
/// An expiration value.
/// Alternatively CachePolicyFunc can be used.
/// </summary>
public TimeSpan Expiration;
/// <summary>
/// Indicates whether to use sliding (true), or absolute (false)
/// expiration.
/// Alternatively CachePolicyFunc can be used.
/// </summary>
public bool Sliding;
/// <summary>
/// Optional function to get caching policy.
/// Alternatively Expiration and Sliding property can be used.
/// </summary>
public Func<V, CacheItemPolicy> CachePolicyFunc;
/// <summary>
/// Optional value validator.
/// </summary>
public Func<V, bool> Validator;
/// <summary>
/// A value factory.
/// Alternatively FactoryAsync can be used.
/// </summary>
public Func<K, V> Factory;
/// <summary>
/// Async value factory.
/// Alternatively Factory can be used.
/// </summary>
public Func<K, Task<V>> FactoryAsync;
/// <summary>
/// A key to string converter.
/// </summary>
public Func<K, string> KeyFunc;
/// <summary>
/// Converts builder to a Cache<K, V> instance.
/// </summary>
/// <param name="builder">A builder to convert.</param>
/// <returns>A Cache<K, V> instance.</returns>
public static implicit operator Cache<K, V>(Builder builder)
{
return new Cache<K, V>(builder);
}
}
/// <summary>
/// Creates a cache from a cache builder.
/// </summary>
/// <param name="builder">A cache builder instance.</param>
public Cache(Builder builder)
{
if ((builder.Factory == null) && (builder.FactoryAsync == null))
{
throw new ArgumentException("builder.Factory");
}
if (builder.MemoryCache == null)
{
builder.MemoryCache = MemoryCache.Default;
}
this.builder = builder;
}
/// <summary>
/// Cached value by key.
/// </summary>
/// <param name="key">A key.</param>
/// <returns>A cached value.</returns>
public V this[K key]
{
get { return Get(key); }
set { Set(key, value); }
}
/// <summary>
/// Sets a value for a key.
/// </summary>
/// <param name="key">A key to set.</param>
/// <param name="value">A value to set.</param>
public void Set(K key, V value)
{
SetImpl(GetKey(key), IsValid(value) ? value : null);
}
/// <summary>
/// Gets a value for a key.
/// </summary>
/// <param name="key">A key to get value for.</param>
/// <returns>A value instance.</returns>
public V Get(K key)
{
var keyValue = GetKey(key);
var value = builder.MemoryCache.Get(keyValue) as V;
if (!IsValid(value))
{
value = CreateValue(key);
SetImpl(keyValue, value);
}
return value;
}
/// <summary>
/// Gets a task to return an async value.
/// </summary>
/// <param name="key">A key.</param>
/// <returns>A cached value.</returns>
public async Task<V> GetAsync(K key)
{
var keyValue = GetKey(key);
var value = builder.MemoryCache.Get(keyValue) as V;
if (!IsValid(value))
{
value = await CreateValueAsync(key);
SetImpl(keyValue, value);
}
return value;
}
/// <summary>
/// Gets string key value for a key.
/// </summary>
/// <param name="key">A key.</param>
/// <returns>A string key value.</returns>
protected string GetKey(K key)
{
return builder.KeyFunc != null ? builder.KeyFunc(key) :
key == null ? null : key.ToString();
}
/// <summary>
/// Creates a value for a key.
/// </summary>
/// <param name="key">A key to create value for.</param>
/// <returns>A value instance.</returns>
protected V CreateValue(K key)
{
return builder.Factory != null ? builder.Factory(key) :
builder.FactoryAsync(key).Result;
}
/// <summary>
/// Creates a task for value for a key.
/// </summary>
/// <param name="key">A key to create value for.</param>
/// <returns>A task for a value instance.</returns>
protected Task<V> CreateValueAsync(K key)
{
return builder.FactoryAsync != null ? builder.FactoryAsync(key) :
Task.FromResult(builder.Factory(key));
}
/// <summary>
/// Validates the value.
/// </summary>
/// <param name="value">A value to validate.</param>
/// <returns>
/// true if value is valid for a cache, and false otherwise.
/// </returns>
protected bool IsValid(V value)
{
return (value != null) &&
((builder.Validator == null) || builder.Validator(value));
}
/// <summary>
/// Set implementation.
/// </summary>
/// <param name="key">A key to set value for.</param>
/// <param name="value">A value to set.</param>
/// <returns>A set value.</returns>
private V SetImpl(string key, V value)
{
if (value == null)
{
builder.MemoryCache.Remove(key);
}
else
{
builder.MemoryCache.Set(
key,
value,
builder.CachePolicyFunc != null ? builder.CachePolicyFunc(value) :
builder.Sliding ?
new CacheItemPolicy { SlidingExpiration = builder.Expiration } :
new CacheItemPolicy
{
AbsoluteExpiration = DateTime.Now + builder.Expiration
});
}
return value;
}
/// <summary>
/// Cache builder.
/// </summary>
private Builder builder;
}
Usage consists of initialization:
Cache<MyKey, MyValue> MyValues =
new Cache<MyKey, MyValue>.Builder
{
KeyFunc = key => ...key to string value...,
Factory = key => ...create a value for a key...,
Expiration = new TimeSpan(0, 3, 0),
Sliding = true
};
and a trivial cache access:
var value = MyValues[key];
This contrasts with MemoryCache coding pattern:
MemoryCache cache = MemoryCache.Default;
...
var keyAsString = ...key to string value...
var value = cache.Get(keyAsString) as MyValue;
if (value == null)
{
value = ...create a value for a key...
cache.Set(keyAsString, value, ...caching policy...);
}
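The async path follows the same pattern. Here is a minimal usage sketch, where MyValue and LoadValueAsync() are hypothetical stand-ins for your own value type and async loader:
Cache<string, MyValue> values = new Cache<string, MyValue>.Builder
{
  // Async value factory; any Func<string, Task<MyValue>> will do.
  FactoryAsync = key => LoadValueAsync(key),
  Expiration = TimeSpan.FromMinutes(3),
  Sliding = true
};
...
// Awaits the factory only on a cache miss.
var value = await values.GetAsync(key);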
In the article "Error handling in WCF based web applications" we've shown a custom error handler for a RESTful service based on WCF. This time we shall do the same for a Web API 2.1 service.
Web API 2.1 provides an elegant way to implement custom error handlers/loggers, see the following article. Web API permits many error loggers followed by a single error handler for all uncaught exceptions. The default error handler knows how to output an error both in XML and JSON formats, depending on the requested MIME type.
In our projects we use unique error reference IDs. This feature allows an end user to refer to any error that has happened during the application lifetime and pass such an error ID to technical support for further investigation. Thus, error details passed to the client side contain an ErrorID field. An error logger generates the ErrorID and passes it over to an error handler for serialization.
Let's look at our error handling implementation for a Web API application.
The first part is an implementation of IExceptionLogger
interface. It assigns ErrorID and logs all errors:
/// Defines a global logger for unhandled exceptions.
public class GlobalExceptionLogger : ExceptionLogger
{
/// Writes log record to the database synchronously.
public override void Log(ExceptionLoggerContext context)
{
try
{
var request = context.Request;
var exception = context.Exception;
var id = LogError(
request.RequestUri.ToString(),
context.RequestContext == null ?
null : context.RequestContext.Principal.Identity.Name,
request.ToString(),
exception.Message,
exception.StackTrace);
// associates retrieved error ID with the current exception
exception.Data["NesterovskyBros:id"] = id;
}
catch
{
// logger shouldn't throw an exception!!!
}
}
// in the real life this method may store all relevant info into a database.
private long LogError(
string address,
string userid,
string request,
string message,
string stackTrace)
{
...
}
}
The second part is the implementation of IExceptionHandler :
/// Defines a global handler for unhandled exceptions.
public class GlobalExceptionHandler : ExceptionHandler
{
/// This core method should implement custom error handling, if any.
/// It determines how an exception will be serialized for client-side processing.
public override void Handle(ExceptionHandlerContext context)
{
var requestContext = context.RequestContext;
var config = requestContext.Configuration;
context.Result = new ErrorResult(
context.Exception,
requestContext == null ? false : requestContext.IncludeErrorDetail,
config.Services.GetContentNegotiator(),
context.Request,
config.Formatters);
}
/// An implementation of IHttpActionResult interface.
private class ErrorResult : ExceptionResult
{
public ErrorResult(
Exception exception,
bool includeErrorDetail,
IContentNegotiator negotiator,
HttpRequestMessage request,
IEnumerable<MediaTypeFormatter> formatters) :
base(exception, includeErrorDetail, negotiator, request, formatters)
{
}
/// Creates an HttpResponseMessage instance asynchronously.
/// This method determines how a HttpResponseMessage content will look like.
public override Task<HttpResponseMessage> ExecuteAsync(CancellationToken cancellationToken)
{
var content = new HttpError(Exception, IncludeErrorDetail);
// define an additional content field with name "ErrorID"
content.Add("ErrorID", Exception.Data["NesterovskyBros:id"] as long?);
var result =
ContentNegotiator.Negotiate(typeof(HttpError), Request, Formatters);
var message = new HttpResponseMessage
{
RequestMessage = Request,
StatusCode = result == null ?
HttpStatusCode.NotAcceptable : HttpStatusCode.InternalServerError
};
if (result != null)
{
try
{
// serializes the HttpError instance either to JSON or to XML,
// depending on the MIME type requested by the client.
message.Content = new ObjectContent<HttpError>(
content,
result.Formatter,
result.MediaType);
}
catch
{
message.Dispose();
throw;
}
}
return Task.FromResult(message);
}
}
}
The last, but not least, part of this solution is the registration and configuration of the error logger/handler:
/// WebApi configuration.
public static class WebApiConfig
{
public static void Register(HttpConfiguration config)
{
...
// register the exception logger and handler
config.Services.Add(typeof(IExceptionLogger), new GlobalExceptionLogger());
config.Services.Replace(typeof(IExceptionHandler), new GlobalExceptionHandler());
// set error detail policy according with value from Web.config
var customErrors =
(CustomErrorsSection)ConfigurationManager.GetSection("system.web/customErrors");
if (customErrors != null)
{
switch (customErrors.Mode)
{
case CustomErrorsMode.RemoteOnly:
{
config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.LocalOnly;
break;
}
case CustomErrorsMode.On:
{
config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.Never;
break;
}
case CustomErrorsMode.Off:
{
config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.Always;
break;
}
default:
{
config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.Default;
break;
}
}
}
}
}
The client-side error handler remains almost untouched. You may find the implementation details in the /Scripts/api/api.js and Scripts/controls/error.js files.
You may download the demo project here.
Feel free to use this solution in your .NET projects.
From time to time we run into tasks that we would like to solve in LINQ style, but unfortunately it either cannot be done or the solution is not efficient.
Note that by LINQ style we do not mean C# query expressions (we have a strong distaste for that syntax) but extension methods defined in System.Linq.Enumerable and other classes.
Here we quote several extension methods that are good for general use:
1. Select with predicate. This is shorthand for items.Where(...).Select(...) :
/// <summary>
/// Projects each element of a sequence into a new form.
/// </summary>
/// <typeparam name="T">A type of elements of source sequence.</typeparam>
/// <typeparam name="R">A type of elements of target sequence.</typeparam>
/// <param name="source">A source sequence.</param>
/// <param name="where">A predicate to filter elements.</param>
/// <param name="selector">A result element selector.</param>
/// <returns>A target sequence.</returns>
public static IEnumerable<R> Select<T, R>(
this IEnumerable<T> source,
Func<T, bool> where,
Func<T, R> selector)
{
return source.Where(where).Select(selector);
}
2. Select with a predicate, with the source element index passed both into the predicate and into the selector. This one you cannot trivially compose from the standard LINQ operators:
/// <summary>
/// Projects each element of a sequence into a new form.
/// </summary>
/// <typeparam name="T">A type of elements of source sequence.</typeparam>
/// <typeparam name="R">A type of elements of target sequence.</typeparam>
/// <param name="source">A source sequence.</param>
/// <param name="where">A predicate to filter elements.</param>
/// <param name="selector">A result element selector.</param>
/// <returns>A target sequence.</returns>
public static IEnumerable<R> Select<T, R>(
this IEnumerable<T> source,
Func<T, int, bool> where,
Func<T, int, R> selector)
{
var index = 0;
foreach(var value in source)
{
if (where(value, index))
{
yield return selector(value, index);
}
++index;
}
}
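A small usage sketch of this overload (the data is made up): keep positive values at even positions, together with their original indices.
var values = new[] { 3, -1, 5, 7, 0, 2 };
// Yields { index = 0, value = 3 } and { index = 2, value = 5 }.
var result = values.Select(
  (value, index) => (index % 2 == 0) && (value > 0),
  (value, index) => new { index, value });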
3. A function whose output element is a projection of a window of input elements. Such a function can be used to compute a finite difference (the operation opposite to a cumulative sum).
/// <summary>
/// Projects a window of source elements in a source sequence into target sequence.
/// Thus
/// target[i] =
/// selector(source[i], source[i - 1], ... source[i - window + 1])
/// </summary>
/// <typeparam name="T">A type of elements of source sequence.</typeparam>
/// <typeparam name="R">A type of elements of target sequence.</typeparam>
/// <param name="source">A source sequence.</param>
/// <param name="window">A size of window.</param>
/// <param name="lookbehind">
/// Indicates whether to produce a target element if the number of source elements
/// preceding the current one is less than the window size.
/// </param>
/// <param name="lookahead">
/// Indicates whether to produce a target element if the number of source elements
/// following the current one is less than the window size.
/// </param>
/// <param name="selector">
/// A selector that derives target element.
/// On input it receives:
/// an array of source elements stored in round-robin fashion;
/// an index of the first element;
/// a number of elements in the array to count.
/// </param>
/// <returns>Returns a sequence of target elements.</returns>
public static IEnumerable<R> Window<T, R>(
this IEnumerable<T> source,
int window,
bool lookbehind,
bool lookahead,
Func<T[], int, int, R> selector)
{
var buffer = new T[window];
var index = 0;
var count = 0;
foreach(var value in source)
{
if (count < window)
{
buffer[count++] = value;
if (lookbehind || (count == window))
{
yield return selector(buffer, 0, count);
}
}
else
{
buffer[index] = value;
index = index + 1 == window ? 0 : index + 1;
yield return selector(buffer, index, count);
}
}
if (lookahead)
{
while(--count > 0)
{
index = index + 1 == window ? 0 : index + 1;
yield return selector(buffer, index, count);
}
}
}
This way a finite difference looks like this:
var diff = input.Window(
2,
false,
false,
(buffer, index, count) => buffer[index ^ 1] - buffer[index]);
4. A specialization of the Window method that returns an enumeration of windows:
/// <summary>
/// Projects a window of source elements in a source sequence into a
/// sequence of window arrays.
/// </summary>
/// <typeparam name="T">A type of elements of source sequence.</typeparam>
/// <typeparam name="R">A type of elements of target sequence.</typeparam>
/// <param name="source">A source sequence.</param>
/// <param name="window">A size of window.</param>
/// <param name="lookbehind">
/// Indicates whether to produce a target element if the number of source elements
/// preceding the current one is less than the window size.
/// </param>
/// <param name="lookahead">
/// Indicates whether to produce a target element if the number of source elements
/// following the current one is less than the window size.
/// </param>
/// <returns>Returns a sequence of windows.</returns>
public static IEnumerable<T[]> Window<T>(
this IEnumerable<T> source,
int window,
bool lookbehind,
bool lookahead)
{
return source.Window(
window,
lookbehind,
lookahead,
(buffer, index, count) =>
{
var result = new T[count];
for(var i = 0; i < count; ++i)
{
result[i] = buffer[index];
index = index + 1 == buffer.Length ? 0 : index + 1;
}
return result;
});
}
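A small usage sketch of this specialization (assuming System.Linq is in scope): a moving average over windows of three elements.
var input = new[] { 1.0, 2.0, 3.0, 4.0, 5.0 };
// With lookbehind and lookahead turned off only full windows are produced,
// so this yields 2, 3, and 4.
var movingAverage = input.
  Window(3, false, false).
  Select(values => values.Average());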
These are the initial premises for this writing:
- SQL Server allows you to execute dynamic SQL.
- Dynamic SQL is useful and often unavoidable, e.g. when you have to filter or order data in a way that you cannot code efficiently in advance.
- Dynamic SQL has proven to be a dangerous area, as with improper use it can open a security hole.
In general nothing stops you from building and then executing an SQL string. Our goal, however, is to define rules that make work with dynamic SQL more manageable and verifiable.
Here we outline these rules, and then give some examples and tips.
Rule #1. Isolate dynamic SQL
Put all logic related to building of dynamic SQL into a separate function.
We usually define a separate schema Dynamic , and define functions like Dynamic.GetSQL_XXX(params) .
This makes it simple to perform code review.
Rule #2. Xml as parameters
Use xml type to pass parameters to a function that builds dynamic SQL.
In many cases dynamic SQL depends on a variable number of parameters (like a list of values to check against).
XML fits well here to represent structured information.
On the client (e.g. in C# or Java) you can define a class with all the parameters, populate an instance and serialize it to XML.
Rule #3. XQuery as template language
Use XQuery to define SQL template and to generate SQL tree from the input parameters.
Here is an example of such XQuery:
@data.query('
<sql>
select
T.*
from
Data.Ticket T
where
{
for $ticketID in data/ticketID return
<sql>(T.TicketID = <int>{$ticketID}</int>) and </sql>
}
(1 = 1)
</sql>')
You can see that the output is XML, with an sql element to represent literal SQL and an int element to represent an integer literal.
In fact the whole output schema can be defined like this:
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="sql"/>
<xs:element name="name"/>
<xs:element name="string" nillable="true"/>
<xs:element name="int" nillable="true"/>
<xs:element name="decimal" nillable="true"/>
<xs:element name="date" nillable="true"/>
<xs:element name="time" nillable="true"/>
<xs:element name="datetime" nillable="true"/>
</xs:schema>
where sql is to represent literal content, name to represent a name, and other elements to represent different literal values.
Rule #4. Escape literals
Use function Dynamic.ToSQL(@template) to build final SQL text.
Here we quote the definition:
-- Builds a text of SQL function for an sql template.
create function Dynamic.ToSQL
(
-- SQL template.
@template xml
)
returns nvarchar(max)
with returns null on null input
as
begin
return
(
select
case
when N.Node.exist('*[xs:boolean(@xsi:nil)]') = 1 then
'null'
when N.Node.exist('self::int') = 1 then
isnull(N.Node.value('xs:int(.)', 'nvarchar(max)'), '# int #')
when N.Node.exist('self::string') = 1 then
'N''' +
replace
(
N.Node.value('.', 'nvarchar(max)'),
'''',
''''''
) +
''''
when N.Node.exist('self::name') = 1 then
isnull
(
quotename(N.Node.value('.', 'nvarchar(128)'), '['),
'# name #'
)
when N.Node.exist('self::datetime') = 1 then
isnull
(
'convert(datetime2, ''' +
N.Node.value('xs:dateTime(.)', 'nvarchar(128)') +
''', 126)',
'# datetime #'
)
when N.Node.exist('self::date') = 1 then
isnull
(
'convert(date, ''' +
N.Node.value('xs:date(.)', 'nvarchar(128)') +
''', 126)',
'# date #'
)
when N.Node.exist('self::time') = 1 then
isnull
(
'convert(time, ''' +
N.Node.value('xs:time(.)', 'nvarchar(128)') +
''', 114)',
'# time #'
)
when N.Node.exist('self::decimal') = 1 then
isnull
(
N.Node.value('xs:decimal(.)', 'nvarchar(128)'),
'# decimal #'
)
when N.Node.exist('self::*') = 1 then
'# invalid template #'
else
N.Node.value('.', 'nvarchar(max)')
end
from
@template.nodes('//sql/node()[not(self::sql)]') N(Node)
for xml path(''), type
).value('.', 'nvarchar(max)');
end;
Now, we want to stress that this function plays an important role in preventing SQL injection, as it escapes literals in the SQL tree.
Rule #5 (optional). Collect data
Use SQL to collect additional data required to build dynamic SQL. Here is an example of how we get a Ticket by StatusID , while on input we receive a StatusName :
create function Dynamic.GetSQL_GetTicketByStatus(@data xml)
returns nvarchar(max)
as
begin
set @data =
(
select
@data,
(
select
T.StatusID
from
@data.nodes('/data/status') N(Node)
inner join
Metadata.Status T
on
T.StatusName = Node.value('.', 'nvarchar(128)')
for xml auto, type, elements
)
for xml path('')
);
return Dynamic.ToSQL
(
@data.query
('
<sql>
select
T.*
from
Data.Ticket T
where
T.Status in ({ for $status in /T/StatusID return <sql><int>{$status}</int>,</sql> } null)
</sql>
')
);
end;
Notice the inner select that collects some more data (the StatusID values) before calling XQuery.
Rule #6. Execute
The final step is to call dynamic SQL.
This is done like this:
-- build
declare @sql nvarchar(max) = Dynamic.GetSQL_GetTicket(@data);
-- execute
execute sp_executesql
@sql
-- {, N'@parameter_name data_type [ OUT | OUTPUT ][ ,...n ]' }
-- { , [ @param1 = ] 'value1' [ ,...n ] }
with result sets
(
(
TicketID int not null,
CreatedAt datetime2 not null,
Summary nvarchar(256) null,
Status int,
Severity int,
DeadLineAt datetime2 null
)
);
Notice that the use of dynamic SQL does not prevent static parameters.
Notice also that the with result sets clause is used to specify the output.
Example. Tickets system
Let's assume you're dealing with a tickets system (like Bugzilla), and you have a table Data.Ticket to describe tickets. Assume that DDL for this table is like this:
create table Data.Ticket
(
TicketID bigint not null primary key,
CreatedAt datetime2 not null,
Summary nvarchar(128) null,
Status int not null,
UpdatedAt datetime2(7) not null
)
Suppose you have to build C# code to search different tickets, where Entity Framework is used to access the database.
Search should be done by a range of CreatedAt , a range of UpdatedAt , Summary , or by different Status values. It should be possible to order results in different ways.
We start our solution from the C# side and define classes for a request:
public enum Direction
{
Asc,
Desc
}
public struct Order
{
public string Field { get; set; }
public Direction Direction {get; set; }
}
public class DateRange
{
public DateTime? From { get; set; }
// This property is to omit From element if value is null.
// See rules for xml serialization.
public bool FromSpecified
{
get { return From != null; }
}
public DateTime? To { get; set; }
public bool ToSpecified
{
get { return To != null; }
}
}
public class TicketsRequest
{
public DateRange CreatedAt { get; set; }
public string Summary { get; set; }
public DateRange UpdatedAt { get; set; }
[XmlElement]
public Order[] Order { get; set; }
[XmlElement]
public int[] Status { get; set; }
}
Notice that we're going to use XmlSerializer to convert the request to XML and then to pass the parameter into EF's model. Here is a utility method to perform such a conversion:
public static string ToXmlString<T>(T value)
{
if (value == null)
{
return null;
}
var serializer = new XmlSerializer(typeof(T));
var builder = new StringBuilder();
var writer = XmlWriter.Create(
builder,
new XmlWriterSettings
{
OmitXmlDeclaration = true,
Indent = false
});
serializer.Serialize(writer, value);
writer.Flush();
return builder.ToString();
}
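For example, a request equivalent to the sample XML shown later in this post could be built and serialized like this (a sketch):
var request = new TicketsRequest
{
  CreatedAt = new DateRange { From = new DateTime(2014, 1, 1) },
  Summary = "hello%",
  Order = new[]
  {
    new Order { Field = "Status", Direction = Direction.Desc }
  },
  Status = new[] { 1, 3 }
};
// Produces the <TicketsRequest>...</TicketsRequest> document
// that is passed to the Data.GetTickets procedure defined below.
var xml = ToXmlString(request);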
Now we proceed to the database and define a procedure that runs the search:
-- Gets tickets.
create procedure Data.GetTickets
(
-- A query parameters.
@params xml
)
as
begin
set nocount on;
-- This is for EF to guess type of result.
if (1 = 0)
begin
select
TicketID,
CreatedAt,
Summary,
Status,
UpdatedAt
from
Data.Ticket;
end;
declare @sql nvarchar(max) = Dynamic.GetSQL_GetTickets(@params);
execute sp_executesql @sql
with result sets
(
(
TicketID int not null,
CreatedAt datetime2 not null,
Summary nvarchar(256) null,
Status int,
UpdatedAt datetime2 null
)
);
end;
Switch back to C#, import the Data.GetTickets into the EF model, and create a search method:
public IEnumerable<Ticket> GetTickets(TicketsRequest request)
{
var model = new Model();
return model.GetTickets(ToXmlString(request));
}
The last ingredient is Dynamic.GetSQL_GetTickets() function.
create function Dynamic.GetSQL_GetTickets(@data xml)
returns nvarchar(max)
as
begin
return Dynamic.ToSQL
(
@data.query('
<sql>
select
T.TicketID,
T.CreatedAt,
T.Summary,
T.Status,
T.UpdatedAt
from
Data.Ticket T
where
{
for $range in */CreatedAt return
(
for $date in $range/From return
<sql>
(T.CreatedAt >= <datetime>{$date}</datetime>) and
</sql>,
for $date in $range/To return
<sql>
(<datetime>{$date}</datetime> > T.CreatedAt) and
</sql>
),
for $range in */UpdatedAt return
(
for $date in $range/From return
<sql>
(T.UpdatedAt >= <datetime>{$date}</datetime>) and
</sql>,
for $date in $range/To return
<sql>
(<datetime>{$date}</datetime> > T.UpdatedAt) and
</sql>
),
for $summary in */Summary return
<sql>
(T.Summary like <string>{$summary}</string>) and
</sql>,
if (*/Status) then
<sql>
T.Status in
({
for $status in */Status return
<sql><int>{$status}</int>, </sql>
} null) and
</sql>
else ()
}
(1 = 1)
order by
{
for $order in
*/Order
[
Field = ("TicketID", "CreatedAt", "Summary", "UpdatedAt", "Status")
]
return
<sql>
<name>{$order/Field}</name>
{" desc"[$order[Direction = "Desc"]]},
</sql>
}
(select null)
</sql>
')
);
end;
SQL text from Dynamic.GetSQL_GetTickets()
Consider now SQL text produced by this function. For an input:
<TicketsRequest>
<CreatedAt>
<From>2014-01-01T00:00:00</From>
</CreatedAt>
<Summary>hello%</Summary>
<Order>
<Field>Status</Field>
<Direction>Desc</Direction>
</Order>
<Status>1</Status>
<Status>3</Status>
</TicketsRequest>
the output is:
select
T.TicketID,
T.CreatedAt,
T.Summary,
T.Status,
T.UpdatedAt
from
Data.Ticket T
where
(T.CreatedAt >= convert(datetime2, '2014-01-01T00:00:00', 126)) and
(T.Summary like N'hello%') and
T.Status in
(1, 3, null) and
(1 = 1)
order by
[Status] desc,
(select null)
Though the text is not formatted as we would like, it's perfectly valid SQL.
Tips for building XQuery templates
What is called XQuery in SQL Server is in fact a very limited subset of XQuery 1.0. Microsoft clearly states this fact. What is trivial in XQuery is often impossible or ugly in SQL Server's XQuery.
Nevertheless, XQuery in SQL Server works rather well as an SQL template language. To use it most efficiently, however, you should learn several tips.
Tip #1. Where clause
In template you might want to build a where clause:
<sql>
select
...
where
{
if (...) then
<sql>...</sql>
else ()
}
</sql>
and it might happen that for a certain input the condition under where collapses, and you are left with the where keyword without a real condition, which is wrong. A simple workaround is to always add some true condition under the where like this:
<sql>
select
...
where
{
if (...) then
<sql>... and </sql>
else ()
} (1 = 1)
</sql>
Tip #2. "in" expression
If you want to generate an "in" expression like this:
value in (item1, item2,...)
then you might find that it's much easier to generate equivalent code like this:
value in (item1, item2,..., null) .
Here is an XQuery snippet to generate such a template:
value in
({
for $item in ... return
<sql><int>{$item}</int>, </sql>
} null) and
Tip #3. Order by
You can terminate an order by clause built from data with a dummy expression like this:
order by
{
for $item in ... return
<sql>
<name>{$item/Field}</name>
{" desc"[$item/Direction = "Desc"]},
</sql>
} (select null)
Alternatively you can use first column from a clustered index.
Tip #4. Group by
In a group by clause we cannot introduce a terminator expression as we did with order by , so the code is a bit less trivial:
{
let $items := ... return
if ($items) then
<sql>
group by <name>{$items[1]}</name>
{
for $item in $items[position() > 1] return
<sql>, <name>{$item}</name></sql>
}
</sql>
else ()
}
In fact similar logic may work with order by .
Tip #5. Escape literals
It's crucial not to introduce SQL injection while building SQL. Thus use:
<int>{...}</int> - for literal int;
<decimal>{...}</decimal> - for literal decimal;
<string>{...}</string> - for literal string;
<datetime>{...}</datetime> - for literal datetime2;
<date>{...}</date> - for literal date;
<time>{...}</time> - for literal time;
<name>{...}</name> - for a name to quote.
Note that you can use xsi:nil , so
<int xsi:nil="true"/> means null .
If you generate a field name from input data then it's worth validating it against a list of available names.
Tip #6. Validate input.
It's worth defining an XML schema for the input XML, and validating parameters against it.
This makes the code more secure, and also adds documentation.
Tip #7. Don't abuse dynamic SQL
There are not too many cases when you need dynamic SQL. Usually the SQL engine knows how to build a good execution plan. If your query contains optional conditions then you can write it in a way that SQL Server can optimize, e.g.:
select
*
from
T
where
((@name is null) or (Name = @name)) and
((@date is null) or (Date = @date))
option(recompile)
Consider: how would you implement the Style object of the HTML DOM?
These are some characteristics of that object:
- It has a long list of properties, e.g. in IE 11 there are more than 300 properties on a style object.
- Any specific instance usually has only several properties assigned.
- Reads of properties are much more frequent than writes. In fact style often stays unchanged after initialization.
- DOM contains many style instances (often thousands).
- The number of distinct instances in terms of values of properties is moderate (usually dozens).
Here is how we would approach such an object.
1. Styles are sparse objects, thus there is no point in implementing a plain class with all those properties, as it's wasteful.
We would rather use two techniques to keep a style's state:
- A dictionary of properties with their values;
- An aggregation of objects, where all properties are grouped into families, each group is defined by a separate type, and a style's state is an aggregation of those groups.
The current style of an element is an aggregation of the styles of its ancestor elements. It can either be dynamic or be fused into a single style instance.
2. Make the style's state immutable, and share these states among all style instances.
In this implementation a property write turns into a state transition operation: state = set(state, property, value) . Thus no state is modified; it is replaced with another state that corresponds to the required change.
If the state is seen as a dictionary then the API may look like this:
public class State<K, V>
{
// Gets shared dictionary for an input dictionary.
public IDictionary<K, V> Get(IDictionary<K, V> dictionary);
// Gets a shared dictionary for an input dictionary with key set to a value.
public IDictionary<K, V> Set(IDictionary<K, V> dictionary, K key, V value);
// Gets a shared dictionary for an input dictionary with the key removed.
public IDictionary<K, V> Remove(IDictionary<K, V> dictionary, K key);
// Gets typed value.
public T Get<T>(IDictionary<K, V> dictionary, K key)
where T: V
{
V value;
if ((dictionary == null) || !dictionary.TryGetValue(key, out value))
{
return default(T);
}
return (T)value;
}
// Sets or removes a typed value.
// dictionary can be null.
// null returned if output dictionary would be empty.
public IDictionary<K, V> Set<T>(IDictionary<K, V> dictionary,
K key,
T value)
where T : V
{
return value == null ? Remove(dictionary, key) :
Set(dictionary, key, (V)value);
}
}
States can be cached. Provided the cache keeps states in a weak way, no unused state will be stored for a long time.
We may use a weak table from dictionary to dictionary WeakTable<Dictionary<K, V>, Dictionary<K, V>> as storage for such a cache. All required API is described in the WeakTable and Hash Code of Dictionary posts.
3. Style can be implemented as a structure with a shared state as its storage. Here is a sketch:
[Serializable]
public struct Style
{
// All properties.
public enum Property
{
Background,
BorderColor,
BorderStyle,
Color,
FontFamily,
FontSize,
// ...
}
public int? Background
{
get { return states.Get<int?>(state, Property.Background); }
set { state = states.Set(state, Property.Background, value); }
}
public int? BorderColor
{
get { return states.Get<int?>(state, Property.BorderColor); }
set { state = states.Set(state, Property.BorderColor, value); }
}
public string BorderStyle
{
get { return states.Get<string>(state, Property.BorderStyle); }
set { state = states.Set(state, Property.BorderStyle, value); }
}
public int? Color
{
get { return states.Get<int?>(state, Property.Color); }
set { state = states.Set(state, Property.Color, value); }
}
public string FontFamily
{
get { return states.Get<string>(state, Property.FontFamily); }
set { state = states.Set(state, Property.FontFamily, value); }
}
public double? FontSize
{
get { return states.Get<double?>(state, Property.FontSize); }
set { state = states.Set(state, Property.FontSize, value); }
}
// ...
[OnDeserialized]
private void OnDeserialized(StreamingContext context)
{
state = states.Get(state);
}
// A state.
private IDictionary<Property, object> state;
// A states cache.
private static readonly State<Property, object> states =
new State<Property, object>();
}
Note that:
- the default state is a null dictionary;
- states are shared application-wide.
The following link is our implementation of State<K, V> class: State.cs.
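Here is a small usage sketch of the Style struct above; assuming the State<K, V> cache returns the same shared dictionary for equal contents, as described, two styles with equal properties end up referring to the same state instance:
var a = new Style();
var b = new Style();
a.Color = 0x333333;
a.FontSize = 14;
b.Color = 0x333333;
b.FontSize = 14;
// a and b now hold references to the same shared state dictionary,
// so thousands of equal styles cost a single small dictionary.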
Here we have outlined the idea of a shared state object, and how it can be applied to sparse, mostly immutable objects. We used HTML style as an example of such an object. A shared state object may work in many other areas, but for it to shine, the use case should fit the technique.
Dealing recently with some task (the same that inspired us to implement WeakTable), we were in a position to use a dictionary as a key in another dictionary.
What are the rules for a class to be used as a key?
- the key should be immutable;
- the key should implement a GetHashCode() method;
- the key should implement an Equals() method.
The first requirement is usually stated as a documentation contract like this:
As long as an object is used as a key in the Dictionary<TKey, TValue> , it must not change in any way that affects its hash value.
The third requirement about equals is trivially implemented as a method:
public bool Equals(IDictionary<K, V> x, IDictionary<K, V> y)
{
if (x == y)
{
return true;
}
if ((x == null) || (y == null) || (x.Count != y.Count))
{
return false;
}
foreach(var entry in x)
{
V value;
if (!y.TryGetValue(entry.Key, out value) ||
!valueComparer.Equals(entry.Value, value))
{
return false;
}
}
return true;
}
But how would you implement hash code?
We reasoned like this.
1. Let's consider the dictionary as a sparse array of values where only the items that correspond to key hash codes are populated.
2. The hash code is constructed using some fair algorithm, e.g. like the one used in Java to calculate a string's hash code:
h(s) = SUM[i = 0..n-1] (s[i] * p^(n-1-i)) mod m, where m = 2^31
In our case:
- n can be an arbitrarily large int value, so in fact it's 2^32;
- items are enumerated in unknown order;
- there is only a limited set of items, so most s[i] are zeros.
As a result we cannot use a recurrent function to calculate the power p^k mod m. Fortunately one can build fast exponentiation by writing k in base 2^s:
k = SUM[i = 0..32/s - 1] k[i] * 2^(s*i), where s is one of 1, 2, 4, 8, 16, or 32, and k[i] is the i-th s-bit digit of k.
Thus
p^k = PRODUCT[i = 0..32/s - 1] (p^(2^(s*i)))^k[i] mod m
If s = 1 then k[i] is either 1 or 0 (a bit), and there are 32 different p^(2^i) mod m values, which can be precalculated.
On the other hand, if we select s = 8 we can write the formula as:
p^k = p^k[0] * (p^(2^8))^k[1] * (p^(2^16))^k[2] * (p^(2^24))^k[3] mod m
where k[i] is an 8-bit value (byte).
By precalculating all values p^n, (p^(2^8))^n, (p^(2^16))^n, (p^(2^24))^n for n from 0 to 255, we arrive at a formula with 4 multiplications and 1024 precalculated values.
Here is the whole utility to calculate hash factors:
/// <summary>
/// Hash utilities.
/// </summary>
public class Hash
{
/// <summary>
/// Returns a P^value mod 2^31, where P is hash base.
/// </summary>
/// <param name="value">A value to get hash factor for.</param>
/// <returns>A hash factor value.</returns>
public static int GetHashFactor(int value)
{
return factors[(uint)value & 0xff] *
factors[(((uint)value >> 8) & 0xff) | 0x100] *
factors[(((uint)value >> 16) & 0xff) | 0x200] *
factors[(((uint)value >> 24) & 0xff) | 0x300];
}
/// <summary>
/// Initializes hash factors.
/// </summary>
static Hash()
{
var values = new int[4 * 256];
var value = P;
var current = 1;
var i = 0;
do
{
values[i++] = current;
current *= value;
}
while(i < 256);
value = current;
current = 1;
do
{
values[i++] = current;
current *= value;
}
while(i < 512);
value = current;
current = 1;
do
{
values[i++] = current;
current *= value;
}
while(i < 768);
value = current;
current = 1;
do
{
values[i++] = current;
current *= value;
}
while(i < 1024);
factors = values;
}
/// <summary>
/// A base to calculate hash factors.
/// </summary>
public const int P = 1103515245;
/// <summary>
/// Hash factors.
/// </summary>
private static readonly int[] factors;
}
With this API hash code for a dictionary is a trivial operation:
public int GetHashCode(IDictionary<K, V> dictionary)
{
if (dictionary == null)
{
return 0;
}
var result = 0;
foreach(var entry in dictionary)
{
if ((entry.Key == null) || (entry.Value == null))
{
continue;
}
result += Hash.GetHashFactor(keyComparer.GetHashCode(entry.Key)) *
valueComparer.GetHashCode(entry.Value);
}
return result;
}
And finally, here is a reference to a class DictionaryEqualityComparer<K, V>: IEqualityComparer<IDictionary<K, V>> that allows a dictionary to be a key in another dictionary.
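A minimal usage sketch (assuming the referenced DictionaryEqualityComparer<K, V> has a default constructor falling back to the default key and value comparers):
var comparer = new DictionaryEqualityComparer<string, int>();
var cache = new Dictionary<IDictionary<string, int>, string>(comparer);
var key1 = new Dictionary<string, int> { { "a", 1 }, { "b", 2 } };
var key2 = new Dictionary<string, int> { { "b", 2 }, { "a", 1 } };
cache[key1] = "some cached value";
// key2 has the same content as key1, so it finds the same entry.
string value;
var found = cache.TryGetValue(key2, out value); // true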
Update
We have committed some tests, and have found that with a sufficiently "good" implementation of GetHashCode() for keys and values, a much simpler and more straightforward algorithm achieves results of almost the same quality as the algorithm outlined above:
public int GetHashCode(IDictionary<K, V> dictionary)
{
if (dictionary == null)
{
return 0;
}
var result = 0;
foreach(var entry in dictionary)
{
if ((entry.Key == null) || (entry.Value == null))
{
continue;
}
var k = entry.Key.GetHashCode();
var v = entry.Value.GetHashCode();
k = (k << 5) + k;
v = (v << (k >> 3)) + v;
result += k ^ v;
//result += Hash.GetHashFactor(keyComparer.GetHashCode(entry.Key)) *
// valueComparer.GetHashCode(entry.Value);
}
return result;
}
It was worth blogging about this just to find out that we had outwitted ourselves, and to finally arrive at a trivial hash code implementation for the dictionary.
Dealing recently with some task, we were in a position to use a weak dictionary in .NET. Instinctively we assumed that it should exist somewhere in the standard library. We definitely knew that there is a WeakReference class for a single instance. We also knew that there is a WeakHashMap in Java, and that it's based on Java's WeakReference .
So, we were surprised to find that there is no such thing out of the box in .NET.
We have found that Java's and .NET's weak references are different. In Java, weak references whose targets are GCed can be automatically put into a queue, which can be used to build cleanup logic to remove dead keys from a weak hash map. There is nothing similar in .NET, where a weak reference just silently loses its value.
The internet is full of custom implementations of weak dictionaries in .NET.
.NET 4.0 finally defines a class ConditionalWeakTable<TKey, TValue> , which solves the problem in the case when you need to match keys by instance identity.
Unfortunately in our case we needed to match keys using the key's GetHashCode() and Equals() . So, ConditionalWeakTable<TKey, TValue> did not work directly, but then we found a way to make it work for us.
Here is a quote from the definition:
A ConditionalWeakTable<TKey, TValue> object is a dictionary that binds a managed object, which is represented by a key, to its attached property, which is represented by a value. The object's keys are the individual instances of the TKey class to which the property is attached, and its values are the property values that are assigned to the corresponding objects.
...in the ConditionalWeakTable<TKey, TValue> class, adding a key/value pair to the table does not ensure that the key will persist, even if it can be reached directly from a value stored in the table... Instead, ConditionalWeakTable<TKey, TValue> automatically removes the key/value entry as soon as no other references to a key exist outside the table.
This property of ConditionalWeakTable<TKey, TValue> has helped us to build a way to get a notification when the key is being finalized, which is the missing ingredient in .NET's weak references.
Assume you have an instance key of type Key . To get a notification you should define a class Finalizer that will call some handler when it's finalized, and you should bind key and a finalizer instance using weak table.
The code looks like this:
public class Finalizer<K>
where K: class
{
public static void Bind(K key, Action<K> handler)
{
var finalizer = table.GetValue(key, k => new Finalizer<K> { key = k });
finalizer.Handler += handler;
}
public static void Unbind(K key, Action<K> handler)
{
Finalizer<K> finalizer;
if (table.TryGetValue(key, out finalizer))
{
finalizer.Handler -= handler;
}
}
~Finalizer()
{
var handler = Handler;
if (handler != null)
{
handler(key);
}
}
private event Action<K> Handler;
private K key;
private static readonly ConditionalWeakTable<K, Finalizer<K>> table =
new ConditionalWeakTable<K, Finalizer<K>>();
}
Key key = ...
Finalizer<Key>.Bind(key, k => { /* clean up. */ });
Using this approach we have created a class WeakTable<K, V> modeled after ConditionalWeakTable<TKey, TValue>.
So, this is our take on the problem: WeakTable.cs.
Oftentimes we deal with Hebrew in .NET.
The task we face again and again is converting a Hebrew text from visual to logical representation.
The latest demand for such a task came when we processed content extracted from PDF. It turned out that PDF stores content as graphic primitives, and as a result text is stored visually (often each letter is kept separately).
We solved the task more than a decade ago by calling the Uniscribe API.
The function itself is a small wrapper around that API, so in .NET 1.0 we were using managed C++, and several years later we switched to C++/CLI.
But now, after many .NET releases, and with 32/64-bit versions, we can see that C++ is only a guest in the .NET world.
To run C++ in .NET you have to install VC runtime libraries adjusted to a specific .NET version. This makes C++ support in .NET a non-trivial task.
So, we have finally decided to define C# interop for the Uniscribe API, and to recreate that function in pure C#:
namespace NesterovskyBros.Bidi
{
/// <summary>
/// A utility to convert a visual string to logical.
/// </summary>
public static class BidiConverter
{
/// <summary>
/// Converts visual string to logical.
/// </summary>
/// <param name="value">A value to convert.</param>
/// <param name="rtl">A base direction.</param>
/// <param name="direction">
/// true for visual to logical, and false for logical to visual.
/// </param>
/// <returns>Converted string.</returns>
public static string Convert(string value, bool rtl, bool direction);
You can download this project from BidiVisualConverter.zip.
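A minimal usage sketch based on the signature above, where visualText stands for any visually ordered Hebrew string (e.g. one extracted from a PDF):
// Visual to logical:
var logical = BidiConverter.Convert(visualText, true, true);
// And back, logical to visual:
var visual = BidiConverter.Convert(logical, true, false);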
Before we start we have to confess that after many years of experience we sincerely dislike the JSF technology, as we think it's outdated compared to HTML5 + REST.
We have a JSF 2.2 application, which is configured to track the session through the URL. In this case the session ID is stored in the URL and not in cookies, as there may be many sessions opened per client.
At the same time the application uses libraries that expose script and CSS resources. These resources are referred to like this:
<link rel="stylesheet" type="text/css" jsfc="h:outputStylesheet"
library="css"
name="library-name.css"/>
<script type="text/javascript" jsfc="h:outputScript" name="library-name.js"
library="scripts" target="head"></script>
At runtime this is rendered as:
<link type="text/css" rel="stylesheet"
href="/App/javax.faces.resource/library-name.css.jsf;jsessionid=FC4A893330CCE12E8E20DFAFC73CDF35?ln=css"
/>
<script type="text/javascript"
src="/App/javax.faces.resource/library-name.js.jsf;jsessionid=FC4A893330CCE12E8E20DFAFC73CDF35?ln=scripts"></script>
You can see that the session ID is a part of the URL path, which prevents resource caching on the client.
It's not clear whether this is what the JSF spec dictates or whether it's a detail of Oracle's Reference Implementation. We're certain, however, that it's too wasteful in a heavily loaded environment, so we have tried to resolve the problem.
From the JSF sources we have found that h:outputStylesheet , h:outputScript , and h:outputLink all use the ExternalContext.encodeResourceURL() method to build the markup URL.
So, here is a solution: to provide a custom wrapper for the ExternalContext .
This is done in two steps:
- create a factory class;
- register the factory in faces-config.xml.
1. The factory is a simple class, but unfortunately it's implementation-specific:
package com.nesterovskyBros.jsf;
import javax.faces.FacesException;
import javax.faces.context.ExternalContext;
import javax.faces.context.ExternalContextWrapper;
import com.sun.faces.context.ExternalContextFactoryImpl;
/**
* {@link ExternalContextFactory} to prevent session id in resource urls.
*/
public class ExternalContextFactory extends ExternalContextFactoryImpl
{
/**
* {@inheritDoc}
*/
@Override
public ExternalContext getExternalContext(
Object context,
Object request,
Object response)
throws FacesException
{
final ExternalContext externalContext =
super.getExternalContext(context, request, response);
return new ExternalContextWrapper()
{
@Override
public ExternalContext getWrapped()
{
return externalContext;
}
@Override
public String encodeResourceURL(String url)
{
return shouldEncode(url) ?
super.encodeResourceURL(url) : url;
}
private boolean shouldEncode(String url)
{
// Decide here whether you want to encode the url.
// E.g. in case of h:outputLink you may want to have the session id in the url,
// so your decision is based on some marker (like &session=1) in the url.
return false;
}
};
}
}
2. Registration is just three lines in faces-config.xml :
<factory>
<external-context-factory>com.nesterovskyBros.jsf.ExternalContextFactory</external-context-factory>
</factory>
After that change at runtime we have:
<link type="text/css" rel="stylesheet"
href="/App/javax.faces.resource/library-name.css.jsf?ln=css" />
<script type="text/javascript"
src="/App/javax.faces.resource/library-name.js.jsf?ln=scripts"></script>
Till recently we were living in a simple world of SQL-style string comparisons, and now everything has changed.
From the university years we knew that strings in SQL are compared by first trimming trailing spaces, and then comparing in C style.
Well, the picture was a little more complex, as collations were involved (national, case sensitivity), and as different SQL vendors implemented it differently.
Next, we're dealing with programs converted from COBOL, which we originally thought follow SQL rules when strings are compared.
Here is where the problem started.
Once we found that a Java program branched differently than the original COBOL, and the reason was that COBOL and Java compared two strings differently:
- COBOL: "A\n" < "A" ;
- Java: "A\n" > "A" .
We have looked into the COBOL Language Reference and found the rules:
- Operands of equal size
- Characters in corresponding positions of the two operands are compared,
beginning with the leftmost character and continuing through the rightmost
character.
If all pairs of characters through the last pair test as equal, the operands are
considered as equal.
If a pair of unequal characters is encountered, the characters are tested to
determine their relative positions in the collating sequence. The operand that
contains the character higher in the sequence is considered the greater operand.
- Operands of unequal size
- If the operands are of unequal size, the comparison is made as though the
shorter operand were extended to the right with enough spaces to make the
operands equal in size.
You can see that strings must not be trimmed but padded with spaces to the length of the longer string, and only then compared. This subtle difference has a significant impact for characters that collate below the space.
So, here we've found that COBOL and SQL comparisons are different.
But then we questioned how SQL really behaves.
We've tested comparisons in SQL Server and DB2, and have seen that our understanding of SQL comparison holds: it works as if trimming spaces, and then comparing.
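To make the difference tangible, here is a small C# sketch of the two rules as described above (not how any engine actually implements them):
// SQL-style: trim trailing spaces, then compare ordinally.
static int CompareSqlStyle(string x, string y)
{
  return string.CompareOrdinal(x.TrimEnd(' '), y.TrimEnd(' '));
}
// COBOL-style: pad the shorter operand with spaces, then compare ordinally.
static int CompareCobolStyle(string x, string y)
{
  var length = Math.Max(x.Length, y.Length);
  return string.CompareOrdinal(x.PadRight(length, ' '), y.PadRight(length, ' '));
}
// CompareCobolStyle("A\n", "A") < 0, since "\n" collates below the space;
// CompareSqlStyle("A\n", "A") > 0, matching what we observed in SQL Server and DB2.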
But again we have looked into the SQL-92 definition, and that's what we see there:
8.2 <comparison predicate>
3) The comparison of two character strings is determined as follows:
a) If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If CS has the NO PAD attribute, then the pad character is an implementation-dependent character different from any character in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a <space>.
So, what we see is that the SQL-92 rules are very close to the COBOL rules, but then we reach the question: how come that at least SQL Server and DB2 implement string comparison differently than SQL-92 dictates?
Update: we have found that both SQL Server and DB2 have their string collation defined in a way that <space> is less than any other character.
So the following is always true: '[' + char(13) + ']' > '[ ]' .
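To see which way a particular server orders such strings, one can run a small T-SQL check like the sketch below; it is only an illustration, and the result depends on the collation in effect:
-- A minimal check of comparison semantics; results depend on the collation in effect.
declare @left varchar(10) = '[' + char(13) + ']';
declare @right varchar(10) = '[ ]';

-- Prints the relation between the two strings;
-- per the update above, SQL Server (and similarly DB2) yields '>'.
select
  case
    when @left > @right then '>'
    when @left < @right then '<'
    else '='
  end comparison;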
Earlier we have written a post
KendoUI's slowest function, and now
we want to point to the next slow function, which is kendo.guid() .
It's used to assign a uid to each observable object, and also in a couple
of other places.
Here is its source:
guid: function() {
var id = "", i, random;
for (i = 0; i < 32; i++) {
random = math.random() * 16 | 0;
if (i == 8 || i == 12 || i == 16 || i == 20) {
id += "-";
}
id += (i == 12 ? 4 : (i == 16 ? (random & 3 | 8) : random)).toString(16);
}
return id;
}
KendoUI people have decided to define uid as a string in the format of a
Globally unique identifier.
We think there is no reason to have such a complex value; it's enough to
have a counter to generate uid values. As KendoUI relies
on uid being a string, we have defined a patch like this:
var guid = 0;

kendo.guid = function()
{
  return ++guid + "";
};
Consider now a test case. It's almost identical to the one in the previous post:
<!DOCTYPE html>
<html>
<head>
<title>Test</title>
<script src="scripts/jquery/jquery.min..js"></script>
<script src="scripts/kendo/kendo.web.min.js"></script>
<link href="styles/kendo.common.min.css" rel="stylesheet" />
<link href="styles/kendo.default.min.css" rel="stylesheet" />
<link href="styles/style.css" rel="stylesheet" />
<script>
var model;
function init()
{
var source = [];
for(var i = 0; i < 1000; ++i)
{
source.push({ text: "value " + i, value: "" + i });
}
model = kendo.observable(
{
value: "1",
source: new kendo.data.DataSource(
{
data: source
})
});
model.source.read();
}
function patch()
{
var base = kendo.data.binders.widget.source.fn._ns;
var result;
var guid = 0;
kendo.guid = function()
{ return ++guid + "";
};
kendo.data.binders.widget.source.fn._ns = function(ns)
{
return ns ? base.call(this, ns) :
(result || (result = base.call(this, ns)));
}
}
function test()
{
init();
kendo.bind("#view", model);
}
patch();
</script>
</head>
<body>
<p>
<button onclick="test()">Click to start test</button>
</p>
<p id="view">
Select:
<input data-role="dropdownlist"
data-bind="value: value, source: source"
data-text-field="text"
data-value-field="value"/>
</p>
</body>
</html>
Now, we can compare performance with and without that patch.
Here is a run statistics without patch:
Level   | Function | Count | Inclusive time (ms) | Inclusive time % | Avg time (ms)
1       | onclick  | 1     | 270.73              | 100              | 270.73
1.1     | test     | 1     | 269.73              | 99.63            | 269.73
1.1.1   | init     | 1     | 117.07              | 43.24            | 117.07
1.1.1.1 | guid     | 1,001 | 72.05               | 26.61            | 0.07
1.1.2   | bind     | 1     | 152.65              | 56.39            | 152.65
and with patch:
Level   | Function | Count | Inclusive time (ms) | Inclusive time % | Avg time (ms)
1       | onclick  | 1     | 172.64              | 100              | 172.64
1.1     | test     | 1     | 171.65              | 99.42            | 171.65
1.1.1   | init     | 1     | 62.04               | 35.94            | 62.04
1.1.1.1 | guid     | 1,001 | 1                   | 0.58             | 0
1.1.2   | bind     | 1     | 109.6               | 63.49            | 109.6
Note that statistics were collected for IE 10.
An example can be found at
slow2.html.
Time after time we run into the same problem on different platforms, with different languages.
The problem's name is "Visual to Logical conversion for right-to-left or bidirectional text".
The problem is usually due to legacy code, which stores texts in visual order from left to right. In case of English it's ok, but with Hebrew this means that texts are partially reversed.
It's worth noting that we solved the same task with Windows API for native and .NET applications more than 10 years ago.
On the other hand, for Java we have not yet seen any acceptable standalone solution. To remedy this omission,
we publish here our solution to this problem.
package com.nesterovskyBros.text;
import java.text.Bidi;
/**
* Utility that uses {@link Bidi} class.
*/
public class BidiUtils
{
/**
* Implements visual to logical order converter.
*
* @author <a href="http://www.nesterovsky-bros.com">Nesterovsky bros</a>
*
* @param text an input text in visual order to convert.
* @return a String value in logical order.
*/
public static String visualToLogical(String text)
{
if ((text == null) || (text.length() == 0))
{
return text;
}
Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
if (bidi.isLeftToRight())
{
return text;
}
int count = bidi.getRunCount();
byte[] levels = new byte[count];
Integer[] runs = new Integer[count];
for (int i = 0; i < count; i++)
{
levels[i] = (byte)bidi.getRunLevel(i);
runs[i] = i;
}
Bidi.reorderVisually(levels, 0, runs, 0, count);
StringBuilder result = new StringBuilder();
for (int i = 0; i < count; i++)
{
int index = runs[i];
int start = bidi.getRunStart(index);
int end = bidi.getRunLimit(index);
int level = levels[index];
if ((level & 1) != 0)
{
for (; --end >= start;)
{
result.append(text.charAt(end));
}
}
else
{
result.append(text, start, end);
}
}
return result.toString();
}
}
This method utilizes the embedded Bidi algorithm, see class
java.text.Bidi.
Be aware that there is no perfect algorithm that covers all possible cases, since BIDI was written for the opposite task,
but our implementation based on Bidi.reorderVisually is usually acceptable.
Here is a JUnit test for this method:
package com.nesterovskyBros.text;
import static org.junit.Assert.*;
import org.junit.Test;
import com.nesterovskyBros.text.BidiUtils;
public class BidiUtilsTests
{
@Test
public void testsVisualToLogical()
{
String text = "123 יתימאה ןחבמ";
String actual = BidiUtils.visualToLogical(text);
String expected = "מבחן האמיתי 123";
assertEquals(expected, actual);
text = "תירבע English תירבע בוש";
actual = BidiUtils.visualToLogical(text);
expected = "שוב עברית English עברית";
assertEquals(expected, actual);
}
}
While developing with KendoUI we have found kendo.ui.progress(container,
toggle) function to be
very useful. It's used to show or hide a progress indicator
in the container element.
At the same time we have found that we usually used it in the context of an async
operation. That is, we want to show progress, perform some asynchronous
operations, and hide progress. So, we clearly want to benefit from the
RAII pattern: we would like to open a progress scope, and to perform some
activity within this scope.
Arguing like this, we have defined a utility function, which is the fusion of
kendo.ui.progress() and $.when() . Its signature is
like this:
nesterovskyBros.progress = function(instance /*, task ... */)
where instance is either Model , Widget , JQuery or DOM Element ,
and task is one or more deferred objects. This function shows a progress indicator
and returns a
Promise that will hide the progress when all tasks are complete.
Implementation is trivial, so we quote it here:
// Fusion of kendo.ui.progress() and $.when().
scope.progress = function(instance /*, task ... */)
{
if (instance instanceof Model)
{
instance = instance.owner && instance.owner();
}
if (instance instanceof Widget)
{
instance = instance.element;
}
if (instance && instance.nodeType)
{
instance = $(instance);
}
var id = ns + "-progress"; // "nesterovskyBros-progress";
var progress = (instance && instance.data(id)) || 0;
if (arguments.length < 2)
{
return progress;
}
var result = $.when.apply(null, [].slice.call(arguments, 1));
if (instance)
{
instance.data(id, ++progress);
kendo.ui.progress(instance, progress > 0);
result.always(
function()
{
progress = instance.data(id) || 0;
instance.data(id, --progress);
kendo.ui.progress(instance, progress > 0);
});
}
return result;
};
The use is like this:
nesterovskyBros.progress(element, $.ajax("/service1"), $.ajax("/service2")).then(myFunc);
The code can be found at
controls.js.
While trying to generalize our practices from the KendoUI related projects we've
participated in so far, we updated
controls.js - a small set of javascript additions to KendoUI.
At present we have defined:
1.
An extended model. See KendoUI extended
model.
2.
A lightweight user control - a widget to bind a template and a model, and to
facilitate declarative instantiation. See KendoUI User control.
3.
A reworked version of nesterovskyBros.defineControl() function.
var widgetType = scope.defineControl(
{
name:
widget-name-string,
model: widget-model-type,
template: optional-content-template,
windowOptions: optional-window-options
},
base);
When optional-content-template is not specified then the template is
calculated as follows:
var template = options.template || proto.template || model.template;
if (template === undefined)
{
template = scope.template(options.name.toLowerCase() + "-template");
}
When windowOptions is specified then the
widgetType.dialog(options) function is defined. It's used to open a dialog based on
the specified user control. windowOptions is passed to the kendo.ui.Window
constructor. windowOptions.closeOnEscape indicates whether to close the opened dialog on escape.
widgetType.dialog() returns a kendo.ui.Window instance with content based on the
user control. The Window instance contains functions:
result() - a $.Deferred for the dialog result, and
model() - referring to the user control model.
The model instance has functions:
dialog() - referring to the dialog, and
result() - referring to the dialog result.
widgetType.dialog() allows all css units in the windowOptions.width and windowOptions.height
parameters.
base - is optional user control base. It defaults to nesterovskyBros.ui.UserControl .
4. Adjusted splitter. See Adjust KendoUI
Splitter.
5. Auto resize support.
Layout often depends on the available area. One example is the Splitter widget that
recalculates its panes when the window or the containing Splitter is resized.
There are other cases when you would like to adjust the layout when a container's
area is changed: adjust grid, tab, editor or user control contents.
KendoUI does not provide a solution for this problem, so we have defined our
own.
- A widget can be marked with
class="auto-resize" marker;
- A widget may define a
widgetType.autoResize(element) function that adapts widget to a new size.
- Code can call the
nesterovskyBros.resize(element) function to trigger resizing of the subtree.
To support existing controls we have defined autoResize() function for Grid ,
Splitter , TabStrip , and Editor widgets.
To see how auto resizing works, it's best to look into
index.html,
products.tmpl.html, and into the implementation
controls.js.
Please note that we consider
controls.js an addition to the KendoUI library. If in the future the library
integrates or implements similar features, we will be happy to start using
its API.
See also: Compile KendoUI templates.
We heavily use kendo.ui.Splitter widget. Unfortunately it has several drawbacks:
- you cannot easily configure panes declaratively;
- you cannot define a pane that takes space according to its content.
Although we don't like to patch widgets, in this case we found no better
way but to patch two functions: kendo.ui.Splitter.fn._initPanes ,
and kendo.ui.Splitter.fn._resize .
After the fix, splitter markup may look like the following:
<div style="height: 100%"
data-role="splitter"
data-orientation="vertical">
<div data-pane='{ size: "auto", resizable: false, scrollable: false }'>
Header with size depending on content.
</div>
<div data-pane='{ resizable: false, scrollable: true }'>
Body with size equal to a remaining area.
</div>
<div data-pane='{ size: "auto", resizable: false, scrollable: false }'>
Footer with size depending on content.
</div>
</div>
Each pane may define a data-pane attribute with pane parameters. A pane may
specify size = "auto" to take space according to its content.
The code can be found at
splitter.js. A test can be seen at
splitter.html.
Although a WCF REST service + JSON is outdated compared to Web API, there are still a lot of such solutions (and new ones will probably appear) that use this "old" technology.
One of the crucial points of any web application is an error handler that allows gracefully resolving server-side exceptions and routing them as JSON objects to the client for further processing. There are dozens of approaches on the Internet that solve this issue (e.g. http://blog.manglar.com/how-to-provide-custom-json-exceptions-from-as-wcf-service/), but there is none that demonstrates error handling on the client-side. We realize that it's impossible to write something general that suits every web application, but we'd like to show a client-side error handler that utilizes JSON and KendoUI.
In our opinion, a successful error handler must display an understandable error message on one hand, and on the other hand it has to provide technical info for developers in order to investigate the exception reason (and to fix it, if needed):
You may download demo project here. It contains three crucial parts:
- A server-side error handler that catches all exceptions and serializes them as JSON objects (see /Code/JsonErrorHandler.cs and /Code/JsonWebHttpBehaviour.cs).
- An error dialog that's based on user-control defined in previous articles (see /scripts/controls/error.js, /scripts/controls/error.resources.js and /scripts/templates/error.tmpl.html).
- A client-side error handler that displays errors in a user-friendly manner (see /scripts/api/api.js, method defaultErrorHandler()).
Of course this is only a draft solution, but it defines a direction for further customizations in your web applications.
Useful links for those who are still dealing with Cool:GEN.
The site is not cool by itself, but the value is in the tools that the authors provide to simplify Cool:GEN development. In particular we would like to mention:
- GuardIEn - version control, change and model management and automated builds for CA Gen, and
- VerifIEr - automated code checking and standards verification.
These tools help to maintain clean and error-free models, which simplifies the subsequent migration to Java and C# that we perform.
To simplify KendoUI development we have defined nesterovskyBros.data.Model , which extends the kendo.data.Model class.
Extensions in nesterovskyBros.data.Model
- As with
kendo.data.Model there is fields Object - a set of key/value pairs to configure the model fields, but fields have some more options:
fields.fieldName.serializable Boolean - indicates whether the field appears in an object returned in model.toJSON() . Default is true .
fields.fieldName.updateDirty Boolean - indicates whether the change of the property should trigger dirty field change. Default is true .
- When model defines a field and there is a prototype function with the same name then this function is used to get and set a field value.
- When a property is changed through the
model.set() method then a dirty change event is triggered (provided that fields.fieldName.updateDirty !== false). This helps to build a dependency graph on that property.
- When a model instance is constructed, the data passed in is validated, and nullable and default values are set.
Model example
Here is an example of a model:
nesterovskyBros.data.ProductModel = nesterovskyBros.data.Model.define(
{
fields:
{
name: { type: "string", defaultValue: "Product Name" },
price: { type: "number", defaultValue: 10 },
unitsInStockValue: { type: "number", defaultValue: 10, serializable: false },
unitsInStock: { type: "string" }
},
unitsInStock: function(value)
{
if (value === undefined)
{
var count = this.get("unitsInStockValue");
return ["one", "two", "three", "four"][count] || (count + "");
}
else
{
this.set("unitsInStockValue", ({one: 1, two: 2, three: 3, four: 4 })[value] || value);
}
}
});
Notice that:
unitsInStock property is implemented as a function - this helps to map model values to presentation values.
- when you call
model.toJSON() , or JSON.stringify() you will see in result name , price , unitsInStock values only - this helps to get model's state and to store it somewhere (e.g. in sessionStorage ).
- in a code:
var model = new nesterovskyBros.data.ProductModel({ price: "7", unitsInStock: "one" });
the following is true:
(typeof(model.price) == "number") && (model.price == 7) && (model.name == "Product Name") && (model.unitsInStockValue == 1)
As with UserControl, the implementation is defined in controls.js. The sample page is the same index.html
Developing with KendoUI we try to formalize tasks. With this in mind we would like to have user controls.
We define a user control as follows:
It is a javascript class that extends Widget.
It offers a way to reuse UI.
It allows to define a model and a template with UI and data binding.
Unfortunately, KendoUI does not have such API, though one can easily define it; so we have defined our version.
Here we review our solution. We have taken a grid KendoUI example and converted it into a user control.
User control on the page
See index.html
<!DOCTYPE html>
<html>
<head>
<title>Test</title>
<!-- (1) Include templates for controls. -->
<script src="scripts/templates.js"></script>
<script src="scripts/jquery/jquery.js"></script>
<script src="scripts/kendo/kendo.web.min.js"></script>
<!-- (2) UserControl definition. -->
<script src="scripts/controls.js"></script>
<!-- (3) Confirm dialog user control. -->
<script src="scripts/controls/confirm.js"></script>
<!-- (4) Products user control. -->
<script src="scripts/controls/products.js"></script>
<link href="styles/kendo.common.min.css" rel="stylesheet" />
<link href="styles/kendo.default.min.css" rel="stylesheet" />
<script>
$(function ()
{
// (5) Bind the page.
kendo.bind(
document.body,
// (6) Model as a datasource.
{ source: [new nesterovskyBros.data.ProductsModel] });
});
</script>
</head>
<body>
<!-- (7) User control and its binding. -->
<div data-role="products" data-bind="source: source"></div>
</body>
</html>
That's what
we see here:
- Templates that define layouts. See "How To: Load KendoUI Templates from External Files", and templates.tt.
- Definition of the UserControl widget.
- Confirm dialog user control (we shall mention it later).
- Products user control.
- Data binding that instantiates page controls.
- Model is passed to a user control through the dataSource.
- Use of Products user control. Notice that "data-role" defines control type, "source" refers to the model.
User Control declaration
Declaration consists of a view and a model.
View is html with data binding. See products.tmpl.html
We build our project using Visual Studio, so templates packaging is done with templates.tt. This transformation converts products template into a tag:
<script id="products-template" type="text/x-kendo-template">
thus template can be referred by a utility function: nesterovskyBros.template("products-template") .
Model inherits kendo.data.Model. Here is how it looks:
// (1) Define a ProductsModel class.
nesterovskyBros.data.ProductsModel = kendo.data.Model.define(
{
// (2) Model properties.
fields:
{
productName: { type: "string", defaultValue: "Product Name" },
productPrice: { type: "number", defaultValue: 10 },
productUnitsInStock: { type: "number", defaultValue: 10 },
products: { type: "default", defaultValue: [] }
},
// (3) Model methods.
addProduct: function ()
{
...
},
deleteProduct: function (e)
{
...
},
...
});
// (4) Register user control.
nesterovskyBros.ui.Products = nesterovskyBros.defineControl(
{
name: "Products",
model: nesterovskyBros.data.ProductsModel
});
That's what we have here:
- We define a model that inherits KendoUI Model.
- We define model fields.
- We define model methods.
- Register user control with
nesterovskyBros.defineControl(proto) call, where:
proto.name - defines user control name;
proto.model - defines model type;
proto.template - defines optional template. If not specified, a template is retrieved from $("#" + proto.name.toLowerCase() + "-template").html() .
UserControl API
Now, what remains is the API for the UserControl. See controls.js.
- UserControl defines following events:
change - triggered when data source is changed;
dataBound - triggered when widget is data bound;
dataBinding - triggered before widget data binding;
save - used to notify user to save model state.
- UserControl defines following options:
autoBind (default false) - autoBind data source;
template (default $.noop) - user control template.
- UserControl defines
dataSource field and setDataSource() method.
- UserControl defines
rebind() method to manually rebuild widget's view from the template and model.
- UserControl sets/deletes model.owner, which is a function returning a user control widget when model is bound/unbound to the widget.
- When UserControl binds/unbinds model a
model.refresh method is called, if any.
- You usually define your control with a call to
nesterovskyBros.defineControl(proto) . See above.
- There is also a convenience method to build a dialog based on a user control: nesterovskyBros.defineDialog(options), where
options.name - a user control name (used in the data-role);
options.model - a model type;
options.windowOptions - a window options.
This method returns a function that receives a user control model, and returns a dialog (kendo.ui.Window ) based on the user control.
Dialog has model() function that returns an instance of model.
Model has dialog() function that returns an instance of the dialog.
Dialog and model have result() function that returns an instance of deferred object used to track dialog completion.
The example of user control dialog is confirm.js and confirm.tmpl.html.
The use is in the products.js deleteProduct():
deleteProduct: function(e)
{
var that = this;
return nesterovskyBros.dialog.confirm(
{
title: "Please confirm",
message: "Do you want to delete the record?",
confirm: "Yes",
cancel: "No"
}).
open().
center().
result().
then(
function(confirmed)
{
if (!confirmed)
{
return;
}
...
});
}
Last
User controls, along with the technique to manage and cache templates, allow us to build robust web applications. As an added value, it has become a trivial task to build a SPA.
See also: Compile KendoUI templates.
Kendo UI Docs contains an article "How To:
Load Templates from External Files", where the authors review two ways of dealing
with Kendo UI templates.
While using Kendo UI we have found our own answer to: where will the Kendo
UI templates be defined and maintained?
In our .NET project we have decided to keep templates separately, and to store
them under the "templates" folder. Those templates in fact include html,
head, and stylesheet links. This helps us to present those templates in the
design view.
In our scripts folder, we have defined a small text transformation template:
"templates.tt", which produces "templates.js" file. This template takes body
contents of each "*.tmpl.html" file from "templates" folder and builds string of
the form:
document.write('<script id="footer-template" type="text/x-kendo-template">...</script><script id="row-template" type="text/x-kendo-template">...</script>');
In our page that uses templates, we include "templates.js":
<!DOCTYPE html>
<html>
<head>
<script
src="scripts/templates.js"></script>
...
Thus, we have:
- clean separation of templates and page content;
- automatically generated templates include file.
WebTemplates.zip contains a web project demonstrating our technique. "templates.tt" is
text template transformation used in the project.
See also: Compile KendoUI templates.
Two months ago we started
a process of changing a column type from smallint to int in a big database.
This was split into two phases:
- Change tables and internal stored procedures and functions.
- Change the interface API and update all clients.
The first part took almost two months to complete. Please read the earlier post about
the technique we selected for the implementation. In total we have transferred
about 15 billion rows. During this time the database was online.
The second part was short, but the problem was that we did not control all clients,
so we could not arbitrarily change types of parameters and result columns.
All our clients use Entity Framework 4 to access the database. All access is done
through stored procedures. So suppose there was a procedure:
create procedure Data.GetReports(@type smallint) as
begin
select Type, ... from Data.Report where Type = @type;
end;
where column "Type" was of type smallint . Now
we were going to change it to:
create procedure Data.GetReports(@type int) as
begin
select Type, ... from Data.Report where Type = @type;
end;
where "Type" column became of type int .
Our tests have shown that EF tolerates a change of types of input parameters, but throws
exceptions when a column type has been changed, even when the value fits the
range. The reason is that EF uses the method SqlDataReader.GetInt16
to access the column value. This method has a remark: "No
conversions are performed; therefore, the data retrieved must already be a 16-bit
signed integer."
Fortunately, we have found that EF allows additional columns in the result set. This helped us to formulate the solution.
We have updated the procedure definition like this:
create procedure Data.GetReports(@type int) as
begin
select
cast(Type as smallint) Type, -- deprecated
Type TypeEx, ...
from
Data.Report
where
Type = @type;
end;
This way:
- result column
"Type" is declared as deprecated;
- old clients still work;
- all clients should be updated to use
"TypeEx" column;
- after all clients are updated we shall remove the
"Type" column from the result
set.
So there is a clear migration process.
P.S. we don't understand why SqlDataReader doesn't support value
conversion.
Recently we had a discussion with our DBA regarding the optimization strategy we
selected for some queries.
We have a table in our database. These are facts about that table:
- the table is partitioned by date;
- each partition contains a month's worth of data;
- the table contains at present about 110 million rows;
- the table ever grows;
- the table is most accessed in the database;
- the most accessed part of the data is related to last 2-3 days,
which is about 150000 rows.
The way we have optimized access to that table was the core of the dispute.
We have created a filtered index that includes data for the last 3 days (a sketch follows the list below).
To achieve the desired effect we had to:
- create a job that recreates that index once a day, as the filter condition is
moving;
- adjust queries that access the table, as we had to use several access paths
to the table depending on date.
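For illustration only, a minimal sketch of such a moving filtered index could look like the following; the table, column, and index names are placeholders, and the filter has to be a constant, which is exactly why the daily job recreates the index:
-- A sketch: dbo.Facts, its columns and the index name are placeholders.
-- The filter date must be a literal, so the index is recreated daily by a job.
declare @from char(8) = convert(char(8), dateadd(day, -3, getdate()), 112);
declare @sql nvarchar(max) =
'
if exists(select * from sys.indexes
          where name = ''IX_Facts_Recent'' and object_id = object_id(''dbo.Facts''))
  drop index IX_Facts_Recent on dbo.Facts;

create nonclustered index IX_Facts_Recent on dbo.Facts(Date, Type)
where Date >= ''' + @from + ''';
';

execute sp_executesql @sql;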
As a result we can see that under load, stored procedures that access that table
became almost 50% faster. On the other hand, maintenance became more
complicated.
DBA who didn't like the database complications had to agree that there are speed
improvements. He said that there should be a better way to achieve the same
effect but could not find it.
Is there a better way to optimize access to this table?
We're implementing UDT changes in the big database. Earlier, that
User Defined Type was based on smallint , and now we have to use int as the base.
The impact
here is manifold:
- Clients of the database should be prepared to use wider types.
- All stored procedures, functions, triggers, and views should be updated
accordingly.
- Impact on the database size should be analyzed.
- Types of columns in tables should be changed.
- Performance impact should be minimal.
Now, we're trying to address (3) and
(5) and to implement (4), while keeping the interface with clients using the old
types.
As for the database size impact, we have found that index fragmentation is a
primary disk space waster (see Reorganize index in SQL Server).
We have performed some partial index reorganization and can now see that we can gain
back hundreds of GB of disk space. On the other hand we use page compression, so we expect that the change of types will not increase
the sizes of tables considerably. Indeed, our measurements show that tables will only be
~1-3% bigger.
The change of column types is a nontrivial task. The problem is that if you try
to change a column's type (which is part of the clustered index) directly, then you
should temporarily remove foreign keys and rebuild all indices. This won't
work, both due to the disk space required for the operation (a huge transaction
log is required) and due to availability of the tables (we're talking about days or
even weeks to rebuild indices).
To work around the problem we have selected another way. For each target table T
we performed the following (a sketch of the resulting view follows the list):
- Renamed table T to T_old;
- Created a table T_new with required type changes;
- Created a view named T, which is union of T_old for the dates before a split
date and T_new for the dates after the split date;
- Created instead of insert/update/delete triggers for the view T.
- Created procedures that move data in bulk from T_old to T_new, update the
split date in the view definition, and delete data from T_old.
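A simplified sketch of the view from step 3 may look like this; T, T_old, T_new, the columns, and the split date are placeholders, the real definition lists all columns and is accompanied by the instead of triggers from step 4:
-- A sketch only: names, columns and the split date are placeholders.
create view dbo.T
as
select ID, Date, cast(Type as int) Type, Value
from dbo.T_old
where Date < '20120101' -- the split date, moved forward by the transfer procedure
union all
select ID, Date, Type, Value
from dbo.T_new
where Date >= '20120101';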
Note that:
- the new view uses wider column types, so we had to change stored
procedures that clients use to cast those columns back to shorter types to
prevent side effects (fortunately all access to this database is through stored
procedures and functions);
- the procedures that transfer data between new and old tables may work online;
- the quality of execution plans did not degrade due to switch from table to a
view;
- all data related to the date after the split date are inserted into T_new
table.
After the transfer is complete we shall drop the T_old tables and the T views, and
rename the T_new tables to T.
This will complete part 4 of the whole task. Our estimation is that it will
take a month or even more to complete the transfer. Though the solution is rather
slow, the database will stay online during this whole period, which is a required
condition.
The next task is to deal with type changes in parameters of stored procedures
and in column types of output result sets. We're not sure yet what's the best way
to deal with it, and probably shall complain about it in the next posts.
Back in 2006 and 2007 we defined the dbo.Numbers function:
Numbers table in SQL Server 2005,
Parade of numbers. Such a construct is very important in set based
programming. E.g. XPath 2 contains a range expression like this: "1 to 10" to
return a sequence of numbers.
Unfortunately neither SQL Server 2008 R2 nor SQL Server 2012 supports such a
construct, so the dbo.Numbers function is still relevant.
After all these years the function has evolved a little bit to achieve better
performance. Here is its source:
-- Returns a numbers table.
-- The table has the following structure: table(value int not null);
-- value contains integer numbers from 1 to a specified count.
create function dbo.Numbers
(
-- Number of rows to return.
@count int
)
returns table
as
return
with Number8 as
(
select
*
from
(
values
(0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0),
(0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0),
(0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0),
(0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0),
(0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0),
(0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0),
(0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0),
(0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0),
(0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0),
(0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0),
(0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0),
(0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0),
(0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0),
(0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0),
(0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0),
(0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0), (0)
) N(Value)
),
Number32(Value) as
(
select
0
from
Number8 N1
left join
Number8 N2
on
@count > 0x100
left join
Number8 N3
left join
Number8 N4
on
@count > 0x1000000
on
@count > 0x10000
)
select top(@count) row_number() over(order by @count) Value from Number32;
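The use is straightforward; for example, the analog of the XPath range "1 to 10" is:
-- Returns ten rows with values 1 through 10.
select Value from dbo.Numbers(10);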
We're working with an online database, which is constantly populated with new
data. Database activity is mostly around recent data. Activity against older
data declines with increasing distance from today. The ratio of the amount of
new data, say for the last month, to the whole data at present stays at
~1%. The size of the database is measured in TBs.
While we're developers and not DBAs, you will see from
later blog
posts why we're bothered with the database size. In short, we're planning to
change some UDT type from smallint to int . This will impact
many tables, and the task now is to estimate that impact.
Our first attempts to measure the difference between table sizes before and
after the type change showed that data fragmentation often masks the difference, so
we started to look for a way to reduce fragmentation.
The Internet is full of recommendations. Advice can be found in BOL at
Reorganize
and Rebuild Indexes.
So, our best help in this task is the function sys.dm_db_index_physical_stats ,
which reports statistics about fragmentation.
Analysing what that function gave us, we could see that we had highly
fragmented data. There was no reason to put up with that, taking into account that
most of the data stored in the database is historical, rarely
accessed and even more rarely updated.
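For reference, a sketch of the kind of query one can use to ask that function about fragmentation follows; the thresholds and column choice are arbitrary and are only an illustration:
-- Indexes of the current database with noticeable fragmentation.
select
  object_name(S.object_id) TableName,
  I.name IndexName,
  S.partition_number,
  S.avg_fragmentation_in_percent,
  S.page_count
from
  sys.dm_db_index_physical_stats(db_id(), null, null, null, 'LIMITED') S
  inner join
  sys.indexes I
  on
    (I.object_id = S.object_id) and (I.index_id = S.index_id)
where
  (S.avg_fragmentation_in_percent > 10) and (S.page_count > 1000)
order by
  S.avg_fragmentation_in_percent desc;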
The simplest instrument advised is:
alter index { index_name | ALL } on <object> reorganize [ PARTITION = partition_number ];
A less trivial but often more efficient instrument is the use of online index
rebuild and index reorganize, depending on the index type and the level of
fragmentation.
All in all our estimation is that rebuilding or reorganizing indices frees
~100-200GB of disk space. While it's only a small percentage of the total database
size, it gives us several months' worth of disk space!
Earlier we overlooked the SQL Server API to monitor fragmentation and to rebuild and
reorganize indices, and now we're going to create a job that will regularly
defragment the database.
We have a large table in the form:
create table dbo.Data
(
Date date not null,
Type int not null,
Value nvarchar(50) null,
primary key clustered(Date, Type)
);
create unique nonclustered index IX_Data on dbo.Data(Type, Date);
Among other queries we often need a snapshot of data per each Type for a latest
Date available:
select
max(Date) Date,
Type
from
dbo.Data
group by
Type
We have
found that the above select does not run well on our data set. In fact dbo.Data
grows with time, while the snapshot we need stays more or less the same size. The
best solution for such a query is to precalculate it. One way would be to create an
indexed view, but SQL Server does not support the max() aggregate in indexed views.
So, we have decided to add an additional bit field dbo.Data.Last indicating that
a row belongs to the last date snapshot, and to create a filtered index to access
that snapshot:
create table dbo.Data
(
Date date not null,
Type int not null,
Value nvarchar(50) null,
Last bit not null default 0,
primary key clustered(Date, Type)
);
create unique nonclustered index IX_Data on dbo.Data(Type, Date);
create unique nonclustered index IX_Data_Last on dbo.Data(Type)
include(Date)
where Last = 1;
One way to support Last indicator is to create a trigger that will adjust Last
value:
create trigger dbo.Data_Update on dbo.Data
after insert,delete,update
as
begin
if (trigger_nestlevel(@@procid) < 2)
begin
set nocount on;
with
D as
(
select Date, Type from deleted
union
select Date, Type from inserted
),
U as
(
select
V.Date, V.Type
from
D
inner join
dbo.Data V
on
(V.Last = 1) and
(V.Type = D.Type)
union
select
max(V.Date) Date,
V.Type
from
D
inner join
dbo.Data V
on
V.Type = D.Type
group by
V.Type
),
V as
(
select
rank()
over(partition by
V.Type
order by
V.Date desc) Row,
V.*
from
dbo.Data V
inner join
U
on
(V.Date = U.Date) and
(V.Type = U.Type)
)
update V
set
Last = 1 - cast(Row - 1 as bit);
end;
end;
With Last indicator in action, our original query has been transformed to:
select Date, Type
from dbo.Data where Last = 1
Execution plan shows that a new filtered index
IX_Data_Last is used. Execution speed has increased considerably.
As our actual table contains other bit fields, the Last
indicator did not
increase the table size, since SQL Server packs up to 8 bit fields into one byte.
Earlier we have shown
how to build a streaming xml reader from business data and have reminded about
ForwardXPathNavigator which helps to create
a streaming xslt transformation. Now we want to show how to stream content
produced with xslt out of a WCF service.
To achieve streaming in WCF one needs:
1. Configure the service to use streaming. A description of how to do this can be
found on the internet. See web.config of the sample
Streaming.zip for the details.
2. Create a service with a method returning Stream :
[ServiceContract(Namespace = "http://www.nesterovsky-bros.com")]
[AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)]
public class Service
{
[OperationContract]
[WebGet(RequestFormat = WebMessageFormat.Json)]
public Stream GetPeopleHtml(int count,
int seed)
{
...
}
}
3. Return a Stream from the xsl transformation.
Unfortunately (we have mentioned it already), XslCompiledTransform generates its
output into an XmlWriter (or into an output Stream ) rather than exposing the result as
an XmlReader , while WCF gets an input stream and passes it to a client.
We could generate the xslt output into a file or a memory Stream and then return
that content as an input Stream , but this would defeat the goal of streaming, as
the client would not start getting data until the xslt completed its
work. What we need instead is a pipe that connects the xslt output Stream to the input
Stream returned from WCF.
.NET implements pipe streams, so our task is trivial.
We have defined a utility method that creates an input Stream from a generator
populating an output Stream :
public static Stream GetPipedStream(Action<Stream> generator)
{
var output = new AnonymousPipeServerStream();
var input = new AnonymousPipeClientStream(
output.GetClientHandleAsString());
Task.Factory.StartNew(
() =>
{
using(output)
{
generator(output);
output.WaitForPipeDrain();
}
},
TaskCreationOptions.LongRunning);
return input;
}
We wrapped xsl transformation as such a generator:
[OperationContract]
[WebGet(RequestFormat = WebMessageFormat.Json)]
public Stream GetPeopleHtml(int count, int seed)
{
var context = WebOperationContext.Current;
context.OutgoingResponse.ContentType = "text/html";
context.OutgoingResponse.Headers["Content-Disposition"] =
"attachment;filename=reports.html";
var cache = HttpRuntime.Cache;
var path = HttpContext.Current.Server.MapPath("~/People.xslt");
var transform = cache[path] as XslCompiledTransform;
if (transform == null)
{
transform = new XslCompiledTransform();
transform.Load(path);
cache.Insert(path, transform, new CacheDependency(path));
}
return Extensions.GetPipedStream(
output =>
{
// We have a streamed business data.
var people = Data.CreateRandomData(count, seed, 0, count);
// We want to see it as streamed xml data.
using(var stream =
people.ToXmlStream("people", "http://www.nesterovsky-bros.com"))
using(var reader = XmlReader.Create(stream))
{
// XPath forward navigator is used as an input source.
transform.Transform(
new ForwardXPathNavigator(reader),
new XsltArgumentList(),
output);
}
});
}
This way we have built code that streams data directly from business data to a
client in the form of a report. A set of utility functions and classes helped us to
overcome .NET's limitations and to build simple code that one can easily
support.
The sources can be found at
Streaming.zip.
In the previous
post about streaming we stopped at the point where we have an XmlReader
in hand, which continuously gets data from an IEnumerable<Person>
source.
Now we shall remind about ForwardXPathNavigator - a class we built
back in 2002, which adds streaming transformations to .NET's xslt processor.
While XslCompiledTransform is desperately obsolete, and no upgrade
will likely follow, it's still among the fastest xslt 1.0 processors. With
ForwardXPathNavigator we add the ability to transform input data of arbitrary size with this processor.
We find it interesting that
xslt 3.0 Working Draft defines streaming processing in a way that closely
matches rules for ForwardXPathNavigator :
Streaming achieves two important objectives: it allows large documents to be transformed
without requiring correspondingly large amounts of memory; and it allows the processor
to start producing output before it has finished receiving its input, thus reducing
latency.
The rules for streamability, which are defined in detail in 19.3 Streamability
Analysis, impose two main constraints:
-
The only nodes reachable from the node that is currently being processed are its
attributes and namespaces, its ancestors and their attributes and namespaces, and
its descendants and their attributes and namespaces. The siblings of the node, and
the siblings of its ancestors, are not reachable in the tree, and any attempt to
use their values is a static error. However, constructs (for example, simple forms
of xsl:number , and simple positional patterns) that require knowledge
of the number of preceding elements by name are permitted.
-
When processing a given node in the tree, each descendant node can only be visited
once. Essentially this allows two styles of processing: either visit each of the
children once, and then process that child with the same restrictions applied; or
process all the descendants in a single pass, in which case it is not possible while
processing a descendant to make any further downward selection.
The only significant difference between ForwardXPathNavigator and
xslt 3.0 streaming is that we report violations of the streamability rules
at runtime, while xslt 3.0 attempts to perform this analysis at compile time.
Here is the C# code for the streamed xslt transformation:
var transform = new XslCompiledTransform();
transform.Load("People.xslt");
// We have a streamed business data.
var people = Data.CreateRandomData(10000, 0, 0, 10000);
// We want to see it as streamed xml data.
using(var stream =
people.ToXmlStream("people", "http://www.nesterovsky-bros.com"))
using(var reader = XmlReader.Create(stream))
using(var output = File.Create("people.html"))
{
// XPath forward navigator is used as an input source.
transform.Transform(
new ForwardXPathNavigator(reader),
new XsltArgumentList(),
output);
}
Notice how XmlReader is wrapped into ForwardXPathNavigator .
To complete the picture we need xslt that follows the streaming rules:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:msxsl="urn:schemas-microsoft-com:xslt"
xmlns:d="http://www.nesterovsky-bros.com"
exclude-result-prefixes="msxsl d">
<xsl:output method="html" indent="yes"/>
<!-- Root template processed in the streaming mode. -->
<xsl:template match="/d:people">
<html>
<head>
<title>List of persons</title>
<style type="text/css">
.even
{
}
.odd
{
background: #d0d0d0;
}
</style>
</head>
<body>
<table border="1">
<tr>
<th>ID</th>
<th>First name</th>
<th>Last name</th>
<th>City</th>
<th>Title</th>
<th>Age</th>
</tr>
<xsl:for-each select="d:person">
<!--
Get element snapshot.
A snapshot allows arbitrary access to the element's content.
-->
<xsl:variable name="person">
<xsl:copy-of select="."/>
</xsl:variable>
<xsl:variable name="position" select="position()"/>
<xsl:apply-templates mode="snapshot" select="msxsl:node-set($person)/d:person">
<xsl:with-param name="position" select="$position"/>
</xsl:apply-templates>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
<xsl:template mode="snapshot" match="d:person">
<xsl:param name="position"/>
<tr>
<xsl:attribute name="class">
<xsl:choose>
<xsl:when test="$position mod 2 = 1">
<xsl:text>odd</xsl:text>
</xsl:when>
<xsl:otherwise>
<xsl:text>even</xsl:text>
</xsl:otherwise>
</xsl:choose>
</xsl:attribute>
<td>
<xsl:value-of select="d:Id"/>
</td>
<td>
<xsl:value-of select="d:FirstName"/>
</td>
<td>
<xsl:value-of select="d:LastName"/>
</td>
<td>
<xsl:value-of select="d:City"/>
</td>
<td>
<xsl:value-of select="d:Title"/>
</td>
<td>
<xsl:value-of select="d:Age"/>
</td>
</tr>
</xsl:template>
</xsl:stylesheet>
So, we have started with streamed entity data, proceeded to the streamed
XmlReader and reached the streamed xslt transformation.
In the final post about streaming we shall show a simple way of building a
WCF service returning an html stream from our xslt transformation.
The sources can be found at
Streaming.zip.
If you're using .NET's IDictionary<K, V> you have probably found
its access API too boring. Indeed, at each access point you have to write code
like this:
MyValueType value;
var hasValue = dictionary.TryGetValue(key, out value);
...
In many, if not in most, cases the value is of a reference type, and you do not
usually store null values, so it would be fine if dictionary
returned null when value does not exist for the key.
To deal with this small nuisance we have declared a couple of accessor
extension methods:
public static class Extensions
{
public static V Get<K, V>(this IDictionary<K, V> dictionary, K key)
where V: class
{
V value;
if (key == null)
{
value = null;
}
else
{
dictionary.TryGetValue(key, out value);
}
return value;
}
public static V Get<K, V>(this IDictionary<K, V> dictionary, K? key)
where V: class
where K: struct
{
V value;
if (key == null)
{
value = null;
}
else
{
dictionary.TryGetValue(key.GetValueOrDefault(), out value);
}
return value;
}
}
These methods simplify dictionary access to:
var value = dictionary.Get(key);
...
For some reason neither .NET's XmlSerializer nor DataContractSerializer allows
reading data through an XmlReader . These APIs work the other way round, writing data
into an XmlWriter . To get data through an XmlReader one needs to write it to some
destination like a file or memory stream, and then to read it using XmlReader .
This complicates streaming design considerably.
In fact the very same happens with other .NET APIs.
We think the reason why .NET designers preferred XmlWriter to XmlReader in
those APIs is that XmlReader 's implementation is state machine like, while
XmlWriter 's implementation looks like a regular procedure. It's much harder to
manually write and to support correct state machine logic
than a procedure.
If history had gone a slightly
different way, and yield return, lambda, and the Enumerable API had appeared before
XmlReader and XmlWriter , then, we think, both these classes would have looked different.
An xml source would have been described with an IEnumerable<XmlEvent> instead of
XmlReader , and XmlWriter would have looked like a function receiving
IEnumerable<XmlEvent> . Implementing XmlReader would have meant creating an
enumerator. Yield return and the Enumerable API would have helped to implement it in
a procedural way.
But at present we have to deal with the fact that DataContractSerializer
should write the data into an XmlWriter , so let's assume we have a project that
uses Entity Framework to access the database, and that you have a data class
Person , and data access method GetPeople() :
[DataContract(Name = "person", Namespace = "http://www.nesterovsky-bros.com")]
public class Person
{
[DataMember] public int Id { get; set; }
[DataMember] public string FirstName { get; set; }
[DataMember] public string LastName { get; set; }
[DataMember] public string City { get; set; }
[DataMember] public string Title { get; set; }
[DataMember] public DateTime BirthDate { get; set; }
[DataMember] public int Age { get; set; }
}
public static IEnumerable<Person> GetPeople() { ... }
And your goal is to expose the result of GetPeople() as an XmlReader .
We achieve this with three simple steps:
- Define
JoinedStream - an input Stream implementation that
reads data from a enumeration of streams (IEnumerable<Stream> ).
- Build xml parts in the form of
IEnumerable<Stream> .
- Combine parts into final xml stream.
The code is rather simple, so here we quote its essential part:
public static class Extensions
{
public static Stream JoinStreams(this IEnumerable<Stream> streams, bool closeStreams = true)
{
return new JoinedStream(streams, closeStreams);
}
public static Stream ToXmlStream<T>(
this IEnumerable<T> items,
string rootName = null,
string rootNamespace = null)
{
return items.ToXmlStreamParts<T>(rootName, rootNamespace).
JoinStreams(false);
}
private static IEnumerable<Stream> ToXmlStreamParts<T>(
this IEnumerable<T> items,
string rootName = null,
string rootNamespace = null)
{
if (rootName == null)
{
rootName = "ArrayOfItems";
}
if (rootNamespace == null)
{
rootNamespace = "";
}
var serializer = new DataContractSerializer(typeof(T));
var stream = new MemoryStream();
var writer = XmlDictionaryWriter.CreateTextWriter(stream);
writer.WriteStartDocument();
writer.WriteStartElement(rootName, rootNamespace);
writer.WriteXmlnsAttribute("s", XmlSchema.Namespace);
writer.WriteXmlnsAttribute("i", XmlSchema.InstanceNamespace);
foreach(var item in items)
{
serializer.WriteObject(writer, item);
writer.WriteString(" ");
writer.Flush();
stream.Position = 0;
yield return stream;
stream.Position = 0;
stream.SetLength(0);
}
writer.WriteEndElement();
writer.WriteEndDocument();
writer.Flush();
stream.Position = 0;
yield return stream;
}
private class JoinedStream: Stream
{
public JoinedStream(IEnumerable<Stream> streams, bool closeStreams = true)
...
}
}
The use is even simpler:
// We have a streamed business data.
var people = GetPeople();
// We want to see it as streamed xml data.
using(var stream = people.ToXmlStream("persons", "http://www.nesterovsky-bros.com"))
using(var reader = XmlReader.Create(stream))
{
...
}
We have packed the sample into the project
Streaming.zip.
In the next post we're going to remind about streaming processing in xslt.
For some reason KendoUI DataSource does not allow access to the current ajax
request. Indeed, it seems quite natural to have a way to cancel a running request.
To achieve a desired effect we have made a small
set of changes in the
RemoteTransport class:
var RemoteTransport_setup = kendo.RemoteTransport.fn.setup;
kendo.RemoteTransport.fn.setup = function()
{
var that = this,
options = RemoteTransport_setup.apply(that,
arguments),
beforeSend = options.beforeSend;
options.beforeSend = function(request, options)
{
that.abort();
that._request = request;
if (beforeSend && (beforeSend.apply(this, arguments) === false))
{
that._request = null;
return false;
}
request.always(function() { that._request = null; });
}
return options;
}
kendo.RemoteTransport.fn.request = function()
{
return this._request;
}
kendo.RemoteTransport.fn.abort = function()
{
var request = this._request;
if (request)
{
this._request = null;
request.abort();
}
}
These changes allow getting an ajax request instance:
grid.dataSource.request() , or cancelling a request: grid.dataSource.abort() .
Trying to make KendoUI to work with Hebrew or more generally in RTL environment
we had to find a way to guess the position of scroll bar when direction is rtl.
The problem exists due to the fact that some browsers (Chrome being one of them) always
put scroll bars on the right. That's utterly wrong. Consider a label and a listbox:
[Screenshots: Chrome | IE]
You can see that the scroll bar appears between the label (on the right) and the
data in the list box (on the left) in Chrome, and on the left side of the list
box in the IE.
We came up with the following test that calculates a scroll bar position in rtl
mode:
<script type="text/javascript">
var _scrollbar;
function scrollbar()
{
if (!_scrollbar)
{
var div = document.createElement("div");
div.style.cssText = "overflow:scroll;zoom:1;clear:both;direction:rtl";
div.innerHTML = "<div> </div>";
document.body.appendChild(div);
_scrollbar =
{
size: div.offsetWidth - div.scrollWidth,
rtlPosition: div.offsetLeft < div.firstChild.offsetLeft
? "left" : "right"
};
document.body.removeChild(div);
}
return _scrollbar;
}
</script>
In conjunction with the approach described in
How to create a <style> tag with Javascript we were able to define
rtl css classes for kendo controls, in particular for the grid, combobox, dropdownlist, and datepicker.
Several days ago we arrived at the blog "Recursive
lambda expressions". There, the author asks how to write a lambda expression
that calculates a factorial (only expression statements are allowed).
The problem by itself is rather artificial, but at times you feel an intellectual
pleasure solving such tasks by yourself. So, putting the original blog post aside we
devised our answers. The shortest one goes like this:
- As a C# lambda expression cannot refer to itself, it has to receive itself as
a parameter, so:
factorial(factorial, n) = n <= 1 ? 1 : n * factorial(factorial, n - 1);
- To define such lambda expression we have to declare a delegate type that receives
a delegate of the same type:
delegate int Impl(Impl impl, int n);
Fortunately, C# allows this, but a workaround could be used even if it were not
possible.
- To simplify the reasoning we've defined a two-expression version:
Impl impl = (f, n) => n <= 1 ? 1 : n * f(f, n - 1);
Func<int, int> factorial = i => impl(impl, i);
- Finally, we've written out a one-expression version:
Func<int, int> factorial = i => ((Func<Impl,
int>)(f => f(f, i)))((f, n) => n <= 1 ? 1 : n * f(f, n - 1));
- The use is:
var f = factorial(10);
After that exercise we returned to the original blog and compared
solutions.
We can see that the author appeals to set theory, but for some reason his answer is
more complex than necessary; the comments, however, contain variants analogous to our
answer.
A customer has a table with data stored by dates, and asked us to present data
from this table by sequential date ranges.
This query sounded trivial, but it took us half a day to create such a select.
For simplicity consider a table of integer numbers, and try to build a select
that returns pairs of continuous ranges of values.
So, for an input like this:
declare @values table
(
value int not null primary key
);
insert into @values(value)
select 1
union all
select 2
union all
select 3
union all
select 5
union all
select 6
union all
select 8
union all
select 10
union all
select 12
union all
select 13
union all
select 14;
You will have the following output:
low high
---- ----
1 3
5 6
8 8
10 10
12 14
The logic of the algorithm is like this:
- get a low bound of each range (a value without value - 1 in the source);
- get a high bound of each range (a value without value + 1 in the source);
- combine low and high bounds.
Following this logic we have built at least three different queries, where the
shortest one
is:
with source as
(
select * from @values
)
select
l.value low,
min(h.value) high
from
source l
inner join
source h
on
(l.value - 1 not in (select value from source)) and
(h.value + 1 not in (select value from source)) and
(h.value >= l.value)
group by
l.value;
Looking at this query it's hard to understand why it took so
long to
write such simple code...
If you're writing an application that deals with files in the file system on Windows, be sure that sooner or later you will run into problems with antivirus software.
Our latest program, which handles a lot of huge files and works as a Windows service, reports strange errors from time to time. These errors look as if the file system disappeared on the fly, or files were stolen by somebody else (after they had been opened in exclusive mode by our application).
We spent about two weeks diagnosing the cause of such behaviour, and then came to the conclusion that it is the secret work of our antivirus. All such errors disappeared like fog when the antivirus was configured to skip the folders with our files.
Thus, keep in mind our experience and don't allow an antivirus to become an evil.
While looking at some SQL we have realized that it can be considerably optimized.
Consider a table source like this:
with Data(ID, Type, SubType) as
(
select 1, 'A', 'X'
union all
select 2, 'A', 'Y'
union all
select 3, 'A', 'Y'
union all
select 4, 'B', 'Z'
union all
select 5, 'B', 'Z'
union all
select 6, 'C', 'X'
union all
select 7, 'C', 'X'
union all
select 8, 'C', 'Z'
union all
select 9, 'C', 'X'
union all
select 10, 'C', 'X'
)
Suppose you want to group data by type, to calculate the number of elements in each
group, and to display the sub type if all rows in a group are of the same sub type.
Earlier we would have written the code like this:
select
Type,
case when count(distinct SubType) = 1 then min(SubType) end SubType,
count(*) C
from
Data
group by
Type;
Namely, we select min(SubType) provided that there is a single distinct
SubType, otherwise null is shown. That works perfectly,
but algorithmically count(distinct SubType) = 1 needs to build a set
of distinct values for each group just to check the size of this set. That is
expensive!
What we wanted can be expressed differently: if min(SubType) and
max(SubType) are the same then we want to display it, otherwise to show
null.
That's the new version:
select
Type,
case when min(SubType) = max(SubType) then min(SubType) end SubType,
count(*) C
from
Data
group by
Type;
Such a simple rewrite has radically simplified the execution plan.
Another bizarre problem we have discovered is that SQL Server 2008 R2 just does
not support the following:
select
count(distinct SubType) over(partition by Type)
from
Data
That's really strange, but it's a known bug (see
Microsoft Connect).
A database we support for a client contains multi-billion row tables. Many
users query the data from that database, and it's constantly populated
with new data.
Every day we load several million rows of new data. Such loads can lock tables for a
considerable time, so our loading procedures collect new data into intermediate
tables and insert it into the final destination in chunks, usually after work
hours.
SQL Server supports the
READ_COMMITTED_SNAPSHOT database option (available since SQL Server 2005). This feature trades locks for an
increased tempdb size (to store row versions) and possible performance
degradation during a transaction.
When we switched the database to that option we did
not notice any considerable performance change. Encouraged, we decided to
increase the size of the data chunks we insert at once.
Earlier we had found that when we insert no more than 1000 rows
at once, users don't notice the impact, but for bigger chunk sizes users start to
complain about performance degradation. This probably happened due to lock
escalations.
Now, with chunks of 10000 or even 100000 rows, we have found that no queries
became slower, while the load process became several times faster.
We were ready to pay with increased tempdb and transaction log size for better
performance, but in our case we didn't even approach the limits assigned by the DBA.
Another gain is that we can easily load data at any time. This makes the data we
store more up to date.
Recently we introduced a stored procedure in production and
found that it performs incredibly slowly.
Our reasoning and tests in the development environment did not reveal any
problem at all.
In essence, that procedure executes some SELECT and returns a status as a single
output variable. The procedure receives several input parameters, and the SELECT
statement uses the
option(recompile) execution hint to optimize the performance for specific
parameters.
We analyzed the execution plan of that procedure and found that it
works as if the option(recompile) hint was not specified. Without that hint the SELECT
fails to use an index seek and uses an index scan instead.
What we have lately found is that the same SELECT performs very well when it
produces a result set instead of reading the result into a variable.
We think that this is a bug in SQL Server 2008 R2 (and in SQL Server 2008).
To demonstrate the problem you can run this test:
-- Setup
create table dbo.Items
(
Item int not null primary key
);
go
insert into dbo.Items
select 1
union all
select 2
union all
select 3
union all
select 4
union all
select 5
go
create procedure dbo.GetMaxItem
(
@odd bit = null,
@result int output
)
as
begin
set nocount on;
with Items as
(
select * from dbo.Items where @odd is null
union all
select * from dbo.Items where (@odd = 1) and ((Item & 1) = 1)
union all
select * from dbo.Items where (@odd = 0) and ((Item & 1) = 0)
)
select @result = max(Item) from Items
option(recompile);
end;
go
create procedure dbo.GetMaxItem2
(
@odd bit = null,
@result int output
)
as
begin
set nocount on;
declare @results table
(
Item int
);
with Items as
(
select * from dbo.Items where @odd is null
union all
select * from dbo.Items where (@odd = 1) and ((Item & 1) = 1)
union all
select * from dbo.Items where (@odd = 0) and ((Item & 1) = 0)
)
insert into @results
select max(Item) from Items
option(recompile);
select @result = Item from @results;
end;
go
Test with output into a variable:
declare @result1 int;
execute dbo.GetMaxItem @odd = null, @result = @result1 output
Test without direct output into a variable:
declare @result2 int;
execute dbo.GetMaxItem2 @odd = null, @result = @result2 output
Now you can see the difference: the first execution plan uses startup expressions, while the second one optimizes away the execution branches that are not really used.
In our case it was crucial, as the execution time difference was minutes (and
more in the future) versus a split second.
See also
Microsoft Connect Entry.
It so happened that we had never worked with jQuery, though we were aware of
it.
In the early 2000s we developed a web application that contained rich javascript
APIs, including UI components. Later we were actively practicing ASP.NET, and
later still JSF.
At present, looking at jQuery more closely, we regret that we failed to
start using it earlier.
The separation of business logic and presentation is remarkable when one uses JSON
web services. In fact, the server part can be seen as a set of web services
representing the business logic, plus a set of resources: html, styles, scripts and
others. Neither ASP.NET nor JSF approaches such a consistent separation.
The only trouble, in our opinion, is that jQuery has no standard data binding: a way to bind JSON data
to (and from) html controls. The technique that will probably be standardized is called jQuery Templates or JsViews.
Unfortunately, after reading about this
binding API, and
being in love with Xslt and XQuery, we just want to cry. We don't know what
the best solution for the task would be, but what we see looks uncomfortable to us.
We're not big fans of
Entity Framework, as we don't directly expose the database structure to
the client program but rather through stored procedures and functions. So, EF for
us is a tool to expose those stored procedures as .NET wrappers. This limited use
of EF still greatly automates the data access code.
But what we have lately found is that EF has a problem with char parameters. Namely,
if you import a procedure, say MyProc, that accepts char(1),
and then call it through the generated wrapper, you will see in SQL profiler
that the char(1) parameter is passed with many trailing spaces, as if it
were char(8000). There is no need to prove that this is highly
inefficient.
We can see that the problem happens in the VS 2010 designer rather than in the EF runtime,
as the SP's parameters are not attributed with a length; see the model xml (*.edmx):
<Function Name="MyProc" Schema="Data">
...
<Parameter Name="recipientType" Type="char" Mode="In"
/>
...
</Function>
while if we set:
<Parameter Name="recipientType" Type="char" MaxLength="1"
Mode="In" />
the runtime starts working as expected. So the workaround is to fix model file manually.
See also:
Stored Proc and Char parm
AjaxControlToolkit has methods to access ViewState :
protected V GetPropertyValue<V>(string propertyName, V nullValue)
{
if (this.ViewState[propertyName] == null)
{
return nullValue;
}
return (V) this.ViewState[propertyName];
}
protected void SetPropertyValue<V>(string propertyName, V value)
{
this.ViewState[propertyName] = value;
}
...
public bool EnabledOnClient
{
get { return base.GetPropertyValue("EnabledOnClient", true); }
set { base.SetPropertyValue("EnabledOnClient", value); }
}
We find that code unnecessarily complex and non-optimal. Our code to access
ViewState looks like this:
public bool EnabledOnClient
{
get { return ViewState["EnabledOnClient"] as bool? ?? true; }
set { ViewState["EnabledOnClient"] = value; }
}
1. query.dll vs tquery.dll
We have installed
Windows Search 4 on a Windows 2003 server. The goal was to index huge compressed
xml files (see
Windows Search Notifications). But for some reason it did not want to index
content.
No "select System.ItemUrl from SystemIndex where contains('...') "
has ever returned a row.
We thought that the problem was in our protocol handler, and tried to localize it,
but finally discovered that Windows Search was not able to find anything within
text files.
A registry comparison has shown that the *.txt extension was indexed by the IFilter defined
in query.dll, while on other computers, where everything worked, the implementation
was in tquery.dll.
Both libraries were present on the Windows 2003 server, so we corrected the
registry and everything started to work.
As far as we understand, query.dll is part of the legacy
Indexing Service, and tquery.dll is the up-to-date implementation.
2. Search index size
We have to index a considerable amount of data. But before we can do it we have to
estimate the size of the index.
In the past we seem to have seen somewhere a statement that a search index needs storage
of about 10% of the original data. Unfortunately we cannot
find this estimation at present, nor can we find any other estimation. This
complicates our planning.
To get an empirical estimate we've indexed several thousand *.xml-gz files, which
are gzipped big xmls. The total size of these files is about 4.5GB; the total uncompressed
size of the xmls is ~50GB. The xml contained about 10 million pages of data.
According to the 10% criterion we should have arrived at a ~5GB search index.
But what we discovered is that the index has grown to more than 50GB. That's
very disappointing. We cannot afford such an expense, as we've committed the test on
only a tiny part of the data, which grows over time.
So, the solution is either to find out what's wrong and how it can be cured, or to
full-text index only the most recent subset of the data.
P.S. We have tried to mark folder with search index as compressed, but it did not
work.
P.P.S. We have found the reference to Windows Search 4 index size estimation. It is in
Windows Search Frequently Asked Questions, see answer on "What is average size of a user's index?" question.
An xslt code that had worked in production for several years failed
unexpectedly. That's unusual and unfortunate, but it happens.
We started to analyze the problem, narrowed down the code block and recreated it in
a simple form. Here it is:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:t="http://www.nesterovsky-bros.com/xslt/public"
exclude-result-prefixes="t xs">
<xsl:template match="/" name="main">
<xsl:variable name="content">
<root>
<xsl:for-each select="1 to 3">
<item/>
</xsl:for-each>
</root>
</xsl:variable>
<xsl:variable name="result">
<root>
<xsl:for-each select="$content/root/item">
<section-ref name-ref="{t:generate-id()}.s"/>
<!--
<xsl:variable name="id" as="xs:string"
select="t:generate-id()"/>
<section-ref name-ref="{$id}.s"/>
-->
</xsl:for-each>
</root>
</xsl:variable>
<xsl:message select="$result"/>
</xsl:template>
<xsl:function name="t:generate-id" as="xs:string">
<xsl:variable name="element" as="element()">
<element/>
</xsl:variable>
<xsl:sequence select="generate-id($element)"/>
</xsl:function>
</xsl:stylesheet>
This code performs some transformation and assigns unique values to
name-ref attributes. Values generated with
t:generate-id() function are guaranteed to be unique, as spec
claims that every node has its unique generate-id() value.
Imagine our surprise to find that the generated elements all have the same
name-ref's. We studied the code all over, and found no holes in our
reasoning or implementation, so our conclusion was: it's a Saxon bug!
Interestingly enough, if we rewrite the code a little (see the commented part),
it starts to work properly, thus we suspect Saxon's optimizer.
Well, in the course of development we have found and reported many Saxon bugs,
but how come this little beetle was hiding for so long?
We've verified that the bug exists in the versions 9.2 and 9.3. Here is the bug
report:
Saxon 9.2 generate-id() bug.
Unfortunately, it has been there already for three days (2011-07-25 to 2011-07-27)
without any reaction. We hope this will change soon.
We needed to track a stream position during creation of xml file. This is to
allow random access to a huge xml file (the task is related to
WindowsSearch).
This is a simplified form of the xml:
<data>
<item>...</item>
...
<item>...</item>
</data>
The goal was to have the stream position of each item element. With this in mind,
we've decided to:
- open a stream, and then an xml writer over it;
- write data into the xml writer;
- call the
Flush() method of the xml writer before measuring the stream offset.
That's a code sample:
var stream = new MemoryStream();
var writer = XmlWriter.Create(stream);
writer.WriteStartDocument();
writer.WriteStartElement("data");
for(var i = 0; i < 10; ++i)
{
writer.Flush();
Console.WriteLine("Flush offset: {0}, char: {1}",
stream.Position,
(char)stream.GetBuffer()[stream.Position - 1]);
writer.WriteStartElement("item");
writer.WriteValue("item " + i);
writer.WriteEndElement();
}
writer.WriteEndElement();
writer.WriteEndDocument();
That's the output:
Flush offset: 46, char: a
Flush offset: 66, char: >
Flush offset: 85, char: >
Flush offset: 104, char: >
Flush offset: 123, char: >
Flush offset: 142, char: >
Flush offset: 161, char: >
Flush offset: 180, char: >
Flush offset: 199, char: >
Flush offset: 218, char: >
Funny, isn't it?
After feeding the start tag <data>, and flushing the xml writer, we observe that only
"<data" has been written down to the stream. Well,
Flush() has never promised anything particular about the content
of the stream, so we cannot claim any violation; however, we expected to see the
whole start tag.
Inspection of the xml writer implementation reveals laziness in writing
data down to the stream. In particular, a start tag is closed only when one starts its
content. This is probably done to support empty tags: <data/>.
To do the trick we had to issue empty content; moreover, we had to call a particular
method of the xml writer with particular parameters. So the code after the fix
looks like this:
var stream = new MemoryStream();
var writer = XmlWriter.Create(stream);
writer.WriteStartDocument();
writer.WriteStartElement("data");
char[] empty = { ' ' };
for(var i = 0; i < 10; ++i)
{
writer.WriteChars(empty, 0, 0);
writer.Flush();
Console.WriteLine("Flush offset: {0}, char: {1}",
stream.Position,
(char)stream.GetBuffer()[stream.Position - 1]);
writer.WriteStartElement("item");
writer.WriteValue("item " + i);
writer.WriteEndElement();
}
writer.WriteEndElement();
writer.WriteEndDocument();
And output is:
Flush offset: 47, char: >
Flush offset: 66, char: >
Flush offset: 85, char: >
Flush offset: 104, char: >
Flush offset: 123, char: >
Flush offset: 142, char: >
Flush offset: 161, char: >
Flush offset: 180, char: >
Flush offset: 199, char: >
Flush offset: 218, char: >
While this code works, we feel uneasy with it.
What's the better way to solve the task?
Update: further analysis shows that this is the
only possible behaviour, as after the call to write a start element you can either
write attributes, content or the end of the element, so the writer may write either a
space, '>' or '/>'. The only
question is why it takes WriteChars(empty, 0, 0) into account while WriteValue("")
it doesn't.
As you probably know, we have implemented our custom Protocol Handler for Windows
Search.
It's called .xml-gz, and its goal is to index compressed xml files and to provide
search results with subtree precision. So, for xml:
<data>
<item>...</item>
<item>...</item>
...
</data>
search finds results within item and returns xml's url and stream
offset of the item. Using ZLIB API we can compress data with stream bookmarks, so fast random
access to the data is possible.
The only problem we have is about notification of changes (create, delete, update)
of such files.
The spec describes several techniques (none of which worked for us):
1. Call catalogManager.ReindexMatchingURLs()
- it just returns without any impact.
2. Call changeSink.OnItemsChanged()
- returns an error.
3. Implement
a .xml-gz IFilter and call IGatherNotifyInline (see "
have your .zip urls indexed when they are created or modified") -
that's a mystery, as:
4. Implement root url in form .xml-gz:/// and perform Windows Search:
SELECT
System.ItemUrl, System.DateModified
FROM
SystemIndex WHERE System.FileExtension='.xml-gz'
to find all .xml-gz sources. This is not reliable, as your protocol handler can
be (and is) called before file is indexed.
So, the only reliable way to index your data is to (re-)add indexing rule for
the protocol handler, which in most cases reindexes everything.
The only bearable solution we found is to define an indexing rule in the form
.xml-gz://file:d:/data/... and to use the
IShellFolder(2)
interfaces to discover sub-items and their modification times. This technique allows
a minimal data scan when you (re-)add an indexing rule.
Being inexperienced with Windows Search, we tried to build queries to find data in a huge storage. We needed to find a document that matches some name pattern and contains some text.
Our naive query was like this:
select top 1000 System.ItemUrl from SystemIndex where scope = '...' and System.ItemName like '...%' and contains('...')
In most cases this query returns nothing and runs for a very long time. It's interesting to note that it may start returning data if the "top" clause is missing or uses a bigger number, but in these cases the query is even slower.
Next try was like this:
select top 1000 System.ItemUrl from SystemIndex where scope = '...' and System.ItemName >= '...' and System.ItemName < '...' and contains('...')
This query is also slow, but at least it returns some results.
At some point we have started to question the utility of Windows Search if it's so slow, but then we have found that there is a property System.ItemNameDisplay , which in our case coincides with the value of property System.ItemName , so we have tried the query:
select top 1000 System.ItemUrl from SystemIndex where scope = '...' and System.ItemNameDisplay like '...%' and contains('...')
This query worked fast and produced good results. This hints that the search engine has an index on System.ItemNameDisplay, in contrast to the System.ItemName property.
We've looked at property definitions:
System.ItemNameDisplay
The display name in "most complete" form. It is the unique representation of the item name most appropriate for end users.
propertyDescription: name = System.ItemNameDisplay, shellPKey = PKEY_ItemNameDisplay, formatID = B725F130-47EF-101A-A5F1-02608C9EEBAC, propID = 10; searchInfo: inInvertedIndex = true, isColumn = true, isColumnSparse = false, columnIndexType = OnDisk, maxSize = 128
System.ItemName
The base name of the System.ItemNameDisplay property.
propertyDescription: name = System.ItemName, shellPKey = PKEY_ItemName, formatID = 6B8DA074-3B5C-43BC-886F-0A2CDCE00B6F, propID = 100; searchInfo: inInvertedIndex = false, isColumn = true, isColumnSparse = false, columnIndexType = OnDisk, maxSize = 128
Indeed, one property is indexed, while the other is not.
As with other databases, a query is fast when the engine uses indices rather than performs a data scan. This is also true for Windows Search.
The differences in results that variations of the query produce also show that Windows Search is nevertheless very different from a relational database.
We have developed our custom Windows Search Protocol Handler. The role of this component is to expose items of complex content (or unusual storage) to Windows Search.
You can think of it as a virtual folder: a Protocol Handler allows enumerating its files, file properties, and contents.
The goal of our Protocol Handler is to represent some data structure as a set of xml files. We expected that if a search found data within a folder containing these files, then a search within the Protocol Handler's scope would bring the same (or almost the same) results.
Reality is different.
For some reason the .xml IFilter (a component that extracts text data to index) works differently with the file system and with our storage. We cannot state that it does not work, but for some reason many words that Windows Search finds within a file are never found within the Protocol Handler scope.
We have observed that if, for the purpose of indexing, we represent content xml items as .txt files, then the search works as expected. So, our workaround was to present only the xml's text data for indexing, and to use the .txt IFilter (this is in fact roughly what the .xml IFilter does by itself).
Is there a conclusion?
Well, Windows Search is a black box probably containing bugs. Its behaviour is not always obvious.
There is a problem with the XML serialization of BigDecimal values, as we've written in one of our previous articles, "BigDecimal + JAXB => potential interoperability problems". And now we've run into an issue with the serialization of double / Double values. All such values, except zero, serialize in scientific format, even when a value contains only an integer part. For example, 12 will be serialized as 1.2E+1. Actually this does not contradict the XML schema definitions.
But what can be done if you want to send/receive double and/or decimal values in plain format? For example, you want to serialize a double / BigDecimal value 314.15926 in XML as is. In this case you ought to use javax.xml.bind.annotation.adapters.XmlAdapter.
To solve this task we've created two descendants of XmlAdapter (the first for double / Double and the second for BigDecimal); click here to download the sources.
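For illustration only, here is a minimal sketch of what such an adapter for BigDecimal could look like (the class name is our own choice; the actual classes are in the downloadable sources, and the double / Double variant can format through BigDecimal.valueOf(value) in the same way):
import java.math.BigDecimal;
import javax.xml.bind.annotation.adapters.XmlAdapter;

// Serializes BigDecimal values in plain (non-scientific) notation.
public class PlainBigDecimalAdapter extends XmlAdapter<String, BigDecimal>
{
  @Override
  public String marshal(BigDecimal value)
  {
    // toPlainString() never uses exponent notation.
    return value == null ? null : value.toPlainString();
  }

  @Override
  public BigDecimal unmarshal(String value)
  {
    return value == null ? null : new BigDecimal(value.trim());
  }
}
Such an adapter is attached with the @XmlJavaTypeAdapter annotation, either on a field/property or at the package level (in package-info.java).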
By applying these classes at the property or package level you can control the XML serialization of numeric fields in your classes.
See this article for tips how to use custom XML serialization.
As you may know, JAX-WS uses the javax.xml.datatype.XMLGregorianCalendar abstract class in order
to represent date/time data type fields. We had used this class for a rather long time in
happy ignorance, without any problem. Suddenly, a few days ago, we ran into a weird bug
in Sun's implementation (com.sun.org.apache.xerces.internal.jaxp.datatype.XMLGregorianCalendarImpl).
The bug appears whenever we try to convert an XMLGregorianCalendar instance
to a java.util.GregorianCalendar using the toGregorianCalendar() method.
I've written a simple JUnit test in order to demonstrate this bug:
@Test
public void testXMLGregorianCalendar()
throws Exception
{
SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
XMLGregorianCalendar calendar =
javax.xml.datatype.DatatypeFactory.newInstance().newXMLGregorianCalendar();
calendar.setDay(1);
calendar.setMonth(1);
calendar.setYear(1);
System.out.println("1: " + calendar.toString());
System.out.println("2: " +
formatter.format(calendar.toGregorianCalendar().getTime()));
GregorianCalendar cal = new GregorianCalendar(
calendar.getYear(),
calendar.getMonth() - 1,
calendar.getDay());
cal.clear(Calendar.AM_PM);
cal.clear(Calendar.HOUR_OF_DAY);
cal.clear(Calendar.HOUR);
cal.clear(Calendar.MINUTE);
cal.clear(Calendar.SECOND);
cal.clear(Calendar.MILLISECOND);
System.out.println("3: " + formatter.format(cal.getTime()));
/*
* Output:
*
* 1: 0001-01-01
* 2: 0001-01-03 00:00:00
* 3: 0001-01-01 00:00:00
*/
}
As you can see, the date 0001-01-01 is transformed into 0001-01-03 after a call to the
toGregorianCalendar() method (see output 2).
Moreover, if we serialize this XMLGregorianCalendar instance to XML we'll see
it as 0001-01-01+02:00, which is rather weird and could be a potential problem for
interoperability between Java and other platforms.
Conclusion: in order to convert an XMLGregorianCalendar value to a
GregorianCalendar, do the following: create a new instance of
GregorianCalendar and just set the corresponding fields with
values from the XMLGregorianCalendar instance.
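A minimal sketch of such a conversion (covering only the date part, as in the test above; time fields, if present, would be copied the same way) might be:
import java.util.GregorianCalendar;
import javax.xml.datatype.XMLGregorianCalendar;

public final class Calendars
{
  // Builds the calendar field by field,
  // bypassing the problematic toGregorianCalendar() call.
  public static GregorianCalendar toGregorianCalendar(XMLGregorianCalendar value)
  {
    return new GregorianCalendar(
      value.getYear(),
      value.getMonth() - 1, // GregorianCalendar months are zero based.
      value.getDay());
  }
}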
Earlier we described an approach to call Windows Search from SQL Server 2008. But it turned out that our problem is more complicated...
It all started from the initial task:
- to allow free text search in a store of huge xml files;
- the files should be compressed, so these are *.xml.gz;
- search results should be addressable to a fragment within the xml.
Later we shall describe how we solved this task; for now it's enough to say that we have implemented a Protocol Handler for Windows Search named '.xml-gz:'. This way an original file stored, say, at 'file:///c:/store/data.xml-gz' is seen as a container by Windows Search:
- .xml-gz:///file:c:/store/data.xml-gz/id1.xml
- .xml-gz:///file:c:/store/data.xml-gz/id2.xml
- ...
This way the search query should look like this:
select System.ItemUrl from SystemIndex where scope='.xml-gz:' and contains(...)
Everything worked during testing: we succeeded in issuing Windows Search selects from SQL Server and joining the results with other sql queries.
But later on, when we considered the runtime environment, we saw that our design won't work. The reason is simple: Windows Search will run on a computer different from those where the SQL Servers run. So, the search query should look like this:
select System.ItemUrl from Computer.SystemIndex where scope='.xml-gz:' and contains(...)
Here we have realized the limitation of current (Windows Search 4) implementation: remote search works for shared folders only, thus query may only look like:
select System.ItemUrl from Computer.SystemIndex where scope='file://Computer/share/' and contains(...)
Notice that the search restricts the scope to the file protocol, so a remote search will never return our results. The only way to search in our scope is to perform a local search.
We considered the following approaches to resolve the issue.
The simplest one would be to access the Search provider on the remote computer using a connection string: "Provider=Search.CollatorDSO;Data Source=Computer" and use local queries. This does not work, as the provider simply disregards the Data Source parameter.
The other attempt was to use the MS Remote OLEDB provider. We tried hard to configure it but it always returned an obscure error; moreover, it's deprecated (Microsoft plans to remove it in the future).
So, we decided to forward request manually:
- SQL Server calls a web service (through a CLR function);
- Web service queries Windows Search locally.
Here we considered WCF Data Services and a custom web service.
The advantage of WCF Data Services is that it's a technology with ambitions of becoming a standard, but it's a rather complex task to create an implementation that will talk the Windows Search SQL dialect, so we decided to build a primitive http handler that receives the query as a parameter. That's trivial and also has the virtue of a simple implementation and high streamability.
So, that's our http handler (WindowsSearch.ashx):
<%@ WebHandler Language="C#" Class="WindowsSearch" %>
using System; using System.Web; using System.Xml; using System.Text; using System.Data.OleDb;
/// <summary> /// A Windows Search request handler. /// </summary> public class WindowsSearch: IHttpHandler { /// <summary> /// Handles the request. /// </summary> /// <param name="context">A request context.</param> public void ProcessRequest(HttpContext context) { var request = context.Request; var query = request.Params["query"]; var response = context.Response;
response.ContentType = "text/xml"; response.ContentEncoding = Encoding.UTF8;
var writer = XmlWriter.Create(response.Output);
writer.WriteStartDocument(); writer.WriteStartElement("resultset");
if (!string.IsNullOrEmpty(query)) { using(var connection = new OleDbConnection(provider)) using(var command = new OleDbCommand(query, connection)) { connection.Open();
using(var reader = command.ExecuteReader()) { string[] names = null;
while(reader.Read()) { if (names == null) { names = new string[reader.FieldCount];
for (int i = 0; i < names.Length; ++i) { names[i] = XmlConvert.EncodeLocalName(reader.GetName(i)); } }
writer.WriteStartElement("row");
for(int i = 0; i < names.Length; ++i) { writer.WriteElementString( names[i], Convert.ToString(reader[i])); }
writer.WriteEndElement(); } } } }
writer.WriteEndElement(); writer.WriteEndDocument();
writer.Flush(); }
/// <summary> /// Indicates that a handler is reusable. /// </summary> public bool IsReusable { get { return true; } }
/// <summary> /// A connection string. /// </summary> private const string provider = "Provider=Search.CollatorDSO;" + "Extended Properties='Application=Windows';" + "OLE DB Services=-4"; }
And a SQL CLR function looks like this:
using System; using System.Collections; using System.Collections.Generic; using System.Data; using System.Data.SqlClient; using System.Data.SqlTypes; using Microsoft.SqlServer.Server; using System.Net; using System.IO; using System.Xml;
/// <summary> /// A user defined function. /// </summary> public class UserDefinedFunctions { /// <summary> /// A Windows Search returning result as xml strings. /// </summary> /// <param name="url">A search url.</param> /// <param name="userName">A user name for a web request.</param> /// <param name="password">A password for a web request.</param> /// <param name="query">A Windows Search SQL.</param> /// <returns>A result rows.</returns> [SqlFunction( IsDeterministic = false, Name = "WindowsSearch", FillRowMethodName = "FillWindowsSearch", TableDefinition = "value nvarchar(max)")] public static IEnumerable Search( string url, string userName, string password, string query) { return SearchEnumerator(url, userName, password, query); }
/// <summary> /// A filler of WindowsSearch function. /// </summary> /// <param name="value">A value returned from the enumerator.</param> /// <param name="row">An output value.</param> public static void FillWindowsSearch(object value, out string row) { row = (string)value; }
/// <summary> /// Gets a search row enumerator. /// </summary> /// <param name="url">A search url.</param> /// <param name="userName">A user name for a web request.</param> /// <param name="password">A password for a web request.</param> /// <param name="query">A Windows Search SQL.</param> /// <returns>A result rows.</returns> private static IEnumerable<string> SearchEnumerator( string url, string userName, string password, string query) { if (string.IsNullOrEmpty(url)) { throw new ArgumentException("url"); }
if (string.IsNullOrEmpty(query)) { throw new ArgumentException("query"); }
var requestUrl = url + "?query=" + Uri.EscapeDataString(query);
var request = WebRequest.Create(requestUrl);
request.Credentials = string.IsNullOrEmpty(userName) ? CredentialCache.DefaultCredentials : new NetworkCredential(userName, password);
using(var response = request.GetResponse()) using(var stream = response.GetResponseStream()) using(var reader = XmlReader.Create(stream)) { bool read = true;
while(!read || reader.Read()) { if ((reader.Depth == 1) && reader.IsStartElement()) { // Note that ReadInnerXml() advances the reader similar to Read(). yield return reader.ReadInnerXml();
read = false; } else { read = true; } } } } }
And, finally, when you call this service from SQL Server you write query like this:
with search as
(
  select
    cast(value as xml) value
  from
    dbo.WindowsSearch
    (
      N'http://machine/WindowsSearchService/WindowsSearch.ashx',
      null,
      null,
      N'select "System.ItemUrl" from SystemIndex where scope=''.xml-gz:'' and contains(''...'')'
    )
)
select
  value.value('/System.ItemUrl[1]', 'nvarchar(max)')
from
  search;
The design is not trivial, but it works somehow.
After dealing with all these problems some questions remain unanswered:
- Why doesn't SQL Server allow querying Windows Search directly?
- Why doesn't the Windows Search OLEDB provider support the "Data Source" parameter?
- Why doesn't Windows Search support custom protocols during remote search?
- Why doesn't SQL Server support web requests/web services natively?
Hello everybody! You might think that we had died, since there have been no articles
in our blog for too long a time, but no, we're still alive…
For a month or so we were busy with Windows Search and the stuff around it. Custom
protocol handlers and support for different file formats and data storages are very
interesting tasks, but this article discusses another issue.
The issue is how to compile and install native code written in C++, which was
built under Visual Studio 2008 (SP1), on a clean computer.
The thing is that native dlls now resolve the problem known as
DLL hell using assembly
manifests. This should help to discover and load the right DLL. The problem is
that there are many versions of CRT, MFC, ATL and other dlls, and it's not
trivial to create a correct setup for a clean computer.
In order to avoid annoying dll binding problems at run-time, please define
_BIND_TO_CURRENT_CRT_VERSION and/or (_BIND_TO_CURRENT_ATL_VERSION,
_BIND_TO_CURRENT_MFC_VERSION). Don’t forget to make the same definitions for
all configurations/target platforms you intend to use. Build the project and
check the resulting manifest file (just in case). It should contain something
like this:
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<assembly xmlns='urn:schemas-microsoft-com:asm.v1' manifestVersion='1.0'>
<trustInfo xmlns="urn:schemas-microsoft-com:asm.v3">
<security>
<requestedPrivileges>
<requestedExecutionLevel level='asInvoker' uiAccess='false' />
</requestedPrivileges>
</security>
</trustInfo>
<dependency>
<dependentAssembly>
<assemblyIdentity type='win32' name='Microsoft.VC90.DebugCRT'
version='9.0.30729.4148'
processorArchitecture='x86'
publicKeyToken='1fc8b3b9a1e18e3b' />
</dependentAssembly>
</dependency>
</assembly>
The version of the dependent assembly gives you a clue which native run-time
version(s) your application requires. You have to do the same thing for all your
satellite projects.
The next step is to create a proper setup project using VS wizard.
Right click on the setup project and select “Add->Merge Module…”. Select
“Microsoft_VC90_CRT_x86.msm” and/or (“Microsoft_VC90_DebugCRT_x86.msm”,
“Microsoft_VC90_ATL_x86.msm”, “Microsoft_VC90_MFC_x86.msm”…) to install the
corresponding run-time libraries, and “policy_9_0_Microsoft_VC90_CRT_x86.msm”
etc. to route calls from old run-time library versions to the newest ones.
Now you're ready to build your setup project.
You may also include “Visual C++ Runtime Libraries” in the setup prerequisites.
As a result, you'll get 2 files (setup.exe and Setup.msi) and an optional folder
(vcredist_x86) with the C++ run-time redistributable libraries.
Note: only setup.exe installs those C++ run-time libraries.
More info concerning this theme:
Let's assume you're loading data into a table using BULK INSERT from a tab
separated file. Among others you have some varchar field, which may contain any
character. The content of such a field is escaped with the usual scheme:
'\' as '\\' ;
char(13) as '\r' ;
char(10) as '\n' ;
char(9) as '\t' ;
But now, after loading, you want to unescape the content back. How would you do it?
Notice that:
'\t' should be converted to char(9) ;
'\\t' should be converted to '\t' ;
'\\\t' should be converted to '\' + char(9) ;
It might be that you're smart and will immediately think of the correct
algorithm, but it took us a while to come up with a neat solution:
declare @value varchar(max);
set @value = ...
-- This unescapes the value
set @value =
replace
(
replace
(
replace
(
replace
(
replace(@value, '\\', '\ '),
'\n',
char(10)
),
'\r',
char(13)
),
'\t',
char(9)
),
'\ ',
'\'
);
Do you know a better way?
We were trying to query Windows Search from SQL Server 2008.
The documentation states that Windows Search is exposed as an OLE DB data source. This meant that we should be able to just query results like this:
SELECT * FROM OPENROWSET( 'Search.CollatorDSO.1', 'Application=Windows', 'SELECT "System.ItemName", "System.FileName" FROM SystemIndex');
But no, such a select never works. Instead it returns obscure error messages:
OLE DB provider "Search.CollatorDSO.1" for linked server "(null)" returned message "Command was not prepared.". Msg 7399, Level 16, State 1, Line 1 The OLE DB provider "Search.CollatorDSO.1" for linked server "(null)" reported an error. Command was not prepared. Msg 7350, Level 16, State 2, Line 1 Cannot get the column information from OLE DB provider "Search.CollatorDSO.1" for linked server "(null)".
Microsoft is silent about the reasons for such behaviour. People have come to the conclusion that the problem is in SQL Server, as one can query search results through an OleDbConnection without problems.
This is very unfortunate, as it bans many use cases.
As a workaround we have defined a CLR function wrapping Windows Search call and returning rows as xml fragments. So now the query looks like this:
select
  value.value('System.ItemName[1]', 'nvarchar(max)') ItemName,
  value.value('System.FileName[1]', 'nvarchar(max)') FileName
from
  dbo.WindowsSearch('SELECT "System.ItemName", "System.FileName" FROM SystemIndex')
Notice how we decompose the xml fragment back into fields with the value() function.
The C# function looks like this:
using System; using System.Collections; using System.IO; using System.Xml; using System.Data; using System.Data.SqlClient; using System.Data.SqlTypes; using System.Data.OleDb;
using Microsoft.SqlServer.Server;
public class UserDefinedFunctions { [SqlFunction( FillRowMethodName = "FillSearch", TableDefinition="value xml")] public static IEnumerator WindowsSearch(SqlString query) { const string provider = "Provider=Search.CollatorDSO;" + "Extended Properties='Application=Windows';" + "OLE DB Services=-4";
var settings = new XmlWriterSettings { Indent = false, CloseOutput = false, ConformanceLevel = ConformanceLevel.Fragment, OmitXmlDeclaration = true };
string[] names = null;
using(var connection = new OleDbConnection(provider)) using(var command = new OleDbCommand(query.Value, connection)) { connection.Open();
using(var reader = command.ExecuteReader()) { while(reader.Read()) { if (names == null) { names = new string[reader.FieldCount];
for (int i = 0; i < names.Length; ++i) { names[i] = XmlConvert.EncodeLocalName(reader.GetName(i)); } }
var stream = new MemoryStream(); var writer = XmlWriter.Create(stream, settings);
for(int i = 0; i < names.Length; ++i) { writer.WriteElementString(names[i], Convert.ToString(reader[i])); }
writer.Close();
yield return new SqlXml(stream); } } } }
public static void FillSearch(object value, out SqlXml row) { row = (SqlXml)value; } }
Notes:
- Notice the use of "
OLE DB Services=-4 " in the provider string to avoid transaction enlistment (required in SQL Server 2008).
- The permission level of the project that defines this extension function should be set to unsafe (see Project Properties/Database in Visual Studio), otherwise it does not allow the use of OLE DB.
- SQL Server should be configured to allow CLR functions, see Server/Facets/Surface Area Configuration/ClrIntegrationEnabled in Microsoft SQL Server Management Studio.
- The assembly should either be signed or the database should be marked as trustworthy, see Database/Facets/Trustworthy in Microsoft SQL Server Management Studio.
A search "java web service session object" has reached our site.
Unfortunately, we cannot help to the original searcher but a next one might find
this info usefull.
To get http session in the web service you should add a field to your class
that will be populated with request context.
@WebService
public class MyService
{
@WebMethod
public int method(String value)
{
MessageContext messageContext = context.getMessageContext();
HttpServletRequest request =
(HttpServletRequest)messageContext.get(MessageContext.SERVLET_REQUEST);
HttpSession session = request.getSession();
// go ahead.
}
// A web service context.
@Resource
private WebServiceContext context;
}
In the last few days we were testing Java web applications that expose web services. During these tests we've found a few interesting features.
The first feature allows retrieving info about all endpoints supported by the web application with a GET request. The feature works at least for Metro, which implements the JAX-WS API v2.x. In order to get such info, a client sends any endpoint's URL to the server. The result is an HTML page with a table; each row of that table contains an endpoint's data for each supported web service method. This feature may be used as a web services discovery mechanism.
The second feature is bad rather than good. The JAX-WS API supposes that a developer annotates the classes and methods that he/she wants to expose as web services. Then an implementation generates an additional bridge layer between the developer's code and the API that does all the routine work behind the scenes. Maybe that was a good idea, but Metro's implementation is imperfect. Metro dynamically generates such classes at run-time when a web application starts. Moreover, Metro does such generation for all classes at once. So, in our case, when the generated web application contains dozens or even hundreds of web services, the application's startup takes a lot of time.
Probably the Metro developers didn't want to deal with implementing a lazy algorithm, where a web service is generated and cached on demand. We hope this issue will be solved in the next releases.
A method pattern we have suggested to use along with @Yield annotation brought
funny questions like: "why should I mark my method with @Yield annotation at
all?"
Well, in many cases you may live with an ArrayList populated with data, over which you then
iterate. But in some cases this approach is not practical, either due
to the amount of data or due to the time required to get the first item.
In the latter
case you usually want to build an iterator that calculates items on demand. The @Yield annotation is designed as a marker of such methods. They are refactored
into state machines at compilation time, where each addition to a result list is
transformed into a new item yielded by the iterator.
So, if you have decided to use the @Yield annotation, then at some point you will ask yourself what
happens to resources acquired during iteration. Will resources be released if the
iteration is interrupted in the middle due to an exception or a break statement?
To address the problem, the yield iterator implements the Closeable interface.
This way, when you call close() before the iteration has reached the end, the state machine
works as if a break statement were injected into the method body right after the yield
point. Thus all finally blocks of the original method are executed and resources
are released.
Consider an example of data iterator:
@Yield
public Iterable<Data> getData(final Connection connection)
throws Exception
{
ArrayList<Data> result = new ArrayList<Data>();
PreparedStatement statement =
connection.prepareStatement("select key, value from table");
try
{
ResultSet resultSet = statement.executeQuery();
try
{
while(resultSet.next())
{
Data data = new Data();
data.key = resultSet.getInt(1);
data.value = resultSet.getString(2);
result.add(data); // yield point
}
}
finally
{
resultSet.close();
}
}
finally
{
statement.close();
}
return result;
}
private static void close(Object value)
throws IOException
{
if (value instanceof Closeable)
{
Closeable closeable = (Closeable)value;
closeable.close();
}
}
public void daoAction(Connection connection)
throws Exception
{
Iterable<Data> items = getData(connection);
try
{
for(Data data: items)
{
// do something that potentially throws exception.
}
}
finally
{
close(items);
}
}
getData() iterates over sql data. During the lifecycle it creates and releases
PreparedStatement and ResultSet .
daoAction() iterates over results provided by getData() and performs some
actions that potentially throw an exception. The goal of close() is to release
opened sql resources in case of such an exception.
Here you can inspect how state machine is implemented for such a method:
@Yield()
public static Iterable<Data> getData(final Connection connection)
throws Exception
{
assert (java.util.ArrayList<Data>)(ArrayList<Data>)null == null;
class $state implements java.lang.Iterable<Data>, java.util.Iterator<Data>, java.io.Closeable
{
public java.util.Iterator<Data> iterator() {
if ($state$id == 0) {
$state$id = 1;
return this;
} else return new $state();
}
public boolean hasNext() {
if (!$state$nextDefined) {
$state$hasNext = $state$next();
$state$nextDefined = true;
}
return $state$hasNext;
}
public Data next() {
if (!hasNext()) throw new java.util.NoSuchElementException();
$state$nextDefined = false;
return $state$next;
}
public void remove() {
throw new java.lang.UnsupportedOperationException();
}
public void close() {
do switch ($state$id) {
case 3:
$state$id2 = 8;
$state$id = 5;
continue;
default:
$state$id = 8;
continue;
} while ($state$next());
}
private boolean $state$next() {
java.lang.Throwable $state$exception;
while (true) {
try {
switch ($state$id) {
case 0:
$state$id = 1;
case 1:
statement = connection.prepareStatement("select key, value from table");
$state$exception1 = null;
$state$id1 = 8;
$state$id = 2;
case 2:
resultSet = statement.executeQuery();
$state$exception2 = null;
$state$id2 = 6;
$state$id = 3;
case 3:
if (!resultSet.next()) {
$state$id = 4;
continue;
}
data = new Data();
data.key = resultSet.getInt(1);
data.value = resultSet.getString(2);
$state$next = data;
$state$id = 3;
return true;
case 4:
$state$id = 5;
case 5:
{
resultSet.close();
}
if ($state$exception2 != null) {
$state$exception = $state$exception2;
break;
}
if ($state$id2 > 7) {
$state$id1 = $state$id2;
$state$id = 7;
} else $state$id = $state$id2;
continue;
case 6:
$state$id = 7;
case 7:
{
statement.close();
}
if ($state$exception1 != null) {
$state$exception = $state$exception1;
break;
}
$state$id = $state$id1;
continue;
case 8:
default:
return false;
}
} catch (java.lang.Throwable e) {
$state$exception = e;
}
switch ($state$id) {
case 3:
case 4:
$state$exception2 = $state$exception;
$state$id = 5;
continue;
case 2:
case 5:
case 6:
$state$exception1 = $state$exception;
$state$id = 7;
continue;
default:
$state$id = 8;
java.util.ConcurrentModificationException ce = new java.util.ConcurrentModificationException();
ce.initCause($state$exception);
throw ce;
}
}
}
private PreparedStatement statement;
private ResultSet resultSet;
private Data data;
private int $state$id;
private boolean $state$hasNext;
private boolean $state$nextDefined;
private Data $state$next;
private java.lang.Throwable $state$exception1;
private int $state$id1;
private java.lang.Throwable $state$exception2;
private int $state$id2;
}
return new $state();
}
Now you can estimate what it is worth to write an algorithm as a sound state machine
compared to the conventional implementation.
Yield annotation processor can be downloaded from
Yield.zip
or Yield.jar
See also
Yield return feature in java.
We're happy to announce that we have implemented @Yield annotation
both in javac and in eclipse compilers.
This way you get built-in IDE support for the feature!
To download yield annotation processor please use the following link:
Yield.zip
It contains both yield annotation processor, and a test project.
If you do not want to compile the sources, you can download
Yield.jar
We would like to reiterate on how @Yield annotation works:
- A developer defines a method that returns either
Iterator<T> or
Iterable<T> instance and marks it with @Yield
annotation.
- A developer implements iteration logic following the pattern:
- declare a variable to accumulate results:
ArrayList<T> items = new ArrayList<T>();
- use the following statement to add item to result:
items.add(...);
- use
return items;
or
return items.iterator();
to return result;
- mark method's params, if any, as final.
- A developer ensures that the yield annotation processor is available during
compilation (see details below).
YieldProcessor rewrites the method into a state machine at
compilation time.
The following is an example of such a method:
@Yield
public static Iterable<Integer> generate(final int from, final int to)
{
ArrayList<Integer> items = new ArrayList<Integer>();
for(int i = from; i < to; ++i)
{
items.add(i);
}
return items;
}
The use is like this:
for(int value: generate(7, 20))
{
System.out.println("generator: " + value);
}
Notice that the method's implementation will still be correct in the absence of
YieldProcessor.
Another important feature is that the state machine produced by the yield
processor is closeable.
This means that if you break the iteration before the end is reached, you
can release resources acquired during the iteration.
Consider the example where break exits iteration:
@Yield
public static Iterable<String> resourceIteration()
{
ArrayList<String> items = new ArrayList<String>();
acquire();
try
{
for(int i = 0; i < 100; ++i)
{
items.add(String.valueOf(i));
}
}
finally
{
release();
}
return items;
}
and the use
int i = 0;
Iterable<String> iterator = resourceIteration();
try
{
for(String item: iterator)
{
System.out.println("item " + i + ":" + item);
if (i++ > 30)
{
break;
}
}
}
finally
{
close(iterator);
}
...
private static <T> void close(T value)
throws IOException
{
if (value instanceof Closeable)
{
Closeable closeable = (Closeable)value;
closeable.close();
}
}
Close will execute all required finally blocks. This way resources will be
released.
To configure the yield processor a developer needs to reference Yield.jar in the build path,
as it contains the @Yield annotation. For javac that's enough, as the
compiler will find the annotation processor automatically.
Eclipse users need to open project properties and:
- go to the "Java Compiler"/"Annotation Processing"
- mark "Enable project specific settings"
- select "Java Compiler"/"Annotation Processing"/"Factory Path"
- mark "Enable project specific settings"
- add Yield.jar to the list of "plug-ins and JARs that contain annotation
processors".
Finally, we want to point out that the @Yield annotation is syntactic
sugar, but it's important in the same way the foreach statement is important, as it
helps to write concise and error-free code.
See also
Yield feature in java implemented!
Yield feature in java
We could not resist the temptation to implement the @Yield annotation that
we described
earlier.
The idea is rather clear, but people were saying that it's not an easy task to update
the sources.
They were right!
The implementation has its price, as we were forced to access the JDK's classes of the javac
compiler. As a result, at present, we don't support other compilers such as the
EclipseCompiler.
We shall look later at what can be done in this area.
At present, the annotation processor works perfectly when you run javac either from
the command line, from ant, or from another build tool.
Here is an example of how method is refactored:
@Yield
public static Iterable<Long> fibonachi()
{
ArrayList<Long> items = new ArrayList<Long>();
long Ti = 0;
long Ti1 = 1;
while(true)
{
items.add(Ti);
long value = Ti + Ti1;
Ti = Ti1;
Ti1 = value;
}
}
And that's how we transform it:
@Yield()
public static Iterable<Long> fibonachi() {
assert (java.util.ArrayList<Long>)(ArrayList<Long>)null == null : null;
class $state$ implements java.lang.Iterable<Long>, java.util.Iterator<Long>, java.io.Closeable {
public java.util.Iterator<Long> iterator() {
if ($state$id == 0) {
$state$id = 1;
return this;
} else return new $state$();
}
public boolean hasNext() {
if (!$state$nextDefined) {
$state$hasNext = $state$next();
$state$nextDefined = true;
}
return $state$hasNext;
}
public Long next() {
if (!hasNext()) throw new java.util.NoSuchElementException();
$state$nextDefined = false;
return $state$next;
}
public void remove() {
throw new java.lang.UnsupportedOperationException();
}
public void close() {
$state$id = 5;
}
private boolean $state$next() {
while (true) switch ($state$id) {
case 0:
$state$id = 1;
case 1:
Ti = 0;
Ti1 = 1;
case 2:
if (!true) {
$state$id = 4;
break;
}
$state$next = Ti;
$state$id = 3;
return true;
case 3:
value = Ti + Ti1;
Ti = Ti1;
Ti1 = value;
$state$id = 2;
break;
case 4:
case 5:
default:
$state$id = 5;
return false;
}
}
private long Ti;
private long Ti1;
private long value;
private int $state$id;
private boolean $state$hasNext;
private boolean $state$nextDefined;
private Long $state$next;
}
return new $state$();
}
The formatting is automatic, sorry, but anyway it's for diagnostics only. You
will never see this code.
It's interesting to note that this implementation very precisely mimics the
xslt state machine implementation we did back in 2008.
You can
download YieldProcessor here. We hope that someone will find our solution
very interesting.
Several times already we have wished to see the
yield feature in java, and every time we came to the same implementation:
infomancers-collections.
And every time we turned away with dissatisfaction and continued with regular
iterators.
Why? Well, despite the fact that it's the best implementation of the feature we have
seen, it's still too heavy, as it plays with java byte code at run-time.
We never grasped why it's done this way, while there is
post-compile
time annotation processing in java.
If we were to implement the yield feature in java, we would create a @Yield
annotation and would demand implementing a well defined code pattern like
this:
@Yield
Iterable<String> iterator()
{
// This is part of pattern.
ArrayList<String> list = new ArrayList<String>();
for(int i = 0; i < 10; ++i)
{
// list.add() plays the role of yield return.
list.add(String.valueOf(i));
}
// This is part of pattern.
return list;
}
or
@Yield
Iterator<String> iterator()
{
// This is part of pattern.
ArrayList<String> list = new ArrayList<String>();
for(int i = 0; i < 10; ++i)
{
// list.add() plays the role of yield return.
list.add(String.valueOf(i));
}
// This is part of pattern.
return list.iterator();
}
Note that the code will work correctly even if, by mischance, the post-compile-time
processing does not take place.
At post-compile time we would do all the required refactoring to turn these
implementations into state machines, so the runtime would not contain any third
party components.
It's interesting to recall that we have also implemented a similar refactoring in
pure xslt.
See What you can do with jxom.
Update: implementation can be found at Yield.zip
Michael Kay, the author of the Saxon xslt processor, being inspired by the GWT
ideas, has decided to compile Saxon HE into javascript. See
Compiling Saxon using GWT.
The resulting script is about 1MB of size.
But what we have been thinking lately is that it's overkill to bring the whole xslt engine to the
client, while it's possible to generate javascript from xslt the same way as he builds java from xquery. This will probably require some runtime,
but of much smaller size.
Search at www.google.fr:
An empty sequence is not allowed as the @select attribute of xsl:analyze-string
That's a known issue. See Bug 7976.
In xslt 2.0 you should either check the value before using xsl:analyze-string, or wrap it into a string() call.
The problem is addressed in xslt 3.0.
Recently we've seen a code like this:
<xsl:variable name="a" as="element()?" select="..."/>
<xsl:variable name="b" as="element()?" select="..."/>
<xsl:apply-templates select="$a">
<xsl:with-param name="b" tunnel="yes" as="element()" select="$b"/>
</xsl:apply-templates>
It fails with an error:
"An empty sequence is not allowed as the value of parameter $b".
What is interesting is that the value of $a is an empty sequence,
so the code could potentially work, provided processor evaluated $a first,
and decided not to evaluate xsl:with-param.
Is the order of evaluation of @select and xsl:with-param specified
by the standard, or is it implementation defined?
We asked this question on
xslt forum, and got the following answer:
The specification leaves this implementation-defined. Since the values
of the parameters are the same for every node processed, it's a
reasonably strategy for the processor to evaluate the parameters before
knowing how many selected nodes there are, though I guess an even better
strategy would be to do it lazily when the first selected node is found.
Well, that's an expected answer. This question will probably induce Michael Kay
to introduce a small optimization into the Saxon.
Suppose you have a timestamp string, and want to check whether it fits to one of the
following formats with leading and trailing spaces:
- YYYY-MM-DD-HH.MM.SS.NNNNNN
- YYYY-MM-DD-HH.MM.SS
- YYYY-MM-DD
We decided to use a regex and its capture groups to extract the timestamp parts. This
left us with the only solution: the xsl:analyze-string instruction. It took
a couple more minutes to reach a final solution:
<xsl:variable name="parts" as="xs:string*">
<xsl:analyze-string select="$value"
regex="
^\s*(\d\d\d\d)-(\d\d)-(\d\d)
(-(\d\d)\.(\d\d)\.(\d\d)(\.(\d\d\d\d\d\d))?)?\s*$"
flags="x">
<xsl:matching-substring>
<xsl:sequence select="regex-group(1)"/>
<xsl:sequence select="regex-group(2)"/>
<xsl:sequence select="regex-group(3)"/>
<xsl:sequence select="regex-group(5)"/>
<xsl:sequence select="regex-group(6)"/>
<xsl:sequence select="regex-group(7)"/>
<xsl:sequence select="regex-group(9)"/>
</xsl:matching-substring>
</xsl:analyze-string>
</xsl:variable>
<xsl:choose>
<xsl:when test="exists($parts)">
...
</xsl:when>
<xsl:otherwise>
...
</xsl:otherwise>
</xsl:choose>
How would you solve the problem? Is it the best solution?
One of our latest tasks was a conversion of data received from mainframe as an EBCDIC flat file into an XML file in UTF-8 encoding for further processing.
The solution was rather straightforward:
- read the source flat file, record-by-record;
- serialize each record as an element into target XML file using JAXB.
For reading data from the EBCDIC encoded flat file, a good old tool named eXperanto was used. It allows defining C# and/or Java classes that correspond to the records in the source flat file. Thus we were able to read and convert records from EBCDIC to UTF-8.
The next sub-task was to serialize a Java bean to an XML element. JAXB marshaller was used for this.
Everything was ok until we started to test the implementation on real data.
We've realized that some decimal values (BigDecimal fields in Java classes) were serialized in scientific exponential notation. For example: 0.000000365 was serialized as 3.65E-7 and so on.
On the other hand, the target XML was used by another (non-Java) application, which expected to receive decimal data as defined in the XSD schema (the field types were specified as xs:decimal).
According to the W3C datatypes specification:
"...decimal has a lexical representation consisting of a finite-length sequence of decimal digits (#x30-#x39) separated by a period as a decimal indicator. An optional leading sign is allowed. If the sign is omitted, "+" is assumed. Leading and trailing zeroes are optional. If the fractional part is zero, the period and following zero(es) can be omitted. For example: -1.23, 12678967.543233, 100000.00, 210..."
So, the result was predictable: the consumer application failed.
A Google search reveals that we deal with a well-known bug: "JAXB marshaller returns BigDecimal with scientific notation in JDK 6". It has remained open for a year and a half, since May 2009, marked as "Fix in progress". We've tested our application with Java version 1.6.0_21-b07, JAXB 2.1.
Although this is a rather critical bug that may affect interoperability of Java applications (e.g. Java web services etc.), its priority was set just to "4-Low".
P.S. As a temporary workaround, for this case only(!), we've replaced xs:decimal with xs:double in the XSD schema for the target application.
Accidentally we have found that the implementations of String and StringBuilder
have been considerably revised, while the public interface has remained the
same.
public sealed class String
{
private int m_arrayLength;
private int m_stringLength;
private char m_firstChar;
}
This layout dates back to .NET 1.0.
The VM, in fact, allocates more memory than that defined by the C# class, as
&m_firstChar refers to an inline char buffer.
This way the string's buffer length and the string's length were two different
values; StringBuilder used this fact and stored its content in a private string
which it modified in place.
In .NET 4, string is different:
public sealed class String
{
private int m_stringLength;
private char m_firstChar;
}
The memory footprint of such a structure is smaller, but the string's length must
always be the same as its buffer's. In fact, the layout of string is now the same as
the layout of char[].
This modification led to an implementation redesign of the StringBuilder.
Earlier, StringBuilder looked like the following:
public sealed class StringBuilder
{
internal IntPtr m_currentThread;
internal int m_MaxCapacity;
internal volatile string m_StringValue;
}
Notice that m_StringValue is used as the storage, and
m_currentThread is used to preserve thread affinity of the internal
string value.
Now, guys at Microsoft have decided to implement StringBuilder very differently:
public sealed class StringBuilder
{
internal int m_MaxCapacity;
internal int m_ChunkLength;
internal int m_ChunkOffset;
internal char[] m_ChunkChars;
internal StringBuilder m_ChunkPrevious;
}
Inspection of this layout immediately reveals the implementation technique. It's a
list of chunks. The instance itself references the last chunk (the most recently
appended), and the previous chunks.
The characteristics of this design are:
- while Length is small, performance is almost the same as it was earlier;
- there are no more thread affinity checks;
- Append() and ToString() work as fast as in the old version;
- Insert() in the middle works faster, as only a chunk should be split and
probably reallocated (copied), instead of the whole string;
- random access is fast at the end, O(1), and slows down to O(chunk-count) as you
approach the start.
Personally, we would select a slightly different design:
public sealed class StringBuilder
{
private struct Chunk
{
public int length; // Chunk length.
public int offset; // Chunk offset.
public char[] buffer;
}
private int m_MaxCapacity;
// Alternatively, one can use
// private List<Chunk> chunks;
private int chunkCount; // Number of used chunks.
private Chunk[] chunks; // Array of chunks except last.
private Chunk last; // Last chunk.
private bool nonHomogenous; // false if all chunks are of the same size.
}
This design has a better memory footprint, and random access time is O(1) when there were no
inserts in the middle (nonHomogenous=false), and
O(log(chunkCount)) after such inserts. All other characteristics are the
same.
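To illustrate why random access stays O(1) while all chunks have the same size, here is a minimal sketch (our own illustration with assumed names and a fixed chunk size, not the BCL code) of an indexer over homogeneous chunks:
using System.Collections.Generic;
public sealed class ChunkedBuffer
{
  // Sketch only: every chunk has the same capacity, so chunk and offset
  // are computed directly from the index.
  private const int ChunkSize = 8000;
  private readonly List<char[]> chunks = new List<char[]>();
  private int length;
  public int Length { get { return length; } }
  public char this[int index]
  {
    get { return chunks[index / ChunkSize][index % ChunkSize]; }
  }
  public void Append(char value)
  {
    if (length % ChunkSize == 0)
    {
      chunks.Add(new char[ChunkSize]);
    }
    chunks[length / ChunkSize][length % ChunkSize] = value;
    ++length;
  }
}
Once a chunk in the middle is split by an insert, chunk sizes differ and the direct division no longer applies; that's where the nonHomogenous flag and the O(log(chunkCount)) search would kick in.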
We have run into another xslt bug, which depends on several independent
circumstances and often behaves differently when observed. That's clearly a
Heisenbug.
Xslt designers failed to realize that the syntactic sugar they introduce into
xpath can turn into obscure bugs. Well, it's easy to be wise afterwards...
To the point.
Consider you have a sequence consisting of text nodes and
elements, and now you want to "normalize" this sequence, wrapping
adjacent text nodes into
separate elements. The following stylesheet is supposed to do the work:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:t="http://www.nesterovsky-bros.com/xslt/this"
exclude-result-prefixes="xs t">
<xsl:template match="/">
<xsl:variable
name="nodes" as="node()*">
<xsl:text>Hello, </xsl:text>
<string value="World"/>
<xsl:text>! </xsl:text>
<xsl:text>Well, </xsl:text>
<string value="hello"/>
<xsl:text>, if not joking!</xsl:text>
</xsl:variable>
<result>
<xsl:sequence
select="t:normalize($nodes)"/>
</result>
</xsl:template>
<xsl:function
name="t:normalize" as="node()*">
<xsl:param name="nodes" as="node()*"/>
<xsl:for-each-group select="$nodes" group-starting-with="*">
<xsl:variable
name="string" as="element()?" select="self::string"/>
<xsl:variable name="texts"
as="node()*"
select="current-group() except $string"/>
<xsl:sequence
select="$string"/>
<xsl:if test="exists($texts)">
<string
value="{string-join($texts, '')}"/>
</xsl:if>
</xsl:for-each-group>
</xsl:function>
</xsl:stylesheet>
We're expecting the following output:
<result>
<string value="Hello, "/>
<string value="World"/>
<string value="! Well, "/>
<string value="hello"/>
<string value=", if not joking!"/>
</result>
But often we're getting other results, like:
<result>
<string value="Hello, "/>
<string value="World"/>
<string value="Well, ! "/>
<string value="hello"/>
<string value=", if not joking!"/>
</result>
Such output may seriously confuse, unless you recall the rule for the
xpath except operator:
The except operator takes two node sequences as operands and returns a sequence containing all the nodes that occur in the first operand but not in the second operand.
... these operators eliminate duplicate nodes from their result sequences based
on node identity. The resulting sequence is returned in document order..
...
The relative order of nodes in distinct trees is stable but implementation-dependent
These words mean that the result sequence may be very different from the original
sequence.
In contrast, if we change the $texts definition to:
<xsl:variable name="texts"
as="node()*"
select="current-group()[not(. is $string)]"/>
then the result becomes stable, but less clear.
See also
Xslt Heisenbug
It does not matter that DataBindExtender does not look usual in ASP.NET. It turns out to be so handy that the built-in data binding is not considered to be an option.
After a short try, you understand that people tried very hard and have invented many controls and methods like ObjectDataSource, FormView, Eval(), and Bind(), with an outcome which is very specific and limited.
In contrast DataBindExtender performs:
- Two or one way data binding of any business data property to any control property;
- Converts the value before it's passed to the control, or back into the business data;
- Validates the value.
See an example:
<asp:TextBox id=Field8 EnableViewState="false" runat="server"></asp:TextBox> <bphx:DataBindExtender runat='server' EnableViewState='false' TargetControlID='Field8' ControlProperty='Text' DataSource='<%# Import.ClearingMemberFirm %>' DataMember='Id' Converter='<%# Converters.AsString("XXXXX", false) %>' Validator='<%# (extender, value) => Functions.CheckID(value as string) %>'/>
Here, besides a regular two way data binding of the property Import.ClearingMemberFirm.Id to the property Field8.Text, we format (parse) with Converters.AsString("XXXXX", false), and finally validate the input value with a lambda function (extender, value) => Functions.CheckID(value as string).
DataBindExtender also works well in template controls like asp:Repeater, asp:GridView, and so on. Having your business data available, you may reduce the size of the ViewState with EnableViewState='false'. This way DataBindExtender brings page development closer to a pattern called MVC.
Recently, we have found that it's also useful to have a way to run a javascript during the page load (e.g. you want to attach some client side event, or register a component). DataBindExtender provides this with OnClientInit property, which is a javascript to run on a client, where this refers to a DOM element:
... OnClientInit='$addHandler(this, "change", function() { handleEvent(event, "Field8"); } );'/>
attaches an onchange javascript event handler to the asp:TextBox.
So, meantime we're very satisfied with what we can achieve with DataBindExtender. It's more than JSF allows, and much stronger and neater than what ASP.NET has provided.
The sources can be found at DataBindExtender.cs
Lately, we have found that we've become accustomed to declaring C#'s local variables using var:
var exitStateName = exitState == null ? "" : exitState.Name;
var rules = Environment.NavigationRules;
var rule = rules[caller.Name];
var flow = rule.NavigationCases[procedure.OriginExitState];
This makes code cleaner, and in the presence of a good IDE still allows to figure out
types very easily.
We, however, found that var tends to have exceptions in its
use. E.g. for some reason most boolean locals in our code tend to remain explicit
(a matter of taste?):
bool succeed = false;
try
{
...
succeed = true;
}
finally
{
if (!succeed)
{
...
}
}
Also, the type often survives in for, but not in foreach:
for(int i = 0; i < sourceDataMapping.Length;
++i)
{
...
}
foreach(var property in properties)
{
...
}
In addition, var has some limitations, as one cannot easily
initialize such a local with null. Of the following, we prefer the first approach:
IWindowContext context = null;
var context = (IWindowContext)null;
var context = null as IWindowContext;
var context = default(IWindowContext);
We might need to figure out a consistent code style for var. It
might be like this:
- Numeric, boolean and string locals should use an explicit type;
- Try to avoid locals initialized with null or without an initializer, or use an explicit type
if such a variable cannot be avoided;
- Use var in all other cases.
Another code style could be like this:
- For consistency, completely avoid the use of the keyword var.
Recently we were raising a question about serialization of ASPX output in xslt.
The question went like this:
What's the recommended way of ASPX page generation? E.g.:
------------------------ <%@ Page AutoEventWireup="true" CodeBehind="CurMainMenuP.aspx.cs" EnableSessionState="True" Inherits="Currency.CurMainMenuP" Language="C#" MaintainScrollPositionOnPostback="True" MasterPageFile="Screen.Master" %>
<asp:Content ID="Content1" runat="server" ContentPlaceHolderID="Title">CUR_MAIN_MENU_P</asp:Content>
<asp:Content ID="Content2" runat="server" ContentPlaceHolderID="Content"> <span id="id1222146581" runat="server" class="inputField system UpperCase" enableviewstate="false"> <%# Dialog.Global.TranCode %> </span> ... ------------------------
Notice the aspx page directives, data binding expressions, and prefixed tag names without namespace declarations.
There was a whole range of expected answers. We, however, looked at whether somebody had already dealt with the task and has a ready solution at hand.
In general it seems that the xslt community is very angry about ASPX: both the format and the technology. Well, put this aside.
The task of producing ASPX, which is almost xml, is not solvable when you're staying with a pure xml serializer. Xslt's xsl:character-map does not work at all. In fact it looks like a childish attempt to address the problem, as it does not support character escapes but only grabs characters and substitutes them with strings.
We have decided to create an ASPX serializer API producing the required output text. This way you use <xsl:output method="text"/> to generate ASPX pages.
With this goal in mind we have defined a little xml schema to describe ASPX irregularities in xml form. These are:
<xs:element name="declared-prefix"> - to describe known prefixes, which should not be declared;
<xs:element name="directive"> - to describe directives like <%@ Page %>;
<xs:element name="content"> - a transparent content wrapper;
<xs:element name="entity"> - to issue xml entity;
<xs:element name="expression"> - to describe aspx expression like <%# Eval("A") %>;
<xs:element name="attribute"> - to describe an attribute of the parent element.
This approach greatly simplified the ASPX generation process for us.
The API includes:
In previous posts we were crying about problems with JSF to ASP.NET migration. Let's point to another one.
Consider that you have an input field, whose value should be validated:
<input type="text" runat="server" ID="id1222146409" maxlength="4"/> <bphx:DataBindExtender runat="server" TargetControlID="id1222146409" ControlProperty="Value" DataSource="<%# Import.AaControlAttributes %>" DataMember="UserEnteredTrancode"/>
Here we have an input control whose value is bound to the Import.AaControlAttributes.UserEnteredTrancode property. But what is missing is value validation. Somewhere we have a function that can answer the question whether the value is valid. It should be called like this: Functions.IsTransactionCodeValid(value).
Staying within standard components we can use a custom validator on the page:
<asp:CustomValidator runat="server" ControlToValidate="id1222146409" OnServerValidate="ValidateTransaction" ErrorMessage="Invalid transaction code."/>
and add the following code-behind:
protected void ValidateTransaction(object source, ServerValidateEventArgs args) { args.IsValid = Functions.IsTransactionCodeValid(args.Value); }
This approach works, however it pollutes the code-behind with many very similar methods. The problem is that in most cases the validation rules are not a property of the page but of the data model. That's why page validation methods just forward the check to somewhere else.
While thinking on how to simplify the code we have come up with a more concise way to express validators, namely using lambda functions. To that end we have introduced a Validator property of type ValueValidator on DataBindExtender, where
/// <summary>A delegate to validate values.</summary> /// <param name="extender">An extender instance.</param> /// <param name="value">A value to validate.</param> /// <returns>true for valid value, and false otherwise.</returns> public delegate bool ValueValidator(DataBindExtender extender, object value);
/// <summary>An optional data member validator.</summary> public virtual ValueValidator Validator { get; set; }
With this new property the page markup looks like this:
<input type="text" runat="server" ID="id1222146409" maxlength="4"/> <bphx:DataBindExtender runat="server" TargetControlID="id1222146409" ControlProperty="Value" DataSource="<%# Import.AaControlAttributes %>" DataMember="UserEnteredTrancode" Validator='<%# (extender, value) => Functions.IsTransactionCodeValid(value as string) %>' ErrorMessage="Invalid transaction code."/>
This is almost like an event handler, however it allows us to call data model validation logic without unnecessary code-behind.
The updated DataBindExtender can be found at DataBindExtender.cs.
Being well behind the latest news and traps of ASP.NET, we readily fall into every problem.
This time it's a script injection during data binding.
In JSF there is a component to output data called h:outputText. Its use is like this:
<span jsfc="h:outputText" value="#{myBean.myProperty}"/>
The output is a span element with the data bound value embedded into its content. The natural alternative in ASP.NET seems to be the asp:Label control:
<asp:Label runat="server" Text="<%# Eval("MyProperty") %>"/>
This almost works, except that h:outputText escapes the data (you may override this and specify the attribute escape="false"), while asp:Label never escapes the data.
This looks like a very serious omission in ASP.NET (in fact very close to a security hole). What are the chances that when you're creating a new page that uses data binding, you will not forget to fix the code the wizard created for you and change it to:
<asp:Label runat="server" Text="<%# Server.HtmlEncode(Eval("MyProperty")) %>"/>
Eh? Think what will happen if MyProperty returns a text that looks like a script (e.g.: <script>alert(1)</script>), while you just wanted to output a label.
To address the issue we've also introduced a property Escape into DataBindExtender. So at present we have code like this:
<asp:Label runat="server" ID="MyLabel"/> <bphx:DataBindExtender runat="server" TargetControlID="MyLabel" ControlProperty="Text" ReadOnly="true" Escape="true" DataSource="<%# MyBean %>" DataMember="MyProperty"/>
See also: A DataBindExtender, Experience of JSF to ASP.NET migration
After struggling with ASP.NET data binding we found no other way but to introduce our own little extender control to address the issue.
We were trying to be minimalistic: to introduce two way data binding and to support data conversion. This way the extender control (called DataBindExtender) has the following page syntax:
<asp:TextBox id=TextBox1 runat="server"></asp:TextBox> <cc1:DataBindExtender runat="server" DataSource="<%# Data %>" DataMember="ID" TargetControlID="TextBox1" ControlProperty="Text" />
Two way data binding is provided with the DataSource object (notice data binding over this property) and a DataMember property from one side, and TargetControlID and ControlProperty from the other side. DataBindExtender supports a Converter property of type TypeConverter for custom converters.
DataBindExtender is based on the AjaxControlToolkit.ExtenderControlBase class and implements System.Web.UI.IValidator. ExtenderControlBase makes implementation of extenders extremely easy, while IValidator plugs naturally into page validation (the Validate method, Validators collections, ValidationSummary control).
The good point about extenders is that they are not visible in the designer, while they expose properties in the extended control itself. The disadvantage is that they require the Ajax Control Toolkit, and also a ScriptManager component on the page.
To simplify the use, DataBindExtender gets the data from the control and puts the value into the data source in the Validate method, and puts the data into the control in the OnPreRender method; thus no specific action is required to perform data binding.
Source for the DataBindExtender is DataBindExtender.cs.
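To make the mechanism above more concrete, here is a simplified sketch of the two phases (our illustration only, with hypothetical member names and reflection-based property access; the real DataBindExtender.cs also performs conversion, validation and read-only handling):
public class DataBindingSketch
{
  public object DataSource { get; set; }       // business object
  public string DataMember { get; set; }       // e.g. "ID"
  public object TargetControl { get; set; }    // e.g. a TextBox
  public string ControlProperty { get; set; }  // e.g. "Text"
  // Page validation phase: control -> data source.
  public void Validate()
  {
    SetProperty(DataSource, DataMember, GetProperty(TargetControl, ControlProperty));
  }
  // Pre-render phase: data source -> control.
  public void OnPreRender()
  {
    SetProperty(TargetControl, ControlProperty, GetProperty(DataSource, DataMember));
  }
  private static object GetProperty(object instance, string name)
  {
    return instance.GetType().GetProperty(name).GetValue(instance, null);
  }
  private static void SetProperty(object instance, string name, object value)
  {
    instance.GetType().GetProperty(name).SetValue(instance, value, null);
  }
}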
At times a simple task in xslt looks like a puzzle. Today we have this one.
For a string and a regular expression, find the position and the length of the matched
substring.
The problem looks so simple that you do not immediately realize that you are going
to spend ten minutes trying to solve it in the best way.
Try it yourself before proceeding:
<xsl:variable name="match" as="xs:integer*">
<xsl:analyze-string select="$line"
regex="my-reg-ex">
<xsl:matching-substring>
<xsl:sequence select="1, string-length(.)"/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:sequence select="0, string-length(.)"/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:variable>
<xsl:choose>
<xsl:when test="$match[1]">
<xsl:sequence
select="1, $match[2]"/>
</xsl:when>
<xsl:when test="$match[3]">
<xsl:sequence select="$match[2], $match[4]"/>
</xsl:when>
</xsl:choose>
To see that the problem with generator functions in xslt
is a bit more complicated, compare two functions.
The first one is quoted from the earlier post:
<xsl:function name="t:generate" as="xs:integer*">
<xsl:param name="value" as="xs:integer"/>
<xsl:sequence select="$value"/>
<xsl:sequence select="t:generate($value * 2)"/>
</xsl:function>
It does not work in Saxon: it crashes with out of memory.
The second one is slightly modified version of the same function:
<xsl:function name="t:generate" as="xs:integer*">
<xsl:param name="value" as="xs:integer"/>
<xsl:sequence select="$value + 0"/>
<xsl:sequence select="t:generate($value * 2)"/>
</xsl:function>
It's working without problems. In the first case Saxon decides to cache all the
function's output, in the second case it decides to evaluate the data lazily on
demand.
It seems that the optimization algorithms implemented in Saxon are so plentiful and
complex that at times they fool one another. :-)
See also:
Generator functions
At some point we needed to have an array with volatile elements in java.
We knew that such a beast is not found in the java world. So we searched
the Internet and found answers that are so wrong, and introduce such obscure
threading bugs, that the guys who provided them had better hide them and run immediately to fix their
buggy programs...
The first one is
Volatile arrays
in Java. They suggest the following solution:
volatile int[] arr = new int[...];
...
arr[4] = 100;
arr = arr;
The number two:
What Volatile Means in Java
A guy assures that this code works:
Fields:
int answer = 0;
volatile boolean ready = false;
Thread1:
answer = 42;
ready = true;
Thread2:
if (ready)
{
print(answer);
}
They are very wrong! Non-volatile access can be reordered by the implementation.
See Java's
Threads and Locks:
The rules for volatile variables effectively require that main memory be touched exactly once for each use or assign of a volatile variable by a thread, and that main memory be touched in exactly the order dictated by the thread execution semantics. However, such memory actions are not ordered with respect to read and write actions on nonvolatile variables.
They probably thought of locks when they argued about volatiles:
a lock action acts as if it flushes all variables from the
thread's working memory; before use they must be assigned or loaded from main
memory.
P.S. They had better recommend
AtomicReferenceArray.
jxom else if (google search)
Google helps with many things but with retrospective support.
Probably the guy is trying to build nested if/then/else
jxom elements.
We expected this and have defined a function
t:generate-if-statement() in
java-optimizer.xslt.
Its signature:
<!--
Generates if/then/else if ... statements.
$closure - a series of conditions and blocks.
$index - current index.
$result - collected result.
Returns if/then/else if ... statements.
-->
<xsl:function name="t:generate-if-statement" as="element()">
<xsl:param name="closure" as="element()*"/>
<xsl:param name="index" as="xs:integer"/>
<xsl:param name="result" as="element()?"/>
Usage is like this:
<!-- Generate a sequence of pairs: (condition, scope). -->
<xsl:variable name="branches" as="element()+">
<xsl:for-each select="...">
<!-- Generate condition. -->
<scope>
<!-- Generate statements. -->
</scope>
</xsl:for-each>
</xsl:variable>
<xsl:variable name="else" as="element()?">
<!-- Generate final else, if any. -->
</xsl:variable>
<!-- This generates if statement. -->
<xsl:sequence
select="t:generate-if-statement($branches, count($branches)
- 1, $else)"/>
P.S. By the way, we like that someone is looking into jxom.
The very same simple tasks tend to appear in different languages (e.g.
C# Haiku).
Now we have to find:
- the integer and fractional parts of a decimal;
- the length and precision of a decimal.
These tasks have no trivial solutions in xslt 2.0.
At present we have come up with the following answers:
Fractional part:
<xsl:function name="t:fraction" as="xs:decimal">
<xsl:param name="value" as="xs:decimal"/>
<xsl:sequence select="$value mod 1"/>
</xsl:function>
Integer part v1:
<xsl:function name="t:integer" as="xs:decimal">
<xsl:param name="value" as="xs:decimal"/>
<xsl:sequence select="$value - t:fraction($value)"/>
</xsl:function>
Integer part v2:
<xsl:function name="t:integer" as="xs:decimal">
<xsl:param name="value" as="xs:decimal"/>
<xsl:sequence select="
if ($value ge 0) then
floor($value)
else
-floor(-$value)"/>
</xsl:function>
Length and precision:
<!--
Gets a decimal specification as a closure:
($length as xs:integer, $precision as xs:integer).
-->
<xsl:function
name="t:decimal-spec" as="xs:integer+">
<xsl:param name="value"
as="xs:decimal"/>
<xsl:variable name="text" as="xs:string" select="
if ($value
lt 0) then
xs:string(-$value)
else
xs:string($value)"/>
<xsl:variable
name="length" as="xs:integer"
select="string-length($text)"/>
<xsl:variable
name="integer-length" as="xs:integer"
select="string-length(substring-before($text, '.'))"/>
<xsl:sequence select="
if
($integer-length) then
($length - 1, $length - $integer-length - 1)
else
($length, 0)"/>
</xsl:function>
The last function looks odious. In many other languages its implementation
would be considered embarrassing.
Given:
public class N
{
public readonly N next;
}
What needs to be done to construct a ring of N : n1 refers to n2 , n2 to n3 , ... nk to n1 ? Is it possible?
To end with immutable trees, at least for now, we've implemented IDictionary<K, V>.
It's named Map<K, V>. Functionally it looks very much like SortedDictionary<K, V>.
There are some differences, however:
- Map, in contrast to SortedDictionary, is very cheap to copy.
- Because Map is based on an AVL tree, which is more rigorously balanced
than an RB tree, it's a little bit faster asymptotically for lookup than SortedDictionary,
and a little bit slower on modification.
- Due to the storage structure (node + navigator),
Map consumes less memory than
SortedDictionary, and is probably cheaper for GC (simple garbage
graphs).
- As an AVL tree stores left and right subtree sizes, in contrast to a "color" in an
RB tree, we are able to index data in two ways: with an integer index, and with a key
value.
Sources are:
Update:
It was impossible to withstand the temptation to commit some primitive performance
comparison. Map outperforms SortedDictionary both in population and in access.
This does not agree with pure algorithm theory, but there might be other
unaccounted factors: memory consumption, quality of implementation, and so on.
Program.cs is updated with measurements.
Update 2:
More accurate tests show that for some key types Map is faster, for others
SortedDictionary is faster. Usually Map is slower during population (a mutable AVL
tree navigator may fix this). The odd thing is that Map<string, int> is faster
than SortedDictionary<string, int> both for allocation and for access. See the
excel report.
Update 3:
An interesting observation. The following table shows maximal and
average tree heights for different node counts in AVL and RB trees after a random population:
Size    | AVL Max | AVL Avg | RB Max | RB Avg
10      | 4       | 2.90    | 5      | 3.00
50      | 7       | 4.94    | 8      | 4.94
100     | 8       | 5.84    | 9      | 5.86
500     | 11      | 8.14    | 14     | 8.39
1000    | 12      | 9.14    | 16     | 9.38
5000    | 15      | 11.51   | 18     | 11.47
10000   | 16      | 12.53   | 20     | 12.47
50000   | 19      | 14.89   | 23     | 14.72
100000  | 20      | 15.90   | 25     | 15.72
500000  | 25      | 18.26   | 28     | 18.27
1000000 | 25      | 19.28   | 30     | 19.27
Here, in agreement with the theory, the height of the AVL tree is shorter than the height
of the RB tree. But what is most interesting is the depth of an "average
node". This value describes the number of steps required to find a random key. The RB
tree is very close to, and often better than, the AVL tree in this regard.
It was obvious as hell from day one of generics that obscure long names would
appear once you start to parametrize your types. It was the easiest
thing in the world to take care of this in advance. Alas, C# inherits C++'s bad
practices.
Read Associative containers in a functional languages
and
Program.cs to see what we're talking about.
Briefly, there is a pair (string, int), which in C# should be declared as:
System.Collections.Generic.KeyValuePair<string, int>
Obviously we would like to write it in a short way. These are our attempts, which
fail:
1. Introduce generic alias Pair<K, V>:
using System.Collections.Generic;
using Pair<K, V> = KeyValuePair<K, V>;
2. Introduce type alias for a generic type with specific types.
using System.Collections.Generic;
using Pair = KeyValuePair<string, int>;
And this is only one that works:
using Pair = System.Collections.Generic.KeyValuePair<string, int>;
Do you think it is bearable? Well, consider the following:
- There is a generic type ValueNode<T>, where T should be Pair.
- There is a generic type TreeNavigator<N>, where N should be ValueNode<Pair>.
The declaration looks like this:
using Pair = System.Collections.Generic.KeyValuePair<string, int>;
using Node = NesterovskyBros.Collections.AVL.ValueNode<
System.Collections.Generic.KeyValuePair<string, int>>;
using Navigator = NesterovskyBros.Collections.AVL.TreeNavigator<
NesterovskyBros.Collections.AVL.ValueNode<
System.Collections.Generic.KeyValuePair<string, int>>>;
Do you still think it is acceptable?
P.S. Legacy thinking led C#'s and java's designers to use the word "new" for
object construction. It is not required at all. Consider new Pair("A", 1) vs Pair("A", 1).
C++ prefers the second form. C# and java always use the first one.
Continuing with the post "Ongoing xslt/xquery spec update",
we would like to articulate what options regarding associative containers we
have in functional languages (e.g. xslt, xquery), assuming that variables are
immutable and the implementation is efficient (in some sense).
There are three common implementation techniques:
- store data (key, value pairs) in a sorted array, and use binary search to
access values by a key;
- store data in a hash map;
- store data in a binary tree (usually RB or AVL trees).
The implementation choice considerably depends on the operations performed over
the container. Usually these are:
- construction;
- value lookup by key;
- key enumeration (ordered or not);
- container modification (adding and removing data);
- access to elements by index.
Note that modification in functional programming means creation of a new
container, so here is a division:
- If the container's use pattern does not include modification, then probably the
simplest solution is to build it as an ordered sequence of
pairs, and use binary search to access the data. Alternatively, one could
implement the associative container as a hash map.
- If modification is essential, then neither an ordered sequence of pairs, a hash map,
nor a classical tree implementation can be used, as they are either too slow
or too greedy for memory, either during modification or during access.
On the other hand, to deal with container modifications one can build
an implementation which uses "top-down" RB
or AVL trees. To see the
difference, consider a classical tree structure and its functional variant:
                | Classical                                         | Functional
Node structure: | node: parent, left, right, other data             | node: left, right, other data
Node reference: | the node itself                                   | the node path from the root of the tree
Modification:   | either mutable, or requires a completely new tree | O(ln N) nodes are created
Here we observe that:
- one can implement an efficient map (lookup time no worse than O(ln N)) with no
modification support, using an ordered array;
- one can implement an efficient map with support of modification, using an immutable binary tree;
- one can implement all these algorithms purely in xslt and xquery (provided that inline
functions are supported);
- any such implementation will lose against the same implementation
written in C++, C#, or java;
- the best implementation would probably start from a sorted array and
switch to a binary tree after some size threshold.
Here we provide a C# implementation of a functional AVL tree, which also supports
element indexing:
Our intention was to show that the usual algorithms for associative
containers apply in functional
programming; thus a feature complete functional language must support
associative containers to make development more conscious, and to free a
developer from reinventing basic things that have existed for almost half a
century.
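To illustrate the "O(ln N) nodes are created" row above, here is a minimal sketch of path copying in an immutable binary search tree (a plain unbalanced tree for brevity; this is not the AVL implementation referenced above):
// Sketch: inserting into an immutable binary search tree copies only
// the nodes on the path from the root to the insertion point.
public sealed class Node
{
  public readonly int Key;
  public readonly Node Left;
  public readonly Node Right;
  public Node(int key, Node left, Node right)
  {
    Key = key;
    Left = left;
    Right = right;
  }
  // Returns a new root; the old tree remains valid and unchanged.
  public static Node Insert(Node root, int key)
  {
    if (root == null)
    {
      return new Node(key, null, null);
    }
    if (key < root.Key)
    {
      return new Node(root.Key, Insert(root.Left, key), root.Right);
    }
    if (key > root.Key)
    {
      return new Node(root.Key, root.Left, Insert(root.Right, key));
    }
    return root; // Key already present.
  }
}
Each Insert allocates only the nodes along the search path, while the rest of the old tree is shared between the old and the new roots.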
A client asked us to produce Excel reports in an ASP.NET
application. They've given us Excel templates, and also defined what they want to show.
What are our options?
- Work with Office COM API;
- Use Office Open XML SDK (which is a set of pure .NET
API);
- Try to apply xslt somehow;
- Macro, other?
For us, biased towards xslt, it's hard to make a fair choice. To judge, we've
tried to formalize the client's request and to look into future support.
So, we have defined sql stored procedures to provide the data. This way the data can be
represented either as an ADO.NET DataSet, a set of classes, as xml, or in any other reasonable format. We do not
predict any considerable problem with data representation if the client decides
to modify the reports in the future.
It's not so easy when we think about the Excel generation.
Due to ignorance we thought that Excel is much like xslt in some regard, and
that it's possible to provide tabular data in some form and create an Excel
template, which will consume the data to form the final output. To some extent
it's possible, indeed, but you would have to start creating macros or vb scripts to
achieve acceptable results.
When we mentioned macros to the client, they immediately stated that
such a solution won't work due to security reasons.
Comparing the COM API and the Open XML SDK we can see that both provide almost the same
level of service for us, except that the latter is much lighter and supports only the Open XML format, while the former is a heavy
API exposing MS Office and also supports earlier versions.
Both solutions have a considerable drawback: it's not easy to create an Excel
report in C#, and it will be a pain to support such a solution if the client asks,
say in half a year, to modify something in the Excel template or to create one more
report.
Thus we've approached xslt. There we've found two more directions:
- generate data for Office Open XML;
- generate xml in the format of MS Office 2003.
It turned out that it's a rather nontrivial task to generate data for Open XML,
and it's not due to the format, which is not xml at all but a zipped folder
containing xmls. The problem is in the complex schemas and in many complex
relations between the files constituting an Open XML document. In contrast, the MS
Office 2003 format allows us to create a single xml file for the spreadsheet.
Selecting between the standard and up to date format, and the older proprietary one, the
latter looks more attractive for development and support.
At present we're at the position to use xslt and to generate files in the MS Office
2003 format. Are there better options?
Did you ever hear that double numbers may cause roundings, and that
many financial institutions are very sensitive to those roundings?
Sure you did! We're also aware of this kind of problem, and we thought we had
taken care of it. But things are not that simple, as you don't always
know what impact the problem can have.
To understand the context it's enough to say that we're converting (using xslt by the way) programs
written in a CASE tool called
Cool:GEN into java and into C#. Originally, Cool:GEN generated COBOL and C
programs as deliverables. Formally, clients compare COBOL results vs java or C#
results, and they want them to be as close as possible.
For one particular client it was crucial to have correct results during
manipulations with numbers of 20-25 digits in total, with 10 digits after the decimal point.
The clients are definitely right, and we've introduced generation options to control
how to represent numbers in the java and C# worlds: either as double, or as
BigDecimal (in java) and decimal (in C#).
That was our first implementation. Reasonable and clean. Was it enough? - Not at
all!
The client reported that java's results (they use java and BigDecimal
for every number with a decimal point) are too precise compared to Mainframe's
(MF) COBOL. This rather unusual complaint puzzles a little, but the client
confirmed that they want results no more precise than those MF produces.
The reason for the difference was that both C# and especially java may
store many more decimal digits than is defined for the particular result on MF.
On MF, whenever you define a field storing 5 digits after the decimal point, you're
sure that exactly 5 digits will be stored. This contrasts very much with the results
we had in java and C#, as both multiplication and division can produce many more
digits after the decimal point. The solution was to truncate(!) (not to round) the
numbers to the specific precision in property setters.
So, has it resolved the problem? - No, still not!
The client reported that now the results are much better (coincide with MF, in fact),
but there are still several instances where they observe differences in the 9th and
10th digits after the decimal point, and again java's results are more accurate.
No astonishment this time from us, but an analysis of the reason for the difference.
It turned out that the previous solution was partial. We were doing a final truncation,
but there still were intermediate results like in a/(b * c), or in a * (b/c).
For the intermediate results MF's COBOL has its own, rather nontrivial, formulas (and
options) per each operation, defining the number of digits to keep after the
decimal point. After we added similar options into the generator, several
truncations manifested in the code to adjust intermediate results. This way
we've reached the same accuracy as MF has.
What have we learned (reiterated)?
- Simple problems may have a far reaching impact.
- More precise is not always better. The client often prefers compatible rather than
more accurate results.
For some reason C# lacks a decimal truncation function
limiting the result to a specified number of digits after the decimal point. We don't
know the reasoning behind this, but it stimulates the thoughts. The Internet
is plentiful with workarounds. A typical answer is like this:
Math.Truncate(2.22977777 * 1000) / 1000; // Returns 2.229
So, we also want to provide our solution to this problem.
public static decimal Truncate(decimal value, byte decimals)
{
decimal result = decimal.Round(value, decimals);
int c = decimal.Compare(value, result);
bool negative = decimal.Compare(value, 0) < 0;
if (negative ? c <= 0 : c >= 0)
{
return result;
}
return result - new decimal(1, 0, 0, negative, decimals);
}
Definitely, if the function were implemented by the framework it would be much more efficient. We assume, however, that the above is the best implementation that can be done externally.
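A quick usage sketch, assuming the Truncate method above is in scope (the values are ours, for illustration only):
Console.WriteLine(Truncate(2.22977777m, 3));   // 2.229
Console.WriteLine(Truncate(-2.2299m, 3));      // -2.229
Console.WriteLine(decimal.Round(2.2297m, 3));  // 2.230 - rounding, for comparison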
A natural curiosity led us to the implementation of connection
pooling in Apache Tomcat (org.apache.commons.dbcp).
And what are the results, you ask?
Uneasiness... Uneasiness for all those who use it. Uneasiness due to the
difference between our expectations and the real implementation.
Briefly, the design is as follows:
- wrap every jdbc object;
- cache prepared statements wrappers;
- lookup prepared statement wrappers in the cache before
asking original driver;
- upon close return wrappers into the cache.
It took us a couple of minutes to see that this is a very problematic design, as
it does not address double close of statements properly (jdbc states that it is
safe to call close() on a closed jdbc object). With Apache's design it's safe
not to touch the object after the close() call, as it has returned to the pool and
is possibly already given to another client who requested it.
The correct design would be:
- wrap every jdbc object;
- cache original prepared statements;
- lookup original prepared statement in the cache before asking original
driver, and return wrappers;
- detach wrapper upon close from original object, and put original object
into the cache.
A bit later. We've found a confirmation of our doubts on the Apache site: see "JNDI Datasource HOW-TO",
chapter "Common Problems".
Our experience with facelets shows that when you're designing
composition components you often want to add a level of customization. E.g.
generate an element with or without id, or define class/style only if a value is specified.
Consider for simplicity that you want to encapsulate a check box and pass
several attributes to it. The first version that you will probably think of is something like
this:
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:ui="http://java.sun.com/jsf/facelets"
xmlns:c="http://java.sun.com/jstl/core"
xmlns:h="http://java.sun.com/jsf/html"
xmlns:ex="http://www.nesterovsky-bros.com/jsf">
<body>
<!--
Attributes:
id - an optional id;
value - a data binding;
class - an optional element class;
style - an optional element inline style;
onclick - an optional script event handler for onclick event;
onchange - an optional script event handler for onchange event.
-->
<ui:component>
<h:selectBooleanCheckbox
id="#{id}"
value="#{value}"
style="#{style}"
class="#{class}"
onchange="#{onchange}"
onclick="#{onclick}"/>
</ui:component>
</body>
</html>
Be sure, this is not what you have expected. The output will contain all the mentioned
attributes, even those which weren't passed into the component (they will have empty
values). More than that, if you omit "id", you will get an error like: "empty
string is not valid id".
The reason is in the EL! The attributes used in
this example are of type String, thus the result of evaluation of a value expression is coerced to String.
Values of attributes that weren't passed in are evaluated to null. EL returns ""
while coercing null to String. The interesting thing
is that if EL were not changing null, then those omitted attributes would not appear in the output.
The second attempt would probably be:
<h:selectBooleanCheckbox value="#{value}">
<c:if test="#{!empty id}">
<f:attribute name="id" value="#{id}"/>
</c:if>
<c:if test="#{!empty onclick}">
<f:attribute name="onclick" value="#{onclick}"/>
</c:if>
<c:if test="#{!empty onchange}">
<f:attribute name="onchange" value="#{onchange}"/>
</c:if>
<c:if test="#{!empty class}">
<f:attribute name="class" value="#{class}"/>
</c:if>
<c:if test="#{!empty style}">
<f:attribute name="style" value="#{style}"/>
</c:if>
</h:selectBooleanCheckbox>
Be sure, this won't work either (it may work, but not as you would expect). The instruction c:if
is evaluated at the stage of building the component tree, and not at the
rendering stage.
To work around the problem you should prevent the null to "" conversion in the EL.
That's, in fact, rather trivial to achieve: the value expression should evaluate to
an object different from String, whose toString() method returns the required
value.
The final component may look like this:
<h:selectBooleanCheckbox
id="#{ex:object(id)}"
value="#{value}"
style="#{ex:object(style)}"
class="#{ex:object(class)}"
onchange="#{ex:object(onchange)}"
onclick="#{ex:object(onclick)}"/>
where ex:object() is a function defined like this:
public static Object object(final Object value)
{
return new Object()
{
public String toString()
{
return value == null ? null : value.toString();
}
}
}
A bit later: not everything works as we expected. This approach doesn't work with the validator attribute, whereas it works with the converter attribute. The difference between them is that the first attribute should be a MethodExpression value, while the second one is a ValueExpression value. Again, we suffer from the ugly JSF implementation of the UIOutput component.
Recently we have seen a blog entry: "JSF: IDs and clientIds in Facelets", which provided a wrong implementation of the feature.
I'm not sure how useful it is, but here is our approach to the same problem.
At the core is ScopeComponent. The example uses a couple of utility functions defined in Functions. The example itself is found at window.xhtml:
<html xmlns="http://www.w3.org/1999/xhtml"
xmlns:ui="http://java.sun.com/jsf/facelets"
xmlns:c="http://java.sun.com/jstl/core"
xmlns:h="http://java.sun.com/jsf/html"
xmlns:f="http://java.sun.com/jsf/core"
xmlns:fn="http://java.sun.com/jsp/jstl/functions"
xmlns:ex="http://www.nesterovsky-bros.com/jsf">
<body>
<h:form>
<ui:repeat value="#{ex:sequence(5)}">
<f:subview id="scope" binding="#{ex:scope().value}">
#{scope.id}, #{scope.clientId}
</f:subview>
<f:subview id="script" uniqueId="my-script"
binding="#{ex:scope().value}" myValue="#{2 + 2}">
, #{script.id}, #{script.clientId},
#{script.bindings.myValue.expressionString},
#{ex:value(script.bindings.myValue)},
#{script.attributes.myValue}
</f:subview>
<br/>
</ui:repeat>
</h:form>
</body>
</html>
Update: ex:scope() is made to return a simple bean with property "value".
Another useful example:
<f:subview id="group" binding="#{ex:scope().value}">
<h:inputText id="input" value="#{bean.property}"/>
<script type="text/javascript">
var element = document.getElementById('#{group.clientId}:input');
</script>
</f:subview>
In the section about AJAX, the JSF 2.0 spec (final draft) talks about partial requests...
This sounds rather strange. My perception was that AJAX is about partial responses. What's the sense in sending partial requests? Requests are comparatively small anyway! Besides, a partial request may complicate restoring the component tree on the server and make things fragile, but this largely depends on what they mean by these words.
Recently we were disputing (Arthur vs Vladimir) about the
benefits of ValueExpression references in JSF/Facelets.
Such a dispute in itself presents a rather funny picture when
you're defending one position and after a while you're taking the opposite
point
and starting to maintain it. But let's get to the problem.
JSF/Facelets uses
Unified
Expression Language for the data binding, e.g.:
<h:inputText id="name" value="#{customer.name}" />
or
<h:selectBooleanCheckbox id="selected" value="#{customer.selected}" />
In these cases value from input and check boxes are mapped to a properties name , and selected of a bean named customer .
Everything is fine except for the case when selected
is not of boolean type (e.g. int). In this case you will have a hard time thinking
of how to adapt the bean property to the jsf component. Basically, you have to
provide a bean adapter, or change the type of the property. The latter is
unfeasible in our case, thus we're choosing a bean adapter. More than that, we have to create a
generic solution for an int to boolean property type
adapter. With
this target in mind we may create a function receiving a bean and a property name and
returning another bean with a single property of boolean type:
<h:selectBooleanCheckbox id="selected" value="#{ex:toBoolean(customer, 'selected').value}" />
But thinking further, the question appears: can we pass a ValueExpression by reference into a bean adapter function, and have something like this:
<h:selectBooleanCheckbox id="selected" value="#{ex:toBoolean(byref customer.selected).value}" />
It turns out that it's possible to do this kind of thing. Unfortunately it requires a custom facelets tag, like this:
<ex:ref var="selected"
value="#{customer.selected}"/>
<h:selectBooleanCheckbox id="selected" value="#{ex:toBoolean(selected).value}" />
The implementation of such a tag is really primitive (in fact it mimics the c:set tag
handler except for one line), but still it's an extension at a level we're not
happy to introduce.
This way we were going in circles considering pros and cons, regretting that el
references aren't native in jsf/facelets, unable to decide whether our
solution is a hack or a neat extension...
P.S. We know that JSF 2.0 provides a solution for h:selectBooleanCheckbox, but there are still cases when a similar technique is required
even there.
We always tacitly assumed that the protected modifier in java
permits member access from the class the member belongs to, or from an instance of
the class's descendant. Very much like C++ defines it, in fact.
In other words, no external client of an instance can directly access a protected member of that instance or of the class the instance belongs to.
It would be very interesting to know how many people live
with such a naivete, really!
Well, that's what java states:
The protected modifier specifies that the member can only be accessed within its own package (as with package-private) and, in addition, by a subclass of its class in another package.
If one thinks just a little, she'll see that this gorgeous definition
is so different from C++'s, and so meaningless, that they would have done better to drop
this modifier altogether.
The hole is so huge that one can easily build an example
showing how to modify a protected member of some other class in a perfectly valid
way. Consider:
MyClass.java
package com.mypackage;
import javax.faces.component.Hack;
import javax.faces.component.UIComponentBase;
import javax.faces.event.FacesListener;
public class MyClass
{
public void addFacesListener(
UIComponentBase component,
FacesListener listener)
{
Hack.addFacesListener(component, listener);
}
...
}
Hack.java
package javax.faces.component;
import javax.faces.event.FacesListener;
public class Hack
{
public static void addFacesListener(
UIComponentBase component,
FacesListener listener)
{
component.addFacesListener(listener);
}
}
The example is about how one adds a custom listener to an arbitrary jsf component. Notice that this is not
assumed by design, as the method addFacesListener() is protected. But see how easily one can hack this dummy "protected" notion.
Update: for a proper implementation of protected, please read about the Manifest file, the part about package sealing.
Just in case, if you don't know what JSON stands for: it's JavaScript Object Notation.
You may find plenty of JSON implementations in java, so we shall add one more idea. Briefly, it's about plugging it into the xml serialization infrastructure JAXB. Taking into account that JAXB is now an integral part of the java platform itself, the benefit is that you can transparently use the same beans for xml and JSON serialization.
What you need to do is only to provide a JSON reader and writer under the hood of the XMLStreamReader and XMLStreamWriter interfaces.
In spare time we shall implement this idea.
If you by chance see lines like the following in your code:
private transient final Type field;
then know, you're in trouble!
The reason is simple, really (provided you're sane and don't put field modifiers without reason). transient assumes that your class is serializable, and that you have a particular field that you don't want to serialize. final states that the field is initialized in the constructor, and does not change its value for the rest of the life cycle.
This way, if you serialize an instance of a class with such a field and then deserialize it back, you will have the field initialized with null, and no way to put another value there.
P.S. That's what we have found in our code recently:
private transient final Lock sync = new ReentrantLock();
Generics in C# look inferior to templates (especially to concepts) in C++;
however, now and then you can build wonderful pieces the way a C++ pro would
envy.
Consider a generic converter method: T Convert<T>(object value).
In C++ I would create several template specializations for all supported
conversions. Well, to make things harder, think of a converter provider supporting the
conversion:
public interface IConverterProvider
{
Converter<object, T> Get<T>();
}
That begins to be a puzzle in C++, but C# handles it easily!
My first C# implementation was too naive, and spent too many cycles in the
provider resolving which converter to use. So, I went on, and have created a
sophisticated implementation like this:
private IConverterProvider provider = ...
public T Convert<T>(object value)
{
var converter = provider.Get<T>();
return converter(value);
}
...
public class ConverterProvider: IConverterProvider
{
public Converter<object, T> Get<T>()
{
return Impl<T>.converter;
}
private static class Impl<T>
{
static Impl()
{
// Heavy implementation initializing converters.
converter = ...
}
public static readonly Converter<object, T> converter;
}
}
Go, and do something close in C++!
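The trick is that the generic nested class Impl<T> gives a separate, lazily initialized static field per closed type T, so after the first call per type there is no lookup at all. Here is a minimal self-contained sketch of the same technique (the converter registry and all names are our assumptions, not the original code):
using System;
using System.Collections.Generic;
public static class ConverterCache
{
  // Registry populated once; Impl<T> captures its converter in a static field.
  private static readonly Dictionary<Type, Delegate> converters =
    new Dictionary<Type, Delegate>
    {
      { typeof(int), new Converter<object, int>(value => System.Convert.ToInt32(value)) },
      { typeof(string), new Converter<object, string>(value => value == null ? null : value.ToString()) }
    };
  public static T Convert<T>(object value)
  {
    // No dictionary lookup here: the converter was resolved once per T
    // in the static constructor of Impl<T>.
    return Impl<T>.converter(value);
  }
  private static class Impl<T>
  {
    public static readonly Converter<object, T> converter;
    static Impl()
    {
      converter = (Converter<object, T>)converters[typeof(T)];
    }
  }
}
public static class Program
{
  public static void Main()
  {
    Console.WriteLine(ConverterCache.Convert<int>("5"));    // 5
    Console.WriteLine(ConverterCache.Convert<string>(42));  // 42
  }
}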
If you have a string variable $value as xs:string, and want to know whether it starts with a digit, then what's the best way to do it in xpath?
Our answer is: ($value ge '0') and ($value lt ':') .
Looks a little funny (and disturbing).
In our project we're generating a lot of xml files, which are subject to manual
changes and repeated generations (often with slightly different generation
options). This way the life flow of such an xml can be described as follows:
- generate original xml (version 1)
- manual changes (version 2)
- next generation (version 3)
- manual changes integrated into the new generation (version 4)
If these were regular text files we could use the diff utility to prepare a
patch between versions 1 and 2, and apply it with the patch utility to
version 3. Unfortunately xml has additional semantics compared to plain text. What is an
invariant or a simple modification in xml is often a drastic change in text.
diff/patch does not work well for us. We need an xml diff
and patch.
The first guess is to google it! Not so simple.
We have failed to find a tool or an API that can be used from ant. There are a
lot of GUIs to show xml differences and to perform manual merge, or doing
similar but different things to what we need
(like MS's xmldiffpatch).
Please point us to such a program!
Meantime, we need to proceed. We don't believe that such a tool can be
done on the knees, as it's a heuristic and mathematical task at the same time,
requiring careful design and good statistics for the use cases. Our idea
is to exploit
diff/patch. To achieve the goals we're going to
perform some normalization of the xmls before diff to remove redundant
invariants, and a normalization after the patch to return it to a readable form.
This includes:
- ordering attributes by their names;
- replacing insignificant whitespace with line breaks;
- entering line breaks after element names and before attributes, after an
attribute name and before its value, and after an attribute value.
This way we expect to receive files that react to modifications similarly to text
files.
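Here is a minimal sketch of such a normalization in C# (our own illustration; a real normalization would also have to deal with namespaces, escaping, comments, and the reverse de-normalization pass):
using System;
using System.Linq;
using System.Text;
using System.Xml.Linq;
public static class XmlNormalizer
{
  // Produces a line-oriented form of the xml: attributes sorted by name,
  // element names, attribute names and attribute values on separate lines.
  public static string Normalize(XElement element)
  {
    var result = new StringBuilder();
    Write(element, result);
    return result.ToString();
  }
  private static void Write(XElement element, StringBuilder result)
  {
    result.AppendLine("<" + element.Name);
    foreach(var attribute in element.Attributes().
      OrderBy(a => a.Name.ToString(), StringComparer.Ordinal))
    {
      result.AppendLine(attribute.Name + "=");
      result.AppendLine("\"" + attribute.Value + "\"");
    }
    result.AppendLine(">");
    foreach(var node in element.Nodes())
    {
      var child = node as XElement;
      if (child != null)
      {
        Write(child, result);
      }
      else
      {
        var text = node as XText;
        // Insignificant whitespace is dropped, other text goes on its own line.
        if ((text != null) && (text.Value.Trim().Length > 0))
        {
          result.AppendLine(text.Value.Trim());
        }
      }
    }
    result.AppendLine("</" + element.Name + ">");
  }
}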
Sunny> Look what I have found! Consider this C#:
public class T
{
public T free;
}
public void NewTest()
{
T cache = new T();
Stopwatch timer = new Stopwatch();
timer.Reset();
timer.Start();
for(int i = 0; i < 10000000; ++i)
{
// Get from cache.
T t;
if (cache.free == null)
{
cache.free = new T();
}
t = cache.free;
// Release
cache.free = t;
t = null;
}
timer.Stop();
long cacheTicks = timer.ElapsedTicks;
timer.Reset();
timer.Start();
for(int i = 0; i < 10000000; ++i)
{
new T();
}
timer.Stop();
long newTicks = timer.ElapsedTicks;
Console.WriteLine("cache: {0}, new: {1}", cacheTicks, newTicks);
}
Gloomy> And?
Sunny> Tests show that new T() is almost as fast as
caching! GC's "new" probably has a fast route, where it shifts the free memory border
in an atomic way, so allocation takes just several cycles.
Gloomy> Well, you're probably right, there is a fast route. I, however,
have a different opinion. To track references, a generational garbage collector
implements field assignment as a call rather than a mov.
This routine, besides the move itself, marks the touched memory page in a special card
table (who said GC is cheap?); thus, I think, a reference field setter is
almost as slow as the "new" call.
.Net is known for its array covariance. That means that any array can be cast to
an array of base elements:
public class T: B
{
}
T[] tlist = ...
B[] blist = tlist;
This feature comes at a cost:
B b = ...
T t = ...
blist[0] = b; // This efficiently is: blist[0] = (T)b;
tlist[0] = t; // This is the same: tlist[0] = (T)t;
We pay the cost of an additional cast, just for nothing. Let this dubious design decision oppress the .Net/Java inventors.
You can eliminate the cast. Just use an array of structs:
struct S<T>
{
public T t;
}
S<T>[] slist = ...
slist[0].t = t; // Works without cast.
Measurement shows that S<T>[] is ~35% faster than T[] on write, and slower (JIT could do better) on read.
Well, an ugly workaround for an ugly design.
P.S. In java there is no relief...
There is a method Right() in the RB tree implementation:
public int Right(int node)
{
return items[node].right;
}
JIT does not want to inline it, probably because the method may throw:
public int Right(int node)
{
return items[node].right;
00000000 mov eax,dword ptr [ecx+4]
00000003 cmp edx,dword ptr [eax+4]
00000006 jae 00000013
00000008 shl edx,4
0000000b lea eax,[eax+edx+8]
0000000f mov eax,dword ptr [eax+8]
00000012 ret
00000013 call 74C3A62C
00000018 int 3
Too sad.
Back in 2001 we read that .NET's JIT is smart enough to optimize repeated
boundary checks.
In the year 2009 we can still verify that this is not the case (no matter how
hard you try).
C#:
private int CharAt(int offset)
{
string text = this.text;
return (uint)offset >= (uint)text.Length ? -1 : text[offset];
}
Disassembly:
private int CharAt(int offset)
{
string text = this.text;
00000000 push ebp
00000001 mov ebp,esp
00000003 mov ecx,dword ptr [ecx+30h]
return (uint)offset >= (uint)text.Length ? -1 : text[offset];
00000006 cmp dword ptr [ecx+8],edx
00000009 jbe 00000017
0000000b cmp edx,dword ptr [ecx+8]
0000000e jae 0000001C
00000010 movzx eax,word ptr [ecx+edx*2+0Ch]
00000015 pop ebp
00000016 ret
00000017 or eax,0FFFFFFFFh
0000001a pop ebp
0000001b ret
0000001c call 74C24C6C
00000021 int 3
P.S. This method is not inlined either (IL length is 25 bytes).
Yesterday, I've installed IE8.
Looks better here and there.
Today, I'm shocked!
I've reopened my web mail and it remembered the session. It keeps session cookies after closing an IE8 instance!
I could not believe it, so I logged into another web application and then opened another IE8 instance. What do you think? - It shares the session between instances!
That is a serious security problem.
It prevents me from opening two sessions of a web application on my computer.
P.S. we have found that this problem was already discussed. See IE8 handles sessions/cookies different than IE7 - big trouble for - ...
Someone needs a brain surgery...
Quick solution: run IE8 with -nomerge command line option.
We'd like to return to binary tree algorithms and spell out what you cannot
do with generics in C#. Well, you can do many things, but with a generalization
penalty.
Consider a binary tree node: Node(Parent, Left, Right). RB, AVL, and
other algorithms attach some private information to this node to perform
balancing.
You can express this idea mathematically (and in C++), but you cannot implement it efficiently in C#.
A more focused example. Consider an RB tree: Node(Parent, Left, Right, Color).
There are a number of ways you may implement the internal structure of the tree.
The algorithms themselves stay the same.
Straightforward implementation:
class Node
{
Node Parent;
Node Left;
Node Right;
bool Color;
}
This implementation allocates nodes in the heap and each node refers to other
nodes.
Node navigator implementation:
class Node
{
Node Left;
Node Right;
bool Color;
}
struct NodeNavigator
{
Node[] nodes;
int index;
}
The node does not refer to the parent. This reduces memory consumption and
simplifies the object graph, which is good for GC. The tree is walked using a node
navigator, which stores the ancestors of the node.
Node as a structure:
struct Node
{
int Parent;
int Left;
int Right;
bool Color; // This might be integrated as highest bit of parent.
}
The tree is stored as an array of nodes. This is a compact and GC-efficient
implementation.
Node as a structure, and with node navigator:
struct Node
{
int Left;
int Right;
bool Color; // This might be integrated as highest bit of left.
}
struct NodeNavigator
{
Tree tree;
int[] nodes;
int index;
}
The tree is stored as an array of nodes, and a navigator is used to walk it. This is the most compact implementation.
Each implementation has its virtues. What is common between the implementations is that
they share the same balancing and navigation algorithms. Storage
differences prevent a single C# implementation. By contrast, C++ allows one to
define a concept "tree" and to define specializations of this concept, allowing
unified algorithms; all this is done without a performance penalty.
P.S. Java, in this regard, offers almost no alternatives...
Do you agree that binary trees and algorithms that keep trees reasonably balanced
are important?
Our answer is yes!
Interestingly enough, however, you won't easily find these algorithms
publicly available.
Though red-black,
AVL and other algorithms
described in the wikipedia are defined in terms of tree manipulation, all
implementations we have seen deal with trees annotated with keys and values.
These implementations really use tree balancing algorithms behind the scenes,
and expose commonplace set or map containers to a client. Even the
C++ Standard
Library suffers from this disease.
We think that binary trees are valuable independent concepts, and they are worth
implementing separately, at least because there are other algorithms besides
sets and maps that use trees.
And well, we did it in C#! See
RedBlackTree.cs.
Consider an example - a simple scheduler,
ScheduleBookmark.cs, with operations:
- schedule an action;
- remove an action from the schedule;
- enumerate actions;
- find the date an action is scheduled for;
- find an action (or at least the closest one) for a specified date;
- postpone actions due to delays.
A balanced binary tree allows an efficient implementation of such a scheduler. A tree
node stores an action, and a time span between the parent node and this node.
This way:
Operation | Steps
schedule an action | find place + link node + rebalance tree
remove an action from the schedule | unlink node + rebalance tree
enumerate actions | navigate tree
find the date an action is scheduled for | find node in tree
find an action for a specified date | cumulate time spans up to the tree root
postpone actions due to delays | fix up time spans from a node up to the tree root
Compare operation complexities between tree, array, list and map:
Operation | Tree | Array | List | Map
schedule an action | O(ln(N)) | O(N) | O(N) | O(ln(N))
remove an action from the schedule | O(ln(N)) | O(N) | O(1) | O(ln(N))
enumerate actions | O(ln(N)) | O(1) | O(1) | O(ln(N))
find the date an action is scheduled for | O(ln(N)) | O(1) | O(1) | O(1)
find an action for a specified date | O(ln(N)) | O(ln(N)) | O(N) | O(ln(N))
postpone actions due to delays | O(ln(N)) | O(N) | O(N) | O(N*ln(N))
The complexity of each operation on the tree is O(ln(N)). No array, list, or map achieves a similar worst-case guarantee.
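To make the time-span idea concrete, here is a small sketch (written in Java for brevity; the node layout and method names are our own assumptions, not the actual RedBlackTree.cs or ScheduleBookmark.cs API):
// A hypothetical scheduler node: each node keeps a time span relative to
// its parent, so no absolute date is stored in the tree.
class ScheduleNode
{
  ScheduleNode parent;
  ScheduleNode left;
  ScheduleNode right;
  long spanMillis;   // offset of this node relative to its parent
  Runnable action;

  // "Find the date an action is scheduled for":
  // cumulate time spans up to the tree root.
  long scheduledAt(long baseTimeMillis)
  {
    long time = baseTimeMillis;

    for(ScheduleNode node = this; node != null; node = node.parent)
    {
      time += node.spanMillis;
    }

    return time;
  }

  // "Postpone actions due to delays": adding to one span shifts this node
  // and its whole subtree; the general case also fixes up spans along the
  // path to the root, as noted in the table above.
  void postpone(long delayMillis)
  {
    spanMillis += delayMillis;
  }
}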
Finally, the test program is
Program.cs,
and a whole project (VS2008) is
Tree.zip
Could you think of a C# method accepting an ancestor, and
forbidding a descendant of a class at compile time?
The answer to this probably is: why would you need such a reptile?
Well, I don't. I didn't mean to create such a method, but generics produced one anyway!
public class BinaryTreeNode<Node>
where Node: BinaryTreeNode<Node>
{
public Node parent;
public Node left;
public Node right;
}
public class MyNode: BinaryTreeNode<MyNode>
{
public int key;
}
public class MyRoot: MyNode
{
}
public class Test
{
public void test()
{
MyRoot root = new MyRoot();
// print((MyNode)root); // This works.
print(root); // This does not work.
}
private static void print<T>(T node)
where T: BinaryTreeNode<T>
{
Console.WriteLine("print me");
}
}
By the way, BinaryTreeNode is effectively an "abstract" class, as you cannot instantiate it, only inherit from it.
Once upon a time, we created a function mimicking the
decapitalize() method defined in java.beans.Introspector. Nothing
special, indeed. See the source:
/**
* Utility method to take a string and convert it to normal Java variable
* name capitalization. This normally means converting the first
* character from upper case to lower case, but in the (unusual) special
* case when there is more than one character and both the first and
* second characters are upper case, we leave it alone.
* <p>
* Thus "FooBah" becomes "fooBah" and "X" becomes "x", but "URL" stays
* as "URL".
*
* @param name The string to be decapitalized.
* @return The decapitalized version of the string.
*/
public static String decapitalize(String name) {
if (name == null || name.length() == 0) {
return name;
}
if (name.length() > 1 && Character.isUpperCase(name.charAt(1)) &&
Character.isUpperCase(name.charAt(0))){
return name;
}
char chars[] = name.toCharArray();
chars[0] = Character.toLowerCase(chars[0]);
return new String(chars);
}
We typed the implementation immediately:
<xsl:function name="t:decapitalize" as="xs:string">
<xsl:param name="value" as="xs:string?"/>
<xsl:variable name="c" as="xs:string"
select="substring($value, 2, 1)"/>
<xsl:sequence select="
if ($c = upper-case($c)) then
$value
else
concat
(
lower-case(substring($value, 1, 1)),
substring($value, 2)
)"/>
</xsl:function>
It worked all right until recently, when it stopped working, as the output was
different from its java counterpart.
The input was W9Identifier. The function naturally returned the same value, while
java returned w9Identifier. We had fallen for the assumption that
$c = upper-case($c) returns true only when the character is an upper case letter. That's
not correct for digits. The correct way is:
<xsl:function name="t:decapitalize" as="xs:string">
<xsl:param name="value" as="xs:string?"/>
<xsl:variable name="c" as="xs:string"
select="substring($value, 2, 1)"/>
<xsl:sequence select="
if ($c != lower-case($c)) then
$value
else
concat
(
lower-case(substring($value, 1, 1)),
substring($value, 2)
)"/>
</xsl:function>
Although in our recent projects we're using Java and XSLT more, we always compare Java and .NET features. It's no secret that in most applications one may find caching used to improve performance. Unlike .NET, which provides a robust cache solution, Java doesn't provide anything standard. Of course, a Java adept may find a lot of caching frameworks, or may just say "use HashMap (ArrayList etc.) instead", but this is not the same.
Think about the options for Java:
1. Caching frameworks (caching systems). Yes, they do their work, and do it perfectly. Some of them are brought to the state of the art, but there are drawbacks. The crucial one is that for simple data caching one should use a whole framework. This option requires too much effort to solve a simple problem.
2. Collection classes (HashMap, ArrayList etc.) for caching data. This is a very straightforward and very productive solution. Everyone knows these classes, and there is nothing to configure. One declares an instance of such a class, takes care of data access synchronization, and everything starts working immediately. An admirable caching solution, but only for "toy applications", since it solves one problem and introduces another: if an application works for hours and there is a lot of data
to cache, the amount of cached data only grows and never shrinks, which is why such caching is very quickly surrounded with all sorts of rules that somehow reduce its size at run-time. The solution quickly loses its shine and becomes hard to reuse, but it's still applicable for some applications.
3. Java reference objects for caching data. The most appropriate class for a cache is java.util.WeakHashMap. WeakHashMap works exactly like a hash table but uses weak references internally. In practice, entries in the WeakHashMap can be reclaimed at any time if they are not referred to outside of the map. This caching strategy
depends on GC's whims, is not entirely reliable, and may increase the number of cache misses.
We've decided to create our simple cache with sliding expiration of data.
One may create many cache instances but there is only one global service that tracks expired objects among these instances:
private Cache<String, Object> cache = new Cache<String, Object>();
There is a constructor that specifies an expiration interval in milliseconds for all cached objects:
private Cache<String, Object> cache = new Cache<String, Object>(15 * 60 * 1000);
Access is similar to HashMap :
instance = cache.get("key"); and cache.put("key", instance);
That's all one should know to start using it. Click here to download the Java source of this class. Feel free to use it in your applications.
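For illustration only, here is a minimal sketch of how such a sliding-expiration cache could look (our own assumptions: a removeExpired() method that must be called periodically instead of the shared global tracking service mentioned above, and no attempt to reproduce the actual downloadable class):
import java.util.Iterator;
import java.util.concurrent.ConcurrentHashMap;

// A sliding-expiration cache sketch: every get() or put() refreshes the
// entry timestamp; removeExpired() drops entries that were not touched
// within the expiration interval.
public class Cache<K, V>
{
  private static class Entry<V>
  {
    V value;
    volatile long touched;
  }

  private final long expirationMillis;
  private final ConcurrentHashMap<K, Entry<V>> entries =
    new ConcurrentHashMap<K, Entry<V>>();

  public Cache()
  {
    this(15 * 60 * 1000); // default expiration: 15 minutes
  }

  public Cache(long expirationMillis)
  {
    this.expirationMillis = expirationMillis;
  }

  public V get(K key)
  {
    Entry<V> entry = entries.get(key);

    if (entry == null)
    {
      return null;
    }

    entry.touched = System.currentTimeMillis(); // sliding expiration

    return entry.value;
  }

  public void put(K key, V value)
  {
    Entry<V> entry = new Entry<V>();

    entry.value = value;
    entry.touched = System.currentTimeMillis();
    entries.put(key, entry);
  }

  // In the real class this is the job of the global tracking service.
  public void removeExpired()
  {
    long now = System.currentTimeMillis();

    for(Iterator<Entry<V>> i = entries.values().iterator(); i.hasNext();)
    {
      if (now - i.next().touched > expirationMillis)
      {
        i.remove();
      }
    }
  }
}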
Yesterday I read about a new Garbage Collection implementation,
G1.
To be honest, I was not impressed.
I think Garbage Collection is an evil, or at least its present implementations are.
I do not believe in algorithms that in their very core assume a centralized
execution.
On the other hand, it's clear it's not in my power to change the status quo. My
lot is to give advice, mostly incompetent and ignorable.
I'm waiting for the time when someone arrives at the idea of moving some parts of
the GC logic out of the runtime scope. This will require more VM intelligence,
but will bear fruit.
A JIT or compiler may, during static analysis, prove that collecting some object
makes some of the objects it refers to unreachable, provided it can
prove that those objects are not reachable by any other means (e.g. a
private field that is not stored anywhere else). This is close to the ideas
expressed in
Muse on value types in java. It's possible to prepare such a garbage graph in
advance, before runtime.
In many cases it's also possible to prove that when a method's variable goes out
of scope it is not reachable by any other means and may be collected. This
allows implementing a stage of automatic garbage collection where objects that
are proven to be garbage are immediately added to a free memory set.
As an example I'm thinking of java's ArrayList object, which stores a private
array. When the ArrayList is reclaimed or resized, the reference to the private array
is lost and the memory can be added to the free set immediately.
This mechanism, integrated as the first stage of GC, would make it less
centralized, as I believe many objects would be collected this way.
Suppose you have constructed a sequence of attributes.
How do you access the value of attribute "a"?
Simple, isn't it? It took a couple of minutes to find a solution!
<xsl:variable name="attributes" as="attribute()*">
<xsl:apply-templates mode="t:generate-attributes" select="."/>
</xsl:variable>
<xsl:variable name="value" as="xs:string?"
select="$attributes[self::attribute(a)]"/>
Problem
Our project, containing many different xslt files, generates many different
outputs (e.g. code that uses DB2 SQL, or Oracle SQL, or DAO, or some
other flavor of code). This results in the use of
indirect calls to handle different generation options; however, to make xslt
work we had to create a big main xslt including the stylesheets for each kind of
generation. This impacts compilation time.
Alternatives
- A big main xslt including everything.
- A big main xslt including everything and using "use-when" attribute.
- Compose main xslt on the fly.
We were eagerly inclined to the second alternative. Unfortunately, a limited set of information is available when "use-when" is evaluated. In
particular, neither parameters nor documents are available. Using
Saxon's extensions one may reach only static variables, or access
System.getProperty(). This isn't flexible enough.
We've decided to try the third alternative.
Solution
We think we have found a nice solution: to create an XsltSource,
which receives a list of includes upon construction, and creates the xslt
when getReader() is called.
import java.io.Reader;
import java.io.StringReader;
import javax.xml.transform.stream.StreamSource;
/**
* A source to read generated stylesheet, which includes other stylesheets.
*/
public class XsltSource extends StreamSource
{
/**
* Creates an {@link XsltSource} instance.
*/
public XsltSource()
{
}
/**
* Creates an {@link XsltSource} instance.
* @param systemId a system identifier for root xslt.
*/
public XsltSource(String systemId)
{
super(systemId);
}
/**
* Creates an {@link XsltSource} instance.
* @param systemId a system identifier for root xslt.
* @param includes a list of includes.
*/
public XsltSource(String systemId, String[] includes)
{
super(systemId);
this.includes = includes;
}
/**
* Gets stylesheet version.
* @return a stylesheet version.
*/
public String getVersion()
{
return version;
}
/**
* Sets a stylesheet version.
* @param value a stylesheet version.
*/
public void setVersion(String value)
{
version = value;
}
/**
* Gets a list of includes.
* @return a list of includes.
*/
public String[] getIncludes()
{
return includes;
}
/**
* Sets a list of includes.
* @param value a list of includes.
*/
public void setIncludes(String[] value)
{
includes = value;
}
/**
* Generates an xslt on the fly.
*/
public Reader getReader()
{
String[] includes = getIncludes();
if (includes == null)
{
return super.getReader();
}
String version = getVersion();
if (version == null)
{
version = "2.0";
}
StringBuilder builder = new StringBuilder(1024);
builder.append("<stylesheet version=\"");
builder.append(version);
builder.append("\" xmlns=\"http://www.w3.org/1999/XSL/Transform\">");
for(String include: includes)
{
builder.append("<include href=\"");
builder.append(include);
builder.append("\"/>");
}
builder.append("</stylesheet>");
return new StringReader(builder.toString());
}
/**
* An xslt version. By default 2.0 is used.
*/
private String version;
/**
* A list of includes.
*/
private String[] includes;
}
To use it one just needs to write:
Source source = new XsltSource(base, stylesheets);
Templates templates = transformerFactory.newTemplates(source);
...
where:
base is a base uri for the generated stylesheet; it's used to
resolve relative includes;
stylesheets is an array of hrefs.
Such an implementation resembles dynamic linking, where separate parts are bound at
runtime. We would like to see dynamic modules in the next version of xslt.
We strongly object to persistence frameworks in their contemporary meaning.
This includes a long row of names like Hibernate, Java Persistence API, LINQ,
and others.
Consider how one of them describes itself:
...high performance object/relational persistence and query service... lets you
develop persistent classes following object-oriented idiom - including
association, inheritance, polymorphism, composition, and collections... allows you to express queries in its own portable SQL extension...
Sounds good, right?
We think not! The words "own" and "portable" regarding SQL sound
almost like antonyms. When one creates a unified language (a noble urge, as opposed to a
proprietary one (?)), she inevitably adds yet another peer, increasing the
plurality in the family of languages.
Attempts to create similar layers between data and business logic are not new.
This has happened throughout computing history. IDMS, NATURAL, COOL:GEN are
20-30 year old examples.
Our reasoning (nothing new).
One needs to approach a design (development and maintenance) from different
perspectives; this way she will understand the problem under design better, and
will estimate the skills required to solve it. This leads to
modularization, e.g. business layer, data layer, appearance; and to development
(maintenance) roles: program developer, database specialist, appearance
specialist. On a small scale several roles are often fulfilled by one person;
this should not mean, however, that these roles are redundant, one just needs to
try on different roles.
Why does one separate the business layer and the data layer?
A pragmatic perspective. There are databases, which accomplish most data
storage tasks more efficiently than one may achieve without a database.
There are the two worlds of database specialists and program developers. These two
layers and roles are facts of reality.
A designer's goal is to keep these roles separate:
- do not force a database specialist to know the business logic details;
- do not force a program developer to know the details of how to organize storage
more efficiently, or how to optimize a particular query.
Modularity helps here. Databases are well equipped to solve these tasks: the data
layer should expose a database API through stored procedures, functions, and
views, while the business layer should use this API to access the database.
With persistence frameworks there are two alternatives:
- still use a data layer API;
- rely on the persistence framework.
When the first option is selected, the framework provides almost no additional
value compared to traditional database access (jdbc, ado.net, and so on).
When one relies on the framework, the data layer interface virtually disappears
(in fact the framework substitutes this interface). The database specialist has very
little control over tuning the data structures and optimizing queries, unless
she starts digging into the business code, but even then she cannot always control
the queries to the database. Moreover, the database specialist must learn a proprietary
query language.
The result is that a persistence framework erodes the division of responsibilities,
complicating development and maintenance.
We often hear the following explanation of why one should use Persistence
Frameworks: "It eases a database vendor switch". This is the most stupid reason to use
Persistence Frameworks! It looks as if they plan to switch vendors once a
day.
A design needs to focus on modularity. This will make code more robust, faster
and more maintainable. It also eases a potential migration process, as only the data layer
has to be migrated, with minimal (mostly configuration) changes in the
business layer.
We are certain that xslt/xquery are the best basis for web application frameworks from the
design perspective; or, in other words, pipeline frameworks allowing the use of
xslt/xquery are the preferable way to create web applications.
The advantages are obvious:
- clear separation of business logic, data, and presentation;
- richness of the languages, allowing one to implement simple presentation, complex
components, and sophisticated data binding;
- built-in extensibility, allowing communication with business logic written in
other languages and/or located at a different site.
It seems that agitating for such technologies is like forcing an open
door. There are such frameworks out there:
Orbeon Forms, Cocoon, and others.
We're not qualified to judge their virtues, however...
Look at the current state of affairs. The main players in this area (well, I
have a rather limited vision) push other technologies: JSP/JSF/Facelets and
the like in the Java world, and ASP.NET in the .NET world. The closest thing they
provide is an xslt servlet/component allowing one to generate output.
Their variants of syntax and their data binding techniques allude to similar
paradigms in xslt/xquery:
<select>
<c:forEach var="option" items="#{bean.options}">
<option value="#{option.key}">#{parameter.value}</option>
</c:forEach>
</select>
On the surface, however, we see much more limited (in design and in
application) frameworks.
And here is a contradiction: how can it be that such a good design is not, at
present, at least as popular as its competitors?
Someone may say there is no such problem: you can use whatever you want, you
have a choice! Well, he's lucky. From our perspective it's not that simple.
We're creating rather complex web applications. Their nature isn't important in
this context; what is important is that there are customers. They are not
thoroughly versed in the question, and exactly because of this they prefer
technologies proposed by the leaders. It seems everything convinces them: mainstream,
good support, many developers who know the technology.
There is not a single chance to promote anything else.
We believe the future may change this state of affairs, but we're creating at present,
and cannot wait...
Java has no value types: objects allocated in place, in contrast to objects
referred to by a pointer into the heap. This, in my opinion, has a negative impact both on
program design and on performance.
Incidentally, I've thought of a use case which can be understood as a value
type by jvm implementations. Consider an example:
class A
{
private final B b = new B();
}
An implementation may lay out class A in such a way that field b is the content of
an instance of class B itself, rather than a pointer to an instance of class B. This way we
save a pointer and a heap allocation of instance B. Another example:
class C
{
C(int size)
{
values = new D[size];
for(int i = 0; i < values.length; i++)
{
values[i] = new D();
}
}
private final D[] values;
}
Here the field values is never null, and each item of the array contains a non-null
value. Assuming these conditions hold for the whole life cycle, and values is
not passed by reference, we can consider values as an array of value types.
The use case conditions are the following:
- the field contains a non-null value;
- the field value is an instance of the field type and not of a descendant type;
- if the field is an array, then all elements of the array are initialized with
instances of the element type, and not of a descendant type;
- the field, or an element of the array, can be assigned only through the operator
new (field = new T(), array[i] = new T());
- the array field is not passed by reference
(Arrays.sort(array) never happens).
The JIT is allowed to interpret a field as a
value type provided it proves these conditions.
Later...
There is another use case for detecting value types:
- a method variable never contains a null value, and
- that variable is never stored in any field, and
- no synchronization is used on the instance held in the variable, and
- a value is assigned to the variable only through the operator new.
Such a variable can be laid out directly on the stack, provided the preceding conditions are satisfied.
P.S. Despite the fact that .NET has built-in value types, it may use the very same technique to optimize reference types.
Yesterday, incidentally, I ran into the problem of a dynamic error during the evaluation of a template's match.
This reminded me of
SFINAE in C++, where the principle is applied at compile time to find a
matching template.
I think people underestimate the meaning of this behaviour. The effect of
dynamic errors occurring during pattern evaluation is described in the
specification:
Any dynamic error or type error that occurs during the evaluation of a pattern against a particular node is treated as a recoverable error even if the error would not be recoverable under other circumstances. The optional recovery action is to treat the pattern as not matching that node.
This has far-reaching consequences, such as error recovery. To illustrate what I'm talking about, please look at this simple stylesheet that recovers from "Division by zero.":
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:template match="/">
<xsl:variable name="operator" as="element()+">
<div divident="10" divisor="0"/>
<div divident="10" divisor="2"/>
</xsl:variable>
<xsl:apply-templates select="$operator"/>
</xsl:template>
<xsl:param name="NaN" as="xs:double" select="1.0 div 0"/>
<xsl:template
match="div[(xs:integer(@divident) div xs:integer(@divisor)) ne $NaN]">
<xsl:message select="xs:integer(@divident) div xs:integer(@divisor)"/>
</xsl:template>
<xsl:template match="div">
<xsl:message select="'Division by zero.'"/>
</xsl:template>
</xsl:stylesheet>
Here, if there is a division by zero, the first template is not matched and another
template is selected; thus the second template serves as an error handler for the
first one. Definitely, one may define much more complex constructions to be
handled this way.
I never was a purist (meaning doing everything in xslt), however this example,
along with
indirect function calls, shows that xslt is a rather well equipped language. One just
needs to be smart enough to understand how to do things.
See also: Try/catch block in xslt 2.0 for Saxon 9.
Among other job activities, we're from time to time asked to check the technical skills of job applicants.
Several times we were interviewing people who were far below the
acceptable professional level. It's a torment for both sides, I should say.
To ease things we have designed a small
questionnaire (specific to our projects) for job applicants. It's sent to an applicant before the
meeting. Even partially answered, this
questionnaire constitutes a good filter against the profane:
<questionnaire>
  <item><question>Please estimate your knowledge in XML Schema (xsd) as lacking, bad, good, or perfect.</question><answer/></item>
  <item><question>Please estimate your knowledge in xslt 2.0/xquery 1.0 as lacking, bad, good, or perfect.</question><answer/></item>
  <item><question>Please estimate your knowledge in xslt 1.0 as lacking, bad, good, or perfect.</question><answer/></item>
  <item><question>Please estimate your knowledge in java as lacking, bad, good, or perfect.</question><answer/></item>
  <item><question>Please estimate your knowledge in c# as lacking, bad, good, or perfect.</question><answer/></item>
  <item><question>Please estimate your knowledge in sql as lacking, bad, good, or perfect.</question><answer/></item>
  <item><question>For logical values A, B, please rewrite the logical expression "A and B" using operator "or".</question><answer/></item>
  <item><question>For logical values A, B, please rewrite the logical expression "A = B" using operators "and" and "or".</question><answer/></item>
  <item><question>There are eight balls, with only one heavier than some other. What is the minimum number of weighings that reveals the heavier ball? Please be suspicious about the "trivial" solution.</question><answer/></item>
  <item><question>If A results in B, what may one say about the reason of B?</question><answer/></item>
  <item><question>If only A or B result in C, what may one say about the reason of C?</question><answer/></item>
  <item><question>Please define an xml schema for this questionnaire.</question><answer/></item>
  <item><question>Please create a simple stylesheet creating an html table based on this questionnaire.</question><answer/></item>
  <item><question>For a table A with columns B, C, and D, please create an sql query selecting B grouped by C and ordered by D.</question><answer/></item>
  <item><question>For a sequence of xml elements A with attribute B, please write a stylesheet excerpt creating a sequence of elements D, grouping elements A with the same string value of attribute B, sorted in ascending order of B.</question><answer/></item>
  <item><question>Having a java class A with properties B and C, please sort a collection of A by B in ascending, and C in descending order.</question><answer/></item>
  <item><question>What does the following line mean in c#? int? x;</question><answer/></item>
  <item><question>What is a parser?</question><answer/></item>
  <item><question>How does one issue an error in an xml stylesheet?</question><answer/></item>
  <item><question>What is lazy evaluation?</question><answer/></item>
  <item><question>How do you understand the following sentence? For each line of code there should be a comment.</question><answer/></item>
  <item><question>Have you used any supplemental information to answer these questions?</question><answer/></item>
  <item><question>Have you independently answered these questions?</question><answer/></item>
</questionnaire>
We are designing a rather complex xslt 2.0 application dealing with semistructured
data. We must tolerate errors during processing, as there are cases where the
input is not perfectly valid (or the program is not designed or ready for
such an input).
The most typical error is an unsatisfied expectation of tree structure, like:
<xsl:variable name="element" as="element()" select="some-element"/>
Obviously, a dynamic error occurs if the specified element is not present. To
concentrate on the primary logic, and to avoid the burden of illegal (unexpected) case
recovery, we have created a try/catch API. The goals of such an API are:
- to be able to continue processing in case of an error;
- to report as much useful information related to an error as possible.
Alternatives:
Do not think it is our arrogance that has driven us to create a custom API. No, we
were looking for alternatives! Please see the
[xsl] saxon:try() discussion:
- the saxon:try()
function is a kind of pseudo function, which explicitly relies on lazy
evaluation of its arguments, and ... it's not available in SaxonB;
- the ex:error-safe
extension instruction is far from perfect in its implementation quality, and provides no error location.
We had no other way except to design this feature ourselves. In our defence one
can say that we are using an innovative approach that encapsulates the details of the
implementation behind a template and calls handlers indirectly.
Use:
The try/catch API is designed as a template
<xsl:template name="t:try-block"/> calling a "try" handler and, if
required, a "catch" handler using the
<xsl:apply-templates mode="t:call"/> instruction. The caller passes any
information to these handlers by means of tunnel parameters.
Handlers must be in the "t:call" mode. The "catch" handler
may receive the following error info parameters:
<xsl:param name="error" as="xs:QName"/>
<xsl:param name="error-description" as="xs:string"/>
<xsl:param name="error-location" as="item()*"/>
where $error-location is a sequence of pairs (location as
xs:string, context as item())* .
A sample:
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:t="http://www.nesterovsky-bros.com/xslt/public/"
exclude-result-prefixes="xs t">
<xsl:include href="try-block.xslt"/>
<xsl:template match="/"> <result> <xsl:for-each select="1 to 10">
<xsl:call-template name="t:try-block"> <xsl:with-param name="value" tunnel="yes"
select=". - 5"/> <xsl:with-param name="try" as="element()"> <try/>
</xsl:with-param> <xsl:with-param name="catch" as="element()">
<t:error-handler/> </xsl:with-param> </xsl:call-template> </xsl:for-each>
</result> </xsl:template>
<xsl:template mode="t:call" match="try"> <xsl:param
name="value" tunnel="yes" as="xs:decimal"/>
<value> <xsl:sequence select="1 div
$value"/> </value> </xsl:template>
</xsl:stylesheet>
The sample prints values according to the formula "1/(i - 5)", where "i" is a
variable varying from 1 to 10. Clearly, division by zero occurs when "i" is equal
to 5.
Please notice how the try/catch API is accessed through
<xsl:include href="try-block.xslt"/>. The main logic is
executed in
<xsl:template mode="t:call" match="try"/>, which
receives parameters using tunneling. A default error handler
<t:error-handler/> is used to report errors.
Error report:
Error: FOAR0001
Description:
Decimal divide by zero
Location:
1. systemID: "file:///D:/style/try-block-test.xslt", line: 34
2. template mode="t:call"
match="element(try, xs:anyType)"
systemID: "file:///D:/style/try-block-test.xslt", line: 30
context node:
/*[1][local-name() = 'try']
3. template mode="t:call"
match="element({http://www.nesterovsky-bros.com/xslt/private/try-block}try, xs:anyType)"
systemID: "file:///D:/style/try-block.xslt", line: 53
context node:
/*[1][local-name() = 'try']
4. systemID: "file:///D:/style/try-block.xslt", line: 40
5. call-template name="t:try-block"
systemID: "file:///D:/style/try-block-test.xslt", line: 17
6. for-each
systemID: "file:///D:/style/try-block-test.xslt", line: 16
context item: 5
7. template mode="saxon:_defaultMode"
match="document-node()"
systemID: "file:///D:/style/try-block-test.xslt", line: 14
context node:
/
Implementation details:
You weren't expecting this API to be pure xslt, were you?
Well, you're right, there is an extension function. Its pseudo code looks like
this:
function tryBlock(tryItems, catchItems)
{
try
{
execute xsl:apply-templates for tryItems.
}
catch
{
execute xsl:apply-templates for catchItems.
}
}
One last thing. Please get the implementation:
saxon.extensions.zip. There you will find the sources of the try/catch and the
tuples/maps APIs.
Right now we're living in the java world, thus all our tasks are (in)directly
related to this environment.
We want to store stylesheets as resources of a java application, and at
the same time to point to these stylesheets without jar qualification. In .NET this idea would not
appear at all, as there are well defined boundaries between assemblies, but java uses
a rather different approach. Whenever you have a resource name, it's up to a
ClassLoader to find this resource. To exploit this feature we've created
a uri resolver for the stylesheet
transformation. The protocol we use has the following format: "resource:/resource-path".
For example, to store stylesheets in the
META-INF/stylesheets folder we use the uri "resource:/META-INF/stylesheets/java/main.xslt".
A relative path is resolved naturally: the path "../jxom/java-serializer.xslt"
in the previously mentioned stylesheet is resolved to "resource:/META-INF/stylesheets/jxom/java-serializer.xslt".
We've created a small class, ResourceURIResolver. You need to
supply an instance of TransformerFactory with this resolver:
transformerFactory.setURIResolver(new ResourceURIResolver());
The class itself is so small that we quote it here:
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;
/**
* This class implements an interface that can be called by the processor
* to turn a URI used in document(), xsl:import, or xsl:include into a
* Source object.
*/
public class ResourceURIResolver implements URIResolver
{
/**
* Called by the processor when it encounters
* an xsl:include, xsl:import, or document() function.
*
* This resolver supports protocol "resource:".
* Format of uri is: "resource:/resource-path", where "resource-path" is an
* argument of a {@link ClassLoader#getResourceAsStream(String)} call.
* @param href - an href attribute, which may be relative or absolute.
* @param base - a base URI against which the first argument will be made
* absolute if the absolute URI is required.
* @return a Source object, or null if the href cannot be resolved, and
* the processor should try to resolve the URI itself.
*/
public Source resolve(String href, String base)
throws TransformerException
{
if (href == null)
{
return null;
}
URI uri;
try
{
if (base == null)
{
uri = new URI(href);
}
else
{
uri = new URI(base).resolve(href);
}
}
catch(URISyntaxException e)
{
// Unsupported uri.
return null;
}
if (!"resource".equals(uri.getScheme()))
{
return null;
}
String resourceName = uri.getPath();
if ((resourceName == null) || (resourceName.length() == 0))
{
return null;
}
if (resourceName.charAt(0) == '/')
{
resourceName = resourceName.substring(1);
}
ClassLoader classLoader =
Thread.currentThread().getContextClassLoader();
InputStream stream =
classLoader.getResourceAsStream(resourceName);
if (stream == null)
{
return null;
}
return new StreamSource(stream, uri.toString());
}
}
The project we're working on requires us to generate a java web application from some ancient language. The code being converted we have transformed into java classes
(thanks to
jxom),
and the presentation is converted into JSF (facelets) pages.
By the way, long before the java (.net) platform was conceived, there were
languages and environments worked out so well that contemporary client-server
paradigms (like JSF, ASP.NET, and so on) are just their isomorphisms.
The problem we were dealing with recently is JSF databinding for bean properties
of types java.sql.Date, java.sql.Time, java.sql.Timestamp.
At some point of the design we decided that these types are the most natural
representation of data in the original language, as the program's activity is
tightly connected to the database. Later on it became clear that JSF
databinding does not like these types at all. We had to decide either to fall
back and use java.util.Date as the bean property type, or to do something with the
databinding.
It was not clear what the best way was, until we found an elegant solution,
namely: to create an ELResolver to handle bean properties of these types. The solution
works because custom el resolvers are applied before the standard resolvers (except
the implicit one).
The class
DateELResolver is a rather simple extension of the
BeanELResolver. To use it you only need to register it in the faces-config.xml:
<faces-config version="1.2"
xmlns="http://java.sun.com/xml/ns/javaee"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
http://java.sun.com/xml/ns/javaee/web-facesconfig_1_2.xsd">
<application>
<el-resolver>com.nesterovskyBros.jsf.DateELResolver</el-resolver>
</application>
</faces-config>
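For illustration, here is a minimal sketch of what such a resolver might look like (a hypothetical reconstruction, not the actual DateELResolver class: it coerces an incoming java.util.Date to the sql type declared by the bean property, then delegates to BeanELResolver):
import java.util.Date;
import javax.el.BeanELResolver;
import javax.el.ELContext;

// Hypothetical sketch: adapts java.util.Date values produced by JSF
// converters to java.sql.Date/Time/Timestamp bean properties.
public class DateELResolver extends BeanELResolver
{
  @Override
  public void setValue(ELContext context, Object base, Object property, Object value)
  {
    if ((base != null) && (value instanceof Date))
    {
      Class<?> type = getType(context, base, property);
      long time = ((Date)value).getTime();

      if ((type == java.sql.Date.class) && !(value instanceof java.sql.Date))
      {
        value = new java.sql.Date(time);
      }
      else if ((type == java.sql.Time.class) && !(value instanceof java.sql.Time))
      {
        value = new java.sql.Time(time);
      }
      else if ((type == java.sql.Timestamp.class) && !(value instanceof java.sql.Timestamp))
      {
        value = new java.sql.Timestamp(time);
      }
    }

    super.setValue(context, base, property, value);
  }
}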
Does the WebSphere MQ library for .NET support a connection pool? This is the question asked by many .NET developers who deal with IBM WebSphere MQ and write multithreaded applications. The answer, unfortunately, is NO… The .NET version supports only individual connection types.
I have compared the two MQ libraries, the Java one and the one for .NET, and I've found that most of the classes have the same declarations, except one difference that is crucial for me. As opposed to .NET, the Java MQ library provides several classes implementing MQ connection pooling. There is nothing similar in the .NET library.
There are a few common workarounds for this annoying restriction. One of these workarounds (recommended by IBM in their “MQ using .NET”) is to keep one open MQ connection per thread. Unfortunately, such an approach does not work for ASP.NET applications (including web services).
The good news is that starting from service pack 5 for MQ 5.3, and of course for MQ 6.xx, they support sharing MQ connections in blocked mode:
“The implementation of WebSphere MQ .NET ensures that, for a given connection (MQQueueManager object instance), all access to the target WebSphere MQ queue manager is synchronized. The default behavior is that a thread that wants to issue a call to a queue manager is blocked until all other calls in progress for that connection are complete.”
This allows creating an MQ connection (note that the MQQueueManager object is a wrapper for an MQ connection) in one thread and using it exclusively in another thread without the side-effects caused by multithreading.
Taking this feature into account, I've created a simple MQ connection pool. It's easy to use. The main class MQPoolManager has only two static methods:
public static MQQueueManager Get(string QueueManagerName, string ChannelName, string ConnectionName);
and
public static void Release(ref MQQueueManager queueManager);
The method Get returns an MQ queue manager (either an existing one from the pool or a newly created one), and Release returns it to the connection pool. Internally, the logic of MQPoolManager tracks expired connections and does some finalization if needed.
So, you may use one MQ connection pool per application domain without additional effort and big changes in existing applications.
By the way, this approach has allowed us to considerably optimize the performance of the MQ part in one of our projects.
Later on...
To clarify the use of MQPoolManager, I've decided to show the following code snippet:
MQQueueManager queueManager = MQPoolManager.Get(QueueManagerName, ChannelName, ConnectionName);
try
{
// TODO: some work with MQ here
}
finally
{
MQPoolManager.Release(ref queueManager);
}
// at this point the queueManager is null
In the xslt world there is no widely used custom of thinking of stylesheet members
as public and private, in contrast to other programming languages like
C++/java/c# where access modifiers are essential. The reason is the complexity of
stylesheets: the smaller the code, the easier it is for a developer to keep all the details
in memory. Whenever an xslt program grows you should modularize
it to keep it manageable.
At the point where modules are introduced, one starts thinking of the public
interface of a module and its implementation details. This separation is
especially important for template matching, as you probably won't want to
match a private template just because you've forgotten about some template in the
implementation of some module.
To make the public/private member distinction you can introduce two namespaces in
your stylesheet: one public and one private.
For the private namespace you can use a unique name, e.g. the stylesheet name as
part of the uri.
The following example is based on
jxom. This stylesheet builds an expression from an expression tree. The public part
consists only of the t:get-expression function; other members are private:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:t="http://www.nesterovsky-bros.com/public"
xmlns:p="http://www.nesterovsky-bros.com/private/expression.xslt"
xmlns="http://www.nesterovsky-bros.com/download/jxom.zip"
xpath-default-namespace="http://www.nesterovsky-bros.com/download/jxom.zip"
exclude-result-prefixes="xs t p">
<xsl:output method="text" indent="yes"/>
<!--
Entry point. -->
<xsl:template match="/">
<xsl:variable name="expression"
as="element()">
<lt>
<sub>
<mul>
<var name="b"/>
<var name="b"/>
</mul>
<mul>
<mul>
<int>4</int>
<var name="a"/>
</mul>
<var name="c"/>
</mul>
</sub>
<double>0</double>
</lt>
</xsl:variable>
<xsl:value-of
select="t:get-expression($expression)" separator=""/>
</xsl:template>
<!--
Gets
expression.
$element - expression element.
Returns expression tokens.
-->
<xsl:function name="t:get-expression" as="item()*">
<xsl:param name="element"
as="element()"/>
<xsl:apply-templates mode="p:expression" select="$element"/>
</xsl:function>
<!--
Gets binary expression.
$element - assignment expression.
$type - expression type.
Returns expression token sequence.
-->
<xsl:function
name="p:get-binary-expression" as="item()*">
<xsl:param name="element"
as="element()"/>
<xsl:param name="type" as="xs:string"/>
<xsl:sequence
select="t:get-expression($element/*[1])"/>
<xsl:sequence select="' '"/>
<xsl:sequence select="$type"/>
<xsl:sequence select="' '"/>
<xsl:sequence
select="t:get-expression($element/*[2])"/>
</xsl:function>
<!-- Mode
"expression". Empty match. -->
<xsl:template mode="p:expression"
match="@*|node()">
<xsl:sequence select="error(xs:QName('invalid-expression'),
name())"/>
</xsl:template>
<!-- Mode "expression". or. -->
<xsl:template
mode="p:expression" match="or">
<xsl:sequence select="p:get-binary-expression(.,
'||')"/>
</xsl:template>
<!-- Mode "expression". and. -->
<xsl:template
mode="p:expression" match="and">
<xsl:sequence
select="p:get-binary-expression(., '&&')"/>
</xsl:template>
<!-- Mode
"expression". eq. -->
<xsl:template mode="p:expression" match="eq">
<xsl:sequence select="p:get-binary-expression(., '==')"/>
</xsl:template>
<!--
Mode "expression". ne. -->
<xsl:template mode="p:expression" match="ne">
<xsl:sequence select="p:get-binary-expression(., '!=')"/>
</xsl:template>
<!--
Mode "expression". le. -->
<xsl:template mode="p:expression" match="le">
<xsl:sequence select="p:get-binary-expression(., '<=')"/>
</xsl:template>
<!--
Mode "expression". ge. -->
<xsl:template mode="p:expression" match="ge">
<xsl:sequence select="p:get-binary-expression(., '>=')"/>
</xsl:template>
<!--
Mode "expression". lt. -->
<xsl:template mode="p:expression" match="lt">
<xsl:sequence select="p:get-binary-expression(., '<')"/>
</xsl:template>
<!--
Mode "expression". gt. -->
<xsl:template mode="p:expression" match="gt">
<xsl:sequence select="p:get-binary-expression(., '>')"/>
</xsl:template>
<!--
Mode "expression". add. -->
<xsl:template mode="p:expression" match="add">
<xsl:sequence select="p:get-binary-expression(., '+')"/>
</xsl:template>
<!--
Mode "expression". sub. -->
<xsl:template mode="p:expression" match="sub">
<xsl:sequence select="p:get-binary-expression(., '-')"/>
</xsl:template>
<!--
Mode "expression". mul. -->
<xsl:template mode="p:expression" match="mul">
<xsl:sequence select="p:get-binary-expression(., '*')"/>
</xsl:template>
<!--
Mode "expression". div. -->
<xsl:template mode="p:expression" match="div">
<xsl:sequence select="p:get-binary-expression(., '/')"/>
</xsl:template>
<!--
Mode "expression". neg. -->
<xsl:template mode="p:expression" match="neg">
<xsl:sequence select="'-'"/>
<xsl:sequence select="t:get-expression(*[1])"/>
</xsl:template>
<!-- Mode "expression". not. -->
<xsl:template
mode="p:expression" match="not">
<xsl:sequence select="'!'"/>
<xsl:sequence
select="t:get-expression(*[1])"/>
</xsl:template>
<!-- Mode "expression".
parens. -->
<xsl:template mode="p:expression" match="parens">
<xsl:sequence
select="'('"/>
<xsl:sequence select="t:get-expression(*[1])"/>
<xsl:sequence
select="')'"/>
</xsl:template>
<!-- Mode "expression". var. -->
<xsl:template
mode="p:expression" match="var">
<xsl:sequence select="@name"/>
</xsl:template>
<!-- Mode "expression". int, short, byte, long, float, double. -->
<xsl:template
mode="p:expression"
match="int | short | byte | long | float | double">
<xsl:sequence select="."/>
</xsl:template>
</xsl:stylesheet>
Hello again!
For the first part about jxom, see the earlier post.
I'm back with jxom (Java xml object model). I've finally managed to create an xslt that generates java code from a jxom document.
Will you ask why it took as long as a week to produce it?
There are two answers: 1. My poor talents. 2. I've virtually created two implementations.
My first approach was to generate java text directly from xml. I was a true believer that this was the way. I screwed things up on that path: when you start to deal with the indentation, formatting and reformatting of the text you're generating, you see that things are not that simple. Well, it was a naive approach.
I could have finished it, however at some point I realized that its complexity was not composed of the complexity of its parts, but kept increasing. That is not permissible for such a simple task. The approach is bad. Period.
The alternative I've devised is simple and in fact more natural than the naive approach. It is a two stage generation: a) generate a sequence of tokens - the serializer; b) generate and then print a sequence of lines - the streamer.
Tokens (item()*) are either control words (xs:QName), or literals (xs:string).
I've defined the following control tokens:
Token | Description
t:indent | indents following content.
t:unindent | unindents following content.
t:line-indent | resets indentation for one line.
t:new-line | new line token.
t:terminator | separates token sequences.
t:code | marks line as code (default line type).
t:doc | marks line as documentation comment.
t:begin-doc | marks line as begin of documentation comment.
t:end-doc | marks line as end of documentation comment.
t:comment | marks line as comment.
Thus an input for the streamer looks like:
<xsl:sequence select="'public'"/> <xsl:sequence select="' '"/> <xsl:sequence select="'class'"/> <xsl:sequence select="' '"/> <xsl:sequence select="'A'"/> <xsl:sequence select="$t:new-line"/> <xsl:sequence select="'{'"/> <xsl:sequence select="$t:new-line"/> <xsl:sequence select="$t:indent"/> <xsl:sequence select="'public'"/> <xsl:sequence select="' '"/> <xsl:sequence select="'int'"/> <xsl:sequence select="' '"/> <xsl:sequence select="'a'"/> <xsl:sequence select="';'"/> <xsl:sequence select="$t:unindent"/> <xsl:sequence select="$t:new-line"/> <xsl:sequence select="'}'"/> <xsl:sequence select="$t:new-line"/>
The streamer receives a sequence of tokens and transforms it into a sequence of lines.
One beautiful thing about tokens is that the streamer can easily perform line breaks in order to keep the page width; another convenient thing is that the code generating the tokens doesn't have to track the indentation level, as it just uses the t:indent and t:unindent control tokens to increase and decrease the current indentation. A small sketch of this idea follows.
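Here is a small illustrative sketch of the streamer idea in Java (the token names mirror the control tokens listed above; the types and the class itself are our own illustration, not the actual java-streamer.xslt):
import java.util.ArrayList;
import java.util.List;

// Illustrative streamer: turns a flat sequence of tokens (control words
// and literals) into indented lines.
public class Streamer
{
  public enum Control { INDENT, UNINDENT, NEW_LINE }

  public static List<String> toLines(List<Object> tokens)
  {
    List<String> lines = new ArrayList<String>();
    StringBuilder line = new StringBuilder();
    int indentation = 0;

    for(Object token: tokens)
    {
      if (token == Control.INDENT)
      {
        indentation++;
      }
      else if (token == Control.UNINDENT)
      {
        indentation--;
      }
      else if (token == Control.NEW_LINE)
      {
        lines.add(line.toString());
        line.setLength(0);
      }
      else
      {
        if (line.length() == 0)
        {
          // Indentation is applied lazily, when the first literal
          // of a line arrives.
          for(int i = 0; i < indentation; i++)
          {
            line.append("  ");
          }
        }

        line.append(token);
      }
    }

    if (line.length() > 0)
    {
      lines.add(line.toString());
    }

    return lines;
  }
}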
The way the code is built allows mimicking any code style. I've followed my favorite one. In the future I'll probably add options controlling the code style. In my todo list there are still several features I want to implement, such as a line breaker to preserve page width, and a type qualification optimizer (an optional feature) to reduce unnecessary type qualifications.
The current implementation can be found in jxom.zip. It contains:
File | Description
java.xsd | jxom xml schema.
java-serializer-main.xslt | transformation entry point.
java-serializer.xslt | generates tokens for top level constructs.
java-serializer-statements.xslt | generates tokens for statements.
java-serializer-expressions.xslt | generates tokens for expressions.
java-streamer.xslt | converts tokens into lines.
DataAdapter.xml | sample jxom document.
This was my first experience with xslt 2.0. I feel very pleased with what it can do. The only missing feature is an indirect function call (which I do not want to model with a dull template matching approach).
Note that although the xslt I've built is platform independent, I want to point out that I was experimenting with saxon 9. Several times I've relied on its efficient tail call implementation (see t:cumulative-integer-sum), which otherwise would lead to an xslt stack overflow.
I shall be pleased to see your feedback on the subject.
Hello,
I have not been writing for a long time. IMHO: nothing to say? - make no noise!
Nowadays I'm busy with xslt.
Should I be pleased that the w3c committee has finally delivered xpath 2.0/xslt 2.0/xquery? There may well have been people who did not live to see this happen. Be grateful to fate that we have survived!
I'm working now with saxon 9. It's a good implementation, however too interpreter-like in my opinion. I think these languages could be compiled down to machine/vm code the same way as c++/java/c# are.
To the point. I need to generate java code in xslt. I've done this earlier; at that time I dealt with relatively simple templates like beans or interfaces. Now I need to generate beans, interfaces, and classes with logic. In fact I should cover almost all java 6 features.
Immediately I started thinking in terms of a java xml object model (jxom). Thus there will be an xml schema of jxom (am I reinventing the wheel? I pray you to point me to an existing schema!) - the java grammar as xml. There will be xslts which generate code according to this schema, and an xslt that will serialize jxom documents directly into java.
This two stage generation is important as there are essentially two different tasks: generate java code, and serialize it down to a text format. Moreover, whenever I have a jxom document I can manipulate it! And finally, this will allow our team to concentrate its efforts, as one should only generate a jxom document.
Yesterday I found a java ANTLR grammar, and have converted it into an xml schema: java.xsd. It is important to have this xml schema defined, even if no one uses it except in an editor, as it makes jxom generation more formal.
The next step is to create the xslt serializer, which is on the todo list.
To get a feeling for how jxom looks, I've created it manually for a simple java file:
// $Id: DataAdapter.java 1122 2007-12-31 12:43:47Z arthurn $
package com.bphx.coolgen.data;

import java.util.List;

/**
 * Encapsulates encyclopedia database access.
 */
public interface DataAdapter
{
  /**
   * Starts data access session for a specified model.
   * @param modelId - a model to open.
   */
  void open(int modelId) throws Exception;

  /**
   * Ends data access session.
   */
  void close() throws Exception;

  /**
   * Gets current model id.
   * @return current model id.
   */
  int getModelId();

  /**
   * Gets data objects for a specified object type for the current model.
   * @param type - an object type to get data objects for.
   * @return list of data objects.
   */
  List<DataObject> getObjectsForType(short type) throws Exception;

  /**
   * Gets a list of data associations for an object id.
   * @param id - object id.
   * @return list of data associations.
   */
  List<DataAssociation> getAssociations(int id) throws Exception;

  /**
   * Gets a list of data properties for an object id.
   * @param id - object id.
   * @return list of data properties.
   */
  List<DataProperty> getProperties(int id) throws Exception;
}
jxom:
<unit xmlns="http://www.bphx.com/java-1.5/2008-02-07" package="com.bphx.coolgen.data"> <comment>$Id: DataAdapter.java 1122 2007-12-31 12:43:47Z arthurn $</comment> <import package="java.util.List"/> <interface access="public" name="DataAdapter"> <comment doc="true">Encapsulates encyclopedia database access.</comment> <method name="open"> <comment doc="true"> Starts data access session for a specified model. <para type="param" name="modelId">a model to open.</para> </comment> <parameters> <parameter name="modelId"><type name="int"/></parameter> </parameters> <throws><type name="Exception"/></throws> </method> <method name="close"> <comment doc="true">Ends data access session.</comment> <throws><type name="Exception"/></throws> </method> <method name="getModelId"> <comment doc="true"> Gets current model id. <para type="return">current model id.</para> </comment> <returns><type name="int"/></returns> <throws><type name="Exception"/></throws> </method> <method name="getObjectsForType"> <comment doc="true"> Gets data objects for a specified object type for the current model. <para name="param" type="type"> an object type to get data objects for. </para> <para type="return">list of data objects.</para> </comment> <returns> <type> <part name="List"> <typeArgument><type name="DataObject"/></typeArgument> </part> </type> </returns> <parameters> <parameter name="type"><type name="short"/></parameter> </parameters> <throws><type name="Exception"/></throws> </method> <method name="getAssociations"> <comment doc="true"> Gets a list of data associations for an object id. <para type="param" name="id">object id.</para> <para type="return">list of data associations.</para> </comment> <returns> <type> <part name="List"> <typeArgument><type name="DataAssociation"/></typeArgument> </part> </type> </returns> <parameters> <parameter name="id"><type name="int"/></parameter> </parameters> <throws><type name="Exception"/></throws> </method> <method name="getProperties"> <comment doc="true"> Gets a list of data properties for an object id. <para type="param" name="id">object id.</para> <para type="return">list of data properties.</para> </comment> <returns> <!-- Compact form of generic type. --> <type name="List<DataProperty>"/> </returns> <parameters> <parameter name="id"><type name="int"/></parameter> </parameters> <throws><type name="Exception"/></throws> </method> </interface> </unit>
To read about XSLT for jxom, please follow this link.
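Since jxom is plain XML, it can also be inspected with general-purpose XML tooling, not only transformed with XSLT. Below is a minimal C# sketch (purely illustrative, not part of jxom; the file name DataAdapter.jxom.xml and the JxomDump class are my assumptions) that lists the methods declared in the unit above:

using System;
using System.Linq;
using System.Xml.Linq;

class JxomDump
{
  static void Main()
  {
    // The jxom namespace, as declared on the unit element above.
    XNamespace jxom = "http://www.bphx.com/java-1.5/2008-02-07";

    // Hypothetical file holding the jxom document shown above.
    XDocument unit = XDocument.Load("DataAdapter.jxom.xml");

    foreach(XElement method in unit.Descendants(jxom + "method"))
    {
      // Collect parameter names, if any.
      string parameters = string.Join(", ",
        method.Descendants(jxom + "parameter").
          Select(p => (string)p.Attribute("name")).
          ToArray());

      Console.WriteLine("{0}({1})",
        (string)method.Attribute("name"), parameters);
    }
  }
}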
C++ Standard Library Issues List, Issue 254
I have been tracking this issue for several years already, and I have my unpretentious opinion. To make my arguments clear, I'll bring the issue description here.
254. Exception types in clause 19 are constructed from std::string
Section: 19.1 [std.exceptions], 27.4.2.1.1 [ios::failure] Status: Tentatively Ready Submitter: Dave Abrahams Date: 2000-08-01
Discussion:
Many of the standard exception types which implementations are required to throw are constructed with a const std::string& parameter. For example:
19.1.5 Class out_of_range [lib.out.of.range]
namespace std {
class out_of_range : public logic_error {
public:
explicit out_of_range(const string& what_arg);
};
}
The class out_of_range defines the type of objects thrown as exceptions to report an argument value not in its expected range.
out_of_range(const string& what_arg);
Effects:
Constructs an object of class out_of_range.
Postcondition:
strcmp(what(), what_arg.c_str()) == 0.
There are at least two problems with this:
1. A program which is low on memory may end up throwing std::bad_alloc instead of out_of_range because memory runs out while constructing the exception object.
2. An obvious implementation which stores a std::string data member may end up invoking terminate() during exception unwinding because the exception object allocates memory (or rather fails to) as it is being copied.
There may be no cure for (1) other than changing the interface to out_of_range, though one could reasonably argue that (1) is not a defect. Personally I don't care that much if out-of-memory is reported when I only have 20 bytes left, in the case when out_of_range would have been reported. People who use exception-specifications might care a lot, though.
There is a cure for (2), but it isn't completely obvious. I think a note for implementors should be made in the standard. Avoiding possible termination in this case shouldn't be left up to chance. The cure is to use a reference-counted "string" implementation in the exception object. I am not necessarily referring to a std::string here; any simple reference-counting scheme for a NTBS would do.
Further discussion, in email:
...I'm not so concerned about (1). After all, a library implementation can add const char* constructors as an extension, and users don't need to avail themselves of the standard exceptions, though this is a lame position to be forced into. FWIW, std::exception and std::bad_alloc don't require a temporary basic_string.
...I don't think the fixed-size buffer is a solution to the problem, strictly speaking, because you can't satisfy the postcondition strcmp(what(), what_arg.c_str()) == 0 for all values of what_arg (i.e. very long values). That means that the only truly conforming solution requires a dynamic allocation.
Further discussion, from Redmond:
The most important progress we made at the Redmond meeting was realizing that there are two separable issues here: the const string& constructor, and the copy constructor. If a user writes something like throw std::out_of_range("foo"), the const string& constructor is invoked before anything gets thrown. The copy constructor is potentially invoked during stack unwinding.
The copy constructor is a more serious problem, because failure during stack unwinding invokes terminate. The copy constructor must be nothrow. Curaçao: Howard thinks this requirement may already be present.
The fundamental problem is that it's difficult to get the nothrow requirement to work well with the requirement that the exception objects store a string of unbounded size, particularly if you also try to make the const string& constructor nothrow. Options discussed include:
- Limit the size of a string that exception objects are required to throw: change the postconditions of 19.1.2 [domain.error] paragraph 3 and 19.1.6 [runtime.error] paragraph 3 to something like this: "strncmp(what(), what_arg.c_str(), N) == 0, where N is an implementation-defined constant no smaller than 256".
- Allow the const string& constructor to throw, but not the copy constructor. It's the implementor's responsibility to get it right. (An implementor might use a simple refcount class.)
- Compromise between the two: an implementation is not allowed to throw if the string's length is less than some N, but, if it doesn't throw, the string must compare equal to the argument.
- Add a new constructor that takes a const char*
(Not all of these options are mutually exclusive.)
...
To be honest, I do not understand the committee members' decisions. It seems they are trying to hide from the problem, essentially proposing to store a character buffer in the exception object. In fact the problem is more general: it relates to any exception type that stores some data and therefore can throw during copy construction. How do we avoid problems during copy construction? Well, do not perform any activity that can lead to an exception. If copying the data can throw, then do not copy it! Thus we have to share the data between exception objects.
This logic brought me to a safe exception type design: the exception object keeps a refcounted handle to a data object that is shared between instances of the type.
The only question is: why didn't they even consider this approach?
In one of our latest projects (a GUI application on .NET 2.0) we've felt all the power of .NET globalization, but an annoying thing happened too...
In our case the annoying thing was the sharing of the UI culture between the main (UI) thread and all auxiliary threads (threads from the ThreadPool, manually created threads, etc.). It seems we've fallen into a .NET globalization pitfall.
We assumed that, at least for asynchronous delegate calls, the same UI culture as on the main thread is used. This is a common mistake, and what's more annoying, there is not a single line in the MSDN documentation about this issue.
Let's look closer at this issue. Our application starts on a computer with English regional settings ("en-En"), and during application startup we change the UI culture to the one specified in the configuration file:

// set the culture from the config file
try
{
  Thread.CurrentThread.CurrentUICulture =
    new CultureInfo(Settings.Default.CultureName);
}
catch
{
  // use the default UI culture info
}
Thus, all the screens of this GUI application are displayed according to the specified culture. There are also localized strings stored in resource files that are used for log and exception messages, etc., which can be displayed from within different threads (e.g. asynchronous delegate calls).
So, while the application is running, and even though all screens are displayed according to the specified culture, all the exceptions from auxiliary threads are still in English. This happens because the threads for asynchronous calls are pulled out of the ThreadPool, and all these threads were created with the default culture.
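One way to deal with this is to capture the configured culture once and to set CurrentUICulture explicitly at the beginning of every worker callback. Below is a minimal sketch, not taken from the original project; the class CultureAwareWork, the method DoBackgroundWork, and the hard-coded "de-DE" culture are illustrative assumptions:

using System;
using System.Globalization;
using System.Threading;

static class CultureAwareWork
{
  // Culture chosen at startup; in the real application this would come
  // from the configuration file (e.g. Settings.Default.CultureName).
  static readonly CultureInfo uiCulture = new CultureInfo("de-DE");

  public static void Queue()
  {
    ThreadPool.QueueUserWorkItem(DoBackgroundWork);
  }

  static void DoBackgroundWork(object state)
  {
    // ThreadPool threads keep the default culture, so set it here;
    // otherwise resource lookups (log and exception messages) fall back
    // to the default language.
    Thread.CurrentThread.CurrentUICulture = uiCulture;

    // ... actual work that may load localized resources or throw ...
  }
}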
Conclusion
Take care of CurrentUICulture in different threads yourself, and be careful - there are still pitfalls along this way...
Return a table of numbers from 0 up to some value. I face this recurring task once in several years. Such periodicity induces me to invent the solution once again, each time using contemporary features.
November 18:
This time I have succeeded in solving the task in a single select:
declare @count int;
set @count = 1000;
with numbers(value) as
(
  select 0
  union all
  select value * 2 + 1 from numbers where value < @count / 2
  union all
  select value * 2 + 2 from numbers where value < (@count - 1) / 2
)
select row_number() over(order by U.V) value
from numbers
cross apply (select 1 V) U;

Note that each value spawns two new values (value * 2 + 1 and value * 2 + 2), so the recursive CTE grows as a binary tree: the recursion depth is only about log2(@count), which keeps it well below the default limit of 100 recursion levels.
Do you have a better solution?
We're building a .NET 2.0 GUI application. A part of the project is localization. Following the advice of MSDN, we created *.resx files and sent them to a foreign team that performs localization using the WinRes tool.
Several of our user controls contained a SplitContainer control. We never thought this could present a problem. Unfortunately, it does!
When you try to open the resx for such a user control, you get:
Error - Failed to load the resource due to the following error: System.MissingMethodException: Constructor on type 'System.Windows.Forms.SplitterPanel' not found.
We started digging into WinRes.exe (thanks to .NET Reflector) and found a solution: the split container has to be named in such a way that its parent's name appears before (in ascending sort order) the name of the split container itself.
Say you have a form "MyForm" and a split container "ASplitContainer"; then you should rename the split container to, say, "_ASplitContainer". In this case the resources are stored as:
Name                    | Parent Name
MyForm                  |
_ASplitContainer        | MyForm
_ASplitContainer.Panel1 | _ASplitContainer
_ASplitContainer.Panel2 | _ASplitContainer
This makes WinRes happy.
Today we spent some time looking for samples of web services in RPC/encoded style, and we found a great site: http://www.xmethods.com/. It contains a lot of web service samples in both Document/literal and RPC/encoded styles. We think this link will be useful for both developers and testers.
Yesterday we ran into the following problem: how to retrieve the session object from within a Java web service? The crucial point of the problem was that we generate our web service automatically from a Java bean, and this web service runs under WebSphere v5.1.1.
After spending some time searching for an acceptable solution, we found that it's possible either to implement a "session substitution" using an EJB SessionBean, or to somehow retrieve the HttpSession instance.
The first approach has a lot of advantages over the second one, but it requires implementing a bunch of EJB objects (the session bean itself, a home object, etc.). The second approach solves our problem only for a web service over HTTP, and no more, but... it requires only a few lines to be changed in the Java bean code. This second approach is based on implementing the javax.xml.rpc.server.ServiceLifecycle interface in our Java bean. For details take a look at the following article: “Web services programming tips and tricks: Build stateful sessions in JAX-RPC applications”.
Actually, only two additional methods, init() and destroy(), had to be implemented. The init() method retrieves (during initialization) a ServletEndpointContext instance that is stored in a private field of the bean. Later, ServletEndpointContext.getHttpSession() is called in order to get the HttpSession. So easy, so quick - we were just pleased.