Monday, 30 June 2014 - Nesterovsky bros

Monday, 30 June 2014

Dasha

Dasha Nesterovsky

Picture of Dasha

Monday, 30 June 2014 13:01:25 UTC

Comments [0] -
Announce

Saturday, 28 June 2014

Async and scarce resources in C#

This is a small post about refactoring lock statements in async methods.

Before refactoring we had a code like this:

lock(sync) { result = methodToRefactorIntoAsync(); } ... private object sync = new object();

Lock is bound to a thread, thus no way you to use it in async code. As an alternative you may use SemaphoreSlim class:

await sync.WaitAsync(cancellationToken); try { result = await methodAsync(cancellationToken); } finally { sync.Release(); } ... private SemaphoreSlim sync = new SemaphoreSlim(1, 1);

Saturday, 28 June 2014 11:56:32 UTC

Comments [0] -
.NET | Thinking aloud

Friday, 27 June 2014

Client disconnects in ASP.NET Web API

What will you do if you have async Web API method that runs on server for a some time but your client is dropped?

There are two solutions:

Run method to the end and allow to a framework to deal with disconnect;
Try to be notified about client's drop and to break early.

The first approach is simplest but might result in some overconsumption of server resources. The other method requires you to check client status from time to time.

Fortunatelly, ASP.NET provides a HttpResponse.ClientDisconnectedToken property, which is limited to IIS 7.5+ in integrated mode, but still fits our needs. So, you should request ClientDisconnectedToken, if any, and implement your async code using that token.

The following extension function gets that token:

using System.Linq; using System.Net.Http; using System.Threading.Tasks; using System.Threading; using System.Web; public static class HttpApiExtensions { public static CancellationToken GetCancellationToken( this HttpRequestMessage request) { CancellationToken cancellationToken; object value; var key = typeof(HttpApiExtensions).Namespace + ":CancellationToken"; if (request.Properties.TryGetValue(key, out value)) { return (CancellationToken)value; } var httpContext = HttpContext.Current; if (httpContext != null) { var httpResponse = httpContext.Response; if (httpResponse != null) { try { cancellationToken = httpResponse.ClientDisconnectedToken; } catch { // Do not support cancellation. } } } request.Properties[key] = cancellationToken; return cancellationToken; } }

And here is a Web API WordCount service described in the previous post:

using System; using System.Collections.Generic; using System.Linq; using System.Net; using System.Net.Http; using System.Threading; using System.Threading.Tasks; public class ValuesController: ApiController { public async Task<int> GetWordCount([FromUri(Name = "url")] string[] urls) { var cancellationToken = Request.GetCancellationToken(); using(var client = new HttpClient()) { return (await Task.WhenAll( urls.Select(url => WordCountAsync(client, url, cancellationToken)))).Sum(); } } public static async Task<int> WordCountAsync( HttpClient client, string url, CancellationToken cancellationToken) { string content = await (await client.GetAsync(url, cancellationToken)). Content.ReadAsStringAsync(); return WordCount(content); } private static int WordCount(string text) { var count = 0; var space = true; for (var i = 0; i < text.Length; ++i) { if (space != char.IsWhiteSpace(text[i])) { space = !space; if (!space) { ++count; } } } return count; } }

Though is simple there is a nuisance. You should pass cancellation token here and there, which adds to a pollution from async.

Friday, 27 June 2014 14:42:44 UTC

Comments [0] -
.NET | Thinking aloud

Wednesday, 25 June 2014

Parallel vs Async in C#

Though parallel and async algorithms solve different tasks, they converge in some cases. And it's not always immediately clear what's the best.

Consider the following task: get a total word count contained in a given a set of urls.

At first we've solved it as a parallel task: indeed this fits to MapReduce pattern when you get urls' contents to count the number of words in parallel (Map), and then sum word counts per each url to get final result (Reduce). But then we decided that the very same MapReduce algorithm can be implemented with async.

This is a parallel word count:

public static int ParallelWordCount(IEnumerable<string> urls) { var result = 0; Parallel.ForEach( urls, url => { string content; using(var client = new WebClient()) { content = client.DownloadString(url); } var count = WordCount(content); Interlocked.Add(ref result, count); }); return result; }

Here is async word count:

public static async Task<int> WordCountAsync(IEnumerable<string> urls) { return (await Task.WhenAll(urls.Select(url => WordCountAsync(url)))).Sum(); } public static async Task<int> WordCountAsync(string url) { string content; using(var client = new WebClient()) { content = await client.DownloadStringTaskAsync(url); } return WordCount(content); }

And this is an implementation of word count for a text (it's less important for this discussion):

public static int WordCount(string text) { var count = 0; var space = true; for(var i = 0; i < text.Length; ++i) { if (space != char.IsWhiteSpace(text[i])) { space = !space; if (!space) { ++count; } } } return count; }

Our impressions are:

The parallel version is contained in one method, while the async one is implemeneted with two methods.

This is due to the fact that C# compiler fails to generate async labmda function. We attribute this to Microsoft who leads and implements C# spec. Features should be composable. If one can implement a method as a lambda function, and one can implement a method as async then one should be able to implement a method as an async lambda function.
Both parallel and async versions are using thread pool to run their logic.
While both implementations follow MapReduce pattern, we can see that async version is much more scaleable. It's because of parallel threads stay blocked while waiting for an http response. On the other hand async tasks are not bound to any thread and are just not running while waiting for I/O.

This sample helped us to answer the question as to when to use parallel and when async. The simple answer goes like this:

if your logic is only CPU bound then use parallel API;
otherwise use async API (this accounts I/O waits).

Wednesday, 25 June 2014 12:51:28 UTC

Comments [5] -
.NET | Thinking aloud

Monday, 23 June 2014

Async code in C#

Not a long ago C# has introduced special language constructs to simplify asynchronous programming. It seems C++1x will follow async trend. But only recently when frameworks like ASP.NET Web API and Entity Framework started to catch up we've felt how it's to program with async and await keywords.

At first glance it seems it's a pure pleasure to write async methods:

private async Task SumPageSizesAsync() { // To use the HttpClient type in desktop apps, you must include a using directive and add a // reference for the System.Net.Http namespace. HttpClient client = new HttpClient(); // . . . byte[] urlContents = await client.GetByteArrayAsync(url); // . . . }

To dereference a Task<T> into T you just write await task_t_expression, mark your method with async specifier, and adjust output type (if not void) to Task or Task<Result>. Compiler applies its magic to convert your code into an asynchronous state machine.

We liked this feature and immediately have started to use it. But, as we said, async/await has shined in full when frameworks made it a core element, and at that point we have started to see that while async/await solve the task, they does not abstract the developer from implementation details, as a code gets considerably polluted.

Consider a method with pollution marked:

public static async Task<UserAuthorization> GetAuthorizationAsync(string accessToken) { var key = "oauth2:" + accessToken; var authorization = cache.Get<UserAuthorization>(key); if (authorization != null) { return authorization; } using(var model = new ModelContainer()) { authorization = (await model.UserAuthorizations. Where(item => item.AccessToken == accessToken). ToListAsync()). FirstOrDefault(); } if (authorization == null) { authorization = await ValidateAsync(accessToken); } cache.Set(key, cache.ShortDelay, authorization); return authorization; }

The more you use async, the more pollution you will see in your code. Ideally we would like to see the method quoted without any marked parts.

Monday, 23 June 2014 06:15:55 UTC

Comments [0] -
.NET | Thinking aloud

Thursday, 12 June 2014

oauth2 and angularjs

We needed to have oauth2 authorization in angularjs project.

Internet search on the subject immediately brings large amout of solutions like:

oauth2 provider specific javascript libraries;
angular-oauth project;
oauth.io aggregator;

But unfortunatelly:

provider specific libraries have too different set of APIs, which requires another umbrella library to allow for application to accept several providers;
angular-oauth - supports Google only, and does not work in IE 11 with default security settings;
oauth.io looks attractive but adds additional level of indirection server, and is free for a basic plan only.

However there is a problem with all those approaches.

Let's assume that you have properly implemented client side authorization, and finally have gotten an access_token.

Now, you access your server with that access_token. What is your first step on the server?

Right! Your should validate it against oauth2 provider.

So, while client side authorization, among other things, included a validation of your token, you have to perform the validation on the server once more.

At this point we felt that we need to implement our oauth2 API. :-)

It can be found at nesterovsky-bros/oauth2.

This is the readme from that project:

Here we implement oauth2 authorization within angularjs.

Authentication is done as follows:

Open oauth2 provider login/grant screen.
Redirect to the oauth2 callback screen with access token.
Verify of the access token against provider.
Get some basic profile.

A base javascript class OAuth2 implements these steps.

There are following implementations that authorize against specific providers:

OAuth2Server - implements authorization through known providers, but calls server side to validate access token. This way, the server side can establish a user's session.

The file Config.json contains endpoints and request parameters per supported provider.

Note: You should register a client_id for each provider.

Note: user_id and access_tokens are unique only in a scope of access provider, thus a session is identified by Provider + access_token, and a user is identified by Provider + user_id.

The use case can be found in test.js E.g. authorization against OAuth2Server is done like this:

var login = new OAuth2Server(provider);

    

token = login.authorize();

    

token.$promise.then(

   
  function()

   
  {

       
    // token contains populated data.

   
  },

   
  function(error)

   
  {

       
    if (error)

       
    {

           
      // handle an error

       
    }

   
  });

Authorization token contains:

a promise to handle authorization outcome.
cancelToken (a Deferred object) to cancel authorization in progress.

Whole sample is implemented as VS project. All scripts are build with app.tt, that combines content of Scripts/app int app.js.

Server side is implemented with ASP.NET Web API. Authorization controllers are:

Thursday, 12 June 2014 14:59:42 UTC

Comments [0] -
.NET | AngularJS | javascript

Sunday, 25 May 2014

AngularJS

After several years of experience with KendoUI we turned our attention to AngularJS. As many other libraries it has its strong and weak sides. There are many resources describing what AngularJS is, and what it is not. Our approach to study AngularJS was through an attempt to integrate it into an existing KendoUI web application.

It's rather straightforward to convert model from KendoUI into AngularJS, as logically both frameworks are equal in this regard. But tactically KendoUI implements model-view binding very differently than AngularJS does. KendoUI binds model to view immediately per each model field, where AngularJS delays a binding of each model field and performs whole model binding in one go. Angular's approach is more performant, and even more appealing to a developer, though the problem is that the time it takes to make whole model binding is proportional to a size (number of objects and properties) of model. This means that if you have a relatively big model you will experience tangible halts in browser's UI while a javascript updating view/model is running.

AngularJS advices some workaround, which in essence is to avoid big model. The problem is that a couple of thousands or even several hundrends of objects and properties are already considered big model. So, you should immediately plan your model, and view to avoid any potential impact. This seriously distracts from the task your're solving.

The idea that your UI will halt for the time proportional to the size of your whole model looks flawed in our opinion. KendoUI knows no such a problem. That's the reason why our KendoUI to AngularJS transition experience was not smooth.

Our analysis of AngularJS sources shows that the issue could be resolved provided model to view binding (it's called digest in that library) was asynchronous.

To verify our ideas we have created a branch nesterovsky-bros/angular.js where we implemented required refactorings. It includes:

API based on existing deferred/promise to write algorithms in async way, and
refactored digest logic.

At the end we have proposed to integrate our changes into the main branch: Make $digest async.

We're not sure whether our proposition will be integrated (rather no than yes). Nevertheless what we have come with is an interesting extension of deferred object that we neither have seen in AngularJS nor in JQuery, so later we will quote that API from q.js and scheduler.js.

Sunday, 25 May 2014 08:02:41 UTC

Comments [2] -
AngularJS | javascript | kendoui | Thinking aloud

Monday, 28 April 2014

New features in XPath 3.1

Among proposed new features (other than Maps and Arrays) in XPath 3.1 we like Arrow operator (=>).

It's defined like this:

[Definition: An arrow operator is a postfix operator that applies a function to an item, using the item as the first argument to the function.] If $i is an item and f() is a function, then $i=>f() is equivalent to f($i), and $i=>f($j) is equivalent to f($i, $j).

This syntax is particularly helpful when conventional function call syntax is unreadable, e.g. when applying multiple functions to an item. For instance, the following expression is difficult to read due to the nesting of parentheses, and invites syntax errors due to unbalanced parentheses:

tokenize((normalize-unicode(upper-case($string))),"\s+")

Many people consider the following expression easier to read, and it is much easier to see that the parentheses are balanced:

$string=>upper-case()=>normalize-unicode()=>tokenize("\s+")

What it looks like?

Right! It's like extension functions in C#.

Monday, 28 April 2014 06:20:27 UTC

Comments [0] -
xslt

Thursday, 13 March 2014

Polite advice

MSDN topic CREATE TRIGGER (Transact-SQL) contains (at 2014-03-13) a polite advice we like very much. Here it is:

create trigger note

Thursday, 13 March 2014 13:34:10 UTC

Comments [0] -

Monday, 10 March 2014

Custom error handling with Web API

In the article "Error handling in WCF based web applications" we've shown a custom error handler for RESTful service based on WCF. This time we shall do the same for Web API 2.1 service.

Web API 2.1 provides an elegant way to implementat custom error handlers/loggers, see the following article. Web API permits many error loggers followed by a single error handler for all uncaught exceptions. A default error handler knows to output an error both in XML and JSON formats depending on requested MIME type.

In our projects we use unique error reference IDs. This feature allows to an end-user to refer to any error that has happened during the application life time and pass such error ID to the technical support for further investigations. Thus, error details passed to the client-side contain an ErrorID field. An error logger generates ErrorID and passes it over to an error handler for serialization.

Let's look at our error handling implementation for a Web API application.

The first part is an implementation of IExceptionLogger interface. It assigns ErrorID and logs all errors:

/// Defines a global logger for unhandled exceptions. public class GlobalExceptionLogger : ExceptionLogger { /// Writes log record to the database synchronously. public override void Log(ExceptionLoggerContext context) { try { var request = context.Request; var exception = context.Exception; var id = LogError( request.RequestUri.ToString(), context.RequestContext == null ? null : context.RequestContext.Principal.Identity.Name, request.ToString(), exception.Message, exception.StackTrace); // associates retrieved error ID with the current exception exception.Data["NesterovskyBros:id"] = id; } catch { // logger shouldn't throw an exception!!! } } // in the real life this method may store all relevant info into a database. private long LogError( string address, string userid, string request, string message, string stackTrace) { ... } }

The second part is the implementation of IExceptionHandler:

/// Defines a global handler for unhandled exceptions. public class GlobalExceptionHandler : ExceptionHandler { /// This core method should implement custom error handling, if any. /// It determines how an exception will be serialized for client-side processing. public override void Handle(ExceptionHandlerContext context) { var requestContext = context.RequestContext; var config = requestContext.Configuration; context.Result = new ErrorResult( context.Exception, requestContext == null ? false : requestContext.IncludeErrorDetail, config.Services.GetContentNegotiator(), context.Request, config.Formatters); } /// An implementation of IHttpActionResult interface. private class ErrorResult : ExceptionResult { public ErrorResult( Exception exception, bool includeErrorDetail, IContentNegotiator negotiator, HttpRequestMessage request, IEnumerable<MediaTypeFormatter> formatters) : base(exception, includeErrorDetail, negotiator, request, formatters) { } /// Creates an HttpResponseMessage instance asynchronously. /// This method determines how a HttpResponseMessage content will look like. public override Task<HttpResponseMessage> ExecuteAsync(CancellationToken cancellationToken) { var content = new HttpError(Exception, IncludeErrorDetail); // define an additional content field with name "ErrorID" content.Add("ErrorID", Exception.Data["NesterovskyBros:id"] as long?); var result = ContentNegotiator.Negotiate(typeof(HttpError), Request, Formatters); var message = new HttpResponseMessage { RequestMessage = Request, StatusCode = result == null ? HttpStatusCode.NotAcceptable : HttpStatusCode.InternalServerError }; if (result != null) { try { // serializes the HttpError instance either to JSON or to XML // depend on requested by the client MIME type. message.Content = new ObjectContent<HttpError>( content, result.Formatter, result.MediaType); } catch { message.Dispose(); throw; } } return Task.FromResult(message); } } }

Last, but not least part of this solution is registration and configuration of the error logger/handler:

/// WebApi congiguation. public static class WebApiConfig { public static void Register(HttpConfiguration config) { ... // register the exception logger and handler config.Services.Add(typeof(IExceptionLogger), new GlobalExceptionLogger()); config.Services.Replace(typeof(IExceptionHandler), new GlobalExceptionHandler()); // set error detail policy according with value from Web.config var customErrors = (CustomErrorsSection)ConfigurationManager.GetSection("system.web/customErrors"); if (customErrors != null) { switch (customErrors.Mode) { case CustomErrorsMode.RemoteOnly: { config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.LocalOnly; break; } case CustomErrorsMode.On: { config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.Never; break; } case CustomErrorsMode.Off: { config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.Always; break; } default: { config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.Default; break; } } } } }

The client-side error handler remain almost untouched. The implementation details you may find in /Scripts/api/api.js and Scripts/controls/error.js files.

You may download the demo project here. Feel free to use this solution in your .NET projects.

Monday, 10 March 2014 10:56:24 UTC

Comments [0] -
.NET | ASP.NET | javascript | kendoui | Tips and tricks

Friday, 07 March 2014

LINQ extensions

From time to time we run into tasks that we would like to solve in LINQ style but unfortunately it either cannot be done or a solution is not efficient.

Note that by LINQ style we do not mean C# query expressions (we have a strong distaste for that syntax) but extension methods defined in System.Linq.Enumerable and other classes.

Here we quote several extension methods that are good for a general use:

1. Select with predicate. This is shorthand of items.Where(...).Select(...):

/// <summary> /// Projects each element of a sequence into a new form. /// </summary> /// <typeparam name="T">A type of elements of source sequence.</typeparam> /// <typeparam name="R">A type of elements of target sequence.</typeparam> /// <param name="source">A source sequence.</param> /// <param name="where">A predicate to filter elements.</param> /// <param name="selector">A result element selector.</param> /// <returns>A target sequence.</returns> public static IEnumerable<R> Select<T, R>( this IEnumerable<T> source, Func<T, bool> where, Func<T, R> selector) { return source.Where(where).Select(selector); }

2. Select with predicate with source element index passed both into the predicate and into the selector. This one you cannot trivially implement in LINQ:

/// <summary> /// Projects each element of a sequence into a new form. /// </summary> /// <typeparam name="T">A type of elements of source sequence.</typeparam> /// <typeparam name="R">A type of elements of target sequence.</typeparam> /// <param name="source">A source sequence.</param> /// <param name="where">A predicate to filter elements.</param> /// <param name="selector">A result element selector.</param> /// <returns>A target sequence.</returns> public static IEnumerable<R> Select<T, R>( this IEnumerable<T> source, Func<T, int, bool> where, Func<T, int, R> selector) { var index = 0; foreach(var value in source) { if (where(value, index)) { yield return selector(value, index); } ++index; } }

3. A function with output element as projection of a window of input elements. Such function can be used to get finite difference (operation opposite to a cumulative sum).

/// <summary> /// Projects a window of source elements in a source sequence into target sequence. /// Thus /// target[i] = /// selector(source[i], source[i - 1], ... source[i - window + 1]) /// </summary> /// <typeparam name="T">A type of elements of source sequence.</typeparam> /// <typeparam name="R">A type of elements of target sequence.</typeparam> /// <param name="source">A source sequence.</param> /// <param name="window">A size of window.</param> /// <param name="lookbehind"> /// Indicate whether to produce target if the number of source elements /// preceeding the current is less than the window size. /// </param> /// <param name="lookahead"> /// Indicate whether to produce target if the number of source elements /// following current is less than the window size. /// </param> /// <param name="selector"> /// A selector that derives target element. /// On input it receives: /// an array of source elements stored in round-robing fashon; /// an index of the first element; /// a number of elements in the array to count. /// </param> /// <returns>Returns a sequence of target elements.</returns> public static IEnumerable<R> Window<T, R>( this IEnumerable<T> source, int window, bool lookbehind, bool lookahead, Func<T[], int, int, R> selector) { var buffer = new T[window]; var index = 0; var count = 0; foreach(var value in source) { if (count < window) { buffer[count++] = value; if (lookbehind || (count == window)) { yield return selector(buffer, 0, count); } } else { buffer[index] = value; index = index + 1 == window ? 0 : index + 1; yield return selector(buffer, index, count); } } if (lookahead) { while(--count > 0) { index = index + 1 == window ? 0 : index + 1; yield return selector(buffer, index, count); } } }

This way a finite difference looks like this:

var diff = input.Window( 2, false, false, (buffer, index, count) => buffer[index ^ 1] - buffer[index]);

4. A specialization of Window method that returns a enumeration of windows:

/// <summary> /// Projects a window of source elements in a source sequence into a /// sequence of window arrays. /// </summary> /// <typeparam name="T">A type of elements of source sequence.</typeparam> /// <typeparam name="R">A type of elements of target sequence.</typeparam> /// <param name="source">A source sequence.</param> /// <param name="window">A size of window.</param> /// <param name="lookbehind"> /// Indicate whether to produce target if the number of source elements /// preceeding the current is less than the window size. /// </param> /// <param name="lookahead"> /// Indicate whether to produce target if the number of source elements /// following current is less than the window size. /// </param> /// <returns>Returns a sequence of windows.</returns> public static IEnumerable<T[]> Window<T>( this IEnumerable<T> source, int window, bool lookbehind, bool lookahead) { return source.Window( window, lookbehind, lookahead, (buffer, index, count) => { var result = new T[count]; for(var i = 0; i < count; ++i) { result[i] = buffer[index]; index = index + 1 == buffer.Length ? 0 : index + 1; } return result; }); }

Friday, 07 March 2014 14:11:46 UTC

Comments [3] -
.NET | Tips and tricks

Monday, 24 February 2014

Bug either in C# spec or in Microsoft's implementation

While writing an article Dealing with dynamic SQL in SQL Server we have crushed on a trivial C# code.

Consider a declaration:

public enum Direction { Asc, Desc } public struct Order { public string Field { get; set; } public Direction Direction { get; set; } } public class DateRange { public DateTime? From { get; set; } public DateTime? To { get; set; } } public class Request { public DateRange CreatedAt { get; set; } public string Summary { get; set; } [XmlElement] public int[] State { get; set; } public DateRange UpdatedAt { get; set; } [XmlElement] public Order[] Order { get; set; } }

and a code:

... var request = new Request { CreatedAt = { From = new DateTime(2014, 1, 1) }, Order = new[] { new Order { Field = "CreatedAt" } } }; ...

It was hard to believe that this simple variable declaration throws NullReferenceException.

In fact we could not realize what the reason is until we have decompiled that code and poked into IL.

It appears that the problem is with the line:

CreatedAt = { From = new DateTime(2014, 1, 1) },

It's implemented like this:

var range = request.CreatedAt; range.From = new DateTime(2014, 1, 1) ; range.To = null;

In other words it assumes the instance stored in property CreatedAt is not null, and it reinitializes all its properties.

In contrast the following works as expected (no exception is thrown):

CreatedAt = new { From = new DateTime(2014, 1, 1) },

as it creates a new instance and assigns it to a property.

We think that this is very error prone, so if it's how C# is assumed to be then it's the problem of the spec, otherwise it's the problem of Microsoft implementation of C# (at least in VS 2010 - 2013).

Monday, 24 February 2014 08:48:35 UTC

Comments [4] -

Tuesday, 11 February 2014

Dealing with dynamic SQL in SQL Server

These are initial positions for this writing:

SQL Server allows to execute dynamic SQL.
Dynamic SQL is useful and often unavoidable, e.g. when you have to filter or order data in a way that you cannot code efficiently in advance.
Dynamic SQL has proven to be a dangerous area, as with improper use it can open hole in a security.

In general nothing stops you from building and then excuting of SQL string. Our goal, however, is to define rules that make work with dynamic SQL is more managable and verifiable.

Here we outline these rules, and then give some examples and tips.

Rule #1. Isolate dynamic SQL

Put all logic related to building of dynamic SQL into a separate function.
We usually define a separate scheme Dynamic, and define functions like Dynamic.GetSQL_XXX(params).
This makes it simple to perform code review.

Rule #2. Xml as parameters

Use xml type to pass parameters to a function that builds dynamic SQL.
In many cases dynamic SQL depends on variable number of parameters (like a list of values to check against).
Xml fits here to represent structured information.
On a client (e.g. in C# or java) you can define a class with all parameters, populate an instance and serialize it to an xml.

Rule #3. XQuery as template language

Use XQuery to define SQL template and to generate SQL tree from the input parameters.
Here is an example of such XQuery:

@data.query(' <sql> select T.* from Data.Ticket T where { for $ticketID in data/ticketID return <sql>(T.TicketID = <int>{$ticketID}</int>) and </sql> } (1 = 1) </sql>')

You can see that output is an xml with sql element to represent literal SQL, and int element to represent integer literal.

In fact whole output schema can be defined like this:

<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="sql"/> <xs:element name="name"/> <xs:element name="string" nillable="true"/> <xs:element name="int" nillable="true"/> <xs:element name="decimal" nillable="true"/> <xs:element name="date" nillable="true"/> <xs:element name="time" nillable="true"/> <xs:element name="datetime" nillable="true"/> </xs:schema>

where sql is to represent literal content, name to represent a name, and other elements to represent different literal values.

Rule #4. Escape literals

Use function Dynamic.ToSQL(@template) to build final SQL text.
Here we quote the definition:

-- Builds a text of SQL function for an sql template. create function Dynamic.ToSQL ( -- SQL template. @template xml ) returns nvarchar(max) with returns null on null input as begin return ( select case when N.Node.exist('*[xs:boolean(@xsi:nil)]') = 1 then 'null' when N.Node.exist('self::int') = 1 then isnull(N.Node.value('xs:int(.)', 'nvarchar(max)'), '# int #') when N.Node.exist('self::string') = 1 then 'N''' + replace ( N.Node.value('.', 'nvarchar(max)'), '''', '''''' ) + '''' when N.Node.exist('self::name') = 1 then isnull ( quotename(N.Node.value('.', 'nvarchar(128)'), '['), '# name #' ) when N.Node.exist('self::datetime') = 1 then isnull ( 'convert(datetime2, ''' + N.Node.value('xs:dateTime(.)', 'nvarchar(128)') + ''', 126)', '# datetime #' ) when N.Node.exist('self::date') = 1 then isnull ( 'convert(date, ''' + N.Node.value('xs:date(.)', 'nvarchar(128)') + ''', 126)', '# date #' ) when N.Node.exist('self::time') = 1 then isnull ( 'convert(time, ''' + N.Node.value('xs:time(.)', 'nvarchar(128)') + ''', 114)', '# time #' ) when N.Node.exist('self::decimal') = 1 then isnull ( N.Node.value('xs:decimal(.)', 'nvarchar(128)'), '# decimal #' ) when N.Node.exist('self::*') = 1 then '# invalid template #' else N.Node.value('.', 'nvarchar(max)') end from @template.nodes('//sql/node()[not(self::sql)]') N(Node) for xml path(''), type ).value('.', 'nvarchar(max)'); end;

Now, we want to stress that this function plays an important role in prevention of the SQL injection, as it escapes literals from the SQL tree.

Rule #5 (optional). Collect data

Use SQL to collect additional data required to build dynamic SQL. Here is an example of how we get a Ticket by StatusID, while on input we receive a StatusName:

create function Dynamic.GetSQL_GetTicketByStatus(@data xml) returns nvarchar(max) as begin set @data = ( select @data, ( select T.StatusID from @data.nodes('/data/status') N(Node) inner join Metadata.Status T on T.StatusName = Node.value('.', 'nvarchar(128)') for xml auto, type, elements ) for xml path('') ); return Dynamic.ToSQL ( @data.query (' <sql> select T.* from Data.Ticket T where T.Status in ({ for $status in /T/StatusID return <sql><int>{$status}</int>,</sql> } null) </sql> ') ); end;

Notice code in red that collects some more data before calling XQuery.

Rule #6. Execute

The final step is to call dynamic SQL.
This is done like this:

-- build declare @sql nvarchar(max) = Dynamic.GetSQL_GetTicket(@data); -- execute execute sp_executesql @sql -- {, N'@parameter_name data_type [ OUT | OUTPUT ][ ,...n ]' } -- { , [ @param1 = ] 'value1' [ ,...n ] } with result sets ( ( TicketID int not null, CreatedAt datetime2 not null, Summary nvarchar(256) null, Status int, Severity int, DeadLineAt datetime2 null ) );

Notice that the use of dynamic SQL does not prevent static parameters.
Notice also that with result sets clause is used to specify output.

Example. Tickets system

Let's assume you're dealing with a tickets system (like Bugzilla), and you have a table Data.Ticket to describe tickets. Assume that DDL for this table is like this:

create table Data.Ticket ( TicketID bigint not null primary key, CreatedAt datetime2 not null, Summary nvarchar(128) null, Status int not null, UpdatedAt datetime2(7) not null )

Suppose you have to build C# code to search different tickets, where Entity Framework is used to access the database.
Search should be done by a range of CreatedAt, a range of UpdatedAt, Summary, or by different Status values. It should be possible to order results in different ways.

We start out solution from the C# and define classes for a request:

public enum Direction { Asc, Desc } public struct Order { public string Field { get; set; } public Direction Direction {get; set; } } public class DateRange { public DateTime? From { get; set; } // This property is to omit From element if value is null. // See rules for xml serialization. public bool FromSpecified { get { return From != null; } } public DateTime? To { get; set; } public bool ToSpecified { get { return To != null; } } } public class TicketsRequest { public DateRange CreatedAt { get; set; } public string Summary { get; set; } public DateRange UpdatedAt { get; set; } [XmlElement] public Order[] Order { get; set; } [XmlElement] public int[] Status { get; set; } }

Notice that we're going to use XmlSerializer to convert request to xml and then to pass parameter into EF's model. Here is utility method to perform such conversion:

public static string ToXmlString<T>(T value) { if (value == null) { return null; } var serializer = new XmlSerializer(typeof(T)); var builder = new StringBuilder(); var writer = XmlWriter.Create( builder, new XmlWriterSettings { OmitXmlDeclaration = true, Indent = false }); serializer.Serialize(writer, value); writer.Flush(); return builder.ToString(); }

Now we proceed to the database and define a procedure that runs the search:

-- Gets tickets. create procedure Data.GetTickets ( -- A query parameters. @params xml ) as begin set nocount on; -- This is for EF to guess type of result. if (1 = 0) begin select TicketID, CreatedAt, Summary, Status, UpdatedAt from Data.Ticket; end; declare @sql nvarchar(max) = Dynamic.GetSQL_GetTickets(@params); execute sp_executesql @sql with result sets ( ( TicketID int not null, CreatedAt datetime2 not null, Summary nvarchar(256) null, Status int, UpdatedAt datetime2 null ) ); end;

Switch back to C#, import the Data.GetTickets into the EF model, and create a search method:

public IEnumerable<Ticket> GetTickets(TicketsRequest request) { var model = new Model(); return model.GetTickets(ToXmlString(request)); }

The last ingredient is Dynamic.GetSQL_GetTickets() function.

create function Dynamic.GetSQL_GetTickets(@data xml) returns nvarchar(max) as begin return Dynamic.ToSQL ( @data.query(' <sql> select T.TicketID, T.CreatedAt, T.Summary, T.Status, T.UpdatedAt from Data.Ticket T where { for $range in */CreatedAt return ( for $date in $range/From return <sql> (T.CreatedAt >= <datetime>{$date}</datetime>) and </sql>, for $date in $range/To return <sql> (<datetime>{$date}</datetime> > T.CreatedAt) and </sql> ), for $range in */UpdatedAt return ( for $date in $range/From return <sql> (T.UpdatedAt >= <datetime>{$date}</datetime>) and </sql>, for $date in $range/To return <sql> (<datetime>{$date}</datetime> > T.UpdatedAt) and </sql> ), for $summary in */Summary return <sql> (T.Summary like <string>{$summary}</string>) and </sql>, if (*/Status) then <sql> T.Status in ({ for $status in */Status return <sql><int>{$status}</int>, </sql> } null) and </sql> else () } (1 = 1) order by { for $order in */Order [ Field = ("TicketID", "CreatedAt", "Summary", "UpdatedAt", "Status") ] return <sql> <name>{$order/Field}</name> {" desc"[$order[Direction = "Desc"]]}, </sql> } (select null) </sql> ') ); end;

SQL text from Dynamic.GetSQL_GetTickets()

Consider now SQL text produced by this function. For an input:

<TicketsRequest> <CreatedAt> <From>2014-01-01T00:00:00</From> </CreatedAt> <Summary>hello%</Summary> <Order> <Field>Status</Field> <Direction>Desc</Direction> </Order> <Status>1</Status> <Status>3</Status> </TicketsRequest>

the output is:

select T.TicketID, T.CreatedAt, T.Summary, T.Status, T.UpdatedAt from Data.Ticket T where (T.CreatedAt >= convert(datetime2, '2014-01-01T00:00:00', 126)) and (T.Summary like N'hello%') and T.Status in (1, 3, null) and (1 = 1) order by [Status] desc, (select null)

Though the text is not formatted as we would like, it's perfectly valid SQL.

Tips for building XQuery templates

What is called XQuery in SQL Server is in fact a very limited subset of XQuery 1.0. Microsoft clearly states this fact. What is trivial in XQuery is often impossible or ugly in XQuery of SQL Server.

Nevertheless XQuery in SQL Server works rather well as SQL template language. To make it most efficient, however, you should learn several tips.

Tip #1. Where clause

In template you might want to build a where clause:

<sql> select ... where { if (...) then <sql>...</sql> else () } </sql>

and it might happen that for a certain input a condition under where might collapse, and you will be left with where keyword without a real condition, which is wrong. A simple work around is to always add some true condition under ther where like this:

<sql> select ... where { if (...) then <sql>... and </sql> else () } (1 = 1) </sql>

Tip #2. "in" expression

If you want to generate "in" expression like this:

value in (item1, item2,...)

then you might find that it's much easier generate equivalent a code like this:

value in (item1, item2,..., null).

Here is a XQuery to generate such template:

value in ({ for $item in ... return <sql><int>{$item}</int>, </sql> } null) and

Tip #3. Order by

You can conclude an order by clause built from a data with a dummy expression like this:

order by { for $item in ... return <sql> <name>{$item/Field}</name> {" desc"[$item/Direction = "Desc"]}, </sql> } (select null)

Alternatively you can use first column from a clustered index.

Tip #4. Group by

In a group by clause we cannot introduce terminator expression as it was with order by, so a code is a less trivial:

{ let $items := ... return if ($items) then <sql> group by <name>{$items[1]}</name> { for $item in $items[position() > 1] return <sql>, <name>{$item}</name></sql> } </sql> else () }

In fact similar logic may work with order by.

Tip #5. Escape literals

It's crusial not to introduce SQL injection while building SQL. Thus use:

<int>{...}</int> - for literal int;
<decimal>{...}</decimal> - for literal decimal;
<string>{...}</string> - for literal string;
<datetime>{...}</datetime> - for literal datetime2;
<date>{...}</date> - for literal date;
<time>{...}</time> - for literal time;
<name>{...}</name> - for a name to quote.

Note that you can use xsi:nil, so <int xsi:nil="true"/> means null.

If you generate a field name from an input data then it worth to validate it against a list of available names.

Tip #6. Validate input.

It worth to define xml schema for an input xml, and to validate parameters against it.
This makes code more secure, and also adds a documentation.

Tip #7. Don't abuse dynamic SQL

There are not too many cases when you need a dynamic SQL. Usually SQL engine knows how to build a good execution plan. If your query contains optional conditions then you can write it a way that SQL Server can optimize, e.g.:

select * from T where ((@name is null) or (Name = @name)) and ((@date is null) or (Date = @date)) option(recompile)

Tuesday, 11 February 2014 09:48:07 UTC

Comments [0] -
SQL Server puzzle | Thinking aloud | Tips and tricks

Wednesday, 22 January 2014

Shared state

Consider how would you implement Style object in the HTML DOM?

These are some characteristics of that object:

It has a long list of properties, e.g. in IE 11 there are more than 300 properties over a style object.
Any specific instance usually have only several properties assigned.
Reads of properties are much more frequent than writes. In fact style often stays unchanged after initialization.
DOM contains many style instances (often thousands).
The number of distinct instances in terms of values of properties is moderate (usually dozens).

Here is how would we approached to such an object.

1. Styles are sparse objects, thus there is no point to implement plain class with all those properties, as it's wasteful.

We would rather use two techniques to keep style's state:

A dictionary of properties with their values;
An aggregation of objects, where all properies are grouped into families, each group is defined by a separate type, and a style's state is an aggregation of that groups.

A current style of an element is an aggregation of styles of ancestor element. It can either by dynamic or be fused into a single style instance.

2. Make style's state immutable, and share all these states among all style instances.

In this implementation property write turns into a state transition operation: state = set(state, property, value). Thus no state is modified but replaced with other state that corresponds to a required change.

If state is seen as a dictionary then API may look like this :

public class State<K, V> { // Gets shared dictionary for an input dictionary. public IDictionary<K, V> Get(IDictionary<K, V> dictionary); // Gets a shared dictionary for an input dictionary with key set to a value. public IDictionary<K, V> Set(IDictionary<K, V> dictionary, K key, V value); // Gets a shared dictionary for an input dictionary. public IDictionary<K, V> Remove(IDictionary<K, V> dictionary, K key); // Gets typed value. public T Get<T>(IDictionary<K, V> dictionary, K key) where T: V { V value; if ((dictionary == null) || !dictionary.TryGetValue(key, out value)) { return default(T); } return (T)value; } // Sets or removes a typed value. // dictionary can be null. // null returned if output dictionary would be empty. public IDictionary<K, V> Set<T>(IDictionary<K, V> dictionary, K key, T value) where T : V { return value == null ? Remove(dictionary, key) : Set(dictionary, key, (V)value); } }

States can be cached. Provided the cache keeps states in a weak way, no unsued state will be stored for a long time. We may use weak table of dictionary to dictionary WeakTable<Dictionary<K, V>, Dictionary<K, V>> as a storage for such a cache. All required API is described in the WeakTable and Hash Code of Dictionary posts.

3. Style can be implemented as a structure with shared state as a storage. Here is a scetch:

[Serializable] public struct Style { // All properties. public enum Property { Background, BorderColor, BorderStyle, Color, FontFamily, FontSize, // ... } public int? Background { get { return states.Get<int?>(state, Property.Background); } set { state = states.Set(state, Property.Background, value); } } public int? BorderColor { get { return states.Get<int?>(state, Property.BorderColor); } set { state = states.Set(state, Property.BorderColor, value); } } public string BorderStyle { get { return states.Get<string>(state, Property.BorderStyle); } set { state = states.Set(state, Property.BorderStyle, value); } } public int? Color { get { return states.Get<int?>(state, Property.Color); } set { state = states.Set(state, Property.Color, value); } } public string FontFamily { get { return states.Get<string>(state, Property.FontFamily); } set { state = states.Set(state, Property.FontFamily, value); } } public double? FontSize { get { return states.Get<double?>(state, Property.FontSize); } set { state = states.Set(state, Property.FontSize, value); } } // ... [OnDeserialized] private void OnDeserialized(StreamingContext context) { state = states.Get(state); } // A state. private IDictionary<Property, object> state; // A states cache. private static readonly State<Property, object> states = new State<Property, object>(); }

Note that:

default state is a null dictionary;
states are application wide shared.

The following link is our implementation of State<K, V> class: State.cs.

Here we have outlined the idea of shared state object, and how it can be applied to sparse mostly immutable objects. We used HTML style as an example of such an object. Shared state object may work in many other areas, but for it to shine its use case should fit to the task.

Wednesday, 22 January 2014 19:43:25 UTC

Comments [0] -
.NET | Thinking aloud | Tips and tricks

Monday, 13 January 2014

Hash Code of Dictionary

Dealing recently with some task (the same that inspired us to implement WeakTable), we were in a position to use a dictionary as a key in another dictionary.

What are the rules for the class to be used as key:

key should be immutable;
key should implement a GetHashCode() method;
key should implement a Equals() method.

The first requirement is usually implemented as a documentation contract like this:

As long as an object is used as a key in the Dictionary<TKey, TValue>, it must not change in any way that affects its hash value.

The third requirement about equals is trivially implemented as a method:

public bool Equals(IDictionary<K, V> x, IDictionary<K, V> y) { if (x == y) { return true; } if ((x == null) || (y == null) || (x.Count != y.Count)) { return false; } foreach(var entry in x) { V value; if (!y.TryGetValue(entry.Key, out value) || !valueComparer.Equals(entry.Value, value)) { return false; } } return true; }

But how would you implement hash code?

We argued like this.

1. Let's consider the dictionary as a sparse array of values with only populated items that correspond to key hash codes.

2. Hash code is constructed using some fair algorithm. E.g like that used in java to calculate string's hash code:

n-1 h(s) = SUM (s[i]*p^(n-1-i)) mod m, where m = 2^31 i=0

In our case:

n can be arbitrary large int value, so in fact it's 2^32;
items are enumerated in unknown order;
there is only limited set of items, so most s[i] are zeros.

As result we cannot use recurrent function to calculate a power p^k mod m. Fortunately one can build fast exponentiation arguing like this:

32/s - 1 p^k = p^ SUM 2^((s*i)*k[i]) mod m, where s some int: 1, 2, 4, 8, 16, or 32. i=0

Thus

32/s - 1 p^k = PRODUCT (p^(2^(s*i)))^k[i] mod m i=0

If s = 1 then k[i] is either 1 or 0 (a bit), and there is 32 different p^(2^i) mod m values, which can be precalculated.

On the other hand, if we select s = 8 we can write the formula as:

p^k = p^k[0] * (p^(2^8))^k[1] * (p^(2^16))^k[2] * (p^(2^24))^k[3] mod m

where k[i] is a 8 bit value (byte).

Precalculating all values p^n, (p^(2^8))^n, (p^(2^16))^n, (p^(2^24))^n for n in 0 to 255 we reach the formula with 4 multiplications and with 1024 precalculated values.

Here is the whole utility to calculate hash factors:

/// <summary> /// Hash utilities. /// </summary> public class Hash { /// <summary> /// Returns a P^value mod 2^31, where P is hash base. /// </summary> /// <param name="value">A value to get hash factor for.</param> /// <returns>A hash factor value.</returns> public static int GetHashFactor(int value) { return factors[(uint)value & 0xff] * factors[(((uint)value >> 8) & 0xff) | 0x100] * factors[(((uint)value >> 16) & 0xff) | 0x200] * factors[(((uint)value >> 24) & 0xff) | 0x300]; } /// <summary> /// Initializes hash factors. /// </summary> static Hash() { var values = new int[4 * 256]; var value = P; var current = 1; var i = 0; do { values[i++] = current; current *= value; } while(i < 256); value = current; current = 1; do { values[i++] = current; current *= value; } while(i < 512); value = current; current = 1; do { values[i++] = current; current *= value; } while(i < 768); value = current; current = 1; do { values[i++] = current; current *= value; } while(i < 1024); factors = values; } /// <summary> /// A base to calculate hash factors. /// </summary> public const int P = 1103515245; /// <summary> /// Hash factors. /// </summary> private static readonly int[] factors; }

With this API hash code for a dictionary is a trivial operation:

public int GetHashCode(IDictionary<K, V> dictionary) { if (dictionary == null) { return 0; } var result = 0; foreach(var entry in dictionary) { if ((entry.Key == null) || (entry.Value == null)) { continue; } result += Hash.GetHashFactor(keyComparer.GetHashCode(entry.Key)) * valueComparer.GetHashCode(entry.Value); } return result; }

And finally, here is a reference to a class DictionaryEqualityComparer<K, V>: IEqualityComparer<IDictionary<K, V>> that allows a dictionary to be a key in another dictionary.

Update

We have commited some tests, and have found that with suffiently "good" implementation of GetHashCode() of key or value we achieve results almost of the same quality, as the results of the algorithm we have outlined above with much simpler and straightforward algorithm like this:

public int GetHashCode(IDictionary<K, V> dictionary) { if (dictionary == null) { return 0; } var result = 0; foreach(var entry in dictionary) { if ((entry.Key == null) || (entry.Value == null)) { continue; } var k = entry.Key.GetHashCode(); var v = entry.Value.GetHashCode(); k = (k << 5) + k; v = (v << (k >> 3)) + v; result += k ^ v; //result += Hash.GetHashFactor(keyComparer.GetHashCode(entry.Key)) * // valueComparer.GetHashCode(entry.Value); } return result; }

It was worth to blog about this just to find out that we have outwitted ourselves, and finally to reach to a trivial hash code implementation for the dictionary.

Monday, 13 January 2014 20:33:31 UTC

Comments [0] -
.NET | Thinking aloud | Tips and tricks

Navigation