Monday, 10 March 2014 - Nesterovsky bros

Monday, 10 March 2014

In the article "Error handling in WCF based web applications" we've shown a custom error handler for RESTful service based on WCF. This time we shall do the same for Web API 2.1 service.

Web API 2.1 provides an elegant way to implementat custom error handlers/loggers, see the following article. Web API permits many error loggers followed by a single error handler for all uncaught exceptions. A default error handler knows to output an error both in XML and JSON formats depending on requested MIME type.

In our projects we use unique error reference IDs. This feature allows to an end-user to refer to any error that has happened during the application life time and pass such error ID to the technical support for further investigations. Thus, error details passed to the client-side contain an ErrorID field. An error logger generates ErrorID and passes it over to an error handler for serialization.

Let's look at our error handling implementation for a Web API application.

The first part is an implementation of IExceptionLogger interface. It assigns ErrorID and logs all errors:

/// Defines a global logger for unhandled exceptions. public class GlobalExceptionLogger : ExceptionLogger { /// Writes log record to the database synchronously. public override void Log(ExceptionLoggerContext context) { try { var request = context.Request; var exception = context.Exception; var id = LogError( request.RequestUri.ToString(), context.RequestContext == null ? null : context.RequestContext.Principal.Identity.Name, request.ToString(), exception.Message, exception.StackTrace); // associates retrieved error ID with the current exception exception.Data["NesterovskyBros:id"] = id; } catch { // logger shouldn't throw an exception!!! } } // in the real life this method may store all relevant info into a database. private long LogError( string address, string userid, string request, string message, string stackTrace) { ... } }

The second part is the implementation of IExceptionHandler:

/// Defines a global handler for unhandled exceptions. public class GlobalExceptionHandler : ExceptionHandler { /// This core method should implement custom error handling, if any. /// It determines how an exception will be serialized for client-side processing. public override void Handle(ExceptionHandlerContext context) { var requestContext = context.RequestContext; var config = requestContext.Configuration; context.Result = new ErrorResult( context.Exception, requestContext == null ? false : requestContext.IncludeErrorDetail, config.Services.GetContentNegotiator(), context.Request, config.Formatters); } /// An implementation of IHttpActionResult interface. private class ErrorResult : ExceptionResult { public ErrorResult( Exception exception, bool includeErrorDetail, IContentNegotiator negotiator, HttpRequestMessage request, IEnumerable<MediaTypeFormatter> formatters) : base(exception, includeErrorDetail, negotiator, request, formatters) { } /// Creates an HttpResponseMessage instance asynchronously. /// This method determines how a HttpResponseMessage content will look like. public override Task<HttpResponseMessage> ExecuteAsync(CancellationToken cancellationToken) { var content = new HttpError(Exception, IncludeErrorDetail); // define an additional content field with name "ErrorID" content.Add("ErrorID", Exception.Data["NesterovskyBros:id"] as long?); var result = ContentNegotiator.Negotiate(typeof(HttpError), Request, Formatters); var message = new HttpResponseMessage { RequestMessage = Request, StatusCode = result == null ? HttpStatusCode.NotAcceptable : HttpStatusCode.InternalServerError }; if (result != null) { try { // serializes the HttpError instance either to JSON or to XML // depend on requested by the client MIME type. message.Content = new ObjectContent<HttpError>( content, result.Formatter, result.MediaType); } catch { message.Dispose(); throw; } } return Task.FromResult(message); } } }

Last, but not least part of this solution is registration and configuration of the error logger/handler:

/// WebApi congiguation. public static class WebApiConfig { public static void Register(HttpConfiguration config) { ... // register the exception logger and handler config.Services.Add(typeof(IExceptionLogger), new GlobalExceptionLogger()); config.Services.Replace(typeof(IExceptionHandler), new GlobalExceptionHandler()); // set error detail policy according with value from Web.config var customErrors = (CustomErrorsSection)ConfigurationManager.GetSection("system.web/customErrors"); if (customErrors != null) { switch (customErrors.Mode) { case CustomErrorsMode.RemoteOnly: { config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.LocalOnly; break; } case CustomErrorsMode.On: { config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.Never; break; } case CustomErrorsMode.Off: { config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.Always; break; } default: { config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.Default; break; } } } } }

The client-side error handler remain almost untouched. The implementation details you may find in /Scripts/api/api.js and Scripts/controls/error.js files.

You may download the demo project here. Feel free to use this solution in your .NET projects.

Monday, 10 March 2014 10:56:24 UTC

Comments [0] -
.NET | ASP.NET | javascript | kendoui | Tips and tricks

Friday, 07 March 2014

LINQ extensions

From time to time we run into tasks that we would like to solve in LINQ style but unfortunately it either cannot be done or a solution is not efficient.

Note that by LINQ style we do not mean C# query expressions (we have a strong distaste for that syntax) but extension methods defined in System.Linq.Enumerable and other classes.

Here we quote several extension methods that are good for a general use:

1. Select with predicate. This is shorthand of items.Where(...).Select(...):

/// <summary> /// Projects each element of a sequence into a new form. /// </summary> /// <typeparam name="T">A type of elements of source sequence.</typeparam> /// <typeparam name="R">A type of elements of target sequence.</typeparam> /// <param name="source">A source sequence.</param> /// <param name="where">A predicate to filter elements.</param> /// <param name="selector">A result element selector.</param> /// <returns>A target sequence.</returns> public static IEnumerable<R> Select<T, R>( this IEnumerable<T> source, Func<T, bool> where, Func<T, R> selector) { return source.Where(where).Select(selector); }

2. Select with predicate with source element index passed both into the predicate and into the selector. This one you cannot trivially implement in LINQ:

/// <summary> /// Projects each element of a sequence into a new form. /// </summary> /// <typeparam name="T">A type of elements of source sequence.</typeparam> /// <typeparam name="R">A type of elements of target sequence.</typeparam> /// <param name="source">A source sequence.</param> /// <param name="where">A predicate to filter elements.</param> /// <param name="selector">A result element selector.</param> /// <returns>A target sequence.</returns> public static IEnumerable<R> Select<T, R>( this IEnumerable<T> source, Func<T, int, bool> where, Func<T, int, R> selector) { var index = 0; foreach(var value in source) { if (where(value, index)) { yield return selector(value, index); } ++index; } }

3. A function with output element as projection of a window of input elements. Such function can be used to get finite difference (operation opposite to a cumulative sum).

/// <summary> /// Projects a window of source elements in a source sequence into target sequence. /// Thus /// target[i] = /// selector(source[i], source[i - 1], ... source[i - window + 1]) /// </summary> /// <typeparam name="T">A type of elements of source sequence.</typeparam> /// <typeparam name="R">A type of elements of target sequence.</typeparam> /// <param name="source">A source sequence.</param> /// <param name="window">A size of window.</param> /// <param name="lookbehind"> /// Indicate whether to produce target if the number of source elements /// preceeding the current is less than the window size. /// </param> /// <param name="lookahead"> /// Indicate whether to produce target if the number of source elements /// following current is less than the window size. /// </param> /// <param name="selector"> /// A selector that derives target element. /// On input it receives: /// an array of source elements stored in round-robing fashon; /// an index of the first element; /// a number of elements in the array to count. /// </param> /// <returns>Returns a sequence of target elements.</returns> public static IEnumerable<R> Window<T, R>( this IEnumerable<T> source, int window, bool lookbehind, bool lookahead, Func<T[], int, int, R> selector) { var buffer = new T[window]; var index = 0; var count = 0; foreach(var value in source) { if (count < window) { buffer[count++] = value; if (lookbehind || (count == window)) { yield return selector(buffer, 0, count); } } else { buffer[index] = value; index = index + 1 == window ? 0 : index + 1; yield return selector(buffer, index, count); } } if (lookahead) { while(--count > 0) { index = index + 1 == window ? 0 : index + 1; yield return selector(buffer, index, count); } } }

This way a finite difference looks like this:

var diff = input.Window( 2, false, false, (buffer, index, count) => buffer[index ^ 1] - buffer[index]);

4. A specialization of Window method that returns a enumeration of windows:

/// <summary> /// Projects a window of source elements in a source sequence into a /// sequence of window arrays. /// </summary> /// <typeparam name="T">A type of elements of source sequence.</typeparam> /// <typeparam name="R">A type of elements of target sequence.</typeparam> /// <param name="source">A source sequence.</param> /// <param name="window">A size of window.</param> /// <param name="lookbehind"> /// Indicate whether to produce target if the number of source elements /// preceeding the current is less than the window size. /// </param> /// <param name="lookahead"> /// Indicate whether to produce target if the number of source elements /// following current is less than the window size. /// </param> /// <returns>Returns a sequence of windows.</returns> public static IEnumerable<T[]> Window<T>( this IEnumerable<T> source, int window, bool lookbehind, bool lookahead) { return source.Window( window, lookbehind, lookahead, (buffer, index, count) => { var result = new T[count]; for(var i = 0; i < count; ++i) { result[i] = buffer[index]; index = index + 1 == buffer.Length ? 0 : index + 1; } return result; }); }

Friday, 07 March 2014 14:11:46 UTC

Comments [3] -
.NET | Tips and tricks

Monday, 24 February 2014

Bug either in C# spec or in Microsoft's implementation

While writing an article Dealing with dynamic SQL in SQL Server we have crushed on a trivial C# code.

Consider a declaration:

public enum Direction { Asc, Desc } public struct Order { public string Field { get; set; } public Direction Direction { get; set; } } public class DateRange { public DateTime? From { get; set; } public DateTime? To { get; set; } } public class Request { public DateRange CreatedAt { get; set; } public string Summary { get; set; } [XmlElement] public int[] State { get; set; } public DateRange UpdatedAt { get; set; } [XmlElement] public Order[] Order { get; set; } }

and a code:

... var request = new Request { CreatedAt = { From = new DateTime(2014, 1, 1) }, Order = new[] { new Order { Field = "CreatedAt" } } }; ...

It was hard to believe that this simple variable declaration throws NullReferenceException.

In fact we could not realize what the reason is until we have decompiled that code and poked into IL.

It appears that the problem is with the line:

CreatedAt = { From = new DateTime(2014, 1, 1) },

It's implemented like this:

var range = request.CreatedAt; range.From = new DateTime(2014, 1, 1) ; range.To = null;

In other words it assumes the instance stored in property CreatedAt is not null, and it reinitializes all its properties.

In contrast the following works as expected (no exception is thrown):

CreatedAt = new { From = new DateTime(2014, 1, 1) },

as it creates a new instance and assigns it to a property.

We think that this is very error prone, so if it's how C# is assumed to be then it's the problem of the spec, otherwise it's the problem of Microsoft implementation of C# (at least in VS 2010 - 2013).

Monday, 24 February 2014 08:48:35 UTC

Comments [4] -

Tuesday, 11 February 2014

Dealing with dynamic SQL in SQL Server

These are initial positions for this writing:

SQL Server allows to execute dynamic SQL.
Dynamic SQL is useful and often unavoidable, e.g. when you have to filter or order data in a way that you cannot code efficiently in advance.
Dynamic SQL has proven to be a dangerous area, as with improper use it can open hole in a security.

In general nothing stops you from building and then excuting of SQL string. Our goal, however, is to define rules that make work with dynamic SQL is more managable and verifiable.

Here we outline these rules, and then give some examples and tips.

Rule #1. Isolate dynamic SQL

Put all logic related to building of dynamic SQL into a separate function.
We usually define a separate scheme Dynamic, and define functions like Dynamic.GetSQL_XXX(params).
This makes it simple to perform code review.

Rule #2. Xml as parameters

Use xml type to pass parameters to a function that builds dynamic SQL.
In many cases dynamic SQL depends on variable number of parameters (like a list of values to check against).
Xml fits here to represent structured information.
On a client (e.g. in C# or java) you can define a class with all parameters, populate an instance and serialize it to an xml.

Rule #3. XQuery as template language

Use XQuery to define SQL template and to generate SQL tree from the input parameters.
Here is an example of such XQuery:

@data.query(' <sql> select T.* from Data.Ticket T where { for $ticketID in data/ticketID return <sql>(T.TicketID = <int>{$ticketID}</int>) and </sql> } (1 = 1) </sql>')

You can see that output is an xml with sql element to represent literal SQL, and int element to represent integer literal.

In fact whole output schema can be defined like this:

<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="sql"/> <xs:element name="name"/> <xs:element name="string" nillable="true"/> <xs:element name="int" nillable="true"/> <xs:element name="decimal" nillable="true"/> <xs:element name="date" nillable="true"/> <xs:element name="time" nillable="true"/> <xs:element name="datetime" nillable="true"/> </xs:schema>

where sql is to represent literal content, name to represent a name, and other elements to represent different literal values.

Rule #4. Escape literals

Use function Dynamic.ToSQL(@template) to build final SQL text.
Here we quote the definition:

-- Builds a text of SQL function for an sql template. create function Dynamic.ToSQL ( -- SQL template. @template xml ) returns nvarchar(max) with returns null on null input as begin return ( select case when N.Node.exist('*[xs:boolean(@xsi:nil)]') = 1 then 'null' when N.Node.exist('self::int') = 1 then isnull(N.Node.value('xs:int(.)', 'nvarchar(max)'), '# int #') when N.Node.exist('self::string') = 1 then 'N''' + replace ( N.Node.value('.', 'nvarchar(max)'), '''', '''''' ) + '''' when N.Node.exist('self::name') = 1 then isnull ( quotename(N.Node.value('.', 'nvarchar(128)'), '['), '# name #' ) when N.Node.exist('self::datetime') = 1 then isnull ( 'convert(datetime2, ''' + N.Node.value('xs:dateTime(.)', 'nvarchar(128)') + ''', 126)', '# datetime #' ) when N.Node.exist('self::date') = 1 then isnull ( 'convert(date, ''' + N.Node.value('xs:date(.)', 'nvarchar(128)') + ''', 126)', '# date #' ) when N.Node.exist('self::time') = 1 then isnull ( 'convert(time, ''' + N.Node.value('xs:time(.)', 'nvarchar(128)') + ''', 114)', '# time #' ) when N.Node.exist('self::decimal') = 1 then isnull ( N.Node.value('xs:decimal(.)', 'nvarchar(128)'), '# decimal #' ) when N.Node.exist('self::*') = 1 then '# invalid template #' else N.Node.value('.', 'nvarchar(max)') end from @template.nodes('//sql/node()[not(self::sql)]') N(Node) for xml path(''), type ).value('.', 'nvarchar(max)'); end;

Now, we want to stress that this function plays an important role in prevention of the SQL injection, as it escapes literals from the SQL tree.

Rule #5 (optional). Collect data

Use SQL to collect additional data required to build dynamic SQL. Here is an example of how we get a Ticket by StatusID, while on input we receive a StatusName:

create function Dynamic.GetSQL_GetTicketByStatus(@data xml) returns nvarchar(max) as begin set @data = ( select @data, ( select T.StatusID from @data.nodes('/data/status') N(Node) inner join Metadata.Status T on T.StatusName = Node.value('.', 'nvarchar(128)') for xml auto, type, elements ) for xml path('') ); return Dynamic.ToSQL ( @data.query (' <sql> select T.* from Data.Ticket T where T.Status in ({ for $status in /T/StatusID return <sql><int>{$status}</int>,</sql> } null) </sql> ') ); end;

Notice code in red that collects some more data before calling XQuery.

Rule #6. Execute

The final step is to call dynamic SQL.
This is done like this:

-- build declare @sql nvarchar(max) = Dynamic.GetSQL_GetTicket(@data); -- execute execute sp_executesql @sql -- {, N'@parameter_name data_type [ OUT | OUTPUT ][ ,...n ]' } -- { , [ @param1 = ] 'value1' [ ,...n ] } with result sets ( ( TicketID int not null, CreatedAt datetime2 not null, Summary nvarchar(256) null, Status int, Severity int, DeadLineAt datetime2 null ) );

Notice that the use of dynamic SQL does not prevent static parameters.
Notice also that with result sets clause is used to specify output.

Example. Tickets system

Let's assume you're dealing with a tickets system (like Bugzilla), and you have a table Data.Ticket to describe tickets. Assume that DDL for this table is like this:

create table Data.Ticket ( TicketID bigint not null primary key, CreatedAt datetime2 not null, Summary nvarchar(128) null, Status int not null, UpdatedAt datetime2(7) not null )

Suppose you have to build C# code to search different tickets, where Entity Framework is used to access the database.
Search should be done by a range of CreatedAt, a range of UpdatedAt, Summary, or by different Status values. It should be possible to order results in different ways.

We start out solution from the C# and define classes for a request:

public enum Direction { Asc, Desc } public struct Order { public string Field { get; set; } public Direction Direction {get; set; } } public class DateRange { public DateTime? From { get; set; } // This property is to omit From element if value is null. // See rules for xml serialization. public bool FromSpecified { get { return From != null; } } public DateTime? To { get; set; } public bool ToSpecified { get { return To != null; } } } public class TicketsRequest { public DateRange CreatedAt { get; set; } public string Summary { get; set; } public DateRange UpdatedAt { get; set; } [XmlElement] public Order[] Order { get; set; } [XmlElement] public int[] Status { get; set; } }

Notice that we're going to use XmlSerializer to convert request to xml and then to pass parameter into EF's model. Here is utility method to perform such conversion:

public static string ToXmlString<T>(T value) { if (value == null) { return null; } var serializer = new XmlSerializer(typeof(T)); var builder = new StringBuilder(); var writer = XmlWriter.Create( builder, new XmlWriterSettings { OmitXmlDeclaration = true, Indent = false }); serializer.Serialize(writer, value); writer.Flush(); return builder.ToString(); }

Now we proceed to the database and define a procedure that runs the search:

-- Gets tickets. create procedure Data.GetTickets ( -- A query parameters. @params xml ) as begin set nocount on; -- This is for EF to guess type of result. if (1 = 0) begin select TicketID, CreatedAt, Summary, Status, UpdatedAt from Data.Ticket; end; declare @sql nvarchar(max) = Dynamic.GetSQL_GetTickets(@params); execute sp_executesql @sql with result sets ( ( TicketID int not null, CreatedAt datetime2 not null, Summary nvarchar(256) null, Status int, UpdatedAt datetime2 null ) ); end;

Switch back to C#, import the Data.GetTickets into the EF model, and create a search method:

public IEnumerable<Ticket> GetTickets(TicketsRequest request) { var model = new Model(); return model.GetTickets(ToXmlString(request)); }

The last ingredient is Dynamic.GetSQL_GetTickets() function.

create function Dynamic.GetSQL_GetTickets(@data xml) returns nvarchar(max) as begin return Dynamic.ToSQL ( @data.query(' <sql> select T.TicketID, T.CreatedAt, T.Summary, T.Status, T.UpdatedAt from Data.Ticket T where { for $range in */CreatedAt return ( for $date in $range/From return <sql> (T.CreatedAt >= <datetime>{$date}</datetime>) and </sql>, for $date in $range/To return <sql> (<datetime>{$date}</datetime> > T.CreatedAt) and </sql> ), for $range in */UpdatedAt return ( for $date in $range/From return <sql> (T.UpdatedAt >= <datetime>{$date}</datetime>) and </sql>, for $date in $range/To return <sql> (<datetime>{$date}</datetime> > T.UpdatedAt) and </sql> ), for $summary in */Summary return <sql> (T.Summary like <string>{$summary}</string>) and </sql>, if (*/Status) then <sql> T.Status in ({ for $status in */Status return <sql><int>{$status}</int>, </sql> } null) and </sql> else () } (1 = 1) order by { for $order in */Order [ Field = ("TicketID", "CreatedAt", "Summary", "UpdatedAt", "Status") ] return <sql> <name>{$order/Field}</name> {" desc"[$order[Direction = "Desc"]]}, </sql> } (select null) </sql> ') ); end;

SQL text from Dynamic.GetSQL_GetTickets()

Consider now SQL text produced by this function. For an input:

<TicketsRequest> <CreatedAt> <From>2014-01-01T00:00:00</From> </CreatedAt> <Summary>hello%</Summary> <Order> <Field>Status</Field> <Direction>Desc</Direction> </Order> <Status>1</Status> <Status>3</Status> </TicketsRequest>

the output is:

select T.TicketID, T.CreatedAt, T.Summary, T.Status, T.UpdatedAt from Data.Ticket T where (T.CreatedAt >= convert(datetime2, '2014-01-01T00:00:00', 126)) and (T.Summary like N'hello%') and T.Status in (1, 3, null) and (1 = 1) order by [Status] desc, (select null)

Though the text is not formatted as we would like, it's perfectly valid SQL.

Tips for building XQuery templates

What is called XQuery in SQL Server is in fact a very limited subset of XQuery 1.0. Microsoft clearly states this fact. What is trivial in XQuery is often impossible or ugly in XQuery of SQL Server.

Nevertheless XQuery in SQL Server works rather well as SQL template language. To make it most efficient, however, you should learn several tips.

Tip #1. Where clause

In template you might want to build a where clause:

<sql> select ... where { if (...) then <sql>...</sql> else () } </sql>

and it might happen that for a certain input a condition under where might collapse, and you will be left with where keyword without a real condition, which is wrong. A simple work around is to always add some true condition under ther where like this:

<sql> select ... where { if (...) then <sql>... and </sql> else () } (1 = 1) </sql>

Tip #2. "in" expression

If you want to generate "in" expression like this:

value in (item1, item2,...)

then you might find that it's much easier generate equivalent a code like this:

value in (item1, item2,..., null).

Here is a XQuery to generate such template:

value in ({ for $item in ... return <sql><int>{$item}</int>, </sql> } null) and

Tip #3. Order by

You can conclude an order by clause built from a data with a dummy expression like this:

order by { for $item in ... return <sql> <name>{$item/Field}</name> {" desc"[$item/Direction = "Desc"]}, </sql> } (select null)

Alternatively you can use first column from a clustered index.

Tip #4. Group by

In a group by clause we cannot introduce terminator expression as it was with order by, so a code is a less trivial:

{ let $items := ... return if ($items) then <sql> group by <name>{$items[1]}</name> { for $item in $items[position() > 1] return <sql>, <name>{$item}</name></sql> } </sql> else () }

In fact similar logic may work with order by.

Tip #5. Escape literals

It's crusial not to introduce SQL injection while building SQL. Thus use:

<int>{...}</int> - for literal int;
<decimal>{...}</decimal> - for literal decimal;
<string>{...}</string> - for literal string;
<datetime>{...}</datetime> - for literal datetime2;
<date>{...}</date> - for literal date;
<time>{...}</time> - for literal time;
<name>{...}</name> - for a name to quote.

Note that you can use xsi:nil, so <int xsi:nil="true"/> means null.

If you generate a field name from an input data then it worth to validate it against a list of available names.

Tip #6. Validate input.

It worth to define xml schema for an input xml, and to validate parameters against it.
This makes code more secure, and also adds a documentation.

Tip #7. Don't abuse dynamic SQL

There are not too many cases when you need a dynamic SQL. Usually SQL engine knows how to build a good execution plan. If your query contains optional conditions then you can write it a way that SQL Server can optimize, e.g.:

select * from T where ((@name is null) or (Name = @name)) and ((@date is null) or (Date = @date)) option(recompile)

Tuesday, 11 February 2014 09:48:07 UTC

Comments [0] -
SQL Server puzzle | Thinking aloud | Tips and tricks

Wednesday, 22 January 2014

Shared state

Consider how would you implement Style object in the HTML DOM?

These are some characteristics of that object:

It has a long list of properties, e.g. in IE 11 there are more than 300 properties over a style object.
Any specific instance usually have only several properties assigned.
Reads of properties are much more frequent than writes. In fact style often stays unchanged after initialization.
DOM contains many style instances (often thousands).
The number of distinct instances in terms of values of properties is moderate (usually dozens).

Here is how would we approached to such an object.

1. Styles are sparse objects, thus there is no point to implement plain class with all those properties, as it's wasteful.

We would rather use two techniques to keep style's state:

A dictionary of properties with their values;
An aggregation of objects, where all properies are grouped into families, each group is defined by a separate type, and a style's state is an aggregation of that groups.

A current style of an element is an aggregation of styles of ancestor element. It can either by dynamic or be fused into a single style instance.

2. Make style's state immutable, and share all these states among all style instances.

In this implementation property write turns into a state transition operation: state = set(state, property, value). Thus no state is modified but replaced with other state that corresponds to a required change.

If state is seen as a dictionary then API may look like this :

public class State<K, V> { // Gets shared dictionary for an input dictionary. public IDictionary<K, V> Get(IDictionary<K, V> dictionary); // Gets a shared dictionary for an input dictionary with key set to a value. public IDictionary<K, V> Set(IDictionary<K, V> dictionary, K key, V value); // Gets a shared dictionary for an input dictionary. public IDictionary<K, V> Remove(IDictionary<K, V> dictionary, K key); // Gets typed value. public T Get<T>(IDictionary<K, V> dictionary, K key) where T: V { V value; if ((dictionary == null) || !dictionary.TryGetValue(key, out value)) { return default(T); } return (T)value; } // Sets or removes a typed value. // dictionary can be null. // null returned if output dictionary would be empty. public IDictionary<K, V> Set<T>(IDictionary<K, V> dictionary, K key, T value) where T : V { return value == null ? Remove(dictionary, key) : Set(dictionary, key, (V)value); } }

States can be cached. Provided the cache keeps states in a weak way, no unsued state will be stored for a long time. We may use weak table of dictionary to dictionary WeakTable<Dictionary<K, V>, Dictionary<K, V>> as a storage for such a cache. All required API is described in the WeakTable and Hash Code of Dictionary posts.

3. Style can be implemented as a structure with shared state as a storage. Here is a scetch:

[Serializable] public struct Style { // All properties. public enum Property { Background, BorderColor, BorderStyle, Color, FontFamily, FontSize, // ... } public int? Background { get { return states.Get<int?>(state, Property.Background); } set { state = states.Set(state, Property.Background, value); } } public int? BorderColor { get { return states.Get<int?>(state, Property.BorderColor); } set { state = states.Set(state, Property.BorderColor, value); } } public string BorderStyle { get { return states.Get<string>(state, Property.BorderStyle); } set { state = states.Set(state, Property.BorderStyle, value); } } public int? Color { get { return states.Get<int?>(state, Property.Color); } set { state = states.Set(state, Property.Color, value); } } public string FontFamily { get { return states.Get<string>(state, Property.FontFamily); } set { state = states.Set(state, Property.FontFamily, value); } } public double? FontSize { get { return states.Get<double?>(state, Property.FontSize); } set { state = states.Set(state, Property.FontSize, value); } } // ... [OnDeserialized] private void OnDeserialized(StreamingContext context) { state = states.Get(state); } // A state. private IDictionary<Property, object> state; // A states cache. private static readonly State<Property, object> states = new State<Property, object>(); }

Note that:

default state is a null dictionary;
states are application wide shared.

The following link is our implementation of State<K, V> class: State.cs.

Here we have outlined the idea of shared state object, and how it can be applied to sparse mostly immutable objects. We used HTML style as an example of such an object. Shared state object may work in many other areas, but for it to shine its use case should fit to the task.

Wednesday, 22 January 2014 19:43:25 UTC

Comments [0] -
.NET | Thinking aloud | Tips and tricks

Monday, 13 January 2014

Hash Code of Dictionary

Dealing recently with some task (the same that inspired us to implement WeakTable), we were in a position to use a dictionary as a key in another dictionary.

What are the rules for the class to be used as key:

key should be immutable;
key should implement a GetHashCode() method;
key should implement a Equals() method.

The first requirement is usually implemented as a documentation contract like this:

As long as an object is used as a key in the Dictionary<TKey, TValue>, it must not change in any way that affects its hash value.

The third requirement about equals is trivially implemented as a method:

public bool Equals(IDictionary<K, V> x, IDictionary<K, V> y) { if (x == y) { return true; } if ((x == null) || (y == null) || (x.Count != y.Count)) { return false; } foreach(var entry in x) { V value; if (!y.TryGetValue(entry.Key, out value) || !valueComparer.Equals(entry.Value, value)) { return false; } } return true; }

But how would you implement hash code?

We argued like this.

1. Let's consider the dictionary as a sparse array of values with only populated items that correspond to key hash codes.

2. Hash code is constructed using some fair algorithm. E.g like that used in java to calculate string's hash code:

n-1 h(s) = SUM (s[i]*p^(n-1-i)) mod m, where m = 2^31 i=0

In our case:

n can be arbitrary large int value, so in fact it's 2^32;
items are enumerated in unknown order;
there is only limited set of items, so most s[i] are zeros.

As result we cannot use recurrent function to calculate a power p^k mod m. Fortunately one can build fast exponentiation arguing like this:

32/s - 1 p^k = p^ SUM 2^((s*i)*k[i]) mod m, where s some int: 1, 2, 4, 8, 16, or 32. i=0

Thus

32/s - 1 p^k = PRODUCT (p^(2^(s*i)))^k[i] mod m i=0

If s = 1 then k[i] is either 1 or 0 (a bit), and there is 32 different p^(2^i) mod m values, which can be precalculated.

On the other hand, if we select s = 8 we can write the formula as:

p^k = p^k[0] * (p^(2^8))^k[1] * (p^(2^16))^k[2] * (p^(2^24))^k[3] mod m

where k[i] is a 8 bit value (byte).

Precalculating all values p^n, (p^(2^8))^n, (p^(2^16))^n, (p^(2^24))^n for n in 0 to 255 we reach the formula with 4 multiplications and with 1024 precalculated values.

Here is the whole utility to calculate hash factors:

/// <summary> /// Hash utilities. /// </summary> public class Hash { /// <summary> /// Returns a P^value mod 2^31, where P is hash base. /// </summary> /// <param name="value">A value to get hash factor for.</param> /// <returns>A hash factor value.</returns> public static int GetHashFactor(int value) { return factors[(uint)value & 0xff] * factors[(((uint)value >> 8) & 0xff) | 0x100] * factors[(((uint)value >> 16) & 0xff) | 0x200] * factors[(((uint)value >> 24) & 0xff) | 0x300]; } /// <summary> /// Initializes hash factors. /// </summary> static Hash() { var values = new int[4 * 256]; var value = P; var current = 1; var i = 0; do { values[i++] = current; current *= value; } while(i < 256); value = current; current = 1; do { values[i++] = current; current *= value; } while(i < 512); value = current; current = 1; do { values[i++] = current; current *= value; } while(i < 768); value = current; current = 1; do { values[i++] = current; current *= value; } while(i < 1024); factors = values; } /// <summary> /// A base to calculate hash factors. /// </summary> public const int P = 1103515245; /// <summary> /// Hash factors. /// </summary> private static readonly int[] factors; }

With this API hash code for a dictionary is a trivial operation:

public int GetHashCode(IDictionary<K, V> dictionary) { if (dictionary == null) { return 0; } var result = 0; foreach(var entry in dictionary) { if ((entry.Key == null) || (entry.Value == null)) { continue; } result += Hash.GetHashFactor(keyComparer.GetHashCode(entry.Key)) * valueComparer.GetHashCode(entry.Value); } return result; }

And finally, here is a reference to a class DictionaryEqualityComparer<K, V>: IEqualityComparer<IDictionary<K, V>> that allows a dictionary to be a key in another dictionary.

Update

We have commited some tests, and have found that with suffiently "good" implementation of GetHashCode() of key or value we achieve results almost of the same quality, as the results of the algorithm we have outlined above with much simpler and straightforward algorithm like this:

public int GetHashCode(IDictionary<K, V> dictionary) { if (dictionary == null) { return 0; } var result = 0; foreach(var entry in dictionary) { if ((entry.Key == null) || (entry.Value == null)) { continue; } var k = entry.Key.GetHashCode(); var v = entry.Value.GetHashCode(); k = (k << 5) + k; v = (v << (k >> 3)) + v; result += k ^ v; //result += Hash.GetHashFactor(keyComparer.GetHashCode(entry.Key)) * // valueComparer.GetHashCode(entry.Value); } return result; }

It was worth to blog about this just to find out that we have outwitted ourselves, and finally to reach to a trivial hash code implementation for the dictionary.

Monday, 13 January 2014 20:33:31 UTC

Comments [0] -
.NET | Thinking aloud | Tips and tricks

Wednesday, 08 January 2014

WeakTable

Dealing recently with some task, we were in a position to use a weak dictionary in the .NET. Instinctively we assumed that it should exist somewhere in the standard library. We definitely knew that there is a WeakReference class to for a single instance. We also knew that there is WeakHashMap in java, and that it's based on java's WeakReference.

So, we were surprised to find that there is no such thing out of the box in .NET.

We have found that java's and .NET's weak references are different. In java weak references whose targets are GCed can be automatically put into a queue, which can be used to build clean up logic to remove dead keys from weak hash map. There is nothing similar in .NET, where weak reference just silently loses it's value.

Internet is full with custom implementations of weak dictionaries in .NET.

.NET 4.5 finally defines a class ConditionalWeakTable<TKey, TValue>, which solves the problem in case when you need to match keys by instance identity.

Unfortunately in our case we needed to match keys using key's GetHashCode() and Equals(). So, ConditionalWeakTable<TKey, TValue> did not directly work, but then we found a way to make it work for us.

Here is a quote from the definition:

A ConditionalWeakTable<TKey, TValue> object is a dictionary that binds a managed object, which is represented by a key, to its attached property, which is represented by a value. The object's keys are the individual instances of the TKey class to which the property is attached, and its values are the property values that are assigned to the corresponding objects.

...in the ConditionalWeakTable<TKey, TValue> class, adding a key/value pair to the table does not ensure that the key will persist, even if it can be reached directly from a value stored in the table... Instead, ConditionalWeakTable<TKey, TValue> automatically removes the key/value entry as soon as no other references to a key exist outside the table.

This property of ConditionalWeakTable<TKey, TValue> has helped us to build a way to get a notification when the key is being finalized, which is the missed ingredient in .NET's weak references.

Assume you have an instance key of type Key. To get a notification you should define a class Finalizer that will call some handler when it's finalized, and you should bind key and a finalizer instance using weak table.

The code looks like this:

public class Finalizer<K> where K: class { public static void Bind(K key, Action<K> handler) { var finalizer = table.GetValue(key, k => new Finalizer<K> { key = k }); finalizer.Handler += handler; } public static void Unbind(K key, Action<K> handler) { Finalizer finalizer; if (table.TryGetValue(key, out finalizer)) { finalizer.Handler -= handler; } } ~Finalizer() { var handler = Handler; if (handler != null) { handler(key); } } private event Action<K> Handler; private K key; private static readonly ConditionalWeakTable<K, Finalizer> table = new ConditionalWeakTable<K, Finalizer>(); } Key key = ... Finalizer.Bind(key, k => { /* clean up. */ });

Using this approach we have created a class WeakTable<K, V> modeled after ConditionalWeakTable<TKey, TValue>.

So, this is our take in the problem: WeakTable.cs.

Wednesday, 08 January 2014 21:57:16 UTC

Comments [0] -
.NET | Java | Thinking aloud | Tips and tricks

Friday, 27 December 2013

Bidi Visual Converter

Oftentimes we deal with Hebrew in .NET.

The task we face again and again is attempt to convert a Hebrew text from visual to logical representation.

The latest demand of such task was when we processed content extracted from PDF. It's turned out that PDF stores content as graphic primitives, and as result text is stored visually (often each letter is kept separately).

We solved the task more than a decade ago, by calling Uniscribe API.

The function by itself is a small wrapper around that API, so in .NET 1.0 we were using managed C++, several years later we have switched to C++/CLI.

But now after many .NET releases, and with 32/64 versions we can see that C++ is only a guest in .NET world.

To run C++ in .NET you have to install VC runtime libraries adjusted to a specific .NET version. This turns C++ support in .NET into not a trivial task.

So, we have finally decided to define C# interop for the Uniscribe API, and recreate that function in pure C#:

namespace NesterovskyBros.Bidi { /// <summary> /// An utility to convert visual string to logical. /// <summary> public static class BidiConverter { /// <summary> /// Converts visual string to logical. /// </summary> /// <param name="value">A value to convert.</param> /// <param name="rtl">A base direction.</param> /// <param name="direction"> /// true for visual to logical, and false for logical to visual. /// </param> /// <returns>Converted string.</returns> public static string Convert(string value, bool rtl, bool direction);

You can download this project from BidiVisualConverter.zip.

Friday, 27 December 2013 20:36:20 UTC

Comments [0] -
.NET | Tips and tricks

Monday, 11 November 2013

Library resources in JSF

Before to start we have to confess that afer many years of experience we sincerely dislike JSF technology, as we think it's outdated compared to html 5 + REST.

We have a JSF 2.2 application, which is configured to track session through url. In this case Session ID is stored in url and not in cookies, as there may be many sessions opened per a client.

At the same time application uses libraries that expose scripts and css resources. This resources are referred to like this:

<link rel="stylesheet" type="text/css" jsfc="h:outputStylesheet" library="css" name="library-name.css"/> <script type="text/javascript" jsfc="h:outputScript" name="library-name.js" library="scripts" target="head"></script>

At runtime this is rendered as:

<link type="text/css" rel="stylesheet" href="/App/javax.faces.resource/library-name.css.jsf;jsessionid=FC4A893330CCE12E8E20DFAFC73CDF35?ln=css" /> <script type="text/javascript" src="/App/javax.faces.resource/library-name.js.jsf;jsessionid=FC4A893330CCE12E8E20DFAFC73CDF35?ln=scripts"></script>

You can see that Session ID is a part of url path, which prevents resource caching on a client.

It's not clear whether it's what JSF spec dictates or it's Oracle's Reference Implementation detail. We're certain, however, that it's too wasteful in heavy loaded environment, so we have tried to resolve the problem.

From JSF sources we have found that h:outputStylesheet, h:outputScript, and h:outputLink all use ExternalContext.encodeResourceURL() method to build markup url.

So, here is a solution: to provide custom wrapper for the ExternalContext.

This is done in two steps:

create a factory class;
register a factory in faces-config.xml;

1. Factory is a simple class but unfortunately it's implementation specific:

package com.nesterovskyBros.jsf; import javax.faces.FacesException; import javax.faces.context.ExternalContext; import javax.faces.context.ExternalContextWrapper; import com.sun.faces.context.ExternalContextFactoryImpl; /** * {@link ExternalContextFactory} to prevent session id in resource urls. */ public class ExternalContextFactory extends ExternalContextFactoryImpl { /** * {@inheritDoc} */ @Override public ExternalContext getExternalContext( Object context, Object request, Object response) throws FacesException { final ExternalContext externalContext = super.getExternalContext(context, request, response); return new ExternalContextWrapper() { @Override public ExternalContext getWrapped() { return externalContext; } @Override public String encodeResourceURL(String url) { return shouldEncode(url) ? super.encodeResourceURL(url) : url; } private boolean shouldEncode(String url) { // Decide here whether you want to encode url. // E.g. in case of h:outputLink you may want to have session id in url, // so your decision is based on some marker (like &session=1) in url. return false; } }; } }

2. Registration is just three lines in faces-config.xml:

<factory> <external-context-factory>com.nesterovskyBros.jsf.ExternalContextFactory</external-context-factory> </factory>

After that change at runtime we have:

<link type="text/css" rel="stylesheet" href="/App/javax.faces.resource/library-name.css.jsf?ln=css" /> <script type="text/javascript" src="/App/javax.faces.resource/library-name.js.jsf?ln=scripts"></script>

Monday, 11 November 2013 13:08:53 UTC

Comments [0] -
Java | JSF and Facelets | Tips and tricks

Monday, 04 November 2013

Windows: so far so worse

A day ago we had installed on our laptops Win 8.1. We thought it will be better than Win 8, but now we afraid of next release of Windows... So far, so worse.

ICS became unusable: in Windows 7 we've shared Internet connection from USB 3G dongle between our two computers, in Windows 8 we've succeeded to share only file system access, but not Internet. In Windows 8.1 neither Internet nor file system are accessible...

Monday, 04 November 2013 11:20:28 UTC

Comments [0] -
Thinking aloud

Monday, 14 October 2013

String comparision: SQL vs COBOL

Till recently we were living in simple world of string comparisons in SQL style, and now everything has changed.

From the university years we knew that strings in SQL are compared by first trimming traling spaces, and then comparing in C style.

Well, the picture was a little more complex, as collations were involved (national, case sensivity), and as different SQL vendors implemented it differently.

Next,
we're dealing with programs converted from COBOL, which we originally thought follow SQL rules when strings are compared.

Here is where the problem has started.

Once we have found that java program has branched differently than original COBOL, and the reason was that the COBOL and java compared two strings differently:

COBOL: "A\n" < "A";
Java: "A\n" > "A"

We have looked into COBOL Language Reference and found the rules:

Operands of equal size

Characters in corresponding positions of the two operands are compared, beginning with the leftmost character and continuing through the rightmost character.

If all pairs of characters through the last pair test as equal, the operands are considered as equal.

If a pair of unequal characters is encountered, the characters are tested to determine their relative positions in the collating sequence. The operand that contains the character higher in the sequence is considered the greater operand.

Operands of unequal size

If the operands are of unequal size, the comparison is made as though the shorter operand were extended to the right with enough spaces to make the operands equal in size.

You can see that strings must not be trimmed but padded with spaces to the longer string, and only then they are compared. This subtle difference has significant impact for characters below the space.

So, here we've found that COBOL and SQL comparisons are different.

But then we have questioned how really SQL beheaves?

We've tested comparisons in SQL Server and DB2, and have seen that our understanding of SQL comparison holds. It works as if trimming spaces, and then comparing.

But again we have looked into SQL-92 definition, and that's what we see there:

8.2 <comparison predicate>
3) The comparison of two character strings is determined as follows:

a) If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If CS has the NO PAD attribute, then the pad character is an implementation-dependent character different from any character in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a <space>.

So, what we see is that SQL-92 rules are very close to COBOL rules, but then we reach the question: how come that at least SQL Server and DB2 implement string comparison differently than SQL-92 dictates?

Update: we have found that both SQL Server and DB2 have their string collation defined in a way that <space> is less than any other character. So the following is always true: '[' + char(13) + ']' > '[ ]'.

Monday, 14 October 2013 20:23:11 UTC

Comments [0] -
Java | SQL Server puzzle | Thinking aloud | Tips and tricks

Friday, 09 August 2013

KendoUI's guid function

Earlier we have written a post KendoUI's slowest function and now, we want to point to the next slow function, which is kendo.guid(). It's used to assign uid to each observable object, and also in couple of other places.

Here is its source:

guid: function() { var id = "", i, random; for (i = 0; i < 32; i++) { random = math.random() * 16 | 0; if (i == 8 || i == 12 || i == 16 || i == 20) { id += "-"; } id += (i == 12 ? 4 : (i == 16 ? (random & 3 | 8) : random)).toString(16); } return id; }

KendoUI people have decided to define uid as a string in format of Globally unique identifier. We think there is no reason to have such a complex value; it's enough to have counter to generate uid values. As KendoUI relies on the string type of uid, so we have defined a patch like this:

var guid = 0 kendo.guid = function() { return ++guid + "" }

Consider now a test case. It's almost identical to that in previous post:

<!DOCTYPE html> <html> <head> <title>Test</title> <script src="scripts/jquery/jquery.min..js"></script> <script src="scripts/kendo/kendo.web.min.js"></script> <link href="styles/kendo.common.min.css" rel="stylesheet" /> <link href="styles/kendo.default.min.css" rel="stylesheet" /> <link href="styles/style.css" rel="stylesheet" /> <script> var model; function init() { var source = []; for(var i = 0; i < 1000; ++i) { source.push({ text: "value " + i, value: "" + i }); } model = kendo.observable( { value: "1", source: new kendo.data.DataSource( { data: source }) }); model.source.read(); } function patch() { var base = kendo.data.binders.widget.source.fn._ns; var result; var guid = 0; kendo.guid = function() { return ++guid + ""; }; kendo.data.binders.widget.source.fn._ns = function(ns) { return ns ? base.call(this, ns) : (result || (result = base.call(this, ns))); } } function test() { init(); kendo.bind("#view", model); } patch(); </script> </head> <body> <p> <button onclick="test()">Click to start test</button> </p> <p id="view"> Select: <input data-role="dropdownlist" data-bind="value: value, source: source" data-text-field="text" data-value-field="value"/> </p> </body> </html>

Now, we can compare performance with and without that patch.

Here is a run statistics without patch:

Level	Function	Count	Inclusive time (ms)	Inclusive time %	Avg time (ms)
1	onclick	1	270.73	100	270.73
1.1	test	1	269.73	99.63	269.73
1.1.1	init	1	117.07	43.24	117.07
1.1.1.1	guid	1,001	72.05	26.61	0.07
1.1.2	bind	1	152.65	56.39	152.65

and with patch:

Level	Function	Count	Inclusive time (ms)	Inclusive time %	Avg time (ms)
1	onclick	1	172.64	100	172.64
1.1	test	1	171.65	99.42	171.65
1.1.1	init	1	62.04	35.94	62.04
1.1.1.1	guid	1,001	1	0.58	0
1.1.2	bind	1	109.6	63.49	109.6

Note that statistics were collected for IE 10.
An example can be found at slow2.html.

Friday, 09 August 2013 14:24:16 UTC

Comments [0] -
javascript | kendoui | Tips and tricks

Tuesday, 30 July 2013

KendoUI's slowest function

What is the slowest function in kendo? Or better, which function has most negative performance impact in kendo.

Recently, we were dealing with a simple page, which was too slow, as data binding took more than second.

The page contained a dropdown list, with ~1000 options. To understand the reason we have run this page under the IE's built-in javascript profiler, and ...

there, we have found that almost half of the time is taken by a function (call it X), which receives nothing and returns always the same result!

But, let's now see a minimal example that demostrates the problem:

<!DOCTYPE html> <html> <head> <title>Test</title> <script src="scripts/jquery/jquery.js"></script> <script src="scripts/kendo/kendo.web.min.js"></script> <link href="styles/kendo.common.min.css" rel="stylesheet" /> <link href="styles/kendo.default.min.css" rel="stylesheet" /> <link href="styles/style.css" rel="stylesheet" /> <script> var model; function init() { var source = []; for(var i = 0; i < 1000; ++i) { source.push({ text: "value " + i, value: "" + i }); } model = kendo.observable( { value: "1", source: new kendo.data.DataSource( { data: source }) }); model.source.read(); } function test() { kendo.bind("#view", model); } init(); </script> </head> <body> <p> <button onclick="test()">Click to start test</button> </p> <p id="view"> Select: <input data-role="dropdownlist" data-bind="value: value, source: source" data-text-field="text" data-value-field="value"/> </p> </body> </html>

There are two parts: initialization, and a test itself that starts upon button click.

In the initialization part we have defined a model, containing a datasource.

The test part performs data binding.

Now, here is a run statistics:

Function	Count	Inclusive time (ms)	Inclusive time %	Avg time (ms)
test	1	456.05	100	456.05
X	1,000	200.14	43.89	0.2

So, X is fast by itself, but it run 1000 times, and took about 44% of all time.

And now, to the function. It's kendo.data.binders.widget.source.fn._ns.

Here is its code:

_ns: function(ns) { ns = ns || kendo.ui; var all = [ kendo.ui, kendo.dataviz.ui, kendo.mobile.ui ]; all.splice($.inArray(ns, all), 1); all.unshift(ns); return kendo.rolesFromNamespaces(all); }

We can see that on each call it receives a parameter undefined, and always returns an array with the same content. Not sure why Kendo UI team made it so complex, but one can easily device a simple patch that optimizes this code path.

function patch() { var base = kendo.data.binders.widget.source.fn._ns; var result; kendo.data.binders.widget.source.fn._ns = function(ns) { return ns ? base.call(this, ns) : (result || (result = base.call(this, ns))); } }

With this patch applied, the profiler shows:

Function	Count	Inclusive time (ms)	Inclusive time %	Avg time (ms)
test	1	253.03	100	253.03
_ns	1,000	6	2.37	0.01

Execution time dropped considerably, and _ns() loses its title of most time consuming function!

An example can be found at slow.html.

Tuesday, 30 July 2013 20:47:11 UTC

Comments [0] -
javascript | kendoui

Sunday, 28 July 2013

Visual to logical conversion in Java

Time after time we run into the same problem on different platforms, with different languages. The problem's name is "Visual to Logical conversion for right-to-left or bidirectional text". The problem is usually due to legacy code, which stores texts in visual order from left to right. In case of English it's ok, but with Hebrew this means that texts are partially reversed.

It worth to note that we've solved the same task with Windows API for native and .NET applications more than 10 years ago.

On the other hand, for Java, we yet didn't see any acceptable standalone solution. To remedy this omission, we publish here our solution to this problem.

package com.nesterovskyBros.text;

import java.text.Bidi;

/**
 * Utility that uses {@link Bidi} class. 
 */
public class BidiUtils
{
  /**
   * Implements visual to logical order converter.
   * 
   * @author <a href="http://www.nesterovsky-bros.com">Nesterovsky bros</a>
   *
   * @param text an input text in visual order to convert.
   * @return a String value in logical order.
   */
  public static String visualToLogical(String text)
  {
    if ((text == null) || (text.length() == 0))
    {
        return text;
    }
  
    Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
  
    if (bidi.isLeftToRight())
    {
        return text;
    }
  
    int count = bidi.getRunCount();
    byte[] levels = new byte[count];
    Integer[] runs = new Integer[count];
  
    for (int i = 0; i < count; i++)
    {
       levels[i] = (byte)bidi.getRunLevel(i);
       runs[i] = i;
    }
  
    Bidi.reorderVisually(levels, 0, runs, 0, count);

    StringBuilder result = new StringBuilder();

    for (int i = 0; i < count; i++)
    {
       int index = runs[i];
       int start = bidi.getRunStart(index);
       int end = bidi.getRunLimit(index);
       int level = levels[index];
  
       if ((level & 1) != 0)
       {
         for (; --end >= start;)
         {
           result.append(text.charAt(end));
         }
       }
       else
       {
         result.append(text, start, end);
       }
    }

    return result.toString();
  }
}

This method utilizes embeded Bidi's algorithm, see class java.text.Bidi.

Be aware that there is no perfect algorithm that covers all possible cases, since BIDI was written for an opposite task, but our implementation based on Bidi.reorderVisually is usually acceptable.

Here is an JUnit test for this method:

package com.nesterovskyBors.text;

import static org.junit.Assert.*;

import org.junit.Test;

import com.nesterovskyBros.text.BidiUtils;

public class BidiUtilsTests
{

  @Test
  public void testsVisualToLogical()
  {
    String text = "123 יתימאה ןחבמ";
    String actual = BidiUtils.visualToLogical(text);
    String expected = "מבחן האמיתי 123";
  
    assertEquals(expected, actual);
    
    text = "תירבע English תירבע בוש";
    actual = BidiUtils.visualToLogical(text);
    expected = "שוב עברית English עברית";
    
    assertEquals(expected, actual);
  }
}

Sunday, 28 July 2013 15:44:17 UTC

Comments [0] -
Java | Tips and tricks

Sunday, 14 July 2013

On Garbage Collectors

Recently we've seen an article Why mobile web apps are slow.

While title mentions web apps, the the criticism is directed purely to javascript language. The answer presented there is twofold:

Raw javascript performance is ~5 times slower than performance of native code solving the same task.
This accounts for the most modern implementations that exploit JIT. Author does not expect that this proportion will be significatly changed in javascript's favor in the near future.
Garbage Collection, which is essential part of javascript, does not work well in constrainted environment of mobile devices.
Here author quotes many references that show that:
- for GC to work on peer with non-GC application, it needs to have ~6 - 8 times size of memory than an application needs;
- at the same time for hardware reasons, mobile devices cannot provide such amount of memory;
- on the other hand with rise of CPU performance, GC pressure rises even faster.

In the end author, while saying about some attempts to change the state, has no final verdict, whether there can be anything done to remedy the problem.

Having roots in C++, we're GC unbelievers. But you know, who will ask your opinion on that, while there are all those modern languages that try to abstract from memory and implicitly or explicitly assume GC: java, C#, javascript, xslt, xquery, and so on.

There always is a solution to avoid GC completely, like C++ and other (e.g. Microsoft's C++/CX, or Apple's ARC) do. But, assuming you're entered GC world, what can be done with it? How do you make it more predictable, less greedy, and probably more fast?

Our arguments are like this.

How does native code manage object graphs?

Today's solution is reference counting along with weak references to break cycles in graph.

Can be GC based on this?

Yes.

In fact upcoming spec of javascript contains weak references. So, provided a developer accurately defines relations in an object graph, one may immediately achieve the same efficiency as native solution.

If one does not use weak references consistently then object cycles can be created, so there can appear a graph that is not referenced anywhere in a program. This graph can be collected with classical GC that scans object roots.

Classical GC part can be used as a debug mode leak detector, and will collect graph cycles at runtime.

Thus, we claim that a hybrid memory management: reference counting with weak references plus classical GC is possible; it will be equal to a native memory management when weak references are properly used, and it will be close to classical GC without use of weak references.

This solution gives a rule to mitigate GC drawbacks: just use weak references in appropriate points, and you can continue to live in GC world, where GC is only a fallback.

Sunday, 14 July 2013 12:20:29 UTC

Comments [0] -
javascript | Thinking aloud

Navigation