In the article "Error handling in WCF based web applications"
we've shown a custom error handler for RESTful service
based on WCF. This time we shall do the same for Web API 2.1 service.
Web API 2.1 provides an elegant way to implementat custom error handlers/loggers, see
the following article. Web API permits many error loggers followed by a
single error handler for all uncaught exceptions. A default error handler knows to output an error both in XML and JSON formats depending on requested
MIME type.
In our projects we use unique error reference IDs. This feature allows to an end-user to refer to any error that has happened during the application life time and pass such error ID to the technical support for further investigations. Thus, error details passed to the client-side contain an ErrorID field. An error logger generates ErrorID and passes it over to an error handler for serialization.
Let's look at our error handling implementation for a Web API application.
The first part is an implementation of IExceptionLogger
interface. It assigns ErrorID and logs all errors:
/// Defines a global logger for unhandled exceptions.
public class GlobalExceptionLogger : ExceptionLogger
{
/// Writes log record to the database synchronously.
public override void Log(ExceptionLoggerContext context)
{
try
{
var request = context.Request;
var exception = context.Exception;
var id = LogError(
request.RequestUri.ToString(),
context.RequestContext == null ?
null : context.RequestContext.Principal.Identity.Name,
request.ToString(),
exception.Message,
exception.StackTrace);
// associates retrieved error ID with the current exception
exception.Data["NesterovskyBros:id"] = id;
}
catch
{
// logger shouldn't throw an exception!!!
}
}
// in the real life this method may store all relevant info into a database.
private long LogError(
string address,
string userid,
string request,
string message,
string stackTrace)
{
...
}
}
The second part is the implementation of IExceptionHandler :
/// Defines a global handler for unhandled exceptions.
public class GlobalExceptionHandler : ExceptionHandler
{
/// This core method should implement custom error handling, if any.
/// It determines how an exception will be serialized for client-side processing.
public override void Handle(ExceptionHandlerContext context)
{
var requestContext = context.RequestContext;
var config = requestContext.Configuration;
context.Result = new ErrorResult(
context.Exception,
requestContext == null ? false : requestContext.IncludeErrorDetail,
config.Services.GetContentNegotiator(),
context.Request,
config.Formatters);
}
/// An implementation of IHttpActionResult interface.
private class ErrorResult : ExceptionResult
{
public ErrorResult(
Exception exception,
bool includeErrorDetail,
IContentNegotiator negotiator,
HttpRequestMessage request,
IEnumerable<MediaTypeFormatter> formatters) :
base(exception, includeErrorDetail, negotiator, request, formatters)
{
}
/// Creates an HttpResponseMessage instance asynchronously.
/// This method determines how a HttpResponseMessage content will look like.
public override Task<HttpResponseMessage> ExecuteAsync(CancellationToken cancellationToken)
{
var content = new HttpError(Exception, IncludeErrorDetail);
// define an additional content field with name "ErrorID"
content.Add("ErrorID", Exception.Data["NesterovskyBros:id"] as long?);
var result =
ContentNegotiator.Negotiate(typeof(HttpError), Request, Formatters);
var message = new HttpResponseMessage
{
RequestMessage = Request,
StatusCode = result == null ?
HttpStatusCode.NotAcceptable : HttpStatusCode.InternalServerError
};
if (result != null)
{
try
{
// serializes the HttpError instance either to JSON or to XML
// depend on requested by the client MIME type.
message.Content = new ObjectContent<HttpError>(
content,
result.Formatter,
result.MediaType);
}
catch
{
message.Dispose();
throw;
}
}
return Task.FromResult(message);
}
}
}
Last, but not least part of this solution is registration and configuration of the error logger/handler:
/// WebApi congiguation.
public static class WebApiConfig
{
public static void Register(HttpConfiguration config)
{
...
// register the exception logger and handler
config.Services.Add(typeof(IExceptionLogger), new GlobalExceptionLogger());
config.Services.Replace(typeof(IExceptionHandler), new GlobalExceptionHandler());
// set error detail policy according with value from Web.config
var customErrors =
(CustomErrorsSection)ConfigurationManager.GetSection("system.web/customErrors");
if (customErrors != null)
{
switch (customErrors.Mode)
{
case CustomErrorsMode.RemoteOnly:
{
config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.LocalOnly;
break;
}
case CustomErrorsMode.On:
{
config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.Never;
break;
}
case CustomErrorsMode.Off:
{
config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.Always;
break;
}
default:
{
config.IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.Default;
break;
}
}
}
}
}
The client-side error handler remain almost untouched. The implementation details you may find in
/Scripts/api/api.js and Scripts/controls/error.js files.
You may download the demo project here.
Feel free to use this solution in your .NET projects.
From time to time we run into tasks that we would like to solve in LINQ style but unfortunately it either cannot be done or a solution is not efficient.
Note that by LINQ style we do not mean C# query expressions (we have a strong distaste for that syntax) but extension methods defined in System.Linq.Enumerable and other classes.
Here we quote several extension methods that are good for a general use:
1. Select with predicate. This is shorthand of items.Where(...).Select(...) :
/// <summary>
/// Projects each element of a sequence into a new form.
/// </summary>
/// <typeparam name="T">A type of elements of source sequence.</typeparam>
/// <typeparam name="R">A type of elements of target sequence.</typeparam>
/// <param name="source">A source sequence.</param>
/// <param name="where">A predicate to filter elements.</param>
/// <param name="selector">A result element selector.</param>
/// <returns>A target sequence.</returns>
public static IEnumerable<R> Select<T, R>(
this IEnumerable<T> source,
Func<T, bool> where,
Func<T, R> selector)
{
return source.Where(where).Select(selector);
}
2. Select with predicate with source element index passed both into the predicate and into the selector. This one you cannot trivially implement in LINQ:
/// <summary>
/// Projects each element of a sequence into a new form.
/// </summary>
/// <typeparam name="T">A type of elements of source sequence.</typeparam>
/// <typeparam name="R">A type of elements of target sequence.</typeparam>
/// <param name="source">A source sequence.</param>
/// <param name="where">A predicate to filter elements.</param>
/// <param name="selector">A result element selector.</param>
/// <returns>A target sequence.</returns>
public static IEnumerable<R> Select<T, R>(
this IEnumerable<T> source,
Func<T, int, bool> where,
Func<T, int, R> selector)
{
var index = 0;
foreach(var value in source)
{
if (where(value, index))
{
yield return selector(value, index);
}
++index;
}
}
3. A function with output element as projection of a window of input elements. Such function can be used to get finite difference (operation opposite to a cumulative sum).
/// <summary>
/// Projects a window of source elements in a source sequence into target sequence.
/// Thus
/// target[i] =
/// selector(source[i], source[i - 1], ... source[i - window + 1])
/// </summary>
/// <typeparam name="T">A type of elements of source sequence.</typeparam>
/// <typeparam name="R">A type of elements of target sequence.</typeparam>
/// <param name="source">A source sequence.</param>
/// <param name="window">A size of window.</param>
/// <param name="lookbehind">
/// Indicate whether to produce target if the number of source elements
/// preceeding the current is less than the window size.
/// </param>
/// <param name="lookahead">
/// Indicate whether to produce target if the number of source elements
/// following current is less than the window size.
/// </param>
/// <param name="selector">
/// A selector that derives target element.
/// On input it receives:
/// an array of source elements stored in round-robing fashon;
/// an index of the first element;
/// a number of elements in the array to count.
/// </param>
/// <returns>Returns a sequence of target elements.</returns>
public static IEnumerable<R> Window<T, R>(
this IEnumerable<T> source,
int window,
bool lookbehind,
bool lookahead,
Func<T[], int, int, R> selector)
{
var buffer = new T[window];
var index = 0;
var count = 0;
foreach(var value in source)
{
if (count < window)
{
buffer[count++] = value;
if (lookbehind || (count == window))
{
yield return selector(buffer, 0, count);
}
}
else
{
buffer[index] = value;
index = index + 1 == window ? 0 : index + 1;
yield return selector(buffer, index, count);
}
}
if (lookahead)
{
while(--count > 0)
{
index = index + 1 == window ? 0 : index + 1;
yield return selector(buffer, index, count);
}
}
}
This way a finite difference looks like this:
var diff = input.Window(
2,
false,
false,
(buffer, index, count) => buffer[index ^ 1] - buffer[index]);
4. A specialization of Window method that returns a enumeration of windows:
/// <summary>
/// Projects a window of source elements in a source sequence into a
/// sequence of window arrays.
/// </summary>
/// <typeparam name="T">A type of elements of source sequence.</typeparam>
/// <typeparam name="R">A type of elements of target sequence.</typeparam>
/// <param name="source">A source sequence.</param>
/// <param name="window">A size of window.</param>
/// <param name="lookbehind">
/// Indicate whether to produce target if the number of source elements
/// preceeding the current is less than the window size.
/// </param>
/// <param name="lookahead">
/// Indicate whether to produce target if the number of source elements
/// following current is less than the window size.
/// </param>
/// <returns>Returns a sequence of windows.</returns>
public static IEnumerable<T[]> Window<T>(
this IEnumerable<T> source,
int window,
bool lookbehind,
bool lookahead)
{
return source.Window(
window,
lookbehind,
lookahead,
(buffer, index, count) =>
{
var result = new T[count];
for(var i = 0; i < count; ++i)
{
result[i] = buffer[index];
index = index + 1 == buffer.Length ? 0 : index + 1;
}
return result;
});
}
While writing an article Dealing with dynamic SQL in SQL Server we have crushed on a trivial C# code.
Consider a declaration:
public enum Direction
{
Asc,
Desc
}
public struct Order
{
public string Field { get; set; }
public Direction Direction { get; set; }
}
public class DateRange
{
public DateTime? From { get; set; }
public DateTime? To { get; set; }
}
public class Request
{
public DateRange CreatedAt { get; set; }
public string Summary { get; set; }
[XmlElement]
public int[] State { get; set; }
public DateRange UpdatedAt { get; set; }
[XmlElement]
public Order[] Order { get; set; }
}
and a code:
...
var request = new Request
{
CreatedAt = { From = new DateTime(2014, 1, 1) },
Order = new[]
{
new Order { Field = "CreatedAt" }
}
};
...
It was hard to believe that this simple variable declaration throws NullReferenceException .
In fact we could not realize what the reason is until we have decompiled that code and poked into IL.
It appears that the problem is with the line:
CreatedAt = { From = new DateTime(2014, 1, 1) },
It's implemented like this:
var range = request.CreatedAt;
range.From = new DateTime(2014, 1, 1) ;
range.To = null;
In other words it assumes the instance stored in property CreatedAt is not null , and it reinitializes all its properties.
In contrast the following works as expected (no exception is thrown):
CreatedAt = new { From = new DateTime(2014, 1, 1) },
as it creates a new instance and assigns it to a property.
We think that this is very error prone, so if it's how C# is assumed to be then it's the problem of the spec, otherwise it's the problem of Microsoft implementation of C# (at least in VS 2010 - 2013).
These are initial positions for this writing:
- SQL Server allows to execute dynamic SQL.
- Dynamic SQL is useful and often unavoidable, e.g. when you have to filter or order data in a way that you cannot code efficiently in advance.
- Dynamic SQL has proven to be a dangerous area, as with improper use it can open hole in a security.
In general nothing stops you from building and then excuting of SQL string. Our goal, however, is to define rules that make work with dynamic SQL is more managable and verifiable.
Here we outline these rules, and then give some examples and tips.
Rule #1. Isolate dynamic SQL
Put all logic related to building of dynamic SQL into a separate function.
We usually define a separate scheme Dynamic , and define functions like Dynamic.GetSQL_XXX(params) .
This makes it simple to perform code review.
Rule #2. Xml as parameters
Use xml type to pass parameters to a function that builds dynamic SQL.
In many cases dynamic SQL depends on variable number of parameters (like a list of values to check against).
Xml fits here to represent structured information.
On a client (e.g. in C# or java) you can define a class with all parameters, populate an instance and serialize it to an xml.
Rule #3. XQuery as template language
Use XQuery to define SQL template and to generate SQL tree from the input parameters.
Here is an example of such XQuery:
@data.query('
<sql>
select
T.*
from
Data.Ticket T
where
{
for $ticketID in data/ticketID return
<sql>(T.TicketID = <int>{$ticketID}</int>) and </sql>
}
(1 = 1)
</sql>')
You can see that output is an xml
with sql element to represent literal SQL, and int element to represent integer literal.
In fact whole output schema can be defined like this:
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="sql"/>
<xs:element name="name"/>
<xs:element name="string" nillable="true"/>
<xs:element name="int" nillable="true"/>
<xs:element name="decimal" nillable="true"/>
<xs:element name="date" nillable="true"/>
<xs:element name="time" nillable="true"/>
<xs:element name="datetime" nillable="true"/>
</xs:schema>
where sql is to represent literal content, name to represent a name, and other elements to represent different literal values.
Rule #4. Escape literals
Use function Dynamic.ToSQL(@template) to build final SQL text.
Here we quote the definition:
-- Builds a text of SQL function for an sql template.
create function Dynamic.ToSQL
(
-- SQL template.
@template xml
)
returns nvarchar(max)
with returns null on null input
as
begin
return
(
select
case
when N.Node.exist('*[xs:boolean(@xsi:nil)]') = 1 then
'null'
when N.Node.exist('self::int') = 1 then
isnull(N.Node.value('xs:int(.)', 'nvarchar(max)'), '# int #')
when N.Node.exist('self::string') = 1 then
'N''' +
replace
(
N.Node.value('.', 'nvarchar(max)'),
'''',
''''''
) +
''''
when N.Node.exist('self::name') = 1 then
isnull
(
quotename(N.Node.value('.', 'nvarchar(128)'), '['),
'# name #'
)
when N.Node.exist('self::datetime') = 1 then
isnull
(
'convert(datetime2, ''' +
N.Node.value('xs:dateTime(.)', 'nvarchar(128)') +
''', 126)',
'# datetime #'
)
when N.Node.exist('self::date') = 1 then
isnull
(
'convert(date, ''' +
N.Node.value('xs:date(.)', 'nvarchar(128)') +
''', 126)',
'# date #'
)
when N.Node.exist('self::time') = 1 then
isnull
(
'convert(time, ''' +
N.Node.value('xs:time(.)', 'nvarchar(128)') +
''', 114)',
'# time #'
)
when N.Node.exist('self::decimal') = 1 then
isnull
(
N.Node.value('xs:decimal(.)', 'nvarchar(128)'),
'# decimal #'
)
when N.Node.exist('self::*') = 1 then
'# invalid template #'
else
N.Node.value('.', 'nvarchar(max)')
end
from
@template.nodes('//sql/node()[not(self::sql)]') N(Node)
for xml path(''), type
).value('.', 'nvarchar(max)');
end;
Now, we want to stress that this function plays an important role in prevention of the SQL injection, as it escapes literals from the SQL tree.
Rule #5 (optional). Collect data
Use SQL to collect additional data required to build dynamic SQL. Here is an example of how we get a Ticket by StatusID , while on input we receive a StatusName :
create function Dynamic.GetSQL_GetTicketByStatus(@data xml)
returns nvarchar(max)
as
begin
set @data =
(
select
@data,
(
select
T.StatusID
from
@data.nodes('/data/status') N(Node)
inner join
Metadata.Status T
on
T.StatusName = Node.value('.', 'nvarchar(128)')
for xml auto, type, elements
)
for xml path('')
);
return Dynamic.ToSQL
(
@data.query
('
<sql>
select
T.*
from
Data.Ticket T
where
T.Status in ({ for $status in /T/StatusID return <sql><int>{$status}</int>,</sql> } null)
</sql>
')
);
end;
Notice code in red that collects some more data before calling XQuery.
Rule #6. Execute
The final step is to call dynamic SQL.
This is done like this:
-- build
declare @sql nvarchar(max) = Dynamic.GetSQL_GetTicket(@data);
-- execute
execute sp_executesql
@sql
-- {, N'@parameter_name data_type [ OUT | OUTPUT ][ ,...n ]' }
-- { , [ @param1 = ] 'value1' [ ,...n ] }
with result sets
(
(
TicketID int not null,
CreatedAt datetime2 not null,
Summary nvarchar(256) null,
Status int,
Severity int,
DeadLineAt datetime2 null
)
);
Notice that the use of dynamic SQL does not prevent static parameters.
Notice also that with result sets clause is used to specify output.
Example. Tickets system
Let's assume you're dealing with a tickets system (like Bugzilla), and you have a table Data.Ticket to describe tickets. Assume that DDL for this table is like this:
create table Data.Ticket
(
TicketID bigint not null primary key,
CreatedAt datetime2 not null,
Summary nvarchar(128) null,
Status int not null,
UpdatedAt datetime2(7) not null
)
Suppose you have to build C# code to search different tickets, where Entity Framework is used to access the database.
Search should be done by a range of CreatedAt , a range of UpdatedAt , Summary , or by different Status values. It should be possible to order results in different ways.
We start out solution from the C# and define classes for a request:
public enum Direction
{
Asc,
Desc
}
public struct Order
{
public string Field { get; set; }
public Direction Direction {get; set; }
}
public class DateRange
{
public DateTime? From { get; set; }
// This property is to omit From element if value is null.
// See rules for xml serialization.
public bool FromSpecified
{
get { return From != null; }
}
public DateTime? To { get; set; }
public bool ToSpecified
{
get { return To != null; }
}
}
public class TicketsRequest
{
public DateRange CreatedAt { get; set; }
public string Summary { get; set; }
public DateRange UpdatedAt { get; set; }
[XmlElement]
public Order[] Order { get; set; }
[XmlElement]
public int[] Status { get; set; }
}
Notice that we're going to use XmlSerializer to convert request to xml and then to pass parameter into EF's model. Here is utility method to perform such conversion:
public static string ToXmlString<T>(T value)
{
if (value == null)
{
return null;
}
var serializer = new XmlSerializer(typeof(T));
var builder = new StringBuilder();
var writer = XmlWriter.Create(
builder,
new XmlWriterSettings
{
OmitXmlDeclaration = true,
Indent = false
});
serializer.Serialize(writer, value);
writer.Flush();
return builder.ToString();
}
Now we proceed to the database and define a procedure that runs the search:
-- Gets tickets.
create procedure Data.GetTickets
(
-- A query parameters.
@params xml
)
as
begin
set nocount on;
-- This is for EF to guess type of result.
if (1 = 0)
begin
select
TicketID,
CreatedAt,
Summary,
Status,
UpdatedAt
from
Data.Ticket;
end;
declare @sql nvarchar(max) = Dynamic.GetSQL_GetTickets(@params);
execute sp_executesql @sql
with result sets
(
(
TicketID int not null,
CreatedAt datetime2 not null,
Summary nvarchar(256) null,
Status int,
UpdatedAt datetime2 null
)
);
end;
Switch back to C#, import the Data.GetTickets into the EF model, and create a search method:
public IEnumerable<Ticket> GetTickets(TicketsRequest request)
{
var model = new Model();
return model.GetTickets(ToXmlString(request));
}
The last ingredient is Dynamic.GetSQL_GetTickets() function.
create function Dynamic.GetSQL_GetTickets(@data xml)
returns nvarchar(max)
as
begin
return Dynamic.ToSQL
(
@data.query('
<sql>
select
T.TicketID,
T.CreatedAt,
T.Summary,
T.Status,
T.UpdatedAt
from
Data.Ticket T
where
{
for $range in */CreatedAt return
(
for $date in $range/From return
<sql>
(T.CreatedAt >= <datetime>{$date}</datetime>) and
</sql>,
for $date in $range/To return
<sql>
(<datetime>{$date}</datetime> > T.CreatedAt) and
</sql>
),
for $range in */UpdatedAt return
(
for $date in $range/From return
<sql>
(T.UpdatedAt >= <datetime>{$date}</datetime>) and
</sql>,
for $date in $range/To return
<sql>
(<datetime>{$date}</datetime> > T.UpdatedAt) and
</sql>
),
for $summary in */Summary return
<sql>
(T.Summary like <string>{$summary}</string>) and
</sql>,
if (*/Status) then
<sql>
T.Status in
({
for $status in */Status return
<sql><int>{$status}</int>, </sql>
} null) and
</sql>
else ()
}
(1 = 1)
order by
{
for $order in
*/Order
[
Field = ("TicketID", "CreatedAt", "Summary", "UpdatedAt", "Status")
]
return
<sql>
<name>{$order/Field}</name>
{" desc"[$order[Direction = "Desc"]]},
</sql>
}
(select null)
</sql>
')
);
end;
SQL text from Dynamic.GetSQL_GetTickets()
Consider now SQL text produced by this function. For an input:
<TicketsRequest>
<CreatedAt>
<From>2014-01-01T00:00:00</From>
</CreatedAt>
<Summary>hello%</Summary>
<Order>
<Field>Status</Field>
<Direction>Desc</Direction>
</Order>
<Status>1</Status>
<Status>3</Status>
</TicketsRequest>
the output is:
select
T.TicketID,
T.CreatedAt,
T.Summary,
T.Status,
T.UpdatedAt
from
Data.Ticket T
where
(T.CreatedAt >= convert(datetime2, '2014-01-01T00:00:00', 126)) and
(T.Summary like N'hello%') and
T.Status in
(1, 3, null) and
(1 = 1)
order by
[Status] desc,
(select null)
Though the text is not formatted as we would like, it's perfectly valid SQL.
Tips for building XQuery templates
What is called XQuery in SQL Server is in fact a very limited subset of XQuery 1.0. Microsoft clearly states this fact. What is trivial in XQuery is often impossible or ugly in XQuery of SQL Server.
Nevertheless XQuery in SQL Server works rather well as SQL template language. To make it most efficient, however, you should learn several tips.
Tip #1. Where clause
In template you might want to build a where clause:
<sql>
select
...
where
{
if (...) then
<sql>...</sql>
else ()
}
</sql>
and it might happen that for a certain input a condition under where might collapse, and you will be left with where keyword without a real condition, which is wrong. A simple work around is to always add some true condition under ther where like this:
<sql>
select
...
where
{
if (...) then
<sql>... and </sql>
else ()
} (1 = 1)
</sql>
Tip #2. "in" expression
If you want to generate "in" expression like this:
value in (item1, item2,...)
then you might find that it's much easier generate equivalent a code like this:
value in (item1, item2,..., null) .
Here is a XQuery to generate such template:
value in
({
for $item in ... return
<sql><int>{$item}</int>, </sql>
} null) and
Tip #3. Order by
You can conclude an order by clause built from a data with a dummy expression like this:
order by
{
for $item in ... return
<sql>
<name>{$item/Field}</name>
{" desc"[$item/Direction = "Desc"]},
</sql>
} (select null)
Alternatively you can use first column from a clustered index.
Tip #4. Group by
In a group by clause we cannot introduce terminator expression as it was with order by , so a code is a less trivial:
{
let $items := ... return
if ($items) then
<sql>
group by <name>{$items[1]}</name>
{
for $item in $items[position() > 1] return
<sql>, <name>{$item}</name></sql>
}
</sql>
else ()
}
In fact similar logic may work with order by .
Tip #5. Escape literals
It's crusial not to introduce SQL injection while building SQL. Thus use:
<int>{...}</int> - for literal int;
<decimal>{...}</decimal> - for literal decimal;
<string>{...}</string> - for literal string;
<datetime>{...}</datetime> - for literal datetime2;
<date>{...}</date> - for literal date;
<time>{...}</time> - for literal time;
<name>{...}</name> - for a name to quote.
Note that you can use xsi:nil , so
<int xsi:nil="true"/> means null .
If you generate a field name from an input data then it worth to validate it against a list of available names.
Tip #6. Validate input.
It worth to define xml schema for an input xml, and to validate parameters against it.
This makes code more secure, and also adds a documentation.
Tip #7. Don't abuse dynamic SQL
There are not too many cases when you need a dynamic SQL. Usually SQL engine knows how to build a good execution plan. If your query contains optional conditions then you can write it a way that SQL Server can optimize, e.g.:
select
*
from
T
where
((@name is null) or (Name = @name)) and
((@date is null) or (Date = @date))
option(recompile)
Consider how would you implement Style object in the HTML DOM?
These are some characteristics of that object:
- It has a long list of properties, e.g. in IE 11 there are more than 300 properties over a style object.
- Any specific instance usually have only several properties assigned.
- Reads of properties are much more frequent than writes. In fact style often stays unchanged after initialization.
- DOM contains many style instances (often thousands).
- The number of distinct instances in terms of values of properties is moderate (usually dozens).
Here is how would we approached to such an object.
1.
Styles are sparse objects, thus there is no point to implement plain class with all those properties, as it's wasteful.
We would rather use two techniques to keep style's state:
- A dictionary of properties with their values;
- An aggregation of objects, where all properies are grouped into families, each group is defined by a separate type, and a style's state is an aggregation of that groups.
A current style of an element is an aggregation of styles of ancestor element. It can either by dynamic or be fused into a single style instance.
2. Make style's state immutable, and share all these states among all style instances.
In this implementation property write turns into a state transition operation: state = set(state, property, value) . Thus no state is modified but replaced with other state that corresponds to a required change.
If state is seen as a dictionary then API may look like this :
public class State<K, V>
{
// Gets shared dictionary for an input dictionary.
public IDictionary<K, V> Get(IDictionary<K, V> dictionary);
// Gets a shared dictionary for an input dictionary with key set to a value.
public IDictionary<K, V> Set(IDictionary<K, V> dictionary, K key, V value);
// Gets a shared dictionary for an input dictionary.
public IDictionary<K, V> Remove(IDictionary<K, V> dictionary, K key);
// Gets typed value.
public T Get<T>(IDictionary<K, V> dictionary, K key)
where T: V
{
V value;
if ((dictionary == null) || !dictionary.TryGetValue(key, out value))
{
return default(T);
}
return (T)value;
}
// Sets or removes a typed value.
// dictionary can be null.
// null returned if output dictionary would be empty.
public IDictionary<K, V> Set<T>(IDictionary<K, V> dictionary,
K key,
T value)
where T : V
{
return value == null ? Remove(dictionary, key) :
Set(dictionary, key, (V)value);
}
}
States can be cached. Provided the cache keeps states in a weak way, no unsued state will be stored for a long time.
We may use weak table of dictionary to dictionary WeakTable<Dictionary<K, V>, Dictionary<K, V>> as a storage for such a cache. All required API is described in the WeakTable and Hash Code of Dictionary posts.
3. Style can be implemented as a structure with shared state as a storage. Here is a scetch:
[Serializable]
public struct Style
{
// All properties.
public enum Property
{
Background,
BorderColor,
BorderStyle,
Color,
FontFamily,
FontSize,
// ...
}
public int? Background
{
get { return states.Get<int?>(state, Property.Background); }
set { state = states.Set(state, Property.Background, value); }
}
public int? BorderColor
{
get { return states.Get<int?>(state, Property.BorderColor); }
set { state = states.Set(state, Property.BorderColor, value); }
}
public string BorderStyle
{
get { return states.Get<string>(state, Property.BorderStyle); }
set { state = states.Set(state, Property.BorderStyle, value); }
}
public int? Color
{
get { return states.Get<int?>(state, Property.Color); }
set { state = states.Set(state, Property.Color, value); }
}
public string FontFamily
{
get { return states.Get<string>(state, Property.FontFamily); }
set { state = states.Set(state, Property.FontFamily, value); }
}
public double? FontSize
{
get { return states.Get<double?>(state, Property.FontSize); }
set { state = states.Set(state, Property.FontSize, value); }
}
// ...
[OnDeserialized]
private void OnDeserialized(StreamingContext context)
{
state = states.Get(state);
}
// A state.
private IDictionary<Property, object> state;
// A states cache.
private static readonly State<Property, object> states =
new State<Property, object>();
}
Note that:
- default state is a
null dictionary;
- states are application wide shared.
The following link is our implementation of State<K, V> class: State.cs.
Here we have outlined the idea of shared state object, and how it can be applied to sparse mostly immutable objects. We used HTML style as an example of such an object. Shared state object may work in many other areas, but for it to shine its use case should fit to the task.
Dealing recently with some task (the same that inspired us to implement WeakTable), we were in a position to use a dictionary as a key in another dictionary.
What are the rules for the class to be used as key:
- key should be immutable;
- key should implement a
GetHashCode() method;
- key should implement a
Equals() method.
The first requirement is usually implemented as a documentation contract like this:
As long as an object is used as a key in the Dictionary<TKey, TValue> , it must not change in any way that affects its hash value.
The third requirement about equals is trivially implemented as a method:
public bool Equals(IDictionary<K, V> x, IDictionary<K, V> y)
{
if (x == y)
{
return true;
}
if ((x == null) || (y == null) || (x.Count != y.Count))
{
return false;
}
foreach(var entry in x)
{
V value;
if (!y.TryGetValue(entry.Key, out value) ||
!valueComparer.Equals(entry.Value, value))
{
return false;
}
}
return true;
}
But how would you implement hash code?
We argued like this.
1. Let's consider the dictionary as a sparse array of values with only populated items that correspond to key hash codes.
2. Hash code is constructed using some fair algorithm. E.g like that used in java to calculate string's hash code:
n-1
h(s) = SUM (s[i]*p^(n-1-i)) mod m, where m = 2^31
i=0
In our case:
n can be arbitrary large int value, so in fact it's 2^32 ;
- items are enumerated in unknown order;
- there is only limited set of items, so most
s[i] are zeros.
As result we cannot use recurrent function to calculate a power p^k mod m. Fortunately one can build fast exponentiation arguing like this:
32/s - 1
p^k = p^ SUM 2^((s*i)*k[i]) mod m, where s some int: 1, 2, 4, 8, 16, or 32.
i=0
Thus
32/s - 1
p^k = PRODUCT (p^(2^(s*i)))^k[i] mod m
i=0
If s = 1 then k[i] is either 1 or 0 (a bit), and there is 32 different p^(2^i) mod m values, which can be precalculated.
On the other hand, if we select s = 8 we can write the formula as:
p^k = p^k[0] * (p^(2^8))^k[1] * (p^(2^16))^k[2] * (p^(2^24))^k[3] mod m
where k[i] is a 8 bit value (byte).
Precalculating all values
p^n, (p^(2^8))^n , (p^(2^16))^n , (p^(2^24))^n for n in 0 to 255 we reach the formula with 4 multiplications and with 1024 precalculated values.
Here is the whole utility to calculate hash factors:
/// <summary>
/// Hash utilities.
/// </summary>
public class Hash
{
/// <summary>
/// Returns a P^value mod 2^31, where P is hash base.
/// </summary>
/// <param name="value">A value to get hash factor for.</param>
/// <returns>A hash factor value.</returns>
public static int GetHashFactor(int value)
{
return factors[(uint)value & 0xff] *
factors[(((uint)value >> 8) & 0xff) | 0x100] *
factors[(((uint)value >> 16) & 0xff) | 0x200] *
factors[(((uint)value >> 24) & 0xff) | 0x300];
}
/// <summary>
/// Initializes hash factors.
/// </summary>
static Hash()
{
var values = new int[4 * 256];
var value = P;
var current = 1;
var i = 0;
do
{
values[i++] = current;
current *= value;
}
while(i < 256);
value = current;
current = 1;
do
{
values[i++] = current;
current *= value;
}
while(i < 512);
value = current;
current = 1;
do
{
values[i++] = current;
current *= value;
}
while(i < 768);
value = current;
current = 1;
do
{
values[i++] = current;
current *= value;
}
while(i < 1024);
factors = values;
}
/// <summary>
/// A base to calculate hash factors.
/// </summary>
public const int P = 1103515245;
/// <summary>
/// Hash factors.
/// </summary>
private static readonly int[] factors;
}
With this API hash code for a dictionary is a trivial operation:
public int GetHashCode(IDictionary<K, V> dictionary)
{
if (dictionary == null)
{
return 0;
}
var result = 0;
foreach(var entry in dictionary)
{
if ((entry.Key == null) || (entry.Value == null))
{
continue;
}
result += Hash.GetHashFactor(keyComparer.GetHashCode(entry.Key)) *
valueComparer.GetHashCode(entry.Value);
}
return result;
}
And finally, here is a reference to a class DictionaryEqualityComparer<K, V>: IEqualityComparer<IDictionary<K, V>> that allows a dictionary to be a key in another dictionary.
Update
We have commited some tests, and have found that with suffiently "good" implementation of GetHashCode() of key or value we achieve results almost of the same quality, as the results of the algorithm we have outlined above with much simpler and straightforward algorithm like this:
public int GetHashCode(IDictionary<K, V> dictionary)
{
if (dictionary == null)
{
return 0;
}
var result = 0;
foreach(var entry in dictionary)
{
if ((entry.Key == null) || (entry.Value == null))
{
continue;
}
var k = entry.Key.GetHashCode();
var v = entry.Value.GetHashCode();
k = (k << 5) + k;
v = (v << (k >> 3)) + v;
result += k ^ v;
//result += Hash.GetHashFactor(keyComparer.GetHashCode(entry.Key)) *
// valueComparer.GetHashCode(entry.Value);
}
return result;
}
It was worth to blog about this just to find out that we have outwitted ourselves, and finally to reach to a trivial hash code implementation for the dictionary.
Dealing recently with some task, we were in a position to use a weak dictionary in the .NET. Instinctively we assumed that it should exist somewhere in the standard library. We definitely knew that there is a WeakReference class to for a single instance. We also knew that there is WeakHashMap in java, and that it's based on java's WeakReference .
So, we were surprised to find that there is no such thing out of the box in .NET.
We have found that java's and .NET's weak references are different. In java weak references whose targets are GCed can be automatically put into a queue, which can be used to build clean up logic to remove dead keys from weak hash map. There is nothing similar in .NET, where weak reference just silently loses it's value.
Internet is full with custom implementations of weak dictionaries in .NET.
.NET 4.5 finally defines a class ConditionalWeakTable<TKey, TValue> , which solves the problem in case when you need to match keys by instance identity.
Unfortunately in our case we needed to match keys using key's GetHashCode() and Equals() . So, ConditionalWeakTable<TKey, TValue> did not directly work, but then we found a way to make it work for us.
Here is a quote from the definition:
A ConditionalWeakTable<TKey, TValue> object is a dictionary that binds a managed object, which is represented by a key, to its attached property, which is represented by a value. The object's keys are the individual instances of the TKey class to which the property is attached, and its values are the property values that are assigned to the corresponding objects.
...in the ConditionalWeakTable<TKey, TValue> class, adding a key/value pair to the table does not ensure that the key will persist, even if it can be reached directly from a value stored in the table... Instead, ConditionalWeakTable<TKey, TValue> automatically removes the key/value entry as soon as no other references to a key exist outside the table.
This property of ConditionalWeakTable<TKey, TValue> has helped us to build a way to get a notification when the key is being finalized, which is the missed ingredient in .NET's weak references.
Assume you have an instance key of type Key . To get a notification you should define a class Finalizer that will call some handler when it's finalized, and you should bind key and a finalizer instance using weak table.
The code looks like this:
public class Finalizer<K>
where K: class
{
public static void Bind(K key, Action<K> handler)
{
var finalizer = table.GetValue(key, k => new Finalizer<K> { key = k });
finalizer.Handler += handler;
}
public static void Unbind(K key, Action<K> handler)
{
Finalizer finalizer;
if (table.TryGetValue(key, out finalizer))
{
finalizer.Handler -= handler;
}
}
~Finalizer()
{
var handler = Handler;
if (handler != null)
{
handler(key);
}
}
private event Action<K> Handler;
private K key;
private static readonly
ConditionalWeakTable<K, Finalizer> table =
new
ConditionalWeakTable<K, Finalizer>();
}
Key key = ...
Finalizer.Bind(key, k => { /* clean up. */ });
Using this approach we have created a class WeakTable<K, V> modeled after ConditionalWeakTable<TKey, TValue>.
So, this is our take in the problem: WeakTable.cs.
Oftentimes we deal with Hebrew in .NET.
The task we face again and again is attempt to convert a Hebrew text from visual to logical representation.
The latest demand of such task was when we processed content extracted from PDF. It's turned out that PDF stores content as graphic primitives, and as result text is stored visually (often each letter is kept separately).
We solved the task more than a decade ago, by calling Uniscribe API.
The function by itself is a small wrapper around that API, so in .NET 1.0 we were using managed C++, several years later we have switched to C++/CLI.
But now after many .NET releases, and with 32/64 versions we can see that C++ is only a guest in .NET world.
To run C++ in .NET you have to install VC runtime libraries adjusted to a specific .NET version. This turns C++ support in .NET into not a trivial task.
So, we have finally decided to define C# interop for the Uniscribe API, and recreate that function in pure C#:
namespace NesterovskyBros.Bidi
{
/// <summary>
/// An utility to convert visual string to logical.
/// <summary>
public static class BidiConverter
{
/// <summary>
/// Converts visual string to logical.
/// </summary>
/// <param name="value">A value to convert.</param>
/// <param name="rtl">A base direction.</param>
/// <param name="direction">
/// true for visual to logical, and false for logical to visual.
/// </param>
/// <returns>Converted string.</returns>
public static string Convert(string value, bool rtl, bool direction);
You can download this project from BidiVisualConverter.zip.
Before to start we have to confess that
afer many years of experience we sincerely dislike JSF technology, as
we think it's outdated compared to html 5 + REST.
We have a JSF 2.2 application, which
is configured to track session through url. In this case Session ID is stored in
url and not in
cookies, as there may be many sessions opened per a client.
At the same time application uses libraries that expose scripts and css
resources. This resources are referred to like this:
<link rel="stylesheet" type="text/css" jsfc="h:outputStylesheet"
library="css"
name="library-name.css"/>
<script type="text/javascript" jsfc="h:outputScript" name="library-name.js"
library="scripts" target="head"></script>
At runtime this is rendered as:
<link type="text/css" rel="stylesheet"
href="/App/javax.faces.resource/library-name.css.jsf;jsessionid=FC4A893330CCE12E8E20DFAFC73CDF35?ln=css"
/>
<script type="text/javascript"
src="/App/javax.faces.resource/library-name.js.jsf;jsessionid=FC4A893330CCE12E8E20DFAFC73CDF35?ln=scripts"></script>
You can see that Session ID is a part of url path,
which prevents resource caching on a client.
It's not clear whether it's what JSF spec dictates or
it's Oracle's Reference Implementation detail. We're certain, however,
that it's
too wasteful in heavy loaded environment, so we have tried to
resolve the problem.
From JSF sources we have found that h:outputStylesheet , h:outputScript ,
and h:outputLink all use ExternalContext.encodeResourceURL()
method to build markup url.
So, here is a solution: to provide custom wrapper for the ExternalContext .
This is done in two steps:
- create a factory class;
- register a factory in faces-config.xml;
1. Factory is a simple class but unfortunately it's implementation specific:
package com.nesterovskyBros.jsf;
import javax.faces.FacesException;
import javax.faces.context.ExternalContext;
import javax.faces.context.ExternalContextWrapper;
import com.sun.faces.context.ExternalContextFactoryImpl;
/**
* {@link ExternalContextFactory} to prevent session id in resource urls.
*/
public class ExternalContextFactory extends ExternalContextFactoryImpl
{
/**
* {@inheritDoc}
*/
@Override
public ExternalContext getExternalContext(
Object context,
Object request,
Object response)
throws FacesException
{
final ExternalContext externalContext =
super.getExternalContext(context, request, response);
return new ExternalContextWrapper()
{
@Override
public ExternalContext getWrapped()
{
return externalContext;
}
@Override
public String encodeResourceURL(String url)
{
return shouldEncode(url) ?
super.encodeResourceURL(url) : url;
}
private boolean shouldEncode(String url)
{
// Decide here whether you want to encode
url.
// E.g. in case of h:outputLink you may
want to have session id in url,
// so your decision is based on some
marker (like &session=1) in url.
return false;
}
};
}
}
2. Registration is just three lines in faces-config.xml :
<factory>
<external-context-factory>com.nesterovskyBros.jsf.ExternalContextFactory</external-context-factory>
</factory>
After that change at runtime we have:
<link type="text/css" rel="stylesheet"
href="/App/javax.faces.resource/library-name.css.jsf?ln=css" />
<script type="text/javascript"
src="/App/javax.faces.resource/library-name.js.jsf?ln=scripts"></script>
A day ago we had installed on our laptops Win 8.1. We thought it will be better than Win 8, but now we afraid of next release of Windows... So far, so worse.
ICS became unusable: in Windows 7 we've shared Internet connection from USB 3G dongle between our two computers, in Windows 8 we've succeeded to share only file system access, but not Internet. In Windows 8.1 neither Internet nor file system are accessible...
Till recently we were living in simple world of string comparisons in SQL style,
and now everything has changed.
From the university years we knew that strings in SQL are compared by first
trimming traling spaces, and then comparing in C style.
Well, the picture was a little more complex, as collations were involved
(national, case sensivity), and as different SQL vendors implemented it
differently.
Next, we're dealing with programs converted from COBOL, which we originally
thought follow SQL rules when strings are compared.
Here is where the problem has started.
Once we have found that java program has branched differently than original
COBOL, and the reason was that the COBOL and java compared two strings
differently:
- COBOL:
"A\n" < "A" ;
- Java:
"A\n" > "A"
We have looked into
COBOL Language Reference and found the rules:
- Operands of equal size
- Characters in corresponding positions of the two operands are compared,
beginning with the leftmost character and continuing through the rightmost
character.
If all pairs of characters through the last pair test as equal, the operands are
considered as equal.
If a pair of unequal characters is encountered, the characters are tested to
determine their relative positions in the collating sequence. The operand that
contains the character higher in the sequence is considered the greater operand.
- Operands of unequal size
- If the operands are of unequal size, the comparison is made as though the
shorter operand were extended to the right with enough spaces to make the
operands equal in size.
You can see that strings must not be trimmed but padded with spaces
to the longer string, and only then they are compared. This subtle difference
has significant impact for characters below the space.
So, here we've found that COBOL and SQL comparisons are different.
But then we have questioned how really SQL beheaves?
We've tested comparisons in SQL Server and DB2, and have seen that our
understanding of SQL comparison holds. It works as if trimming spaces, and then
comparing.
But again we have looked into SQL-92 definition, and that's what we see there:
8.2 <comparison predicate>
3) The comparison of two character strings is determined as follows:
a) If the length
in characters of X is not equal to the length
in
characters of Y, then the shorter string is effectively
replaced, for the purposes of comparison, with a copy of
itself that has been extended to the length of the longer
string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If
CS
has the NO PAD attribute, then the pad character is an
implementation-dependent character different from any character
in the character set of X and Y that collates less
than
any string under CS. Otherwise, the pad character is a
<space>.
So, what we see is that SQL-92 rules are very close to COBOL rules, but then we
reach the question: how come that at least SQL Server and DB2 implement string
comparison differently than SQL-92 dictates?
Update: we have found that both SQL Server and DB2 have their string collation defined in a way that <space> is less than any other character.
So the following is always true: '[' + char(13) + ']' > '[ ]' .
Earlier we have written a post
KendoUI's slowest function and now,
we want to point to the next slow function, which is kendo.guid() .
It's used to assign uid to each observable object, and also in couple
of other places.
Here is its source:
guid: function() {
var id = "", i, random;
for (i = 0; i < 32; i++) {
random = math.random() * 16 | 0;
if (i == 8 || i == 12 || i == 16 || i == 20) {
id += "-";
}
id += (i == 12 ? 4 : (i == 16 ? (random & 3 | 8) : random)).toString(16);
}
return id;
}
KendoUI people have decided to define uid as a string in format of
Globally unique identifier.
We think there is no reason to have such a complex value; it's enough to
have counter to generate uid values. As KendoUI relies
on the string type of uid , so we have defined a patch like this:
var guid = 0
kendo.guid = function()
{ return ++guid + ""
}
Consider now a test case. It's almost identical to that in previous post:
<!DOCTYPE html>
<html>
<head>
<title>Test</title>
<script src="scripts/jquery/jquery.min..js"></script>
<script src="scripts/kendo/kendo.web.min.js"></script>
<link href="styles/kendo.common.min.css" rel="stylesheet" />
<link href="styles/kendo.default.min.css" rel="stylesheet" />
<link href="styles/style.css" rel="stylesheet" />
<script>
var model;
function init()
{
var source = [];
for(var i = 0; i < 1000; ++i)
{
source.push({ text: "value " + i, value: "" + i });
}
model = kendo.observable(
{
value: "1",
source: new kendo.data.DataSource(
{
data: source
})
});
model.source.read();
}
function patch()
{
var base = kendo.data.binders.widget.source.fn._ns;
var result;
var guid = 0;
kendo.guid = function()
{ return ++guid + "";
};
kendo.data.binders.widget.source.fn._ns = function(ns)
{
return ns ? base.call(this, ns) :
(result || (result = base.call(this, ns)));
}
}
function test()
{
init();
kendo.bind("#view", model);
}
patch();
</script>
</head>
<body>
<p>
<button onclick="test()">Click to start test</button>
</p>
<p id="view">
Select:
<input data-role="dropdownlist"
data-bind="value: value, source: source"
data-text-field="text"
data-value-field="value"/>
</p>
</body>
</html>
Now, we can compare performance with and without that patch.
Here is a run statistics without patch:
Level |
Function |
Count |
Inclusive time (ms) |
Inclusive time % |
Avg time (ms) |
1 |
onclick |
1 |
270.73 |
100 |
270.73 |
1.1 |
test |
1 |
269.73 |
99.63 |
269.73 |
1.1.1 |
init |
1 |
117.07 |
43.24 |
117.07 |
1.1.1.1 |
guid |
1,001 |
72.05 |
26.61 |
0.07 |
1.1.2 |
bind |
1 |
152.65 |
56.39 |
152.65 |
and with patch:
Level |
Function |
Count |
Inclusive time (ms) |
Inclusive time % |
Avg time (ms) |
1 |
onclick |
1 |
172.64 |
100 |
172.64 |
1.1 |
test |
1 |
171.65 |
99.42 |
171.65 |
1.1.1 |
init |
1 |
62.04 |
35.94 |
62.04 |
1.1.1.1 |
guid |
1,001 |
1 |
0.58 |
0 |
1.1.2 |
bind |
1 |
109.6 |
63.49 |
109.6 |
Note that statistics were collected for IE 10.
An example can be found at
slow2.html.
What is the slowest function in kendo?
Or better, which function has most negative performance impact in kendo.
Recently, we were dealing with a simple page, which was too
slow, as data binding took more than second.
The page contained a dropdown list, with ~1000 options. To understand the
reason we have run this page under the IE's built-in javascript profiler, and
...
there, we have found that almost half of the time is taken by a function
(call it X), which receives nothing and returns always the same result!
But, let's now see a minimal example that demostrates the problem:
<!DOCTYPE html>
<html>
<head>
<title>Test</title>
<script src="scripts/jquery/jquery.js"></script>
<script src="scripts/kendo/kendo.web.min.js"></script>
<link href="styles/kendo.common.min.css" rel="stylesheet" />
<link href="styles/kendo.default.min.css" rel="stylesheet" />
<link href="styles/style.css" rel="stylesheet" />
<script>
var model;
function init()
{
var source = [];
for(var i = 0; i < 1000; ++i)
{
source.push({ text: "value " + i, value: "" + i });
}
model = kendo.observable(
{
value: "1",
source: new kendo.data.DataSource(
{
data: source
})
});
model.source.read();
}
function test()
{
kendo.bind("#view", model);
}
init();
</script>
</head>
<body>
<p>
<button onclick="test()">Click to start test</button>
</p>
<p id="view">
Select:
<input data-role="dropdownlist"
data-bind="value: value, source: source"
data-text-field="text"
data-value-field="value"/>
</p>
</body>
</html>
There are two parts: initialization, and a test itself that starts upon button
click.
In the initialization part we have defined a model, containing a datasource.
The test part performs data binding.
Now, here is a run statistics:
Function |
Count |
Inclusive time (ms) |
Inclusive time % |
Avg time (ms) |
test |
1 |
456.05 |
100 |
456.05 |
X |
1,000 |
200.14 |
43.89 |
0.2 |
So, X is fast by itself, but it run 1000 times, and took about 44% of all time.
And now, to the function. It's kendo.data.binders.widget.source.fn._ns .
Here is its code:
_ns: function(ns) {
ns = ns || kendo.ui;
var all = [ kendo.ui, kendo.dataviz.ui, kendo.mobile.ui ];
all.splice($.inArray(ns, all), 1);
all.unshift(ns);
return kendo.rolesFromNamespaces(all);
}
We can see that on each call it receives a parameter undefined , and always returns an array with the same content.
Not sure why Kendo UI team made it so complex, but one can easily device a
simple patch that optimizes this code path.
function patch()
{
var base = kendo.data.binders.widget.source.fn._ns;
var result;
kendo.data.binders.widget.source.fn._ns = function(ns)
{
return ns ? base.call(this, ns) :
(result || (result = base.call(this, ns)));
}
}
With this patch applied, the profiler shows:
Function |
Count |
Inclusive time (ms) |
Inclusive time % |
Avg time (ms) |
test |
1 |
253.03 |
100 |
253.03 |
_ns |
1,000 |
6 |
2.37 |
0.01 |
Execution time dropped considerably, and _ns() loses its title of most time
consuming function!
An example can be found at
slow.html.
Time after time we run into the same problem on different platforms, with different languages.
The problem's name is "Visual to Logical conversion for right-to-left or bidirectional text".
The problem is usually due to legacy code, which stores texts in visual order from left to right. In case of English it's ok, but with Hebrew this means that texts are partially reversed.
It worth to note that we've solved the same task with Windows API for native and .NET applications more than 10 years ago.
On the other hand, for Java, we yet didn't see any acceptable standalone solution. To remedy this omission,
we publish here our solution to this problem.
package com.nesterovskyBros.text;
import java.text.Bidi;
/**
* Utility that uses {@link Bidi} class.
*/
public class BidiUtils
{
/**
* Implements visual to logical order converter.
*
* @author <a href="http://www.nesterovsky-bros.com">Nesterovsky bros</a>
*
* @param text an input text in visual order to convert.
* @return a String value in logical order.
*/
public static String visualToLogical(String text)
{
if ((text == null) || (text.length() == 0))
{
return text;
}
Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
if (bidi.isLeftToRight())
{
return text;
}
int count = bidi.getRunCount();
byte[] levels = new byte[count];
Integer[] runs = new Integer[count];
for (int i = 0; i < count; i++)
{
levels[i] = (byte)bidi.getRunLevel(i);
runs[i] = i;
}
Bidi.reorderVisually(levels, 0, runs, 0, count);
StringBuilder result = new StringBuilder();
for (int i = 0; i < count; i++)
{
int index = runs[i];
int start = bidi.getRunStart(index);
int end = bidi.getRunLimit(index);
int level = levels[index];
if ((level & 1) != 0)
{
for (; --end >= start;)
{
result.append(text.charAt(end));
}
}
else
{
result.append(text, start, end);
}
}
return result.toString();
}
}
This method utilizes embeded Bidi's algorithm, see class
java.text.Bidi.
Be aware that there is no perfect algorithm that covers all possible cases, since BIDI was written for an opposite task,
but our implementation based on Bidi.reorderVisually is usually acceptable.
Here is an JUnit test for this method:
package com.nesterovskyBors.text;
import static org.junit.Assert.*;
import org.junit.Test;
import com.nesterovskyBros.text.BidiUtils;
public class BidiUtilsTests
{
@Test
public void testsVisualToLogical()
{
String text = "123 יתימאה ןחבמ";
String actual = BidiUtils.visualToLogical(text);
String expected = "מבחן האמיתי 123";
assertEquals(expected, actual);
text = "תירבע English תירבע בוש";
actual = BidiUtils.visualToLogical(text);
expected = "שוב עברית English עברית";
assertEquals(expected, actual);
}
}
|