RSS 2.0
Sign In
# Sunday, 22 July 2012

For some reason neither .NET's XmlSerializer nor DataContractSerializer allow reading data through an XmlReader. These APIs work other way round writing data into an XmlWriter. To get data through XmlReader one needs to write it to some destination like a file or memory stream, and then to read it using XmlReader. This complicates streaming design considerably.

In fact the very same happens with other .NET APIs.

We think the reason of why .NET designers preferred XmlWriter to XmlReader in those APIs is that XmlReader's implementation is a state machine like, while XmlWriter's implementation looks like a regular procedure. It's much harder to manually write and to support a correct state machine logic than a procedure.

If history would have gone slightly different way, and if yield return, lambda, and Enumerator API appeared before XmlReader, and XmlWriter then, we think, both these classes looked differently. Xml source would have been described with a IEnumerable<XmlEvent> instead of XmlReader, and XmlWriter must be looked like a function receiving IEnumerable<XmlEvent>. Implementing XmlReader would have meant a creating a enumerator. Yield return and Enumerable API would have helped to implement it in a procedural way.

But in our present we have to deal with the fact that DataContractSerializer should write the data into XmlWriter, so let's assume we have a project that uses Entity Framework to access the database, and that you have a data class Person, and data access method GetPeople():

[DataContract(Name = "person", Namespace = "http://www.nesterovsky-bros.com")]
public class Person
{
  [DataMember] public int Id { get; set; }
  [DataMember] public string FirstName { get; set; }
  [DataMember] public string LastName { get; set; }
  [DataMember] public string City { get; set; }
  [DataMember] public string Title { get; set; }
  [DataMember] public DateTime BirthDate { get; set; }
  [DataMember] public int Age { get; set; }
}

public static IEnumerable<Person> GetPeople() { ... }

And your goal is to expose result of GetPeople() as XmlReader. We achieve result with three simple steps:

  1. Define JoinedStream - an input Stream implementation that reads data from a enumeration of streams (IEnumerable<Stream>).
  2. Build xml parts in the form of IEnumerable<Stream>.
  3. Combine parts into final xml stream.

The code is rather simple, so here we qoute its essential part:

public static class Extensions
{
  public static Stream JoinStreams(this IEnumerable<Stream> streams, bool closeStreams = true)
  {
    return new JoinedStream(streams, closeStreams);
  }

  public static Stream ToXmlStream<T>(
    this IEnumerable<T> items,
    string rootName = null,
    string rootNamespace = null)
  {
    return items.ToXmlStreamParts<T>(rootName, rootNamespace).
      JoinStreams(false);
  }

  private static IEnumerable<Stream> ToXmlStreamParts<T>(
    this IEnumerable<T> items,
    string rootName = null,
    string rootNamespace = null)
  {
    if (rootName == null)
    {
      rootName = "ArrayOfItems";
    }

    if (rootNamespace == null)
    {
      rootNamespace = "";
    }

    var serializer = new DataContractSerializer(typeof(T));
    var stream = new MemoryStream();
    var writer = XmlDictionaryWriter.CreateTextWriter(stream);

    writer.WriteStartDocument();
    writer.WriteStartElement(rootName, rootNamespace);
    writer.WriteXmlnsAttribute("s", XmlSchema.Namespace);
    writer.WriteXmlnsAttribute("i", XmlSchema.InstanceNamespace);

    foreach(var item in items)
    {
      serializer.WriteObject(writer, item);
      writer.WriteString(" ");

      writer.Flush();
      stream.Position = 0;

      yield return stream;

      stream.Position = 0;
      stream.SetLength(0);
    }

    writer.WriteEndElement();
    writer.WriteEndDocument();

    writer.Flush();
    stream.Position = 0;

    yield return stream;
  }

  private class JoinedStream: Stream
  {
    public JoinedStream(IEnumerable<Stream> streams, bool closeStreams = true)
    ...
  }
}

The use is even more simple:

// We have a streamed business data.
var people = GetPeople();

// We want to see it as streamed xml data.
using(var stream = people.ToXmlStream("persons", "http://www.nesterovsky-bros.com"))
using(var reader = XmlReader.Create(stream))
{
  ...
}

We have packed the sample into the project Streaming.zip.

In the next post we're going to remind about streaming processing in xslt.

Sunday, 22 July 2012 20:38:29 UTC  #    Comments [2] -
.NET | Thinking aloud | Tips and tricks | xslt
# Sunday, 24 June 2012

For some reason KendoUI DataSource does not allow to access current ajax request. Indeed, it seems quite natural to have a way to cancel running request.

To achieve a desired effect we have made a small set of changes in the RemoteTransport class:

var RemoteTransport_setup = kendo.RemoteTransport.fn.setup;

kendo.RemoteTransport.fn.setup = function()
{
  var that = this,
      options = RemoteTransport_setup.apply(that, arguments),
      beforeSend = options.beforeSend;

  options.beforeSend = functions(request, options)
  {
    that.abort();

    that._request = request;

    if (beforeSend && (beforeSend.apply(this, arguments) === false))
    {
      that._request = null;

      return false;
    }

    request.always(function() { that._request = null; });
  } 

  return options;
}

kendo.RemoteTransport.fn.request = function()
{
  return this._request;
}

kendo.RemoteTransport.fn.abort = function()
{
  var request = this._request;

  if (request)
  {
    this._request = null;
    request.abort();
  }
}

These changes allow to get an ajax request instance: grid.dataSource.request(), or to cancel a request grid.dataSource.abort().

Sunday, 24 June 2012 19:59:30 UTC  #    Comments [0] -
javascript | Tips and tricks
# Sunday, 17 June 2012

We're pleased to work with Kendo UI. Its design is good, however we find here and there things we would wish be done better. Here is a list of problems in a no particular order we would like to be addressed in the next release:

  • RTL is not supported (including correct scroll bar position see Tunning KendoUI).
  • Templates and binding should support a context information along with the data source. (Why do they use with statement?)
  • attr binding should use jquery.attr() method; there should be prop binding which is analogous to attr binding.
  • There should be custom binding that allows any json object to bind to different aspects of a widget or an element.
  • One should be able to use format/parse functions during binding. (Allow binding to express as a triple json object?)
  • parseExact(value, format, culture) method should be rewritten, as it has nothing in common with parsing data string according to exact format.
  • Type inference during binding is poor (parseOption() method). It works neither for string "1,2", nor json " { x: 0 } ", nor for date.
  • Binding is not implemented for many components: splitter, grid.
  • Splitter's pane should support size="auto".
  • Drid does not support totals in group headers, nor it supports header selection.
  • DataSource does not works after remote error, neither it allows to cancel request.
  • innerHtml is used all over the code, thus one cannot rely on jquery.data().
  • Grid does not support customization (localization) of a column filter.
  • Grid should support data binding of its content.
  • One should be able to destroy any widget.
Sunday, 17 June 2012 20:03:37 UTC  #    Comments [1] -
javascript | Thinking aloud
# Monday, 11 June 2012

Trying to make KendoUI to work with Hebrew or more generally in RTL environment we had to find a way to guess the position of scroll bar when direction is rtl.

The problem exists due to the fact that some browsers (Chrome one of them) always put scroll bars to the right. That's utterly wrong. Consider a label and a listbox:

Chrome IE
List in chrome List in chrome

You can see that the scroll bar appears between the label (on the right) and the data in the list box (on the left) in Chrome, and on the left side of the list box in the IE.

We came up with the following test that calculates a scroll bar position in rtl mode:

<script type="text/javascript">
var _scrollbar;

function scrollbar()
{
  if (!_scrollbar)
  {
    var div = document.createElement("div");

    div.style.cssText = "overflow:scroll;zoom:1;clear:both;direction:rtl";
    div.innerHTML = "<div>&nbsp;</div>";
    document.body.appendChild(div);

    _scrollbar =
    {
      size: div.offsetWidth - div.scrollWidth,
      rtlPosition: div.offsetLeft < div.firstChild.offsetLeft ? "left" : "right"
    };

    document.body.removeChild(div);
  }

  return _scrollbar;
}
</script>

In conjuction with an approach described in How to create a <style> tag with Javascript we were able to define rtl css classes for kendo controls and in particular for the grid, combobox, dropdownlist, and datepicker.

Monday, 11 June 2012 21:09:44 UTC  #    Comments [0] -
javascript | Tips and tricks
# Tuesday, 08 May 2012

Some time ago we were taking a part in a project where 95% of all sources are xslt 2.0. It was a great experience for us.

The interesting part is that we used xslt in areas we would never expect it in early 2000s. It crunched gigabytes of data in offline, while earlier we generally sought xslt application in a browser or on a server as an engine to render the data.

Web applications (both .NET and java) are in our focus today, and it became hard to find application for xslt or xquery.

Indeed, client side now have a very strong APIs: jquery, jqueryui, jsview, jqgrid, kendoui, and so on. These libraries, and today's browsers cover developer's needs in building managable applications. In contrast, a native support of xslt (at least v2) does not exist in browsers.

Server side at present is seen as a set of web services. These services support both xml and json formats, and implement a business logic only. It would be a torture to try to write such a frontend in xslt/xquery. A server logic itself is often dealing with a diversity of data sources like databases, files (including xml files) and other.

As for a database (we primarily work with SQL Server 2008 R2), we think that all communication should go through stored procedures, which implement all data logic. Clearly, this place is not for xslt. However, those who know sql beyond its basics can confirm that sql is very similar to xquery. More than that SQL Server (and other databases) integrate xquery to work with xml data, and we do use it extensively.

Server logic itself uses API like LINQ to manipulate with different data sources. In fact, we think that one can build a compiler from xquery 3.0 to C# with LINQ. Other way round compiler would be a whole different story.

The net result is that we see little place for xslt and xquery. Well, after all it's only a personal perspective on the subject. The similar type of thing has happened to us with C++. As with xslt/xquery we love C++ very much, and we fond of C++11, but at present we have no place in our current projects for C++. That's pitty.

P.S. Among other things that play against xslt/xquery is that there is a shortage of people who know these languages, thus who can support such projects.

Tuesday, 08 May 2012 20:28:51 UTC  #    Comments [0] -
Thinking aloud | xslt
# Sunday, 08 April 2012

Several days ago we've arrived to the blog "Recursive lambda expressions". There, author asks how to write a lambda expression that calculates a factorial (only expression statements are allowed).

The problem by itself is rather artificial, but at times you feel an intellectual pleasure solving such tasks by yourself. So, putting original blog post aside we devised our answers. The shortest one goes like this:

  1. As C# lambda expression cannot refer to itself, so it have to receive itself as a parameter, so:
    factorial(factorial, n) = n <= 1 ? 1 : n * factorial(factorial, n - 1);
  2. To define such lambda expression we have to declare a delegate type that receives a delegate of the same type:
    delegate int Impl(Impl impl, int n);
    Fortunately, C# allows this, but a workaround could be used even if it were not possible.

  3. To simplify the reasoning we've defined a two-expression version:
    Impl impl = (f, n) => n <= 1 ? 1 : n * f(f, n - 1);
    Func<int, int> factorial = i => impl(impl, i);
  4. Finally, we've written out a one-expression version:
    Func<int, int> factorial = i => ((Func<Impl, int>)(f => f(f, i)))((f, n) => n <= 1 ? 1 : n * f(f, n - 1));
  5. The use is:
    var f = factorial(10);

After that excercise we've returned back to original blog and compared solutions. We can see that author appeals to a set theory but for some reason his answer is more complex than nesessary, but comments contain variants that analogous to our answer.

Sunday, 08 April 2012 09:18:29 UTC  #    Comments [0] -
.NET | Tips and tricks
# Saturday, 24 March 2012

Let's start from a distance.

We support a busy database for a customer. Customer's requirement (in fact, state's requirement)  is that the database should have audit logs. This means that all important requests should be logged. These logs help both for the offline security analysis, and for the database health monitoring.

Before the end of the last year we used SQL Server 2005, and then customer has upgraded to SQL Server 2008 R2.

As by design the database is accessed through Stored Procedures only, so the logging was done using a small SP that traced input parameters and execution time. The call to that SP was inserted throughout the code of other SPs.

We expected SQL Server 2008 R2 to simplify the task, and to allow us to switch the audit on and off on a fine grained level without the need to change a SP in the production (see Understanding SQL Server Audit for details).

Unfortunatelly, we have almost immediately found that the current audit implementation traces SP calls but does not store parameter values. This way, you can see that there was a call "execute X @param1, @param2", but you have no idea what values were passed. Internet search shows that this a known problem (see SQL Server 2008 Database Audit on INSERT UPDATE and DELETE actual SQL and not parameter values), which renders SQL Server Audit useless.

But nevertheless, looking at how can we simplify our hand-made audit we have found a brilliant solution: "Light weight SQL Server procedure auditing without using SQL Server auditing". It's so simple, that it's a shame that we did not invent it ourselves! The approach is to insert or remove tracing code automatically. Indeed, there is nothing but data in the database, even the text of SP is only a data.

To automate it even more, we have defined a small table with names of procedures and their log levels, and have defined a procedure "Log.SetLevel @level" to configure all logging in one go. In addition we have simplified logging procedures and tables, and started to store parameters in xml columns rather than in a pipe-concatenated strings.

Now, to the negative SP execution times.

The logging code among other things measures current_timestamp at the begin and at the end of the execution of SP. This helps us (as developers) to monitor how database performs on a day to day basis, and to build many useful statistics.

For example we can see that the duration of about 10% of untrivial selects is 0ms (execution time is under 1ms). This means that SQL Server is good at data caching. But what is most interesting is that about 0.1% of requests have negative duration!

You could speculate on parallel or on out of order execution, but the paradox is resolved when you look closely on a value of duration. It's always around of -7,200,000ms. No one will assume that execution has ended two hours before it has started. So, what does it mean -2 hours? Well, we live in (UTC+02:00) Jerusalem time zone. We think that UTC offset crawls somehow into the result. To prove our hypothesis we would like to change time zone on sql servers, but customer won't agree on such an experiment. :-)

This effect probably means that there is some hidden bug in SQL Server 2008 R2 that we cannot reliably reproduce, but we can see that the datediff(ms, start_timestamp, end_timestamp) may return negative value when it's known that start_timestamp is acquired before end_timestamp.

Update: What a shame. During tunning of the original logging procedures we have changed type from datetime to datetime2, and calls from GETUTCDATE() to current_timestamp, except one place (default value in the table definition) where it remained with GETUTCDATE().

So, negative durations meant operation timeout (in our case duration is greater than 30 secs).

Saturday, 24 March 2012 14:44:04 UTC  #    Comments [0] -
SQL Server puzzle | Thinking aloud
# Friday, 23 March 2012

This time we update csharpxom to adjust it to C# 4.5. Additions are async modifier and await operator.

They are used to simplify asynchronous programming.

The following example from the msdn:

private async Task<byte[]> GetURLContentsAsync(string url)
{
  var content = new MemoryStream();
  var request = (HttpWebRequest)WebRequest.Create(url);

  using(var response = await request.GetResponseAsync())
  using(var responseStream = response.GetResponseStream())
  {
    await responseStream.CopyToAsync(content);
  }

  return content.ToArray();
}

looks like this in csharpxom:

<method name="GetURLContentsAsync" access="private" async="true">
  <returns>
    <type name="Task" namespace="System.Threading.Tasks">
      <type-arguments>
        <type name="byte" rank="1"/>
      </type-arguments>
    </type>
  </returns>
  <parameters>
    <parameter name="url">
      <type name="string"/>
    </parameter>
  </parameters>
  <block>
    <var name="content">
      <initialize>
        <new-object>
          <type name="MemoryStream" namespace="System.IO"/>
        </new-object>
      </initialize>
    </var>
    <var name="request">
      <initialize>
        <cast>
          <invoke>
            <static-method-ref name="Create">
              <type name="WebRequest" namespace="System.Net"/>
            </static-method-ref>
            <arguments>
              <var-ref name="url"/>
            </arguments>
          </invoke>
          <type name="HttpWebRequest" namespace="System.Net"/>
        </cast>
      </initialize>
    </var>

    <using>
      <resource>
        <var name="response">
          <initialize>
            <await>
              <invoke>
                <method-ref name="GetResponseAsync">
                  <var-ref name="request"/>
                </method-ref>
              </invoke>
            </await>
          </initialize>
        </var>
      </resource>
      <using>
        <resource>
          <var name="responseStream">
            <initialize>
              <invoke>
                <method-ref name="GetResponseStream">
                  <var-ref name="response"/>
                </method-ref>
              </invoke>
            </initialize>
          </var>
        </resource>
        <expression>
          <await>
            <invoke>
              <method-ref name="CopyToAsync">
                <var-ref name="responseStream"/>
              </method-ref>
              <arguments>
                <var-ref name="content"/>
              </arguments>
            </invoke>
          </await>
        </expression>
      </using>
    </using>

    <return>
      <invoke>
        <method-ref name="ToArray">
          <var-ref name="content"/>
        </method-ref>
      </invoke>
    </return>
  </block>
</method>

Friday, 23 March 2012 00:07:35 UTC  #    Comments [0] -
.NET | Announce | xslt
# Friday, 16 March 2012

After C++11 revision has been approved a new cycle of C++ design has begun:

N3370: The C++ standards committee is soliciting proposals for additional library components. Such proposals can range from small (addition of a single signature to an existing library) to large (something bigger than any current standard library component).

At this stage it's interesting to read papers, as authors try to express ideas rather than to formulate sentences that should go into spec as it lately was.

These are just several papers that we've found interesting:

N3322 12-0012 A Preliminary Proposal for a Static if Walter E. Brown
N3329 12-0019 Proposal: static if declaration H. Sutter, W. Bright, A. Alexandrescu

Those proposals argue about compile time "if statement". The feature can replace #if preprocessor directive, a SFINAE or in some cases template specializations.

A static if declaration can appear wherever a declaration or a statement is legal. Authors also propose to add static if clause to a class and a function declarations to conditionally exclude them from the scope.

Examples:

// Compile time factorial.
template <unsigned n>
struct factorial
{
  static if (n <= 1)
  {
    enum : unsigned { value = 1 };
  }
  else
  {
    enum : unsigned { value = factorial<n - 1>::value * n };
  }
};

// Declare class provided a condition is true.
class Internals if (sizeof(void*) == sizeof(int));

Paper presents strong rationale why this addition helps to build better programs, however the questions arise about relations between static if and concepts, static if clause and an error diagnostics.

 

N3327 12-0017 A Standard Programmatic Interface for Asynchronous Operations N. Gustafsson, A. Laksberg
N3328 12-0018 Resumable Functions Niklas Gustafsson

That's our favorite.

Authors propose an API and a language extensions to make asynchronous programs simpler.

In fact, asynchronous function will look very mush as a regular one but with small additions. It's similar to yield return in C# (a construct that has been available in C# for many years and is well vetted), and to async expression in C# 4.5. Compiler will rewrite such a function into a state machine, thus function can suspend its execution, wait for the data and to resume when data is available.

Example:

// read data asynchronously from an input and write it into an output.
int cnt = 0;

do
{
  cnt = await streamR.read(512, buf);

  if (cnt == 0)
  {
    break;
  }

  cnt = await streamW.write(cnt, buf);
}
while(cnt > 0);

It's iteresting to see how authors will address yield return: either with aditional keyword, or in terms of resumable functions.

 

N3340 12-0030 Rich Pointers D. M. Berris, M. Austern, L. Crowl

Here authors try to justify rich type-info but mask it under the name "rich pointers". To make things even more obscure they argue about dynamic code generation.

If you want a rich type-info then you should talk about it and not about thousand of other things.

We would better appealed to create a standard API to access post-compile object model, which could be used to produce different type-infos or other source derivatives.

This paper is our outsider. :-)

 

N3341 12-0031 Transactional Language Constructs for C++ M. Wong, H. Boehm, J. Gottschlich, T. Shpeisman, et al.

Here people try to generalize (put you away from) locking, and replace it with other word "transaction".

Seems it's not viable proposition. It's better to teach on functional style of programming with its immutable objects.

 

N3347 12-0037 Modules in C++ (Revision 6) Daveed Vandevoorde

Author argues against C style source composition with #include directive, and propose alternative called "modules".

We think that many C++ developers would agree that C pre-processor is a legacy that would never have existed, but for the same reason (for the legacy, and compatibility) it should stay.

In out opinion the current proposition is just immature, at least it's not intuitive. Or in other words there should be something to replace the C pre-processor (and #include as its part), but we don't like this paper from aestetic perspective.

 

N3365 12-0055 Filesystem Library Proposal (Revision 2)

This proposal says no a word about asynchronous nature of file access, while it should be designed around it.

Friday, 16 March 2012 19:21:58 UTC  #    Comments [0] -
C++ | Thinking aloud
# Thursday, 08 March 2012

For a long time we were developing web applications with ASP.NET and JSF. At present we prefer rich clients and a server with page templates and RESTful web services.

This transition brings technical questions. Consider this one.

Browsers allow to store session state entirely on the client, so should we maintain a session on the server?

Since the server is just a set of web services, so we may supply all required arguments on each call.

At first glance we can assume that no session is required on the server. However, looking further we see that we should deal with data validation (security) on the server.

Think about a classic ASP.NET application, where a user can select a value from a dropdown. Either ASP.NET itself or your program (against a list from a session) verifies that the value received is valid for the user. That list of values and might be other parameters constitute a user profile, which we stored in session. The user profile played important role (often indirectly) in the validation of input data.

When the server is just a set of web services then we have to validate all parameters manually. There are two sources that we can rely to: (a) a session, (b) a user principal.

The case (a) is very similar to classic ASP.NET application except that with EnableEventValidation="true" runtime did it for us most of the time.
The case (b) requires reconstruction of the user profile for a user principal and then we proceed with validation of parameters.

We may cache user profile in session, in which case we reduce (b) to (a); on the other hand we may cache user profile in Cache, which is also similar to (a) but which might be lighter than (at least not heavier than) the solution with the session.

What we see is that the client session does not free us from server session (or its alternative).

Thursday, 08 March 2012 21:56:19 UTC  #    Comments [0] -
.NET | ASP.NET | Java | JSF and Facelets | Thinking aloud
# Wednesday, 29 February 2012

We were dealing with a datasource of (int? id, string value) pairs in LINQ. The data has originated from a database where id is unique field. In the program this datasource had to be seen as a dictionary, so we have written a code like this:

var dictionary = CreateIDValuePairs().ToDictionary(item => item.ID, item => item.Value);

That was too simple-minded. This code compiles but crashes at runtime when there is an id == null.

Well, help warns about this behaviour, but anyway this does not make pain easier.

In our opinion this restriction is not justified and just complicates the use of Dictionaty.

Wednesday, 29 February 2012 20:42:46 UTC  #    Comments [0] -
.NET | Thinking aloud
# Wednesday, 01 February 2012

A customer have a table with data stored by dates, and asked us to present data from this table by sequential date ranges.

This query sounded trivial but took us half a day to create such a select.

For simplicity consider a table of integer numbers, and try to build a select that returns pairs of continuous ranges of values.

So, for an input like this:

declare @values table
(
  value int not null primary key
);

insert into @values(value)
select  1 union all select  2 union all select  3 union all
select  5 union all select  6 union all
select  8 union all
select 10 union all
select 12 union all select 13 union all select 14;

You will have a following output:

low  high
---- ----
1    3
5    6
8    8
10   10
12   14

Logic of the algorithms is like this:

  1. get a low bound of each range (a value without value - 1 in the source);
  2. get a high bound of each range (a value without value + 1 in the source);
  3. combine low and high bounds.

Following this logic we have built at least three different queries, where the shortest one is:

with source as
(
  select * from @values
)
select
  l.value low,
  min(h.value) high
from
  source l
  inner join
  source h
  on
    (l.value - 1 not in (select value from source)) and
    (h.value + 1 not in (select value from source)) and
    (h.value >= l.value)
group by
  l.value;

execution plan

Looking at this query it's hard to understand why it took so long to write so simple code...

Wednesday, 01 February 2012 20:34:09 UTC  #    Comments [0] -
SQL Server puzzle | Tips and tricks
# Wednesday, 25 January 2012

Some time ago our younger brother Aleksander had started studying of cinematography.

Few days ago he started his own "multimedia" blog (you'll better understand me when you'll see it), where you can see his portfolio. Aleksander's latest work was made with cooperation with Ilan Lahov, see "Bar mitzvah". This work demonstrates Aleksander's progress in this field.

Our congratulations to Aleksander!

Wednesday, 25 January 2012 20:57:35 UTC  #    Comments [0] -
Announce
# Friday, 20 January 2012

If you're writing an application that deals with files in file system on Windows, be sure that sooner or later you run into problems with antivirus software.

Our latest program that handles a lot of huge files and works as a Windows service, it reports time to time about some strange errors. These errors look like the file system disappeared on the fly, or, files were stolen by somebody else (after they have been opened in exclusive mode by our application).

We spent about two weeks in order to diagnose the cause of such behaviour, and then came to conclusion that is a secret work of our antivirus. All such errors disappeared as fog when the antivirus was configurated to skip folders with our files.

Thus, keep in mind our experience and don't allow an ativirus to became an evil.

Friday, 20 January 2012 16:47:58 UTC  #    Comments [0] -
Tips and tricks
# Thursday, 19 January 2012

While looking at some SQL we have realized that it can be considerably optimized.

Consider a table source like this:

with Data(ID, Type, SubType)
(
  select 1, 'A', 'X'
  union all
  select 2, 'A', 'Y'
  union all
  select 3, 'A', 'Y'
  union all
  select 4, 'B', 'Z'
  union all
  select 5, 'B', 'Z'
  union all
  select 6, 'C', 'X'
  union all
  select 7, 'C', 'X'
  union all
  select 8, 'C', 'Z'
  union all
  select 9, 'C', 'X'
  union all
  select 10, 'C', 'X'
)

Suppose you want to group data by type, to calculate number of elements in each group and to display sub type if all rows in a group are of the same sub type.

Earlier we have written the code like this:

select
  Type,
  case when count(distinct SubType) = 1 then min(SubType) end SubType,
  count(*) C
from
  Data
group by
  Type;

Namely, we select min(SybType) provided that there is a single distinct SubType, otherwise null is shown. That works perfectly, but algorithmically count(distinct SubType) = 1 needs to build a set of distinct values for each group just to ask the size of this set. That is expensive!

What we wanted can be expressed differently: if min(SybType) and max(SybType) are the same then we want to display it, otherwise to show null.

That's the new version:

select
  Type,
  case when min(SubType) = max(SubType) then min(SubType) end SubType,
  count(*) C
from
  Data
group by
  Type;

Such a simple rewrite has cardinally simplified the execution plan:

Execution plans

Another bizarre problem we have discovered is that SQL Server 2008 R2 just does not support the following:

select
  count(distinct SubType) over(partition by Type)
from
  Data

That's really strange, but it's known bug (see Microsoft Connect).

Thursday, 19 January 2012 21:12:11 UTC  #    Comments [0] -
SQL Server puzzle | Tips and tricks
Archive
<2012 July>
SunMonTueWedThuFriSat
24252627282930
1234567
891011121314
15161718192021
22232425262728
2930311234
Statistics
Total Posts: 387
This Year: 0
This Month: 0
This Week: 0
Comments: 2506
Locations of visitors to this page
Disclaimer
The opinions expressed herein are our own personal opinions and do not represent our employer's view in anyway.

© 2025, Nesterovsky bros
All Content © 2025, Nesterovsky bros
DasBlog theme 'Business' created by Christoph De Baene (delarou)