While looking at some SQL we have realized that it can be considerably optimized.
Consider a table source like this:
with Data(ID, Type, SubType) ( select 1, 'A', 'X' union all select 2, 'A', 'Y' union all select 3, 'A', 'Y' union all select 4, 'B', 'Z' union all select 5, 'B', 'Z' union all select 6, 'C', 'X' union all select 7, 'C', 'X' union all select 8, 'C', 'Z' union all select 9, 'C', 'X' union all select 10, 'C', 'X' )
Suppose you want to group data by type, to calculate number of elements in each group and to display sub type if all rows in a group are of the same sub type.
Earlier we have written the code like this:
select Type, case when count(distinct SubType) = 1 then min(SubType) end SubType, count(*) C from Data group by Type;
Namely, we select min(SybType) provided that there is a single distinct SubType, otherwise null is shown. That works perfectly, but algorithmically count(distinct SubType) = 1 needs to build a set of distinct values for each group just to ask the size of this set. That is expensive!
min(SybType)
SubType
null
count(distinct SubType) = 1
What we wanted can be expressed differently: if min(SybType) and max(SybType) are the same then we want to display it, otherwise to show null.
max(SybType)
That's the new version:
select Type, case when min(SubType) = max(SubType) then min(SubType) end SubType, count(*) C from Data group by Type;
Such a simple rewrite has cardinally simplified the execution plan:
Another bizarre problem we have discovered is that SQL Server 2008 R2 just does not support the following:
select count(distinct SubType) over(partition by Type) from Data
That's really strange, but it's known bug (see Microsoft Connect).
Remember Me
a@href@title, b, blockquote@cite, em, i, strike, strong, sub, super, u