For a long time we were developing web applications with ASP.NET and JSF. At
present we prefer rich clients and a server with page templates and RESTful web
services.
This transition brings technical questions. Consider this one.
Browsers allow to store session state entirely on the client, so should we
maintain a session on the server?
Since the server is just a set of web services, so we may supply all required
arguments on each call.
At first glance we can assume that no session is required on the server.
However, looking further we see that we should deal with data validation
(security) on the server.
Think about a classic ASP.NET application, where a user can select a value from
a dropdown. Either ASP.NET itself or your program (against a list from a
session) verifies that the value received is valid for the user. That list of
values and might be other parameters constitute a user profile, which we stored
in session. The user profile played important role (often indirectly) in the
validation of input data.
When the server is just a set of web services then we have to validate all
parameters manually. There are two sources that we can rely to: (a)
a session, (b)
a user principal.
The case (a) is very similar to classic ASP.NET application except that with
EnableEventValidation="true" runtime did it for us most of the time.
The case (b) requires reconstruction of the user profile for a user principal
and then we proceed with validation of parameters.
We may cache user profile in session, in which case we reduce (b) to (a); on the
other hand we may cache user profile in
Cache, which is also similar to (a) but which might be lighter than (at least not
heavier than) the solution with the session.
What we see is that the client session does not free us from server session (or
its alternative).
We were dealing with a datasource of (int? id, string
value) pairs in LINQ. The data has originated from a database where id is unique field.
In the program this datasource had to be seen as a dictionary, so we have
written a code like this:
var dictionary =
CreateIDValuePairs().ToDictionary(item => item.ID, item => item.Value) ;
That was too simple-minded. This code compiles but crashes at runtime when there is an id == null .
Well, help warns about this behaviour, but anyway this does not make pain easier.
In our opinion this restriction is not justified and just complicates the use
of Dictionaty .
A customer have a table with data stored by dates, and asked us to present data
from this table by sequential date ranges.
This query sounded trivial but took us half a day to create such a select.
For simplicity consider a table of integer numbers, and try to build a select
that returns pairs of continuous ranges of values.
So, for an input like this:
declare @values table
(
value int not null primary key
);
insert into @values(value)
select 1
union all
select 2
union all
select 3
union all
select 5
union all
select 6
union all
select 8
union all
select 10
union all
select 12
union all
select 13
union all
select 14;
You will have a following output:
low high
---- ----
1 3
5 6
8 8
10 10
12 14
Logic of the algorithms is like this:
- get a low bound of each range (a value without value - 1 in the source);
- get a high bound of each range (a value without value + 1 in the source);
- combine low and high bounds.
Following this logic we have built at least three different queries, where the
shortest one
is:
with source as
(
select * from @values
)
select
l.value low,
min(h.value) high
from
source l
inner join
source h
on
(l.value - 1 not in (select value from source)) and
(h.value + 1 not in (select value from source)) and
(h.value >= l.value)
group by
l.value;
Looking at this query it's hard to understand why it took so
long to
write so simple code...
Some time ago our younger brother Aleksander had started studying of cinematography.
Few days ago he started his own "multimedia" blog (you'll better understand me when you'll see it), where you can see his portfolio. Aleksander's latest work was made with cooperation with Ilan Lahov, see "Bar mitzvah". This work demonstrates Aleksander's progress in this field.
Our congratulations to Aleksander!
If you're writing an application that deals with files in file system on Windows, be sure that sooner or later you run into problems with antivirus software.
Our latest program that handles a lot of huge files and works as a Windows service, it reports time to time about some strange errors. These errors look like the file system disappeared on the fly, or, files were stolen by somebody else (after they have been opened in exclusive mode by our application).
We spent about two weeks in order to diagnose the cause of such behaviour, and then came to conclusion that is a secret work of our antivirus. All such errors disappeared as fog when the antivirus was configurated to skip folders with our files.
Thus, keep in mind our experience and don't allow an ativirus to became an evil.
While looking at some SQL we have realized that it can be considerably optimized.
Consider a table source like this:
with Data(ID, Type, SubType)
(
select 1, 'A', 'X'
union all
select 2, 'A', 'Y'
union all
select 3, 'A', 'Y'
union all
select 4, 'B', 'Z'
union all
select 5, 'B', 'Z'
union all
select 6, 'C', 'X'
union all
select 7, 'C', 'X'
union all
select 8, 'C', 'Z'
union all
select 9, 'C', 'X'
union all
select 10, 'C', 'X'
)
Suppose you want to group data by type, to calculate number of elements in each
group and to display sub type if all rows in a group are of the same sub type.
Earlier we have written the code like this:
select
Type,
case when count(distinct SubType) = 1 then min(SubType) end SubType,
count(*) C
from
Data
group by
Type;
Namely, we select min(SybType) provided that there is a single distinct
SubType , otherwise null is shown. That works perfectly,
but algorithmically count(distinct SubType) = 1 needs to build a set
of distinct values for each group just to ask the size of this set. That is
expensive!
What we wanted can be expressed differently: if min(SybType) and
max(SybType) are the same then we want to display it, otherwise to show
null .
That's the new version:
select
Type,
case when min(SubType) = max(SubType) then min(SubType) end SubType,
count(*) C
from
Data
group by
Type;
Such a simple rewrite has cardinally simplified the execution plan:
Another bizarre problem we have discovered is that SQL Server 2008 R2 just does
not support the following:
select
count(distinct SubType) over(partition by Type)
from
Data
That's really strange, but it's known bug (see
Microsoft Connect).
A database we support for a client contains multi-billion row tables. Many
users query the data from that database, and it's permanently populated
with a new data.
Every day we load several millions rows of a new data. Such loads can lock tables for a
considerable time, so our loading procedures collect new data into intermediate
tables and insert it into a final destination by chunks, and usually after work
hours.
SQL Server 2008 R2 introduced
READ_COMMITTED_SNAPSHOT database option. This feature trades locks for an
increased tempdb size (to store row versions) and possible performance
degradation during a transaction.
When we have switched the database to that option we did
not notice any considerable performance change. Encouraged, we've decided to
increase size of chunks of data we insert at once.
Earlier we have found that when we insert no more than 1000 rows
at once, users don't notice impact, but for a bigger chunk sizes users start to
complain on performance degradation. This has probably happened due to locks
escalations.
Now, with chunks of 10000 or even 100000 rows we have found that no queries
became slower. But load process became several times faster.
We were ready to pay for increased tempdb and transaction log size to increase
performance, but in our case we didn't approach limits assigned by the DBA.
Another gain is that we can easily load data at any time. This makes data we
store more up to date.
Yesterday, by accident, we've seen an article about some design principles
of V8 JavaScript Engine.
It made clearer what techniques are used in today's script implementations.
In particular V8 engine optimizes property access using "dynamically
created
hidden classes". These are structures to store object's layout, they
are derived when ש new property is created (deleted) on the object. When code
accesses a property, and if a cached object's dynamic hidden class is available
at the code point then access time is comparable to one of native fields.
In our opinion this tactics might lead to a proliferation of such dynamic hidden
classes, which requires a considerable housekeeping, which also slows property
write access, especially when it's written for the first time.
We would like to suggest a slightly different strategy, which exploits the
cache matches, and does not require a dynamic hidden classes.
Consider an implementation data type with following characteristics:
- object is implemented as a hash map of property id to property value: Map<ID, Value>;
- it stores data as an array of pairs and can be accessed directly: Pair<ID,
Value> values[];
- property index can be acquired with a method: int index(ID);
A pseudo code for the property access looks like this:
pair = object.values[cachedIndex];
if (pair.ID == propertyID)
{
value = pair.Value;
}
else
{
// Cache miss.
cachedIndex = object.index(propertyID);
value = objec.values[cachedIndex].Value;
}
This approach brings us back to dictionary like implementation but with
important optimization of array speed access when property index is cached, and
with no dynamic hidden classes.
@michaelhkay Saxon 9.4 is out.
But why author does not state that HE version is still xslt/xpath 2.0, as neither xslt maps, nor function items are supported.
Recently, we have found and reported the bug in the SQL Server 2008 (see SQL Server 2008 with(recompile), and also Microsoft Connect).
Persons, who's responsible for the bug evaluation has closed it, as if "By Design". This strange resolution, in our opinion, says about those persons only.
Well, we shall try once more (see Microsoft Connect). We have posted another trivial demonstartion of the bug, where we show that option(recompile) is not used, which leads to table scan (nothing worse can happen for a huge table).
Recently we have introduced some stored procedure in the production and have
found that it performs incredibly slow.
Our reasoning and tests in the development environment did not manifest any
problem at all.
In essence that procedure executes some SELECT and returns a status as a signle
output variable. Procedure recieves several input parameters, and the SELECT
statement uses
with(recompile) execution hint to optimize the performance for a specific
parameters.
We have analyzed the execution plan of that procedure and have found that it
works as if with(recompile) hint was not specified. Without that hint SELECT
failed to use index seek but rather used index scan.
What we have lately found is that the same SELECT that produces result set
instead of reading result into a variable performs very well.
We think that this is a bug in SQL Server 2008 R2 (and in SQL Server 2008).
To demonstrate the problem you can run this test:
-- Setup
create table dbo.Items
(
Item int not null primary key
);
go
insert into dbo.Items
select 1
union all
select 2
union all
select 3
union all
select 4
union all
select 5
go
create procedure dbo.GetMaxItem
(
@odd bit = null,
@result int output
)
as
begin
set nocount on;
with Items as
(
select * from dbo.Items where @odd is null
union all
select * from dbo.Items where (@odd = 1) and ((Item & 1) = 1)
union all
select * from dbo.Items where (@odd = 0) and ((Item & 1) = 0)
)
select @result = max(Item) from Items
option(recompile);
end;
go
create procedure dbo.GetMaxItem2
(
@odd bit = null,
@result int output
)
as
begin
set nocount on;
declare @results table
(
Item int
);
with Items as
(
select * from dbo.Items where @odd is null
union all
select * from dbo.Items where (@odd = 1) and ((Item & 1) = 1)
union all
select * from dbo.Items where (@odd = 0) and ((Item & 1) = 0)
)
insert into @results
select max(Item) from Items
option(recompile);
select @result = Item from @results;
end;
go
Test with output into a variable:
declare @result1 int;
execute dbo.GetMaxItem @odd = null, @result = @result1 output
Test without output directly into a variable:
declare @result2 int;
execute dbo.GetMaxItem2 @odd = null, @result = @result2 output
Now, you can see the difference: the first execution plan uses startup expressions, while the second optimizes execution branches, which are not really used.
In our case it was crucial, as the execition time difference was minutes (and
more in future) vs a split of second.
See also
Microsoft Connect Entry.
A bit history: the first release of this solution was about 9.5 years ago...
Today we've run into a strange situation. One of our clients ask us about automatic conversion of data from mainframe (that were defined as COBOL copybooks) into XML or Java/.NET objects. On our suggestion to use eXperanto, which is well known to him, he stated that he wouldn't like to use a tool of a company that is no more exists...
The situation, in our opinion, become more strange when you consider the following:
- eXperanto (the design-time tool and run-time libraries for Java and .NET) were developed, well tested, and delivered by us to production already several years ago.
- the client bought this set (the tool and libraries).
- the set is in production yet already in another big company, and is used time to time by our company in different migration projects.
- the client talks with developers of this tool and run-time libraries, and he knows about this fact.
- the client uses widely open source solutions even without dedicated vendors or support warranties.
It has happened so, that we have never worked with jQuery, however were aware of
it.
In early 2000 we have developed a web application that contained rich javascript
APIs, including UI components. Later, we were actively practicing in ASP.NET, and
later in JSF.
At present, looking at jQuery more closely we regret that we have failed to
start using it earlier.
Separation of business logic and presentation is remarkable when one uses JSON
web services. In fact server part can be seen as a set of web services
representing a business logic and a set of resources: html, styles, scripts,
others. Nor ASP.NET or JSF approach such a consistent separation.
The only trouble, in our opinion, is that jQuery has no standard data binding: a way to bind JSON data
to (and from) html controls. The technique that will probably be standardized is called jQuery Templates or JsViews
.
Unfortunatelly after reading about this
binding API, and
being in love with Xslt and XQuery we just want to cry. We don't know what would
be the best solution for the task, but what we see looks uncomfortable to us.
Incidentally, we have found one new implementation of yield return in java that is
in the development stage. Sources can be found at
https://github.com/peichhorn/lombok-pg/zipball/master. Just to be sure we
have copied those sources at other place
peichhorn-lombok-pg-0.10.0-39-g384fb7b.zip (you may search "yield"
in the archive).
It's broken according to source tracker, but the funny thing is that sources,
however different, still resemble our yield return implementation (Yield.jar,
Yield.3.7.jar
- Indigo, Yield.zip
- sources) very much: variable names, error messages, algorithmic structure.
Those programmers probably have forgotten good manners: to reference a base work,
at least.
Well, we generously forgive them this blunder.
P.S. our implementation, in contrast, works without bugs.
P.P.S. misunderstanding is resolved. See comments.
A couple of weeks ago, we have suggested to introduce a enumerator function into
the XPath (see
[F+O30] A enumerator function):
I would like the WG to consider an addition of a function that turns a sequence
into a enumeration of values.
Consider a function like this:
fn:enumerator($items as item()*) as function() as item()?;
alternatively, signature could be:
fn:enumerator($items as function() as item()*) as function() as item()?;
This function receives a sequence, and returns a function item, which upon N's
call shall return N's element of the original sequence. This way, a sequence of
items is turned into a function providing a enumeration of items of the
sequence.
As an example consider two functions:
a) t:rand($seed as xs:double) as xs:double* - a function producing a random
number sequence;
b) t:work($input as element()) as element() - a function that generates output
from it's input, and that needs random numbers in the course of the execution.
t:work() may contain a code like this:
let $rand := fn:enumerator(t:rand($seed)),
and later it can call $rand() to get a random numbers.
Enumerators will help to compose algorithms where one algorithm communicate
with other independant algorithms, thus making code simpler. The most obvious
class of enumerators are generators: ordered numbers, unique identifiers,
random numbers.
Technically, function returned from fn:enumerator() is nondetermenistic, but its "side effect" is
similar to a "side effect" of a function generate-id() from a newly created
node (see bug #13747, and bug #13494).
The idea is inspired by a generator function, which returns a new value upon each
call.
Such function can be seen as a stateful object. But our goal is to look at
it in a more functional way. So, we look at the algorithm as a function that
produces a sequence of output, which is pure functional; and an enumerator that
allows to iterate over algorithm's output.
This way, we see the function that implements an algorithm and the function that
uses it can be seen as two thread of functional programs that use messaging to
communicate to each other.
Honestly, we doubt that WG will accept it, but it's interesting to watch the
discussion.
|