Good bad and good news.
- Good: recently a new version Saxon XSLT processor was published:
-
12 May 2016
Saxon 9.7.0.5 maintenance release for Java and .NET.
- Bad: we run that release on our code base and found a bug:
-
See Internal error in Saxon-HE-9.7.0-5
- Good: Michael Kay has confirmed the problem and even fixed it:
-
See Bug #2770
- The only missing ingredient is when the patch will be available to the public:
"We tend to do a new maintenance release every 4-6 weeks. Can't commit to firm dates."
Our recent task required us to find all sets of not intersecting rectangles for a rectangle list.
At first glance it did not look like a trivial task. Just consider that for a list of N rectangles you can form
2^N different subsets. So, even result list, theoretically, can be enormous.
Fortunately, we knew that our result will be manageable in size. But nevertheless, suppose you have a list of
couple of hundred rectangles, how would you enumerate all different sets of rectangles?
By the way, this task sounds the same as one of a Google interview's question. So, you may try to solve it by yourself before to check our solution.
We didn't even dare to think of brute-force solution: to enumerate all sets and then check each one whether it fits our needs.
Instead we used induction:
- Suppose S(N) - is an solution for our task for N rectangles R(n), where S(N) is a set of sets of rectangles;
- Then solution for S(N+1) will contain whole S(N), R(N+1) - a set consisting of single rectangle, and
some sets of rectangles from S(N) combinded with R(N+1) provided they fit the condition;
- S(0) - is an empty set.
The algorithm was implemented in java, and at first it was using
Streaming and recursion.
Then we have figured out that we can use
Stream.reduce or Stream.collect to implement
the same algorithm. That second implementation was a little bit longer but probably faster, and besides it used standard idioms.
But then at last step we reformulated the algorithms in terms of
Collections.
Though the final implementation is the least similar to original induction algorithm,
it's straightforward and definitely fastest among all implementations we tried.
So, here is the code:
/**
* For a sequence of items builds a list of matching groups.
* @param identity an identity instance used for the group.
* @param items original sequence of items.
* @param matcher a group matcher of item against a group.
* @param combiner creates a new group from a group (optional) and an item.
* @return a list of matching groups.
*/
public static <T, G> List<G> matchingGroups(
G identity,
Iterable<T> items,
BiPredicate<G, T> matcher,
BiFunction<G, T, G> combiner)
{
ArrayList<G> result = new ArrayList<>();
for(T item: items)
{
int size = result.size();
result.add(combiner.apply(identity, item));
for(int i = 0; i < size; ++i)
{
G group = result.get(i);
if (matcher.test(group, item))
{
result.add(combiner.apply(group, item));
}
}
}
return result;
}
The sample project on GitHub contains implementation and a tests of this algorithm.
8 Ways to Become a Better Coder is a good article. Read and apply to yourself. Never mind what your occupation is. Replace "coder" with your profession. Suits to everybody who wants to be the best.
Visitor pattern is often used to separate operation from object graph it operates with. Here we assume that the reader is familiar with the subject.
The idea is like this:
- The operation over object graph is implemented as type called
Visitor .
Visitor defines methods for each type of object in the graph, which a called during traversing of the graph.
- Traversing over the graph is implemented by a type called
Traverser , or by the Visitor or by each object type in the graph.
Implementation should collect, aggregate or perform other actions during visit of objects in the graph, so that at the end of the visit the purpose of operation will be complete.
Such implementation is push-like: you create operation object and call a method that gets object graph on input and returns operation result on output.
In the past we often dealt with big graphs (usually these are virtual graphs backended at database or at a file system).
Also having a strong experience in the XSLT we see that the visitor pattern in OOP is directly mapped into xsl:template and xsl:apply-templates technique.
Another thought was that in XML processing there are two camps:
- SAX (push-like) - those who process xml in callbacks, which is very similar to visitor pattern; and
- XML Reader (pull-like) - those who pull xml components from a source, and then iterate and process them.
As with SAX vs XML Reader or, more generally, push vs pull processing models, there is no the best one. One or the other is preferable in particular circumstances. E.g. Pull like component fits into a transformation pipeline where one pull component has another as its source; another example is when one needs to process two sources at once, which is untrivial with push like model. On the other hand push processing fits better into Reduce part of MapReduce pattern where you need to accumulate results from source.
So, our idea was to complete classic push-like visitor pattern with an example of pull-like implementation.
For the demostration we have selected Java language, and a simplest boolean expression calculator.
Please follow GitHub nesterovsky-bros/VisitorPattern to see the detailed explanation.
Until now we've been aware of 2 excelent artificial intelligence frameworks written in C#. These are AForge.NET and its successor Accord.NET. The both include a lot of algorithms for solving wide range of tasks.
Yesterday we've discovered that Microsoft has published as an open-source project their Computational Network Toolkit in order to speed up advances in artificial intelligence and made it available to a broader group of developers.
The sources written in C++ which scales good and uses GPUs. This gives a competitive advantage to CNTK, see more details here.
Although the main aim of such development was speech recognition, the CNTK contains a Neural Network framework that may be used for other artificial intelligence tasks.
It's very old theme...
Many years ago we have defined a .NET wrapper around Windows Uniscribe API.
Uniscribe API is used to render bidirectional languages like Hebrew, so it's important mainly here in Israel.
Once in a while we get request from people to give that API, so we published it on GitHub at https://github.com/nesterovsky-bros/BidiVisualConverter.
You're welcome to use it!
Essence of the problem (see Error during transformation in Saxon 9.7, thread on forum):
- XPath engine may arbitrary reorder predicates whose expressions do not depend on a context position.
- While an XPath expression
$N[@x castable as xs:date][xs:date(@x) gt xs:date("2000-01-01")] cannot raise an error if it's evaluated from the left to right, an expression with reordered predicates $N[xs:date(@x) gt xs:date("2000-01-01")][@x castable as xs:date] may generate an error when @x is not a xs:date .
To avoid a potential problem one should rewrite the expression like this: $N[if (@x castable as xs:date) then xs:date(@x) gt xs:date("2000-01-01") else false()] .
Please note that the following rewrite will not work: $N[(@x castable as xs:date) and (xs:date(@x) gt xs:date("2000-01-01"))] , as arguments of and expression can be evaluated in any order, and error that occurs during evaluation of any argument may be propageted.
With these facts we faced a task to check our code base and to fix possible problems.
A search has brought ~450 instances of XPath expessions that use two or more consequtive predicates. Accurate analysis limited this to ~20 instances that should be rewritten. But then, all of sudden, we have decided to commit an experiment. What if we split XPath expression in two sub expressions. Can error still resurface?
Consider:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:variable name="elements" as="element()+"><a/><b value="c"/></xsl:variable>
<xsl:template match="/">
<xsl:variable name="a" as="element()*" select="$elements[self::d or self::e]"/>
<xsl:variable name="b" as="element()*" select="$a[xs:integer(@value) = 1]"/>
<xsl:sequence select="$b"/>
</xsl:template>
</xsl:stylesheet>
As we expected Saxon 9.7 internally assembles a final XPath with two predicates and reorders them. As result we get an error:
Error at char 20 in xsl:variable/@select on line 8 column 81 of Saxon9.7-filter_speculation.xslt:
FORG0001: Cannot convert string "c" to an integer
This turn of events greately complicates the code review we have to commit.
Michiel Kay's answer to this example:
I think your argument that the reordering is inappropriate when the expression is written using variables is very powerful. I shall raise the question with my WG colleagues.
In fact we think that either: reordering of predicates is inappropriate, or (weaker, to allow reordering) to treat an error during evaluation of predicate expression as false() . This is what is done in XSLT patterns. Other solutions make XPath less intuitive.
In other words we should use XPath (language) to express ideas, and engine should correctly and efficiently implement them. So, we should not be forced to rewrite expression to please implementation.
On December, 30 we have opened a thread in Saxon help forum that shows a stylesheet generating an error. This is the stylesheet:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xsl:variable name="elements" as="element()+"><a/><b value="c"/></xsl:variable>
<xsl:template match="/">
<xsl:sequence select="$elements[self::d or self::e][xs:integer(@value) = 1]"/>
</xsl:template>
</xsl:stylesheet>
We get an error:
Error at char 47 in xsl:sequence/@select on line 7 column 83 of Saxon9.7-filter_speculation.xslt:
FORG0001: Cannot convert string "c" to an integer
Exception in thread "main" ; SystemID: .../Saxon9.7-filter_speculation.xslt; Line#: 7; Column#: 47
ValidationException: Cannot convert string "c" to an integer
at
...
It's interesting that error happens in Saxon 9.7 but not in earlier versions.
The answer we got was expected but disheartening:
The XPath specification (section 2.3.4, Errors and Optimization) explicitly allows the predicates of a filter expression to be reordered by an optimizer. See this example, which is very similar to yours:
The expression in the following example cannot raise a casting error if it is evaluated exactly as written (i.e., left to right). Since neither predicate depends on the context position, an implementation might choose to reorder the predicates to achieve better performance (for example, by taking advantage of an index). This reordering could cause the expression to raise an error.
$N[@x castable as xs:date][xs:date(@x) gt xs:date("2000-01-01")]
Following the spec, Michael Kay advices us to rewrite XPath:
$elements[self::d or self::e][xs:integer(@value) = 1]
like this:
$elements[if (self::d or self::e) then xs:integer(@value) = 1 else false()]
Such subtleties make it hard to reason about and to teach XPath. We doubt many people will spot the difference immediately.
We think that if such optimization was so much important to spec writers, then they had to change filter rules to treat failed predicates as false() . This would avoid any obscure differences in these two, otherwise equal, expressions. In fact something similar already exists with templates where failed evaluation of pattern is treated as un-match.
A collegue has approached to us with a question on how Akinator engine may work.
To our shame we have never heard about this amazing game before. To fill the gap we have immediately started to play it, and have identified it as a Troubleshooting solver.
It took us a couple of minutes to come up with a brilliant solution: "We just need to google and find the engine in the internet".
Unfortunately, this led to nowhere, as no Akinator itself is open sourced, and no other good quality open source solutions are available.
After another hour we have got two more ideas:
- The task should fit into SQL;
- The task is a good candidate for a neural network.
In fact, the first might be required to teach the second, so we have decided to formalize the problem in terms of SQL, while still keeping in mind a neural network.
With this goal we have created a GitHub project. Please see the algorithm and its implementation at github.com/nesterovsky-bros/KB.
Some of our latest projects used file uploading feature. Whether this is an excel, an audio or an image file, the file uploading mechanism remains the same. In a web application an user selects a file and uploads it to the server. Browser sends this file as a multipart-form file attachment, which is then handled on server.
The default HTML way to upload file to server is to use <input type="file"> element on a form. The rendering of such element is different in different browsers and looks rather ugly. Thus, almost all well known javascript libraries like JQuery, Kendo UI etc. provide their own implementations of file upload component. The key statement here is "almost", since in AngularJS bootstrap we didn't find anything like that. It worth to say that we've found several third-party implementations for file upload, but they either have rather complex implementation for such simple task or don't provide file selection feature. This is the reason why we've decided to implement this task by ourselves.
Sources of our solution with upload-link directive and uiUploader service you may find here.
Their usage is rather simple.
E.g. for upload-ink directive:
<a upload-link
class="btn btn-primary"
accept=".*"
server-url="api/upload"
on-success="controller.uploadSucceed(data, file)"
on-error="controller.uploadFailed(e)">Click here to upload an image</a>
Where:
- accept
- is a comma separated list of acceptable file extensions.
- server-url
-
is server URL where to upload the selected file.
In case when there is no "server-url" attribute the content of selected file will be
passed to success handler as a data URI.
- on-success
- A "success" handler, which is called when upload is finished successfully.
- on-error
- An "error" handler, which is called when upload is failed.
Usage of uiUploader service is also easy:
uiUploader.selectAndUploadFile("api/upload", ".jpg,.png,.gif").
then(
function(result)
{
// TODO: handle the result.
// result.data - contains the server response
// result.file - contains uploaded File or Blob instance.
},
$log.error);
In case when the first parameter is null the result.data contains the selected file content as a
data URI.
In a web project we needed to provide a region selection tool.
This requirement is resulted in a javascript module selectionTool, and in an angularjs wrappers selection, and clip.
There are samples test.html, and angularjs-test.html.
The module is implemented through SVG manipulation. Selection paths are defined in terms of SVG.
The simplest way to start with this API is through test pages.
From the client perspective API allows to:
- create a new selection - click and drag path;
- select/unselect selection - click selection overlay or image area;
- move selected path - drag selected overlay, or click arrow keys;
- move selected edge - drag selected edge;
- move selected vertex - drag selected vertex;
- delete selected path - Delete button;
- add selection vertex - double click or ctrl + click a selection edge;
- remove selection vertex - double click or ctrl + click a selection vertex;
- scale selection - shift + drag selection, or shift + arrow keys;
- rotate selection - ctrl + drag selection, or ctrl + arrow keys.
Sources can be found at GitHub: nesterovsky-bros/selection.
It's time to align csharpxom to the latest version of C#. The article New Language Features in C# 6 sums up what's being added.
Sources can be found at nesterovsky-bros/languages-xom, and C# model is at csharp folder.
In general we feel hostile to any new features until they prove they bring an added value. So, here our list of new features from most to least useless:
String interpolation
var s = $"{p.Name} is {p.Age} year{{s}} old";
This is useless, as it does not account resource localization.
Null-conditional operators
int? first = customers?[0].Orders?.Count();
They claim to reduce cluttering from null checks, but in our opinion it looks opposite. It's better to get NullReferenceException if arguments are wrong.
Exception filters
private static bool Log(Exception e) { /* log it */ ; return false; }
…
try { … } catch (Exception e) when (Log(e)) {}
"It is also a common and accepted form of “abuse” to use exception filters for side effects; e.g. logging."
Design a feature for abuse just does not tastes good.
Expression-bodied function and property members.
public Point Move(int dx, int dy) => new Point(x + dx, y + dy);
public string Name => First + " " + Last;
Not sure it's that usefull.
We have solved this problem years ago, but have run into it once again.
So, we shall log the solution here.
The problem: to minify payload of the JAXB serialized beans.
Java beans have many properties most of them contains default values: zero ints, empty strings, and so on.
JAXB never tries to omit default value from marshalled xml, the only thing it can remove from output is null values. So, our approach is to define xml adapter to map default values to nulls.
Here we refer to the StackOverflow question: Prevent writing default attribute values JAXB, and to our answer.
Though it's not as terse as one would wish, one can create XmlAdapters to avoid marshalling the default values.
The use case is like this:
@XmlRootElement(name = "FIELD")
public class TestLayoutNode
{
@XmlAttribute(name = "num")
@XmlJavaTypeAdapter(value = IntegerZero.class, type = int.class)
public int number;
@XmlAttribute(name = "str")
@XmlJavaTypeAdapter(StringDefault.class)
public String str = "default";
}
And here are adapters.
IntegerZero:
public class IntegerZero extends DefaultValue<Integer>
{
public Integer defaultValue() { return 0; }
}
StringDefault:
public class StringDefault extends DefaultValue<String>
{
public String defaultValue() { return "default"; }
}
DefaultValueAdapter:
public class DefaultValue<T> extends XmlAdapter<T, T>
{
public T defaultValue() { return null; }
public T marshal(T value) throws Exception
{
return (value == null) || value.equals(defaultValue()) ? null : value;
}
public T unmarshal(T value) throws Exception
{
return value;
}
}
With small number of different default values this approach works well.
Taking into an account that we use Saxon for many years, it was strange to run into so simple error like the following:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:variable name="doc" as="element()+"><a/><b/><c/></xsl:variable>
<xsl:sequence select="$doc = 3"/>
</xsl:template>
</xsl:stylesheet>
This is a simplified case that should produce an dynamic error FORG0001 as per General Comparisions; the real code is more complex, as it uses SFINAE and continues.
This case crushes in Saxon with exception:
Exception in thread "main" java.lang.RuntimeException:
Internal error evaluating template at line 3 in module ICE9.6.xslt
at net.sf.saxon.expr.instruct.Template.applyLeavingTail()
at net.sf.saxon.trans.Mode.applyTemplates()
at net.sf.saxon.Controller.transformDocument()
at net.sf.saxon.Controller.transform()
at net.sf.saxon.s9api.XsltTransformer.transform()
at net.sf.saxon.jaxp.TransformerImpl.transform()
...
Caused by: java.lang.NumberFormatException: For input string: ""
at java.lang.NumberFormatException.forInputString()
at java.lang.Long.parseLong()
at java.lang.Long.parseLong()
at net.sf.saxon.expr.GeneralComparison.quickCompare()
at net.sf.saxon.expr.GeneralComparison.compare()
at net.sf.saxon.expr.GeneralComparison.evaluateManyToOne()
at net.sf.saxon.expr.GeneralComparison.evaluateItem()
at net.sf.saxon.expr.GeneralComparison.evaluateItem()
at net.sf.saxon.expr.Expression.process()
at net.sf.saxon.expr.instruct.Template.applyLeavingTail()
... 8 more
We have reported the problem at Saxon's forum, and as usual the problem was shortly resolved.
Though ADO.NET and other ORM framworks like EntityFramework and Dapper support async pattern, you should remember that database drivers (at least all we know about) do not support concurrent db commands running against a single connection.
To see what we mean consider a bug we have recently identified. Consider a code:
await Task.WhenAll(
newImages.
Select(
async image =>
{
// Load data from url.
image.Content = await HttpUtils.ReadData(image.Url);
// Insert image into the database.
image.ImageID = await context.InsertImage(image);
}));
The code runs multiple tasks to read images, and to write them into a database.
Framework decides to run all these tasks in parallel. HttpUtils.ReadData() has no problem with parallel execution, while context.InsertImage() does not run well in parallel, and is a subject of race conditions.
To workaround the problem we had to use async variant of a critical section. So the fixed code looks like this:
using(var semaphore = new SemaphoreSlim(1))
{
await Task.WhenAll(
newImages.
Select(
async image =>
{
// Load data from url.
image.Content = await HttpUtils.ReadData(image.Url);
await semaphore.WaitAsync();
try
{
// Insert image into the database.
image.ImageID = await context.InsertImage(image);
}
finally
{
semaphore.Release();
}
}));
}
So, in the async world we still should care about race conditions.
|