We needed to track a stream position during creation of xml file. This is to
allow random access to a huge xml file (the task is related to
WindowsSearch).
This is a simplified form of the xml:
<data>
<item>...</item>
...
<item>...</item>
</data>
The goal was to have stream position of each item element. With this in mind,
we've decided to:
- open a stream, and then xml writer over it;
- write data into xml writer;
- call
Flush() method of the xml writer before measuring stream offset;
That's a code sample:
var stream = new MemoryStream();
var writer = XmlWriter.Create(stream);
writer.WriteStartDocument();
writer.WriteStartElement("data");
for(var i = 0; i < 10; ++i)
{
writer.Flush();
Console.WriteLine("Flush offset: {0}, char: {1}",
stream.Position,
(char)stream.GetBuffer()[stream.Position - 1]);
writer.WriteStartElement("item");
writer.WriteValue("item " + i);
writer.WriteEndElement();
}
writer.WriteEndElement();
writer.WriteEndDocument();
That's the output:
Flush offset: 46, char: a
Flush offset: 66, char: >
Flush offset: 85, char: >
Flush offset: 104, char: >
Flush offset: 123, char: >
Flush offset: 142, char: >
Flush offset: 161, char: >
Flush offset: 180, char: >
Flush offset: 199, char: >
Flush offset: 218, char: >
Funny, isn't it?
After feeding the start tag <data> , and flushing xml writer we observe that only
"<data" has been written down to the stream. Well,
Flush() have never promissed anything particular about the content
of the stream, so we cannot claim any violation, however we expected to see
whole start tag.
Inspection of the implementation of xml writer reveals laziness during writting
data down the stream. In particular start tag is closed when one starts the
content. This is probably to implement empty tags: <data/> .
To do the trick we had to issue empty content, moreover, to call a particular
method with particular parameters of the xml writer. So the code after the fix
looks like this:
var stream = new MemoryStream();
var writer = XmlWriter.Create(stream);
writer.WriteStartDocument();
writer.WriteStartElement("data");
char[] empty = { ' ' };
for(var i = 0; i < 10; ++i)
{
writer.WriteChars(empty, 0, 0);
writer.Flush();
Console.WriteLine("Flush offset: {0}, char: {1}",
stream.Position,
(char)stream.GetBuffer()[stream.Position - 1]);
writer.WriteStartElement("item");
writer.WriteValue("item " + i);
writer.WriteEndElement();
}
writer.WriteEndElement();
writer.WriteEndDocument();
And output is:
Flush offset: 47, char: >
Flush offset: 66, char: >
Flush offset: 85, char: >
Flush offset: 104, char: >
Flush offset: 123, char: >
Flush offset: 142, char: >
Flush offset: 161, char: >
Flush offset: 180, char: >
Flush offset: 199, char: >
Flush offset: 218, char: >
While this code works, we feel uneasy with it.
What's the better way to solve the task?
Update: further analysis shows that it's
only possible behaviour, as after the call to write srart element, you either
can write attributes, content or end of element, so writer may write either
space, '>' or '/>' . The only
question is why it takes WriteChars(empty, 0, 0) into account and WriteValue("")
it doesn't.
As you probably know we have implemented our custom Protocol Handler for the Windows
Search.
It's called .xml-gz, and has a goal to index compressed xml files and to have
search results with a subtree precision. So, for xml:
<data>
<item>...</item>
<item>...</item>
...
</data>
search finds results within item and returns xml's url and stream
offset of the item. Using ZLIB API we can compress data with stream bookmarks, so fast random
access to the data is possible.
The only problem we have is about notification of changes (create, delete, update)
of such files.
Spec describes several techniques (nothing has worked for us):
1. Call catalogManager.ReindexMatchingURLs()
- it just returns without any impact.
2.Call changeSink.OnItemsChanged()
- returns error.
3. Implement
.xml-gz IFilter and call IGatherNotifyInline (see "
have your .zip urls indexed when they are created or modified") -
that's a mistery, as:
4. Implement root url in form .xml-gz:/// and perform Windows Search:
SELECT
System.ItemUrl, System.DateModified
FROM
SystemIndex WHERE System.FileExtension='.xml-gz'
to find all .xml-gz sources. This is not reliable, as your protocol handler can
be (and is) called before file is indexed.
So, the only reliable way to index your data is to (re-)add indexing rule for
the protocol handler, which in most cases reindexes everything.
The only bearable solution we found is to define indexing rule in the form:
.xml-gz://file:d:/data/... and to use
IShellFolder(2)
interfaces to discover sub items and their modification times. This technique allows
minimal data scan when you're (re-)add indexing rule.
Being unexperienced with Windows Search we tried to build queries to find data in the huge storage. We needed to find a document that matches some name pattern and contains some text.
Our naive query was like this:
select top 1000 System.ItemUrl from SystemIndex where scope = '...' and System.ItemName like '...%' and contains('...')
In most cases this query returns nothing and runs very long. It's interesting to note that it may start returning data if "top " clause is missing or uses a bigger number, but in this cases query is slower even more.
Next try was like this:
select top 1000 System.ItemUrl from SystemIndex where scope = '...' and System.ItemName >= '...' and System.ItemName < '...' and contains('...')
This query is also slow, but at least it returns some results.
At some point we have started to question the utility of Windows Search if it's so slow, but then we have found that there is a property System.ItemNameDisplay , which in our case coincides with the value of property System.ItemName , so we have tried the query:
select top 1000 System.ItemUrl from SystemIndex where scope = '...' and System.ItemNameDisplay like '...%' and contains('...')
This query worked fast, and produced good results. This hints that search engine has index on System.ItemNameDisplay in contrast to System.ItemName property.
We've looked at property definitions:
System.ItemNameDisplay
The display name in "most complete" form. It is the unique representation of the item name most appropriate for end users.
propertyDescription name = System.ItemNameDisplay shellPKey = PKEY_ItemNameDisplay formatID = B725F130-47EF-101A-A5F1-02608C9EEBAC propID = 10 searchInfo inInvertedIndex = true isColumn = true isColumnSparse = false columnIndexType = OnDisk maxSize = 128
System.ItemName
The base name of the System.ItemNameDisplay property.
propertyDescription name = System.ItemName shellPKey = PKEY_ItemName formatID = 6B8DA074-3B5C-43BC-886F-0A2CDCE00B6F propID = 100 searchInfo inInvertedIndex = false isColumn = true isColumnSparse = false columnIndexType = OnDisk maxSize = 128
Indeed, one property is indexed, while the other is not.
As with other databases, query is powerful when engine uses indices rather than performs data scan. This is also correct for Windows Search.
The differences in results that variations of query produce also manifests that Windows Search nevertheless is very different from relational database.
We have developed our custom Windows Search Protocol Handler. The role of this component is to expose items of complex content (or unusual storage) to Windows Search.
You can think of some virtual folder, so a Protocol Handler allows to enumerate it's files, file properties, and contents.
The goal of our Protocol Handler is to represent some data structure as a set of xml files. We expected that if we found a data within a folder with these files, then a search within Protocol Handler's scope would bring the same (or almost the same) results.
Reality is different.
For some reason .xml IFilter (a component to extract text data to index) works differently with file system and with our storage. We cannot state that it does not work, but for some reason many words that Windows Search finds within a file are never found within Protocol Handler scope.
We have observed that if, for purpose of indexing, we represent content xml items as .txt files, then search works as expected. So, our workaround was to present only xml's text data for the indexing, and to use .txt IFilter (this in fact roughly what .xml IFilter does by itself).
Is there a conclusion?
Well, Windows Search is a black box probably containing bugs. Its behaviour is not always obvious.
Let's put it blatantly: Windows Search 4 has design and implementation problems.
You discover this immediatelly when you start implementing indexing of custom file format.
If you want to index simple file format then you need to imlement you IFilter interface. But if it has happened so that you want to index compound data then you should invent you own protocols.
If you will fugure out how to implement your protocol to index that compound data, then you will most probably stuck on the problem on how to notify indexer about the changes.
The problem is that Windows Search 4 has API to reindex urls, which simply does not work, or to notify indexer about changes, which throws an error (returns error HRESULT) for custom protocols. At least, we were not able to make it run.
There is a problem with XML serialization of BigDecimal values, as we've written in one of our previous articles "BigDecimal + JAXB => potential interoperability problems". And now we ran into issue with serialization of double / Double values. All such values, except zero, serialize in scientific format, even a value contains only integer part. For example, 12 will be serialized as 1.2E+1. Actually this is not contradicts with XML schema definitions.
But what could be done, if you want to send/receive double and/or decimal values in plain format. For example you want serialize a double / BigDecimal value 314.15926 in XML as is. In this case you ought to use javax.xml.bind.annotation.adapters.XmlAdapter .
In order to solve this task we've created two descendants of XmlAdapter (the first for double / Double and the second for BigDecimal ), click here to download the sources.
Applying these classes on properties or package level you may manage XML serialization of numeric fields in your classes.
See this article for tips how to use custom XML serialization.
Do you know that the best JSF/Facelets visual editor, in our opinion, is ... Microsoft Visual Studio 2008? Another rather good JSF editor is presented in IBM RAD 7.xx. The most popular, open source Java IDE Eclipse contains an ugly implementation of such useful thing.
We did not update
languages-xom already for many monthes but now we have found a severe bug
in the jxom's algorithm for eliminating unreachable code. The marked line
were considered as unreachable:
check:
if (condition)
{
break check;
}
else
{
return;
}
// due to bug the following was considered unreachable
expression;
Bug is fixed.
Current update contains other cosmetic fixes.
Please download xslt sources from languages-xom.zip.
Summary
Languages XOM is a set of xml schemas and xslt stylesheets that allows:
- to define programs in xml form;
- to perform transformations over code in xml form;
- to generate sources.
Languages XOM includes:
- jxom - Java Xml Object model;
- csharpxom - C# Xml Object Model;
- cobolxom - COBOL Xml Object Model;
- sqlxom - SQL Xml Object Model (including several sql dialects);
- aspx - ASP.NET Object Model;
A proprietary part of languages XOM also includes XML Object Model for a
language named Cool:GEN. In fact the original purpose for this API was a
generation of java/C#/COBOL from Cool:GEN. For more details about Cool:GEN
conversion please see
here.
Ladies and gentlemen!
We are proud and would like to announce few works of our younger brother Aleksander, who studies cinematography now.
As you may know, JAX-WS uses javax.xml.datatype.XMLGregorianCalendar abstract class in order
to present date/time data type fields. We have used this class rather long time in
happy ignorance without of any problem. Suddenly, few days ago, we ran into a weird bug
of its Sun’s implementation (com.sun.org.apache.xerces.internal.jaxp.datatype.XMLGregorianCalendarImpl ).
The bug appears whenever we try to convert an XMLGregorianCalendar instance
to a java.util.GregorianCalendar using toGregorianCalendar() method.
I’ve written a simple JUnit test in order to demonstrate this bug:
@Test
public void testXMLGregorianCalendar()
throws Exception
{
SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
XMLGregorianCalendar calendar =
javax.xml.datatype.DatatypeFactory.newInstance().newXMLGregorianCalendar();
calendar.setDay(1);
calendar.setMonth(1);
calendar.setYear(1);
System.out.println("1: " + calendar.toString());
System.out.println("2: " +
formatter.format(calendar.toGregorianCalendar().getTime()));
GregorianCalendar cal = new GregorianCalendar(
calendar.getYear(),
calendar.getMonth() - 1,
calendar.getDay());
cal.clear(Calendar.AM_PM);
cal.clear(Calendar.HOUR_OF_DAY);
cal.clear(Calendar.HOUR);
cal.clear(Calendar.MINUTE);
cal.clear(Calendar.SECOND);
cal.clear(Calendar.MILLISECOND);
System.out.println("3: " + formatter.format(cal.getTime()));
/*
* Output:
*
* 1: 0001-01-01
* 2: 0001-01-03 00:00:00
* 3: 0001-01-01 00:00:00
*/
}
As you see, the date 0001-01-01 is transformed to 0001-01-03 after call of
toGregorianCalendar() method (see output 2).
Moreover, if we’ll serialize this XMLGregorianCalendar instance to XML we’ll see
it as 0001-01-01+02:00 which is rather weird and could be potential problem for
interoperability between Java and other platforms.
Conclusion: in order to convert XMLGregorianCalendar value to
GregorianCalendar do the following. Create a new instance of
GregorianCalendar and just set the corresponding fields with
values from XMLGregorianCalendar instance.
Already for a couple of days we're trying to create a UserControl containing a
TabContainer. To achieve the goal we have
created a page with a ToolkitScriptManager and a user control itself.
Page:
<form runat="server">
<ajax:ToolkitScriptManager ID="ToolkitScriptManager1" runat="server"/>
<uc1:WebUserControl1 ID="WebUserControl11" runat="server" />
</form>
User control:
<%@ Control Language="C#" %>
<%@ Register
Assembly="AjaxControlToolkit"
Namespace="AjaxControlToolkit"
TagPrefix="ajax" %>
<ajax:TabContainer ID="Tab" runat="server" Width="100%">
<ajax:TabPanel runat="server" HeaderText="Tab1" ID="Tab1">
<ContentTemplate>Panel 1</ContentTemplate>
</ajax:TabPanel>
</ajax:TabContainer>
What could be more simple?
But no, there is a problem. At run-time
control works perfectly, but at the
designer it shows an error instead of a normal design view:
Error Rendering Control - TabContainer1
An unhandled exception has occurred.
Could not find any resources appropriate for the specified culture or the neutral culture. Make sure "AjaxControlToolkit.Properties.Resources.NET4.resources" was correctly embedded or linked into assembly "AjaxControlToolkit" at compile time, or that all the satellite assemblies required are loadable and fully signed.
That's a stupid error, which says nothing about the real problem reason. We had
to attach a debugger to a Visual Studio just to realize what the problem is.
So, the error occurs at the following code of AjaxControlToolkit.ScriptControlBase:
private void EnsureScriptManager()
{
if (this._scriptManager == null)
{
this._scriptManager = ScriptManager.GetCurrent(this.Page);
if (this._scriptManager == null)
{
throw new HttpException(Resources.E_NoScriptManager);
}
}
}
Originally, the problem is due to the fact that ScriptManager is not found, and code
wants to report an HttpException , but fun is that we recieve a different exception, which is releted to a missing resouce text for a message Resources.E_NoScriptManager .
It turns out that E_NoScriptManager text is found neither in
primary no in resource assemblies.
As for original problem, it's hard to say about reason of why ScriptManager is
not available at design time. We, however, observed that a ScriptManager
registers itself for a ScriptManager.GetCurrent() method at
run-time only:
protected internal override void OnInit(EventArgs e)
{
...
if (!base.DesignMode)
{
...
iPage.Items[typeof(ScriptManager)] = this;
...
}
}
So, it's not clear what they (toolkit's developers) expected to get at design
time.
These observations leave uneasiness regarding the quality of the library.
Earlier, we have described an approach to call Windows Search from SQL Server 2008. But it has turned out that our problem is more complicated...
All has started from the initial task:
- to allow free text search in a store of huge xml files;
- files should be compressed, so these are *.xml.gz;
- search results should be addressable to a fragment within xml.
Later we shall describe how we have solved this task, and now it's enough to say that we have implemented a Protocol Handler for Windows Search named '.xml-gz:'. This way original file stored say at 'file:///c:/store/data.xml-gz' is seen as a container by the Windows Search:
- .xml-gz:///file:c:/store/data.xml-gz/id1.xml
- .xml-gz:///file:c:/store/data.xml-gz/id2.xml
- ...
This way search xml should be like this:
select System.ItemUrl from SystemIndex where scope='.xml-gz:' and contains(...)
Everything has worked during test: we have succeeded to issue Windows Search selects from SQL Server and join results with other sql queries.
But later on when we considered a runtime environment we have seen that our design won't work. The reason is simple. Windows Search will work on a computer different from those where SQL Servers run. So, the search query should look like this:
select System.ItemUrl from Computer.SystemIndex where scope='.xml-gz:' and contains(...)
Here we have realized the limitation of current (Windows Search 4) implementation: remote search works for shared folders only, thus query may only look like:
select System.ItemUrl from Computer.SystemIndex where scope='file://Computer/share/' and contains(...)
Notice that search restricts the scope to a file protocol, this way remoter search will never return our results. The only way to search in our scope is to perform a local search.
We have considered following approaches to resolve the issue.
The simplest one would be to access Search protocol on remote computer using a connection string: "Provider=Search.CollatorDSO;Data Source=Computer" and use local queries. This does not work, as provider simply disregards Data Source parameter.
The other try was to use MS Remote OLEDB provider. We tried hard to configure it but it always returns obscure error, and more than that it's deprecated (Microsoft claims to remove it in future).
So, we decided to forward request manually:
- SQL Server calls a web service (through a CLR function);
- Web service queries Windows Search locally.
Here we considered WCF Data Services and a custom web service.
The advantage of WCF Data Services is that it's a technology that has ambitions of a standard but it's rather complex task to create implementation that will talk with Windows Search SQL dialect, so we have decided to build a primitive http handler to get query parameter. That's trivial and also has a virtue of simple implementation and high streamability.
So, that's our http handler (WindowsSearch.ashx):
<%@ WebHandler Language="C#" Class="WindowsSearch" %>
using System; using System.Web; using System.Xml; using System.Text; using System.Data.OleDb;
/// <summary> /// A Windows Search request handler. /// </summary> public class WindowsSearch: IHttpHandler { /// <summary> /// Handles the request. /// </summary> /// <param name="context">A request context.</param> public void ProcessRequest(HttpContext context) { var request = context.Request; var query = request.Params["query"]; var response = context.Response;
response.ContentType = "text/xml"; response.ContentEncoding = Encoding.UTF8;
var writer = XmlWriter.Create(response.Output);
writer.WriteStartDocument(); writer.WriteStartElement("resultset");
if (!string.IsNullOrEmpty(query)) { using(var connection = new OleDbConnection(provider)) using(var command = new OleDbCommand(query, connection)) { connection.Open();
using(var reader = command.ExecuteReader()) { string[] names = null;
while(reader.Read()) { if (names == null) { names = new string[reader.FieldCount];
for (int i = 0; i < names.Length; ++i) { names[i] = XmlConvert.EncodeLocalName(reader.GetName(i)); } }
writer.WriteStartElement("row");
for(int i = 0; i < names.Length; ++i) { writer.WriteElementString( names[i], Convert.ToString(reader[i])); }
writer.WriteEndElement(); } } } }
writer.WriteEndElement(); writer.WriteEndDocument();
writer.Flush(); }
/// <summary> /// Indicates that a handler is reusable. /// </summary> public bool IsReusable { get { return true; } }
/// <summary> /// A connection string. /// </summary> private const string provider = "Provider=Search.CollatorDSO;" + "Extended Properties='Application=Windows';" + "OLE DB Services=-4"; }
And a SQL CLR function looks like this:
using System; using System.Collections; using System.Collections.Generic; using System.Data; using System.Data.SqlClient; using System.Data.SqlTypes; using Microsoft.SqlServer.Server; using System.Net; using System.IO; using System.Xml;
/// <summary> /// A user defined function. /// </summary> public class UserDefinedFunctions { /// <summary> /// A Windows Search returning result as xml strings. /// </summary> /// <param name="url">A search url.</param> /// <param name="userName">A user name for a web request.</param> /// <param name="password">A password for a web request.</param> /// <param name="query">A Windows Search SQL.</param> /// <returns>A result rows.</returns> [SqlFunction( IsDeterministic = false, Name = "WindowsSearch", FillRowMethodName = "FillWindowsSearch", TableDefinition = "value nvarchar(max)")] public static IEnumerable Search( string url, string userName, string password, string query) { return SearchEnumerator(url, userName, password, query); }
/// <summary> /// A filler of WindowsSearch function. /// </summary> /// <param name="value">A value returned from the enumerator.</param> /// <param name="row">An output value.</param> public static void FillWindowsSearch(object value, out string row) { row = (string)value; }
/// <summary> /// Gets a search row enumerator. /// </summary> /// <param name="url">A search url.</param> /// <param name="userName">A user name for a web request.</param> /// <param name="password">A password for a web request.</param> /// <param name="query">A Windows Search SQL.</param> /// <returns>A result rows.</returns> private static IEnumerable<string> SearchEnumerator( string url, string userName, string password, string query) { if (string.IsNullOrEmpty(url)) { throw new ArgumentException("url"); }
if (string.IsNullOrEmpty(query)) { throw new ArgumentException("query"); }
var requestUrl = url + "?query=" + Uri.EscapeDataString(query);
var request = WebRequest.Create(requestUrl);
request.Credentials = string.IsNullOrEmpty(userName) ? CredentialCache.DefaultCredentials : new NetworkCredential(userName, password);
using(var response = request.GetResponse()) using(var stream = response.GetResponseStream()) using(var reader = XmlReader.Create(stream)) { bool read = true;
while(!read || reader.Read()) { if ((reader.Depth == 1) && reader.IsStartElement()) { // Note that ReadInnerXml() advances the reader similar to Read(). yield return reader.ReadInnerXml();
read = false; } else { read = true; } } } } }
And, finally, when you call this service from SQL Server you write query like this:
with search as ( select cast(value as xml) value from dbo.WindowsSearch ( N'http://machine/WindowsSearchService/WindowsSearch.ashx', null, null, N' select "System.ItemUrl" from SystemIndex where scope=''.xml-gz:'' and contains(''...'')' ) ) select value.value('/System.ItemUrl[1]', 'nvarchar(max)') from search
Design is not trivial but it works somehow.
After dealing with all these problems some questions remain unanswered:
- Why SQL Server does not allow to query Windows Search directly?
- Why Windows Search OLEDB provider does not support "Data Source" parameter?
- Why Windows Search does not support custom protocols during remote search?
- Why SQL Server does not support web request/web services natively?
Hello everybody! You might think that we had died, since there were no articles
in our blog for a too long time, but no, we’re still alive…
A month or so we were busy with Windows Search and stuff around it. Custom
protocol handlers, support of different file formats and data storages are very
interesting tasks, but this article discusses another issue.
The issue is how to compile and install native code written in C++, which was
built under Visual Studio 2008 (SP1), on a clean computer.
The problem is that native dlls now resolve the problem known as a
DLL hell using assembly
manifests. This should help to discover and load the right DLL. The problem is
that there are many versions of CRT, MFC, ATL and other dlls, and it's not
trivial to create correct setup for a clean computer.
In order to avoid annoying dll binding problems at run-time, please define
BIND_TO_CURRENT_CRT_VERSION and/or (_BIND_TO_CURRENT_ATL_VERSION,
_BIND_TO_CURRENT_MFC_VERSION). Don’t forget to make the same definitions for
all configurations/target platforms you intend to use. Build the project and
check the resulting manifest file (just in case). It should contain something
like that:
<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<assembly xmlns='urn:schemas-microsoft-com:asm.v1' manifestVersion='1.0'>
<trustInfo xmlns="urn:schemas-microsoft-com:asm.v3">
<security>
<requestedPrivileges>
<requestedExecutionLevel level='asInvoker' uiAccess='false' />
</requestedPrivileges>
</security>
</trustInfo>
<dependency>
<dependentAssembly>
<assemblyIdentity type='win32' name='Microsoft.VC90.DebugCRT'
version='9.0.30729.4148'
processorArchitecture='x86'
publicKeyToken='1fc8b3b9a1e18e3b' />
</dependentAssembly>
</dependency>
</assembly>
The version of dependent assembly gives you a clue what a native run-time
version(s) requires your application. The same thing you have to do for all your
satellite projects.
The next step is to create a proper setup project using VS wizard.
Right click on setup project and select “Add->Merge Module…”. Select
“Microsoft_VC90_CRT_x86.msm” or/and (“Microsoft_VC90_DebugCRT_x86.msm”,
“Microsoft_VC90_ATL_x86.msm”, “Microsoft_VC90_MFC_x86.msm”…) for installing of
corresponding run-time libraries and “policy_9_0_Microsoft_VC90_CRT_x86.msm”
etc. for route calls of old version run-time libraries to the newest versions.
Now you're ready to build your setup project.
You may also include “Visual C++ Runtime Libraries” to a setup prerequisites.
As result, you'll get 2 files (setup.exe and Setup.msi) and an optional folder
(vcredist_x86) with C++ run-time redistributable libraries.
Note: only setup.exe installs those C++ run-time libraries.
More info concerning this theme:
Let's assume you're loading data into a table using BULK INSERT from tab
separated file. Among others you have some varchar field, which may contain any
character. Content of such field is escaped with usual scheme:
'\' as '\\' ;
char(13) as '\n' ;
char(10) as '\r' ;
char(9) as '\t' ;
But now, after loading, you want to unescape content back. How would you do it?
Notice that:
'\t' should be converted to a char(9) ;
'\\t' should be converted to a '\t' ;
'\\\t' should be converted to a '\' + char(9) ;
It might be that you're smart and you will immediately think of correct
algorithm, but for us it took a while to come up with a neat solution:
declare @value varchar(max);
set @value = ...
-- This unescapes the value
set @value =
replace
(
replace
(
replace
(
replace
(
replace(@value, '\\', '\ '),
'\n',
char(10)
),
'\r',
char(13)
),
'\t',
char(9)
),
'\ ',
'\'
);
Do you know a better way?
We were trying to query Windows Search from an SQL Server 2008.
Documentation states that Windows Search is exposed as OLE DB datasource. This meant that we could just query result like this:
SELECT * FROM OPENROWSET( 'Search.CollatorDSO.1', 'Application=Windows', 'SELECT "System.ItemName", "System.FileName" FROM SystemIndex');
But no, such select never works. Instead it returns obscure error messages:
OLE DB provider "Search.CollatorDSO.1" for linked server "(null)" returned message "Command was not prepared.". Msg 7399, Level 16, State 1, Line 1 The OLE DB provider "Search.CollatorDSO.1" for linked server "(null)" reported an error. Command was not prepared. Msg 7350, Level 16, State 2, Line 1 Cannot get the column information from OLE DB provider "Search.CollatorDSO.1" for linked server "(null)".
Microsoft is silent about reasons of such behaviour. People came to a conclusion that the problem is in the SQL Server, as one can query search results through OleDbConnection without problems.
This is very unfortunate, as it bans many use cases.
As a workaround we have defined a CLR function wrapping Windows Search call and returning rows as xml fragments. So now the query looks like this:
select value.value('System.ItemName[1]', 'nvarchar(max)') ItemName, value.value('System.FileName[1]', 'nvarchar(max)') FileName from dbo.WindowsSearch('SELECT "System.ItemName", "System.FileName" FROM SystemIndex')
Notice how we decompose xml fragment back to fields with the value() function.
The C# function looks like this:
using System; using System.Collections; using System.IO; using System.Xml; using System.Data; using System.Data.SqlClient; using System.Data.SqlTypes; using System.Data.OleDb;
using Microsoft.SqlServer.Server;
public class UserDefinedFunctions { [SqlFunction( FillRowMethodName = "FillSearch", TableDefinition="value xml")] public static IEnumerator WindowsSearch(SqlString query) { const string provider = "Provider=Search.CollatorDSO;" + "Extended Properties='Application=Windows';" + "OLE DB Services=-4";
var settings = new XmlWriterSettings { Indent = false, CloseOutput = false, ConformanceLevel = ConformanceLevel.Fragment, OmitXmlDeclaration = true };
string[] names = null;
using(var connection = new OleDbConnection(provider)) using(var command = new OleDbCommand(query.Value, connection)) { connection.Open();
using(var reader = command.ExecuteReader()) { while(reader.Read()) { if (names == null) { names = new string[reader.FieldCount];
for (int i = 0; i < names.Length; ++i) { names[i] = XmlConvert.EncodeLocalName(reader.GetName(i)); } }
var stream = new MemoryStream(); var writer = XmlWriter.Create(stream, settings);
for(int i = 0; i < names.Length; ++i) { writer.WriteElementString(names[i], Convert.ToString(reader[i])); }
writer.Close();
yield return new SqlXml(stream); } } } }
public static void FillSearch(object value, out SqlXml row) { row = (SqlXml)value; } }
Notes:
- Notice the use of "
OLE DB Services=-4 " in provider string to avoid transaction enlistment (required in SQL Server 2008).
- Permission level of the project that defines this extension function should be set to unsafe (see Project Properties/Database in Visual Studio) otherwise it does not allow the use OLE DB.
- SQL Server should be configured to allow CLR functions, see Server/Facets/Surface Area Configuration/ClrIntegrationEnabled in Microsoft SQL Server Management Studio
- Assembly should either be signed or a database should be marked as trustworthy, see Database/Facets/Trustworthy in Microsoft SQL Server Management Studio.
|