We needed to track the stream position while creating an XML file. This is to allow random access into a huge XML file (the task is related to Windows Search).
This is a simplified form of the xml:
<data>
  <item>...</item>
  ...
  <item>...</item>
</data>
The goal was to record the stream position of each item element. With this in mind, we decided to call Flush() on the XML writer before writing each item and then read the stream position.
Here is a code sample:
using System;
using System.IO;
using System.Xml;

var stream = new MemoryStream();
var writer = XmlWriter.Create(stream);

writer.WriteStartDocument();
writer.WriteStartElement("data");

for(var i = 0; i < 10; ++i)
{
    // Push everything written so far to the stream, then report the
    // position and the last byte that reached the stream.
    writer.Flush();
    Console.WriteLine("Flush offset: {0}, char: {1}",
        stream.Position,
        (char)stream.GetBuffer()[stream.Position - 1]);

    writer.WriteStartElement("item");
    writer.WriteValue("item " + i);
    writer.WriteEndElement();
}

writer.WriteEndElement();
writer.WriteEndDocument();
That's the output:
Flush offset: 46, char: a
Flush offset: 66, char: >
Flush offset: 85, char: >
Flush offset: 104, char: >
Flush offset: 123, char: >
Flush offset: 142, char: >
Flush offset: 161, char: >
Flush offset: 180, char: >
Flush offset: 199, char: >
Flush offset: 218, char: >
Funny, isn't it?
After writing the start tag <data> and flushing the XML writer, we observe that only "<data" has been written to the stream. Flush() has never promised anything particular about the content of the stream, so we cannot claim a violation; still, we expected to see the whole start tag.
Inspecting the implementation of the XML writer reveals that it writes data to the stream lazily. In particular, a start tag is closed only when the element's content begins. This is probably done to support empty tags: <data/>.
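The lazy closing of the start tag can be seen in isolation with a minimal sketch like the one below. The class and method names are hypothetical, and the settings (no XML declaration, UTF-8 without a byte order mark) are an assumption made only so that the flushed text contains nothing but the element itself.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class LazyStartTagDemo
{
    // Returns the text that has reached the stream after
    // WriteStartElement("data") followed by Flush().
    // When forceContent is true, a zero-length WriteChars call is
    // issued before the flush, as in the workaround in this post.
    public static string FlushedAfterStartElement(bool forceContent)
    {
        var stream = new MemoryStream();

        // Assumption: drop the declaration and the BOM to keep the output minimal.
        var settings = new XmlWriterSettings
        {
            OmitXmlDeclaration = true,
            Encoding = new UTF8Encoding(false)
        };

        using (var writer = XmlWriter.Create(stream, settings))
        {
            writer.WriteStartElement("data");

            if (forceContent)
            {
                // Zero characters of content: nothing is added to the
                // document, but the writer switches to element content.
                writer.WriteChars(new char[0], 0, 0);
            }

            writer.Flush();

            // Snapshot taken before the writer is disposed and closes the element.
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

With the writer behaviour described here, FlushedAfterStartElement(false) should give "<data" (the tag is still pending), while FlushedAfterStartElement(true) should give "<data>".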
To do the trick we had to issue empty content and, moreover, to call a particular method of the XML writer with particular parameters. The code after the fix looks like this:
using System;
using System.IO;
using System.Xml;

var stream = new MemoryStream();
var writer = XmlWriter.Create(stream);

writer.WriteStartDocument();
writer.WriteStartElement("data");

char[] empty = { ' ' };

for(var i = 0; i < 10; ++i)
{
    // Write zero characters of content so that a pending start tag
    // is closed before the flush.
    writer.WriteChars(empty, 0, 0);
    writer.Flush();
    Console.WriteLine("Flush offset: {0}, char: {1}",
        stream.Position,
        (char)stream.GetBuffer()[stream.Position - 1]);

    writer.WriteStartElement("item");
    writer.WriteValue("item " + i);
    writer.WriteEndElement();
}

writer.WriteEndElement();
writer.WriteEndDocument();
And the output is:
Flush offset: 47, char: >
Flush offset: 66, char: >
Flush offset: 85, char: >
Flush offset: 104, char: >
Flush offset: 123, char: >
Flush offset: 142, char: >
Flush offset: 161, char: >
Flush offset: 180, char: >
Flush offset: 199, char: >
Flush offset: 218, char: >
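For the original goal, the same trick can be wrapped so that it returns the offsets instead of printing them. This is only a sketch under the assumptions of this post: the ItemOffsets class and its Write method are hypothetical names, and it relies on the zero-length WriteChars call closing a pending start tag as observed above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ItemOffsets
{
    // Writes <data><item>...</item>...</data> to the stream and returns
    // the stream position at which each <item> element starts.
    public static List<long> Write(Stream stream, IEnumerable<string> items)
    {
        var offsets = new List<long>();
        var empty = new char[0];

        using (var writer = XmlWriter.Create(stream))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("data");

            foreach (var item in items)
            {
                // Close a pending start tag, then push buffered data
                // to the stream so that Position is meaningful.
                writer.WriteChars(empty, 0, 0);
                writer.Flush();
                offsets.Add(stream.Position);

                writer.WriteStartElement("item");
                writer.WriteValue(item);
                writer.WriteEndElement();
            }

            writer.WriteEndElement();
            writer.WriteEndDocument();
        }

        return offsets;
    }
}
```

Calling ItemOffsets.Write(new MemoryStream(), new[] { "item 0", "item 1" }) is expected to return positions that each point at the "<" of an <item> start tag, which is what a random access reader would seek to.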
While this code works, we feel uneasy with it.
What's the better way to solve the task?
Update: further analysis shows that this is the only possible behaviour: after the call that writes the start element, you can write attributes, content, or the end of the element, so the writer may next have to write a space, '>' or '/>'. The only remaining question is why it takes WriteChars(empty, 0, 0) into account but not WriteValue("").