Accidentally we have found that implementation of String and StringBuilder have been considerably revised, while public interface has remained the same.
public sealed class String { private int m_arrayLength; private int m_stringLength; private char m_firstChar; }
This layout is dated to .NET 1.0.
VM, in fact, allocates more memory than that defined in C# class, as &m_firstChar refers to an inline char buffer.
&m_firstChar
This way string's buffer length and string's length were two different values, thus StringBuilder used this fact and stored its content in a private string which it modified in place.
StringBuilder
In .NET 4, string is different:
public sealed class String { private int m_stringLength; private char m_firstChar; }
Memory footprint of such structure is smaller, but string's length should always be the same as its buffer. In fact layout of string is now the same as layout of char[].
string
char[]
This modification leads to implementation redesign of the StringBuilder.
Earlier, StringBuilder looked like the following:
public sealed class StringBuilder { internal IntPtr m_currentThread; internal int m_MaxCapacity; internal volatile string m_StringValue; }
Notice that m_StringValue is used as a storage, and m_currentThread is used to preserve thread affinity of the internal string value.
m_StringValue
m_currentThread
Now, guys at Microsoft have decided to implement StringBuilder very differently:
public sealed class StringBuilder { internal int m_MaxCapacity; internal int m_ChunkLength; internal int m_ChunkOffset; internal char[] m_ChunkChars; internal StringBuilder m_ChunkPrevious; }
Inspection of this layout immediately reveals implementation technique. It's a list of chunks. Instance itself references the last chunk (most recently appended), and the previous chunks.
Characteristics of this design are:
Length
Append()
ToString()
Insert()
Personally, we would select a slightly different design:
public sealed class StringBuilder { private struct Chunk { public int length; // Chunk length. public int offset; // Chunk offset. public char[] buffer; } private int m_MaxCapacity; // Alternatively, one can use // private List<Chunk> chunks; private int chunkCount; // Number of used chunks. private Chunk[] chunks; // Array of chunks except last. private Chunk last; // Last chunk. private bool nonHomogenous; // false if all chunks are of the same size. }
This design has better memory footprint, and random access time is O(1) when there were no inserts in the middle (nonHomogenous=false), and O(log(chunkCount)) after such inserts. All other characteristics are the same.
nonHomogenous=false
Remember Me
a@href@title, b, blockquote@cite, em, i, strike, strong, sub, super, u