So, in answering another thread on this forum, I decided to do some performance testing on String, mainly because a friend of mine said "You should always use new String, it's the fastest!".
I wasn't convinced and argued in favour of StringBuilder, at which point I was directed to some "performance" tests of their own. Needless to say, I was not impressed by the numbers: they showed new String as a clear winner, at least 20% faster than StringBuilder. This concerned me a great deal! I decided to investigate and get to the bottom of it.
After dissecting my friend's test, I believe I found the answer: compiler optimisation. In their test, they performed 10,000 iterations of setting strings, using the SAME string every time. There is an important point to make here.
When you create a string, memory is allocated for it. That isn't anything astounding; however, string literals are handled a little differently.
Let's take the following:
    String firstString = "10";
    String secondString = "10";
Although you have specified two strings, each with a hard-coded value, this will be "optimised" away. Here, the value "10" will be stored in memory at a single place (this is known as string interning). The two string variables will simply reference this "10" memory location, as "10" is a constant at compile time.
If you then did:
    firstString = firstString + "00";
an entirely new string will have been created in memory.
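A minimal sketch of the sharing described above, in case it helps to see it run. It uses plain string literals and object.ReferenceEquals to check whether two variables point at the same object; the class and variable names are made up for illustration.

```csharp
using System;

public static class InterningDemo
{
    public static void Main()
    {
        string first = "10";
        string second = "10";

        // Identical literals are interned, so both variables reference one object.
        Console.WriteLine(object.ReferenceEquals(first, second)); // True

        // Building the same text from a char[] at run time allocates a separate object.
        string built = new string(new[] { '1', '0' });
        Console.WriteLine(first == built);                       // True (same contents)
        Console.WriteLine(object.ReferenceEquals(first, built)); // False (different objects)

        // Run-time concatenation produces a brand new string; the original is untouched.
        string longer = first + "00";
        Console.WriteLine(longer); // 1000
        Console.WriteLine(second); // 10
    }
}
```

This is why a test that assigns the same hard-coded string 10,000 times mostly measures reference copies rather than string creation.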
Seeing this, I decided to create my own test. It performs 10,000 iterations of the same logic on a randomly generated string 10,000 characters in length.
My output was as follows:
Using CHARACTER ARRAY
------------------------------
Time for new String: 21ms
Time for concat: 2325ms
Time for stringbuilder: 62ms

Using IEnumerable<Char>
------------------------------
Time for new String: 915ms
Time for concat: 2631ms
Time for stringbuilder: 81ms

Using List<Char>
------------------------------
Time for new String: 47ms
Time for concat: 2660ms
Time for stringbuilder: 10ms
In all cases, concat absolutely sucks and should never see the light of day again ;)
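For anyone wanting to try something similar, here is a rough sketch of how the character array case could be timed with Stopwatch. It is not the test that produced the numbers above: the helper names, the fixed seed, the reduced iteration count, the use of String.Concat for "concat" and the pre-sized StringBuilder are all my assumptions, and the timings will differ from machine to machine.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class StringBuildTimings
{
    // Hypothetical helper: a random lowercase char[] of the requested length.
    public static char[] RandomChars(int length, int seed)
    {
        var random = new Random(seed);
        var chars = new char[length];
        for (int i = 0; i < length; i++)
        {
            chars[i] = (char)('a' + random.Next(26));
        }
        return chars;
    }

    // The constructor knows the length up front and copies the array in one go.
    public static string ViaConstructor(char[] chars) => new string(chars);

    // Assumption: "concat" means String.Concat over the sequence of characters.
    public static string ViaConcat(char[] chars) => string.Concat<char>(chars);

    // Appends one character at a time into a pre-sized builder.
    public static string ViaStringBuilder(char[] chars)
    {
        var builder = new StringBuilder(chars.Length);
        foreach (char c in chars)
        {
            builder.Append(c);
        }
        return builder.ToString();
    }

    // Runs the given build function repeatedly and returns elapsed milliseconds.
    public static long TimeMs(Func<char[], string> build, char[] chars, int iterations)
    {
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            build(chars);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        char[] chars = RandomChars(10000, 42);
        const int iterations = 1000; // smaller than the 10,000 used above, to keep the run short

        Console.WriteLine("Time for new String: " + TimeMs(ViaConstructor, chars, iterations) + "ms");
        Console.WriteLine("Time for concat: " + TimeMs(ViaConcat, chars, iterations) + "ms");
        Console.WriteLine("Time for stringbuilder: " + TimeMs(ViaStringBuilder, chars, iterations) + "ms");
    }
}
```

All three methods return the same text for the same input, so any difference in the printed times comes from how the string is assembled. For serious measurements a proper benchmarking harness with warm-up runs would be a better choice than a bare Stopwatch loop.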
In terms of the first test, I believe new String still has an unfair advantage. A character array is already laid out much like a string in memory, so the constructor knows the final length up front and can copy the characters across in one block rather than appending them one at a time. However, this is what we wanted to know :)
In the second test we can see more clearly that iterating an enumerator to create the string is a fairly slow process. StringBuilder easily wins out here, as it grows its internal buffer dynamically while appending. I suspect that new String does not and instead generates a new object for each character in the string, which has to be enumerated again. I believe this explains the poor performance.
Using a List, we can see that StringBuilder has the best performance by a long way, whilst new String comes back into contention. This is probably due to the single enumeration.
IMPORTANT NOTE: The IEnumerable interface has given us a lot of flexibility in C# and is absolutely brilliant for passing data around methods. But for this very reason it is also rather dangerous! LINQ queries that return an IEnumerable use something called deferred execution; that is, the values are not actually calculated until you use them.
    IEnumerable<Int32> myInts = myBigArrayOfInts.Where(i => i > 0); // Get all integers larger than 0
    // Some code is here
    // that doesn't even touch
    // the variable myInts
    Console.WriteLine(myInts.Count()); // Execution of line 1 happens here! This is the first time we use myInts.
    Console.WriteLine(myInts.First()); // Execution happens again!
What this also means is that each time you enumerate myInts, it will execute the query again! Personally, I prefer to think of IEnumerable as a method pointer, a query method pointer if you will, as it helps to conceptualise what the code is doing.
To overcome this issue, you have to put the results of the IEnumerable into a concrete collection...
    List<Int32> myInts = myBigArrayOfInts.Where(i => i > 0).ToList(); // Execution happens here, as we are converting it to a list
    Console.WriteLine(myInts.Count); // Works just like an array and doesn't need to re-execute the query
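If you want to see the re-execution for yourself, one way is to count how often the Where predicate runs. The following is a small sketch under my own assumptions (the sample numbers, the predicateCalls counter and the class name are invented for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    public static void Main()
    {
        int[] numbers = { -2, 5, 0, 7, -1, 3 };
        int predicateCalls = 0;

        // Defining the query does not run it; the lambda has not been called yet.
        IEnumerable<int> query = numbers.Where(i =>
        {
            predicateCalls++;
            return i > 0;
        });
        Console.WriteLine(predicateCalls); // 0

        // Count() walks the whole source, so the predicate runs once per element.
        Console.WriteLine(query.Count());  // 3
        Console.WriteLine(predicateCalls); // 6

        // First() runs the query again, stopping at the first match (-2, then 5).
        Console.WriteLine(query.First());  // 5
        Console.WriteLine(predicateCalls); // 8

        // ToList() runs it one final time and stores the results.
        List<int> list = query.ToList();
        Console.WriteLine(predicateCalls); // 14

        // Reading from the list never touches the predicate again.
        Console.WriteLine(list.Count);     // 3
        Console.WriteLine(list[0]);        // 5
        Console.WriteLine(predicateCalls); // 14
    }
}
```

The counter only moves when the query itself is enumerated, which is the whole argument for calling ToList() (or ToArray()) once and reusing the result.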
So to relate this back to the above code, every time new String uses the IEnumerable<Char>, it will likely re-execute the query that retrieves all characters in the String, pick out which character it is up to, create a new String, and then do it all again with the next character. List doesn't suffer from this, because the query has already been executed, in effect turning it into a big character array just like the first test (with some performance hit due to the way List lookup works).