Prove the following recurrence by induction
T(n) = 2T(n/2) + n lg n = Θ ( n lg^2 n ), T(1) = 1

This to me looks scary and I was wondering if anybody had a heads up on how to solve for this

One way is to use the master theorem, I think.

I prefer to think about these in terms of "layers." At the top layer of the problem, we spend an n*log(n) cost in order to move down one layer. At the layer underneath, we have two sub-problems and for each of them, we pay an (n/2)*log(n/2) cost to move down one more layer. And so on until the bottom.

That means our total cost at each layer is going to be

1 * n * lg(n)
2 * (n/2) * lg(n/2)
4 * (n/4) * lg(n/4)
8 * (n/8) * lg(n/8)
n * 1

We want to add up these costs, but first we can simplify.

n * lg(n)
n * lg(n/2)
n * lg(n/4)
n * lg(n/8)
n * 1

There are two questions to ask. One is: how many layers are there? The answer is lg(n) layers, rounded to an integer -- it's a typical divide 'n conquer situation. The other is: what's the total cost?

Well, we can add it up:

n * lg(n) + n * lg(n/2) + n * lg(n/4) + ... + n * lg(n/2^lg(n))
n * [lg(n) + lg(n/2) + lg(n/4) + ... + lg(n/2^lg(n))]
n * [lg(n) + lg(n)-1 + lg(n)-2 + ... + 0]

So we can see it's n multiplied by a sum of integers from 0 up to lg(n).

Well the sum of integers from zero up to any value X is going to be X(X+1)/2 or in general O(X^2). So we have our sum being in the class of functions


I think thinking in terms of "layers" is easiest for the problem where
T(n) = 2T(n/2) + n. There we have a linear time cost each layer (it merely gets split into two parts at each layer and gets added back together). And we have lg(n) layers. So the cost is thus O(n * log(n)). That's a very common case, one that we see with things like merge sort. Finding ourselves with n * log(n) cost per layer is what makes our problem tricky -- at first we might think "This is just slightly different, it adds log(n) cost each layer, so the answer gets multiplied by log(n)." That's almost correct. Instead, as we've shown, the extra cost of that lg(n) factor is a lg(n) cost at the top layer, a lg(n)/2 cost in the middle, and a 0 cost at the bottom layer -- it linearly descends. So the average "extra" cost per layer is O(log(n)/2). Which is still O(log(n)), just half as large. Which we don't care about. So we get the intuitive answer, in that the cost is a little bit more than n * log(n). We had to look closely to be sure.