Its a typical implementation of mergesort algorithm. It runs in a stable time of O(n*log(n) ). Just thought, I would add it to the library. Its been tested, although not very throughly, so if any bugs are found, just post it here, so others can be aware of them.
Its not commented much, if any, because I thought it was self explainable. I realize that
it could have been made more flexible, but I just wasn't interested enough. Happy Coding.

#include <iostream>
#include <algorithm>
#include <vector>
#include <ctime>
#include <functional>
#include <string>
#include <list>

using namespace std;


template<typename Type, typename Comparator = std::less<Type>>
struct MergeSort{
public:	
	typename typedef std::vector<Type> ArrayType;
	typename typedef std::vector<Type>::const_iterator constItrType;
	typename typedef std::vector<Type>::difference_type DiffType;	
private:
	ArrayType values;
	Comparator compare;

public:
	MergeSort(){};
	MergeSort(const ArrayType val,const Comparator& cmp =  Comparator()) 
		: values(val), compare(cmp){}	
	MergeSort(const Type* begin, const Type* end, const Comparator& cmp = Comparator())
		: values(begin,end), compare(cmp) {}
	ArrayType doIt(){ return _sortIt(values.begin(), values.end()); }
private:
	//helper function	
	ArrayType _sortIt(constItrType begin, constItrType end)const{
		if(end - begin  < 2) return ArrayType(begin,end);
		ArrayType left, right;
		DiffType mid = (end-begin) / 2;
		left = _sortIt(begin, begin + mid );
		right = _sortIt(begin + mid , end );
		return _mergeSort(left,right);		
	}
	ArrayType _mergeSort(const ArrayType& leftArray, const ArrayType& rightArray)const{
		ArrayType lhs(leftArray);
		ArrayType rhs(rightArray);
		ArrayType result;
		while(!lhs.empty() && !rhs.empty()){
			Type elem = compareElem(lhs.front(),rhs.front());
			if(elem == lhs.front() ) lhs.erase(lhs.begin());
			else rhs.erase( rhs.begin() );
			result.push_back( elem );
		}
		std::copy(lhs.begin(),lhs.end(), std::back_insert_iterator<ArrayType>(result) );
		std::copy(rhs.begin(),rhs.end(), std::back_insert_iterator<ArrayType>(result) );
		return result;
	}
	Type compareElem(const Type& lhs, const Type& rhs)const{
		return compare(lhs,rhs) ? lhs : rhs;
	}	
};

template<typename Input>
void println(const Input& in){		
	cout << in << endl;
}

int main(){	
	srand( static_cast<unsigned>(time(0)) );

	const int Size = 7;
	int values[Size] =  {0};

	std::generate( values, values + Size, rand );				
	MergeSort<int > msort(values,values+Size); 	
	vector<int> sorted = msort.doIt();
	cout <<"Using std::less as comparison :\n";
	std::for_each( sorted.begin(), sorted.end(),println<int>);
	
	MergeSort<int, std::greater<int> > greaterSort(values,values +Size);
	vector<int> greaterSorted = greaterSort.doIt();
	cout << "\n\nUsing std::greater as comparison : \n";
	std::for_each( greaterSorted.begin(), greaterSorted.end(), println<int> );

	return 0;
}

Pretty nice! But I have a few things to point out:

First, gcc complains with this code because of those "typename typedef .." statements at the beginning of the MergeSort class. These don't make sense as you posted them, and I don't understand why your compiler would not complain. As you put it, it reads "I state a typename from defining the type std::vector<Type> as ArrayType" when it should read "I define a type from the typename std::vector<Type> called ArrayType". This corresponds to this code (just inverting the keywords):

typedef typename std::vector<Type> ArrayType;
	typedef typename std::vector<Type>::const_iterator constItrType;
	typedef typename std::vector<Type>::difference_type DiffType;

Then, well there is a bit of useless copying of vectors in your implementation. I know, a good compiler can do a lot of the optimization for you, but in this case it might be worth eliminating temporaries a bit more. After all, the purpose of MergeSort is to be efficient.

Finally, in _mergeSort, erasing elements from the vector "lhs" and "rhs" is useless and might very well destroy your performance by adding a O(N) complex calls in the middle of your merge loops, I haven't worked out the math or a test-benchmark, but I doubt it remains O(NlogN). Furthermore, allocating capacity for the result vector with a simple reserve() call will also make your push_back calls guaranteed to be O(1).

So, my mods to the two _mergeSort and _sortIt methods would look like this:

ArrayType _sortIt(constItrType begin, constItrType end)const{
		if(end - begin  < 2) 
                        return ArrayType(begin,end);
		DiffType mid = (end-begin) / 2;
		return _mergeSort(_sortIt(begin, begin + mid ),_sortIt(begin + mid , end ));		
	}
	ArrayType _mergeSort(const ArrayType& lhs, const ArrayType& rhs)const{
		ArrayType result;
                result.reserve(lhs.size() + rhs.size());
                constItrType lhs_pos = lhs.begin();
                constItrType rhs_pos = rhs.begin();
		while((lhs_pos != lhs.end()) && (rhs_pos != rhs.end())){
			if(compare(*lhs_pos,*rhs_pos))
                                result.push_back( *(lhs_pos++) );
                        else
                                result.push_back( *(rhs_pos++) );
		}
		std::copy(lhs_pos,lhs.end(), std::back_insert_iterator<ArrayType>(result) );
		std::copy(rhs_pos,rhs.end(), std::back_insert_iterator<ArrayType>(result) );
		return result;
	}

Finally, in _mergeSort, erasing elements from the vector "lhs" and "rhs" is useless and might very well destroy your performance by adding a O(N) complex calls in the middle of your merge loops, I haven't worked out the math or a test-benchmark, but I doubt it remains O(NlogN).

It makes the running time ~(N^2). The cost at each point in the recursion tree is ~N^2 and so the sum of the levels will be N^2 + 2(N/2)^2 + 4(N/4)^2 + ..., which equals N^2 + N^2/2 + N^2/4 + ... which does not exceed 2N^2. (Normally the cost at each level is ~N which gives us N + 2(N/2) + 4(N/4) + ..., which gives us N + N + N + ..., which is N*log2(N) since there are ~log2(N) levels.)

Furthermore, allocating capacity for the result vector with a simple reserve() call will also make your push_back calls guaranteed to be O(1).

I'll note here that this will not affect the time complexity since the amortization applies, but it will improve performance.

The article starter has earned a lot of community kudos, and such articles offer a bounty for quality replies.