I store matrix entries in std::vector<double>, such that the reading is row by row.
This means, for matrix
1 3 4 8 9 3
3 6 8 1 1 2
2 0 9 8 7 6

the std::vector<double> would have entries: {1,3,4,8,9,3,3,6,8,1,1,2,2,0,9,8,7,6}.
To transpose the matrix I use the naive approach of iterating through std::vector, calculating for each
entry its position in a matrix transpose. Recently, I got one suggestion:

"When you have a matrix and a mapping to a one-dimensional vector, transposing is equivalent to a permutation. Now execute this permutation by sorting the elements by requested rank in the permutation result."

I understand that transposing is equivalent to a permutation, but I dont know how to approach the problem this way.
Any suggestion on how to do this (or to do matrix transpose is a faster way) is welcome.

Well, think by representing the operation for small 2x2 and 3x3 matrices, you can see the pattern emerge:

//say vector<double> q is some 2x2 matrix then
qt = {q[0],q[2],q[1],q[3]};
//say vector<double> q is some 3x3 matrix then
qt = {q[0],q[3],q[6],q[1],q[4],q[7],q[2],q[5],q[8]};

For a (faster) limited-memory way, well there is a way to program this as a swapping chain, I think I've seen that somewhere, sometime ago. I think the basic idea is: you start with the first value that will have to move (in your example, the "3", at first row, second position from the left), figure out where it's supposed to go, hold that value as temporary, store the "3" at that place, and start again by figuring out where that new value should go (the one in temporary) and repeat until you come back to the original spot. I'm not sure this is exactly how it works, if it really visits the entire matrix and comes back to the start, but I think there is a way to do it along these lines.

EDIT: I just tested the above, and it works indeed!

Thanks for your messages. The main problem with the naive approach (calculating the proper positions for each entry) is large stride you would have to perform when operating on very large matrices. The aim is to fight that large stride. I thought of doing the following: iterate through the elements of the original matrix, one by one, and store the entries in a std::vector<std::vector> of size originalMatrix.colNo and originalMatrix.rowNo, respectively. However, I dont think I fight the strde problem with this approach. Any thought on this? Thanks

First, I want to say that the little method I posted earlier is only working is some cases like the one you posted and that I tested on. I don't think it works in all cases.

So, to avoid the stride problem, I don't see how that can possibly be done. There is an inherent mismatch in the indices from the nature of the transpose operation itself. If you read in order (incrementing index by one for each read), you have to write in disorder, and vice-versa. In the bigger scheme of things, sometimes people who are really concerned with efficiency and know they will use transpose a lot, will put a flag (enum) in the matrix class to tell whether it is row-major (as for your implementation) or column-major. Effectively, you can transpose the matrix by switching the flag value, but then all algorithms using the matrix will have to be implemented for all possible cases, which is a real pain in the ass, so I don't do that.

So, all in all, I think I will not be able to help more. But this wiki might be a good source to start from, if you didn't get there already.