Alright, I'm trying to learn some basic threading, and this is a problem I've been banging my head against for a while.
I'm doing some image manipulation using boost::numeric::ublas::matrix, which I believe is supposed to be thread safe, but as soon as more than one thread runs concurrently I get radically different results on every execution.
I believe my algorithms should be order independent (or close to it).

My first bit of calling code looks like:


	n = 0;
	for (int structNum = 0; structNum < 8; structNum++)
		for (int x = 0; x < numthreads; x++)
			tg.create_thread(boost::bind(&decompose::octagon,d,boost::ref(M),boost::ref(copy),x,structNum, boost::ref(n), boost::ref(bar)));

	if (n == 0)
		numLoops++;


And it's calling:

void decompose::octagon(matrix4D &M, matrix4D &copy, int threadNum, int structNum, int &n, boost::barrier &bar)

I have two copies of my matrices, an original and a copy. For each call to octagon, the original is read-only while the copy has its corresponding values modified. I pass the function its thread number (a global variable stores the number of threads), and the general loop over the matrix is:

	for (x = threadNum; x < (h); x += num)
		for (y = threadNum; y < (w); y += num)

Here threadNum is the current thread number and num is the total number of concurrent threads. As such, no two threads should ever operate on the same element of the matrix object (unless I'm wrong about matrix being thread safe).

I expected tg.join_all() to make every thread finish before program flow continued. I wasn't sure it was doing that, so I tried adding a barrier (bar) with a count of 3, though that doesn't seem to make a difference. Each thread calls bar.wait() upon finishing, and so does main after creating both threads. Also, join_all doesn't seem to destroy threads, so the thread_group tg ends up containing hundreds of threads by the end of program execution. I assume they're mostly just idle, but I suppose it could be a problem.

Am I missing something major with my threading design?

> boost::ref(copy),x,structNum, boost::ref(n), boost::ref(bar)
My first guess would be that you're passing references to data which subsequently change while each thread is running.

The threads are operating on different elements of the matrices, so they should be fine.
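One reference that every thread shares, however the matrix is partitioned, is the counter n. The post doesn't show whether octagon writes to it, but if it does (an assumption here), unsynchronized updates to a plain int from several threads are a data race. A minimal sketch of a safe shared counter, using std::thread and std::atomic rather than Boost so it stands alone; worker and count_changes are hypothetical names, not part of the original code:

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical stand-in for decompose::octagon: each worker adds to a shared
// counter once per element it "changes" (assumed behaviour, not from the post).
void worker(std::atomic<int>& changed, int iterations) {
    for (int i = 0; i < iterations; ++i)
        changed.fetch_add(1, std::memory_order_relaxed);
}

// Starts numThreads workers, waits for all of them, and returns the total.
int count_changes(int numThreads, int iterations) {
    std::atomic<int> changed{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < numThreads; ++t)
        threads.emplace_back(worker, std::ref(changed), iterations);
    for (auto& th : threads)
        th.join();  // the counter is only read after every worker has finished
    return changed.load();
}
```

With a plain int in place of std::atomic<int>, the total could come out differently from run to run.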

I actually found the problem: my algorithm increments both x and y by the number of threads, which causes it to skip elements. It probably causes problems for my other algorithms as well. That explains why it works in single-threaded code but not in multithreaded code.
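For anyone hitting the same thing, here is a minimal sketch of the difference, using std::thread and a flat std::vector<int> instead of Boost threads and a ublas matrix (both swapped in to keep it self-contained). The function names are made up for illustration, and striding only the outer loop is just one possible fix, not necessarily the one used in the real code:

```cpp
#include <functional>
#include <thread>
#include <vector>

// Pattern from the question: both loops start at threadNum and step by num.
void visit_both_strided(std::vector<int>& hits, int h, int w, int threadNum, int num) {
    for (int x = threadNum; x < h; x += num)
        for (int y = threadNum; y < w; y += num)
            ++hits[x * w + y];
}

// One possible fix: stride only over rows and walk every column of each row.
void visit_rows_strided(std::vector<int>& hits, int h, int w, int threadNum, int num) {
    for (int x = threadNum; x < h; x += num)
        for (int y = 0; y < w; ++y)
            ++hits[x * w + y];
}

using Visitor = void (*)(std::vector<int>&, int, int, int, int);

// Runs num threads over an h-by-w grid and returns how many elements
// were visited exactly once.
int covered(Visitor visit, int h, int w, int num) {
    std::vector<int> hits(h * w, 0);
    std::vector<std::thread> threads;
    for (int t = 0; t < num; ++t)
        threads.emplace_back(visit, std::ref(hits), h, w, t, num);
    for (auto& th : threads)
        th.join();
    int count = 0;
    for (int v : hits)
        count += (v == 1);
    return count;
}
```

On an 8x8 grid with two threads, covered(visit_both_strided, 8, 8, 2) returns 32 of the 64 elements, while covered(visit_rows_strided, 8, 8, 2) returns all 64. With a single thread, both versions visit everything, which is why the single-threaded run looked correct.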