Hi everyone, I'm trying to parallelise a Gram-Schmidt algorithm. Here's the serial code:

``````for( k = 0; k < m; k++ ) {
    for( i = 0; i < k; i++ ) {

        s = 0.0;
        for( j = 0; j < n; j++ ) {
            s += q[ i ][ j ] * q[ k ][ j ];
        }

        for( j = 0; j < n; j++ ) {
            q[ k ][ j ] -= s * q[ i ][ j ];
        }
    }

    s = 0.0;
    for( j = 0; j < n; j++ ) {
        s += pow( q[ k ][ j ], 2 );
    }

    for( j = 0; j < n; j++ ) {
        q[ k ][ j ] /= sqrt( s );
    }
}
``````

I have tried a bunch of different ways to parallelise it and found that neither the k-loop nor the i-loop can be parallelised, because each iteration depends on the results of previous iterations. Only some of the inner loops (i.e. the j-loops) can be parallelised, by doing something like this:

``````for( k = 0; k < m; k++ ) {
    for( i = 0; i < k; i++ ) {

        s = 0.0;
        #pragma omp parallel for default(none) shared(i, k, q, m, n) private(j) reduction(+:s) schedule(dynamic)
        for( j = 0; j < n; j++ ) {
            s += q[ i ][ j ] * q[ k ][ j ];
        }

        /* #pragma omp parallel for */
        for( j = 0; j < n; j++ ) {
            q[ k ][ j ] -= s * q[ i ][ j ];
        }
    }

    s = 0.0;
    #pragma omp parallel for default(none) shared(i, k, q, m, n) private(j) reduction(+:s) schedule(dynamic)
    for( j = 0; j < n; j++ ) {
        s += pow( q[ k ][ j ], 2 );
    }

    /* #pragma omp parallel for */
    for( j = 0; j < n; j++ ) {
        q[ k ][ j ] /= sqrt( s );
    }
}
``````

This, however, creates a large overhead, because a parallel region is forked and joined over and over as the algorithm runs, making it a lot slower than the serial version even for very large 'm' and 'n'.

I can't find any way to efficiently parallelise the piece of code but I have seen parallelised MGS algorithms in research papers. What am I missing here?

Thanks
