Can anyone please tell me how I can unroll the loops in this function?

```c
void do_block (int lda, int M, int N, int K, double* A, double* B, double* C)
{
  /* For each row i of A */
  for (int i = 0; i < M; ++i)
    /* For each column j of B */
    for (int j = 0; j < N; ++j)
    {
      /* Compute C(i,j) */
      double cij = C[i+j*lda];
      for (int k = 0; k < K; ++k)
        cij += A[i+k*lda] * B[k+j*lda];
      C[i+j*lda] = cij;
    }
}
```

## All 5 Replies

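Do not delete lines 10 and 11. At first glance the condition on line 10 looks like `k < k`, which would mean line 11 never executes, but it is actually `k < K`: the loop index `k` is compared against the parameter `K`. C is case-sensitive, so the loop runs `K` times.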

Here is a good article about loop unrolling.

One of the optimizations you could make is to calculate `i+j*lda` only once within a loop, save the result in another variable, then use that variable everywhere else `i+j*lda` appears in the loop.
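As a rough sketch of that idea, applied to the `do_block` function from the question: the column offset `j*lda` is computed once per `(i, j)` pair and reused for the load from `C`, the accesses to `B`, and the store back to `C`. The function name `do_block_hoisted` and the local `col` are made up for illustration, and an optimizing compiler may well do this on its own.

```c
/* Hypothetical variant of do_block (name and local 'col' are illustrative).
   Column-major storage with leading dimension lda, as in the question. */
void do_block_hoisted(int lda, int M, int N, int K,
                      const double *A, const double *B, double *C)
{
    /* For each row i of A */
    for (int i = 0; i < M; ++i) {
        /* For each column j of B */
        for (int j = 0; j < N; ++j) {
            const int col = j * lda;      /* computed once, reused below */
            double cij = C[i + col];
            for (int k = 0; k < K; ++k)
                cij += A[i + k * lda] * B[k + col];
            C[i + col] = cij;
        }
    }
}
```

The results are the same as with the original function; only the repeated index arithmetic is gone.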

Have you decided what unroll factor you are going to use? For example, are you going to unroll 5 iterations at a time? 7 iterations at a time?

I posted a code snippet several months ago that does matrix multiplication:

http://www.daniweb.com/software-development/cpp/code/456187/matrix-multiplication-c-program

The block that does the actual multiplication is contained in lines 40 - 48.

Say you wanted to unroll the inner loop 5 at a time.
You don't know beforehand whether the number of iterations is evenly divisible by 5, so you have to check and deal with the remainder.

For example,

```c
int remainder = nCols % 5;
```

Then,

```c
/* Handle the nCols % 5 leftover iterations first */
dummy = 0.0;
for (j = 0; j < remainder; j++) {
    dummy += A_Matrix[k][j] * B_Matrix[j][i];
}
C_Matrix[k][i] = dummy;  /* partial sum so far */
```

So now you have taken care of the iterations that would be "left over" when the loop is unrolled 5 at a time.

Now you can do the unrolling:

```c
/* dummy keeps the sum of the first `remainder` terms; do not reset it here */
for (j = remainder; j < nCols; j += 5) {
    dummy += A_Matrix[k][j]   * B_Matrix[j][i];
    dummy += A_Matrix[k][j+1] * B_Matrix[j+1][i];
    dummy += A_Matrix[k][j+2] * B_Matrix[j+2][i];
    dummy += A_Matrix[k][j+3] * B_Matrix[j+3][i];
    dummy += A_Matrix[k][j+4] * B_Matrix[j+4][i];
} // End for j
C_Matrix[k][i] = dummy;
```

This code may not be completely accurate; I am writing off the top of my head. But I hope you get the idea. You are explicitly writing out the loop body 5 times per pass, incrementing the counter by five, and eliminating the test in the for-loop for many iterations (instead of testing the condition on every iteration, you are only doing it once every 5 iterations).

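Actually, on line 2 of the unrolled loop in my post just above, the loop index should start at `remainder`, since the first few entries were done in the step just before; i.e., line 2 should be `for (j = remainder; j < nCols; j += 5) {`. For the same reason, `dummy` must not be reset to 0.0 before the unrolled loop, or the terms from the remainder step are lost.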
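Putting the pieces together for the `do_block` function from the question, a minimal sketch might look like the following. It assumes an unroll factor of 5 applied to the `k` loop; the name `do_block_unrolled` is made up for illustration, and whether this is actually faster than the plain loop depends on the compiler and the machine.

```c
/* Hypothetical unrolled variant of do_block (name is illustrative).
   The k loop is unrolled by 5; the K % 5 leftover iterations run first. */
void do_block_unrolled(int lda, int M, int N, int K,
                       const double *A, const double *B, double *C)
{
    const int remainder = K % 5;

    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            double cij = C[i + j * lda];
            int k;

            /* Leftover iterations: k = 0 .. remainder-1 */
            for (k = 0; k < remainder; ++k)
                cij += A[i + k * lda] * B[k + j * lda];

            /* Unrolled part: continues from k = remainder, cij is not reset.
               K - remainder is a multiple of 5, so k+4 never reaches K. */
            for (; k < K; k += 5) {
                cij += A[i + k * lda]       * B[k     + j * lda];
                cij += A[i + (k + 1) * lda] * B[k + 1 + j * lda];
                cij += A[i + (k + 2) * lda] * B[k + 2 + j * lda];
                cij += A[i + (k + 3) * lda] * B[k + 3 + j * lda];
                cij += A[i + (k + 4) * lda] * B[k + 4 + j * lda];
            }
            C[i + j * lda] = cij;
        }
    }
}
```

The terms are added in the same order as in the original loop, so the result should match the non-unrolled version exactly.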

Thanks a lot. Can you tell me how I can use GCC flags like `-funroll-loops` to unroll the loop?

I just can't figure out the syntax.

```
gcc -O2                  -funroll-loops  -dgemm-blocked.c

optimization level       flag name       file name
```

What am I missing?
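One likely culprit in the command above is the hyphen in front of the file name: with a leading `-`, GCC will most likely try to read `-dgemm-blocked.c` as an option rather than as an input file. The comment at the top of the sketch below shows the command without that hyphen; the `-c` is an assumption that the file has no `main` and is linked later. The function `dot_unroll_hint` is a made-up example, and `#pragma GCC unroll` is a GCC-specific per-loop alternative to the flag (GCC 8 or later) that other compilers may ignore or warn about.

```c
/* Possible build line (file name taken from the post above):
 *
 *   gcc -O2 -funroll-loops -c dgemm-blocked.c
 *
 * -funroll-loops lets GCC unroll loops whose trip count is known at compile
 * time or on entry to the loop; it leaves the source unchanged.
 */

/* Hypothetical example: ask GCC to unroll this one loop 4 times. */
double dot_unroll_hint(const double *x, const double *y, int n)
{
    double sum = 0.0;
    /* The pragma must come immediately before the loop it applies to. */
#pragma GCC unroll 4
    for (int k = 0; k < n; ++k)
        sum += x[k] * y[k];
    return sum;
}
```

Either way the result of the computation is the same; only the generated machine code differs, so it is worth timing both builds before keeping the flag.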