I am trying to learn CUDA and want to know how to deal with nested loops and similar constructs. How would you code the following in CUDA?
float a[1024][1024], b[1024];
for (int i = 0; i < 1024; i++)
    for (int j = 0; j < 1024 - i; j++)  // triangular range: i + j <= 1023
        b[i + j] += arbitrary_function(a[i][j]);
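One common way to port this is to parallelize over the output index k = i + j instead of over the (i, j) pairs. Several pairs add into the same b[k], so one thread per pair would race on b. With one thread per b[k], the scatter becomes a gather: thread k sums a[i][k - i] for i = 0..k and is the only writer of b[k]. The sketch below is plain C so the index mapping can be checked on a CPU. `gather_one` is the body one CUDA thread would run, and the loop in `gather_version` stands in for the kernel launch. `arbitrary_function` is a hypothetical stand-in that doubles its argument. `N`, `serial_version`, `gather_one`, `gather_version` and `count_mismatches` are illustrative names, not part of any API.

```c
#include <assert.h>

#define N 1024

/* Hypothetical stand-in for the question's arbitrary_function. */
static float arbitrary_function(float x)
{
    return 2.0f * x;
}

static float a[N][N];

/* Original loop: scatters into b[i + j]. Several (i, j) pairs hit the
 * same element, so one thread per (i, j) pair would race on b. */
static void serial_version(float *b)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N - i; j++)
            b[i + j] += arbitrary_function(a[i][j]);
}

/* Body of one "thread" k: gather every term that lands in b[k].
 * Those are the pairs (i, k - i) for i = 0..k. Since k < N, j = k - i
 * is always < N - i, so this covers exactly the original triangle. */
static void gather_one(int k, float *b)
{
    assert(k >= 0 && k < N);
    float sum = b[k];
    for (int i = 0; i <= k; i++)
        sum += arbitrary_function(a[i][k - i]);
    b[k] = sum; /* thread k is the only writer of b[k] */
}

/* Stand-in for the kernel launch. On the GPU each iteration becomes a
 * thread, with k = blockIdx.x * blockDim.x + threadIdx.x and a bounds
 * check k < N before calling the body. */
static void gather_version(float *b)
{
    for (int k = 0; k < N; k++)
        gather_one(k, b);
}

/* CPU self-check: fills a with small integers (so every float sum is
 * exact), runs both versions and returns how many b entries differ. */
static int count_mismatches(void)
{
    static float b_serial[N], b_gather[N];
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            a[i][j] = (float)((3 * i + j) % 7);
        b_serial[i] = b_gather[i] = 1.0f;
    }
    serial_version(b_serial);
    gather_version(b_gather);

    int mismatches = 0;
    for (int k = 0; k < N; k++)
        if (b_serial[k] != b_gather[k])
            mismatches++;
    return mismatches;
}
```

Calling `count_mismatches()` should return 0. The gather adds terms for each b[k] in the same increasing-i order as the original loop, and the test inputs keep every sum exact. In a real kernel, `a` and `b` would live in device memory. At each step i, consecutive threads read consecutive elements of row `a[i]`, so the loads should coalesce. Thread k does k + 1 iterations, though, so the work is unbalanced across threads. An alternative is to launch one thread per (i, j) pair and use `atomicAdd(&b[i + j], ...)`, which CUDA supports for `float`. That keeps the original loop shape, but it adds contention on `b` and makes the summation order, and therefore the float rounding, nondeterministic.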