I am trying to learn CUDA and want to know how to deal with nested loops. How might you code the following in CUDA?

float a[1024][1024], b[1024];
int i, j;
for (i = 0; i < 1024; i++)
	for (j = 0; j < 1024 - i; j++)
		b[i + j] += arbitrary_function(a[i][j]);

That's basically it; the loop body is still ordinary C++ code. CUDA itself is an extension of C and C++ that gives you access to the GPU, and bindings or wrappers exist for other languages such as Python, C#, and Java, so you can usually keep working in the language you already use.
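As a rough sketch of how the loop nest in the question could map onto the GPU: each (i, j) pair gets its own thread in a 2D grid, and the loop bounds turn into an `if` guard. Because several pairs share the same `i + j`, the update to `b` is done with `atomicAdd`. The body of `arbitrary_function`, the names `accumulate_kernel` and `accumulate_on_gpu`, the flat row-major layout of `a`, and the 16x16 block size are all assumptions made for illustration, not something given in the question.

```cuda
#include <cuda_runtime.h>

constexpr int N = 1024;

// Placeholder for the question's arbitrary_function (hypothetical body).
__host__ __device__ inline float arbitrary_function(float x)
{
    return 2.0f * x;
}

// One thread per (i, j) pair: the two loop counters become thread indices.
__global__ void accumulate_kernel(const float *a, float *b)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;  // inner loop counter
    int i = blockIdx.y * blockDim.y + threadIdx.y;  // outer loop counter

    // Same bounds as the original loops: 0 <= i < N and 0 <= j < N - i.
    if (i < N && j < N - i) {
        // Many (i, j) pairs write to the same b[i + j], so the add must be atomic.
        atomicAdd(&b[i + j], arbitrary_function(a[i * N + j]));
    }
}

// Host wrapper (hypothetical helper): a holds N*N floats in row-major order,
// b holds N floats. Copies both to the GPU, runs the kernel, copies b back.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t accumulate_on_gpu(const float *a, float *b)
{
    float *d_a = nullptr;
    float *d_b = nullptr;
    const size_t a_bytes = sizeof(float) * N * N;
    const size_t b_bytes = sizeof(float) * N;

    cudaError_t err = cudaMalloc(&d_a, a_bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_b, b_bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_a, a, a_bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(d_b, b, b_bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        dim3 block(16, 16);  // assumed block size; tune for your GPU
        dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
        accumulate_kernel<<<grid, block>>>(d_a, d_b);
        err = cudaGetLastError();  // catches launch configuration errors
    }

    // This blocking copy also waits for the kernel to finish.
    if (err == cudaSuccess) err = cudaMemcpy(b, d_b, b_bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    return err;
}
```

You would call `accumulate_on_gpu(&a[0][0], b)` from ordinary host code and compile the file with `nvcc`. Note that with floating-point data the atomic adds can happen in any order, so results may differ slightly from the serial loop. If that matters, one alternative is to give each thread a single output index `k` and have it sum `arbitrary_function(a[i][k - i])` for `i` from 0 to `k`, which needs no atomics.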