I am trying to do matrix multiplication on a 2D array using pitched memory.
I am able to load the 2D array onto the GPU using the cudaMallocPitch() and cudaMemcpy2D() functions, but I am not able to write the multiplication code.
The output I am getting is wrong.
Can anyone help me out with the code?

Here's the code I have written:

//---code for matrix multiplication using pitch---

float Pvalue = 0;
xid = blockIdx.x * blockDim.x + threadIdx.x;
yid = blockIdx.y * blockDim.y + threadIdx.y;

for (int k = 0; k < N; ++k) {    // D = T*M
    float Melement = T[yid*pitch+k];
    float Nelement = M[k+xid*pitch];
    Pvalue += Melement * Nelement;
}
D[yid*pitch+xid] = Pvalue;


I am waiting for help.
Thanks in advance.

I'm not embarrassed to admit that I haven't a clue what you're talking about. However, I do have Super Secret Investigative Powers, so I was able to find out:

CUDA (an acronym for Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA. CUDA is the computing engine in NVIDIA graphics processing units or GPUs that is accessible to software developers through industry standard programming languages. Programmers use 'C for CUDA' (C with NVIDIA extensions), compiled through a PathScale Open64 C compiler, to code algorithms for execution on the GPU. CUDA architecture shares a range of computational interfaces with two competitors: the Khronos Group's Open Computing Language and Microsoft's DirectCompute. Third-party wrappers are also available for Python, Fortran, Java and MATLAB.

So I don't think you're going to get much help here. This is far too specific a niche application. We generally don't deal with proprietary extensions; we can't, really. Questions about proprietary libraries are far better handled by people with expert knowledge of those specific libraries.

Good luck.
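
For anyone who lands on this thread with the same problem, two things in the posted kernel look like plausible causes of the wrong output. First, the pitch returned by cudaMallocPitch() is in bytes, not in elements, so indexing a float pointer as T[yid*pitch+k] steps too far unless pitch was divided by sizeof(float) beforehand. Second, M[k+xid*pitch] reads row xid, column k of M, which gives T times the transpose of M rather than D = T*M. Below is a minimal sketch of one way to write it, not a definitive implementation. It assumes square N x N row-major matrices and a separate pitch for each allocation. The names rowPtr, matMulPitched and matMulPitchedHost, and the 16x16 block size, are made up for illustration and are not from the original post.

```cuda
#include <cuda_runtime.h>

// Pointer to the start of row `row` in a pitched allocation.
// `pitch` is in BYTES, so the offset is applied through a char pointer.
__device__ inline const float* rowPtr(const float* base, size_t pitch, int row) {
    return reinterpret_cast<const float*>(
        reinterpret_cast<const char*>(base) + row * pitch);
}

// D = T * M for N x N row-major matrices (assumed square for this sketch).
__global__ void matMulPitched(const float* T, const float* M, float* D,
                              size_t pitchT, size_t pitchM, size_t pitchD, int N) {
    int xid = blockIdx.x * blockDim.x + threadIdx.x;  // column of D
    int yid = blockIdx.y * blockDim.y + threadIdx.y;  // row of D
    if (xid >= N || yid >= N) return;                 // threads outside the matrix

    const float* tRow = rowPtr(T, pitchT, yid);
    float Pvalue = 0.0f;
    for (int k = 0; k < N; ++k) {
        // T[yid][k] * M[k][xid]
        Pvalue += tRow[k] * rowPtr(M, pitchM, k)[xid];
    }
    // Write the result once, after the loop.
    float* dRow = reinterpret_cast<float*>(
        reinterpret_cast<char*>(D) + yid * pitchD);
    dRow[xid] = Pvalue;
}

// Hypothetical host helper: hT, hM, hD are densely packed N x N host arrays.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t matMulPitchedHost(const float* hT, const float* hM, float* hD, int N) {
    float *dT = nullptr, *dM = nullptr, *dD = nullptr;
    size_t pT = 0, pM = 0, pD = 0;
    const size_t rowBytes = N * sizeof(float);  // width of one row in bytes

    cudaError_t err = cudaMallocPitch((void**)&dT, &pT, rowBytes, N);
    if (err == cudaSuccess) err = cudaMallocPitch((void**)&dM, &pM, rowBytes, N);
    if (err == cudaSuccess) err = cudaMallocPitch((void**)&dD, &pD, rowBytes, N);

    // Host rows are tightly packed, so the host pitch equals rowBytes.
    if (err == cudaSuccess)
        err = cudaMemcpy2D(dT, pT, hT, rowBytes, rowBytes, N, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy2D(dM, pM, hM, rowBytes, rowBytes, N, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        dim3 block(16, 16);  // arbitrary choice for this sketch
        dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
        matMulPitched<<<grid, block>>>(dT, dM, dD, pT, pM, pD, N);
        err = cudaGetLastError();  // catches launch configuration errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpy2D(hD, rowBytes, dD, pD, rowBytes, N, cudaMemcpyDeviceToHost);

    cudaFree(dT);
    cudaFree(dM);
    cudaFree(dD);
    return err;
}
```

To try it, call matMulPitchedHost(T, M, D, N) from main() with three host arrays of N*N floats and check that it returns cudaSuccess. If you would rather keep the original T[yid*pitch+k] style of indexing, an alternative is to pass pitch / sizeof(float) to the kernel as the row stride in elements.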