I currently have the following code for a sprite struct (c-style for use in DLL):

typedef struct LABglSpriteTAG
{
    unsigned int glTexture;
    int w;
    int h;
    LABglColour *data;
    bool compiled;
}*LABglSpriteHANDLE;

And the following function:

void LABglSpriteConvolute(LABglSpriteHANDLE spr, int w, int h, int *kernel)
{
    size_t dim=spr->w*spr->h;
    LABglColour *out=new LABglColour[dim];
    for (size_t i=0; i<dim; ++i)
        out[i]=0;
    int centerx=w/2;
    int centery=h/2;
    for (int x=0; x<spr->w; ++x)
    {
        for (int y=0; y<spr->h; ++i)
        {
            for (int kx=0; kx<w; ++kx)
            {
                int xx=w-1-kx;
                for (int ky=0; ky<h; ++ky)
                {
                    int yy=h-1-ky;
                    int rx=x+(kx-centerx);
                    int ry=y+(ky-centery);
                    if (rx>=0&&rx<spr->w&&ry>=0&&ry<spr->h)
                        out[y*spr->w+x]+=spr->data[ry*spr->w+rx]*(1.0/kernel[ky*w+kx]);
                }
            }
        }
    }
    delete[]spr->data;
    spr->data=out;
    #warning SLOW 2012-03-27-12.34: Unknown solution
}

The problem is that this will run in O(nk) time (n being array size, k being kernel size) and takes up a lot of space. Is there any way to optimize this function?

For the actual theory, this SO thread is a good starting point. Convolutions are never actually calculated like it appears in theoretical equations. As the SO thread says, you either perform the multiplication in Fourier space (instead of a convolution in the original space), or you separate the filter in each dimension and traverse each linearly.

As for further improving performance, you should also start to grasp the concept of "locality of reference". It's often more important than the theoretical complexity of an algorithm.

This article has been dead for over six months. Start a new discussion instead.