Hello everyone, I hope I have posted this in the correct forum. I have a good knowledge of VB.Net and VC++ and have now taken the brave step into assembly-level programming. I'm starting out with just a bit of inline assembly in a C++ program and toying around with MMX. What I'm struggling with is what MMX actually does with certain instructions. I know it's designed to operate on packed data values, but having read a wealth of documentation on it, I'm still not getting how the following instructions work. For example, let's say I have a BYTE pointer to a memory bitmap in 32 bpp format. I use movq to move two pixels (8 bytes of memory) into an MMX register. Now the problems begin: unpacking and manipulating the individual BGRA components. I think I need the following instruction, I just don't know how to use it properly. Layman's terms would be very much appreciated.

punpcklbw ???

bytePtr // Pointer to first byte of 32bpp memory bitmap;
pushad; // Push all 32 bit registers onto the stack

mov esi, DWORD PTR [bytePtr] // move the pointer into esi.

//Start MMX

movq mm0, esi; // Move quadword (two pixels) into mm0

/// Problems here. For now, just a simple example of how to add 1 to each BGRA value. I can use "paddusb" to add a value to the first Blue value, but I don't know how to add a value to the rest. I'm sure I have to unpack the data, but I need some help.

Have you read the MMX literature in the Intel Software Developer's Manual, volumes 1 and 2? It explains the instructions pretty well and even gives you tips on how to optimize them.

I'm a tad rusty, but your first instruction should have been a load from the address held in esi, not a move of esi itself.

You want the contents of the memory addressed by esi; you are not trying to copy a 32-bit register into a 64-bit MMX register. After that, all you need is a byte-oriented addition instruction.

movq mm0, [esi] ; Move quadword (two 32-bit pixels) into mm0
movdqu xmm0,[esi] ; Move octword (four 32-bit pixels) into xmm0,
; i.e. 16 8-bit pixel elements

But why waste time saving registers with pushad? This is not the original 16-bit processor: in 32-bit code any general-purpose register can be used as a base or index for addressing, so pick a scratch register such as ecx and no register save is required!

movq mm0, [ecx] ; Move quadword (two 32-bit pixels) into mm0
movdqu xmm0,[ecx] ; Move octword (four 32-bit pixels) into xmm0

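Note that I used the unaligned instruction movdqu (U). If you keep your bitmap memory 16-byte aligned, as you should, use the aligned form instead: your code will run faster because it never stalls handling misaligned memory. Be aware that the aligned form faults if the address is not 16-byte aligned.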

movdqa xmm0,[ecx] ; Move octword (four 32-bit pixels), aligned form: movdqa (A)
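If you would rather let the compiler pick the registers, the same aligned/unaligned choice can be sketched with SSE2 intrinsics in C++. This is only a rough illustration, not a drop-in routine: is_aligned16 and load_four_pixels are names I made up, and I'm assuming a compiler that ships emmintrin.h. _mm_load_si128 is the intrinsic counterpart of movdqa and _mm_loadu_si128 of movdqu, although the compiler is free to choose the actual encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: true if p sits on a 16-byte boundary.
bool is_aligned16(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Hypothetical helper: loads four 32-bit pixels (16 bytes) starting at p.
// Uses the aligned load when the address allows it, the unaligned one otherwise.
__m128i load_four_pixels(const std::uint8_t* p)
{
    assert(p != nullptr);
    const __m128i* v = reinterpret_cast<const __m128i*>(p);
    return is_aligned16(p) ? _mm_load_si128(v)    // counterpart of movdqa
                           : _mm_loadu_si128(v);  // counterpart of movdqu
}
```

In practice you would declare or allocate the bitmap so that it is always 16-byte aligned (for example with alignas(16) on a static buffer) and drop the runtime check.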

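Since it's 32-bit RGBA, you're dealing with 8 (or 16) packed bytes, depending on which register you're using (mm0 vs. xmm0). A packed-byte instruction operates on every one of those bytes at once, so no unpacking is needed just to add to each component.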
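As for what punpcklbw actually does: it interleaves the low bytes of its two operands. If the second operand is all zeros, each 8-bit component ends up widened to a 16-bit word, which is what you want when the arithmetic needs more than 8 bits of headroom (multiplies, blends and so on). Here is a minimal sketch using the SSE2 intrinsic _mm_unpacklo_epi8, which corresponds to punpcklbw on an xmm register. The helper name unpack_two_pixels is my own invention for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: widens the 8 bytes at src (two BGRA pixels)
// into eight 16-bit words, one word per colour component.
void unpack_two_pixels(const std::uint8_t* src, std::uint16_t out[8])
{
    assert(src != nullptr && out != nullptr);
    const __m128i zero = _mm_setzero_si128();

    // Load only the low 8 bytes (two pixels); the upper half of px is zero.
    __m128i px = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(src));

    // punpcklbw: interleave the bytes of px with zero bytes.
    // Byte layout becomes B0 00 G0 00 R0 00 A0 00 B1 00 G1 00 R1 00 A1 00,
    // which read as little-endian words is B0, G0, R0, A0, B1, G1, R1, A1.
    __m128i wide = _mm_unpacklo_epi8(px, zero);

    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), wide);
}
```

After working on the widened words you would narrow them back to bytes with a saturating pack (packuswb, or _mm_packus_epi16 as an intrinsic).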

Since there is no packed increment instruction, load a constant of sixteen 1 bytes from memory instead:
OneX16 OWORD 01010101010101010101010101010101h

paddusb xmm0,OneX16 ; vA = vA + 1: add 16 unsigned bytes (saturating at 255)

and of course the save!

movdqu [ecx],xmm0 ; Save octword (four 32-bit pixels)
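Putting the load, add and save together, the whole thing can be sketched in C++ with SSE2 intrinsics rather than inline assembly. Treat this as a rough illustration under a few assumptions: the function name add_one_to_each_channel is hypothetical, the buffer length is assumed to be a multiple of 16 bytes (four pixels), and _mm_set1_epi8(1) stands in for the OneX16 constant.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: adds 1 to every byte of a 32bpp buffer,
// saturating at 255. byteCount is assumed to be a multiple of 16.
void add_one_to_each_channel(std::uint8_t* bytes, std::size_t byteCount)
{
    assert(bytes != nullptr && byteCount % 16 == 0);

    const __m128i ones = _mm_set1_epi8(1);  // sixteen 0x01 bytes, like OneX16

    for (std::size_t i = 0; i < byteCount; i += 16) {
        __m128i* p = reinterpret_cast<__m128i*>(bytes + i);
        __m128i px = _mm_loadu_si128(p);    // counterpart of movdqu load
        px = _mm_adds_epu8(px, ones);       // counterpart of paddusb
        _mm_storeu_si128(p, px);            // counterpart of movdqu store
    }
}
```

Because the add saturates, a component that is already 255 stays at 255 instead of wrapping around to 0.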

Also, for memory efficiency, you should work in pairs, but that's a lesson for another time.