We're not going to do your homework for you. Show you made an effort, and then you may get some help.
they'll throw extra hardware at it
...and when that hardware doesn't exist?
What they want you to do is take the CPU's pipeline into consideration when writing the code. On the newer x86 CPUs (486 onward) you can get significant speed boosts by ordering instructions so that stages of the pipeline don't stall waiting for results from other stages.
ex.
xor cx,cx
mov something, cx
mov di, whatever
mov ax,[di]
The load of DI and its use in the next instruction are in close proximity and can cause a stall. The code will work, it'll just be somewhat slower than something like this:
mov di, whatever
xor cx,cx
mov something, cx
mov ax,[di]
Here the CX ops were something that had to happen anyway, and interposing them between the DI load and its use doesn't have any effect on the semantics of the code.
Another thing is flag setting and usage. Newer CPUs have branch prediction logic that will try to prefetch cache lines of the most likely path to be taken. If you give the pipeline some warning about the conditions of a conditional branch, that branch can happen faster.
ex.
mov cx,1234
mov dx,4567
add ax,something
jnz someplace
The setting of ZF happening right before the jump doesn't give the branch predictor much help. If you can interpose some other instructions between the setting of ZF and its use, the branch predictor can do a better job. Naturally, those instructions had better not be …
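A reordered version of that snippet might look something like the following. It's only a sketch: it reuses the same placeholder names (something, someplace) and relies on the fact that MOV doesn't modify the flags, so ZF from the ADD is still intact when the JNZ finally tests it.

```asm
add ax,something    ; sets ZF first
mov cx,1234         ; MOV leaves the flags alone
mov dx,4567         ; ZF from the ADD is still valid here
jnz someplace       ; condition was decided two instructions ago
```

Both MOVs still run before the jump either way, so the result is the same as the original ordering.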
Since it requires a 100% rewrite of assembly language programs to port from one platform (e.g. 80x88) to another (e.g. AIX Unix), assembly language, by definition, is unportable.
You can always run the code in an emulator like BOCHS, Hercules, QEMU, etc...
A modern x86 CPU can probably emulate a 15-year-old mainframe faster than that mainframe ran originally.
That being said, ASM is still important today for anyone who wants code to actually perform well. When you've exhausted all HLL algorithmic optimizations, a bit of hand-tuned ASM in the right place can still make a program dramatically faster. Compilers are good, but even the best of them can be beaten by a pro. This is particularly true on the quirky non-orthogonal x86 CPUs, and even more particularly true if the goal is to squeeze size rather than speed so something fits in an EPROM or ROM.
ex. What's the smallest way to clear 10 bytes of memory to zero on something 486DX or newer?
fldz
fbstp tenbytes
I've never seen a compiler clever enough to figure that one out ;->
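For a size comparison, here's roughly what the conventional version looks like. This is a sketch with a few assumptions: 16-bit code, tenbytes addressed with a plain 16-bit offset, ES already pointing at its segment, and the direction flag clear. The byte counts in the comments are for those encodings.

```asm
; conventional clear: 10 bytes of code
xor ax,ax               ; 2 bytes: AX = 0
mov di,offset tenbytes  ; 3 bytes: ES:DI -> buffer
mov cx,5                ; 3 bytes: 5 words = 10 bytes
rep stosw               ; 2 bytes: store AX at ES:[DI], CX times

; FPU version: 6 bytes of code
fldz                    ; 2 bytes: push +0.0 on the FPU stack
fbstp tenbytes          ; 4 bytes: store it as 10-byte packed BCD and pop
```

It works because +0.0 in packed BCD is 18 zero digits plus a zero sign byte, so all ten bytes come out as 00. It does need a free slot on the FPU stack, and FBSTP is a slow instruction, so this is strictly a size trick and not a speed trick.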