winky

Pipelining and Stalling

Oct 21st, 2007
I am currently studying the impact of microarchitectural techniques. I have been looking at code and how to stall it correctly, as well as how to make it more efficient. I have been doing this through several different methods and then measuring the cycles per iteration.

I was wondering if you could look at what I did below and let me know whether I am stalling and reordering correctly. Any suggestions or feedback would be much appreciated.

Here is the code I am working with, along with the latencies beyond a single cycle (so an instruction listed as +N actually takes N+1 cycles). Also note that the branch is always taken and that there is a one-cycle branch delay slot.

Sorry for the indenting... I wanted to separate everything, but it somehow got messed up in Word.

Latencies beyond single cycle:
Memory LD           +3
Memory SD           +1
Integer ADD, SUB    +0
Branches            +1
ADDD                +2
MULTD               +4
DIVD                +10
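
As a quick illustration of the +N convention, here is a small Python sketch of the table. The names `EXTRA_LATENCY` and `total_cycles` are made up for this example, and filing ADDI under "Integer ADD" and BNZ under "Branches" is an assumption about how the table maps onto the opcodes in the loop.

```python
# Extra latency beyond one cycle, copied from the table above.
# Assumption: ADDI counts as an integer ADD and BNZ as a branch.
EXTRA_LATENCY = {
    "LD": 3,
    "SD": 1,
    "ADDI": 0,
    "SUB": 0,
    "BNZ": 1,
    "ADDD": 2,
    "MULTD": 4,
    "DIVD": 10,
}


def total_cycles(opcode):
    """Total cycles for one instruction: one base cycle plus the extra latency."""
    return 1 + EXTRA_LATENCY[opcode]


if __name__ == "__main__":
    for op, extra in EXTRA_LATENCY.items():
        print(f"{op:6s} +{extra:<3d} -> {total_cycles(op)} cycles")
```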

Loop:   LD      F2, 0(Rx)
I0:     MULTD   F2, F0, F2
I1:     DIVD    F8, F2, F0
I2:     LD      F4, 0(Ry)
I3:     ADDD    F4, F0, F4
I4:     ADDD    F10, F8, F2
I5:     SD      F4, 0(Ry)
I6:     ADDI    Rx, Rx, #8
I7:     ADDI    Ry, Ry, #8
I8:     SUB     R20, R4, Rx
I9:     BNZ     R20, Loop
Branch Delay Slot
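
Since everything below hinges on true (read-after-write) dependences, a rough Python sketch for listing them may help. The `PROGRAM` table is a hand transcription of the loop above, and `true_dependences` is a hypothetical helper name. It only looks inside one iteration, so it ignores values carried around the loop in Rx and Ry.

```python
# Each entry: (label, opcode, destination register or None, source registers).
# Transcribed by hand from the loop listing; SD and BNZ write no register.
PROGRAM = [
    ("Loop", "LD",    "F2",  ["Rx"]),
    ("I0",   "MULTD", "F2",  ["F0", "F2"]),
    ("I1",   "DIVD",  "F8",  ["F2", "F0"]),
    ("I2",   "LD",    "F4",  ["Ry"]),
    ("I3",   "ADDD",  "F4",  ["F0", "F4"]),
    ("I4",   "ADDD",  "F10", ["F8", "F2"]),
    ("I5",   "SD",    None,  ["F4", "Ry"]),
    ("I6",   "ADDI",  "Rx",  ["Rx"]),
    ("I7",   "ADDI",  "Ry",  ["Ry"]),
    ("I8",   "SUB",   "R20", ["R4", "Rx"]),
    ("I9",   "BNZ",   None,  ["R20"]),
]


def true_dependences(program):
    """Return (producer, consumer, register) for every read-after-write pair
    within one iteration, using the most recent writer of each source."""
    last_writer = {}
    deps = []
    for label, _opcode, dest, sources in program:
        for reg in sources:
            if reg in last_writer:
                deps.append((last_writer[reg], label, reg))
        if dest is not None:
            last_writer[dest] = label
    return deps


if __name__ == "__main__":
    for producer, consumer, reg in true_dependences(PROGRAM):
        print(f"{consumer} needs {reg} from {producer}")
```

Each pair it prints is a place where the consumer may have to wait, depending on how many instructions sit between it and the producer.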


The first method I used was stalling only where there were true data dependencies, instead of stalling on every single instruction.
Loop:   LD      F2, 0(Rx)
<stall> x 3
I0:     MULTD   F2, F0, F2
<stall> x 4
I1:     DIVD    F8, F2, F0
I2:     LD      F4, 0(Ry)
<stall> x 3
I3:     ADDD    F4, F0, F4
I4:     ADDD    F10, F8, F2
I5:     SD      F4, 0(Ry)
I6:     ADDI    Rx, Rx, #8
I7:     ADDI    Ry, Ry, #8
I8:     SUB     R20, R4, Rx
I9:     BNZ     R20, Loop
Branch Delay Slot
This left me with 48 cycles per iteration.
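
One way to cross-check a hand count like this is a small Python sketch that computes the stalls mechanically. It rests on assumptions that may not match the intended machine: one instruction issues per cycle in program order, a consumer may issue N+1 cycles after a +N producer, only true dependences within one iteration cause stalls, and the branch delay slot adds exactly one cycle. The function names are made up for this example.

```python
# Extra latency beyond one cycle (ADDI treated as an integer ADD: assumption).
EXTRA_LATENCY = {"LD": 3, "SD": 1, "ADDI": 0, "SUB": 0,
                 "BNZ": 1, "ADDD": 2, "MULTD": 4, "DIVD": 10}

# (label, opcode, destination register or None, source registers)
PROGRAM = [
    ("Loop", "LD",    "F2",  ["Rx"]),
    ("I0",   "MULTD", "F2",  ["F0", "F2"]),
    ("I1",   "DIVD",  "F8",  ["F2", "F0"]),
    ("I2",   "LD",    "F4",  ["Ry"]),
    ("I3",   "ADDD",  "F4",  ["F0", "F4"]),
    ("I4",   "ADDD",  "F10", ["F8", "F2"]),
    ("I5",   "SD",    None,  ["F4", "Ry"]),
    ("I6",   "ADDI",  "Rx",  ["Rx"]),
    ("I7",   "ADDI",  "Ry",  ["Ry"]),
    ("I8",   "SUB",   "R20", ["R4", "Rx"]),
    ("I9",   "BNZ",   None,  ["R20"]),
]


def stalls_single_issue(program):
    """Stall cycles inserted before each instruction on an in-order,
    one-instruction-per-cycle pipeline that waits only on true dependences."""
    ready_at = {}   # register -> first cycle in which a consumer may issue
    stalls = []
    previous_issue = 0
    for _label, opcode, dest, sources in program:
        earliest = previous_issue + 1
        needed = max([ready_at.get(reg, 0) for reg in sources], default=0)
        issue = max(earliest, needed)
        stalls.append(issue - earliest)
        if dest is not None:
            ready_at[dest] = issue + 1 + EXTRA_LATENCY[opcode]
        previous_issue = issue
    return stalls


def cycles_per_iteration(program, delay_slot=1):
    """Instructions plus stalls plus the branch delay slot."""
    return len(program) + sum(stalls_single_issue(program)) + delay_slot


if __name__ == "__main__":
    for (label, *_), n in zip(PROGRAM, stalls_single_issue(PROGRAM)):
        print(f"{label:5s} stalls before issue: {n}")
    print("cycles per iteration:", cycles_per_iteration(PROGRAM))
```

The number it prints depends entirely on those assumptions, so it is only a sanity check against the hand-written stall listing, not a replacement for it.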

Next, I used a multiple-issue design where results can be forwarded immediately from one unit to another or back to the same unit. It should stall only to observe a true data dependence.
1st Pipeline                        2nd Pipeline
Loop:   LD      F2, 0(Rx)           I0:     MULTD   F2, F0, F2
I1:     DIVD    F8, F2, F0          I2:     LD      F4, 0(Ry)
I3:     ADDD    F4, F0, F4          <stall> x 6 (waiting for F8)
                                    I4:     ADDD    F10, F8, F2
I5:     SD      F4, 0(Ry)           I6:     ADDI    Rx, Rx, #8
I7:     ADDI    Ry, Ry, #8          I8:     SUB     R20, R4, Rx
I9:     BNZ     R20, Loop           Branch Delay Slot
This gave me 23 cycles per loop iteration.
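
For the multiple-issue case, a similar Python sketch can assign an issue cycle to each instruction. The model here is only one possible reading: instructions issue in program order, at most `width` per cycle, any instruction can use either pipeline, a consumer may issue N+1 cycles after a +N producer, and the branch delay slot costs one cycle after the branch issues. `issue_cycles` and `cycles_per_iteration` are hypothetical names.

```python
# Extra latency beyond one cycle (ADDI treated as an integer ADD: assumption).
EXTRA_LATENCY = {"LD": 3, "SD": 1, "ADDI": 0, "SUB": 0,
                 "BNZ": 1, "ADDD": 2, "MULTD": 4, "DIVD": 10}

# (label, opcode, destination register or None, source registers)
PROGRAM = [
    ("Loop", "LD",    "F2",  ["Rx"]),
    ("I0",   "MULTD", "F2",  ["F0", "F2"]),
    ("I1",   "DIVD",  "F8",  ["F2", "F0"]),
    ("I2",   "LD",    "F4",  ["Ry"]),
    ("I3",   "ADDD",  "F4",  ["F0", "F4"]),
    ("I4",   "ADDD",  "F10", ["F8", "F2"]),
    ("I5",   "SD",    None,  ["F4", "Ry"]),
    ("I6",   "ADDI",  "Rx",  ["Rx"]),
    ("I7",   "ADDI",  "Ry",  ["Ry"]),
    ("I8",   "SUB",   "R20", ["R4", "Rx"]),
    ("I9",   "BNZ",   None,  ["R20"]),
]


def issue_cycles(program, width=2):
    """Issue cycle of each instruction for an in-order machine that starts
    up to `width` instructions per cycle and stalls only on true dependences."""
    ready_at = {}            # register -> first cycle a consumer may issue
    cycles = []
    current, used = 1, 0     # cycle being filled, slots already taken in it
    for _label, opcode, dest, sources in program:
        needed = max([ready_at.get(reg, 1) for reg in sources], default=1)
        if needed > current:
            current, used = needed, 0        # wait for the operand
        elif used == width:
            current, used = current + 1, 0   # this cycle is full
        cycles.append(current)
        used += 1
        if dest is not None:
            ready_at[dest] = current + 1 + EXTRA_LATENCY[opcode]
    return cycles


def cycles_per_iteration(program, width=2, delay_slot=1):
    """Cycle in which the branch issues, plus the branch delay slot."""
    return issue_cycles(program, width)[-1] + delay_slot


if __name__ == "__main__":
    for (label, *_), cycle in zip(PROGRAM, issue_cycles(PROGRAM)):
        print(f"{label:5s} issues in cycle {cycle}")
    print("cycles per iteration:", cycles_per_iteration(PROGRAM))
```

Calling it with `width=1` gives the single-issue case under the same assumptions, which makes it easy to see how much of the time is spent waiting on operands rather than on issue slots.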


The final thing I did was use the multiple-issue design and reorder the code to improve the performance.
1st Pipeline                        2nd Pipeline
Loop:   LD      F2, 0(Rx)           I0:     MULTD   F2, F0, F2
I2:     LD      F4, 0(Ry)           I1:     DIVD    F8, F2, F0
I3:     ADDD    F4, F0, F4          I8:     SUB     R20, R4, Rx
I5:     SD      F4, 0(Ry)           I6:     ADDI    Rx, Rx, #8
I4:     ADDD    F10, F8, F2         I7:     ADDI    Ry, Ry, #8
I9:     BNZ     R20, Loop           Branch Delay Slot
This gave me 18 cycles per loop iteration.
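
When reordering by hand, it is easy to move an instruction past something it depends on. The Python sketch below checks a new order against the original and reports any dependent pair (RAW, WAR, or WAW on a register) whose relative order was flipped. It assumes the two-column listing is read row by row, left to right, and it ignores memory dependences. `REGS`, `conflict`, and `violations` are names invented for this example.

```python
# label -> (destination register or None, source registers),
# transcribed by hand from the original loop.
REGS = {
    "Loop": ("F2",  ["Rx"]),
    "I0":   ("F2",  ["F0", "F2"]),
    "I1":   ("F8",  ["F2", "F0"]),
    "I2":   ("F4",  ["Ry"]),
    "I3":   ("F4",  ["F0", "F4"]),
    "I4":   ("F10", ["F8", "F2"]),
    "I5":   (None,  ["F4", "Ry"]),
    "I6":   ("Rx",  ["Rx"]),
    "I7":   ("Ry",  ["Ry"]),
    "I8":   ("R20", ["R4", "Rx"]),
    "I9":   (None,  ["R20"]),
}

ORIGINAL = ["Loop", "I0", "I1", "I2", "I3", "I4", "I5", "I6", "I7", "I8", "I9"]

# The reordered listing above, read row by row (1st pipeline, then 2nd).
REORDERED = ["Loop", "I0", "I2", "I1", "I3", "I8", "I5", "I6", "I4", "I7", "I9"]


def conflict(first, second):
    """Kind of register dependence from `first` to a later `second`, or None."""
    dest1, src1 = REGS[first]
    dest2, src2 = REGS[second]
    if dest1 is not None and dest1 in src2:
        return "RAW"
    if dest2 is not None and dest2 in src1:
        return "WAR"
    if dest1 is not None and dest1 == dest2:
        return "WAW"
    return None


def violations(original, reordered):
    """Dependent pairs whose relative order differs between the two lists."""
    position = {label: i for i, label in enumerate(reordered)}
    found = []
    for i, first in enumerate(original):
        for second in original[i + 1:]:
            kind = conflict(first, second)
            if kind and position[second] < position[first]:
                found.append((first, second, kind))
    return found


if __name__ == "__main__":
    for first, second, kind in violations(ORIGINAL, REORDERED):
        print(f"{second} now comes before {first} ({kind} dependence)")
```

Any pair it reports means the new order reads or writes a register at a different point than the original did, so the schedule is worth rechecking there (or the instruction needs adjusting to compensate).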

Thanks in advance for any responses.
"First learn computer science and all the theory. Next develop a programming style. Then forget all that and just hack."
-George Carrette