Problem on Branch Prediction Schemes


Use the following code segment:
 
loop: LW R1, 0(R2)
ADDI R1,R1,#1
SW R1, 0(R2)
ADDI R2, R2, #4
SUB R4, R3, R2
BENZ R4, loop
Assume that the initial value of R3 is R2+396.

Throughout this Exercise use the DLX integer pipeline and assume all memory accesses are cache hits.

a) Show the timing of this instruction sequence for the DLX pipeline without any forwarding but assuming a register read and a write in the same clock cycle "forwards" through the register file. Assume that a branch is handled by flushing the pipeline. If the memory references hit in the cache, how many cycles does this loop take to execute?

Solution:
 
      1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
l: LW R1,0(R2) IF ID EX ME WB                          
ADDI R1,R1,#1   IF stall stall ID EX ME WB                    
SW R1, 0(R2)         IF stall stall ID EX ME WB              
ADDI R2,R2,#4               IF ID EX ME WB            
SUB R4,R3,R2                 IF stall stall ID EX ME WB      
BENZ R4, l                       IF stall stall ID EX MEM WB
                                IF stall stall IF

Number of steps = (396 / 4) + 1 = 100
Clock Cycles to Execute the loop = Number of steps * Number of Clock Cycles per step
= 100 * 17 = 1700
 

b) Show the timing of this instruction sequence for the DLX pipeline with normal forwarding hardware. Assume that the branch is handled by predicting it as not taken. If the memory references hit in the cache, how many cycles does this loop take to execute?

Solution:
 
1 2 3 4 5 6 7 8
l: LW R1,0(R2) IF ID EX ME WB            
ADDI R1,R1,#1    IF ID stall EX ME WB        
SW R1, 0(R2)     IF stall ID EX ME WB      
ADDI R2,R2,#4         IF ID EX ME WB    
SUB R2,R3,R2           IF ID EX ME WB  
BENZ R4, l             IF ID EX ME WB
  if not taken               IF idle idle idle
  l: (if taken)                 IF
Keep in mind that 99 times the branch is predicted wrong(as not taken),
and 1 time is actually predicted right!

Clock Cycles to Execute the loop = 99 * 8 + 1 * 7 = 799

c) Assuming  the DLX pipeline with a single-cycle delayed branch and forwarding hardware, schedule the instructions in the loop including  the branch delay slot. You may reorder instructions and modify the individual instruction operands. Show a pipeline timing diagram and compute the number of cycles needed to execute the entire loop.

Solution:
 
1 2 3 4 5 6
l: LW R1,0(R2) IF ID EX ME WB          
ADDI R2,R2,#4   IF ID EX ME WB        
SUB R4,R3,R2     IF ID EX ME WB      
ADDI R1,R1,#1       IF ID EX ME WB    
BENZ R4, l         IF ID EX ME WB  
SW R1, 0(R2)           IF ID EX ME WB
                IF        

Clock Cycles to Execute the loop = 100 * 6 = 600