Problem on Pipeline Hazards

Consider the following pipeline with 8 stages for a version of DLX:
 
IF1 Instruction fetch starts
IF2 Instruction fetch completes
ID Instruction decode and register fetch; begin computing branch target
EX1 Execution starts; branch condition tested; finish computing branch target
EX2 Execution completes - effective address or ALU result available
MEM1/ALUWB First part of memory cycle plus WB of ALU operation
MEM2 Memory access completes
LWB Write back for a load instruction
As in the standard DLX pipeline, assume register writes are in the first half of a cycle and register reads are in the second half.

a) How many register read/write ports are required?

b) For each possible type of instruction source and each possible type of instruction destination, show a code example that depicts all possible forwarding requirements (not stalls).

c) Show the same information as part (b) but for stalls rather than forwards.

d) Assuming a predict-not-taken strategy, find the branch penalty for a taken and untaken branch. Assume that a predicted instruction can be executed up to, but not including, a pipestage that does a write back.
 

Solution:
a)

We need 2 read ports: an instruction has at most two register source operands, and both are read in the same clock cycle in the ID stage.

We need 2 write ports: an ALU instruction in MEM1/ALUWB and a load in LWB can write the register file in the same cycle. This happens when the ALU instruction is issued two cycles after the load.
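
The write-port count can be checked with a short sketch. It assumes one instruction is issued per cycle with no stalls; the function name and the instruction classes "alu", "load", and "other" are hypothetical labels, not part of the problem statement.

```python
# Stage order from the problem statement; the index is the number of
# cycles after an instruction enters IF1.
STAGES = ["IF1", "IF2", "ID", "EX1", "EX2", "MEM1/ALUWB", "MEM2", "LWB"]

# Stage in which each instruction class writes the register file.
WRITE_STAGE = {"alu": "MEM1/ALUWB", "load": "LWB"}


def max_simultaneous_writes(program):
    """Return the largest number of register writes that share one cycle.

    `program` is a list of instruction classes, issued one per cycle.
    Classes not listed in WRITE_STAGE are assumed to write no register.
    """
    writes_per_cycle = {}
    for issue_cycle, kind in enumerate(program):
        if kind in WRITE_STAGE:
            cycle = issue_cycle + STAGES.index(WRITE_STAGE[kind])
            writes_per_cycle[cycle] = writes_per_cycle.get(cycle, 0) + 1
    return max(writes_per_cycle.values(), default=0)


# The load writes in cycle 0 + 7, the ALU instruction in cycle 2 + 5.
print(max_simultaneous_writes(["load", "other", "alu"]))  # 2
print(max_simultaneous_writes(["alu", "alu", "alu"]))     # 1
```

Because only two stages write the register file and each stage holds one instruction per cycle, the count cannot exceed 2.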

b)
ALU - ALU / ALU - Branch

1 ALU instr R1, _ , _
2 any instr
3 ALU instr _ , R1, _  / BNEZ R1, _
1  IF1    IF2    ID     EX1    [EX2]  MEM1   MEM2   LWB
2         IF1    IF2    ID     EX1    EX2    MEM1   MEM2   LWB
3                IF1    IF2    ID     [EX1]  EX2    MEM1   MEM2   LWB
Forwarding is done for R1 from EX2 of instruction 1 to EX1 of instruction 3 (stages in brackets).
 

Memory - ALU / Memory - Branch / Memory - Memory

1 LW instr R1, _ , _
2 any instr
3 any instr
4 any instr
5 ALU instr _ , R1, _  / BNEZ R1, _ / SW _ , R1
1  IF1    IF2    ID     EX1    EX2    MEM1   [MEM2] LWB
2         IF1    IF2    ID     EX1    EX2    MEM1   MEM2   LWB
3                IF1    IF2    ID     EX1    EX2    MEM1   MEM2   LWB
4                       IF1    IF2    ID     EX1    EX2    MEM1   MEM2   LWB
5                              IF1    IF2    ID     [EX1]  EX2    MEM1   MEM2   LWB
Forwarding is done for R1 from MEM2 of instruction 1 to EX1 of instruction 5 (stages in brackets).
 

ALU - Memory

1 ALU instr R1, _ , _
2 SW _ , R1
1  IF1    IF2    ID     EX1    EX2    [MEM1] MEM2   LWB
2         IF1    IF2    ID     EX1    EX2    [MEM1] MEM2   LWB
Forwarding is done for R1 from MEM1 of instruction 1 to MEM1 of instruction 2 (stages in brackets); no intervening instruction is needed.
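
The three forwarding cases above follow from one rule: the consumer must enter the stage that needs the value later than the producer finishes the stage that creates it. A minimal sketch of that rule, with a hypothetical function name and assuming the value is needed at the start of the consuming stage:

```python
# Stage order from the problem statement; the index is the number of
# cycles after an instruction enters IF1.
STAGES = ["IF1", "IF2", "ID", "EX1", "EX2", "MEM1/ALUWB", "MEM2", "LWB"]


def min_forwarding_distance(ready_after, needed_in):
    """Smallest issue distance at which forwarding alone avoids a stall.

    ready_after: producer stage at whose end the value exists.
    needed_in:   consumer stage at whose start the value is required.
    A distance of 1 means the consumer immediately follows the producer.
    """
    ready = STAGES.index(ready_after)
    needed = STAGES.index(needed_in)
    # The consumer enters `needed_in` in cycle distance + needed, which
    # must be later than cycle `ready` of the producer.
    return max(1, ready - needed + 1)


cases = {
    "ALU result used in EX1": ("EX2", "EX1"),
    "load result used in EX1": ("MEM2", "EX1"),
    "ALU result stored in MEM1": ("EX2", "MEM1/ALUWB"),
}
for name, (ready_after, needed_in) in cases.items():
    distance = min_forwarding_distance(ready_after, needed_in)
    print(f"{name}: {distance - 1} intervening instruction(s)")
```

This gives one, three, and zero intervening instructions respectively, which matches the three code examples in part (b).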
 
 

c)
ALU - ALU / ALU - Branch

1 ALU instr R1, _ , _
2 ALU instr _ , R1, _  / BNEZ R1, _
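1  IF1    IF2    ID     EX1    EX2    MEM1/ALUWB MEM2   LWB
2         IF1    IF2    stall  stall  ID         EX1    EX2    MEM1
Without forwarding, instruction 2 reads R1 in ID in the same cycle in which instruction 1 writes it in MEM1/ALUWB (write in the first half, read in the second half), so 2 stalls are needed.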
 

Memory - ALU / Memory - Branch / Memory - Memory

1 LW instr R1, _ , _
2 ALU instr _ , R1, _  / BNEZ R1, _ / SW _ , R1
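1  IF1    IF2    ID     EX1    EX2    MEM1   MEM2   LWB
2         IF1    IF2    stall  stall  stall  stall  ID     ...
Without forwarding, instruction 2 reads R1 in ID in the same cycle in which the load writes it in LWB, so 4 stalls are needed.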
 

ALU - Memory

1 ALU instr R1, _ , _
2 SW _ , R1
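1  IF1    IF2    ID     EX1    EX2    MEM1/ALUWB MEM2   LWB
2         IF1    IF2    stall  stall  ID         EX1    EX2    ...
Without forwarding, the store reads R1 in ID in the same cycle in which instruction 1 writes it in MEM1/ALUWB, so 2 stalls are needed.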
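
The stall counts in part (c) can be reproduced with a small sketch. It assumes there is no forwarding, that every source register (including store data) is read in ID, and that a write in the first half of a cycle is visible to a read in the second half of the same cycle. The function name and the `distance` parameter are hypothetical.

```python
# Stage order from the problem statement; the index is the number of
# cycles after an instruction enters IF1.
STAGES = ["IF1", "IF2", "ID", "EX1", "EX2", "MEM1/ALUWB", "MEM2", "LWB"]


def stalls_without_forwarding(write_stage, distance=1):
    """Stall cycles for a consumer issued `distance` cycles after the producer.

    write_stage: stage in which the producer writes the register file.
    The consumer's ID stage may share a cycle with the producer's write,
    because writes use the first half of a cycle and reads the second.
    """
    write_cycle = STAGES.index(write_stage)
    unstalled_id_cycle = distance + STAGES.index("ID")
    return max(0, write_cycle - unstalled_id_cycle)


print(stalls_without_forwarding("MEM1/ALUWB"))  # ALU producer: 2
print(stalls_without_forwarding("LWB"))         # load producer: 4
```

With `distance=3` an ALU producer needs no stall, since the consumer then reaches ID in the same cycle as the producer's write-back.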
 
 

d)
Branch taken
1 BNEZ R1, N
2 any instr
3 any instr
4 any instr
...
N any instr
 
1  IF1    IF2    ID     EX1    EX2    MEM1   MEM2   LWB
2         IF1    IF2    ID
3                IF1    IF2
4                       IF1
N         stall  stall  stall  IF1    IF2    ID     EX1    EX2
The branch condition is tested and the target address is available at the end of EX1 of the branch instruction. If the branch turns out to be taken, we have to flush instructions 2 to 4, which entered the pipeline after the branch, and fetch the target instruction N. None of the flushed instructions has reached a stage that does a write back.
So the taken-branch penalty is 3 cycles.

Branch not taken
1 BNEZ R1, N
2 any instr
 
1  IF1    IF2    ID     EX1    EX2    MEM1   MEM2   LWB
2         IF1    IF2    ID     EX1    EX2    MEM1   MEM2   LWB
If the branch is not taken, the predict-not-taken guess is correct: the instructions fetched after the branch simply continue down the pipeline, and the penalty is 0 cycles.
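
Both penalties can be derived from the stage in which the branch is resolved. The following is a minimal sketch under the assumptions of part (d): predict-not-taken, and the target fetch starting in the cycle after the resolving stage. The function names are hypothetical.

```python
# Stage order from the problem statement; the index is the number of
# cycles after an instruction enters IF1.
STAGES = ["IF1", "IF2", "ID", "EX1", "EX2", "MEM1/ALUWB", "MEM2", "LWB"]


def branch_penalty(taken, resolve_stage="EX1"):
    """Cycles lost on a branch under a predict-not-taken strategy."""
    if not taken:
        return 0  # the fall-through instructions were the right ones
    # The target enters IF1 one cycle after `resolve_stage`, whereas the
    # next useful instruction would otherwise have entered IF1 one cycle
    # after the branch did.
    return STAGES.index(resolve_stage)


def deepest_flushed_stage(resolve_stage="EX1"):
    """Stage reached by the oldest flushed instruction when the branch resolves."""
    return STAGES[STAGES.index(resolve_stage) - 1]


print(branch_penalty(taken=True))   # 3
print(branch_penalty(taken=False))  # 0
print(deepest_flushed_stage())      # ID
```

The oldest flushed instruction has only reached ID, well before MEM1/ALUWB, so the restriction that a predicted instruction must not reach a write-back stage is satisfied.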