When Stalls are Required


Unfortunately, not all potential hazards can be handled by forwarding.

Consider the following sequence of instructions:
 
Clock cycle      1      2      3      4      5      6      7      8
LW R1, 0(R1)     IF     ID     EX     MEM    WB
SUB R4, R1, R5          IF     ID     EXsub  MEM    WB
AND R6, R1, R7                 IF     ID     EXand  MEM    WB
OR R8, R1, R9                         IF     ID     EX     MEM    WB

The LW instruction does not have the data until the end of clock cycle 4 (MEM), while the SUB instruction needs the data by the beginning of that clock cycle (EXsub).

For the AND instruction, the result can be forwarded to the ALU (EXand) directly from the MEM/WB register, which is written at the end of the LW instruction's MEM stage.

The OR instruction has no problem, since it receives the value through the register file (ID). In clock cycle 5, the WB of the LW instruction occurs "early", in the first half of the cycle, and the register read of the OR instruction occurs "late", in the second half of the cycle.

For the SUB instruction, the forwarded result would arrive too late: it is available at the end of clock cycle 4, but it is needed at the beginning of that cycle.
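The three cases above can be summarized by comparing cycle numbers. The following is only a minimal sketch: it assumes the five-stage pipeline shown above, with LW fetched in cycle 1 and each later instruction fetched one cycle after its predecessor. The function name and the `distance` parameter are illustrative and not part of the original notes.

```python
LOAD_MEM_CYCLE = 4  # LW fetched in cycle 1 has its data at the end of cycle 4 (MEM)
LOAD_WB_CYCLE = 5   # LW writes the register file in the first half of cycle 5 (WB)

def how_operand_arrives(distance):
    """Classify a consumer of the loaded register.

    distance = 1 for the instruction right after the load, 2 for the next, etc.
    Assumes no stalls have been inserted yet.
    """
    id_cycle = 2 + distance  # cycle in which the consumer reads registers
    ex_cycle = 3 + distance  # cycle in which the consumer needs the operand
    if id_cycle >= LOAD_WB_CYCLE:
        return "register file"   # WB in first half, register read in second half
    if ex_cycle > LOAD_MEM_CYCLE:
        return "forwarding"      # MEM/WB register holds the data in time
    return "stall required"      # data needed before it exists

for name, distance in [("SUB", 1), ("AND", 2), ("OR", 3)]:
    print(name, "->", how_operand_arrives(distance))
# SUB -> stall required
# AND -> forwarding
# OR -> register file
```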

The load instruction has a delay or latency that cannot be eliminated by forwarding alone. Instead, we need to add hardware, called a pipeline interlock, to preserve the correct execution pattern. In general, a pipeline interlock detects a hazard and stalls the pipeline until the hazard is cleared.
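As an illustration of what such an interlock checks, the sketch below tests the usual load-use condition: the instruction currently in EX is a load, and its destination register is a source of the instruction currently in ID. The `Instr` class and the function name are assumptions made for this example; real hardware compares fields of the ID/EX and IF/ID pipeline registers.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Instr:
    op: str                 # opcode mnemonic, e.g. "LW"
    dest: str               # destination register
    srcs: Tuple[str, ...]   # source registers

def load_use_stall(in_ex: Instr, in_id: Instr) -> bool:
    """True when the instruction in EX is a load whose result the instruction in ID reads."""
    return in_ex.op == "LW" and in_ex.dest in in_id.srcs

lw = Instr("LW", "R1", ("R1",))
sub = Instr("SUB", "R4", ("R1", "R5"))
and_ = Instr("AND", "R6", ("R1", "R7"))

print(load_use_stall(lw, sub))    # True: SUB must wait one cycle
print(load_use_stall(sub, and_))  # False: SUB is not a load
```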

The pipeline with a stall and the legal forwarding is:
 
Clock cycle      1      2      3      4      5      6      7      8      9
LW R1, 0(R1)     IF     ID     EX     MEM    WB
SUB R4, R1, R5          IF     ID     stall  EXsub  MEM    WB
AND R6, R1, R7                 IF     stall  ID     EX     MEM    WB
OR R8, R1, R9                         stall  IF     ID     EX     MEM    WB

The only forwarding still needed is of R1 from MEM to EXsub.
Notice that there is no need to forward R1 to the AND instruction, because it now gets the value through the register file in ID (as the OR instruction did above).
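The stalled diagram can be reproduced with a small scheduling sketch. It rests on simplifying assumptions that are not from the original notes: only a load followed immediately by a consumer causes a stall, every other dependence is covered by forwarding or the register file, and during a stall cycle no instruction may enter IF, ID or EX. The names `schedule` and `stage_cycle` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
FRONT = {"IF", "ID", "EX"}  # stages that cannot be entered during a stall cycle

def stage_cycle(row, stage):
    """Return the clock cycle in which a scheduled instruction occupies `stage`."""
    return next(c for c, label in row.items() if label == stage)

def schedule(program):
    """program: list of (op, dest, srcs). Returns one {cycle: label} dict per instruction."""
    stall_cycles = set()
    rows = []
    for i, (op, dest, srcs) in enumerate(program):
        row = {}
        # Fetch one cycle after the previous instruction was fetched.
        cycle = 1 if i == 0 else stage_cycle(rows[i - 1], "IF") + 1
        for stage in STAGES:
            if stage == "EX" and i > 0:
                prev_op, prev_dest, _ = program[i - 1]
                # Load-use hazard: the load is still in MEM (or earlier)
                # in the cycle where this instruction wants to execute.
                if (prev_op == "LW" and prev_dest in srcs
                        and stage_cycle(rows[i - 1], "MEM") >= cycle):
                    stall_cycles.add(cycle)
            while stage in FRONT and cycle in stall_cycles:
                row[cycle] = "stall"
                cycle += 1
            row[cycle] = stage
            cycle += 1
        rows.append(row)
    return rows

program = [
    ("LW", "R1", ("R1",)),
    ("SUB", "R4", ("R1", "R5")),
    ("AND", "R6", ("R1", "R7")),
    ("OR", "R8", ("R1", "R9")),
]
rows = schedule(program)
for (op, _, _), row in zip(program, rows):
    cells = [row.get(c, "").ljust(6) for c in range(1, 10)]
    print(op.ljust(4), "".join(cells).rstrip())
```

Under these assumptions the printed rows match the nine-cycle diagram above: SUB stalls between ID and EX, AND between IF and ID, and OR is fetched one cycle late.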

There are techniques to reduce the number of stalls even in this case, which we consider next.