Consider the following sequence of instructions:

| Instruction | Operands | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| LW | R1, 0(R1) | IF | ID | EX | MEM | WB | | | |
| SUB | R4, R1, R5 | | IF | ID | EXsub | MEM | WB | | |
| AND | R6, R1, R7 | | | IF | ID | EXand | MEM | WB | |
| OR | R8, R1, R9 | | | | IF | ID | EX | MEM | WB |

For the AND instruction, we can forward the loaded value directly to the ALU (EXand, clock cycle 5) from the MEM/WB pipeline register, which LW fills at the end of its MEM stage in cycle 4.
The OR instruction has no problem, since it receives the value through the register file (ID). In clock cycle 5, the WB of the LW instruction occurs "early" in the first half of the cycle, and the register read of the OR instruction occurs "late" in the second half of the cycle.
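For the SUB instruction, the forwarded result would arrive too late: the loaded data is available only at the end of clock cycle 4 (LW's MEM stage), but SUB needs it at the beginning of that same cycle (EXsub). Forwarding cannot send a value backward in time.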
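The three cases above can be checked with a small timing model. This is a hedged sketch assuming the classic five-stage pipeline with one instruction issued per cycle and no stalls, a register file that writes in the first half of a cycle and reads in the second half, and a MEM/WB-to-EX forwarding path. The names `stage_cycle` and `classify_load_use` are hypothetical, and `distance` counts how many instructions after the load the consumer sits.

```python
# Assumed classic five-stage pipeline; one issue per cycle, no stalls.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_cycle(index: int, stage: str) -> int:
    """1-based clock cycle in which instruction `index` (0-based) is in `stage`."""
    return index + 1 + STAGES.index(stage)


def classify_load_use(distance: int) -> str:
    """How a consumer `distance` instructions after a load obtains the loaded value."""
    load_mem = stage_cycle(0, "MEM")  # data leaves memory at the END of this cycle
    load_wb = stage_cycle(0, "WB")    # register write in the FIRST half of this cycle
    consumer_id = stage_cycle(distance, "ID")  # register read in the SECOND half
    consumer_ex = stage_cycle(distance, "EX")  # ALU needs operands at the START
    if consumer_id >= load_wb:
        return "register file"
    if consumer_ex > load_mem:
        return "forward MEM/WB -> EX"
    return "stall"


for name, distance in [("SUB", 1), ("AND", 2), ("OR", 3)]:
    print(name, classify_load_use(distance))
```

Running it prints `stall` for SUB, `forward MEM/WB -> EX` for AND and `register file` for OR, matching the discussion above.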
The load instruction has a delay or latency that cannot be eliminated by forwarding alone. Instead, we need to add hardware, called a pipeline interlock, to preserve the correct execution pattern. In general, a pipeline interlock detects a hazard and stalls the pipeline until the hazard is cleared.
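A minimal sketch of the interlock's detection condition follows, modeled loosely on the load-use hazard detection unit of the MIPS pipeline in Patterson and Hennessy. It assumes the check is made while the dependent instruction is in ID and the load is in EX. The `Instr` type, its fields, and `load_use_stall` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    """Hypothetical decoded instruction: opcode, destination, source registers."""
    op: str
    dest: Optional[int]
    srcs: Tuple[int, ...]


def load_use_stall(in_ex: Optional[Instr], in_id: Optional[Instr]) -> bool:
    """Interlock check made while `in_id` is being decoded.

    Returns True when the instruction currently in EX is a load whose
    destination register is a source of the instruction in ID. In that
    case the hardware would hold the PC and the IF/ID register and send
    a bubble (a no-op) into EX for one cycle.
    """
    if in_ex is None or in_id is None:
        return False
    return in_ex.op == "LW" and in_ex.dest in in_id.srcs


lw = Instr("LW", 1, (1,))        # LW  R1, 0(R1)
sub = Instr("SUB", 4, (1, 5))    # SUB R4, R1, R5
and_ = Instr("AND", 6, (1, 7))   # AND R6, R1, R7

print(load_use_stall(lw, sub))    # True: cycle 3, SUB must wait
print(load_use_stall(sub, and_))  # False: SUB is not a load
```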
The pipeline with a stall and the legal forwarding is:

| Instruction | Operands | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| LW | R1, 0(R1) | IF | ID | EX | MEM | WB | | | | |
| SUB | R4, R1, R5 | | IF | ID | stall | EXsub | MEM | WB | | |
| AND | R6, R1, R7 | | | IF | stall | ID | EX | MEM | WB | |
| OR | R8, R1, R9 | | | | stall | IF | ID | EX | MEM | WB |

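The stalled timing can be reproduced by a small in-order scheduling model. This is a sketch under stated assumptions: full forwarding, a loaded value usable by an EX stage starting the cycle after the load's MEM stage, an ALU result usable by the very next EX stage, and strictly in-order issue so that one stall delays every later instruction. `PROGRAM` and `schedule` are hypothetical names.

```python
from typing import Dict, List, Tuple

# (opcode, destination register, source registers) for the sequence above.
PROGRAM: List[Tuple[str, int, Tuple[int, ...]]] = [
    ("LW", 1, (1,)),      # LW  R1, 0(R1)
    ("SUB", 4, (1, 5)),   # SUB R4, R1, R5
    ("AND", 6, (1, 7)),   # AND R6, R1, R7
    ("OR", 8, (1, 9)),    # OR  R8, R1, R9
]


def schedule(program: List[Tuple[str, int, Tuple[int, ...]]]) -> List[int]:
    """Return the 1-based EX cycle of each instruction, inserting stalls as needed."""
    ready: Dict[int, int] = {}  # register -> first cycle its value can feed EX
    ex_cycles: List[int] = []
    for op, dest, srcs in program:
        # Without hazards, EX follows the previous instruction's EX by one cycle
        # (the first instruction reaches EX in cycle 3).
        ex = ex_cycles[-1] + 1 if ex_cycles else 3
        for reg in srcs:
            ex = max(ex, ready.get(reg, 0))  # wait until the operand is available
        ex_cycles.append(ex)
        # A load's data exists after MEM (ex + 1); an ALU result right after EX.
        ready[dest] = ex + 2 if op == "LW" else ex + 1
    return ex_cycles


ex_cycles = schedule(PROGRAM)
for (op, _, _), ex in zip(PROGRAM, ex_cycles):
    print(f"{op}: EX in cycle {ex}, WB in cycle {ex + 2}")
stalls = ex_cycles[-1] - (len(PROGRAM) + 2)
print(f"total cycles: {ex_cycles[-1] + 2}, stalls: {stalls}")
```

The model places SUB's EX in cycle 5 and OR's WB in cycle 9, one cycle later than without the load-use hazard, as in the table.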
The only necessary forwarding is for R1, from the MEM/WB register (filled at the end of LW's MEM stage in cycle 4) to EXsub in cycle 5.
Notice that there is no need to forward R1 for the AND instruction, because it now gets the value through the register file in ID during cycle 5, when LW writes it back (as OR did above).
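Which path supplies R1 in each cycle can be seen from the forwarding unit's selection logic. The sketch below is modeled loosely on the forwarding condition of the MIPS pipeline in Patterson and Hennessy: a pipeline register forwards only if it writes a register, that register is not R0, and it matches the operand, with EX/MEM taking priority because it holds the newer result. `Latch`, `BUBBLE` and `forward_source` are hypothetical names, and the bubble is assumed to have its register-write control signal cleared.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Latch:
    """Hypothetical view of a pipeline register: does it write a register, and which."""
    reg_write: bool
    dest: Optional[int]


BUBBLE = Latch(False, None)  # stall cycle: control signals zeroed


def forward_source(src: int, ex_mem: Latch, mem_wb: Latch) -> str:
    """Select where the ALU operand for register `src` comes from."""
    # EX/MEM holds a newer result than MEM/WB, so it is checked first.
    if ex_mem.reg_write and ex_mem.dest not in (None, 0) and ex_mem.dest == src:
        return "EX/MEM"
    if mem_wb.reg_write and mem_wb.dest not in (None, 0) and mem_wb.dest == src:
        return "MEM/WB"
    return "register file"  # the value read in ID is already correct


# Cycle 5 of the stalled schedule: SUB in EX, bubble in MEM, LW in WB.
print(forward_source(1, ex_mem=BUBBLE, mem_wb=Latch(True, 1)))
# Cycle 6: AND in EX, SUB (writes R4) in MEM, bubble in WB.
print(forward_source(1, ex_mem=Latch(True, 4), mem_wb=BUBBLE))
```

The first call selects `MEM/WB` for SUB and the second selects `register file` for AND, which is the single forwarding path described above.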
There are techniques to reduce the number of stalls even in this case, which we consider next.