The data hazards introduced by the following sequence of instructions
can be solved with a simple hardware technique
called forwarding.
| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| ADD R1, R2, R3 | IF | ID | EX | MEM | WB | | |
| SUB R4, R5, R1 | | IF | IDsub | EX | MEM | WB | |
| AND R6, R1, R7 | | | IF | IDand | EX | MEM | WB |
If the result can be moved from where the ADD produces it (the EX/MEM register)
to where the SUB needs it (the ALU input latch), then the stall can
be avoided.
Using this observation, forwarding works as follows:
The ALU result from the EX/MEM register is always fed back to the ALU input latches.
If the forwarding hardware detects that the previous ALU operation has written the register corresponding to the source for the current ALU operation, control logic selects the forwarded result as the ALU input rather than the value read from the register file.
Forwarding of results to the ALU requires the addition of three
extra inputs on each ALU multiplexer and the addition of three paths to
the new inputs.
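The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the control decision, not a hardware description: the `ExMemLatch` type, its field names, and `select_alu_input` are hypothetical names introduced here, and only the EX/MEM path is modeled.

```python
from typing import NamedTuple, Optional


class ExMemLatch(NamedTuple):
    """Hypothetical snapshot of the EX/MEM pipeline register."""
    dest: Optional[str]  # register written by the previous ALU operation, if any
    result: int          # ALU result held in the latch


def select_alu_input(src: str, regfile_value: int, ex_mem: ExMemLatch) -> int:
    """Pick the ALU operand: the forwarded result if the previous ALU
    operation wrote `src`, otherwise the value read from the register file."""
    if ex_mem.dest == src:
        return ex_mem.result   # forwarded from EX/MEM
    return regfile_value       # no hazard: use the register file value


# ADD R1, R2, R3 has just left EX with result 42 (an arbitrary example value);
# SUB R4, R5, R1 now selects its two operands.
latch = ExMemLatch(dest="R1", result=42)
print(select_alu_input("R5", 7, latch))  # 7, read from the register file
print(select_alu_input("R1", 0, latch))  # 42, forwarded from EX/MEM
```

The stale register file value for R1 (0 in this example) is ignored because the comparison against the latch's destination register takes priority.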
Without forwarding, our example still executes correctly, but only by stalling:
| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| ADD R1, R2, R3 | IF | ID | EX | MEM | WB | | | | |
| SUB R4, R5, R1 | | IF | stall | stall | IDsub | EX | MEM | WB | |
| AND R6, R1, R7 | | | stall | stall | IF | IDand | EX | MEM | WB |
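The stall count in this diagram can be reproduced with a small scheduling sketch. It assumes, as the diagram does, that an instruction may be in ID in the same cycle in which its producer is in WB, and that instructions decode in order, at most one per cycle. The function name and the tuple layout are illustrative choices, not part of the original notes.

```python
def id_cycles_without_forwarding(program):
    """Return the cycle in which each instruction is in ID.

    `program` is a list of (name, dest, sources) tuples. Every instruction
    is assumed to spend one cycle each in EX, MEM and WB after ID.
    """
    wb_cycle = {}   # register -> cycle in which its latest producer is in WB
    id_cycles = []
    prev_id = 1     # so that the first instruction decodes in cycle 2
    for name, dest, sources in program:
        in_order = prev_id + 1
        operands_ready = max((wb_cycle.get(r, 0) for r in sources), default=0)
        this_id = max(in_order, operands_ready)
        id_cycles.append(this_id)
        wb_cycle[dest] = this_id + 3   # ID -> EX -> MEM -> WB
        prev_id = this_id
    return id_cycles


program = [
    ("ADD", "R1", ["R2", "R3"]),
    ("SUB", "R4", ["R5", "R1"]),
    ("AND", "R6", ["R1", "R7"]),
]
ids = id_cycles_without_forwarding(program)
print(ids)           # [2, 5, 6]
print(ids[-1] + 3)   # 9, the cycle in which the AND reaches WB
```

Without the hazard the three instructions would decode in cycles 2, 3 and 4, so the SUB and the AND each finish two cycles late.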
As our example shows, we need to forward results not only from the immediately
preceding instruction but also from an instruction that started earlier
still: the AND starts two cycles after the ADD and still needs R1.
Forwarding can therefore also be arranged from the MEM/WB latch to the ALU
input. Using both forwarding paths, the code sequence can be executed
without stalls:
| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| ADD R1, R2, R3 | IF | ID | EXadd | MEMadd | WB | | |
| SUB R4, R5, R1 | | IF | ID | EXsub | MEM | WB | |
| AND R6, R1, R7 | | | IF | ID | EXand | MEM | WB |
The first forwarding is of the value of R1, from EXadd to EXsub.
The second forwarding is also of the value of R1, from MEMadd to EXand.
This code can now be executed without stalls.
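As a rough sketch of how the two paths are chosen, the following Python function looks one and two instructions back for a producer of each source register. It assumes every instruction is an ALU operation whose result is available at the end of EX, and that a producer three or more instructions back has already written the register file. The function and latch labels are illustrative names only.

```python
def forwarding_paths(program):
    """Return (producer, consumer, register, latch) for each forwarded operand.

    `program` is a list of (name, dest, sources) tuples of ALU operations.
    """
    paths = []
    for i, (name, _dest, sources) in enumerate(program):
        for reg in sources:
            # The nearest producer wins: check distance 1 before distance 2.
            for distance, latch in ((1, "EX/MEM"), (2, "MEM/WB")):
                j = i - distance
                if j >= 0 and program[j][1] == reg:
                    paths.append((program[j][0], name, reg, latch))
                    break
    return paths


program = [
    ("ADD", "R1", ["R2", "R3"]),
    ("SUB", "R4", ["R5", "R1"]),
    ("AND", "R6", ["R1", "R7"]),
]
for path in forwarding_paths(program):
    print(path)
# ('ADD', 'SUB', 'R1', 'EX/MEM')
# ('ADD', 'AND', 'R1', 'MEM/WB')
```

Checking the nearer latch first matters when two recent instructions write the same register: the consumer must receive the most recent value.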
Forwarding can be generalized to include passing
the result directly to the functional unit that requires it: a result is
forwarded from the output of one unit to the input of another, rather than
just from the result of a unit to the input of the same unit.
One more Example
To prevent a stall in this example, we would need to forward the values
of R1 and R4 from the pipeline registers to the ALU and data memory inputs.
| Instruction | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| ADD R1, R2, R3 | IF | ID | EXadd | MEMadd | WB | | |
| LW R4, d(R1) | | IF | ID | EXlw | MEMlw | WB | |
| SW R4, 12(R1) | | | IF | ID | EXsw | MEMsw | WB |
Stores require an operand during MEM, and forwarding of that operand
is shown here.
The first forwarding is of the value of R1, from EXadd to EXlw.
The second forwarding is also of the value of R1, from MEMadd to EXsw.
The third forwarding is of the value of R4, from MEMlw to MEMsw.
Observe that the SW instruction is storing the value of R4 into a memory location computed by adding the displacement 12
to the value contained in register R1. This effective address computation is done in the ALU during the EX stage of the SW instruction.
The value to be stored (R4 in this case) is needed only in the MEM stage as an input to Data Memory. Thus the value of R1
is forwarded to the EX stage for effective address computation and is needed earlier in time than the value of R4 which is
forwarded to the input of Data Memory in the MEM stage.
So forwarding always takes place from "left to right" in time, but operands are not ALWAYS forwarded to the EX stage: it
depends on the instruction and on the point in the datapath where the operand is needed. Of course, forwarding requires
hardware support.
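The point that each operand has its own "needed in" stage can be made concrete with a short Python sketch. It assumes a stall-free schedule with one instruction entering the pipeline per cycle, a result that becomes available at the end of the stage that computes it, and a register file that can be written and read in the same cycle. The tuple layout, the function names, and the labels such as `EXadd` are illustrative choices that mirror the diagram.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def cycle_of(index, stage):
    """Cycle in which instruction number `index` (0-based) occupies `stage`,
    assuming one instruction enters the pipeline per cycle and no stalls."""
    return index + 1 + STAGES.index(stage)


def forwardings(program):
    """List (from, to, register) forwardings for a stall-free schedule.

    Each instruction is (name, dest, produced_in, operands), where
    `produced_in` is the stage that computes `dest` and `operands` is a
    list of (register, needed_in) pairs.
    """
    found = []
    for i, (name, _dest, _prod, operands) in enumerate(program):
        for reg, needed_in in operands:
            for j in range(i - 1, -1, -1):       # nearest producer first
                pname, pdest, produced_in, _ = program[j]
                if pdest != reg:
                    continue
                if cycle_of(j, "WB") <= cycle_of(i, "ID"):
                    break                        # register file is already up to date
                need = cycle_of(i, needed_in)
                if cycle_of(j, produced_in) >= need:
                    raise ValueError(f"{name} would have to stall for {reg}")
                # Stage the producer occupies in the cycle before the value is used.
                src_stage = STAGES[need - 1 - (j + 1)]
                found.append((src_stage + pname.lower(),
                              needed_in + name.lower(), reg))
                break
    return found


program = [
    ("ADD", "R1", "EX",  [("R2", "EX"), ("R3", "EX")]),
    ("LW",  "R4", "MEM", [("R1", "EX")]),                 # address needs R1 in EX
    ("SW",  None, None,  [("R1", "EX"), ("R4", "MEM")]),  # store data needed in MEM
]
for path in forwardings(program):
    print(path)
# ('EXadd', 'EXlw', 'R1')
# ('MEMadd', 'EXsw', 'R1')
# ('MEMlw', 'MEMsw', 'R4')
```

The sketch raises an error for any dependence that forwarding alone cannot satisfy, since such cases are outside the stall-free schedule it assumes.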