MIPS R4000 Microprocessor User's Manual 59
The CPU Pipeline
3.6 R4400 Processor Uncached Store Buffer
The R4400 processor contains an uncached store buffer to improve the
performance of uncached stores over that available from an R4000
processor. In the R4000 processor, an uncached store that reaches the
write-back (WB) stage of the CPU pipeline stalls the CPU until the store
is sent off-chip. In the R4400 processor, a single-entry buffer holds this
uncached WB-stage data on the chip, so the pipeline does not stall.
If a second uncached store reaches the WB stage in the R4400 processor
before the first uncached store has been moved off-chip, the CPU stalls
until the store buffer completes the first uncached store. To avoid this
stall, the compiler can insert seven instruction cycles between the two
uncached stores, as shown in Figure 3-12. A single instruction that
requires seven cycles to complete could be used in place of the seven No
Operation (NOP) instructions.
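The spacing rule above can be written as a small calculation. The Python
sketch below is illustrative only: the names nops_needed and
REQUIRED_GAP_CYCLES are hypothetical, and the caller is assumed to supply
the number of cycles already occupied by other instructions between the
two uncached stores.

```python
# Cycles that the text says must separate two uncached stores on the R4400.
REQUIRED_GAP_CYCLES = 7


def nops_needed(filled_cycles: int) -> int:
    """Return the number of NOPs still required between two uncached stores.

    filled_cycles is the number of cycles already occupied by other
    instructions between the stores (an input the caller must supply).
    """
    if filled_cycles < 0:
        raise ValueError("filled_cycles must be non-negative")
    # Never return a negative count: extra cycles beyond seven cost nothing.
    return max(0, REQUIRED_GAP_CYCLES - filled_cycles)
```

With nothing else scheduled between the stores, nops_needed(0) returns 7,
matching the seven NOPs described above. A single instruction that takes
seven cycles gives nops_needed(7), which returns 0.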
Figure 3-12 Pipeline Sequence for Back-to-Back Uncached Stores
    SW   R2, (R3)    # uncached store
    NOP              # NOP 1
    NOP              # NOP 2
    NOP              # NOP 3
    NOP              # NOP 4
    NOP              # NOP 5
    NOP              # NOP 6
    NOP              # NOP 7
    SW   R2, (R3)    # uncached store
If the two uncached stores execute within a loop, the two killed
instructions that are part of the loop branch latency are included in the
count of seven intervening cycles. Figure 3-13 shows the four NOP
instructions that need to be scheduled in this case.
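The stall behavior of the single-entry buffer, including the loop case,
can be explored with a toy timing model. This Python sketch is not a
description of the actual hardware. It assumes one cycle per instruction
slot. It also assumes that the buffer needs seven cycles to drain, a
figure inferred from the spacing rule rather than a documented latency.
Killed instructions are treated as ordinary non-store slots. The names
DRAIN_CYCLES and count_stall_cycles are hypothetical.

```python
# Assumed drain time of the single-entry buffer, inferred from the
# seven-cycle spacing rule in the text (not a documented latency).
DRAIN_CYCLES = 7


def count_stall_cycles(slots):
    """Count stall cycles for a sequence of instruction slots.

    slots is an iterable of booleans: True marks an uncached store,
    False marks any other slot (NOP, branch, killed instruction, ...).
    """
    cycle = 0            # current cycle, including any stalls so far
    buffer_free_at = 0   # first cycle at which the buffer can accept a store
    stalls = 0
    for is_uncached_store in slots:
        if is_uncached_store:
            if cycle < buffer_free_at:
                # Buffer still holds the previous store: stall until it drains.
                stalls += buffer_free_at - cycle
                cycle = buffer_free_at
            # The store occupies the buffer for the following DRAIN_CYCLES.
            buffer_free_at = cycle + 1 + DRAIN_CYCLES
        cycle += 1
    return stalls
```

Under these assumptions, two adjacent stores, count_stall_cycles([True,
True]), cost 7 stall cycles, while a store followed by seven non-store
slots and then a second store costs none. The same holds for a loop whose
body is one store followed by seven non-store slots of any kind, whether
they are scheduled NOPs, the branch, or killed instructions.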