Serving as a base for a family of RISC chips, the PowerPC
derives its core architecture from the Performance
Optimization With Enhanced RISC (POWER) architecture. The
instruction set and thirty-two 32-bit general-purpose registers
support multiple microarchitecture implementations that
include the 32-bit 603e, 604e, 740, 750, and embedded
processors (Motorola’s MPC50x, MPC8x0, MPC82x, and
IBM’s 400 series).
The PowerPC 750 contains seven parallel-operating exe-
cution units: two integer units, a branch-processing unit, a
load/store unit (LSU), a floating-point unit (FPU), a
condition-register unit, and an L2-cache-interface unit. (The 740 is
the lower-cost version of the 750 and lacks the L2-cache-interface
unit.) The 750 can fetch as many as four instructions
per cycle. It processes branches as they enter the
instruction buffer and can decode and dispatch two
nonbranch instructions in one cycle. Completion logic keeps track of the
outstanding instructions and retires them in order.
The PowerPC 750 µP uses static or dynamic branch
prediction to improve the accuracy of instruction prefetching.
For static prediction, the branch-operation codes provide
hints to predict whether a branch is taken or not. For dynam-
ic prediction, the CPU uses a 512-entry branch-history table
and a 64-entry branch-target instruction cache (BTIC). The CPU permits
speculative execution down a predicted path beyond one
unresolved branch.
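
As a rough illustration of how a branch-history table of this size can
drive dynamic prediction, the sketch below models a 512-entry table of
2-bit saturating counters. The counter width, the initial counter value,
the indexing by low-order instruction-address bits, and all names
(BranchHistoryTable, run_loop_branch, the address 0x1000) are assumptions
made for the example and are not taken from the text.

```python
BHT_ENTRIES = 512  # table size quoted in the text


class BranchHistoryTable:
    """Hypothetical 2-bit saturating-counter predictor, for illustration only."""

    def __init__(self, entries=BHT_ENTRIES):
        self.entries = entries
        # Counter values 0-1 predict "not taken"; 2-3 predict "taken".
        # Starting every counter at 1 is an arbitrary choice for this sketch.
        self.counters = [1] * entries

    def _index(self, address):
        # Assumption: instructions are 4 bytes wide, so the two low address
        # bits are dropped and the next bits select the table entry.
        return (address >> 2) % self.entries

    def predict(self, address):
        """Return True if the branch at this address is predicted taken."""
        return self.counters[self._index(address)] >= 2

    def update(self, address, taken):
        """Move the counter one step toward the actual outcome."""
        i = self._index(address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def run_loop_branch(iterations):
    """Count mispredictions for a loop-closing branch.

    The branch is taken on every iteration except the last one.
    """
    bht = BranchHistoryTable()
    address = 0x1000  # hypothetical branch address
    mispredicts = 0
    for i in range(iterations):
        taken = i < iterations - 1
        if bht.predict(address) != taken:
            mispredicts += 1
        bht.update(address, taken)
    return mispredicts
```

In this model a loop branch costs one misprediction while the counter
warms up and one more when the loop exits, regardless of the loop length.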
The 750 has separate 32-kbyte instruction and data caches.
Both eight-way, set-associative, lockable caches provide byte-
level parity checking. A locked cache typically supplies data
on a hit, but cache lines are not replaced on a miss. The 750
contains an on-chip L2-cache controller and backside L2 bus,
which improves system performance by reducing system-bus
traffic. The L2-cache controller includes 8192 tag entries,
which support 256 kbytes, 512 kbytes, or 1 Mbyte of exter-
nal, two-way, set-associative, unified L2 cache. The L2 cache
uses standard, commodity SRAMs. The nonblocking L2 cache
supports hit-under-miss mode and can simultaneously ser-
vice as many as four requests. The L2-cache bus can operate
at various speeds relative to the processor frequency.
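
The cache figures above imply a particular geometry, which the following
sketch works out. It assumes a 32-byte cache line and a power-of-two tag
count of 8192, neither of which is stated in the text, and the function
names are made up for the example. The last function is only a division:
it shows the smallest span each L2 tag would have to cover if every tag
entry were in use, and says nothing about how the real part organizes its
L2 lines.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache.

    All three inputs are assumed to be powers of two.
    """
    sets = size_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1  # log2 of the line size
    index_bits = sets.bit_length() - 1         # log2 of the set count
    return sets, offset_bits, index_bits


def split_address(address, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


def min_bytes_per_tag(l2_bytes, tag_entries=8192):
    """Smallest span one tag must cover if all tag entries are used."""
    return l2_bytes // tag_entries


# L1: 32 kbytes and eight ways come from the text; the 32-byte line is assumed.
L1_SETS, L1_OFFSET_BITS, L1_INDEX_BITS = cache_geometry(32 * 1024, 8, 32)

# L2: the three supported external cache sizes from the text.
L2_SPANS = [min_bytes_per_tag(kb * 1024) for kb in (256, 512, 1024)]
```

Under these assumptions the L1 caches have 128 sets each, selected by
seven index bits above a five-bit line offset, and an L2 tag covers at
least 32, 64, or 128 bytes for the 256-kbyte, 512-kbyte, and 1-Mbyte
configurations.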
The PowerPC 604e contains seven independent execution
units: two single-cycle integer units, a multiple-cycle integer
unit, a branch-processing unit, an LSU, an FPU, and a
condition-register unit. Instructions execute out of order, and
execution results can be immediately available to subsequent
instructions through the use of rename registers. The com-
pletion unit commits, or “retires,” results to floating-point or
general-purpose registers. The unit retires as many as four
instructions per clock cycle in order, ensuring a precise excep-
tion model.
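
The interplay of out-of-order execution and in-order retirement described
above can be sketched with a small simulation. This is a simplified model
rather than a description of the 604e's actual completion logic: the
function name, the rule that an instruction becomes eligible to retire in
the cycle after it finishes, and the sample finish times are all
assumptions made for the example. Only the limit of four retirements per
cycle comes from the text.

```python
from collections import deque

RETIRE_WIDTH = 4  # maximum instructions retired per cycle, from the text


def simulate_retirement(finish_cycles, retire_width=RETIRE_WIDTH):
    """Return the cycle in which each instruction retires.

    finish_cycles[i] is the cycle in which instruction i (numbered in
    program order) finishes executing. Instructions may finish in any
    order, but they retire strictly in program order.
    """
    pending = deque(enumerate(finish_cycles))
    retire_cycle = [None] * len(finish_cycles)
    cycle = 0
    while pending:
        cycle += 1
        retired = 0
        # Only the oldest pending instruction may retire. If it has not
        # finished yet, every younger instruction waits behind it.
        while pending and retired < retire_width and pending[0][1] < cycle:
            index, _ = pending.popleft()
            retire_cycle[index] = cycle
            retired += 1
    return retire_cycle
```

For example, simulate_retirement([3, 1, 1, 2, 1, 1]) returns
[4, 4, 4, 4, 5, 5]: the five younger instructions finish before the first
one, yet none of them retires until it does, and the four-per-cycle limit
pushes the last two into the following cycle. Holding results back this
way is what lets an exception be reported with all earlier instructions
complete and no later ones committed.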
The PowerPC 604e µP uses dynamic branch prediction to
improve the accuracy of instruction prefetching. This feature
and the ability to speculatively execute through two unre-
solved branches minimize pipeline stalls. The 604e has sepa-
rate 32-kbyte, four-way, set-associative instruction and data
caches, both of which provide byte-level parity checking.
IBM/Motorola PowerPC