RM0046 e200z0 and e200z0h Core
Doc ID 16912 Rev 5 271/936
● Testability
– Synthesizable, full MuxD scan design
– ABIST/MBIST for optional memory arrays
12.2.1 Microarchitecture summary
The e200z0 processor uses a four-stage pipeline for instruction execution. The Instruction
Fetch (stage 1), Instruction Decode/Register File Read/Effective Address Calculation (stage
2), Execute/Memory Access (stage 3), and Register Writeback (stage 4) stages operate in
an overlapped fashion, allowing single-clock instruction execution for most instructions.
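The overlap described above can be illustrated with a small behavioural model. This is only a sketch of an ideal four-stage pipeline with no stalls; the stage labels and the function name `pipeline_schedule` are illustrative assumptions, not part of the core's documentation.

```python
# Idealized model: every instruction spends exactly one clock in each stage
# and nothing ever stalls. Real code will see multi-cycle instructions and
# taken-branch penalties that this sketch ignores.
STAGES = ("fetch", "decode", "execute", "writeback")


def pipeline_schedule(n_instructions):
    """Return one dict per clock mapping stage name -> instruction index (or None)."""
    total_clocks = n_instructions + len(STAGES) - 1
    schedule = []
    for clock in range(total_clocks):
        row = {}
        for stage_index, stage in enumerate(STAGES):
            instr = clock - stage_index  # instruction currently in this stage
            row[stage] = instr if 0 <= instr < n_instructions else None
        schedule.append(row)
    return schedule
```

Under these assumptions, N instructions complete in N + 3 clocks: three clocks to fill the pipeline, then one instruction retiring per clock.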
The integer execution unit consists of a 32-bit Arithmetic Unit (AU), a Logic Unit (LU), a
32-bit Barrel shifter (Shifter), a Mask-Insertion Unit (MIU), a Condition Register manipulation
Unit (CRU), a Count-Leading-Zeros unit (CLZ), an 8 × 32 Hardware Multiplier array, result
feed-forward hardware, and a hardware divider.
Arithmetic and logical operations are executed in a single cycle with the exception of the
divide and multiply instructions. The Count-Leading-Zeros unit also operates in a single clock cycle.
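For readers unfamiliar with the operation the CLZ unit performs, the following is a minimal software model of a 32-bit count-leading-zeros. The function name is hypothetical, and returning 32 for a zero input is the convention assumed by this sketch.

```python
def count_leading_zeros32(value):
    """Number of zero bits above the most significant set bit of a 32-bit word."""
    value &= 0xFFFFFFFF  # treat the input as an unsigned 32-bit quantity
    # int.bit_length() is the position of the highest set bit (0 for value == 0)
    return 32 - value.bit_length()
```

For example, `count_leading_zeros32(0x00010000)` yields 15, since bit 16 is the highest bit set.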
The Instruction Unit contains a PC incrementer and a dedicated Branch Address adder to
minimize delays during change of flow operations. Sequential prefetching is performed to
ensure a supply of instructions into the execution pipeline. Branch target prefetching from
the BTB is performed to accelerate certain taken branches in the e200z0h. Prefetched
instructions are placed into an instruction buffer with 4 entries (2 entries in e200z0), each
capable of holding a single 32-bit instruction or a pair of 16-bit instructions.
Conditional branches that are not taken execute in a single clock. Branches with successful
target prefetching have an effective execution time of one clock on e200z0h. All other taken
branches have an execution time of two clocks.
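The branch timings above can be condensed into a small lookup, which may be handy for rough cycle estimates. This is a sketch only: the function and parameter names are hypothetical, and it covers just the three cases stated in the text.

```python
def branch_clocks(taken, target_prefetched=False, core="e200z0h"):
    """Estimated execution time of a branch, in clocks, per the cases in the text."""
    if not taken:
        return 1  # not-taken conditional branch
    if target_prefetched and core == "e200z0h":
        return 1  # taken branch with successful target prefetch (e200z0h only)
    return 2      # all other taken branches
```

A loop-closing branch executed 100 times and taken 99 times without target prefetching would, on this estimate, cost 99 × 2 + 1 = 199 clocks in branch execution.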
Memory load and store operations are provided for byte, halfword, and word (32-bit) data
with automatic zero or sign extension of byte and halfword load data as well as optional byte
reversal of data. These instructions can be pipelined to allow effective single-cycle
throughput. Load and store multiple word instructions allow low-overhead context save and
restore operations. The load/store unit contains a dedicated effective address adder to allow
effective address generation to be optimized. Also, a load-to-use dependency does not incur
any pipeline bubbles in most cases.
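The data transformations mentioned above (zero or sign extension to 32 bits, and byte reversal) can be modelled in software as follows. This is an illustrative sketch; the helper names `extend_load` and `byte_reverse` are assumptions and do not correspond to specific instructions.

```python
def extend_load(raw, size_bytes, signed):
    """Extend a 1- or 2-byte loaded value to a 32-bit register image."""
    bits = size_bytes * 8
    raw &= (1 << bits) - 1              # keep only the loaded bytes
    if signed and raw & (1 << (bits - 1)):
        raw -= 1 << bits                # replicate the sign bit upward
    return raw & 0xFFFFFFFF             # result as an unsigned 32-bit pattern


def byte_reverse(value, size_bytes):
    """Swap the byte order of a halfword (2) or word (4)."""
    value &= (1 << (size_bytes * 8)) - 1
    return int.from_bytes(value.to_bytes(size_bytes, "big"), "little")
```

For instance, a byte `0x80` becomes `0x00000080` with zero extension and `0xFFFFFF80` with sign extension, and the word `0x12345678` byte-reverses to `0x78563412`.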
The Condition Register unit supports the condition register (CR) and condition register
operations defined by the Power Architecture. The condition register consists of
eight 4-bit fields that reflect the results of certain operations, such as move, integer and
floating-point compare, arithmetic, and logical instructions, and provide a mechanism for
testing and branching.
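The layout of the eight 4-bit fields can be sketched as below. The sketch assumes the general Power Architecture convention, which this section does not spell out: field CR0 occupies the most significant four bits of the 32-bit CR, and after an integer compare the bits of a field are, from most to least significant, LT, GT, EQ, and SO. The function names are hypothetical.

```python
def cr_field(cr, n):
    """Extract 4-bit field CRn (n = 0..7) from a 32-bit CR value; CR0 is the top nibble."""
    if not 0 <= n <= 7:
        raise ValueError("CR field index must be 0..7")
    return (cr >> (28 - 4 * n)) & 0xF


def decode_compare_field(field):
    """Name the bits of a CR field as set by an integer compare (assumed layout)."""
    return {
        "lt": bool(field & 0x8),  # less than
        "gt": bool(field & 0x4),  # greater than
        "eq": bool(field & 0x2),  # equal
        "so": bool(field & 0x1),  # summary overflow
    }
```

With a CR value of `0x20000000`, `cr_field(cr, 0)` returns 2, which decodes to EQ set: the condition a branch-if-equal would test.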
Vectored and autovectored interrupts are supported by the CPU. Vectored interrupt support
is provided to allow multiple interrupt sources to have unique interrupt handlers invoked with
no software overhead.