1.4.1.1 Instruction Cache Unit
The ICU provides one or two instructions per cycle to the execution unit (EXU) over a 64-bit bus. A
line buffer (built into the output of the array for manufacturing test) enables the ICU to be accessed
only once for every four instructions, to reduce power consumption by the array.
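The effect of the line buffer on array accesses can be illustrated with a small model. The Python sketch below is not a description of the actual ICU logic. It assumes a hypothetical buffer that covers four consecutive 4-byte instructions aligned on a 16-byte boundary, and it counts how many array reads a fetch stream would need. The names `count_array_accesses` and `BUFFER_INSTRUCTIONS` are illustrative only.

```python
INSTRUCTION_BYTES = 4      # PowerPC instructions are 32 bits wide
BUFFER_INSTRUCTIONS = 4    # assumption: the buffer covers four instructions


def count_array_accesses(fetch_addresses):
    """Count cache-array reads when a line buffer serves repeated hits."""
    buffer_bytes = INSTRUCTION_BYTES * BUFFER_INSTRUCTIONS
    buffered_block = None  # block currently held in the line buffer
    accesses = 0
    for address in fetch_addresses:
        block = address // buffer_bytes
        if block != buffered_block:
            accesses += 1  # buffer miss: read the array and refill the buffer
            buffered_block = block
    return accesses


# Sixteen sequential instruction fetches starting at an aligned address.
sequential = [0x1000 + INSTRUCTION_BYTES * i for i in range(16)]
print(count_array_accesses(sequential))
```

Under these assumptions, the sixteen sequential fetches need four array reads, one for every four instructions. A branch out of the buffered block forces an additional read.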
The ICU can forward any or all of the words of a line fill to the EXU to minimize pipeline stalls caused
by cache misses. The ICU aborts speculative fetches abandoned by the EXU, eliminating unnecessary
line fills and enabling the ICU to handle the next EXU fetch. Aborting abandoned requests
also eliminates unnecessary PLB activity to increase PLB availability for other on-chip cores,
such as the DMA and Ethernet controllers.
1.4.1.2 Data Cache Unit
The DCU transfers 1, 2, 3, 4, or 8 bytes per cycle, depending on the number of byte enables
presented by the CPU. The DCU contains a single-element command and store data queue to reduce
pipeline stalls; this queue enables the DCU to independently process load/store and cache control
instructions. Dynamic PLB request prioritization reduces pipeline stalls even further. When the DCU is
busy with a low-priority request while a subsequent storage operation requested by the CPU is
stalled, the DCU automatically increases the priority of the current request to the PLB.
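The relationship between byte enables and transfer size can be sketched as follows. This is a minimal Python model, assuming the byte enables form an 8-bit mask with one bit per byte lane. The mask encoding and the function name `bytes_transferred` are assumptions made for illustration and are not taken from the DCU interface definition.

```python
VALID_TRANSFER_SIZES = {1, 2, 3, 4, 8}  # transfer sizes listed in the text


def bytes_transferred(byte_enables):
    """Return the transfer size implied by an 8-bit byte-enable mask."""
    if not 0 <= byte_enables <= 0xFF:
        raise ValueError("byte-enable mask must fit in 8 bits")
    size = bin(byte_enables).count("1")  # assumption: one enable bit per byte
    if size not in VALID_TRANSFER_SIZES:
        raise ValueError(f"unsupported transfer size: {size}")
    return size
```

In this model, a mask with four bits set describes a word transfer and a mask of `0xFF` describes a doubleword transfer. Masks with 5, 6, or 7 bits set are rejected because the text lists no such transfer sizes.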
The DCU uses a two-line flush queue to minimize pipeline stalls caused by cache misses. Line
flushes are postponed until after a line fill is completed. Registers comprise the first position of the
flush queue; the line buffer built into the output of the array for manufacturing test serves as the
second position of the flush queue. Pipeline stalls are further reduced by forwarding the requested
word to the CPU during the line fill. Single-queued flushes are non-blocking. When a flush operation is
pending, the DCU can continue to access the array to determine subsequent load or store hits. Under
these conditions, load hits can occur concurrently with store hits to write-back memory without stalling
the pipeline. Requests abandoned by the CPU can also be aborted by the cache controller.
Additional DCU features enable the programmer to tailor performance for a given application. The
DCU can function in write-back or write-through mode, as controlled by the Data Cache Write-through
Register (DCWR) or the translation look-aside buffer (TLB). DCU performance can be tuned to
balance performance and memory coherency. Store-without-allocate, controlled by the SWOA field of
the Core Configuration Register 0 (CCR0), can inhibit line fills caused by store misses to further
reduce potential pipeline stalls and unwanted external bus traffic. Similarly, load-without-allocate,
controlled by CCR0[LWOA], can inhibit line fills caused by load misses.
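The allocation controls described above can be summarized as a small decision function. The Python sketch below models only the effect of CCR0[SWOA] and CCR0[LWOA] on a miss to cacheable storage. It passes the two fields as plain booleans rather than decoding a register value, and the function name `miss_causes_line_fill` is hypothetical.

```python
def miss_causes_line_fill(is_store, swoa, lwoa):
    """Return True if a data cache miss triggers a line fill.

    is_store: True for a store miss, False for a load miss.
    swoa:     store-without-allocate enabled (models CCR0[SWOA]).
    lwoa:     load-without-allocate enabled (models CCR0[LWOA]).
    """
    if is_store:
        return not swoa  # SWOA inhibits line fills caused by store misses
    return not lwoa      # LWOA inhibits line fills caused by load misses
```

For example, with store-without-allocate enabled and load-without-allocate disabled, a store miss does not start a line fill while a load miss still does.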
1.4.2 Memory Management Unit
The 4GB address space of the PPC405GP is presented as a flat address space.
The MMU provides address translation, protection functions, and storage attribute control for
embedded applications. The MMU supports demand paged virtual memory and other management
schemes that require precise control of logical to physical address mapping and flexible memory
protection. Working with appropriate system-level software, the MMU provides the following functions:
• Translation of the 4GB logical address space into physical addresses
• Independent enabling of instruction and data translation/protection
• Page level access control using the translation mechanism
• Software control of page replacement strategy
• Additional control over protection using zones
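The translation and page-level access control functions listed above can be sketched in a few lines. The Python model below is a simplification under stated assumptions: it uses a fixed 4 KB page size, represents the TLB as a dictionary from logical page number to a `(physical_page, writable)` pair, and omits zones entirely. The names `translate`, `TranslationMiss`, and `ProtectionViolation` are hypothetical and do not correspond to PPC405GP registers or exceptions.

```python
PAGE_SHIFT = 12                    # assumption: fixed 4 KB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1  # offset bits within a page


class TranslationMiss(Exception):
    """No entry maps the page; system software would load one here."""


class ProtectionViolation(Exception):
    """The access is not permitted by the page's access control."""


def translate(tlb, logical_address, is_write):
    """Map a 32-bit logical address to a physical address."""
    page = logical_address >> PAGE_SHIFT
    entry = tlb.get(page)
    if entry is None:
        raise TranslationMiss(hex(logical_address))
    physical_page, writable = entry
    if is_write and not writable:
        raise ProtectionViolation(hex(logical_address))
    return (physical_page << PAGE_SHIFT) | (logical_address & PAGE_MASK)


# One read-only mapping: logical page 0x10000 to physical page 0x00200.
tlb = {0x10000: (0x00200, False)}
print(hex(translate(tlb, 0x10000ABC, is_write=False)))
```

In this model, the miss exception marks the point where system software would choose which entry to replace, which is how software control of the page replacement strategy would come into play.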