AMCC Proprietary 431
Revision 1.02 - September 10, 2007
PPC405 Processor
Preliminary User’s Manual
Moving new code and data into the cache arrays occurs at the speed of external memory. Much faster
execution is possible when all code and data are available in the cache. Organizing code to use address bits
A(m:26) uniformly minimizes the use of congruent addresses.
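To make “congruent addresses” concrete, the sketch below extracts the A(m:26) field of an address. It assumes IBM bit numbering (bit 0 is the most significant bit of a 32-bit address), so bits 27:31 are the byte offset within a line and A(m:26) is the (27 - m)-bit field directly above it. The function name `congruence_class` is hypothetical, and m depends on the cache configuration; m = 19 below is only an example value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return address bits A(m:26) of a 32-bit address.
 * IBM bit numbering is assumed: bit 0 is the MSB and bit 31 the LSB, so
 * bits 27:31 are the byte offset within a line and bits m:26 are the
 * (27 - m) bits directly above that offset. */
static uint32_t congruence_class(uint32_t addr, unsigned m)
{
    assert(m >= 1 && m <= 26);          /* keep 1..26 index bits */
    unsigned index_bits = 27u - m;      /* width of the A(m:26) field */
    return (addr >> 5) & ((1u << index_bits) - 1u);
}
```

With m = 19, addresses 0x1000 and 0x3000 (8 KB apart) both map to 0x80 and therefore compete for the same congruence class, while 0x1020 maps to 0x81. Spreading hot code and data across different values of this field is what “uniform use” means in practice.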
C.1.4 CR Dependencies
For CR-setting arithmetic, compare, CR-logical, and logical instructions, and for the CR-setting mcrf, mcrxr, and
mtcrf instructions, place two instructions between the CR-setting instruction and any branch instruction that uses a
bit in the CR field set by the CR-setting instruction.
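C code cannot place individual instructions, so the sketch below models the rule instead: a hypothetical checker that scans a straight-line instruction sequence and counts branches that test a CR field written fewer than two instructions earlier. The names `insn_t`, `count_cr_hazards`, and `CR_BRANCH_GAP` are illustrative only, and the model assumes each instruction writes at most one CR field (a simplification for mtcrf, which can write several).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instruction record for a scheduling check.
 * cr_set:  CR field (0-7) written by the instruction, or -1 if none.
 * cr_read: CR field (0-7) tested by a conditional branch, or -1. */
typedef struct {
    int cr_set;
    int cr_read;
} insn_t;

/* Instructions to place between a CR-setting instruction and a branch
 * that tests a bit in the same CR field (C.1.4). */
#define CR_BRANCH_GAP 2

/* Count branches that test a CR field written by one of the
 * CR_BRANCH_GAP instructions immediately before them. */
static size_t count_cr_hazards(const insn_t *code, size_t n)
{
    assert(code != NULL || n == 0);
    size_t hazards = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].cr_read < 0)
            continue;                   /* not a CR-testing branch */
        /* Distance d = 1 means adjacent; d = 2 means one in between. */
        for (size_t d = 1; d <= CR_BRANCH_GAP && d <= i; d++) {
            if (code[i - d].cr_set == code[i].cr_read) {
                hazards++;
                break;
            }
        }
    }
    return hazards;
}
```

For example, a compare that sets CR0 followed directly by a branch on CR0 counts as one hazard; moving two independent instructions between them brings the count to zero.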
C.1.5 Branch Prediction
Use the y bit in conditional branch instructions to force the correct prediction when the outcome opposite to the
standard prediction is more likely. See “Branch Prediction” on page 52 for more information about branch prediction.
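The sketch below models how the y bit combines with the standard prediction. It assumes the static rule of the 32-bit PowerPC architecture (a bc with a negative displacement, that is, a backward branch, is predicted taken; every other conditional branch, including bclr and bcctr, is predicted not taken; setting the y bit reverses that default). Confirm the PPC405 specifics against “Branch Prediction” on page 52. The names `bc_form_t` and `predict_taken` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Conditional-branch forms (hypothetical enum for this sketch). */
typedef enum {
    BC_DISPLACEMENT,   /* bc, bca, bcl, bcla: target from a displacement */
    BC_TO_LR,          /* bclr, bclrl: target from the Link Register     */
    BC_TO_CTR          /* bcctr, bcctrl: target from the Count Register  */
} bc_form_t;

/* Predicted direction of a conditional branch under the assumed standard
 * rule; the y bit (BO[4]) reverses the standard prediction. Branch-always
 * encodings ignore the y bit and are not modelled here. */
static bool predict_taken(bc_form_t form, int32_t displacement, bool y_bit)
{
    assert(form == BC_DISPLACEMENT || form == BC_TO_LR || form == BC_TO_CTR);
    bool standard = (form == BC_DISPLACEMENT) && (displacement < 0);
    return y_bit ? !standard : standard;
}
```

A loop-closing backward branch needs no hint. A forward branch that is usually taken (for example, one that skips rarely executed code) is the typical case where setting the y bit pays off.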
C.1.6 Alignment
For speed, align all accesses on the appropriate operand-size boundary; for example, word load/store operands
should be word-aligned. The hardware does not trap unaligned accesses; instead, a load or store of an unaligned
operand that crosses a word boundary is performed as two accesses. Unaligned accesses that do not cross a word
boundary are performed in one access.
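The rule above reduces to a single test on the operand's byte offset within its word. The sketch below (hypothetical function name `access_count`) returns the number of accesses for a byte, halfword, or word load or store, following only the word-boundary rule stated in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Accesses needed for a scalar load/store of `size` bytes (1, 2, or 4)
 * at `addr`: one, unless the operand crosses a 4-byte word boundary,
 * in which case the hardware performs two. */
static unsigned access_count(uint32_t addr, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4);
    unsigned offset_in_word = addr & 3u;   /* byte position within the word */
    return (offset_in_word + size > 4u) ? 2u : 1u;
}
```

A word at 0x1001 needs two accesses, but a halfword at 0x1001 is unaligned yet stays within one word, so it needs only one.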
Align branch targets that are unlikely to be reached by “fall-through” code (for example, the entry points of
functions such as strcpy) on cache-line boundaries, to minimize the number of unused instructions fetched in
cache line fills.
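The cost being avoided is easy to quantify. The sketch below assumes 32-byte cache lines holding eight 4-byte instructions (an assumption stated in the constants) and counts the instructions a line fill brings in ahead of a branch target that is only ever entered by a branch. The names `unused_before_target` and `align_to_line` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 32-byte lines, 4-byte instructions. */
#define LINE_BYTES 32u
#define INSN_BYTES 4u

/* Instructions fetched by the line fill that lie before the target and
 * are therefore never executed when the line is entered by a branch. */
static unsigned unused_before_target(uint32_t target)
{
    assert(target % INSN_BYTES == 0);   /* instructions are word-aligned */
    return (target % LINE_BYTES) / INSN_BYTES;
}

/* Round an address up to the next cache-line boundary. */
static uint32_t align_to_line(uint32_t addr)
{
    return (addr + LINE_BYTES - 1u) & ~(LINE_BYTES - 1u);
}
```

A function entry at 0x2018 wastes six of the eight instructions in its first line fill; placing it at `align_to_line(0x2018)`, which is 0x2020, wastes none. With GCC, the `-falign-functions=32` option requests this kind of alignment for function entry points.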
C.2 Instruction Timings
The following timing descriptions consider only “first-order” effects of cache misses in the ICU (instruction-side) and
DCU (data-side) arrays.
The timing descriptions do not fully account for the performance penalty associated with cache misses: they do not
consider bus contention between the instruction side and the data side, or the time spent performing line fills or
flushes. Unless specifically stated otherwise, the cycle counts apply to systems with zero-wait-state memory access.
C.2.1 General Rules
- Instructions execute in order.
- All instructions, assuming cache hits, execute in one cycle, except:
  - Divide instructions execute in 35 clock cycles.
  - Branches execute in one or three clock cycles, as described in “Branches.”
  - MAC and multiply instructions execute in one to five cycles, as described in “Multiplies.”
  - Aligned load/store instructions that hit in the cache execute in one clock cycle per word. See “Alignment” for
    information on execution timings for unaligned loads and stores.
  - In isolation, a data cache control instruction takes two cycles in the processor pipeline. However, subsequent
    DCU accesses are stalled until the cache control instruction finishes accessing the data cache array.
Note: Subsequent DCU accesses do not remain stalled while transfers associated with previous data cache control
instructions continue on the PLB.
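The rules above can be turned into a rough cycle estimate for a straight-line sequence. This is a crude first-order sketch, not a pipeline model: it assumes in-order execution, all cache hits, zero-wait-state memory, and simply adds per-instruction cycle counts, ignoring any overlap and the DCU stall behavior described above. Branch and multiply costs must be supplied by the caller because their exact values depend on “Branches” and “Multiplies.” All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Instruction classes from C.2.1 (hypothetical enum for this sketch). */
typedef enum {
    CLS_SIMPLE,        /* everything not listed below: 1 cycle         */
    CLS_DIVIDE,        /* divide: 35 cycles                             */
    CLS_BRANCH,        /* 1 or 3 cycles, per "Branches"                 */
    CLS_MULTIPLY,      /* MAC/multiply: 1 to 5 cycles, per "Multiplies" */
    CLS_LOAD_STORE,    /* aligned cache hit: 1 cycle per word           */
    CLS_DCACHE_CTRL    /* data cache control, in isolation: 2 cycles    */
} insn_class_t;

typedef struct {
    insn_class_t cls;
    unsigned hint;     /* branch/multiply cycles, or words for a load/store */
} timed_insn_t;

/* Sum of per-instruction cycle counts under the assumptions above. */
static unsigned long estimate_cycles(const timed_insn_t *seq, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++) {
        switch (seq[i].cls) {
        case CLS_DIVIDE:
            total += 35;
            break;
        case CLS_BRANCH:
            assert(seq[i].hint == 1 || seq[i].hint == 3);
            total += seq[i].hint;
            break;
        case CLS_MULTIPLY:
            assert(seq[i].hint >= 1 && seq[i].hint <= 5);
            total += seq[i].hint;
            break;
        case CLS_LOAD_STORE:
            assert(seq[i].hint >= 1);     /* words transferred */
            total += seq[i].hint;
            break;
        case CLS_DCACHE_CTRL:
            total += 2;
            break;
        case CLS_SIMPLE:
        default:
            total += 1;
            break;
        }
    }
    return total;
}
```

For a sequence of one simple instruction, a divide, a three-cycle branch, a five-cycle multiply, a two-word aligned load, and a data cache control instruction, the estimate is 1 + 35 + 3 + 5 + 2 + 2 = 48 cycles. Real timings can be longer once cache misses, line fills, and bus contention are considered, as the section notes.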