AMCC PPC405 - C.1.4 CR Dependencies; C.1.5 Branch Prediction; C.1.6 Alignment; C.2 Instruction Timings


AMCC Proprietary 431

Revision 1.02 - September 10, 2007

PPC405 Processor

Preliminary User’s Manual

Moving new code and data into the cache arrays occurs at the speed of external memory. Much faster execution is possible when all code and data are available in the cache. Organizing code to use the cache congruence classes uniformly minimizes the use of congruent addresses.
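Congruent addresses select the same congruence class (set) of a set-associative cache, so they compete for the same few lines and evict one another. The sketch below shows one way to check a layout for this. It assumes a hypothetical 16 KB, two-way set-associative array with 32-byte lines; substitute the geometry of your configuration. The names `congruence_class`, `congruent`, and `max_class_load` are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed cache geometry for this sketch (not taken from this section). */
#define CACHE_BYTES 16384u
#define CACHE_WAYS  2u
#define LINE_BYTES  32u
#define NUM_CLASSES (CACHE_BYTES / (CACHE_WAYS * LINE_BYTES)) /* 256 */

/* Congruence class (set index) selected by a byte address. */
uint32_t congruence_class(uint32_t addr)
{
    return (addr / LINE_BYTES) % NUM_CLASSES;
}

/* Two addresses are congruent when they map to the same class. */
bool congruent(uint32_t a, uint32_t b)
{
    return congruence_class(a) == congruence_class(b);
}

/* Given the addresses of n distinct cache lines, return how many of them
 * land in the busiest congruence class. A result above CACHE_WAYS means
 * those lines cannot all be resident at the same time. */
unsigned max_class_load(const uint32_t *line_addrs, unsigned n)
{
    unsigned counts[NUM_CLASSES] = {0};
    unsigned max = 0;
    for (unsigned i = 0; i < n; i++) {
        unsigned c = ++counts[congruence_class(line_addrs[i])];
        if (c > max)
            max = c;
    }
    return max;
}
```

With this geometry, addresses that differ by a multiple of 8 KB are congruent, so three hot routines placed at 0x0000, 0x2000, and 0x4000 would all contend for the same two ways.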

C.1.4 CR Dependencies

For CR-setting arithmetic, compare, CR-logical, and logical instructions, and for the CR-setting instructions mcrf, mcrxr, and mtcrf, put two instructions between the CR-setting instruction and a branch instruction that uses a bit in the CR field set by the CR-setting instruction.
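The spacing rule above can be checked mechanically over an instruction sequence. The following is a minimal sketch using a hypothetical, simplified instruction record (`struct insn`) that notes which CR field, if any, each instruction writes and which CR field a conditional branch reads; it is not a real assembler or compiler API.

```c
#include <stddef.h>

/* Hypothetical instruction record for a scheduling check. */
struct insn {
    int sets_cr_field;   /* CR field (0-7) written, or -1 if none */
    int branch_cr_field; /* CR field read by a conditional branch, or -1 */
};

/* Instructions the guideline asks for between a CR setter and the branch. */
#define CR_BRANCH_GAP 2

/* Count branches that read a CR field written too recently. A setter at
 * distance d before the branch leaves d - 1 instructions in between, so
 * any setter with d <= CR_BRANCH_GAP violates the guideline. */
size_t count_cr_hazards(const struct insn *code, size_t n)
{
    size_t hazards = 0;
    for (size_t i = 0; i < n; i++) {
        int field = code[i].branch_cr_field;
        if (field < 0)
            continue; /* not a branch that reads the CR */
        for (size_t d = 1; d <= CR_BRANCH_GAP && d <= i; d++) {
            if (code[i - d].sets_cr_field == field) {
                hazards++;
                break; /* one report per branch */
            }
        }
    }
    return hazards;
}
```

For example, a compare into cr0, one unrelated instruction, and then a branch on cr0 is flagged; inserting a second independent instruction before the branch clears it.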

C.1.5 Branch Prediction

Use the Y-bit in branch instructions to force the correct branch prediction when a prediction other than the standard prediction is more likely to be correct. See Branch Prediction on page 52 for more information about branch prediction.
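The decision of when to set the Y-bit can be sketched as below. This section does not restate the standard prediction, so the sketch assumes the classic PowerPC static convention: a conditional branch with a negative displacement is predicted taken, one with a non-negative displacement or a target in LR or CTR is predicted not taken, and setting the y bit (the low-order bit of the BO field in the classic encoding) reverses that prediction. Check Branch Prediction on page 52 before relying on these rules. The enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of conditional branch forms. */
enum branch_form { BC_DISP, BC_TO_LR, BC_TO_CTR };

/* Prediction under the ASSUMED standard rule, reversed by the y bit. */
bool predict_taken(enum branch_form form, int32_t displacement, bool y_bit)
{
    bool standard = (form == BC_DISP) && (displacement < 0);
    return y_bit ? !standard : standard;
}

/* Set the y bit only when the expected outcome differs from the
 * standard prediction. */
bool should_set_y(enum branch_form form, int32_t displacement,
                  bool likely_taken)
{
    return predict_taken(form, displacement, false) != likely_taken;
}
```

A loop-closing branch (backward, usually taken) needs no hint under this convention, while a forward branch to a rarely skipped block that is usually taken does.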

C.1.6 Alignment

For speed, align all accesses on the appropriate operand-size boundary. For example, load/store word operands should be word-aligned, halfword operands halfword-aligned, and so on. Hardware does not trap unaligned accesses; instead, a load or store of an unaligned operand that crosses a word boundary is performed as two accesses. Unaligned accesses that do not cross a word boundary are performed in one access.
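The access-count rule above reduces to a check on the operand's byte offset within its word. This is a minimal sketch for byte, halfword, and word operands (sizes 1, 2, and 4); the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of the operand size (size is 1, 2, or 4). */
bool is_aligned(uint32_t addr, unsigned size)
{
    return (addr & (size - 1u)) == 0u;
}

/* Accesses needed for a size-byte load/store at addr: one, unless the
 * operand crosses a word (4-byte) boundary, in which case two. */
unsigned access_count(uint32_t addr, unsigned size)
{
    unsigned offset = addr & 3u;           /* byte offset within the word */
    return (offset + size > 4u) ? 2u : 1u; /* spills into the next word? */
}
```

Note that a halfword at byte offset 1 is unaligned but stays within one word, so it still takes a single access, whereas a word at any non-zero offset always crosses a boundary.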

Align branch targets that are unlikely to be reached by “fall-through” code (for example, the entry points of functions such as strcpy) on cache line boundaries, to minimize the number of unused instructions fetched by cache line fills.
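The benefit of aligning such targets can be quantified by counting the instructions that a line fill brings in ahead of the target. The sketch below assumes a 32-byte cache line (a power of two) and uses the fact that PowerPC instructions are 4 bytes long; the helper names are hypothetical.

```c
#include <stdint.h>

#define LINE_BYTES 32u /* assumed cache line size, a power of two */
#define INSN_BYTES 4u  /* PowerPC instructions are 4 bytes */

/* Round an address up to the next cache line boundary. */
uint32_t align_to_line(uint32_t addr)
{
    return (addr + LINE_BYTES - 1u) & ~(LINE_BYTES - 1u);
}

/* Instructions in the filled line that precede the target; when the
 * target is entered only by a branch, these are fetched but not used. */
unsigned unused_insns_in_fill(uint32_t target)
{
    return (target & (LINE_BYTES - 1u)) / INSN_BYTES;
}
```

A function entry at offset 0x1C within a line wastes seven of the eight fetched instructions; moving it to the next line boundary wastes none. In GNU as, a directive such as `.balign 32` before the label performs this alignment.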

C.2 Instruction Timings

The following timing descriptions consider only “first order” effects of cache misses in the ICU (instruction-side) and DCU (data-side) arrays. They do not fully describe the performance penalty associated with cache misses: they do not consider bus contention between the instruction side and the data side, or the time associated with performing line fills or flushes. Unless specifically stated otherwise, the cycle counts apply to systems with zero-wait memory access.

C.2.1 General Rules

Instructions execute in order.

All instructions, assuming cache hits, execute in one cycle, except:

• Divide instructions execute in 35 clock cycles.
• Branches execute in one or three clock cycles, as described in “Branches.”
• MAC and multiply instructions execute in one to five cycles, as described in “Multiplies.”
• Aligned load/store instructions that hit in the cache execute in one clock cycle per word. See “Alignment” for information on execution timings for unaligned loads and stores.
• In isolation, a data cache control instruction takes two cycles in the processor pipeline. However, subsequent DCU accesses are stalled until the cache control instruction finishes accessing the data cache array.

Note: Subsequent DCU accesses do not remain stalled while transfers associated with previous data cache control instructions continue on the PLB.
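The general rules above can be turned into a rough cycle budget for a straight-line instruction mix. This is a deliberately naive additive sketch under the same assumptions as the rules (cache hits, zero-wait memory, aligned loads/stores); it ignores pipeline overlap, dependency stalls, and the DCU stall that follows a cache control instruction. Because branch and multiply costs depend on details given in “Branches” and “Multiplies,” the caller supplies them per instruction. All type and function names are hypothetical.

```c
/* Hypothetical instruction classes matching the general rules. */
enum insn_class {
    IC_SIMPLE,      /* everything else: 1 cycle */
    IC_DIVIDE,      /* 35 cycles */
    IC_BRANCH,      /* 1 or 3 cycles, supplied by the caller */
    IC_MULTIPLY,    /* MAC/multiply: 1 to 5 cycles, supplied by the caller */
    IC_LOAD_STORE,  /* aligned, cache hit: 1 cycle per word */
    IC_DCACHE_CTRL  /* 2 cycles in isolation */
};

struct timed_insn {
    enum insn_class cls;
    unsigned variable_cycles; /* used for IC_BRANCH and IC_MULTIPLY */
    unsigned words;           /* used for IC_LOAD_STORE */
};

/* Sum per-instruction costs; a first-order estimate only. */
unsigned estimate_cycles(const struct timed_insn *seq, unsigned n)
{
    unsigned total = 0;
    for (unsigned i = 0; i < n; i++) {
        switch (seq[i].cls) {
        case IC_DIVIDE:      total += 35u; break;
        case IC_BRANCH:
        case IC_MULTIPLY:    total += seq[i].variable_cycles; break;
        case IC_LOAD_STORE:  total += seq[i].words; break;
        case IC_DCACHE_CTRL: total += 2u; break;
        case IC_SIMPLE:
        default:             total += 1u; break;
        }
    }
    return total;
}
```

For instance, one simple instruction, one divide, a three-cycle branch, a five-cycle multiply, a two-word aligned load, and one data cache control instruction add up to 48 cycles under this model.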