AMCC PPC405 - C.2.5 Scalar Store Instructions; C.2.6 Alignment in Scalar Load and Store Instructions; C.2.7 String and Multiple Instructions

To Next Page

To Previous Page

AMCC Proprietary 434

Revision 1.02 - September 10, 2007

PPC405 Processor

Preliminary User’s Manual

C.2.5 Scalar Store Instructions

Cacheable stores that miss in the DCU, and non cacheable stores, are queued in the data cache so that the store

appears to execute in a single cycle if operand-aligned. Under certain conditions, the DCU can pipeline up to three

store instructions. (See Cache Operations on page 69 for more information.)

stwcx. instructions that do not cause

alignment errors execute in two cycles.

C.2.6 Alignment in Scalar Load and Store Instructions

The PPC405 requires an extra cycle to execute scalar loads and stores having unaligned big or little endian data

(except for

lwarx and stwcx., which require word-aligned operands). If the target data is not operand aligned, and

the sum of the least two significant bits of the effective address (EA) and the byte count is greater than four, the

PPC405 decomposes a load or store scalar into two load or store operations. That is, the PPC405 never presents

the DCU with a request for a transfer that crosses a word boundary. For example, a

lwz with an EA of 0b11 causes

the PPC405 to decompose the

lwz into two load operations. The first load operation is for a byte at the starting

effective address; the second load operation is for three bytes, starting at the next word address.

C.2.7 String and Multiple Instructions

Calculating execution times for string and multiple instructions (

lmw and stmw) instructions requires an

understanding of data alignment, and of the behavior of the string instructions with respect to alignment.

In the following example, the string contains 21 bytes. The first three bytes do not begin on a word boundary,

and the final two bytes do not end on a word boundary. The PPC405 handles any unaligned leading bytes as

a special case, then moves as many bytes as aligned words as possible, and finally handles any unaligned

trailing bytes as a special case.

In the following example, arrows indicate word boundaries (the address is an exact multiple of four); shaded

boxes represent unaligned bytes.

The execution time of the string instruction is the sum of the:

1. Cycles required to handle unaligned leading bytes; if any, add one clock cycle.

In the example, there are unaligned leading bytes; this transfer adds one clock cycle.

2. Cycles required to handle the number of word-aligned transfers required. Assuming data cache hits, each

word-aligned transfer requires one clock cycle.

In the example, there are four aligned words; this transfer requires four clock cycles.

3. Cycles required to handle unaligned trailing bytes; if any, add one clock cycle.

In the example, there are unaligned trailing bytes; this transfer adds one clock cycle.

A string instruction operating on the example 21-byte string requires six clock cycles.