AMCC Proprietary 434
Revision 1.02 - September 10, 2007
PPC405 Processor
Preliminary User’s Manual
C.2.5 Scalar Store Instructions
Cacheable stores that miss in the DCU, and noncacheable stores, are queued in the data cache so that the store
appears to execute in a single cycle if it is operand-aligned. Under certain conditions, the DCU can pipeline up to three
store instructions. (See Cache Operations on page 69 for more information.)
stwcx. instructions that do not cause alignment errors execute in two cycles.
C.2.6 Alignment in Scalar Load and Store Instructions
The PPC405 requires an extra cycle to execute scalar loads and stores of unaligned big- or little-endian data
(except for lwarx and stwcx., which require word-aligned operands). If the target data is not operand-aligned, and
the sum of the two least significant bits of the effective address (EA) and the byte count is greater than four, the
PPC405 decomposes the scalar load or store into two load or store operations. That is, the PPC405 never presents
the DCU with a request for a transfer that crosses a word boundary. For example, an lwz whose EA has 0b11 as its
two least significant bits (3 + 4 = 7, which is greater than four) causes the PPC405 to decompose the lwz into two
load operations. The first load operation is for one byte at the starting effective address; the second load
operation is for three bytes, starting at the next word address.
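The splitting rule can be modeled in a few lines of Python. This is a minimal sketch, not AMCC's logic: the names `split_scalar_access`, `needs_extra_cycle`, and `WORD_BYTES` and the sample address 0x1003 are hypothetical. The model covers only 1-, 2-, and 4-byte scalar accesses, ignores lwarx and stwcx. (which must be word-aligned), and does not model base cycle counts.

```python
# Hypothetical model of the C.2.6 splitting rule; not an AMCC API.
WORD_BYTES = 4  # the DCU never receives a transfer that crosses a word boundary


def split_scalar_access(ea: int, nbytes: int) -> list[tuple[int, int]]:
    """Return the (address, byte_count) requests presented to the DCU."""
    if nbytes not in (1, 2, 4):
        raise ValueError("scalar accesses are 1, 2, or 4 bytes")
    offset = ea & (WORD_BYTES - 1)  # two least significant bits of the EA
    if offset + nbytes <= WORD_BYTES:
        return [(ea, nbytes)]  # fits within one word: a single request
    first = WORD_BYTES - offset  # bytes up to the next word boundary
    return [(ea, first), (ea + first, nbytes - first)]


def needs_extra_cycle(ea: int, nbytes: int) -> bool:
    """True if the operand is not aligned to its own size (not operand-aligned)."""
    return ea % nbytes != 0


# lwz at an EA whose two least significant bits are 0b11
print([(hex(a), n) for a, n in split_scalar_access(0x1003, 4)])  # [('0x1003', 1), ('0x1004', 3)]
```

Under this model, a halfword at an odd address such as 0x1001 is not operand-aligned (so it takes the extra cycle), yet it stays within one word and is still presented to the DCU as a single request.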
C.2.7 String and Multiple Instructions
Calculating execution times for string instructions and multiple instructions (lmw and stmw) requires an
understanding of data alignment and of how the string instructions behave with respect to alignment.
In the following example, the string contains 21 bytes. The first three bytes do not begin on a word boundary,
and the final two bytes do not end on a word boundary. The PPC405 handles any unaligned leading bytes as
a special case, then moves as many bytes as possible as aligned words, and finally handles any unaligned
trailing bytes as a special case.
In the following example, arrows indicate word boundaries (addresses that are exact multiples of four), and shaded
boxes represent unaligned bytes.
The execution time of the string instruction is the sum of the following:
1. Cycles required to handle any unaligned leading bytes: if there are any, add one clock cycle.
In the example, there are unaligned leading bytes, so this transfer adds one clock cycle.
2. Cycles required for the word-aligned transfers. Assuming data cache hits, each
word-aligned transfer requires one clock cycle.
In the example, there are four aligned words, so these transfers require four clock cycles.
3. Cycles required to handle any unaligned trailing bytes: if there are any, add one clock cycle.
In the example, there are unaligned trailing bytes, so this transfer adds one clock cycle.
A string instruction operating on the example 21-byte string therefore requires six clock cycles (1 + 4 + 1).
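The three-term sum can be sketched in Python, assuming data cache hits as item 2 does. The names `split_string_access`, `string_cycles`, and `WORD_BYTES` and the starting address 0x2001 (chosen so that the first three bytes precede a word boundary, as in the example) are hypothetical. The sketch counts only the three components listed above and does not model any other instruction overhead.

```python
# Hypothetical model of the C.2.7 timing rule, assuming data cache hits.
WORD_BYTES = 4


def split_string_access(ea: int, nbytes: int) -> tuple[int, int, int]:
    """Split a string transfer into (leading_bytes, aligned_words, trailing_bytes)."""
    leading = min(nbytes, (-ea) % WORD_BYTES)  # bytes before the first word boundary
    remaining = nbytes - leading
    return leading, remaining // WORD_BYTES, remaining % WORD_BYTES


def string_cycles(ea: int, nbytes: int) -> int:
    """Sum the three components: leading bytes, aligned words, trailing bytes."""
    leading, words, trailing = split_string_access(ea, nbytes)
    return (1 if leading else 0) + words + (1 if trailing else 0)


# 21-byte string starting one byte past a word boundary, as in the example
print(split_string_access(0x2001, 21))  # (3, 4, 2)
print(string_cycles(0x2001, 21))        # 6
```

Separating the split from the cycle count keeps the alignment reasoning (how many bytes fall in each category) independent of the per-category cost, which is the part that depends on the cache-hit assumption.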
