Freescale Semiconductor PowerPC e500 Core - Simple and Multiple Unit Execution

Execution Timing

PowerPC e500 Core Family Reference Manual, Rev. 1

The core interface unit handles all bus transactions initiated by the ILFB, DLFB, and DWB. It handles all ordering and bus protocol and is the interface between the core complex and the external memory and caches.

The core interface unit performs transactions through the core complex bus by transferring either the critical double word first (8 bytes) or the critical quad word first (16 bytes). It then forwards the transaction to the instruction or data line fill buffer critical double word first. The core complex bus also captures snoop addresses for the L1 data cache and for the memory reservation (lwarx and stwcx.) operations.
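The idea of delivering the critical double word first can be illustrated with a short sketch. It is not taken from the manual: the 32-byte line size and the wrap-around order of the remaining double words are assumptions made for illustration, and `critical_first_order` is a hypothetical helper name. Only the 8-byte double-word size comes from the text above.

```python
LINE_BYTES = 32         # assumption: 32-byte cache line (not stated in this section)
DOUBLE_WORD_BYTES = 8   # from the text: a double word is 8 bytes


def critical_first_order(address: int) -> list[int]:
    """Return the double-word indices of a line, critical double word first.

    The wrap-around order used for the remaining double words is an
    assumption; the text only says the critical double word comes first.
    """
    count = LINE_BYTES // DOUBLE_WORD_BYTES
    # Index of the double word that contains the requested address.
    critical = (address % LINE_BYTES) // DOUBLE_WORD_BYTES
    return [(critical + i) % count for i in range(count)]
```

For example, under these assumptions a request for address `0x1018` falls in double word 3 of its line, so that double word would be forwarded to the line fill buffer ahead of the others.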

4.4.3 Simple and Multiple Unit Execution

The e500 has two simple units (SU1 and SU2) and one multiple unit (MU). On the e500v2, the MU has an additional six-stage subunit through which all double-precision floating-point instructions pass.

The SUs execute the following: all Book E logical and computational instructions except multiplies and divides; SPE single-cycle arithmetic, logical, shift, and splat instructions; and embedded floating-point APU arithmetic and logical instructions. The MU executes multiplies, divides, and multi-cycle arithmetic instructions defined by the SPE and embedded floating-point APUs.
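The division of work between the simple units and the multiple unit can be summarized as a small lookup. This is a minimal sketch rather than anything defined by the manual: the category labels and the `execution_unit` function are hypothetical names chosen for illustration, and real hardware decodes individual opcodes, not category strings.

```python
from enum import Enum


class Unit(Enum):
    SU = "simple unit (SU1 or SU2)"
    MU = "multiple unit"


# Hypothetical category labels that paraphrase the text above.
MU_CATEGORIES = {
    "multiply",
    "divide",
    "spe_multicycle_arithmetic",
    "efp_multicycle_arithmetic",
}
SU_CATEGORIES = {
    "booke_logical",
    "booke_computational",      # excluding multiplies and divides
    "spe_single_cycle",         # arithmetic, logical, shift, splat
    "efp_arithmetic_logical",   # other than multi-cycle arithmetic
}


def execution_unit(category: str) -> Unit:
    """Map an instruction category to the unit class that executes it."""
    if category in MU_CATEGORIES:
        return Unit.MU
    if category in SU_CATEGORIES:
        return Unit.SU
    raise ValueError(f"unknown instruction category: {category}")
```

A table-driven form keeps the exceptions (multiplies, divides, and multi-cycle arithmetic) visible in one place, which mirrors how the text states the rule.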

Divide latency depends upon the operand data and ranges from 4 to 35 cycles, as shown in Table 4-2.

Table 4-1. Load and Store Queues (continued)

| Queue | Description |
|---|---|
| LSU data line fill buffer (DLFB) | DLFB entries are used for loads and cacheable stores. Stores are allocated in the DLFB so loads can access data from the store immediately (loads cannot access data from the L1 store queue). Also, by using the DLFB entries for stores, the LSU frees L1 store queue entries, even on store misses. Multiple cacheable store misses to the same cache line are merged in a DLFB. |
| LSU data write buffer (DWB) | When a full line of data is available in the DLFB, the data cache is updated. If a data cache update requires a cache line to be evicted, the line is cast out and placed in the DWB until the data has been transferred through the core interface unit to the core complex bus. If global memory’s coherency needs to be maintained as a result of bus snooping, the L1 cache can also evict a line to the DWB. (This is a snoop push.) Cast-out and snoop push writes from the L1 cache are cache-line aligned (critical word is not written first), regardless of which word in a modified cache line is accessed. One DWB entry is dedicated for snoop pushes, one is for cast outs, and one can be used for either. |

Table 4-2. The Effect of Operand Size on Divide Latency

| Instruction | Condition | Latency (cycles) |
|---|---|---|
| efsdiv | rA or rB is 0.0 | 4 |
| efsdiv | All others | 29 |
| efddiv | All double-precision floating-point divides (e500v2 only) | 32 |
| evfsdiv | rA or rB are 0.0 for both upper and lower | 4 |
| evfsdiv | All others | 29 |
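The rows of Table 4-2 reproduced in this section can be encoded as a small lookup, which may be handy when estimating cycle counts by hand. This is a sketch under stated assumptions: `divide_latency` is a hypothetical helper, vector operands are modeled as `(upper, lower)` tuples, and the evfsdiv condition is read as "in each half, rA or rB is 0.0". It covers only the three instructions listed here, and it does not model how the hardware treats special values such as negative zero.

```python
def divide_latency(instruction, ra=None, rb=None):
    """Return the latency in cycles for the Table 4-2 rows shown here.

    ra and rb are floats for efsdiv, and (upper, lower) tuples of floats
    for evfsdiv. They are ignored for efddiv, whose latency is fixed.
    """
    if instruction == "efddiv":
        # All double-precision floating-point divides (e500v2 only).
        return 32
    if instruction == "efsdiv":
        return 4 if (ra == 0.0 or rb == 0.0) else 29
    if instruction == "evfsdiv":
        # Assumed reading: the short latency applies only when both the
        # upper and the lower element pair contain a 0.0 operand.
        zero_in_each_half = all(a == 0.0 or b == 0.0 for a, b in zip(ra, rb))
        return 4 if zero_in_each_half else 29
    raise ValueError(f"instruction not covered by this sketch: {instruction}")
```

For instance, `divide_latency("efsdiv", 0.0, 3.5)` follows the first row of the table, while any efsdiv with two nonzero operands falls under "All others".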
