Freescale Semiconductor PowerPC e500 Core - Load/Store Interaction; LSU Considerations; MU Considerations


PowerPC e500 Core Family Reference Manual, Rev. 1

4-48 Freescale Semiconductor

Execution Timing

pipelined. A new instruction cannot begin execution if the previous instruction is still executing.

Although the majority of instructions executed by the SUs require only a single cycle, mfcr and many mfspr instructions require several cycles and can cause stalls.

A new instruction cannot execute if one of its operands is not yet available. A new instruction that is marked as completion-serialized cannot begin execution until the completion unit signals that it is the oldest instruction.
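The issue rules above can be sketched as a small cycle-counting model. This is a minimal, non-authoritative sketch: the `Instr` and `SimpleUnit` names, the cycle bookkeeping, and the 3-cycle latency given to `mfcr` are hypothetical illustrations, not values taken from this manual.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    srcs: tuple = ()                     # source registers the instruction reads
    latency: int = 1                     # execution cycles (hypothetical values)
    completion_serialized: bool = False


@dataclass
class SimpleUnit:
    busy_until: int = 0                  # first cycle in which the unit is free again

    def can_issue(self, instr, cycle, ready_regs, is_oldest):
        """Return True if instr may begin execution in this cycle."""
        if cycle < self.busy_until:                        # previous instruction still executing
            return False
        if any(r not in ready_regs for r in instr.srcs):   # an operand is not yet available
            return False
        if instr.completion_serialized and not is_oldest:  # completion unit has not signaled yet
            return False
        return True

    def issue(self, instr, cycle):
        self.busy_until = cycle + instr.latency


# Usage: a multi-cycle mfcr (hypothetical 3-cycle latency) holds up an independent add.
su = SimpleUnit()
ready = {"r3", "r4"}
su.issue(Instr("mfcr", latency=3), cycle=0)
add = Instr("add", srcs=("r3", "r4"))
print(su.can_issue(add, 1, ready, is_oldest=False))  # False: mfcr still executing
print(su.can_issue(add, 3, ready, is_oldest=False))  # True
```

The add stalls even though its operands are ready, which is the kind of stall the multi-cycle mfcr and mfspr instructions can cause.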

4.7.6.2 MU Considerations

The MU is similar to the SUs and has one reservation station. The bypass unit, described in Section 4.4.3, “Simple and Multiple Unit Execution,” allows divide instructions to execute in parallel with other MU instructions. Note the following:

• A new instruction cannot execute if one of its operands is not yet available.

• A new instruction that is marked as completion-serialized cannot begin execution until the completion unit signals that it is the oldest instruction.

• A new divide instruction cannot begin execution if the previous divide instruction is still executing.

• A new instruction cannot begin execution if it would finish execution at the same time as an executing divide instruction. As shown in Figure 4-1, the MU consists of a multiply subunit and a divide subunit. These subunits share the same reservation station and result bus. In general, while a divide is in progress (which can take up to 35 cycles), new multiply instructions can proceed down the four-stage multiply subunit. However, because there is only one result bus, the processor ensures that a divide and a multiply never attempt to write their results on it in the same cycle. When a divide is 4 cycles away from providing its result, it blocks a new 4-cycle multiply from beginning execution (inserting a bubble in the multiply subunit), so that no multiply collides with the divide when it provides its result.
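The divide-related rules can be illustrated with a toy cycle-level model. This sketch assumes an instruction started in cycle t with latency L writes the result bus in cycle t + L; the `MultipleUnit` and `schedule_multiplies` names and the one-multiply-per-cycle issue stream are hypothetical. The 4-cycle multiply latency comes from the text, and 35 cycles is the worst-case divide duration the text mentions, not a fixed latency.

```python
from dataclasses import dataclass
from typing import Optional

MUL_LATENCY = 4  # four-stage multiply subunit, per the text


@dataclass
class MultipleUnit:
    div_finish: Optional[int] = None  # cycle in which the in-flight divide writes the result bus

    def divide_busy(self, cycle):
        return self.div_finish is not None and cycle < self.div_finish

    def can_start_divide(self, cycle):
        # A new divide waits until the previous divide has finished.
        return not self.divide_busy(cycle)

    def can_start_multiply(self, cycle, latency=MUL_LATENCY):
        # Both subunits share one result bus: refuse a multiply that would
        # write its result in the same cycle as the executing divide.
        return not (self.divide_busy(cycle) and cycle + latency == self.div_finish)

    def start_divide(self, cycle, latency):
        if not self.can_start_divide(cycle):
            raise RuntimeError("previous divide still executing")
        self.div_finish = cycle + latency


def schedule_multiplies(div_latency, n_cycles):
    """Start a divide in cycle 0, then try to start one multiply per cycle (hypothetical stream)."""
    mu = MultipleUnit()
    mu.start_divide(0, div_latency)
    started, bubbles = [], []
    for cycle in range(1, n_cycles):
        if mu.can_start_multiply(cycle):
            started.append(cycle)
        else:
            bubbles.append(cycle)  # bubble inserted in the multiply subunit
    return started, bubbles


started, bubbles = schedule_multiplies(div_latency=35, n_cycles=40)
print(bubbles)  # [31]: the only start cycle whose multiply would finish with the divide
```

In this model, blocking only the single conflicting start cycle, rather than stalling the whole MU for the length of the divide, is what lets multiplies keep flowing while a long divide executes.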

4.7.6.3 LSU Considerations

The following sections describe situations that can affect LSU timing.

4.7.6.3.1 Load/Store Interaction

When loads and stores are intermixed, stores normally lose arbitration for the cache to loads. A store that repeatedly loses arbitration can stay in the core interface unit store queue much longer than 3 cycles. This is not normally a performance problem, because a store in this queue is effectively part of the architecture-defined state. However, in some cases, such as when the store queue fills up or when a store causes a pipeline stall (as in a partial-address alias of a store to a load), the arbiter gives higher priority to the store, guaranteeing forward progress.
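The arbitration policy described above can be sketched as a simplified per-cycle model. It is not the actual arbiter logic: `arbitrate`, `simulate`, and the four-entry store queue depth are hypothetical (this section does not give the queue size), and the model grants one cache access per cycle.

```python
from collections import deque

STORE_QUEUE_DEPTH = 4  # hypothetical capacity; not specified in this section


def arbitrate(loads, stores, store_stalls_pipeline=False, depth=STORE_QUEUE_DEPTH):
    """Pick which request gets the cache this cycle: 'load', 'store', or None."""
    if stores and (len(stores) >= depth or store_stalls_pipeline):
        return "store"   # escalated priority guarantees forward progress
    if loads:
        return "load"    # loads normally win arbitration
    if stores:
        return "store"   # no competing load this cycle
    return None


def simulate(n_loads, store_arrivals, cycles):
    """Return the arbitration winner for each cycle.

    n_loads: loads waiting at cycle 0; store_arrivals: {cycle: number of new stores}.
    """
    loads = deque(range(n_loads))
    stores = deque()
    grants = []
    for cycle in range(cycles):
        stores.extend(range(store_arrivals.get(cycle, 0)))
        winner = arbitrate(loads, stores)
        if winner == "load":
            loads.popleft()
        elif winner == "store":
            stores.popleft()
        grants.append(winner)
    return grants


print(simulate(6, {0: 3, 2: 1}, 10))
# ['load', 'load', 'store', 'load', 'load', 'load', 'load', 'store', 'store', 'store']
```

Here the first stores wait behind the load stream until the fourth store fills the queue in cycle 2, which forces one store through; the rest drain only after the loads are gone, well after 3 cycles.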
