PowerPC e500 Core Family Reference Manual, Rev. 1
4-48 Freescale Semiconductor
Execution Timing
pipelined. A new instruction cannot begin execution if the previous instruction is still executing.
Although the majority of instructions executed by the SUs require only a single cycle, mfcr and
many mfspr instructions require several cycles and can cause stalls.
A new instruction cannot execute if one of its operands is not yet available. A new instruction that
is marked as completion-serialized cannot begin execution until the completion unit signals that it
is the oldest instruction.
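As a rough illustration of the two SU stall conditions above (a busy, non-pipelined unit and an unavailable operand), the following Python sketch computes start and finish cycles for a single SU. It is a simplified model, not the e500's actual issue logic. The `Instr` class, the `schedule_su` function, the `external_ready` parameter, and all latency values are hypothetical, and completion serialization is not modeled.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    latency: int          # execution cycles (hypothetical values)
    srcs: tuple = ()      # names of results this instruction reads

def schedule_su(program, external_ready=None):
    """Return {name: (start, finish)} for one non-pipelined SU.

    Simplified, hypothetical model: instructions execute one at a time in
    program order. An instruction starts only when the SU is free (the
    previous instruction has finished) and all its source operands are
    available. Completion serialization is not modeled.
    """
    # Cycle at which each result becomes available, seeded with results
    # produced by other units (e.g., a load). These are hypothetical inputs.
    ready_at = dict(external_ready or {})
    unit_free = 0         # first cycle the SU can start a new instruction
    timing = {}
    for ins in program:
        operands_ready = max((ready_at[s] for s in ins.srcs), default=0)
        start = max(unit_free, operands_ready)
        finish = start + ins.latency
        unit_free = finish              # not pipelined: busy until done
        ready_at[ins.name] = finish
        timing[ins.name] = (start, finish)
    return timing

# Example: "mfcr" is given 3 cycles purely for illustration. The text says
# only that mfcr takes several cycles. "ld" stands for an operand produced
# by another unit that becomes available at cycle 6.
program = [
    Instr("add1", 1),
    Instr("mfcr", 3),
    Instr("add2", 1, srcs=("ld",)),
]
print(schedule_su(program, external_ready={"ld": 6}))
# {'add1': (0, 1), 'mfcr': (1, 4), 'add2': (6, 7)}
```

In this example, mfcr occupies the SU until cycle 4. add2 then waits until cycle 6 because its operand is not yet available, even though the unit is free from cycle 4.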
4.7.6.2 MU Considerations
The MU is similar to the SUs and has one reservation station. The bypass unit, described in
Section 4.4.3, “Simple and Multiple Unit Execution,” allows divide instructions to execute in
parallel with other MU instructions. Note the following:
• A new instruction cannot execute if one of its operands is not yet available.
• A new instruction that is marked as completion-serialized cannot begin execution until the
  completion unit signals that it is the oldest instruction.
• A new divide instruction cannot begin execution if the previous divide instruction is still
  executing.
• A new instruction cannot begin execution if it would finish execution at the same time as
  an executing divide instruction.

As shown in Figure 4-1, the MU consists of a multiply subunit and a divide subunit. These
subunits share the same reservation station and result bus. In general, while a divide is in
progress (which can take up to 35 cycles), new multiply instructions can proceed down the
four-stage multiply subunit. However, because there is only one result bus, the processor
ensures that a divide and a multiply never attempt to write results on it in the same cycle.
When a divide is 4 cycles away from providing its result, it blocks a new 4-cycle multiply from
beginning execution (inserting a bubble in the multiply subunit), so that no multiply collides
with the divide when it provides its result.
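The result-bus rule can be sketched as a small cycle-counting model. This Python sketch is hypothetical. It assumes a single in-flight divide, that at most one multiply starts per cycle, that a multiply started at cycle t delivers its result at t + 4 (the four-stage subunit described above), and that a blocked multiply simply slips by one cycle. The names `schedule_mu` and `MUL_LATENCY` and the 10-cycle divide in the example are illustrative only.

```python
MUL_LATENCY = 4  # four-stage multiply subunit, as described in the text

def schedule_mu(mul_requests, div_start, div_latency):
    """Return the start cycle of each multiply, given one in-flight divide.

    Simplified, hypothetical model:
      * multiplies start in order, at most one per cycle;
      * a multiply started at cycle t delivers its result at t + MUL_LATENCY;
      * the divide delivers its result at div_start + div_latency.
    A multiply that would deliver in the same cycle as the divide is held
    back one cycle, i.e. a bubble is inserted in the multiply subunit.
    """
    div_done = div_start + div_latency
    starts = []
    next_free = 0  # earliest cycle the multiply subunit accepts a new op
    for req in mul_requests:
        t = max(req, next_free)
        if t + MUL_LATENCY == div_done:
            t += 1  # would collide on the shared result bus: insert a bubble
        starts.append(t)
        next_free = t + 1
    return starts

# Example with a hypothetical 10-cycle divide started at cycle 0 (the text
# says only that a divide can take up to 35 cycles).
starts = schedule_mu([4, 5, 6, 7], div_start=0, div_latency=10)
print(starts)  # [4, 5, 7, 8]
```

The multiply that could have started at cycle 6 would have delivered its result at cycle 10, the same cycle as the divide. It therefore slips to cycle 7, and the next multiply follows at cycle 8.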
4.7.6.3 LSU Considerations
The following sections describe situations that can affect LSU timing.
4.7.6.3.1 Load/Store Interaction
When loads and stores are intermixed, stores normally lose arbitration for the cache to loads. A
store that repeatedly loses arbitration can stay in the core interface unit store queue much longer
than 3 cycles. This is not normally a performance problem, because a store in this queue is
effectively part of the architecture-defined state. However, in some cases, including when the
store queue fills up or when a store causes a pipeline stall (as with a store-to-load partial
address alias), the arbiter gives the store higher priority, guaranteeing forward progress.
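The arbitration policy above can be approximated in a few lines. This is a simplified, hypothetical model of the core interface unit arbiter, not its documented implementation. The functions `arbitrate` and `simulate`, the store-queue capacity, and the pattern of one store arriving per cycle are assumptions made for illustration.

```python
from collections import deque

def arbitrate(load_pending, store_queue, capacity, store_stalls_pipeline):
    """Choose which request gets the cache this cycle: "load", "store", or None.

    Simplified, hypothetical policy based on the text: a pending load
    normally wins, but the oldest queued store gets priority when the store
    queue is full or when a store is stalling the pipeline (for example, a
    store-to-load partial address alias).
    """
    if store_queue and (len(store_queue) >= capacity or store_stalls_pipeline):
        return "store"          # forced priority guarantees forward progress
    if load_pending:
        return "load"           # normal case: the store loses arbitration
    if store_queue:
        return "store"          # no competing load, so the store proceeds
    return None

def simulate(stores, capacity, cycles):
    """Loads compete every cycle; one store (if any) enters the queue per cycle."""
    queue, arriving, log = deque(), deque(stores), []
    for _ in range(cycles):
        if arriving:
            queue.append(arriving.popleft())
        winner = arbitrate(True, queue, capacity, store_stalls_pipeline=False)
        if winner == "store":
            queue.popleft()     # store written to the cache, leaves the queue
        log.append(winner)
    return log, list(queue)

# Hypothetical two-entry store queue and three stores.
log, left = simulate(["st0", "st1", "st2"], capacity=2, cycles=5)
print(log)   # ['load', 'store', 'store', 'load', 'load']
print(left)  # ['st2']
```

Because a load competes on every cycle, stores drain only on cycles when the queue is full, and st2 is still queued after five cycles. Passing `store_stalls_pipeline=True` would let the oldest store win immediately.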