Intel ARCHITECTURE IA-32 - Instruction Latency and Throughput

IA-32 Intel® Architecture Processor Family Overview

Some parts of the core may speculate that a common condition holds to

allow faster execution. If it does not, the machine may stall. An example

of this pertains to store-to-load forwarding (see “Store Forwarding” in

this chapter). If a load is predicted to be dependent on a store, it gets its

data from that store and tentatively proceeds. If the load turns out not

to depend on the store, the load is delayed until the real data has been

loaded from memory, then it proceeds.

Instruction Latency and Throughput

The superscalar out-of-order core contains hardware resources that can

execute multiple μops in parallel. The core’s ability to make use of

available parallelism of execution units can be enhanced by software’s

ability to:

• select IA-32 instructions that can be decoded in fewer than 4 μops

and/or have short latencies

• order IA-32 instructions to preserve available parallelism by

minimizing long dependence chains and covering long instruction

latencies

• order instructions so that their operands are ready and their

corresponding issue ports and execution units are free when they

reach the scheduler
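
The second point above, minimizing long dependence chains, can be sketched in C. This is an illustrative example only: the function names are hypothetical, and whether the second form actually runs faster depends on the compiler, its options, and the processor. The first function carries every addition through one accumulator, so each add must wait for the previous result. The second splits the work across four accumulators, giving the core four independent chains that it may overlap. Note that regrouping floating-point additions can change rounding in general.

```c
#include <assert.h>
#include <stddef.h>

/* One accumulator: each addition depends on the result of the
   previous one, so the loop forms a single long dependence chain. */
double sum_single_chain(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Four accumulators: the additions into s0..s3 do not depend on
   each other, so an out-of-order core may execute them in parallel.
   The choice of four chains is an assumption for illustration. */
double sum_four_chains(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)      /* leftover elements when n is not a multiple of 4 */
        s0 += a[i];

    /* Combine the partial sums at the end. */
    return (s0 + s1) + (s2 + s3);
}
```

Both functions return the same value for inputs whose partial sums are exactly representable; any speed difference should be measured on the target processor and not assumed.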

This subsection describes port restrictions, result latencies, and issue

latencies (also referred to as throughput). These concepts give software a

basis for ordering instructions to increase parallelism. The

order in which μops are presented to the core of the processor is further

affected by the machine’s scheduling resources.

It is the execution core that reacts to an ever-changing machine state,

reordering μops for faster execution or delaying them because of

dependence and resource constraints. The ordering of instructions in

software is more of a suggestion to the hardware.
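
The difference between result latency and issue latency (throughput) can be made concrete with a deliberately simplified model. The sketch below is an assumption for illustration, not a description of how any particular core schedules μops: it considers a single pipelined execution unit, ignores port conflicts and scheduler limits, and uses hypothetical function names. `latency` is the number of cycles until a result is available, and `issue_interval` is the number of cycles before the unit can accept another operation.

```c
#include <assert.h>

/* Simplified model: n operations in which each one consumes the
   result of the previous one. Every operation must wait the full
   result latency of its predecessor. */
unsigned long cycles_dependent(unsigned long n, unsigned long latency)
{
    assert(latency >= 1);
    return n * latency;
}

/* Simplified model: n mutually independent operations on one
   pipelined unit. A new operation can start every issue_interval
   cycles, and the last one still needs its full latency to finish. */
unsigned long cycles_independent(unsigned long n,
                                 unsigned long latency,
                                 unsigned long issue_interval)
{
    assert(latency >= 1 && issue_interval >= 1);
    if (n == 0)
        return 0;
    return (n - 1) * issue_interval + latency;
}
```

With made-up values of 4 cycles of latency and an issue interval of 1 cycle, 100 dependent operations take 400 cycles in this model, while 100 independent ones take 103. Real latency and throughput figures come from the tables in Appendix C.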

Appendix C, “IA-32 Instruction Latency and Throughput,” lists some of

the more commonly used IA-32 instructions with their latency, their

issue throughput, and associated execution units (where relevant). Some
