Intel ARCHITECTURE IA-32 User Manual

IA-32 Intel® Architecture Optimization
throughput of a physical processor package. The non-halted CPI metric
can be interpreted as the inverse of the throughput of a logical
processor [9].
When a single thread is executing and all on-chip execution resources
are available to it, non-halted CPI can indicate the unused execution
bandwidth available in the physical processor package. If the value of a
non-halted CPI is significantly higher than unity and overall on-chip
execution resource utilization is low, a multithreaded application can
direct tuning efforts to encompass the factors discussed earlier.
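
As a rough illustration of the metric, the sketch below computes non-halted CPI from two counter readings and inverts it to get the throughput of one logical processor. It assumes non-halted CPI is the ratio of non-halted clockticks to instructions retired. The function names and the sample counts are hypothetical and do not come from any profiling tool.

```python
def non_halted_cpi(non_halted_clockticks: int, instructions_retired: int) -> float:
    """Cycles per instruction, counting only cycles in which the
    logical processor was not halted (assumed definition)."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return non_halted_clockticks / instructions_retired


def throughput_ipc(cpi: float) -> float:
    """Throughput in instructions per cycle, the inverse of CPI."""
    if cpi <= 0:
        raise ValueError("cpi must be positive")
    return 1.0 / cpi


# Hypothetical counter readings for a single logical processor.
cpi = non_halted_cpi(non_halted_clockticks=3_000_000, instructions_retired=1_200_000)
ipc = throughput_ipc(cpi)
print(f"non-halted CPI = {cpi:.2f}, throughput = {ipc:.2f} instructions/cycle")
```

A CPI of 2.5, as in this made-up sample, would be well above unity and would suggest unused execution bandwidth if overall resource utilization is also low.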
An optimized single thread with exclusive use of on-chip execution
resources may exhibit a non-halted CPI in the neighborhood of unity [10].
Because most frequently used instructions typically decode into a single
micro-op and have throughput of no more than two cycles, an optimized
thread that retires one micro-op per cycle is only consuming about one
third of peak retirement bandwidth. Significant portions of the issue port
bandwidth are left unused. Thus, optimizing single-thread performance
is usually complementary to optimizing a multithreaded application to
take advantage of Hyper-Threading Technology.
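
The one-third figure is simple arithmetic, sketched below. The sketch assumes a peak retirement rate of three micro-ops per cycle, which is consistent with the 1/3 lower bound on CPI given in note 10 when each instruction decodes into a single micro-op. The constant and the function name are illustrative only.

```python
# Assumption: peak retirement of three micro-ops per cycle, consistent with
# the 1/3 theoretical lower bound on CPI stated in note 10.
PEAK_UOPS_RETIRED_PER_CYCLE = 3


def retirement_utilization(uops_retired: int, cycles: int) -> float:
    """Fraction of peak retirement bandwidth actually consumed."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return (uops_retired / cycles) / PEAK_UOPS_RETIRED_PER_CYCLE


# A thread retiring one micro-op per cycle (CPI near unity for
# single-micro-op instructions).
util = retirement_utilization(uops_retired=1_000_000, cycles=1_000_000)
print(f"{util:.0%} of peak retirement bandwidth")
```

Under this assumption, roughly two thirds of the retirement bandwidth remains available to a second logical processor.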
On a processor supporting Hyper-Threading Technology, an execution
unit with lower throughput than one issue every two cycles may be
contended for by two threads implemented using a data decomposition
threading model. In one scenario, this can happen when the inner loops
of both threads rely on executing a low-throughput instruction, such as
fdiv, and the execution time of each inner loop is bound by the
throughput of fdiv.
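
The contention scenario can be pictured with a deliberately simplified model, sketched below. It assumes each inner-loop iteration issues exactly one low-throughput instruction, that the threads take turns on the shared unit, and that all other work in the iteration overlaps with the wait. The issue interval of 40 cycles and the 10 cycles of other work are hypothetical values chosen for illustration, not measured fdiv figures.

```python
def cycles_per_iteration(unit_issue_interval: int,
                         other_work_cycles: int,
                         threads_sharing_unit: int) -> int:
    """Per-thread cycles for one inner-loop iteration.

    unit_issue_interval: cycles between issues the shared unit can accept.
    other_work_cycles: remaining work per iteration, assumed to overlap
        fully with waiting on the unit.
    threads_sharing_unit: threads taking turns on the unit.
    """
    return max(other_work_cycles, unit_issue_interval * threads_sharing_unit)


def combined_iterations_per_cycle(unit_issue_interval: int,
                                  other_work_cycles: int,
                                  threads_sharing_unit: int) -> float:
    """Total iterations completed per cycle across all threads."""
    per_thread = cycles_per_iteration(
        unit_issue_interval, other_work_cycles, threads_sharing_unit
    )
    return threads_sharing_unit / per_thread


# Hypothetical numbers: the unit accepts one issue every 40 cycles,
# and each iteration has 10 cycles of other work.
one_thread = combined_iterations_per_cycle(40, 10, 1)
two_threads = combined_iterations_per_cycle(40, 10, 2)
print(f"one thread:  {one_thread:.4f} iterations/cycle")
print(f"two threads: {two_threads:.4f} iterations/cycle")
```

In this model the combined rate is the same with one thread or two, because the shared unit, not the number of logical processors, sets the pace of the loop.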
9. Non-halted CPI can correlate to the resource utilization of an application thread, if the
application thread is affinitized to a fixed logical processor.
10. In current implementations of processors based on Intel NetBurst microarchitecture, the
theoretical lower bound for either non-halted CPI or non-sleep CPI is 1/3. Practical
applications rarely achieve any value close to the lower bound.

Intel ARCHITECTURE IA-32 Specifications

General
Instruction Set: x86
Instruction Set Type: CISC
Memory Segmentation: Supported
Operating Modes: Real mode, Protected mode, Virtual 8086 mode
Max Physical Address Size: 36 bits (with PAE)
Max Virtual Address Size: 32 bits
Architecture: IA-32 (Intel Architecture 32-bit)
Addressable Memory: 4 GB (with Physical Address Extension up to 64 GB)
Floating Point Registers: 8 x 80-bit
MMX Registers: 8 x 64-bit
SSE Registers: 8 x 128-bit
Registers: General-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP), Segment registers (CS, DS, SS, ES, FS, GS), Instruction pointer (EIP), Flags register (EFLAGS)
Floating Point Unit: Yes (x87)
