Intel ARCHITECTURE IA-32 User Manual
Multi-Core and Hyper-Threading Technology 7
seldom reaches 50% of peak retirement bandwidth. Thus, improving
single-thread execution throughput should also benefit multi-threading
performance.
Tuning Suggestion 4. (H Impact, M Generality) Optimize multithreaded
applications to achieve optimal processor scaling with respect to the number of
physical processors or processor cores.
Following guidelines such as reducing thread synchronization costs,
enhancing locality, and conserving bus bandwidth will allow
multi-threading hardware to exploit task-level parallelism in the
workload and improve MP scaling. In general, reducing the dependence
on resources shared between physical packages will benefit processor
scaling with respect to the number of physical processors. Similarly,
heavy reliance on resources shared between different cores is likely to
reduce processor scaling performance. On the other hand, using shared
resources effectively can benefit processor scaling, if
the workload does saturate the critical resource in contention.
Tuning Suggestion 5. (M Impact, L Generality) Schedule threads that
compete for the same execution resource to separate processor cores.
Tuning Suggestion 6. (M Impact, L Generality) Use on-chip execution
resources cooperatively if two logical processors are sharing the execution
resources in the same processor core.
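As a rough illustration of Tuning Suggestion 5, the sketch below plans a placement that puts threads competing for the same execution resource on separate cores. The topology map, the thread names, and both helper functions are hypothetical and are not part of the manual. In practice the mapping of logical processors to cores would come from the operating system or from CPUID enumeration, and the resulting placement would be applied through an OS affinity interface.

```python
from collections import defaultdict


def pick_one_cpu_per_core(cpu_to_core):
    """Return one logical CPU id per physical core, in ascending core order."""
    by_core = defaultdict(list)
    for cpu, core in cpu_to_core.items():
        by_core[core].append(cpu)
    # Use the lowest-numbered logical CPU of each core as its representative.
    return [min(cpus) for _, cpus in sorted(by_core.items())]


def place_competing_threads(thread_names, cpu_to_core):
    """Map each thread to a logical CPU on a distinct core.

    If there are more threads than cores, assignment wraps around
    (round-robin), so some threads will share a core again.
    """
    cpus = pick_one_cpu_per_core(cpu_to_core)
    return {name: cpus[i % len(cpus)] for i, name in enumerate(thread_names)}


# Hypothetical topology: two cores, each exposing two logical processors.
# Keys are logical CPU ids, values are core ids.
topology = {0: 0, 1: 0, 2: 1, 3: 1}

# Two hypothetical threads that both stress the same execution unit.
placement = place_competing_threads(["fp_worker_a", "fp_worker_b"], topology)
print(placement)
```

Here the two workers land on logical CPUs that belong to different cores, so they no longer contend for one core's execution units. Threads that use the core's resources in complementary ways (Tuning Suggestion 6) could instead be placed on sibling logical processors of the same core.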
Using Shared Execution Resources in a Processor Core
One way to measure the degree of overall resource utilization by a
single thread is to use performance-monitoring events to count the clock
cycles that a logical processor is executing code and compare that
number to the number of instructions executed to completion. Such
performance metrics are described in Appendix B and can be accessed
using the Intel VTune Performance Analyzer.
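The comparison described above amounts to dividing a cycle count by a count of retired instructions. A minimal sketch follows, assuming the two counter values have already been read from a profiling tool. The function name and the sample numbers are made up for illustration and do not come from the manual.

```python
def cycles_per_instruction(cycles, instructions_retired):
    """Return clock cycles per retired instruction (CPI) for one logical processor."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return cycles / instructions_retired


# Hypothetical counter readings for a single thread.
non_halted_cycles = 3_000_000
instructions_retired = 1_200_000

cpi = cycles_per_instruction(non_halted_cycles, instructions_retired)
print(f"non-halted CPI: {cpi:.2f}")
```

A lower CPI means the thread retires more instructions per cycle it occupies, so it uses a larger share of the core's execution resources.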
Event ratios such as non-halted cycles per instruction retired (non-halted
CPI) and non-sleep CPI can be useful in directing code-tuning efforts.
The non-sleep CPI metric can be interpreted as the inverse of the overall

Intel ARCHITECTURE IA-32 Specifications

General
Instruction Set: x86
Instruction Set Type: CISC
Memory Segmentation: Supported
Operating Modes: Real mode, Protected mode, Virtual 8086 mode
Max Physical Address Size: 36 bits (with PAE)
Max Virtual Address Size: 32 bits
Architecture: IA-32 (Intel Architecture 32-bit)
Addressable Memory: 4 GB (with Physical Address Extension up to 64 GB)
Floating Point Registers: 8 x 80-bit
MMX Registers: 8 x 64-bit
SSE Registers: 8 x 128-bit
Registers: General-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP), Segment registers (CS, DS, SS, ES, FS, GS), Instruction pointer (EIP), Flags register (EFLAGS)
Floating Point Unit: Yes (x87)