Intel ARCHITECTURE IA-32 - Using Shared Execution Resources in a Processor Core

Multi-Core and Hyper-Threading Technology 7

seldom reaches 50% of peak retirement bandwidth. Thus, improving single-thread execution throughput should also benefit multi-threading performance.
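
The retirement-bandwidth figure quoted above is a ratio that can be recomputed for a given workload from two counts taken over the same interval: retired micro-ops (or instructions) and elapsed clock cycles. A minimal C sketch follows. It assumes the caller supplies the processor's peak retirement width, which is microarchitecture-specific and not given here; the function name is hypothetical.

```c
#include <stdint.h>

/* Fraction of peak retirement bandwidth achieved over one sampled interval.
   retired:        micro-ops (or instructions) retired in the interval
   cycles:         clock cycles elapsed in the same interval
   peak_per_cycle: assumed maximum retirement width of the target core
   Returns 0.0 for an empty interval or an unknown (zero) peak. */
static double retirement_utilization(uint64_t retired, uint64_t cycles,
                                     unsigned peak_per_cycle)
{
    if (cycles == 0 || peak_per_cycle == 0)
        return 0.0;
    return (double)retired / ((double)cycles * peak_per_cycle);
}
```

For example, 150 retired micro-ops over 100 cycles on a hypothetical core that can retire 3 per cycle gives 150 / 300 = 0.5, exactly half of peak.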

Tuning Suggestion 4. (H Impact, M Generality) Optimize multithreaded applications to achieve optimal processor scaling with respect to the number of physical processors or processor cores.

Following guidelines such as reducing thread synchronization costs, enhancing locality, and conserving bus bandwidth allows multi-threading hardware to exploit task-level parallelism in the workload and improves multiprocessor (MP) scaling. In general, reducing dependence on resources shared between physical packages benefits processor scaling with respect to the number of physical processors. Similarly, heavy reliance on resources shared between different cores is likely to reduce processor scaling performance. On the other hand, using a shared resource effectively can benefit processor scaling if the workload does saturate the critical resource in contention.
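
Two of the guidelines named above, reducing synchronization costs and improving locality, can be illustrated together. In this minimal C sketch (POSIX threads plus C11 `_Alignas`; the 64-byte line size and all names are assumptions), each thread counts into its own cache-line-aligned slot without locks or atomics, and the partial results are combined once after `pthread_join`, instead of every thread updating one shared counter on every iteration.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE 64 /* assumption: 64-byte cache lines on the target */

/* One counter per thread, aligned to (and filling) its own cache line,
   so writes from different threads never touch the same line. */
typedef struct {
    _Alignas(CACHE_LINE) uint64_t value;
} padded_counter;

typedef struct {
    padded_counter *slot; /* this thread's private slot */
    uint64_t iterations;  /* amount of work for this thread */
} worker_arg;

static void *worker(void *p)
{
    worker_arg *a = p;
    /* Only this thread writes a->slot, so no lock or atomic is needed. */
    for (uint64_t i = 0; i < a->iterations; i++)
        a->slot->value++;
    return NULL;
}

/* Split the work across nthreads and combine per-thread results once,
   after each thread is joined. Returns the total, or 0 if nthreads <= 0
   or an allocation fails. */
static uint64_t parallel_count(int nthreads, uint64_t per_thread)
{
    if (nthreads <= 0)
        return 0;
    size_t n = (size_t)nthreads;
    padded_counter *slots = aligned_alloc(CACHE_LINE, sizeof(padded_counter) * n);
    worker_arg *args = malloc(sizeof(worker_arg) * n);
    pthread_t *tids = malloc(sizeof(pthread_t) * n);
    char *started = malloc(n);
    uint64_t total = 0;
    if (slots && args && tids && started) {
        for (size_t t = 0; t < n; t++) {
            slots[t].value = 0;
            args[t] = (worker_arg){ &slots[t], per_thread };
            started[t] = pthread_create(&tids[t], NULL, worker, &args[t]) == 0;
            if (!started[t])
                worker(&args[t]); /* fall back to doing this share inline */
        }
        for (size_t t = 0; t < n; t++) {
            if (started[t])
                pthread_join(tids[t], NULL);
            total += slots[t].value; /* single combine step per thread */
        }
    }
    free(slots);
    free(args);
    free(tids);
    free(started);
    return total;
}
```

Keeping each slot on its own line avoids false sharing, where unrelated writes from different cores would otherwise keep transferring ownership of one cache line between them.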

Tuning Suggestion 5. (M Impact, L Generality) Schedule threads that compete for the same execution resource to separate processor cores.
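
On Linux, one way to act on this suggestion is to find logical CPUs that belong to different physical cores and pin the competing threads to them. The sketch below is an illustration under stated assumptions, not a method from this manual: it reads the `core_id` and `physical_package_id` files that Linux exposes under `/sys/devices/system/cpu/cpuN/topology/`, and it uses the GNU extension `pthread_setaffinity_np`. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Read an integer topology attribute ("core_id" or "physical_package_id")
   for logical CPU `cpu` from Linux sysfs. Returns -1 if unavailable. */
static int read_topology(int cpu, const char *attr)
{
    char path[128];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/topology/%s", cpu, attr);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int value = -1;
    if (fscanf(f, "%d", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}

/* Find two logical CPUs in this process's affinity mask that sit on
   different physical cores (different core_id or different package).
   Returns 0 and stores them in *a and *b, or -1 if no such pair exists. */
static int pick_cpus_on_separate_cores(int *a, int *b)
{
    cpu_set_t allowed;
    if (sched_getaffinity(0, sizeof allowed, &allowed) != 0)
        return -1;
    int first = -1, first_core = -1, first_pkg = -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        int core = read_topology(cpu, "core_id");
        int pkg = read_topology(cpu, "physical_package_id");
        if (core < 0)
            continue; /* topology unknown for this CPU: skip it */
        if (first < 0) {
            first = cpu;
            first_core = core;
            first_pkg = pkg;
        } else if (core != first_core || pkg != first_pkg) {
            *a = first;
            *b = cpu;
            return 0;
        }
    }
    return -1;
}

/* Restrict thread t to logical CPU `cpu`.
   Returns 0 on success or an error number, as pthread_setaffinity_np does. */
static int pin_thread(pthread_t t, int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(t, sizeof set, &set);
}
```

Typical use: after creating two threads that stress the same unit, call `pin_thread(t1, a)` and `pin_thread(t2, b)` with the pair returned by `pick_cpus_on_separate_cores`. Hard pinning gives up some of the OS scheduler's flexibility, so measuring with and without it is advisable.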

Tuning Suggestion 6. (M Impact, L Generality) Use on-chip execution resources cooperatively if two logical processors are sharing the execution resources in the same processor core.
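
A widely documented example of cooperative use is the spin-wait loop: a logical processor polling a flag in a tight loop consumes execution resources that its sibling on the same core could otherwise use. Inserting the PAUSE instruction (available in C as the `_mm_pause` intrinsic) in the loop reduces that consumption and also avoids the memory-order mis-speculation penalty when the loop exits. The minimal C11 sketch below assumes an x86 target for PAUSE and falls back to a plain loop elsewhere; the spin bound and the name `spin_wait` are illustrative additions.

```c
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define CPU_RELAX() _mm_pause() /* PAUSE: hint that this is a spin-wait loop */
#else
#define CPU_RELAX() ((void)0)   /* no PAUSE on this ISA: plain busy-wait */
#endif

/* Spin until *flag becomes non-zero or max_spins iterations elapse.
   Returns the number of PAUSE iterations spent, or -1 on timeout. */
static long spin_wait(atomic_int *flag, long max_spins)
{
    for (long n = 0; n < max_spins; n++) {
        if (atomic_load_explicit(flag, memory_order_acquire) != 0)
            return n;
        CPU_RELAX(); /* yield shared execution resources to the sibling */
    }
    return -1;
}
```

In a real program another thread would eventually store a non-zero value with `atomic_store_explicit(flag, 1, memory_order_release)`; for long waits, falling back to an OS blocking primitive is preferable to spinning indefinitely.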

Using Shared Execution Resources in a Processor Core

One way to measure the degree of overall resource utilization by a single thread is to use performance-monitoring events to count the clock cycles during which a logical processor is executing code and compare that number to the number of instructions executed to completion (retired). Such performance metrics are described in Appendix B and can be accessed using the Intel VTune Performance Analyzer.
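
The text points to VTune for these counts. As a swapped-in alternative on Linux, not the tool described here, the `perf_event_open` system call can count CPU cycles and retired instructions around a code region, and CPI follows from their ratio. This sketch counts user-mode events for the calling thread only, so cycles accrue only while that thread is scheduled. The helpers `measure`, `cpi`, and `sum_loop` are hypothetical, and the counters may be unavailable under restrictive `perf_event_paranoid` settings or in some virtual machines, in which case `measure` returns -1.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open one generic hardware counter for the calling thread on any CPU,
   counting user-mode events only. glibc has no wrapper, hence syscall(). */
static int open_counter(uint64_t config)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = config;
    attr.disabled = 1; /* start stopped; enabled explicitly in measure() */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
}

/* Cycles per instruction retired; 0.0 if no instructions were counted. */
static double cpi(uint64_t cycles, uint64_t instructions)
{
    return instructions ? (double)cycles / (double)instructions : 0.0;
}

/* Count cycles and retired instructions while fn(arg) runs. Returns 0 on
   success, -1 if the counters cannot be opened or read. The user-mode part
   of the surrounding ioctl calls is included in the counts. */
static int measure(void (*fn)(void *), void *arg,
                   uint64_t *cycles, uint64_t *instructions)
{
    int fc = open_counter(PERF_COUNT_HW_CPU_CYCLES);
    if (fc < 0)
        return -1;
    int fi = open_counter(PERF_COUNT_HW_INSTRUCTIONS);
    if (fi < 0) {
        close(fc);
        return -1;
    }
    ioctl(fc, PERF_EVENT_IOC_RESET, 0);
    ioctl(fi, PERF_EVENT_IOC_RESET, 0);
    ioctl(fc, PERF_EVENT_IOC_ENABLE, 0);
    ioctl(fi, PERF_EVENT_IOC_ENABLE, 0);
    fn(arg);
    ioctl(fc, PERF_EVENT_IOC_DISABLE, 0);
    ioctl(fi, PERF_EVENT_IOC_DISABLE, 0);
    int ok = read(fc, cycles, sizeof *cycles) == (ssize_t)sizeof *cycles &&
             read(fi, instructions, sizeof *instructions) == (ssize_t)sizeof *instructions;
    close(fc);
    close(fi);
    return ok ? 0 : -1;
}

/* Hypothetical workload, used only to exercise measure(). */
static void sum_loop(void *arg)
{
    uint64_t n = *(const uint64_t *)arg;
    volatile uint64_t acc = 0; /* volatile keeps the loop from being removed */
    for (uint64_t i = 0; i < n; i++)
        acc += i;
}
```

For example, `measure(sum_loop, &n, &c, &i)` followed by `cpi(c, i)` gives the region's cycles per instruction; its reciprocal is instructions per cycle.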

Event ratios such as non-halted cycles per instruction retired (non-halted CPI) and non-sleep CPI can be useful in directing code-tuning efforts. The non-sleep CPI metric can be interpreted as the inverse of the overall