IA-32 Intel® Architecture Optimization
be no RFO since the line is not cached, and there is no such delay. For
details on write-combining, see the Intel Architecture Software
Developer's Manual.
Locality Enhancement
Locality enhancement can reduce data traffic originating from an
outer-level sub-system in the cache/memory hierarchy. This addresses
the fact that an access from an outer level costs more cycles than an
access from an inner level. Typically, the cycle-cost of accessing a
given cache level (or memory system) varies across microarchitectures,
processor implementations, and platform components. It is usually
sufficient to recognize the relative trend in data access cost by
locality, rather than to consult a large table of cycle-costs listed
per locality and per processor/platform implementation. The general
trend is that an access from an outer sub-system is somewhere between
3-10X more expensive than an access from the immediate inner level of
the cache/memory hierarchy, assuming similar degrees of data access
parallelism.
Thus, locality enhancement should start by characterizing the
dominant data traffic locality. “Workload Characterization” in
Appendix A describes techniques that can be used to determine the
dominant data traffic locality for any workload.
Even when the miss rate of the last level cache is low relative to the
number of cache references, processors typically spend a sizable
portion of their execution time waiting for cache misses to be
serviced. Reducing cache misses by enhancing a program’s locality is a
key optimization. This can take several forms:
•	blocking to iterate over a portion of an array that fits in the
cache, so that subsequent references to the data block (or tile) are
cache hits
•	loop interchange to avoid crossing cache lines or page boundaries
•	loop skewing to make accesses contiguous
