IA-32 Intel® Architecture Optimization
be no RFO since the line is not cached, and there is no such delay. For
details on write-combining, see the Intel Architecture Software
Developer's Manual.
Locality Enhancement
Locality enhancement can reduce data traffic originating from an
outer-level sub-system in the cache/memory hierarchy. This addresses
the fact that an access from an outer level costs more cycles than an
access from an inner level. Typically, the cycle-cost of accessing a
given cache level (or memory system) varies across microarchitectures,
processor implementations, and platform components. It is usually
sufficient to recognize the relative trend in data access cost by
locality, rather than to consult a large table of cycle-costs listed
per locality and per processor/platform implementation. The general
trend is that an access from an outer sub-system is somewhere between
3-10X more expensive than an access from the immediate inner level of
the cache/memory hierarchy, assuming similar degrees of data access
parallelism.
Thus, locality enhancement should start by characterizing the
dominant data traffic locality. “Workload Characterization” in
Appendix A describes techniques that can be used to determine the
dominant data traffic locality for any workload.
Even when the miss rate of the last level cache is low relative to the
number of cache references, processors typically spend a sizable
portion of their execution time waiting for cache misses to be
serviced. Reducing cache misses by enhancing a program’s locality is a
key optimization. This can take several forms:
•	blocking to iterate over a portion of an array that fits in the
cache, so that subsequent references to the data block (or tile) are
cache hits
•	loop interchange to avoid crossing cache lines or page boundaries
•	loop skewing to make accesses contiguous
