Intel ARCHITECTURE IA-32 User Manual

568 pages
Optimizing Cache Usage 6
Hardware Prefetch
The automatic hardware prefetcher can bring cache lines into the unified
last-level cache based on prior data misses. It attempts to prefetch two
cache lines ahead of the prefetch stream. This feature was introduced
with the Pentium 4 processor.
The characteristics of hardware prefetching are as follows:
- Requires some regularity in the data access patterns:
  - If a data access pattern has a constant stride, hardware
    prefetching is effective only if the access stride is less than
    half of the trigger distance of the hardware prefetcher (see
    Table 1-2).
  - If the access stride is not constant, the automatic hardware
    prefetcher can mask memory latency if the strides of two
    successive cache misses are less than the trigger threshold
    distance (small-stride memory traffic).
  - The automatic hardware prefetcher is most effective if the
    strides of two successive cache misses remain less than the
    trigger threshold distance and close to 64 bytes.
- Incurs a start-up penalty before the hardware prefetcher triggers,
  and issues extra fetches after the array finishes. For short arrays,
  this overhead can reduce the effectiveness of the hardware prefetcher.
  - The hardware prefetcher requires a couple of misses before it
    starts operating.
  - Hardware prefetching will generate a request for data beyond
    the end of an array, which will not be used. This behavior
    wastes bus bandwidth. In addition, it results in a start-up
    penalty when fetching the beginning of the next array, because
    the wasted prefetch should have been used instead to hide the
    latency for the initial data in the next array. Software
    prefetching can recognize and handle these cases.
- Will not prefetch across a 4K page boundary; that is, the program
  must initiate demand loads for the new page before the hardware
  prefetcher will start prefetching from the new page.
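
The characteristics listed above can be turned into a few quick checks. The
following is a minimal sketch in C, not code from the manual. All function
names are hypothetical. The trigger distance is passed in as a parameter and
should be taken from Table 1-2 for the target processor. The 64-byte line
size and two-line look-ahead in the loop are illustrative assumptions, and
__builtin_prefetch is a GCC/Clang builtin, so other compilers would need
their own prefetch intrinsic. The last function shows one way software
prefetching might avoid fetching past the end of an array by redirecting
the look-ahead to the start of the next array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES ((uintptr_t)4096) /* 4K page, as stated in the text */

/* Illustrative assumptions, not values taken from the manual's tables. */
enum { LINE_BYTES = 64, LINES_AHEAD = 2 };

/* Constant stride: effective only if stride < trigger_distance / 2.
 * trigger_distance (bytes) should come from Table 1-2 for the target CPU. */
static bool constant_stride_ok(size_t stride, size_t trigger_distance)
{
    assert(trigger_distance > 0);
    /* Round the half up so that odd distances compare exactly. */
    return stride < trigger_distance / 2 + trigger_distance % 2;
}

/* Non-constant stride: the strides of two successive cache misses must
 * both be below the trigger threshold distance. */
static bool successive_strides_ok(size_t s1, size_t s2,
                                  size_t trigger_distance)
{
    return s1 < trigger_distance && s2 < trigger_distance;
}

/* Number of 4K page boundaries inside [base, base + len). Each one needs
 * demand loads on the new page before hardware prefetching resumes. */
static size_t page_boundaries_crossed(uintptr_t base, size_t len)
{
    assert(len > 0);
    uintptr_t last = base + (len - 1);
    return (size_t)(last / PAGE_BYTES - base / PAGE_BYTES);
}

/* Sums a[0..n). The software look-ahead never reaches past the end of a;
 * once it would, it targets the start of next instead (next may be NULL). */
static long sum_with_prefetch(const int *a, size_t n,
                              const int *next, size_t next_n)
{
    const size_t per_line = LINE_BYTES / sizeof(int);
    const size_t ahead = LINES_AHEAD * per_line;
    long sum = 0;

    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (i % per_line == 0) { /* about one prefetch per cache line */
            size_t target = i + ahead;
            if (target < n) {
                __builtin_prefetch(&a[target], 0, 3);
            } else if (next != NULL && target - n < next_n) {
                __builtin_prefetch(&next[target - n], 0, 3);
            }
        }
        sum += a[i];
    }
    return sum;
}
```

For example, with a hypothetical trigger distance of 256 bytes,
constant_stride_ok(64, 256) holds while constant_stride_ok(128, 256) does
not, and a 128-byte buffer starting at address 0x1FC0 crosses one 4K page
boundary.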


Intel ARCHITECTURE IA-32 Specifications

General
Instruction Set: x86
Instruction Set Type: CISC
Memory Segmentation: Supported
Operating Modes: Real mode, Protected mode, Virtual 8086 mode
Max Physical Address Size: 36 bits (with PAE)
Max Virtual Address Size: 32 bits
Architecture: IA-32 (Intel Architecture 32-bit)
Addressable Memory: 4 GB (with Physical Address Extension up to 64 GB)
Floating Point Registers: 8 x 80-bit
MMX Registers: 8 x 64-bit
SSE Registers: 8 x 128-bit
Registers: General-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP), Segment registers (CS, DS, SS, ES, FS, GS), Instruction pointer (EIP), Flags register (EFLAGS)
Floating Point Unit: Yes (x87)
