Intel ARCHITECTURE IA-32 User Manual

IA-32 Intel® Architecture Optimization
The performance loss caused by poor utilization of resources can be
completely eliminated by scheduling the prefetch instructions
appropriately. As shown in Figure 6-3, prefetch instructions are issued
two vertex iterations ahead. This assumes that only one vertex gets
processed in one iteration and a new data cache line is needed for each
iteration. As a result, when iteration n, vertex V_n, is being processed,
the requested data has already been brought into the cache. In the
meantime, the front-side bus is transferring the data needed for
iteration n+1, vertex V_(n+1). Because there is no dependence between
the V_(n+1) data and the execution of V_n, the latency for data access of
V_(n+1) can be entirely hidden behind the execution of V_n. Under such
circumstances, no “bubbles” are present in the pipelines and thus the
best possible performance can be achieved.
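The scheduling described above can be sketched in C. This is a minimal
illustration, not code from the manual: the Vertex layout, the 64-byte
cache line size, the function name process_vertices, and the use of the
GCC/Clang builtin __builtin_prefetch (rather than a specific IA-32
prefetch instruction) are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vertex layout, padded to 64 bytes so that each vertex
   occupies one cache line (the line size is an assumption). */
typedef struct {
    float x, y, z, w;
    float pad[12];
} Vertex;

/* Prefetch distance in iterations: two vertices ahead, as in the text. */
#define PREFETCH_DISTANCE 2

/* Accumulates a simple per-vertex computation. While vertex i is being
   processed, a prefetch for vertex i + 2 is issued, so its memory
   access can overlap with the computation on earlier vertices. */
float process_vertices(const Vertex *v, size_t n)
{
    assert(v != NULL || n == 0);

    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, 0 = read, 3 = high temporal locality. */
            __builtin_prefetch(&v[i + PREFETCH_DISTANCE], 0, 3);
        }
        acc += v[i].x * v[i].x + v[i].y * v[i].y + v[i].z * v[i].z;
    }
    return acc;
}
```

The prefetch is only a hint and does not change the result; whether a
distance of two iterations is enough to hide the memory latency depends
on how much computation each iteration performs.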
Prefetching is useful for inner loops that have heavy computations, or
are close to the boundary between being compute-bound and
memory-bandwidth-bound.
Prefetching is probably not very useful for loops that are
predominantly memory-bandwidth-bound.
When data is already located in the first-level cache, prefetching can
be useless and could even reduce performance because the extra µops
either back up waiting for outstanding memory accesses or may be
dropped altogether. This behavior is platform-specific and may change
in the future.
Software Prefetching Usage Checklist
The following checklist covers issues that need to be addressed and/or
resolved to use the software prefetch instruction properly:
• Determine software prefetch scheduling distance
• Use software prefetch concatenation
• Minimize the number of software prefetches
• Mix software prefetch with computation instructions
• Use cache blocking techniques (for example, strip mining)
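
Two of the checklist items, minimizing the number of software prefetches
and mixing them with computation, can be illustrated with a short C
sketch. It is an assumption-laden example rather than the manual's own
code: the 64-byte cache line, the distance of two lines, the function
name sum_one_prefetch_per_line, and the GCC/Clang builtin
__builtin_prefetch are chosen only for illustration, and the array is
assumed to start on a cache line boundary.

```c
#include <assert.h>
#include <stddef.h>

#define FLOATS_PER_LINE 16  /* assumes 64-byte lines and 4-byte floats */
#define LINES_AHEAD     2   /* hypothetical scheduling distance, in lines */

/* Sums an array, issuing one prefetch per cache line of input instead
   of one prefetch per element. The inner loop supplies the computation
   that the prefetch overlaps with. */
float sum_one_prefetch_per_line(const float *a, size_t n)
{
    assert(a != NULL || n == 0);

    float acc = 0.0f;
    size_t i = 0;

    /* Main loop: each outer iteration consumes one full cache line. */
    for (; i + FLOATS_PER_LINE <= n; i += FLOATS_PER_LINE) {
        size_t ahead = i + (size_t)LINES_AHEAD * FLOATS_PER_LINE;
        if (ahead < n) {
            /* Arguments: address, 0 = read, 3 = high temporal locality. */
            __builtin_prefetch(&a[ahead], 0, 3);
        }
        for (size_t j = 0; j < FLOATS_PER_LINE; j++) {
            acc += a[i + j];
        }
    }

    /* Remainder: elements that do not fill a whole line. */
    for (; i < n; i++) {
        acc += a[i];
    }
    return acc;
}
```

Prefetching every element would issue sixteen requests for the same
line under these assumptions; grouping the work by line keeps the
prefetch count at one per line while leaving the result unchanged.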
