
Intel ARCHITECTURE IA-32 User Manual

568 pages
IA-32 Intel® Architecture Optimization
Avoid Excessive Software Prefetches
Pentium 4 and Intel Xeon processors have an automatic hardware
prefetcher. It can bring data and instructions into the unified
second-level cache based on prior reference patterns. In most situations,
the hardware prefetcher is likely to reduce system memory latency
without explicit intervention from software prefetches. It is also
preferable to adjust data access patterns in the code to take advantage of
the characteristics of the automatic hardware prefetcher to improve
locality or mask memory latency. Using software prefetch instructions
excessively or indiscriminately will inevitably cause performance
penalties, because doing so wastes the command and data bandwidth of
the system bus.
Using software prefetches also delays the hardware prefetcher in starting
to fetch data needed by the processor core. It consumes critical
execution resources as well, and can result in stalled execution. The
guidelines for using software prefetch instructions are described in
Chapter 2. Techniques for using the automatic hardware prefetcher are
discussed in Chapter 6.
User/Source Coding Rule 28. (M impact, L generality) Avoid excessive use
of software prefetch instructions and allow the automatic hardware prefetcher
to work. Excessive use of software prefetches can significantly and
unnecessarily increase bus utilization.
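
The rule above can be illustrated with a minimal sketch in C. It is not taken
from the manual: the function names, the 64-byte cache line size and the
prefetch distance are assumptions chosen for illustration, and
__builtin_prefetch is a GCC/Clang builtin that is used here in place of a
hand-written prefetch instruction. The first function relies on the hardware
prefetcher alone, the second shows the discouraged one-prefetch-per-element
pattern, and the third limits software prefetches to at most one per assumed
cache line.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; tune them for the target processor. */
#define CACHE_LINE_BYTES     64  /* assumption: 64-byte cache line */
#define PREFETCH_AHEAD_LINES 8   /* hypothetical prefetch distance, in lines */

/* Sequential walk with no software prefetch. The regular access pattern
   leaves the work to the automatic hardware prefetcher. */
long sum_plain(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Discouraged pattern: one software prefetch per element. With 4-byte
   elements, most of these requests target a cache line that an earlier
   iteration already asked for. */
long sum_prefetch_every_element(const int *a, size_t n)
{
    const size_t ahead =
        PREFETCH_AHEAD_LINES * (CACHE_LINE_BYTES / sizeof(int));
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + ahead < n)
            __builtin_prefetch(&a[i + ahead]);
        total += a[i];
    }
    return total;
}

/* If software prefetch is used at all, issue at most one request per
   assumed cache line instead of one per element. */
long sum_prefetch_per_line(const int *a, size_t n)
{
    const size_t per_line = CACHE_LINE_BYTES / sizeof(int);
    const size_t ahead = PREFETCH_AHEAD_LINES * per_line;
    long total = 0;
    for (size_t i = 0; i < n; i += per_line) {
        if (i + ahead < n)
            __builtin_prefetch(&a[i + ahead]);
        size_t end = (i + per_line < n) ? i + per_line : n;
        for (size_t j = i; j < end; j++)
            total += a[j];
    }
    return total;
}
```

All three functions return the same sum; they differ only in how many
prefetch requests they issue. Which variant runs fastest depends on the
processor and the access pattern, so any use of software prefetch should be
confirmed by measurement against the plain loop.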
Improve Effective Latency of Cache Misses
System memory access latency due to cache misses is affected by bus
traffic. This is because bus read requests must be arbitrated along with
other requests for bus transactions. Reducing the number of outstanding
bus transactions helps improve effective memory access latency.
One technique to improve effective latency of memory read transactions
is to use multiple overlapping bus reads to reduce the latency of sparse
reads. In situations where there is little locality of data or when memory
reads need to be arbitrated with other bus transactions, the effective
