EasyManuals Logo

Intel ARCHITECTURE IA-32 User Manual

Intel ARCHITECTURE IA-32
568 pages
To Next Page IconTo Next Page
To Next Page IconTo Next Page
To Previous Page IconTo Previous Page
To Previous Page IconTo Previous Page
Page #549 background image
Mathematics of Prefetch Scheduling Distance E
E-3
T
b
data transfer latency which is equal to number of lines
per iteration * line burst latency
Note that the potential effects of µop reordering are not factored into the
estimations discussed.
Examine Example E-1 that uses the
prefetchnta instruction with a
prefetch scheduling distance of 3, that is, psd = 3. The data prefetched in
iteration i, will actually be used in iteration i+3. T
c
represents the cycles
needed to execute
top_loop - assuming all the memory accesses hit L1
while il (iteration latency) represents the cycles needed to execute this
loop with actually run-time memory footprint. T
c
can be determined by
computing the critical path latency of the code dependency graph. This
work is quite arduous without help from special performance
characterization tools or compilers. A simple heuristic for estimating the
T
c
value is to count the number of instructions in the critical path and
multiply the number with an artificial CPI. A reasonable CPI value
would be somewhere between 1.0 and 1.5 depending on the quality of
code scheduling.
Example E-1 Calculating Insertion for Scheduling Distance of 3
top_loop:
prefetchnta [edx+esi+32*3]
prefetchnta [edx*4+esi+32*3]
. . . . .
movaps xmm1, [edx+esi]
movaps xmm2, [edx*4+esi]
movaps xmm3, [edx+esi+16]
movaps xmm4, [edx*4+esi+16]
. . . . .
. . .
add esi, 32
cmp esi, ecx
jl top_loop

Table of Contents

Questions and Answers:

Question and Answer IconNeed help?

Do you have a question about the Intel ARCHITECTURE IA-32 and is the answer not in the manual?

Intel ARCHITECTURE IA-32 Specifications

General IconGeneral
Instruction Setx86
Instruction Set TypeCISC
Memory SegmentationSupported
Operating ModesReal mode, Protected mode, Virtual 8086 mode
Max Physical Address Size36 bits (with PAE)
Max Virtual Address Size32 bits
ArchitectureIA-32 (Intel Architecture 32-bit)
Addressable Memory4 GB (with Physical Address Extension up to 64 GB)
Floating Point Registers8 x 80-bit
MMX Registers8 x 64-bit
SSE Registers8 x 128-bit
RegistersGeneral-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP), Segment registers (CS, DS, SS, ES, FS, GS), Instruction pointer (EIP), Flags register (EFLAGS)
Floating Point UnitYes (x87)

Related product manuals