
Intel ARCHITECTURE IA-32 - Example E-1 Calculating Insertion for Scheduling Distance of 3

Mathematics of Prefetch Scheduling Distance E
Tb    data transfer latency, which is equal to number of lines
      per iteration * line burst latency
Note that the potential effects of µop reordering are not factored into the
estimations discussed.
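
The definition of Tb reduces to a single multiplication. A minimal C sketch follows; the function name and the sample figures mentioned afterwards are illustrative assumptions, not values taken from this manual.

```c
/* Data transfer latency Tb, in cycles:
 * number of cache lines per iteration * line burst latency.
 * The function name is hypothetical, chosen for this sketch. */
unsigned data_transfer_latency(unsigned lines_per_iter,
                               unsigned line_burst_latency)
{
    return lines_per_iter * line_burst_latency;
}
```

For instance, assuming 2 lines per iteration and a line burst latency of 20 cycles, data_transfer_latency(2, 20) gives a Tb of 40 cycles.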
Examine Example E-1, which uses the prefetchnta instruction with a
prefetch scheduling distance of 3, that is, psd = 3. The data prefetched in
iteration i will actually be used in iteration i+3. Tc represents the cycles
needed to execute top_loop, assuming all the memory accesses hit L1,
while il (iteration latency) represents the cycles needed to execute this
loop with the actual run-time memory footprint. Tc can be determined by
computing the critical path latency of the code dependency graph. This
work is quite arduous without help from special performance
characterization tools or compilers. A simple heuristic for estimating the
Tc value is to count the number of instructions in the critical path and
multiply that number by an artificial CPI. A reasonable CPI value
would be somewhere between 1.0 and 1.5, depending on the quality of
code scheduling.
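
The counting heuristic can be written down directly. The sketch below is only an illustration: the function name and the instruction count used in the note that follows are assumptions, and the result is a rough estimate rather than a measured latency.

```c
/* Rough estimate of the computation latency Tc, in cycles:
 * instructions on the critical path * an artificial CPI.
 * A CPI between 1.0 and 1.5 is the range suggested in the text;
 * the function name is hypothetical. */
double estimate_tc(unsigned critical_path_instrs, double cpi)
{
    return (double)critical_path_instrs * cpi;
}
```

With an assumed critical path of 10 instructions, the estimate ranges from 10 cycles (CPI 1.0, well-scheduled code) to 15 cycles (CPI 1.5).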
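Example E-1 Calculating Insertion for Scheduling Distance of 3
top_loop:
    prefetchnta [edx+esi+32*3]      ; prefetch 3 iterations (3 * 32 bytes) ahead
    prefetchnta [edx*4+esi+32*3]    ; same distance for the second stream
    . . . . .
    movaps xmm1, [edx+esi]          ; loads for the current iteration
    movaps xmm2, [edx*4+esi]
    movaps xmm3, [edx+esi+16]
    movaps xmm4, [edx*4+esi+16]
    . . . . .
    . . .
    add esi, 32                     ; advance 32 bytes per iteration
    cmp esi, ecx
    jl top_loop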
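
The 32*3 displacement in Example E-1 is the scheduling distance multiplied by the bytes consumed per iteration. The C sketch below checks that relationship under stated assumptions: offsets are taken relative to the start of one data stream, esi is treated as starting at 0, and all function names are hypothetical.

```c
#include <stddef.h>

/* Displacement added to the prefetch address: psd iterations ahead. */
size_t prefetch_offset(size_t psd, size_t bytes_per_iter)
{
    return psd * bytes_per_iter;
}

/* Stream offset touched by the prefetch issued in iteration i. */
size_t prefetched_at(size_t i, size_t psd, size_t bytes_per_iter)
{
    return i * bytes_per_iter + prefetch_offset(psd, bytes_per_iter);
}

/* Stream offset of the first load executed in iteration i. */
size_t loaded_at(size_t i, size_t bytes_per_iter)
{
    return i * bytes_per_iter;
}
```

With psd = 3 and 32 bytes per iteration, prefetch_offset returns 96, and prefetched_at(i, 3, 32) equals loaded_at(i + 3, 32) for any i, which matches the statement that data prefetched in iteration i is used in iteration i+3.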
