EasyManuals Logo

Intel ARCHITECTURE IA-32 User Manual

Intel ARCHITECTURE IA-32
568 pages
To Next Page IconTo Next Page
To Next Page IconTo Next Page
To Previous Page IconTo Previous Page
To Previous Page IconTo Previous Page
Page #215 background imageLoading...
Page #215 background image
Coding for SIMD Architectures 3
3-35
For the first iteration of the inner loop, each access to array B will generate a
cache miss.
If the size of one row of array A, that is, A[2, 0:MAX-1], is
large enough, by the time the second iteration starts, each access to array
B will always generate a cache miss. For instance, on the first iteration,
the cache line containing
B[0, 0:7] will be brought in when B[0,0] is
referenced because the
float type variable is four bytes and each cache
line is 32 bytes. Due to the limitation of cache capacity, this line will be
evicted due to conflict misses before the inner loop reaches the end. For
the next iteration of the outer loop, another cache miss will be generated
while referencing
B[0,1]. In this manner, a cache miss occurs when
each element of array
B is referenced, that is, there is no data reuse in the
cache at all for array
B.
Example 3-20 Loop Blocking
A. Original Loop
float A[MAX, MAX], B[MAX, MAX]
for (i=0; i< MAX; i++) {
for (j=0; j< MAX; j++) {
A[i,j] = A[i,j] + B[j, i];
}
}
B. Transformed Loop after Blocking
float A[MAX, MAX], B[MAX, MAX];
for (i=0; i< MAX; i+=block_size) {
for (j=0; j< MAX; j+=block_size) {
for (ii=i; ii<i+block_size; ii++) {
for (jj=j; jj<j+block_size; jj++) {
A[ii,jj] = A[ii,jj] + B[jj, ii];
}
}
}
}

Table of Contents

Questions and Answers:

Question and Answer IconNeed help?

Do you have a question about the Intel ARCHITECTURE IA-32 and is the answer not in the manual?

Intel ARCHITECTURE IA-32 Specifications

General IconGeneral
Instruction Setx86
Instruction Set TypeCISC
Memory SegmentationSupported
Operating ModesReal mode, Protected mode, Virtual 8086 mode
Max Physical Address Size36 bits (with PAE)
Max Virtual Address Size32 bits
ArchitectureIA-32 (Intel Architecture 32-bit)
Addressable Memory4 GB (with Physical Address Extension up to 64 GB)
Floating Point Registers8 x 80-bit
MMX Registers8 x 64-bit
SSE Registers8 x 128-bit
RegistersGeneral-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP), Segment registers (CS, DS, SS, ES, FS, GS), Instruction pointer (EIP), Flags register (EFLAGS)
Floating Point UnitYes (x87)

Related product manuals