
Intel ARCHITECTURE IA-32 User Manual

General Optimization Guidelines 2
However, if the access pattern of the array exhibits locality, such as
when the array index is swept through, then the Pentium 4 processor
prefetches data from struct_of_array, even if the elements of the
structure are accessed together.
When the elements of the structure are not accessed with equal
frequency, such as when element a is accessed ten times more often than
the other entries, then struct_of_array not only saves memory, but it
also prevents fetching unnecessary data items b, c, d, and e.
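
The difference between the two layouts can be sketched in C. This is a minimal illustration, not the manual's own listing: the element count N, the field types, the array_of_struct name, and the helper functions are assumptions, since this page only names struct_of_array and the fields a through e.

```c
#include <stddef.h>

#define N 100  /* element count: an illustrative assumption */

/* Array of structures: fields a..e of one element sit next to each other. */
struct element {
    int  a, c, e;
    char b, d;
};
static struct element array_of_struct[N];

/* Structure of arrays: all a values are contiguous, then all c values, etc. */
static struct {
    int  a[N], c[N], e[N];
    char b[N], d[N];
} struct_of_array;

/* Hypothetical helper: store the index in field a of both layouts. */
void fill_a(void)
{
    for (size_t i = 0; i < N; i++) {
        array_of_struct[i].a = (int)i;
        struct_of_array.a[i] = (int)i;
    }
}

/* Reads only a, but every cache line fetched also carries b, c, d, and e. */
long sum_a_aos(void)
{
    long sum = 0;
    for (size_t i = 0; i < N; i++)
        sum += array_of_struct[i].a;
    return sum;
}

/* Reads only a, and every cache line fetched holds nothing but a values. */
long sum_a_soa(void)
{
    long sum = 0;
    for (size_t i = 0; i < N; i++)
        sum += struct_of_array.a[i];
    return sum;
}
```

With these assumed field types, struct_of_array needs no per-element padding, so it is never larger than array_of_struct, and a loop that touches only a streams through one dense array.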
Using struct_of_array also enables the use of the SIMD data types by
the programmer and the compiler.
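
As a rough sketch of why the contiguous layout suits SIMD, the following adds two fields four elements at a time with SSE intrinsics, falling back to scalar code when SSE is not available at compile time. The float field types, the float_soa name, COUNT, and add_a_b are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps */
#endif

#define COUNT 10  /* illustrative size, deliberately not a multiple of 4 */

/* Hypothetical structure of arrays with float fields. */
static struct {
    float a[COUNT];
    float b[COUNT];
} float_soa;

/* out[i] = a[i] + b[i] for every element. */
void add_a_b(float *out)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Four consecutive a values and four consecutive b values load
       directly into one 128-bit register each. */
    for (; i + 4 <= COUNT; i += 4) {
        __m128 va = _mm_loadu_ps(&float_soa.a[i]);
        __m128 vb = _mm_loadu_ps(&float_soa.b[i]);
        _mm_storeu_ps(&out[i], _mm_add_ps(va, vb));
    }
#endif
    /* Scalar loop: handles the tail, or everything when SSE is absent. */
    for (; i < COUNT; i++)
        out[i] = float_soa.a[i] + float_soa.b[i];
}
```

With an array-of-structures layout the same four a values would be scattered across four structures and would have to be gathered before a packed add could be used.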
Note that struct_of_array can have the disadvantage of requiring
more independent memory stream references. This can require the use
of more prefetches and additional address generation calculations. It can
also have an impact on DRAM page access efficiency. An alternative,
hybrid_struct_of_array, blends the two approaches. In this case, only
two separate address streams are generated and referenced: one for
hybrid_struct_of_array_ace and one for hybrid_struct_of_array_bd. The
second alternative also prevents fetching unnecessary data (assuming the
variables a, c and e are always used together, whereas the variables
b and d are also used together, but not at the same time as a, c and e).
The hybrid approach ensures:
• simpler/fewer address generation than struct_of_array
• fewer streams, which reduces DRAM page misses
• use of fewer prefetches due to fewer streams
• efficient cache line packing of data elements that are used
  concurrently
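
The hybrid layout described above might look like the following in C. The names hybrid_struct_of_array_ace and hybrid_struct_of_array_bd come from the text; the element count N, the field types, and the two summing functions are assumptions made for this sketch.

```c
#include <stddef.h>

#define N 100  /* element count: an illustrative assumption */

/* Fields that are used together share one array of small structures. */
static struct { int  a, c, e; } hybrid_struct_of_array_ace[N];
static struct { char b, d;    } hybrid_struct_of_array_bd[N];

/* Walks a single address stream; each cache line holds only a, c, e. */
long sum_ace(void)
{
    long sum = 0;
    for (size_t i = 0; i < N; i++)
        sum += hybrid_struct_of_array_ace[i].a
             + hybrid_struct_of_array_ace[i].c
             + hybrid_struct_of_array_ace[i].e;
    return sum;
}

/* Walks the second address stream; a, c, and e are never fetched. */
long sum_bd(void)
{
    long sum = 0;
    for (size_t i = 0; i < N; i++)
        sum += hybrid_struct_of_array_bd[i].b
             + hybrid_struct_of_array_bd[i].d;
    return sum;
}
```

Each loop generates one address stream instead of the three (or two) that the same loop would generate over a full struct_of_array, while still keeping unrelated fields out of the cache lines it touches.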
Assembly/Compiler Coding Rule 23. (H impact, M generality) Try to
arrange data structures such that they permit sequential access.
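
One common instance of this rule is the traversal order of a two-dimensional array. The sketch below is illustrative only; the grid name, its dimensions, and both functions are assumptions, and the actual performance difference depends on the processor and the array size.

```c
#include <stddef.h>

#define ROWS 64  /* illustrative dimensions */
#define COLS 64

static int grid[ROWS][COLS];

/* Sequential access: the inner loop visits consecutive addresses,
   because C stores each row contiguously. */
long sum_row_major(void)
{
    long sum = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            sum += grid[r][c];
    return sum;
}

/* Strided access: the inner loop jumps COLS * sizeof(int) bytes per
   step, so consecutive reads land in different cache lines. */
long sum_col_major(void)
{
    long sum = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            sum += grid[r][c];
    return sum;
}
```

Both functions compute the same result; only the first walks memory in the sequential order that the rule recommends.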
If the data is arranged into a set of streams, the automatic hardware
prefetcher can prefetch data that will be needed by the application,
reducing the effective memory latency. If the data is accessed in a


Intel ARCHITECTURE IA-32 Specifications

General
Instruction Set: x86
Instruction Set Type: CISC
Memory Segmentation: Supported
Operating Modes: Real mode, Protected mode, Virtual 8086 mode
Max Physical Address Size: 36 bits (with PAE)
Max Virtual Address Size: 32 bits
Architecture: IA-32 (Intel Architecture 32-bit)
Addressable Memory: 4 GB (with Physical Address Extension up to 64 GB)
Floating Point Registers: 8 x 80-bit
MMX Registers: 8 x 64-bit
SSE Registers: 8 x 128-bit
Registers: General-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP), Segment registers (CS, DS, SS, ES, FS, GS), Instruction pointer (EIP), Flags register (EFLAGS)
Floating Point Unit: Yes (x87)
