Intel ARCHITECTURE IA-32 User Manual

Optimizing for SIMD Integer Applications 4
Increasing Bandwidth of Memory Fills and Video Fills
It is beneficial to understand how memory is accessed and filled. A
memory-to-memory fill (for example, a memory-to-video fill) is defined
as a 64-byte (cache line) load from memory which is immediately stored
back to memory (such as a video frame buffer). The following are
guidelines for obtaining higher bandwidth and shorter latencies for
sequential memory fills (video fills). These recommendations are
relevant for all Intel architecture processors with MMX technology and
refer to cases in which the loads and stores do not hit in the first- or
second-level cache.
Increasing Memory Bandwidth Using the MOVDQ Instruction
Loading any size data operand will cause an entire cache line to be
loaded into the cache hierarchy. Thus any size load looks more or less
the same from a memory bandwidth perspective. However, using many
smaller loads consumes more microarchitectural resources than fewer
larger loads. Consuming too many of these resources can cause the
processor to stall and reduce the bandwidth that the processor can
request of the memory subsystem.
Using movdq to store the data back to UC memory (or WC memory in some
cases) instead of using 32-bit stores (for example, movd) will reduce
by three-quarters the number of stores per memory fill cycle. As a
result, using the movdq instruction in memory fill cycles can achieve
significantly higher effective bandwidth than using the movd
instruction.
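
The store-count argument can be made concrete with a small C sketch. It
is an illustration only, under stated assumptions: the helper names
fill_128 and fill_32 are hypothetical, the SSE2 intrinsics
_mm_load_si128 and _mm_store_si128 are assumed to compile to 128-bit
aligned moves, and the buffers are ordinary cacheable memory rather than
UC or WC memory. It therefore shows the number of stores issued per
64-byte line, not a measured bandwidth.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics for 128-bit integer moves */
#endif

/* Copy n bytes using 128-bit loads and stores.
 * Assumes n is a multiple of 16 and both pointers are 16-byte aligned.
 * Returns the number of stores issued. */
static size_t fill_128(void *dst, const void *src, size_t n)
{
    assert(n % 16 == 0);
    assert((uintptr_t)dst % 16 == 0 && (uintptr_t)src % 16 == 0);
    size_t stores = 0;
    for (size_t i = 0; i < n; i += 16) {
#if defined(__SSE2__)
        __m128i v = _mm_load_si128((const __m128i *)((const char *)src + i));
        _mm_store_si128((__m128i *)((char *)dst + i), v);
#else
        /* Portable fallback so the sketch still builds without SSE2. */
        memcpy((char *)dst + i, (const char *)src + i, 16);
#endif
        stores++;
    }
    return stores;
}

/* Copy n bytes using 32-bit loads and stores (the movd-sized case).
 * Assumes n is a multiple of 4. Returns the number of stores issued. */
static size_t fill_32(void *dst, const void *src, size_t n)
{
    assert(n % 4 == 0);
    size_t stores = 0;
    for (size_t i = 0; i < n; i += 4) {
        uint32_t w;
        memcpy(&w, (const char *)src + i, sizeof w);
        memcpy((char *)dst + i, &w, sizeof w);
        stores++;
    }
    return stores;
}
```

For one 64-byte line, fill_128 issues 4 stores and fill_32 issues 16,
which is the three-quarters reduction described above.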
Increasing Memory Bandwidth by Loading and Storing to and from the Same DRAM Page
DRAM is divided into pages, which are not the same as operating
system (OS) pages. The size of a DRAM page is a function of the total
size of the DRAM and the organization of the DRAM. Page sizes of
several kilobytes are common. Like OS pages, DRAM pages are
constructed of sequential addresses. Sequential memory accesses to the

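As a rough illustration of the idea in this section's heading, the
following C sketch counts how often a stream of accesses moves from one
DRAM page to another. It rests on assumptions not stated in the text:
the 8 KB page size (DRAM_PAGE_BYTES) is a hypothetical value, physical
addresses are assumed to map linearly onto DRAM pages, and the helper
names dram_page_of and page_switches are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DRAM page size. The real value depends on the total
 * size and organization of the DRAM. */
#define DRAM_PAGE_BYTES 8192u

/* DRAM page index of an address, assuming a simple linear mapping. */
static uint64_t dram_page_of(uint64_t phys_addr)
{
    return phys_addr / DRAM_PAGE_BYTES;
}

/* Count how many consecutive access pairs fall in different DRAM pages. */
static size_t page_switches(const uint64_t *addrs, size_t n)
{
    size_t switches = 0;
    for (size_t i = 1; i < n; i++) {
        if (dram_page_of(addrs[i]) != dram_page_of(addrs[i - 1]))
            switches++;
    }
    return switches;
}
```

Under these assumptions, 256 sequential 64-byte accesses starting at
address 0 cross a page boundary once, while the same number of accesses
alternating between two distant regions changes page on every access.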