Intel ARCHITECTURE IA-32 - Increasing Bandwidth of Memory Fills and Video Fills; Increasing Memory Bandwidth Using the MOVDQ Instruction; Increasing Memory Bandwidth by Loading and Storing to and from the Same DRAM Page

Optimizing for SIMD Integer Applications 4

Increasing Bandwidth of Memory Fills and Video Fills

It is beneficial to understand how memory is accessed and filled. A
memory-to-memory fill (for example, a memory-to-video fill) is defined
as a 64-byte (cache line) load from memory that is immediately stored
back to memory (such as a video frame buffer). The following guidelines
help obtain higher bandwidth and shorter latencies for sequential
memory fills (video fills). These recommendations are relevant for all
Intel architecture processors with MMX technology and refer to cases in
which the loads and stores do not hit in the first- or second-level
cache.

Increasing Memory Bandwidth Using the MOVDQ Instruction

Loading a data operand of any size causes an entire cache line to be
loaded into the cache hierarchy. Thus, a load of any size looks more or
less the same from a memory bandwidth perspective. However, using many
smaller loads consumes more microarchitectural resources than using
fewer larger loads. Consuming too many of these resources can cause the
processor to stall and reduce the bandwidth that the processor can
request of the memory subsystem.
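The arithmetic behind this point can be sketched in a few lines of C.
This is an illustration only: the helper names fill_lines_touched and
fill_loads_issued are hypothetical, the 64-byte line size is taken from
the fill definition earlier in this section, and the buffer is assumed
to start on a cache-line boundary. The number of cache lines fetched
depends only on the buffer size, while the number of load instructions
grows as the operand size shrinks.

```c
#include <assert.h>
#include <stddef.h>

/* Cache-line size assumed from the 64-byte fill definition in the text. */
enum { FILL_LINE_BYTES = 64 };

/* Cache lines fetched when reading a buffer that starts on a line
 * boundary. The result does not depend on the operand size. */
static size_t fill_lines_touched(size_t buffer_bytes)
{
    return (buffer_bytes + FILL_LINE_BYTES - 1) / FILL_LINE_BYTES;
}

/* Load instructions needed to read the same buffer with operands of
 * operand_bytes each (4 for 32-bit loads, 16 for 128-bit loads). */
static size_t fill_loads_issued(size_t buffer_bytes, size_t operand_bytes)
{
    assert(operand_bytes > 0);
    return (buffer_bytes + operand_bytes - 1) / operand_bytes;
}
```

For a 4096-byte buffer, both operand sizes touch the same 64 cache
lines, but 32-bit operands need 1024 loads while 128-bit operands need
256.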

Using movdq to store the data back to UC memory (or WC memory in some
cases) instead of using 32-bit stores (for example, movd) will reduce
by three-quarters the number of stores per memory fill cycle. As a
result, using the movdq instruction in memory fill cycles can achieve
significantly higher effective bandwidth than using the movd
instruction.
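The three-quarters figure follows from the operand sizes: a 64-byte
line takes sixteen 32-bit stores but only four 128-bit stores. The
sketch below makes that count concrete. It rests on several
assumptions: movdq is taken to mean the 128-bit SSE2 moves (the
_mm_load_si128 and _mm_store_si128 intrinsics normally compile to
movdqa), the function names copy_line_32 and copy_line_128 are
hypothetical, and the buffers are ordinary cacheable memory rather than
the UC or WC memory the text describes. It illustrates the store count
and is not a tuned fill routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2: _mm_load_si128, _mm_store_si128 */
#endif

enum { CACHE_LINE_BYTES = 64 };

/* Copy one 64-byte line using 32-bit stores; returns the number of
 * stores written in the source. An optimizing compiler may merge them. */
static unsigned copy_line_32(uint32_t *dst, const uint32_t *src)
{
    unsigned stores = 0;
    for (size_t i = 0; i < CACHE_LINE_BYTES / sizeof(uint32_t); ++i) {
        dst[i] = src[i];
        ++stores;
    }
    return stores;
}

/* Copy one 64-byte line using 128-bit stores; returns the number of
 * stores. Both pointers must be 16-byte aligned. */
static unsigned copy_line_128(void *dst, const void *src)
{
    unsigned stores = 0;
    assert(((uintptr_t)dst & 15u) == 0 && ((uintptr_t)src & 15u) == 0);
    for (size_t off = 0; off < CACHE_LINE_BYTES; off += 16) {
#if defined(__SSE2__)
        __m128i v = _mm_load_si128((const __m128i *)((const char *)src + off));
        _mm_store_si128((__m128i *)((char *)dst + off), v);
#else
        /* Portable fallback for builds without SSE2 (assumption). */
        memcpy((char *)dst + off, (const char *)src + off, 16);
#endif
        ++stores;
    }
    return stores;
}
```

Both functions copy the same 64 bytes; copy_line_32 reports 16 stores
and copy_line_128 reports 4, which is the reduction by three-quarters
described above.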

Increasing Memory Bandwidth by Loading and Storing to and from the Same DRAM Page

DRAM is divided into pages, which are not the same as operating system
(OS) pages. The size of a DRAM page is a function of the total size of
the DRAM and the organization of the DRAM. Page sizes of several
kilobytes are common. Like OS pages, DRAM pages are constructed of
sequential addresses. Sequential memory accesses to the
