Intel ARCHITECTURE IA-32 User Manual

IA-32 Intel® Architecture Optimization
4-38
SSE3 provides the LDDQU instruction for loading from memory
addresses that are not 16-byte aligned. LDDQU is a special 128-bit
unaligned load designed to avoid cache line splits. If the address of the
load is aligned on a 16-byte boundary, LDDQU loads the 16 bytes
requested. If the address of the load is not aligned on a 16-byte
boundary, LDDQU loads a 32-byte block starting at the 16-byte aligned
address immediately below the address of the load request. It then
provides the requested 16 bytes. If the address is aligned on a 16-byte
boundary, the effective number of memory requests is implementation
dependent (one, or more).
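
The address arithmetic described above can be modeled in portable C. This is a minimal sketch for illustration only, not how the hardware is implemented: the helper names (aligned_base, splits_cache_line, extract_from_block) are hypothetical, and the cache line size is a parameter the caller must supply (64 bytes is a common value, assumed here rather than stated on this page).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the 16-byte aligned address immediately
   below (or equal to) addr. */
static uintptr_t aligned_base(uintptr_t addr) {
    return addr & ~(uintptr_t)15;
}

/* Hypothetical helper: nonzero if a plain 16-byte load starting at addr
   would touch two cache lines. line_size is an assumption of this sketch. */
static int splits_cache_line(uintptr_t addr, uintptr_t line_size) {
    return (addr / line_size) != ((addr + 15) / line_size);
}

/* Model of the unaligned case: block32 stands for the 32 bytes of memory
   starting at aligned_base(addr); the 16 requested bytes are extracted
   from it at the offset of addr within that block. */
static void extract_from_block(const uint8_t block32[32], uintptr_t addr,
                               uint8_t out[16]) {
    size_t offset = (size_t)(addr - aligned_base(addr));
    assert(offset < 16);
    memcpy(out, block32 + offset, 16);
}
```

For a load at address 0x103C with 64-byte lines, splits_cache_line reports a split, while aligned_base gives 0x1030, so the 32-byte block covers both halves of the requested data.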
LDDQU is intended for loading data from memory that will not be
modified and stored back to the same address. Its use should therefore
be restricted to situations where no store-to-load forwarding is
expected. Where store-to-load forwarding is expected, use regular
store/load pairs (either aligned or unaligned, based on the alignment
of the data accessed).
Example 4-29 Video Processing Using LDDQU to Avoid Cache Line Splits
// Average half-pels horizontally (on the "x" axis),
// from one reference frame only.
nextLinesLoop:
    lddqu  xmm0, XMMWORD PTR [edx]         // may not be 16B aligned
    lddqu  xmm1, XMMWORD PTR [edx+1]
    lddqu  xmm2, XMMWORD PTR [edx+eax]
    lddqu  xmm3, XMMWORD PTR [edx+eax+1]
    pavgb  xmm0, xmm1                      // average each byte with its right neighbor
    pavgb  xmm2, xmm3
    movdqa XMMWORD PTR [ecx], xmm0         // results stored elsewhere
    movdqa XMMWORD PTR [ecx+eax], xmm2
    // (repeat ...)
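
As a cross-check for the vector loop, the horizontal half-pel average can be written as a scalar reference in C. This is a sketch under stated assumptions: the function names avg_u8 and halfpel_h_row are hypothetical, and the rounding (a + b + 1) >> 1 follows the per-byte behavior of PAVGB. The caller must ensure that src has at least width + 1 readable bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounded unsigned byte average, the per-byte operation PAVGB performs. */
static uint8_t avg_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Hypothetical scalar reference: average each pixel with its right
   neighbor. src must have at least width + 1 readable bytes. */
static void halfpel_h_row(const uint8_t *src, uint8_t *dst, size_t width) {
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < width; ++i) {
        dst[i] = avg_u8(src[i], src[i + 1]);
    }
}
```

Calling halfpel_h_row once per row, with the row stride playing the role of eax, produces the values a 16-byte-at-a-time loop is expected to store, which makes it usable as a test oracle for a vectorized version.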
