
Intel ARCHITECTURE IA-32 User Manual

568 pages
IA-32 Intel® Architecture Optimization
Let us now consider a case with a series of small loads after a large store
to the same area of memory (beginning at memory address mem), as shown
in Example 4-26. Most of the small loads will stall because they are not
aligned with the store; see "Store Forwarding" in Chapter 2 for more
details.
The word loads must wait for the quadword store to write to memory
before they can access the data they require. This stall can also occur
with other data types (for example, when doublewords or words are
stored and then words or bytes are read from the same area of memory).
When you change the code sequence as shown in Example 4-27, the
processor can access the data without delay.
Example 4-26 A Series of Small Loads after a Large Store
movq  mem, mm0       ; store qword to address "mem"
:
:
mov   bx, mem + 2    ; load word at "mem + 2" stalls
mov   cx, mem + 4    ; load word at "mem + 4" stalls
Example 4-27 Eliminating Delay for a Series of Small Loads after a Large Store
movq  mem, mm0       ; store qword to address "mem"
:
:
movq  mm1, mem       ; load qword at address "mem"
movd  eax, mm1       ; copy low dword (bytes 0-3) to eax from
                     ; MMX register, not memory
psrlq mm1, 32        ; move high dword (bytes 4-7) into the low half
shr   eax, 16        ; ax now holds the word from "mem + 2"
movd  ebx, mm1       ; copy bytes 4-7 to ebx from
                     ; MMX register, not memory
and   ebx, 0ffffh    ; bx now holds the word from "mem + 4"
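
The shift-and-mask arithmetic in Example 4-27 can be checked with a small C model. This is only a sketch of the data movement, not a measurement of store-forwarding behavior. The function names (store_qword_le, load_word_le, word_at_2, word_at_4, check_extraction) are hypothetical and do not appear in the manual, and the byte array stands in for the memory at "mem" in IA-32's little-endian layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of "movq mem, mm0": write a qword to memory in the
   little-endian byte order used by IA-32. */
static void store_qword_le(unsigned char *mem, uint64_t value) {
    for (size_t i = 0; i < 8; i++) {
        mem[i] = (unsigned char)((value >> (8 * i)) & 0xffu);
    }
}

/* Model of "movq mm1, mem": read the whole qword back. */
static uint64_t load_qword_le(const unsigned char *mem) {
    uint64_t q = 0;
    for (size_t i = 0; i < 8; i++) {
        q |= (uint64_t)mem[i] << (8 * i);
    }
    return q;
}

/* Model of the small loads in Example 4-26: read one word
   directly from memory at the given byte offset. */
static uint16_t load_word_le(const unsigned char *mem, size_t offset) {
    return (uint16_t)(mem[offset] | ((unsigned)mem[offset + 1] << 8));
}

/* Model of "movd eax, mm1" followed by "shr eax, 16". */
static uint16_t word_at_2(uint64_t q) {
    uint32_t eax = (uint32_t)q;   /* low dword of the MMX register */
    eax >>= 16;                   /* upper word of that dword */
    return (uint16_t)eax;
}

/* Model of "psrlq mm1, 32", "movd ebx, mm1", "and ebx, 0ffffh". */
static uint16_t word_at_4(uint64_t q) {
    q >>= 32;                     /* high dword moves to the low half */
    uint32_t ebx = (uint32_t)q;
    ebx &= 0xffffu;               /* keep only the low word */
    return (uint16_t)ebx;
}

/* Returns 1 when the register-based extraction gives the same
   words as loading them directly from memory, 0 otherwise. */
static int check_extraction(uint64_t value) {
    unsigned char mem[8];
    store_qword_le(mem, value);
    uint64_t q = load_qword_le(mem);
    return word_at_2(q) == load_word_le(mem, 2)
        && word_at_4(q) == load_word_le(mem, 4);
}
```

For the qword 0x1122334455667788, word_at_2 returns 0x5566 and word_at_4 returns 0x3344, the same values the two word loads in Example 4-26 would read from "mem + 2" and "mem + 4".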


Intel ARCHITECTURE IA-32 Specifications

General
Instruction Set: x86
Instruction Set Type: CISC
Memory Segmentation: Supported
Operating Modes: Real mode, Protected mode, Virtual 8086 mode
Max Physical Address Size: 36 bits (with PAE)
Max Virtual Address Size: 32 bits
Architecture: IA-32 (Intel Architecture 32-bit)
Addressable Memory: 4 GB (with Physical Address Extension up to 64 GB)
Floating Point Registers: 8 x 80-bit
MMX Registers: 8 x 64-bit
SSE Registers: 8 x 128-bit
Registers: General-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP), Segment registers (CS, DS, SS, ES, FS, GS), Instruction pointer (EIP), Flags register (EFLAGS)
Floating Point Unit: Yes (x87)
