Intel ARCHITECTURE IA-32 User Manual

General Optimization Guidelines 2
write misses; only four write-combining buffers are guaranteed to be
available for simultaneous use. Write combining applies to memory
type WC; it does not apply to memory type UC.
Assembly/Compiler Coding Rule 28. (H impact, L generality) If an inner
loop writes to more than four arrays (four distinct cache lines), apply loop
fission to break up the body of the loop so that each of the resulting loops
writes to no more than four arrays per iteration.
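To illustrate Rule 28, here is a minimal C sketch. The kernel names `scale_fused` and `scale_fissioned` and the scaling work they do are hypothetical. The first version writes six output arrays in one inner loop (six concurrent write streams). The second applies loop fission so that each loop writes only three arrays, staying within the four write-combining buffers that are guaranteed to be available. Both produce identical results.

```c
#include <stddef.h>

/* Hypothetical kernel: six output arrays are written per iteration, i.e.
   six concurrent write streams, more than the four WC buffers guaranteed. */
void scale_fused(const float *src, size_t n,
                 float *o0, float *o1, float *o2,
                 float *o3, float *o4, float *o5)
{
    for (size_t i = 0; i < n; i++) {
        float x = src[i];
        o0[i] = x * 1.0f;
        o1[i] = x * 2.0f;
        o2[i] = x * 3.0f;
        o3[i] = x * 4.0f;
        o4[i] = x * 5.0f;
        o5[i] = x * 6.0f;
    }
}

/* Same work after loop fission: each loop writes only three arrays.
   src is read twice, but reads do not occupy write-combining buffers. */
void scale_fissioned(const float *src, size_t n,
                     float *o0, float *o1, float *o2,
                     float *o3, float *o4, float *o5)
{
    for (size_t i = 0; i < n; i++) {   /* first pass: streams 0..2 */
        float x = src[i];
        o0[i] = x * 1.0f;
        o1[i] = x * 2.0f;
        o2[i] = x * 3.0f;
    }
    for (size_t i = 0; i < n; i++) {   /* second pass: streams 3..5 */
        float x = src[i];
        o3[i] = x * 4.0f;
        o4[i] = x * 5.0f;
        o5[i] = x * 6.0f;
    }
}
```

The trade-off is that `src` is traversed twice; fission pays off when the cost of thrashing the write-combining buffers outweighs the extra reads.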
The write combining buffers are used for stores of all memory types.
They are particularly important for writes to uncached memory: writes
to different parts of the same cache line can be grouped into a single,
full-cache-line bus transaction instead of going across the bus (since
they are not cached) as several partial writes. Avoiding partial writes can
have a significant impact on bus-bandwidth-bound graphics applications,
where graphics buffers are in uncached memory. Separating writes to
uncached memory and writes to writeback memory into separate phases
helps ensure that the write combining buffers fill before they are
evicted by other write traffic. Eliminating partial write transactions
has been found to have a performance impact on the order of 20% for
some applications. Because cache lines are 64 bytes, a 63-byte write to
the bus (one byte short of a full line) results in 8 partial bus
transactions rather than a single full-line transaction.
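The 63-byte case can be reproduced with a small counting model. This is an illustrative assumption, not a documented bus-protocol detail: a buffer holding a complete 64-byte line drains as one transaction, and otherwise each 8-byte chunk containing at least one written byte becomes one partial transaction. The helper names `bus_transactions` and `prefix_mask` are hypothetical.

```c
#include <stdint.h>

/* Illustrative model only (assumption): a full 64-byte line drains as one
   bus transaction; otherwise every 8-byte chunk with at least one written
   byte is sent as one partial transaction.
   Bit k of `written` is set if byte k of the line was written. */
unsigned bus_transactions(uint64_t written)
{
    if (written == UINT64_MAX)
        return 1;                          /* full line: one transaction */
    unsigned count = 0;
    for (unsigned chunk = 0; chunk < 8; chunk++) {
        uint64_t mask = (uint64_t)0xFF << (chunk * 8);
        if (written & mask)
            count++;                       /* one partial write per chunk */
    }
    return count;
}

/* Mask for `len` contiguous bytes starting at byte 0 of the line (0..64). */
uint64_t prefix_mask(unsigned len)
{
    return len >= 64 ? UINT64_MAX : (((uint64_t)1 << len) - 1);
}
```

Under this model, `bus_transactions(prefix_mask(64))` is 1 while `bus_transactions(prefix_mask(63))` is 8, matching the figure in the text: the missing final byte turns one full-line transaction into eight partial ones.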
When coding functions that execute simultaneously on two threads,
reducing the number of distinct arrays written in an inner loop helps
take full advantage of the write-combining store buffers. For
write-combining buffer recommendations for Hyper-Threading
Technology, see Chapter 7.
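One way to make the per-loop write limit tunable is to perform fission with a configurable budget of write streams. In the sketch below, the function `fill_fissioned` and its scaling work are hypothetical. `max_streams` = 4 corresponds to Rule 28. A smaller value such as 2, for code that shares the buffers with a second thread, is an assumption for illustration only; Chapter 7 gives the actual Hyper-Threading recommendations.

```c
#include <stddef.h>

/* Hypothetical kernel: dst[k][i] = src[i] * (k + 1) for ndst output arrays.
   Loop fission groups the outputs so that each pass writes at most
   max_streams arrays per inner-loop iteration.
   max_streams = 4 follows Rule 28; a smaller budget (e.g. 2) for two
   threads sharing the WC buffers is an illustrative assumption.
   Returns the number of passes performed. */
size_t fill_fissioned(float *const *dst, size_t ndst, const float *src,
                      size_t n, size_t max_streams)
{
    size_t passes = 0;
    if (max_streams == 0)
        max_streams = 1;                 /* guard against an empty budget */
    for (size_t first = 0; first < ndst; first += max_streams) {
        size_t last = first + max_streams;
        if (last > ndst)
            last = ndst;                 /* final group may be smaller */
        for (size_t i = 0; i < n; i++) {
            float x = src[i];
            for (size_t k = first; k < last; k++)
                dst[k][i] = x * (float)(k + 1);   /* <= max_streams streams */
        }
        passes++;
    }
    return passes;
}
```

With five output arrays, a budget of 4 yields two passes and a budget of 2 yields three. Each extra pass re-reads `src`, so the budget trades read traffic for fewer concurrent write streams.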
Store ordering and visibility are also important issues for write
combining. When a write to a write-combining buffer for a previously
unwritten cache line occurs, there will be a read-for-ownership (RFO). If
a subsequent write goes to another write-combining buffer, it may trigger
a separate RFO for that cache line. Subsequent writes to the first
cache line and write-combining buffer will be delayed until the second
RFO has been serviced, to guarantee properly ordered visibility of the
writes. If the memory type for the writes is write-combining, there will
