Intel ARCHITECTURE IA-32 User Manual

General Optimization Guidelines 2
Assembly/Compiler Coding Rule 10. (M impact, L generality) Do not
put more than four branches in 16-byte chunks. 2-22
Assembly/Compiler Coding Rule 11. (M impact, L generality) Do not
put more than two end loop branches in a 16-byte chunk. 2-22
Assembly/Compiler Coding Rule 12. (M impact, MH generality) If the
average number of total iterations is less than or equal to 100, use a
forward branch to exit the loop. 2-23
Assembly/Compiler Coding Rule 13. (H impact, M generality) Unroll
small loops until the overhead of the branch and the induction variable
generally accounts for less than about 10% of the execution time of the
loop. 2-27
Assembly/Compiler Coding Rule 14. (H impact, M generality) Avoid
unrolling loops excessively, as this may thrash the trace cache or
instruction cache. 2-27
Assembly/Compiler Coding Rule 15. (M impact, M generality) Unroll
loops that are frequently executed and that have a predictable number of
iterations to reduce the number of iterations to 16 or fewer, unless this
increases code size so that the working set no longer fits in the trace
cache. If the loop body contains more than one conditional branch, then
unroll so that the number of iterations is 16/(# conditional branches).
2-27
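
The unrolling rules above (Rules 13 to 15) can be illustrated with a minimal C sketch. The function names and the unroll factor of 4 are assumptions chosen for illustration, not values from the manual; the right factor depends on the loop body, the trip count, and the code-size limits the rules describe.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: every element pays for one induction-variable update
   and one conditional branch. */
long sum_simple(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Unrolled by 4 (illustrative factor): the branch and the induction
   variable update are now shared by four elements per iteration. */
long sum_unrolled4(const int *a, size_t n)
{
    assert(a != NULL || n == 0);
    long total = 0;
    size_t i = 0;
    size_t limit = n - (n % 4);   /* largest multiple of 4 not above n */

    for (; i < limit; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    /* Remainder loop handles the last 0 to 3 elements. */
    for (; i < n; i++)
        total += a[i];
    return total;
}
```

Both functions return the same result; only the ratio of loop overhead to useful work changes. An optimizing compiler may already unroll the baseline on its own, so measuring before and after is advisable.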
Assembly/Compiler Coding Rule 16. (H impact, H generality) Align
data on natural operand size address boundaries. If the data will be
accessed with vector instruction loads and stores, align the data on
16-byte boundaries. 2-30
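
As a sketch of Rule 16, the following C11 fragment requests 16-byte alignment for a buffer intended for vector loads and stores, and provides a helper to check an address. The names `is_aligned`, `vec_buf`, and `get_vec_buf` are hypothetical and used only for illustration; the example assumes a C11 compiler that provides `<stdalign.h>`.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if p is a multiple of alignment.
   alignment must be a nonzero power of two. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Buffer aligned on a 16-byte boundary, so each group of four floats
   can be accessed with aligned 128-bit vector loads and stores. */
alignas(16) float vec_buf[8];

float *get_vec_buf(void)
{
    return vec_buf;
}
```

Scalar objects normally receive their natural alignment from the compiler automatically; the explicit `alignas` matters mainly for data that vector instructions will touch.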
Assembly/Compiler Coding Rule 17. (H impact, M generality) Pass
parameters in registers instead of on the stack where possible. Passing
arguments on the stack is a case of a store followed by a reload. Although
IA-32 processors optimize this sequence by forwarding the value to the
load directly from the memory order buffer, without accessing the data
cache, floating-point values incur a significant forwarding latency.
Passing floating-point arguments in registers (preferably XMM registers)
avoids this long-latency operation. 2-33
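
One way to apply Rule 17 from C is through compiler-specific calling-convention attributes. The sketch below assumes GCC (or a compatible compiler) targeting 32-bit x86, where the `regparm` and `sseregparm` function attributes request register passing for integer and floating-point arguments respectively. On other compilers or targets the macros expand to nothing, so the code still compiles with the default convention. The macro and function names are hypothetical.

```c
#include <assert.h>

/* Assumption: GCC-style attributes, meaningful only on 32-bit x86. */
#if defined(__GNUC__) && defined(__i386__)
#define INT_REG_ARGS __attribute__((regparm(3)))
#else
#define INT_REG_ARGS
#endif

#if defined(__GNUC__) && defined(__i386__) && defined(__SSE2__)
#define FP_REG_ARGS __attribute__((sseregparm))
#else
#define FP_REG_ARGS
#endif

/* Up to three integer arguments are passed in registers
   instead of being stored to and reloaded from the stack. */
INT_REG_ARGS int clamp_index(int i, int lo, int hi)
{
    assert(lo <= hi);
    if (i < lo)
        return lo;
    if (i > hi)
        return hi;
    return i;
}

/* Floating-point arguments are passed in XMM registers,
   avoiding the store-to-load forwarding latency the rule describes. */
FP_REG_ARGS double scale_add(double x, double scale, double offset)
{
    return x * scale + offset;
}
```

Every caller must see the same attribute on the declaration, so in practice it belongs in the shared header. For small helpers, letting the compiler inline the call is often a simpler way to remove the stack traffic altogether.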

Intel ARCHITECTURE IA-32 Specifications

General
Instruction Set: x86
Instruction Set Type: CISC
Memory Segmentation: Supported
Operating Modes: Real mode, Protected mode, Virtual 8086 mode
Max Physical Address Size: 36 bits (with PAE)
Max Virtual Address Size: 32 bits
Architecture: IA-32 (Intel Architecture 32-bit)
Addressable Memory: 4 GB (up to 64 GB with Physical Address Extension)
Floating Point Registers: 8 x 80-bit
MMX Registers: 8 x 64-bit
SSE Registers: 8 x 128-bit
Registers: General-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP), Segment registers (CS, DS, SS, ES, FS, GS), Instruction pointer (EIP), Flags register (EFLAGS)
Floating Point Unit: Yes (x87)