Intel ARCHITECTURE IA-32 User Manual

568 pages
IA-32 Intel® Architecture Optimization
Inlining, Calls and Returns
The return address stack mechanism augments the static and dynamic
predictors to optimize specifically for calls and returns. It holds 16
entries, which is large enough to cover the call depth of most programs.
If there is a chain of more than 16 nested calls and more than 16 returns
in rapid succession, performance may be degraded.
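The depth limit can be illustrated with a small simulation. The sketch below is a simplified, hypothetical model and not a description of the actual hardware: it assumes the return address stack behaves as a 16-entry circular buffer that overwrites its oldest entry on overflow, and it ignores every other predictor, including the trace cache. The names ras_t, ras_push, ras_pop_predicts and ras_mispredicts, and the addresses used, are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define RAS_ENTRIES 16  /* depth stated in the text */

/* Hypothetical model of a return address stack: a circular buffer
   that silently overwrites its oldest entry on overflow. */
typedef struct {
    unsigned long slot[RAS_ENTRIES];
    size_t top;    /* index of the next free slot */
    size_t valid;  /* number of entries that still hold a prediction */
} ras_t;

static void ras_push(ras_t *r, unsigned long ret_addr) {
    r->slot[r->top] = ret_addr;
    r->top = (r->top + 1) % RAS_ENTRIES;
    if (r->valid < RAS_ENTRIES)
        r->valid++;
}

/* Returns 1 if the predicted return target matches the actual one. */
static int ras_pop_predicts(ras_t *r, unsigned long actual) {
    if (r->valid == 0)
        return 0;  /* assumption: an empty stack always mispredicts */
    r->top = (r->top + RAS_ENTRIES - 1) % RAS_ENTRIES;
    r->valid--;
    return r->slot[r->top] == actual;
}

/* Simulate `depth` nested calls followed by `depth` returns and
   count the returns this model fails to predict. */
int ras_mispredicts(int depth) {
    assert(depth >= 0);
    ras_t r = {{0}, 0, 0};
    int misses = 0;
    for (int i = 0; i < depth; i++)
        ras_push(&r, 0x1000UL + (unsigned long)i);  /* made-up addresses */
    for (int i = depth - 1; i >= 0; i--)
        if (!ras_pop_predicts(&r, 0x1000UL + (unsigned long)i))
            misses++;
    return misses;
}
```

Under these assumptions, ras_mispredicts(16) returns 0, while ras_mispredicts(20) returns 4: the four outermost return addresses were overwritten before they could be used.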
The trace cache maintains branch prediction information for calls and
returns. As long as the trace with the call or return remains in the trace
cache and if the call and return targets remain unchanged, the depth
limit of the return address stack described above will not impede
performance.
To enable the use of the return stack mechanism, calls and returns must
be matched in pairs. If this is done, the likelihood of exceeding the
stack depth in a manner that will impact performance is very low.
Assembly/Compiler Coding Rule 4. (MH impact, MH generality) Near
calls must be matched with near returns, and far calls must be matched with
far returns. Pushing the return address on the stack and jumping to the routine
to be called is not recommended since it creates a mismatch in calls and
returns.
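Why a pushed return address followed by a jump creates a mismatch can be sketched with a toy model. The code below is an illustrative assumption, not the processor's actual algorithm: it supposes that only a call instruction pushes a prediction onto the return address stack, while a push-and-jump sequence updates only the architectural stack in memory. The type op_t and all function names are hypothetical.

```c
#include <assert.h>

#define RAS_ENTRIES 16
#define MEM_STACK_MAX 64

typedef enum { OP_CALL, OP_PUSH_JMP, OP_RET } op_t;

/* Replay a sequence of operations and count mispredicted returns.
   mem_stack models the architectural stack (always correct);
   ras models the predictor, which only CALL updates. */
int count_return_mispredicts(const op_t *ops, int n) {
    unsigned long mem_stack[MEM_STACK_MAX];
    unsigned long ras[RAS_ENTRIES];
    int mem_top = 0, ras_top = 0, misses = 0;

    for (int i = 0; i < n; i++) {
        unsigned long ret_addr = 0x2000UL + (unsigned long)i;  /* made-up */
        switch (ops[i]) {
        case OP_CALL:
            assert(mem_top < MEM_STACK_MAX && ras_top < RAS_ENTRIES);
            mem_stack[mem_top++] = ret_addr;
            ras[ras_top++] = ret_addr;
            break;
        case OP_PUSH_JMP:
            assert(mem_top < MEM_STACK_MAX);
            mem_stack[mem_top++] = ret_addr;  /* predictor sees only a jump */
            break;
        case OP_RET: {
            assert(mem_top > 0);
            unsigned long actual = mem_stack[--mem_top];
            /* Mispredict if the predictor is empty or holds a stale target. */
            if (ras_top == 0 || ras[--ras_top] != actual)
                misses++;
            break;
        }
        }
    }
    return misses;
}

/* Two nested calls, each matched by a return. */
int matched_demo(void) {
    const op_t ops[] = { OP_CALL, OP_CALL, OP_RET, OP_RET };
    return count_return_mispredicts(ops, 4);
}

/* The inner call is replaced by push + jump. */
int mismatched_demo(void) {
    const op_t ops[] = { OP_CALL, OP_PUSH_JMP, OP_RET, OP_RET };
    return count_return_mispredicts(ops, 4);
}
```

In this model matched_demo() yields 0 mispredicted returns and mismatched_demo() yields 2: the single unmatched return consumes the entry that belonged to the outer call, so the outer return is mispredicted as well.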
Calls and returns are expensive; use inlining for the following reasons:
• Parameter passing overhead can be eliminated.
• In a compiler, inlining a function exposes more opportunity for
  optimization.
• If the inlined routine contains branches, the additional context of the
  caller may improve branch prediction within the routine.
• A mispredicted branch can lead to larger performance penalties
  inside a small function than if that function is inlined.
Assembly/Compiler Coding Rule 5. (MH impact, MH generality)
Selectively inline a function if doing so decreases code size, or if the
function is small and the call site is frequently executed.
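As a minimal sketch of the kind of function Rule 5 has in mind, the C fragment below marks a small helper as static inline and calls it from a loop. The names clamp, sum_clamped and sum_clamped_demo are hypothetical, and inline is only a request: whether the call is actually inlined depends on the compiler and its optimization settings.

```c
#include <assert.h>
#include <stddef.h>

/* Small helper: a candidate for inlining. `static inline` is a hint;
   the compiler decides whether to honor it. */
static inline int clamp(int x, int lo, int hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Hot loop: the call site executes once per element. If clamp() is
   inlined, no arguments are passed at run time and the constant
   bounds 0 and 255 become visible to the optimizer. */
long sum_clamped(const int *v, size_t n) {
    assert(v != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += clamp(v[i], 0, 255);
    return total;
}

/* Example input: -5 clamps to 0, 10 is unchanged, 300 clamps to 255. */
long sum_clamped_demo(void) {
    const int v[] = { -5, 10, 300 };
    return sum_clamped(v, sizeof v / sizeof v[0]);
}
```

Here sum_clamped_demo() returns 265. The result is the same whether or not the helper is inlined; only the call overhead and the optimization opportunities differ.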

Intel ARCHITECTURE IA-32 Specifications

General
Instruction Set: x86
Instruction Set Type: CISC
Memory Segmentation: Supported
Operating Modes: Real mode, Protected mode, Virtual 8086 mode
Max Physical Address Size: 36 bits (with PAE)
Max Virtual Address Size: 32 bits
Architecture: IA-32 (Intel Architecture 32-bit)
Addressable Memory: 4 GB (up to 64 GB with Physical Address Extension)
Floating Point Registers: 8 x 80-bit
MMX Registers: 8 x 64-bit
SSE Registers: 8 x 128-bit
Registers: General-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP), Segment registers (CS, DS, SS, ES, FS, GS), Instruction pointer (EIP), Flags register (EFLAGS)
Floating Point Unit: Yes (x87)