Intel ARCHITECTURE IA-32 User Manual

IA-32 Intel® Architecture Optimization
Floating-Point Stalls
Floating-point instructions have a latency of at least two cycles.
Because Pentium II and subsequent processors execute out of order,
however, stalls do not necessarily occur on a per-instruction or
per-µop basis. Still, if an instruction has a very long latency, such
as fdiv, scheduling it so that independent work overlaps its latency
can improve the throughput of the overall application.
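
As a minimal sketch of that kind of scheduling at the source level, the
hypothetical helper below (scaled_sum is not from the manual) issues
the divide first and only consumes its result after an independent
summation loop. Whether a given compiler keeps this order, and how much
the out-of-order core already hides, are assumptions to check with a
profiler.

```c
#include <stddef.h>

/* Hypothetical helper: returns (sum of v[0..n-1]) * (num / den). */
double scaled_sum(const double *v, size_t n, double num, double den) {
    /* Start the long-latency divide early. Nothing in the loop below
       depends on `scale`, so the two can overlap. */
    double scale = num / den;

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += v[i];            /* independent of the divide */

    /* First use of the divide result. */
    return sum * scale;
}
```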
x87 Floating-point Operations with Integer Operands
On the Pentium 4 processor, splitting floating-point operations (fiadd,
fisub, fimul, and fidiv) that take 16-bit integer operands into two
instructions (fild and a floating-point operation) is more efficient.
However, for floating-point operations with 32-bit integer operands,
using fiadd, fisub, fimul, and fidiv is as efficient as using separate
instructions.
Assembly/Compiler Coding Rule 36. (M impact, L generality) Try to use
32-bit operands rather than 16-bit operands for fild. However, do not do so
at the expense of introducing a store forwarding problem by writing the two
halves of the 32-bit memory operand separately.
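
The store-forwarding caveat in this rule can be illustrated with a
short C sketch. The helper name halves_to_double is hypothetical. It
combines two 16-bit halves in a register before the conversion, so that
any store feeding a 32-bit fild is a single 32-bit store rather than
two 16-bit stores to the same location. The actual instructions emitted
depend on the compiler and target flags.

```c
#include <stdint.h>

/* Hypothetical helper: converts a value that arrives as two 16-bit
   halves into a double. */
double halves_to_double(uint16_t lo, uint16_t hi) {
    /* Combine in a register with shift/or instead of writing each half
       to memory separately and then loading all 32 bits. */
    uint32_t bits = ((uint32_t)hi << 16) | (uint32_t)lo;

    /* fild treats its operand as signed; this cast assumes the usual
       two's-complement conversion for values above INT32_MAX. */
    int32_t value = (int32_t)bits;

    return (double)value;
}
```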
x87 Floating-point Comparison Instructions
On Pentium II and subsequent processors, use the fcomi and fcmov
instructions when performing floating-point comparisons. Using the
fcom, fcomp, or fcompp instructions typically requires an additional
instruction such as fstsw. This alternative causes more µops to be
decoded and should be avoided.
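
In compiled code, one way to give the compiler the opportunity to use
these instructions is a simple conditional select, as in the sketch
below. The function name max_of is hypothetical, and the lowering to
fcomi followed by fcmov is an assumption that holds only for x87 code
generation on a suitable target (for example GCC with
-march=pentiumpro -mfpmath=387). Inspect the generated assembly to
confirm.

```c
/* Hypothetical helper: returns the larger of a and b.
   If either argument is NaN, the comparison is false and b is
   returned. */
double max_of(double a, double b) {
    /* A compare feeding a select, with no flag transfer through
       fstsw needed at the source level. */
    return (a > b) ? a : b;
}
```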
Transcendental Functions
If an application needs to emulate math functions in software due to
performance or other reasons (see the "Guidelines for Optimizing
Floating-point Code" section), it may be worthwhile to inline math
library calls because the call and the prologue/epilogue involved with
such calls can significantly affect the latency of operations.
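
A minimal sketch of an inlined software approximation follows. The name
sin_small and the choice of a degree-7 Taylor polynomial are
assumptions made for illustration, not a method taken from the manual.
The approximation is only reasonable for small arguments (roughly
|x| <= pi/4), and a real replacement would need range reduction and an
accuracy analysis.

```c
/* Hypothetical inline approximation of sin(x) for small |x|.
   Declared static inline so the compiler can expand it at the call
   site and avoid the call and prologue/epilogue overhead. */
static inline double sin_small(double x) {
    double x2 = x * x;
    /* Horner form of x - x^3/3! + x^5/5! - x^7/7! */
    return x * (1.0 + x2 * (-1.0 / 6.0
                    + x2 * (1.0 / 120.0
                    + x2 * (-1.0 / 5040.0))));
}
```

Whether the compiler actually inlines the function depends on its
optimization settings.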
