Intel ARCHITECTURE IA-32 - X87 Vs. Scalar SIMD Floating-Point Trade-Offs

This in turn allows instructions to be reordered to make instructions

available to be executed in parallel. Out-of-order execution precludes

the need for using fxch to move instructions for very short distances.

x87 vs. Scalar SIMD Floating-point Trade-offs

There are a number of differences between x87 floating-point code and

scalar floating-point code (using SSE and SSE2). The following

differences drive decisions about which registers and instructions to use:

• When an input operand for a SIMD floating-point instruction

contains values that are less than the representable range of the data

type, a denormal exception occurs. This causes a significant

performance penalty. SIMD floating-point operation has a

flush-to-zero mode. In flush-to-zero mode, results that would

underflow are set to zero instead of a denormal value. Therefore

subsequent computation will not face the performance penalty of

handling denormal input operands. For

example, in the case of 3D applications with low lighting levels,

using flush-to-zero mode can improve performance by as much as

50% for applications with large numbers of underflows.

• Scalar floating-point SIMD instructions have lower latencies. This

generally does not matter much as long as resource utilization is

low.

• Only x87 supports transcendental instructions.

• x87 supports 80-bit double extended-precision floating point.

Streaming SIMD Extensions supports a maximum of 32-bit

precision, and Streaming SIMD Extensions 2 supports a maximum

of 64-bit precision.

• On the Pentium 4 processor, floating-point adds are pipelined for

x87 but not for scalar floating-point code. Floating-point multiplies

are not pipelined for either case. For applications with a large

number of floating-point adds relative to the number of

multiplies, x87 may be a better choice.
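
The flush-to-zero behavior described in the first item above can be sketched in C. This is a minimal illustration, not code from this manual. It assumes a compiler that provides the SSE intrinsics header xmmintrin.h and that the underflow exception is masked, which is the usual default. The helper names scalar_mul and mul_with_ftz are hypothetical.

```c
#include <float.h>
#include <xmmintrin.h>

/* Multiply two floats with the scalar SSE instruction (mulss),
   so the result is governed by MXCSR and not by the x87 control word. */
static float scalar_mul(float a, float b)
{
    return _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(a), _mm_set_ss(b)));
}

/* Hypothetical helper: compute a * b with flush-to-zero switched on or
   off, then restore the caller's flush-to-zero setting. */
float mul_with_ftz(float a, float b, int flush_to_zero)
{
    unsigned int saved = _MM_GET_FLUSH_ZERO_MODE();

    _MM_SET_FLUSH_ZERO_MODE(flush_to_zero ? _MM_FLUSH_ZERO_ON
                                          : _MM_FLUSH_ZERO_OFF);

    /* volatile keeps the compiler from folding the multiply at compile
       time or moving it outside the region where the mode is changed. */
    volatile float x = a;
    volatile float y = b;
    volatile float result = scalar_mul(x, y);

    _MM_SET_FLUSH_ZERO_MODE(saved);
    return result;
}
```

Under these assumptions, mul_with_ftz(FLT_MIN, 0.5f, 1) returns zero, while mul_with_ftz(FLT_MIN, 0.5f, 0) returns a nonzero denormal value. In the second case, later SIMD operations that consume that value are the ones that pay the denormal-operand penalty.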
