Intel ARCHITECTURE IA-32

General Optimization Guidelines 2

executing SSE/SSE2/SSE3 instructions and when speed is more

important than complying with the IEEE standard. The following paragraphs

give recommendations on how to optimize your code to reduce

performance degradations related to floating-point exceptions.

Dealing with floating-point exceptions in x87 FPU code

Every special situation listed in the “Floating-point Exceptions” section

is costly in terms of performance. For that reason, x87 FPU code should

be written to avoid these situations.

There are three ways to reduce the impact of

overflow/underflow situations with x87 FPU code:

• Choose floating-point data types that are large enough to

accommodate results without generating arithmetic overflow and

underflow exceptions.

• Scale the range of operands/results to minimize the

number of arithmetic overflow/underflow situations.

• Keep intermediate results on the x87 FPU register stack until the

final results have been computed and stored to memory. Overflow

or underflow is less likely to happen when intermediate results are

kept in the x87 FPU stack (this is because data on the stack is stored

in double extended-precision format and overflow/underflow

conditions are detected accordingly).
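
The scaling recommendation in the list above can be illustrated with a small C sketch. The function name scaled_mean is hypothetical and does not come from this guide. The sketch assumes that multiplying each operand by 1/n before accumulation is acceptable for the data at hand. It trades one extra multiply per element for a running sum that stays within range whenever the true mean is representable.

```c
#include <float.h>
#include <stddef.h>

/* Hypothetical helper: arithmetic mean computed with pre-scaled operands.
 * A naive "sum everything, then divide by n" can overflow in the running
 * sum even when the mean itself is representable. Scaling each operand by
 * 1/n first keeps the intermediate sum no larger than the largest input. */
double scaled_mean(const double *values, size_t n)
{
    double inv_n;
    double acc = 0.0;
    size_t i;

    if (n == 0)
        return 0.0;                 /* assumption: empty input yields 0 */

    inv_n = 1.0 / (double)n;        /* scale factor applied to each operand */
    for (i = 0; i < n; i++)
        acc += values[i] * inv_n;   /* scaled term cannot overflow the sum */

    return acc;
}
```

Note that scaling moves the risk toward the other end of the range: very small operands may underflow after multiplication by 1/n, so the scale factor should be chosen with the actual data range in mind.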

Denormalized floating-point constants (which are read only, and hence

never change) should be avoided and replaced, if possible, with zeros of

the same sign.
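
As an illustration of this replacement, the following C sketch maps a denormalized constant to a zero of the same sign and leaves every other value unchanged. The function name flush_denormal_constant is hypothetical. The sketch assumes the check runs once, when a constant table is built, rather than in a performance-critical loop.

```c
#include <math.h>

/* Hypothetical helper: returns c unchanged unless it is denormalized
 * (subnormal), in which case a zero with the same sign is returned. */
double flush_denormal_constant(double c)
{
    if (fpclassify(c) == FP_SUBNORMAL)
        return signbit(c) ? -0.0 : 0.0;   /* keep the sign of the constant */
    return c;
}
```

Whether a zero is an acceptable substitute depends on the algorithm. This sketch only shows the mechanical replacement.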

Dealing with Floating-point Exceptions in SSE and SSE2 code

Most special situations that involve masked floating-point exceptions

are handled efficiently on the Pentium 4 processor. When a masked

overflow exception occurs while executing SSE or SSE2 code, the

Pentium 4 processor handles it without a performance penalty.
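
Whether the overflow exception is masked is controlled by the MXCSR register. The following C sketch uses the macros from the xmmintrin.h intrinsics header to check the mask bit and to observe the overflow status flag after a scalar SSE multiply. The helper names sse_overflow_is_masked and sse_mul_overflows are hypothetical. The volatile temporaries are an assumption about how to keep a compiler from folding or reordering the multiply; they are not part of any recommended coding pattern.

```c
#include <xmmintrin.h>

/* Hypothetical helper: nonzero if the overflow exception is masked in MXCSR. */
int sse_overflow_is_masked(void)
{
    return (_MM_GET_EXCEPTION_MASK() & _MM_MASK_OVERFLOW) != 0;
}

/* Hypothetical helper: computes a * b with a scalar SSE multiply, stores the
 * result in *product, and returns nonzero if the overflow flag was raised. */
int sse_mul_overflows(float a, float b, float *product)
{
    volatile float va = a;      /* volatile: discourage constant folding */
    volatile float vb = b;
    volatile float result;

    _MM_SET_EXCEPTION_STATE(0); /* clear the sticky status flags */
    result = _mm_cvtss_f32(_mm_mul_ss(_mm_set_ss(va), _mm_set_ss(vb)));
    *product = result;

    return (_MM_GET_EXCEPTION_STATE() & _MM_EXCEPT_OVERFLOW) != 0;
}
```

With the exception masked, the multiply delivers the default result (an infinity in the default rounding mode) and sets the status flag. No exception handler is invoked.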
