Intel ARCHITECTURE IA-32


General Optimization Guidelines 2


FPU control word (FCW), such as when performing conversions to integers. On Pentium M, Intel Core Solo, and Intel Core Duo processors, FLDCW is improved over previous generations.

Specifically, the optimization for FLDCW allows programmers to alternate between two constant values efficiently. For the FLDCW optimization to be effective, the two constant FCW values may differ only in the following 5 bits of the FCW:

FCW[8-9] precision control

FCW[10-11] rounding control

FCW[12] infinity control

If programmers need to modify other bits (for example, the mask bits) in the FCW, the FLDCW instruction is still an expensive operation.
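The bit restriction above can be checked mechanically. The following is a minimal sketch, not part of the original guideline: the helper name `fcw_pair_differs_only_in_fast_bits` and the macro `FCW_FAST_SWITCH_BITS` are hypothetical, and the mask 0x1F00 is simply bits 8 through 12 taken from the list above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits the text lists as allowed to differ: FCW[8-9] precision control,
   FCW[10-11] rounding control, FCW[12] infinity control. */
#define FCW_FAST_SWITCH_BITS 0x1F00u

/* Hypothetical helper: returns true when two constant FCW values differ
   only in bits 8-12, which is the condition the text gives for the FLDCW
   optimization to be effective. */
static bool fcw_pair_differs_only_in_fast_bits(uint16_t a, uint16_t b)
{
    uint16_t differing = (uint16_t)(a ^ b);
    uint16_t other_bits = (uint16_t)~FCW_FAST_SWITCH_BITS;
    return (differing & other_bits) == 0;
}
```

For example, the x87 default control word 0x037F and the value 0x0F7F (the same word with rounding control set to truncate) differ only in FCW[10-11], so the pair qualifies; 0x037F and 0x037E differ in a mask bit, so the pair does not.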

In situations where an application cycles between three (or more) constant values, the FLDCW optimization does not apply, and the performance degradation occurs for each FLDCW instruction.

One solution to this problem is to choose two constant FCW values, take advantage of the optimization of the FLDCW instruction to alternate between only these two constant FCW values, and devise some means to accomplish the task that requires the third FCW value without actually changing the FCW to a third constant value. An alternative solution is to structure the code so that, for periods of time, the application alternates between only two constant FCW values. When the application later alternates between a different pair of FCW values, the performance degradation occurs only during the transition.
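As one possible illustration of the first solution, suppose the third FCW value would have been needed only for truncating conversions to integers. A minimal sketch under that assumption, and assuming SSE is available, is to use a conversion that always truncates, so the FCW is never loaded for this task. The function name `truncate_to_int32` is hypothetical.

```c
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: convert a float to a 32-bit integer with truncation
   toward zero, without loading a new x87 FCW value. */
static int32_t truncate_to_int32(float x)
{
#if defined(__SSE__)
    /* CVTTSS2SI always truncates, independent of the rounding-control
       setting, so no control-word change is needed. */
    return (int32_t)_mm_cvttss_si32(_mm_set_ss(x));
#else
    /* Fallback when SSE intrinsics are unavailable: a C float-to-integer
       cast truncates toward zero by definition, but the compiler may
       implement it with an FCW change on x87-only targets. */
    return (int32_t)x;
#endif
}
```

With this in place, the application can keep alternating between its two chosen FCW constants for everything else. Whether this fits depends on the task that needed the third value; other tasks need other workarounds.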

It is expected that SIMD applications are unlikely to alternate between FTZ and DAZ mode values. Consequently, the SIMD control word does not have the short latencies that the floating-point control register does. A read of the MXCSR register has a fairly long latency, and a write to the register is a serializing instruction.

There is no separate control word for single and double precision; both use the same modes. Notably, this applies to both FTZ and DAZ modes.
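Because MXCSR reads are slow and writes are serializing, a reasonable pattern is to set the modes once around a block of work rather than inside a loop. The sketch below assumes an x86 target with SSE and a processor that supports DAZ. The function name `sum_with_ftz_daz` and the two macros are hypothetical; `_mm_getcsr` and `_mm_setcsr` are the standard intrinsics from `<xmmintrin.h>`.

```c
#include <stddef.h>
#include <xmmintrin.h>

#define MXCSR_FTZ 0x8000u /* flush-to-zero, MXCSR bit 15 */
#define MXCSR_DAZ 0x0040u /* denormals-are-zero, MXCSR bit 6 */

/* Hypothetical helper: sums an array with FTZ and DAZ enabled.
   MXCSR is read once and written twice in total, never per element.
   Assumption: the processor supports DAZ; setting the bit on a processor
   without DAZ support causes a fault. */
static float sum_with_ftz_daz(const float *data, size_t n)
{
    unsigned int saved = _mm_getcsr();         /* single long-latency read */
    _mm_setcsr(saved | MXCSR_FTZ | MXCSR_DAZ); /* single serializing write */

    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += data[i];
    }

    _mm_setcsr(saved); /* restore the caller's modes */
    return sum;
}
```

The same MXCSR setting governs both single-precision and double-precision SSE arithmetic, so a double-precision loop would use the identical save, set, and restore sequence.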
