Intel ARCHITECTURE IA-32

IA-32 Intel® Architecture Optimization

Assembly/Compiler Coding Rule 31. (H impact, M generality) Minimize changes to bits 8-12 of the floating-point control word. Changes among more than two values (each value being a combination of the following bits: precision, rounding, and infinity control, and the rest of the bits in the FCW) lead to delays that are on the order of the pipeline depth.
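
As a rough illustration of which bits the rule covers, the following C sketch names the three control fields in bits 8-12 and checks whether a proposed control-word change touches any of them. The macro and function names are hypothetical, chosen only for this example. The bit positions follow the x87 control word layout: precision control in bits 8-9, rounding control in bits 10-11, and infinity control in bit 12.

```c
#include <stdint.h>

/* x87 FCW fields covered by the rule (names are illustrative). */
#define FCW_PRECISION_MASK 0x0300u /* bits 8-9:   precision control */
#define FCW_ROUNDING_MASK  0x0C00u /* bits 10-11: rounding control  */
#define FCW_INFINITY_MASK  0x1000u /* bit 12:     infinity control  */
#define FCW_BITS_8_TO_12   (FCW_PRECISION_MASK | FCW_ROUNDING_MASK | FCW_INFINITY_MASK)

/* Hypothetical helper: returns 1 when switching from old_cw to new_cw
 * would change any of bits 8-12, and 0 otherwise. */
int fcw_change_touches_bits_8_to_12(uint16_t old_cw, uint16_t new_cw)
{
    return ((old_cw ^ new_cw) & FCW_BITS_8_TO_12) != 0;
}
```

For example, going from 0x037F (round to nearest) to 0x0F7F (truncation) changes the rounding-control field, so the helper reports a change. Code that alternates between only these two values stays within the two-value limit the rule describes.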

Rounding Mode

Many libraries provide float-to-integer routines that convert floating-point values to integers. Many of these libraries conform to the ANSI C coding standard, which states that the rounding mode should be truncation. With the Pentium 4 processor, one can use the cvttsd2si and cvttss2si instructions to convert operands with truncation without ever needing to change rounding modes. The cost savings of using these instructions over the methods below is enough to justify using Streaming SIMD Extensions and Streaming SIMD Extensions 2 wherever possible when truncation is involved.
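
A minimal C sketch of the truncating conversion follows. It assumes a compiler that defines `__SSE2__` and provides `<emmintrin.h>`. The intrinsics `_mm_cvttsd_si32` and `_mm_cvttss_si32` correspond to cvttsd2si and cvttss2si. The function names are hypothetical, and the range checks assume a 32-bit `int`. On other targets the sketch falls back to a plain C cast, which also truncates toward zero.

```c
#include <assert.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics; also pulls in the SSE ones */
#endif

/* Hypothetical helper: double -> int with truncation toward zero.
 * No rounding-mode change is needed for the conversion. */
int truncate_double_to_int(double x)
{
    /* Out-of-range inputs are not meaningful for a 32-bit result. */
    assert(x > -2147483649.0 && x < 2147483648.0);
#if defined(__SSE2__)
    return _mm_cvttsd_si32(_mm_set_sd(x)); /* cvttsd2si */
#else
    return (int)x;                         /* C casts truncate toward zero */
#endif
}

/* Hypothetical helper: float -> int with truncation toward zero. */
int truncate_float_to_int(float f)
{
    assert(f >= -2147483648.0f && f < 2147483648.0f);
#if defined(__SSE2__)
    return _mm_cvttss_si32(_mm_set_ss(f)); /* cvttss2si */
#else
    return (int)f;
#endif
}
```

In practice, compilers targeting SSE2 commonly emit these instructions for an ordinary `(int)x` cast, so the explicit intrinsics mainly serve to make the instruction choice visible.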

For x87 floating point, the fist instruction uses the rounding mode represented in the floating-point control word (FCW). The rounding mode is generally round to nearest, so many compiler writers implement a change of the rounding mode in the processor in order to conform to the C and FORTRAN standards. This implementation requires changing the control word on the processor using the fldcw instruction. To change the rounding, precision, and infinity bits, use the fstcw instruction to store the floating-point control word. Then use the fldcw instruction to change the rounding mode to truncation.

In a typical code sequence that changes the rounding mode in the FCW, the fstcw instruction is usually followed by a load operation. The load from memory should use a 16-bit operand to prevent a store-forwarding problem. If the load of the previously stored FCW word uses either an 8-bit or a 32-bit operand, it causes a store-forwarding problem because the size of the data differs between the store operation and the load operation.

Make sure that the write and the read of the FCW are both 16-bit operations to avoid store-forwarding problems.
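
The store, modify, and load sequence described above can be sketched in C as follows. This is an illustration under stated assumptions, not a definitive implementation: it uses GCC-style inline assembly on x86 targets and falls back to a simulated control word elsewhere. All function and macro names are hypothetical. The control word is kept in a `uint16_t` throughout so that the store and the later reload use the same 16-bit width, though the compiler ultimately decides which load instruction it emits. The sketch uses fnstcw, the no-wait form of fstcw.

```c
#include <stdint.h>

#define X87_RC_MASK     0x0C00u /* rounding-control field, FCW bits 10-11 */
#define X87_RC_TRUNCATE 0x0C00u /* RC = 11b: round toward zero */

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Store the FCW to a 16-bit memory operand. */
uint16_t x87_store_cw(void)
{
    uint16_t cw;
    __asm__ volatile ("fnstcw %0" : "=m" (cw));
    return cw;
}

/* Load the FCW from a 16-bit memory operand. */
void x87_load_cw(uint16_t cw)
{
    __asm__ volatile ("fldcw %0" : : "m" (cw));
}
#else
/* Assumption for non-x86 builds: simulate the FCW with a variable
 * holding the x87 default value so the sketch still compiles. */
static uint16_t simulated_cw = 0x037Fu;
uint16_t x87_store_cw(void) { return simulated_cw; }
void x87_load_cw(uint16_t cw) { simulated_cw = cw; }
#endif

/* Switch rounding control to truncation and return the previous
 * control word so the caller can restore it afterwards. */
uint16_t x87_enter_truncation(void)
{
    uint16_t saved = x87_store_cw();
    uint16_t trunc = (uint16_t)((saved & ~X87_RC_MASK) | X87_RC_TRUNCATE);
    x87_load_cw(trunc);
    return saved;
}
```

A caller would invoke `x87_enter_truncation()`, perform the fist-based conversions, and then pass the returned value to `x87_load_cw()` to restore the original mode. Only two control-word values are in play, the saved one and the truncating one.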
