Intel ARCHITECTURE IA-32 - Memory Operands

General Optimization Guidelines 2

Recommendation: Use the compiler switch to generate SSE2 scalar
floating-point code rather than x87 code.

When working with scalar SSE/SSE2 code, pay attention to the need to
clear the unused slots of an xmm register and to the associated
performance impact. For example, loading data from memory with movss
or movsd incurs an extra micro-op to zero the upper part of the xmm
register.

On Pentium M, Intel Core Solo, and Intel Core Duo processors, this
penalty can be avoided by using movlpd. However, using movlpd causes
a performance penalty on Pentium 4 processors.
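The difference between the two load styles can be sketched in C with SSE2 intrinsics. This is an illustration only: the function names load_zero_upper and load_keep_upper are hypothetical, and it assumes the compiler maps _mm_load_sd to movsd and _mm_loadl_pd to movlpd, which an optimizing compiler is free not to do. A plain C fallback is included so the sketch also builds where SSE2 is not available.

```c
#include <assert.h>

#if defined(__SSE2__)
#include <emmintrin.h>

/* movsd-style load: the low element receives *p, the upper element is zeroed. */
static void load_zero_upper(const double *p, double out[2]) {
    assert(p != 0);
    __m128d v = _mm_load_sd(p);
    _mm_storeu_pd(out, v);
}

/* movlpd-style load: the low element receives *p, the upper element
   keeps whatever the register held before (modeled here by prev[1]). */
static void load_keep_upper(const double *p, const double prev[2], double out[2]) {
    assert(p != 0);
    __m128d v = _mm_loadu_pd(prev);   /* previous register contents */
    v = _mm_loadl_pd(v, p);           /* replace only the low element */
    _mm_storeu_pd(out, v);
}
#else
/* Fallback with the same observable results (assumption: no SSE2 available). */
static void load_zero_upper(const double *p, double out[2]) {
    assert(p != 0);
    out[0] = *p;
    out[1] = 0.0;
}

static void load_keep_upper(const double *p, const double prev[2], double out[2]) {
    assert(p != 0);
    out[0] = *p;
    out[1] = prev[1];
}
#endif
```

When only the low element is consumed by later scalar operations, both forms give the same arithmetic result; the choice between them is a per-processor performance question, as described above.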

Another situation occurs when mixing single-precision and
double-precision code. On Pentium 4 processors, using cvtss2sd incurs
a performance penalty relative to the alternative sequence:

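    xorps    xmm1, xmm1    ; clear all of xmm1
    movss    xmm1, xmm2    ; copy the single-precision value into the low element
    cvtps2pd xmm1, xmm1    ; convert the two low single-precision elements to double precision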

On Intel Core Solo and Intel Core Duo processors, using cvtss2sd is
preferable to the alternative sequence.
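Both conversion paths can be written with SSE2 intrinsics, as in the following sketch. The function names widen_cvtss2sd and widen_cvtps2pd are hypothetical, and the mapping from each intrinsic to the instruction named in its comment is the usual one but is not guaranteed for every compiler or optimization level. A plain C fallback is included for targets without SSE2.

```c
#include <assert.h>

#if defined(__SSE2__)
#include <emmintrin.h>

/* Direct scalar conversion, normally emitted as cvtss2sd. */
static double widen_cvtss2sd(float f) {
    __m128  s = _mm_set_ss(f);                      /* f in the low element */
    __m128d d = _mm_cvtss_sd(_mm_setzero_pd(), s);  /* cvtss2sd */
    return _mm_cvtsd_f64(d);
}

/* The three-instruction alternative: xorps, movss, cvtps2pd. */
static double widen_cvtps2pd(float f) {
    __m128  src = _mm_set_ss(f);        /* plays the role of xmm2 */
    __m128  tmp = _mm_setzero_ps();     /* xorps xmm1, xmm1 */
    tmp = _mm_move_ss(tmp, src);        /* movss xmm1, xmm2 */
    __m128d d = _mm_cvtps_pd(tmp);      /* cvtps2pd xmm1, xmm1 */
    return _mm_cvtsd_f64(d);            /* low double-precision element */
}
#else
/* Fallback with the same results (assumption: no SSE2 available). */
static double widen_cvtss2sd(float f) { return (double)f; }
static double widen_cvtps2pd(float f) { return (double)f; }
#endif

/* Checks that the two paths agree for a given input. */
static void check_same_result(float f) {
    assert(widen_cvtss2sd(f) == widen_cvtps2pd(f));
}
```

Since widening a single-precision value to double precision is exact, the two functions return identical values; only the instruction sequence, and therefore the cost on a given processor, differs.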

Memory Operands

Double-precision floating-point operands that are eight-byte aligned
perform better than operands that are not, since they are less likely
to incur penalties for cache and MOB (memory order buffer) splits.
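A small C sketch of why eight-byte alignment helps: an eight-byte operand that starts on an eight-byte boundary cannot straddle a cache line whose size is a multiple of eight, while a misaligned one can. The helper names below are hypothetical, and the 64-byte line size used in the usage note is an assumption for illustration, not a value taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the address is a multiple of 8, else 0. */
static int is_8byte_aligned(uintptr_t addr) {
    return (addr % 8u) == 0;
}

/* Returns 1 if an 8-byte operand starting at addr spans two cache
   lines of line_size bytes, else 0. line_size must be nonzero. */
static int splits_cache_line(uintptr_t addr, size_t line_size) {
    assert(line_size != 0);
    uintptr_t first_line = addr / line_size;
    uintptr_t last_line  = (addr + sizeof(double) - 1) / line_size;
    return first_line != last_line;
}
```

For a pointer p, the checks would be called as is_8byte_aligned((uintptr_t)p) and splits_cache_line((uintptr_t)p, 64). With a 64-byte line, an operand at offset 56 stays within one line, whereas an operand at offset 60 spills into the next.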

A floating-point operation on a memory operand requires that the
operand be loaded from memory. This incurs an additional µop, which
can have a minor negative impact on front-end bandwidth. Additionally,
memory operands may cause a data cache miss, which incurs a penalty.
