Intel Architecture IA-32 - User/Source Coding Rules

General Optimization Guidelines 2

User/Source Coding Rules

User/Source Coding Rule 1. (M impact, L generality) If an indirect branch has two or more common taken targets, and at least one of those targets is correlated with the branch history leading up to the branch, then convert the indirect branch into a tree in which one or more indirect branches are preceded by conditional branches to those targets. Apply this “peeling” procedure to the common target of an indirect branch that correlates to branch history. (See page 2-24.)
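
The peeling described in Rule 1 can be sketched in C as follows. This is a minimal illustration, not code from the manual: the handler functions and the two dispatch routines are hypothetical names, and an optimizing compiler may already transform the code in a similar way.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn)(int);

/* Hypothetical handlers; names and bodies are illustrative only. */
int handle_common(int x) { return x + 1; }
int handle_rare_a(int x) { return x * 2; }
int handle_rare_b(int x) { return -x; }

/* Before: every call goes through a single indirect branch. */
int dispatch_indirect(handler_fn fn, int x)
{
    assert(fn != NULL);
    return fn(x);
}

/* After: the common target is peeled off behind a conditional branch
   and reached by a direct call; the remaining targets still use the
   indirect branch. */
int dispatch_peeled(handler_fn fn, int x)
{
    assert(fn != NULL);
    if (fn == handle_common)
        return handle_common(x);   /* conditional branch + direct call */
    return fn(x);                  /* indirect branch for the rest */
}
```

Both routines return the same result for any handler; only the branch structure differs.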

User/Source Coding Rule 2. (H impact, M generality) Pad data structures defined in the source code so that every data element is aligned to a natural operand size address boundary. If the operands are packed in a SIMD instruction, align to the packed element size (64- or 128-bit). (See page 2-39.)
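
As a sketch of Rule 2, the C11 declarations below place each member on its natural boundary and align a packed SIMD operand to 128 bits. The structure names and member layout are assumptions made for illustration; `alignas` comes from the C11 `<stdalign.h>` header.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: members are ordered largest first so that each
   one falls on a multiple of its own size, and the tail is padded
   explicitly instead of being left to the compiler. */
struct sample {
    alignas(8) double d;   /* 8-byte element on an 8-byte boundary */
    int32_t i;             /* 4-byte element at offset 8 */
    int16_t s;             /* 2-byte element at offset 12 */
    char    c;             /* 1-byte element at offset 14 */
    char    pad;           /* explicit padding to a 16-byte total */
};

/* Four packed floats intended for a 128-bit SIMD operand. */
struct vec4 {
    alignas(16) float v[4];
};

/* Compile-time checks of the layout assumed above. */
static_assert(sizeof(struct sample) == 16, "unexpected padding in sample");
static_assert(sizeof(struct vec4) == 16, "vec4 must be exactly 128 bits");
```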

User/Source Coding Rule 3. (M impact, L generality) Beware of false sharing within a cache line (64 bytes) on Pentium 4, Intel Xeon, and Pentium M processors, and within a sector of 128 bytes on Pentium 4 and Intel Xeon processors. (See page 2-42.)
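
One common way to avoid the false sharing named in Rule 3 is to give each thread's data its own sector. The sketch below assumes a hypothetical set of per-worker counters; the 64-byte and 128-byte sizes are taken from the rule, while the names and the worker count are illustrative.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_BYTES 64    /* cache line size named in the rule */
#define SECTOR_BYTES     128   /* sector size on Pentium 4 and Intel Xeon */
#define NUM_WORKERS      4     /* hypothetical number of worker threads */

/* Aligning the member to the sector size also rounds the structure
   size up to 128 bytes, so adjacent array elements never share a
   cache line or a sector. */
struct padded_counter {
    alignas(SECTOR_BYTES) unsigned long value;
};

struct padded_counter counters[NUM_WORKERS];

/* Each worker updates only its own slot. */
void bump(size_t worker)
{
    assert(worker < NUM_WORKERS);
    counters[worker].value++;
}
```

Padding to 128 bytes rather than 64 costs memory but covers both cases the rule mentions.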

User/Source Coding Rule 4. (H impact, ML generality) Consider using a special memory allocation library to avoid aliasing. (See page 2-46.)
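
Rule 4 does not name a particular library. As a rough illustration of what such an allocator might do, the hypothetical wrapper below staggers successive allocations so that their addresses differ in the low-order bits. The function names, the 128-byte step, and the eight-slot window are all assumptions; the sketch is not thread-safe and does not guard against size overflow.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define STAGGER_STEP   128u   /* assumed offset granularity in bytes */
#define STAGGER_SLOTS  8u     /* assumed number of distinct offsets */
#define STAGGER_WINDOW (STAGGER_STEP * STAGGER_SLOTS)
#define HEADER_BYTES   sizeof(void *)

static_assert(STAGGER_WINDOW % STAGGER_STEP == 0, "window must hold whole steps");

static unsigned next_slot;    /* rotates through the offsets; not thread-safe */

/* Returns a block whose address modulo STAGGER_WINDOW rotates through
   0, 128, 256, ... on successive calls. */
void *staggered_alloc(size_t size)
{
    unsigned char *base = malloc(HEADER_BYTES + STAGGER_WINDOW + size);
    if (base == NULL)
        return NULL;

    uintptr_t start  = (uintptr_t)base + HEADER_BYTES;
    uintptr_t target = (uintptr_t)(next_slot++ % STAGGER_SLOTS) * STAGGER_STEP;
    uintptr_t delta  = (target + STAGGER_WINDOW - start % STAGGER_WINDOW)
                       % STAGGER_WINDOW;

    unsigned char *user = base + HEADER_BYTES + delta;
    /* Save the pointer malloc returned just below the user block. */
    memcpy(user - HEADER_BYTES, &base, sizeof base);
    return user;
}

void staggered_free(void *p)
{
    unsigned char *base;
    if (p == NULL)
        return;
    memcpy(&base, (unsigned char *)p - HEADER_BYTES, sizeof base);
    free(base);
}
```

Blocks obtained from `staggered_alloc` must be released with `staggered_free`, never with `free` directly.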

User/Source Coding Rule 5. (M impact, M generality) When padding variable declarations to avoid aliasing, the greatest benefit comes from avoiding aliasing on second-level cache lines, suggesting an offset of 128 bytes or more. (See page 2-46.)
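
A minimal sketch of the padding in Rule 5, assuming two hypothetical buffers whose size is a power of two: without the pad, `src[i]` and `dst[i]` would sit exactly 4096 bytes apart; the 128-byte pad shifts the second buffer by the offset the rule suggests. The buffer size and all names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_BYTES 4096   /* hypothetical power-of-two buffer size */
#define ALIAS_PAD 128    /* offset suggested by the rule */

struct work_buffers {
    char src[BUF_BYTES];
    char pad[ALIAS_PAD];   /* never accessed; only shifts dst */
    char dst[BUF_BYTES];
};

/* The distance between corresponding elements is no longer a multiple
   of the buffer size. */
static_assert(offsetof(struct work_buffers, dst)
              - offsetof(struct work_buffers, src) == BUF_BYTES + ALIAS_PAD,
              "dst is not offset by the expected pad");
```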

User/Source Coding Rule 6. (H impact, H generality) Optimization techniques such as blocking, loop interchange, loop skewing, and packing are best done by the compiler. Optimize data structures to fit either in one-half of the first-level cache or in the second-level cache; turn on loop optimizations in the compiler to enhance locality for nested loops. (See page 2-52.)
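
To show what one of the transformations named in Rule 6 does, the sketch below writes loop interchange out by hand. The rule recommends leaving this to the compiler, so the second function only illustrates the result an optimizer would aim for; the array dimensions and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 64   /* hypothetical dimensions */
#define COLS 64

/* Inner loop walks down a column of a row-major array, so successive
   accesses are COLS elements apart. */
long sum_strided(int a[ROWS][COLS])
{
    long sum = 0;
    assert(a != NULL);
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            sum += a[i][j];
    return sum;
}

/* After loop interchange: the inner loop walks along a row, so
   successive accesses are adjacent in memory. */
long sum_interchanged(int a[ROWS][COLS])
{
    long sum = 0;
    assert(a != NULL);
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            sum += a[i][j];
    return sum;
}
```

Both functions compute the same sum; they differ only in the order in which memory is touched.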

User/Source Coding Rule 7. (M impact, ML generality) If there is a blend of reads and writes on the bus, changing the code to separate these bus transactions into read phases and write phases can help performance. Note, however, that the order of read and write operations on the bus is not the same as the order in which they appear in the program. (See page 2-52.)
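
The separation described in Rule 7 might look like the following at the source level. This is a sketch under assumptions: the chunk size and function names are hypothetical, the small temporary buffer is expected to stay in cache, and, as the rule itself cautions, the order of transactions actually seen on the bus may differ from program order.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK 64   /* hypothetical chunk size in elements */

/* Reads from src and writes to dst alternate on every iteration. */
void scale_interleaved(float *dst, const float *src, size_t n, float k)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Each chunk is processed as a read phase followed by a write phase. */
void scale_phased(float *dst, const float *src, size_t n, float k)
{
    float tmp[CHUNK];   /* small staging buffer */
    assert(dst != NULL && src != NULL);
    for (size_t base = 0; base < n; base += CHUNK) {
        size_t len = (n - base < CHUNK) ? n - base : CHUNK;
        for (size_t i = 0; i < len; i++)   /* read phase: load from src */
            tmp[i] = src[base + i] * k;
        for (size_t i = 0; i < len; i++)   /* write phase: store to dst */
            dst[base + i] = tmp[i];
    }
}
```

Whether the phased version is faster depends on the processor and the memory system, so it should be measured rather than assumed.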
