IA-32 Intel Architecture: Optimize Branch Predictability; Optimize Memory Access

General Optimization Guidelines 2

Optimize Branch Predictability

• Improve branch predictability and optimize instruction prefetching by arranging code to be consistent with the static branch prediction assumption: backward branches are taken and forward branches are not taken.

• Avoid mixing near calls, far calls and returns.

• Avoid implementing a call by pushing the return address and jumping to the target. The hardware can pair up call and return instructions to enhance predictability.

• Use the pause instruction in spin-wait loops.

• Inline functions according to coding recommendations.

• Whenever possible, eliminate branches.

• Avoid indirect calls.
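
Here is a minimal C sketch of two of these guidelines. It assumes a GCC- or Clang-style compiler with C11 <stdatomic.h>. The hypothetical functions count_greater_branchy and count_greater_branchless contrast a data-dependent branch with branch elimination. spin_lock and spin_unlock, also hypothetical, form a test-and-test-and-set spin lock whose wait loop executes pause through the real _mm_pause intrinsic. The cpu_relax macro is an assumption of this sketch and falls back to a no-op on non-x86 targets.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <immintrin.h>
/* PAUSE tells the processor this is a spin-wait loop. */
#define cpu_relax() _mm_pause()
#else
/* Hypothetical fallback for non-x86 builds: no PAUSE, do nothing. */
#define cpu_relax() ((void)0)
#endif

/* Branchy count: one data-dependent conditional branch per element,
   which mispredicts often when the data follow no pattern. */
static size_t count_greater_branchy(const int32_t *v, size_t n, int32_t t) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > t)
            count++;
    }
    return count;
}

/* Branch-free count: the comparison yields 0 or 1 and is added
   directly, so only the well-predicted loop branch remains. */
static size_t count_greater_branchless(const int32_t *v, size_t n, int32_t t) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (size_t)(v[i] > t);
    return count;
}

/* Test-and-test-and-set spin lock: 0 = free, 1 = held. */
static void spin_lock(atomic_int *lock) {
    for (;;) {
        /* Try to acquire: the exchange returns the previous value. */
        if (atomic_exchange_explicit(lock, 1, memory_order_acquire) == 0)
            return;
        /* Lock is held: wait with plain loads, pausing on each pass. */
        while (atomic_load_explicit(lock, memory_order_relaxed) != 0)
            cpu_relax();
    }
}

static void spin_unlock(atomic_int *lock) {
    /* Releasing a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(lock, memory_order_relaxed) == 1);
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

An optimizing compiler may already emit branch-free code (for example setcc or cmov) for the branchy loop, so check the generated assembly before rewriting by hand. In the lock, the waiting core spins on a read-only load and retries the exchange only when the lock looks free. That avoids repeatedly writing the lock's cache line while another core holds it.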

Optimize Memory Access

• Observe store-forwarding constraints.

• Ensure proper data alignment to prevent data from being split across a cache line boundary. This includes the stack and passed parameters.

• Avoid mixing code and data (self-modifying code).

• Choose data types carefully (see the next bullet) and avoid type casting.

• Employ data structure layout optimization to ensure efficient use of the 64-byte cache line size.

• Favor independent data accesses that can proceed in parallel and mask latency over dependent data accesses that expose latency.

• For cache-miss data traffic, favor smaller cache-miss strides to avoid frequent DTLB misses.

• Use prefetching appropriately.

• Use the following techniques to enhance locality: blocking, hardware-friendly tiling, loop interchange, and loop skewing.
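
Two of the locality techniques, loop interchange and blocking, can be sketched in C. The sketch assumes a row-major n x n matrix of double stored in a flat array. The hypothetical functions sum_column_order and sum_row_order show loop interchange. transpose_blocked shows blocking (tiling) with a hypothetical TILE of 8 doubles, so that one tile row fills one 64-byte cache line. The tile size is an assumption and would normally be tuned for the target's caches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge: 8 doubles = 64 bytes = one cache line. */
#define TILE 8

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Column-order traversal of a row-major matrix: successive inner
   iterations are n * sizeof(double) bytes apart, so each access can
   touch a new cache line and, once the stride reaches the page size,
   a new page (and DTLB entry). */
static double sum_column_order(const double *a, size_t n) {
    double s = 0.0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            s += a[i * n + j];
    return s;
}

/* Same sum after loop interchange: the inner loop walks each row
   contiguously, so one 64-byte line supplies 8 consecutive doubles. */
static double sum_row_order(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            s += a[i * n + j];
    return s;
}

/* Blocked transpose: src is read along rows and dst written along
   columns, but only inside one TILE x TILE tile at a time, so the
   dst lines a tile touches stay cached while they are filled. */
static void transpose_blocked(double *dst, const double *src, size_t n) {
    assert(dst != src); /* out-of-place only */
    for (size_t ii = 0; ii < n; ii += TILE) {
        for (size_t jj = 0; jj < n; jj += TILE) {
            size_t i_end = min_size(ii + TILE, n);
            size_t j_end = min_size(jj + TILE, n);
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

The min_size bounds let the tiled loops handle an n that is not a multiple of TILE. The two sums add the same values in a different order, so for general floating-point data their results may differ in the last bits.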
