Intel ARCHITECTURE IA-32 - Optimize Branch Predictability; Optimize Memory Access

568 pages
Chapter 2: General Optimization Guidelines (page 2-5)
Optimize Branch Predictability
• Improve branch predictability and optimize instruction prefetching by arranging code to be consistent with the static branch prediction assumption: backward taken and forward not taken.
• Avoid mixing near calls, far calls and returns.
• Avoid implementing a call by pushing the return address and jumping to the target. The hardware can pair up call and return instructions to enhance predictability.
• Use the pause instruction in spin-wait loops.
• Inline functions according to coding recommendations.
• Whenever possible, eliminate branches.
• Avoid indirect calls.
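
As an illustration of the "eliminate branches" guideline, the following is a minimal sketch in C, not code from the manual. The function names `sum_above_branchy` and `sum_above_branchless` are hypothetical. The second version replaces a data-dependent conditional jump with a mask, assuming two's complement integers (true of IA-32). The only remaining branch is the loop's backward jump, which matches the static "backward taken" assumption. Whether the branch-free form is faster depends on the compiler and on how predictable the data is, so it should be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branchy version: the `if` typically compiles to a conditional jump whose
   outcome depends on the data, so it is hard to predict for random input. */
int64_t sum_above_branchy(const int32_t *v, size_t n, int32_t threshold)
{
    assert(v != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > threshold)
            sum += v[i];
    }
    return sum;
}

/* Branch-free version: the comparison yields 0 or 1, which is negated into
   a mask of all zeros or all ones (assumes two's complement). ANDing with
   the mask keeps the element or replaces it with 0, with no jump on data. */
int64_t sum_above_branchless(const int32_t *v, size_t n, int32_t threshold)
{
    assert(v != NULL || n == 0);
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t mask = -(int32_t)(v[i] > threshold); /* 0 or -1 */
        sum += v[i] & mask;
    }
    return sum;
}
```

Both functions return the same result for any input; an optimizing compiler may already emit a conditional move or SIMD compare for the first form, in which case the manual rewrite gains nothing.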
Optimize Memory Access
• Observe store-forwarding constraints.
• Ensure proper data alignment to prevent data from being split across a cache line boundary. This includes the stack and passed parameters.
• Avoid mixing code and data (self-modifying code).
• Choose data types carefully (see next bullet below) and avoid type casting.
• Employ data structure layout optimization to make efficient use of the 64-byte cache line.
• Favor parallel data accesses, which mask latency, over dependent data accesses, which expose it.
• For cache-miss data traffic, favor smaller cache-miss strides to avoid frequent DTLB misses.
• Use prefetching appropriately.
• Use the following techniques to enhance locality: blocking, hardware-friendly tiling, loop interchange, loop skewing.
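
Blocking, one of the locality techniques listed above, can be sketched with a matrix transpose. This is an illustrative example, not code from the manual. The names `transpose_naive`, `transpose_blocked` and the tile size `BLOCK` are assumptions. `BLOCK` is set to 16 so that one row of a tile of `float` values spans 64 bytes, the cache line size cited above; the best value for a given processor would have to be found by measurement.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 16 /* hypothetical tile edge: 16 floats = 64 bytes */

/* Naive transpose of an n x n row-major matrix. Writes to dst use a stride
   of n elements, so consecutive writes touch different cache lines. */
void transpose_naive(const float *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Blocked transpose: process one BLOCK x BLOCK tile at a time so the lines
   touched in src and dst are reused while they are still in the cache.
   Tiles at the right and bottom edges are clipped to the matrix size. */
void transpose_blocked(const float *src, float *dst, size_t n)
{
    assert(src != dst); /* this sketch does not support in-place use */
    for (size_t ii = 0; ii < n; ii += BLOCK) {
        for (size_t jj = 0; jj < n; jj += BLOCK) {
            size_t i_end = (ii + BLOCK < n) ? ii + BLOCK : n;
            size_t j_end = (jj + BLOCK < n) ? jj + BLOCK : n;
            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

Both functions produce the same output; only the order of memory accesses differs. For small matrices that fit in the cache the two should perform alike, and the blocked form is expected to help only once the working set exceeds the cache.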
