108 POWER7 and POWER7+ Optimization and Tuning Guide
6.1 Compiler versions and optimization levels
The IBM XL compilers are updated periodically to improve application performance and add
processor-specific tuning and capabilities. The XLC11/XLF13 compilers for AIX and Linux are
the first versions to include the capabilities of POWER7, and are the preferred versions for
projects that target current generation systems. The newer XLC12/XLF14 compilers provide
performance improvements, and are preferred for template-heavy C++ codes.
The enterprise Linux distributions (RHEL6.1 with GCC-4.4 and SLES11/SP1 with GCC-4.3) include
GCC compilers with POWER7 enabled (using the -mcpu and -mtune options), but do not have
the latest Higher Order Optimizations. For the GNU GCC, G++, and gfortran compilers on
Linux, the IBM Advance Toolchain 4.0 (GCC-4.5) and 5.0 (GCC-4.6) versions contain
releases that are preferred for POWER7. XLF is preferred over gfortran for its high
floating-point performance.
For all production codes, it is imperative to enable a minimum level of compiler optimization by
adding the -O option for the XL compilers, or -O2 with the GNU compilers (-O3 is the preferred
option). Without optimization, the compiler focuses on fast compilation and debuggability,
and it generates code that performs poorly at run time. In practice, many projects set
up a dual build environment, with a development build without optimization for use during
development and debugging, and a production build with optimization to be used for
performance verification and production delivery.
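
As a minimal sketch of how a dual build can show up in source code, the following C fragment reports which flavor it was compiled as and keeps a precondition check that is active only in the development build. It relies on two assumptions that are not part of this guide: that the compiler predefines __OPTIMIZE__ when optimization is enabled (GCC documents this; other compilers should be checked), and that the production build adds -DNDEBUG, a common convention that compiles out assert checks. The file name, function names, and command lines are illustrative only.

```c
/* build_flavor.c (hypothetical file name)
 *
 * Illustrative builds, assuming the GNU compiler driver:
 *   development: gcc -O0 -g       -c build_flavor.c
 *   production:  gcc -O3 -DNDEBUG -c build_flavor.c
 */
#include <assert.h>
#include <stddef.h>

/* Reports the build flavor. Assumption: __OPTIMIZE__ is predefined by the
 * compiler whenever optimization is enabled (documented for GCC). */
const char *build_flavor(void)
{
#ifdef __OPTIMIZE__
    return "production";
#else
    return "development";
#endif
}

/* Sums count values. The assert documents a precondition; it is checked
 * in the development build and removed when NDEBUG is defined. */
double checked_sum(const double *values, size_t count)
{
    assert(values != NULL || count == 0);

    double total = 0.0;
    for (size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```

Keeping checks of this kind behind assert lets the development build catch misuse early without adding any cost to the optimized production build.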
For projects with an increased focus on runtime performance, take advantage of the
more advanced compiler optimizations. For numerical or compute-intensive codes, the XL
compiler options -O3 or -qhot -O3 enable loop transformations, which improve program
performance by restructuring loops so that they execute more efficiently on the target
system. These options perform aggressive transformations that can sometimes cause minor
differences in the precision of floating-point computations. If that is a concern, the original
program semantics can be fully recovered with the -qstrict option.
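
Precision differences of this kind arise because floating-point addition is not associative, so a transformation that reorders a sum can change how it rounds. The sketch below writes two orderings of the same sum out by hand. It assumes IEEE 754 double arithmetic with the default rounding mode, and the function names are illustrative. Under strict semantics the compiler must evaluate each expression as written; relaxed modes permit it to treat the two as interchangeable.

```c
#include <assert.h>

/* Evaluates (a + b) + c, the order in which the source is written. */
double sum_as_written(double a, double b, double c)
{
    return (a + b) + c;
}

/* Evaluates a + (b + c), an order that a reassociating optimizer
 * is allowed to choose when strict semantics are relaxed. */
double sum_reassociated(double a, double b, double c)
{
    return a + (b + c);
}

/* Self-check, assuming IEEE 754 doubles: near 1e16 adjacent doubles are
 * 2 apart, so adding 1.0 to -1e16 rounds back to -1e16 and the 1.0 is lost. */
void check_reassociation(void)
{
    assert(sum_as_written(1e16, -1e16, 1.0) == 1.0);
    assert(sum_reassociated(1e16, -1e16, 1.0) == 0.0);
}
```

The inputs here are chosen to make the effect extreme. In typical numerical codes the difference is confined to the last few bits of the result.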
For GCC, the suggested level of optimization is -O3. By default, GCC uses strict
floating-point semantics; the -ffast-math option relaxes them. The -Ofast option combines -O3 with
-ffast-math in a single option. Other important options include -fpeel-loops,
-funroll-loops, -ftree-vectorize, -fvect-cost-model, and -mcmodel=medium.
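
As an illustration of the kind of loop that the unrolling and vectorization options target, the following sketch shows a simple kernel with independent iterations. The restrict qualifiers state that x and y do not overlap, which removes one common obstacle to vectorization. Whether a given GCC release actually unrolls or vectorizes the loop depends on the compiler version and target, so the build line in the comment is an assumption to verify, not a guarantee. The function and file names are illustrative.

```c
/* daxpy.c (hypothetical file name)
 *
 * Illustrative build, assuming a POWER7 target:
 *   gcc -O3 -mcpu=power7 -mtune=power7 -funroll-loops -ftree-vectorize -c daxpy.c
 */
#include <assert.h>
#include <stddef.h>

/* Computes y[i] = a * x[i] + y[i] for i in [0, n).
 * restrict promises the compiler that x and y never alias. */
void daxpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Comparing the generated assembly with and without these options is a practical way to confirm what the compiler did with the loop.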
By default, these compilers generate code that runs on a wide range of Power Systems. Add
options to exclude older processor chips that the target application does not need to
support. Doing so can enable better code generation, because the compiler can take
advantage of capabilities that are not available on those older systems.
There are two major XL compiler options to control this support:
-qarch: Indicates the oldest processor chip generation that the binary file supports.
-qtune: Indicates the processor chip generation of most interest for performance.
For example, for an application that must run on POWER6 systems, but for which most users
are on a POWER7 system, the appropriate combination is -qarch=pwr6 -qtune=pwr7. For an
application that must run well across both the POWER6 and POWER7 systems in current
common usage, consider using -qtune=balanced.
On GCC, the equivalent options are -mcpu and -mtune. So, for an application that must run on
POWER6, but which is usually run on POWER7, the options are -mcpu=power6
and -mtune=power7.
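
The effect of the architecture option can be observed from within the program. The sketch below assumes that the compiler predefines the _ARCH_PWR5, _ARCH_PWR6, and _ARCH_PWR7 macros according to the -mcpu or -qarch setting. This is believed to hold for GCC and the XL compilers on Power, but it should be confirmed for the compiler in use, for example by listing the predefined macros with gcc -dM -E. The tuning option is not expected to change these macros, because it affects instruction scheduling and not the instruction set. The file and function names are illustrative.

```c
/* target_arch.c (hypothetical file name)
 *
 * Illustrative builds, using the options from the text:
 *   gcc -O3 -mcpu=power6  -mtune=power7 -c target_arch.c
 *   xlc -O3 -qarch=pwr6   -qtune=pwr7   -c target_arch.c
 */
#include <assert.h>

/* Returns the oldest POWER generation this object code is built to
 * support, or 0 if none of the assumed macros is defined (for example,
 * on a non-Power target). */
int min_power_generation(void)
{
#if defined(_ARCH_PWR7)
    return 7;
#elif defined(_ARCH_PWR6)
    return 6;
#elif defined(_ARCH_PWR5)
    return 5;
#else
    return 0;
#endif
}

/* Returns nonzero if a host of the given generation is at least as new
 * as the build target. host_generation is supplied by the caller; how it
 * is detected at run time is outside the scope of this sketch. */
int host_is_supported(int host_generation)
{
    assert(host_generation > 0);
    return host_generation >= min_power_generation();
}
```

Built with -mcpu=power6 -mtune=power7, min_power_generation is expected to return 6 under these assumptions, which matches the intent that the binary still runs on POWER6.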