Chapter 6. Compilers and optimization tools for C, C++, and Fortran 109
The POWER7 processor supports the Vector Scalar Extension (VSX) instruction set, which improves
performance for numerical applications that operate on regular data sets. These features can be
used manually through the AltiVec vector extensions, or automatically by the XL compiler with the
-qarch=pwr7 -qhot -O3 -qsimd options.
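As a minimal sketch of the manual route, the following C function adds two float arrays with the AltiVec built-ins vec_ld, vec_add, and vec_st from <altivec.h> when the compiler defines __ALTIVEC__ (as GCC does with -maltivec), and falls back to a plain scalar loop otherwise. The function name vadd_f32 is hypothetical, and the sketch assumes that n is a multiple of 4 and that all arrays are 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __ALTIVEC__
#include <altivec.h>
#endif

/* Element-wise sum: out[i] = a[i] + b[i].
   Assumes n is a multiple of 4 and that a, b, and out are 16-byte
   aligned, because vec_ld/vec_st ignore the low four address bits. */
void vadd_f32(const float *a, const float *b, float *out, size_t n)
{
    assert(n % 4 == 0);
#ifdef __ALTIVEC__
    /* Vector path: four floats per load, add, and store. */
    for (size_t i = 0; i < n; i += 4) {
        __vector float va = vec_ld(0, &a[i]);
        __vector float vb = vec_ld(0, &b[i]);
        vec_st(vec_add(va, vb), 0, &out[i]);
    }
#else
    /* Portable scalar fallback when AltiVec is not enabled. */
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
#endif
}
```

The scalar fallback keeps the same source buildable on other platforms; only builds that enable AltiVec take the vector path.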
The GCC equivalents are the -maltivec and -mvsx options, which you should combine with
-ftree-vectorize and -fvect-cost-model. On GCC, combining -O3 with -mcpu=power7 implicitly
enables AltiVec and VSX code generation, auto-vectorization (-ftree-vectorize), and -mpopcntd.
Other important options include -mrecip=rsqrt and -mveclibabi=mass, which require -ffast-math
or -Ofast to be effective. If the compiler uses optimizations that depend on the MASS libraries,
the link command must explicitly name the MASS library directories and library names.
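To illustrate the automatic route, here is a sketch of a loop that GCC should be able to vectorize at -O3 -mcpu=power7. The function name saxpy and the build line in the comment are illustrative assumptions; the MASS-dependent options are left out because the library directories and names depend on the installation.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = a * x[i] + y[i]  (single-precision "a times x plus y").
   A possible build line, using only options named in the text:
       gcc -O3 -mcpu=power7 -c saxpy.c
   -O3 turns on -ftree-vectorize, so GCC can process several floats
   per VSX instruction instead of one per iteration.
   'restrict' promises that x and y do not overlap, so the compiler
   does not need a runtime overlap check before the vector loop. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

With GCC 4.8 and later, adding -fopt-info-vec to the build line asks the compiler to report which loops it vectorized, which is a quick way to confirm that a loop like this one was transformed.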
For more information about this topic, see 6.4, “Related publications” on page 123.
6.2 Advanced compiler optimization techniques
This section describes some of the more advanced compiler optimization techniques.
6.2.1 Common prerequisites
Compiler analysis and transformations improve runtime performance by changing the
translation of the program source into assembly code. Changes in these translations might
cause the application to behave differently, possibly even causing it to produce
incorrect results.
Compilers follow rules and assumptions that are part of the programming language to
perform this transformation. If the programmer breaks some of these rules, the application
might misbehave, and it might do so only at higher optimization levels, where the problem is
more difficult to diagnose.
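A familiar example of such a rule is that signed integer overflow is undefined in C. The sketch below shows, in a comment, an overflow check that relies on breaking that rule, followed by a version that stays within the rules by comparing against INT_MAX before adding. The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Rule-breaking check (do not use): because signed overflow is
   undefined, an optimizing compiler may assume x + 1 never wraps and
   fold the comparison to 1, silently removing the check:
       int can_increment_bad(int x) { return x + 1 > x; }
*/

/* Rule-following version: the addition is performed only when it
   cannot overflow. Returns 1 and stores x + 1 in *result on success,
   or 0 (leaving *result unchanged) when x + 1 would overflow. */
int safe_increment(int x, int *result)
{
    assert(result != NULL);
    if (x == INT_MAX)
        return 0;
    *result = x + 1;
    return 1;
}
```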
To put this situation into perspective, imagine a C program with three variables: “int a[4], b, c;”.
These variables are often placed contiguously in memory, although the language does not
guarantee it. If the program executes a statement such as a[4]=0, it breaks the language rules,
because the valid indices of a are 0 through 3. If variable b follows the array and is unused, the
statement might overwrite b and the program might continue to behave correctly. However, if at
a higher optimization level the compiler determines that b is unused and eliminates it, the same
incorrect statement might overwrite variable c, triggering a runtime failure.
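One way to keep the example above within the language rules is to check every index against the array length before storing. This is a minimal sketch with hypothetical names (ARRAY_LEN, store_checked); for “int a[4]”, it accepts indices 0 through 3 and refuses the store for a[4], a[5], or any larger index.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array (not a pointer). */
#define ARRAY_LEN(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Stores val in arr[idx] only if idx is a valid index for an array of
   len elements. Returns 1 on success, 0 if the store was refused. */
int store_checked(int *arr, size_t len, size_t idx, int val)
{
    assert(arr != NULL);
    if (idx >= len)
        return 0;   /* out of bounds: never touch memory past the array */
    arr[idx] = val;
    return 1;
}
```

Usage: with int a[4], the call store_checked(a, ARRAY_LEN(a), 4, 0) returns 0 instead of writing into whatever variable happens to follow the array.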
It is critical, then, to eliminate programming errors before higher optimization is applied.
Testing the application thoroughly without optimization is a good initial step, but it is neither
required nor sufficient. The application must be tested at the optimization level that will be used
in production.