IBM Power7 - Data Prefetching Using D-Cache Instructions and the Data Streams Control Register (DSCR)

46 POWER7 and POWER7+ Optimization and Tuning Guide
-D__STDC_WANT_DEC_FP__: Enables references to DFP-defined symbols.
-ldfp: Links in the DFP functionality that is provided by recent Linux enterprise
distributions or the Advance Toolchain runtime.
• Decimal Floating Point Abstraction Layer (DFPAL), a downloadable library that IBM
provides at no additional cost. [60]
Many applications that use BCD today rely on a library to perform math functions.
Changing to a native data type can be hard work, and afterward you might have to maintain
two code sets: one for AIX on POWER6 and one for other platforms that do not support native
DFP. DFPAL solves this problem by offering an alternative to the native support.
DFPAL consists of a header file to include in your code and the DFPAL library.
The header file is downloadable from General Decimal Arithmetic at
http://speleotrove.com/decimal/ (search for DFPAL). Download the complete source
code, and compile it on your system.
If you have hardware support for DFP, use the library to access the functions.
If you do not have hardware support (or want to compare hardware execution with software
emulation), you can force software emulation by setting the following environment variable
before you run your application:
export DFPAL_EXE_MODE=DNSW
Determining if your applications are using DFP
There are two AIX commands that are used for monitoring:
• hpmstat (for monitoring the whole system)
• hpmcount (for monitoring a single program)
The PM_DFU_FIN (DFU instruction finish) field in the output of the hpmstat and hpmcount
commands counts the DFP instructions that finished, so a nonzero value shows that DFP
operations were executed.
The -E PM_MRK_DFU_FIN option in the tprof command uses the AIX trace subsystem, which
tells you which functions are using DFP and how often.
For more information about this topic, see 2.4, “Related publications” on page 51.
2.3.7 Data prefetching using d-cache instructions and the Data Streams Control Register (DSCR)
The hardware data prefetch mechanism reduces the performance impact that is caused by
the latency in retrieving cache lines from higher level caches and from memory. The data
prefetch engine of the processor can recognize sequential data access patterns in addition to
certain non-sequential (stride-N) patterns and initiate prefetching of d-cache lines from L2
and L3 cache and memory into the L1 d-cache to improve the performance of these storage
reference patterns.
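As an illustration of the two access patterns named above, the following minimal C sketch
walks an array sequentially and with a fixed stride. The function names are hypothetical and
are not taken from the guide; whether the hardware detects a particular stride depends on
the processor and is not shown here.

```c
#include <stddef.h>

/* Sequential pattern: consecutive elements, so consecutive cache lines
 * are touched in order. */
double sum_sequential(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Stride-N pattern: every stride-th element, so the addresses touched
 * are a fixed distance apart. A stride of 0 is rejected to avoid an
 * endless loop. */
double sum_strided(const double *a, size_t n, size_t stride)
{
    double sum = 0.0;
    if (stride == 0)
        return sum;
    for (size_t i = 0; i < n; i += stride)
        sum += a[i];
    return sum;
}
```

Both loops generate addresses at a constant distance from each other, which is the kind of
regularity a stream prefetcher looks for; pointer chasing through a linked list, by contrast,
has no such regularity.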
The Power ISA also provides cache instructions that give the prefetch engines explicit
hints, overriding the automatic stream detection capability of the data prefetcher. Cache
instructions, such as dcbt and dcbtst, allow applications to specify the stream direction,
the prefetch depth, and the number of units. These instructions can avoid the startup cost
of the automatic stream detection mechanism.
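A minimal C sketch of software prefetch hints follows. It assumes a GCC-compatible compiler,
whose __builtin_prefetch built-in is typically lowered to dcbt (read hint) or dcbtst (write
hint) on Power targets. It shows only the simple per-line "touch" form, not the stream form
that encodes direction, depth, and number of units. The function name scale_copy, the
PREFETCH_AHEAD distance, and the 128-byte line size are illustrative assumptions, not values
from the guide.

```c
#include <stddef.h>

/* Assumed cache-line size in bytes (128 on POWER7); adjust for other CPUs. */
#define LINE_BYTES 128
#define DOUBLES_PER_LINE (LINE_BYTES / sizeof(double))

/* Hypothetical tuning value: how many elements ahead of the current
 * index to request. The best distance must be found by measurement. */
#define PREFETCH_AHEAD (4 * DOUBLES_PER_LINE)

#if defined(__GNUC__)
#define PREFETCH_READ(p)  __builtin_prefetch((p), 0, 3)  /* read hint  */
#define PREFETCH_WRITE(p) __builtin_prefetch((p), 1, 3)  /* write hint */
#else
#define PREFETCH_READ(p)  ((void)0)  /* no-op on other compilers */
#define PREFETCH_WRITE(p) ((void)0)
#endif

/* Computes dst[i] = src[i] * factor, issuing one read hint and one
 * write hint per assumed cache line, PREFETCH_AHEAD elements ahead. */
void scale_copy(double *dst, const double *src, size_t n, double factor)
{
    for (size_t i = 0; i < n; i++) {
        if (i % DOUBLES_PER_LINE == 0 && i + PREFETCH_AHEAD < n) {
            PREFETCH_READ(&src[i + PREFETCH_AHEAD]);
            PREFETCH_WRITE(&dst[i + PREFETCH_AHEAD]);
        }
        dst[i] = src[i] * factor;
    }
}
```

The hints never change the result of the loop; they only request that lines be brought
closer to the processor early. For a simple sequential loop like this one, the hardware
prefetcher usually detects the stream on its own, so explicit hints tend to pay off mainly
for short streams or for patterns the hardware does not recognize.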
[60] Ibid
