
IBM Power7 - Hpmstat, Hpmcount, and Tprof -E; Linux

Appendix B. Performance tooling and empirical performance analysis 171
For more information, see emstat Command, available at:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.cmds/doc/aixcmds2/emstat.htm
hpmstat, hpmcount, and tprof -E
The POWER7 processor provides a powerful on-chip PMU that can be used to count the
number of occurrences of performance-critical processor events. A rich set of events is
countable; examples include level 2 and level 3 d-cache misses, and cache reloads from
local, remote, and distant memory.
Local memory is memory that is attached to the same POWER7 processor chip that the
software thread is running on. Remote memory is memory that is attached to a different
POWER7 processor chip in the same CEC as the one the software thread is running on
(the CEC is the node or building block of a multi-CEC system, such as a Power 780).
Distant memory is memory that is attached to a POWER7 processor chip in a different CEC
from the one the software thread is running on.
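
The three locality classes can be expressed as a small decision rule. The following Python sketch is only an illustration of the definitions above: the Location type and its cec and chip indices are hypothetical names introduced here, not part of any IBM tool or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical placement of a thread or a memory region."""
    cec: int   # index of the CEC (node or building block); assumed numbering
    chip: int  # index of the POWER7 chip within that CEC; assumed numbering


def classify_memory(thread: Location, memory: Location) -> str:
    """Return 'local', 'remote', or 'distant' per the definitions above."""
    if thread.cec != memory.cec:
        # Memory is attached to a chip in a different CEC.
        return "distant"
    if thread.chip != memory.chip:
        # Same CEC, but a different chip.
        return "remote"
    # Same chip that the software thread is running on.
    return "local"
```

For example, a thread on chip 0 of CEC 0 that touches memory attached to chip 2 of CEC 0 is making a remote access under this rule, while memory on any chip of CEC 1 would be distant.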
Two commands exist to count PMU events: hpmcount and hpmstat. The hpmcount command is
a command-line utility that runs a command and collects statistics from the PMU while the
command runs. The hpmstat command is similar to hpmcount, except that it collects
performance data on a system-wide basis, rather than just for the execution of a command.
Further documentation about hpmcount and hpmstat can be found at:
• http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.cmds/doc/aixcmds2/hpmcount.htm
• http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.cmds/doc/aixcmds2/hpmstat.htm
In addition to simply counting processor events, the PMU can be configured to sample
instructions based on processor events. With this capability, profiles can be generated that
show which parts of an application are experiencing specified processor events. For example,
you can show which subroutines of an application are generating level 2 or level 3 cache
misses. The tprof profiler includes this functionality through the -E flag, which allows a PMU
event name to be provided to tprof as the sampled event. The list of PMU events can be
generated by running pmlist -c -1. Whenever possible, perform profiling using marked
events, because profiling with marked events is more accurate than profiling with
unmarked events. Marked events begin with the prefix PM_MRK_.
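
As a rough sketch of the advice above, the following Python helper keeps only the marked events from a list of event names and assembles a tprof argument list for one of them. The event names in the usage note are illustrative placeholders rather than verified pmlist output, and the use of the -x flag to launch the profiled program is an assumption based on the AIX tprof documentation; check both against your system before relying on them.

```python
MARKED_PREFIX = "PM_MRK_"  # prefix of marked events, as stated in the text


def marked_events(event_names):
    """Keep only the marked events, preserving their original order."""
    return [name for name in event_names if name.startswith(MARKED_PREFIX)]


def tprof_command(event, program_args):
    """Build an argument list that samples on `event` while running a program.

    -E <event> comes from the text above; -x <program> is an assumption
    taken from the AIX tprof documentation.
    """
    if not program_args:
        raise ValueError("program_args must name the program to profile")
    return ["tprof", "-E", event, "-x", *program_args]
```

With a hypothetical list such as ["PM_CYC", "PM_MRK_DATA_FROM_L2MISS"], marked_events returns only the second name, and tprof_command("PM_MRK_DATA_FROM_L2MISS", ["./app"]) yields an argument list that could be passed to subprocess.run on an AIX system where tprof is installed.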
For more information about using the -E flag of tprof, go to:
http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.cmds/doc/aixcmds5/tprof.htm
Linux
This section introduces tools and techniques for optimizing software on the combination
of Power Systems and Linux. The intended audience for this section is software
development teams.
