Chapter 2. The POWER7 processor 35
Hot locks can be caused by a single lock that controls access to too large an area of
data, which is known as coarse-grained locking.²⁶ In that case, the strategy to effectively deal
with a hot lock is to split the lock into a set of fine-grained locks, such that multiple locks, each
managing a smaller portion of the data than the original lock, now manage the data for which
access is being serialized. Hot locks can also be caused by trying to scale an application to
more cores than the original design intended. In that case, using an even finer grain of locking
might be possible, or changes can be made in data structures or algorithms, such that lock
contention is reduced.
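As an illustration of splitting one coarse-grained lock into a set of fine-grained locks, the following minimal sketch protects a table of counters with several mutexes instead of one. The names striped_table, NUM_BUCKETS, and NUM_STRIPES are hypothetical and chosen only for this example, and the mapping of buckets to locks (bucket modulo stripe count) is one possible choice, not a prescribed one. For brevity, the locks are stored contiguously and are not padded to separate cache lines.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_BUCKETS 1024  /* hypothetical table size */
#define NUM_STRIPES 16    /* hypothetical number of fine-grained locks */

/* A table of counters guarded by NUM_STRIPES locks instead of a single lock. */
struct striped_table {
    pthread_mutex_t locks[NUM_STRIPES];
    long            counts[NUM_BUCKETS];
};

static void striped_table_init(struct striped_table *t)
{
    for (size_t i = 0; i < NUM_STRIPES; i++)
        pthread_mutex_init(&t->locks[i], NULL);
    for (size_t i = 0; i < NUM_BUCKETS; i++)
        t->counts[i] = 0;
}

/* Each bucket maps to exactly one lock, so threads that update buckets in
 * different stripes do not serialize on each other. */
static size_t stripe_of(size_t bucket)
{
    return bucket % NUM_STRIPES;
}

static void striped_table_add(struct striped_table *t, size_t bucket, long delta)
{
    assert(bucket < NUM_BUCKETS);
    pthread_mutex_t *lock = &t->locks[stripe_of(bucket)];
    pthread_mutex_lock(lock);
    t->counts[bucket] += delta;
    pthread_mutex_unlock(lock);
}

static long striped_table_get(struct striped_table *t, size_t bucket)
{
    assert(bucket < NUM_BUCKETS);
    pthread_mutex_t *lock = &t->locks[stripe_of(bucket)];
    pthread_mutex_lock(lock);
    long value = t->counts[bucket];
    pthread_mutex_unlock(lock);
    return value;
}
```

With this layout, two threads contend only when their buckets fall into the same stripe. Increasing NUM_STRIPES reduces contention further at the cost of more lock storage.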
Additionally, the programmer must spend time considering the layout of locks in the cache to
ensure that multiple locks, especially hot locks, are not in the same cache line because any
update to the lock itself results in the cache line being invalidated on other processor cores.
When possible, locks should be padded so that they are in their own distinct cache line.
For more information about this topic, see 2.4, “Related publications” on page 51.
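The padding advice can be sketched in C11 as follows. The example assumes a 128-byte cache line, which is the line size on POWER7, and the names CACHE_LINE_SIZE, padded_lock, and same_cache_line are hypothetical. Aligning the mutex member to the cache line size also rounds the structure size up to a multiple of that size, so adjacent array elements never share a line.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 128-byte cache lines; adjust this value for other targets. */
#define CACHE_LINE_SIZE 128

/* A lock that occupies one or more whole cache lines by itself. */
struct padded_lock {
    alignas(CACHE_LINE_SIZE) pthread_mutex_t mutex;
};

/* Compile-time checks that the layout is what the comment above claims. */
static_assert(sizeof(struct padded_lock) % CACHE_LINE_SIZE == 0,
              "padded_lock must fill whole cache lines");
static_assert(alignof(struct padded_lock) == CACHE_LINE_SIZE,
              "padded_lock must start on a cache line boundary");

static void padded_lock_init(struct padded_lock *l)
{
    pthread_mutex_init(&l->mutex, NULL);
}

/* Returns nonzero if the two addresses fall into the same cache line. */
static int same_cache_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE_SIZE == (uintptr_t)b / CACHE_LINE_SIZE;
}
```

An array such as `static struct padded_lock locks[4];` then places each lock in its own line, at the cost of unused bytes after each mutex.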
2.3.3 SMT priorities
POWER5 introduced the ability to set the SMT thread priority level of each hardware thread,
which controls the relative priority of the threads within a single core. The relative
difference between the priority levels of the hardware threads determines the number of decode
cycles each thread receives during a period.²⁷ Typically, the SMT priority level is changed
by using a special no-op OR instruction or by using the thread_set_smt_priority
system call in AIX. Lowering the priority of one thread can boost the performance of the
sibling SMT threads on the same processor core.
Concepts and benefits
The POWER processor architecture uses SMT to provide multiple streams of hardware
execution. POWER7 provides four SMT hardware threads per core and can be configured to
run in SMT4, SMT2, or single-threaded mode (SMT1 mode or, as referred to in this
publication, ST mode), whereas POWER6 and POWER5 provide two SMT threads per core and
can be run in SMT2 mode or ST mode.
By using multiple SMT threads, a workload can take advantage of more of the hardware
features provided in the POWER processor than if a single SMT thread is used per core. By
configuring the processor core to run in multi-threaded mode, the operating system can
maximize the usage of the hardware capabilities that are provided in the system and the
overall workload throughput by correctly balancing software threads across all of the cores
and SMT hardware threads in the partition.
The Power Architecture provides an SMT Thread Priority mechanism by which the priority
among the SMT threads in the processor core can be adjusted so that an SMT thread can
receive more or less favorable performance (in terms of dispatch cycles) than the other
threads in the same core. This mechanism can be used in various situations, such as to boost
the performance of other threads while the thread with a lowered priority is waiting on a lock,
or when waiting on other cooperative threads to reach a synchronization point.
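The lock-waiting case can be sketched as a spin loop that lowers its own SMT priority while it waits and restores it after the lock is acquired. This is a minimal sketch, not a definitive implementation. It assumes the priority no-op encodings `or 1,1,1` (low) and `or 2,2,2` (medium), which are the forms used, for example, by the Linux kernel headers for Power, and it assumes that medium is the priority the thread normally runs at. The macro and type names are hypothetical. On other architectures the macros expand to nothing, so the code still compiles and behaves as an ordinary spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed priority no-ops: "or 1,1,1" = low, "or 2,2,2" = medium. */
#if defined(__powerpc__) || defined(__powerpc64__)
#define SMT_PRIORITY_LOW()    __asm__ __volatile__("or 1,1,1" ::: "memory")
#define SMT_PRIORITY_MEDIUM() __asm__ __volatile__("or 2,2,2" ::: "memory")
#else
#define SMT_PRIORITY_LOW()    ((void)0)
#define SMT_PRIORITY_MEDIUM() ((void)0)
#endif

/* Hypothetical minimal spinlock, used only to illustrate the technique. */
typedef struct {
    atomic_flag held;
} spinlock_t;

static void spinlock_init(spinlock_t *l)
{
    atomic_flag_clear(&l->held);
}

/* Returns 1 if the lock was acquired, 0 if it is already held. */
static int spinlock_try_acquire(spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void spinlock_acquire(spinlock_t *l)
{
    assert(l != NULL);
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        /* While waiting, give up decode cycles to the sibling threads. */
        SMT_PRIORITY_LOW();
    }
    /* The lock is held: return to the assumed normal priority. */
    SMT_PRIORITY_MEDIUM();
}

static void spinlock_release(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Restoring the priority immediately after the loop matters: a thread that keeps a lowered priority while it holds the lock would lengthen the critical section and make the lock hotter.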
26 Lock granularity, available at: http://www.read.seas.harvard.edu/~kohler/class/05s-osp/notes/notes8.html
27 thread_set_smt_priority system call, available at: http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.jsp?topic=/com.ibm.aix.kerneltechref/doc/ktechrf1/thread_set_smt_priority.htm