IBM Power7
Chapter 5. Linux 101
By running a few extra builds, your myapp1.0 is fully optimized for the current and
N-1/N-2 Power hardware releases. When you start your application with the appropriate
LD_LIBRARY_PATH (including /opt/ibm/myapp1.0/lib64), the dynamic linker automatically
searches the subdirectories under the library path for names that match the current platform
(POWER5, POWER6, or POWER7). If the dynamic linker finds the shared library in the
subdirectory with the matching platform name, it loads that version; otherwise, the dynamic
linker looks in the base lib64 directory and uses the default implementation. This process
continues for all directories in the library path and recursively for any dependent libraries.
Using the Advance Toolchain
The latest Advance Toolchain compilers and run time can be downloaded from:
http://linuxpatch.ncsa.uiuc.edu/toolchain/at/
The latest Advance Toolchain releases (starting with Advance Toolchain 5.0) add multi-core
runtime libraries that enable you to take advantage of multiple cores at the application level.
The toolchain currently includes a Power port of the open source version of Intel Thread
Building Blocks, the Concurrent Building Blocks software transactional memory library, and
the UserRCU library (the application-level version of the Linux kernel’s Read-Copy-Update
concurrent programming technique). Additional libraries are added to the Advance Toolchain
run time as needed and as resources allow.
Linux on Power Enterprise Distributions default to 64 KB pages, so most applications
automatically benefit from large pages. Larger (16 MB) segments are best used through the
libhugetlbfs API. Large segments can back shared memory, malloc storage, and the main
program's text and data segments (backing shared library text or data with large pages is not
currently supported).
Tuning and optimizing malloc
Methods for tuning and optimizing malloc are described in this section.
Linux malloc
Generally, tuning malloc invocations on Linux systems is an application-specific exercise.
Improving malloc performance
Linux is flexible regarding the system and application tuning of malloc usage.
By default, Linux manages malloc memory to balance the ability to reuse the memory pool
against the range of default sizes of memory allocation requests. Small chunks of memory
are managed on the sbrk heap. This sbrk heap is labeled as [heap] in /proc/self/maps.
When you work with Linux memory allocation, a number of tunables are available to
users. These tunables are implemented in glibc's malloc.c. Our examples
(“Malloc environment variables” on page 101 and “Linux malloc considerations” on page 102)
show two of the key tunables, which steer large memory allocations away from
mmap and onto the sbrk heap instead.
When it manages memory for applications, the Linux allocator automatically chooses
between satisfying mallocs from the sbrk heap and from mmap regions. Mmap
regions are typically used for larger memory chunks. When mmap is used for large mallocs,
the kernel must zero the newly mmapped chunk of memory.
Malloc environment variables
Users can define environment variables to control the tunables for a program. The
environment variables that are shown in the following examples caused a significant
performance improvement across several real-life workloads.
