IBM Power7
Chapter 5. Linux 101
By running a few extra builds, your myapp1.0 is fully optimized for the current and
N-1/N-2 Power hardware releases. When you start your application with the appropriate
LD_LIBRARY_PATH (including /opt/ibm/myapp1.0/lib64), the dynamic linker automatically
searches the subdirectories under the library path for names that match the current platform
(POWER5, POWER6, or POWER7). If the dynamic linker finds the shared library in the
subdirectory with the matching platform name, it loads that version; otherwise, the dynamic
linker looks in the base lib64 directory and uses the default implementation. This process
continues for all directories in the library path and recursively for any dependent libraries.
Using the Advance Toolchain
The latest Advance Toolchain compilers and run time can be downloaded from:
http://linuxpatch.ncsa.uiuc.edu/toolchain/at/
The latest Advance Toolchain releases (starting with Advance Toolchain 5.0) add multi-core
runtime libraries that enable you to take advantage of multiple cores at the application level.
The toolchain currently includes a Power port of the open source version of Intel Thread
Building Blocks, the Concurrent Building Blocks software transactional memory library, and
the UserRCU library (the application-level version of the Linux kernel’s Read-Copy-Update
concurrent programming technique). Additional libraries are added to the Advance Toolchain
run time as needed and as resources allow.
Linux on Power Enterprise Distributions default to 64 KB pages, so most applications
automatically benefit from large pages. Larger (16 MB) segments are best used through the
libhugetlbfs API. Large segments can back shared memory, malloc storage, and the main
program's text and data segments (backing shared library text or data with large pages is not
currently supported).
Tuning and optimizing malloc
Methods for tuning and optimizing malloc are described in this section.
Linux malloc
Generally, tuning malloc invocations on Linux systems is an application-specific exercise.
Improving malloc performance
Linux is flexible regarding the system and application tuning of malloc usage.
By default, Linux manages malloc memory to balance the ability to reuse the memory pool
against the range of default sizes of memory allocation requests. Small chunks of memory
are managed on the sbrk heap. This sbrk heap is labeled as [heap] in /proc/self/maps.
When you work with Linux memory allocation, a number of tunables are available to
users. These tunables are implemented in glibc's malloc.c. Our examples
(“Malloc environment variables” on page 101 and “Linux malloc considerations” on page 102)
show two of the key tunables, which steer large memory allocations away from
mmap and onto the sbrk heap instead.
When it manages memory for applications, the Linux allocator automatically chooses
between satisfying mallocs from the sbrk heap and from mmap regions. Mmap
regions are typically used for larger memory chunks. When mmap is used for large mallocs,
the kernel must zero the newly mmapped chunk of memory.
Malloc environment variables
Users can define environment variables to control the tunables for a program. The
environment variables that are shown in the following examples caused a significant
performance improvement across several real-life workloads.
