IBM Power7 - Page 119

Malloc tuning parameters

The default malloc implementation provides a mallopt() API that allows applications to adjust some tuning parameters. For some applications, it might be useful to adjust the MMAP_THRESHOLD, TOP_PAD, and MMAP_MAX limits (named M_MMAP_THRESHOLD, M_TOP_PAD, and M_MMAP_MAX when passed to mallopt()). Increasing MMAP_THRESHOLD so that most application allocations fall below the threshold keeps those allocations out of mmap(), which reduces system call and page fault overhead and improves application start time. However, this change can increase fragmentation within the arenas and sbrk() storage. Fragmentation can be mitigated to some extent by also increasing TOP_PAD, which is the extra memory that is requested on each sbrk() call.

Reducing MMAP_MAX, which is the maximum number of chunks that can be allocated with mmap(), also limits the use of mmap(), and setting MMAP_MAX to 0 turns off mmap() for individual allocation requests. Reducing MMAP_MAX does not always solve the problem, however. The run time reverts to mmap() allocations if sbrk() storage, which is the gap between the end of program static data (bss) and the first shared library, is exhausted.
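
The adjustments described above can be sketched with mallopt() as follows. This is a minimal illustration that assumes glibc, where the parameters are spelled M_MMAP_THRESHOLD, M_TOP_PAD, and M_MMAP_MAX. The helper names (set_param, tune_malloc) and the numeric values are hypothetical and are not tuning recommendations; suitable values depend on the allocation profile of the application.

```c
#include <malloc.h>  /* glibc: mallopt(), M_MMAP_THRESHOLD, M_TOP_PAD, M_MMAP_MAX */
#include <stdio.h>

/* Illustrative values only (assumptions, not recommendations). */
#define EXAMPLE_MMAP_THRESHOLD (256 * 1024)   /* bytes */
#define EXAMPLE_TOP_PAD        (1024 * 1024)  /* bytes */
#define EXAMPLE_MMAP_MAX       0              /* 0 = no mmap() for single requests */

/* Hypothetical helper: set one parameter and report a failure.
 * mallopt() returns 1 on success and 0 on error. */
static int set_param(int param, int value, const char *name)
{
    if (mallopt(param, value) != 1) {
        fprintf(stderr, "mallopt(%s, %d) failed\n", name, value);
        return 0;
    }
    return 1;
}

/* Hypothetical helper: apply all three settings.
 * Returns 1 if every mallopt() call succeeded, 0 otherwise. */
int tune_malloc(void)
{
    int ok = 1;
    /* Requests smaller than this are served from the heap, not mmap(). */
    ok &= set_param(M_MMAP_THRESHOLD, EXAMPLE_MMAP_THRESHOLD, "M_MMAP_THRESHOLD");
    /* Extra memory requested on each sbrk() call. */
    ok &= set_param(M_TOP_PAD, EXAMPLE_TOP_PAD, "M_TOP_PAD");
    /* Maximum number of chunks that may be allocated with mmap(). */
    ok &= set_param(M_MMAP_MAX, EXAMPLE_MMAP_MAX, "M_MMAP_MAX");
    return ok;
}
```

An application would typically call tune_malloc() early in main(), before its first large allocation, so that the settings apply to the bulk of its requests.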

Linux malloc and memory tools

There are several readily available tools in the Linux open source community:

- The gperftools heap profiler, which is used at Google to explore how C++ programs manage memory, described at:
  http://gperftools.googlecode.com/svn/trunk/doc/heapprofile.html

- Massif, the Valgrind heap profiler, described at:
  http://valgrind.org/docs/manual/ms-manual.html

For more details about memory management tools, see “Empirical performance analysis using the IBM SDK for PowerLinux” on page 172.

For more information about tuning malloc parameters, see Malloc Tunable Parameters, available at:
http://www.gnu.org/software/libtool/manual/libc/Malloc-Tunable-Parameters.html

Thread-caching malloc (TCMalloc)

Under some circumstances, an alternative malloc implementation can improve application performance. TCMalloc is packaged as part of Google's Perftools package (http://code.google.com/p/gperftools/?redir=1) and is included in the Advance Toolchain 5.0.4 release. This specialized malloc implementation can improve performance across a number of C and C++ applications.

TCMalloc assigns each thread a thread-local cache and moves objects from the memory heap into the local cache as needed. Small objects (less than 32 KB) are mapped into allocatable size-classes. A thread cache contains a singly linked list of free objects per size-class. Large objects are rounded up to a multiple of the page size (4 KB) and handled by a central page heap, which is an array of linked lists.
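
The thread-cache structure described above can be illustrated with a toy allocator. This is a simplified sketch and not the TCMalloc implementation: the size-class table, the function names, and the use of malloc() as a stand-in for the central heap are all assumptions made for illustration. The caller also passes the object size to toy_free(), which a real allocator does not require.

```c
#include <stddef.h>
#include <stdlib.h>

#define NUM_CLASSES 4
#define PAGE_SIZE   4096u  /* 4 KB page size, as stated in the text */

/* Hypothetical size-class table; a real allocator has many more classes. */
static const size_t class_size[NUM_CLASSES] = {16, 32, 64, 128};

/* A free object stores the link to the next free object in its own memory. */
typedef struct FreeObject {
    struct FreeObject *next;
} FreeObject;

/* One singly linked free list per size-class, private to each thread. */
static _Thread_local FreeObject *thread_cache[NUM_CLASSES];

/* Return the index of the smallest class that fits n, or -1 if n is "large". */
int size_class_for(size_t n)
{
    for (int c = 0; c < NUM_CLASSES; c++) {
        if (n <= class_size[c])
            return c;
    }
    return -1;
}

/* Round a large request up to a whole number of pages. */
size_t round_up_to_page(size_t n)
{
    return (n + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
}

void *toy_alloc(size_t n)
{
    int c = size_class_for(n);
    if (c < 0)
        return malloc(round_up_to_page(n));  /* stand-in for the central page heap */

    FreeObject *obj = thread_cache[c];
    if (obj != NULL) {                       /* cache hit: pop the list head */
        thread_cache[c] = obj->next;
        return obj;
    }
    return malloc(class_size[c]);            /* cache miss: fetch from the stand-in heap */
}

void toy_free(void *p, size_t n)
{
    int c = size_class_for(n);
    if (c < 0) {
        free(p);                             /* large objects bypass the thread cache */
        return;
    }
    FreeObject *obj = p;                     /* push onto the thread-local list */
    obj->next = thread_cache[c];
    thread_cache[c] = obj;
}
```

Because each free list is thread-local, the pop and push on the fast path need no locking. In this sketch, a small object that is freed and then requested again by the same thread, in the same size-class, is returned directly from the cache.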

For more information about how TCMalloc works, see TCMalloc: Thread-Caching Malloc, available at:
http://gperftools.googlecode.com/svn/trunk/doc/tcmalloc.html

The TCMalloc implementation is part of the gperftools project. For more information about this topic, go to:
http://code.google.com/p/gperftools/
