IBM Power7 - Page 119

Chapter 5. Linux
Malloc tuning parameters
The default malloc implementation provides a mallopt() API to allow applications to adjust
some tuning parameters. For some applications, it might be useful to adjust the
MMAP_THRESHOLD, TOP_PAD, and MMAP_MAX limits. Increasing MMAP_THRESHOLD so that most
(application) allocations fall below that threshold reduces syscall and page fault impact, and
improves application start time. However, raising the threshold can increase fragmentation
within the arenas and sbrk() storage. Fragmentation can be mitigated to some extent by also
increasing TOP_PAD, which is the extra memory that is requested on each sbrk() call.
Reducing MMAP_MAX, which is the maximum number of chunks that can be allocated with
mmap(), also limits the use of mmap(); setting MMAP_MAX to 0 turns off mmap() for individual
allocation requests. However, reducing MMAP_MAX does not always eliminate mmap() use: the
run time reverts to mmap() allocations if sbrk() storage, which is the gap between the end of
program static data (bss) and the first shared library, is exhausted.
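The following is a minimal sketch of how an application might set these three parameters at startup, assuming the GNU C library (glibc), where the tunables are exposed as M_MMAP_THRESHOLD, M_TOP_PAD, and M_MMAP_MAX in <malloc.h>. The helper name tune_malloc() and its parameter names are hypothetical, and suitable values depend on the application's allocation pattern.

```c
#include <assert.h>
#include <malloc.h>   /* glibc-specific: mallopt() and the M_* constants */

/* Hypothetical helper (not part of glibc): applies the three tunables
 * discussed above. Returns 0 if every mallopt() call succeeded, -1 otherwise. */
int tune_malloc(int mmap_threshold_bytes, int top_pad_bytes, int mmap_max_chunks)
{
    assert(mmap_threshold_bytes >= 0);
    assert(top_pad_bytes >= 0);
    assert(mmap_max_chunks >= 0);

    /* mallopt() returns 1 on success and 0 on error. */

    /* Requests at or above this size are served by mmap() instead of the heap. */
    if (mallopt(M_MMAP_THRESHOLD, mmap_threshold_bytes) != 1)
        return -1;

    /* Extra bytes requested from the system each time the heap grows. */
    if (mallopt(M_TOP_PAD, top_pad_bytes) != 1)
        return -1;

    /* Upper limit on simultaneously mmap()ed chunks; 0 turns off mmap()
     * for individual requests. */
    if (mallopt(M_MMAP_MAX, mmap_max_chunks) != 1)
        return -1;

    return 0;
}
```

Call the helper early in main(), before the application makes most of its allocations. For example, tune_malloc(1024 * 1024, 4 * 1024 * 1024, 65536) asks for a 1 MB mmap threshold and 4 MB of padding; these numbers are illustrative, not recommendations.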
Linux malloc and memory tools
There are several readily available tools in the Linux open source community:
• A website that describes the heap profiler that is used at Google to explore how C++
programs manage memory, found at:
http://gperftools.googlecode.com/svn/trunk/doc/heapprofile.html
• Massif: a heap profiler, available at:
http://valgrind.org/docs/manual/ms-manual.html
For more details about memory management tools, see “Empirical performance analysis
using the IBM SDK for PowerLinux” on page 172.
For more information about tuning malloc parameters, see Malloc Tunable Parameters,
available at:
http://www.gnu.org/software/libtool/manual/libc/Malloc-Tunable-Parameters.html
Thread-caching malloc (TCMalloc)
Under some circumstances, an alternative malloc implementation can improve application
performance. TCMalloc is a specialized malloc implementation that is packaged as part of
Google's Perftools package (http://code.google.com/p/gperftools/?redir=1) and is included in
the Advance Toolchain 5.0.4 release. It can improve performance across a number of
C and C++ applications.
TCMalloc uses a thread-local cache for each thread and moves objects from the memory
heap into the local cache as needed. Small objects (smaller than 32 KB) are mapped into
allocatable size-classes. A thread cache contains a singly linked list of free objects per
size-class. Large objects are rounded up to a multiple of the page size (4 KB) and handled by
a central page heap, which is an array of linked lists.
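The structure that is described above can be illustrated with a toy sketch. This is not TCMalloc source code: the four size classes, the function names, and the use of malloc() as a stand-in for the central heap are all assumptions made for illustration. It shows only the per-thread, per-size-class singly linked free lists and the rounding of large requests to whole 4 KB pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NUM_CLASSES 4

/* Illustrative size classes in bytes; a real allocator has many more. */
static const size_t class_size[NUM_CLASSES] = { 16, 32, 64, 128 };

/* A free object stores the link to the next free object in its own memory. */
typedef struct free_object {
    struct free_object *next;
} free_object;

/* One cache per thread: one singly linked free list per size class. */
static _Thread_local free_object *free_list[NUM_CLASSES];

/* Map a request to the smallest class that fits, or -1 if it is too large. */
int size_class_for(size_t bytes)
{
    for (int c = 0; c < NUM_CLASSES; c++)
        if (bytes <= class_size[c])
            return c;
    return -1;
}

void *cache_alloc(size_t bytes)
{
    int c = size_class_for(bytes);
    if (c < 0)
        return malloc(bytes);          /* stand-in for the central page heap */

    free_object *obj = free_list[c];
    if (obj != NULL) {                 /* fast path: pop from the thread-local list */
        free_list[c] = obj->next;
        return obj;
    }
    return malloc(class_size[c]);      /* stand-in for refilling from the central heap */
}

/* The toy does not record object sizes, so the caller passes the size back. */
void cache_free(void *ptr, size_t bytes)
{
    assert(ptr != NULL);
    int c = size_class_for(bytes);
    if (c < 0) {
        free(ptr);
        return;
    }
    free_object *obj = ptr;            /* push onto the thread-local list */
    obj->next = free_list[c];
    free_list[c] = obj;
}

/* Round a large request up to a whole number of 4 KB pages. */
size_t round_up_to_pages(size_t bytes)
{
    const size_t page = 4096;
    return (bytes + page - 1) / page * page;
}
```

Because each list is thread-local, the fast path in cache_alloc() and cache_free() needs no locking, which is the main idea behind a thread-caching design. A freed 20-byte object lands on the 32-byte list and is handed back by the next request that maps to the same class.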
For more information about how TCMalloc works, see TCMalloc: Thread-Caching Malloc,
available at:
http://gperftools.googlecode.com/svn/trunk/doc/tcmalloc.html
The TCMalloc implementation is part of the gperftools project. For more information about
this topic, go to:
http://code.google.com/p/gperftools/
