Chapter 2. IBM eX5 technology 29
Memory sparing
Sparing provides a degree of redundancy in the memory subsystem, but not to the extent of
mirroring. In contrast to mirroring, sparing leaves more memory for the operating system. In
sparing mode, the trigger for failover is a preset threshold of correctable errors. Depending on
the type of sparing (DIMM or rank), when this threshold is reached, the content is copied to its
spare. The failed DIMM or rank is then taken offline, with the spare counterpart activated for
use. There are two sparing options:
DIMM sparing
Two unused DIMMs are spared per memory card. These DIMMs must have the same number of
ranks and the same capacity as the largest DIMMs that we are sparing. The size of the two unused DIMMs
for sparing is subtracted from the usable capacity that is presented to the operating
system. DIMM sparing is applied on all memory cards in the system.
Rank sparing
Two ranks per memory card are configured as spares. Each spare rank must be as large as
a rank of the highest-capacity DIMM that we are sparing. The size of the two
unused ranks for sparing is subtracted from the usable capacity that is presented to the
operating system. Rank sparing is applied on all memory cards in the system.
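The capacity arithmetic for the two sparing options can be illustrated with a minimal Python sketch. The `Dimm` class, the `usable_gb` function, the mode names, and the example card of eight 4 GB dual-rank DIMMs are hypothetical and chosen only for illustration. The sketch assumes that every rank on a DIMM is the same size; it is not the firmware's actual calculation.

```python
from dataclasses import dataclass

SPARES_PER_CARD = 2  # from the text: two DIMMs or two ranks per memory card


@dataclass(frozen=True)
class Dimm:
    """Hypothetical description of one installed DIMM."""
    capacity_gb: int
    ranks: int

    @property
    def rank_gb(self) -> float:
        # Assumption: all ranks on a DIMM are the same size.
        return self.capacity_gb / self.ranks


def usable_gb(card: list[Dimm], mode: str) -> float:
    """Capacity of one memory card that is presented to the operating system."""
    total = float(sum(d.capacity_gb for d in card))
    if mode == "none":
        return total
    largest = max(card, key=lambda d: d.capacity_gb)
    if mode == "dimm":
        # Two spare DIMMs, each sized like the largest DIMM on the card.
        return total - SPARES_PER_CARD * largest.capacity_gb
    if mode == "rank":
        # Two spare ranks, each sized like a rank of the largest DIMM.
        return total - SPARES_PER_CARD * largest.rank_gb
    raise ValueError(f"unknown sparing mode: {mode!r}")


# Illustrative card: eight 4 GB dual-rank DIMMs (32 GB installed).
card = [Dimm(capacity_gb=4, ranks=2)] * 8
for mode in ("none", "dimm", "rank"):
    print(mode, usable_gb(card, mode))
```

For this example card, the sketch reports 32 GB without sparing, 24 GB with DIMM sparing, and 28 GB with rank sparing, which shows why rank sparing leaves more memory for the operating system.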
You configure these options by using the UEFI during start-up.
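The failover sequence described at the start of this section (count correctable errors, copy the content to the spare, take the failed unit offline, and activate the spare) can be modeled with the following sketch. The class names, the threshold value of 3, and the list standing in for memory content are assumptions for illustration only. The real threshold is preset in firmware and is not stated here.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryUnit:
    """Hypothetical stand-in for a DIMM or a rank."""
    name: str
    contents: list = field(default_factory=list)
    online: bool = True


class SparingController:
    """Toy model of threshold-triggered failover to a spare."""

    def __init__(self, active: MemoryUnit, spare: MemoryUnit, threshold: int):
        self.active = active
        self.spare = spare
        self.threshold = threshold  # illustrative value, not the real preset
        self.correctable_errors = 0
        self.spare.online = False  # the spare stays unused until failover

    def report_correctable_error(self) -> bool:
        """Count one correctable error; return True if it triggered failover."""
        self.correctable_errors += 1
        if self.spare is None or self.correctable_errors < self.threshold:
            return False
        failed = self.active
        self.spare.contents = list(failed.contents)  # copy content to the spare
        failed.online = False                        # take the failed unit offline
        self.spare.online = True                     # activate the spare
        self.active, self.spare = self.spare, None   # no spare remains
        self.correctable_errors = 0
        return True


ctl = SparingController(
    MemoryUnit("rank0", contents=[1, 2, 3]), MemoryUnit("spare"), threshold=3
)
events = [ctl.report_correctable_error() for _ in range(3)]
print(events, ctl.active.name, ctl.active.contents)
```

In this model, only the third error triggers the failover. After the spare is consumed, further errors are counted but cannot trigger another failover.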
For more information about system-specific memory sparing installation options, see the
following sections:
IBM System x3850 X5: 3.8.5, “Memory sparing” on page 89
IBM System x3690 X5: 4.8.7, “Memory sparing” on page 143
IBM BladeCenter HX5: 5.10.5, “Memory sparing” on page 202
Chipkill
Chipkill memory technology, an advanced form of error checking and correcting (ECC) from
IBM, is available for the eX5 blade. Chipkill protects the memory in the system from any single
memory chip failure. It also protects against multi-bit errors from any portion of a single
memory chip.
Redundant bit steering
Redundant bit steering (RBS) provides the equivalent of a hot-spare drive in a RAID array. It
is implemented in the memory controller, which senses when a chip on a DIMM has failed and
routes the data around the failed chip.
The eX5 servers do not currently support redundant bit steering, because the integrated
memory controller of the Intel Xeon 6500 and 7500 processors does not support the feature.
However, the MAX5 memory expansion unit supports RBS, but only when x4 memory DIMMs
are used. The x8 DIMMs do not support RBS.
RBS is automatically enabled on a MAX5 memory port if all DIMMs installed in that memory
port are x4 DIMMs.
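The enablement rule can be expressed as a one-line check. The function name `rbs_enabled` and the representation of a port as a list of DRAM widths are hypothetical. Treating an empty port as not enabled is an assumption of this sketch, not documented behavior.

```python
def rbs_enabled(dram_widths: list[int]) -> bool:
    """Return True if every DIMM on the memory port uses x4 DRAMs.

    An empty port returns False; that choice is an assumption of this sketch.
    """
    return bool(dram_widths) and all(width == 4 for width in dram_widths)


print(rbs_enabled([4, 4, 4, 4]))  # all x4 DIMMs
print(rbs_enabled([4, 4, 8, 8]))  # mixed x4 and x8 DIMMs
```

The first call prints `True` and the second prints `False`, because a single x8 DIMM on the port is enough to disable RBS for that port.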
RBS uses the ECC coding scheme that provides Chipkill coverage for x4 DRAMs. This
coding scheme leaves the equivalent of one x4 DRAM spare in every pair of DIMMs. If memory
scrubbing detects a chip failure on a DIMM, the memory controller
can reroute data around that failed chip through these spare bits. DIMMs using x8 DRAM
technology use a separate ECC coding scheme that does not leave spare bits, which is why
RBS is not available on x8 DIMMs.
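The steering step can be pictured as a routing table that maps each logical chip position to a physical chip, with one spare held in reserve per DIMM pair. The following sketch is a simplified model under that assumption. The class name, the `data_chips` parameter, and the value 8 used in the example are hypothetical and do not reflect the real DRAM count or the ECC code layout.

```python
class SteeredDimmPair:
    """Toy model of one DIMM pair with a single spare x4 DRAM position."""

    def __init__(self, data_chips: int):
        # route[i] is the physical chip that currently serves logical position i.
        self.route = list(range(data_chips))
        self.spare_chip = data_chips  # physical index reserved as the spare
        self.spare_free = True

    def steer_around(self, failed_chip: int) -> bool:
        """Reroute the failed chip's position to the spare, if it is still free."""
        if failed_chip not in self.route:
            raise ValueError(f"chip {failed_chip} is not in use")
        if not self.spare_free:
            return False  # the single spare has already been consumed
        self.route[self.route.index(failed_chip)] = self.spare_chip
        self.spare_free = False
        return True


pair = SteeredDimmPair(data_chips=8)  # illustrative size only
first = pair.steer_around(3)
second = pair.steer_around(5)
print(first, second, pair.route)
```

In this model, the first failure is steered to the spare and the second is not, because only one spare is available per pair. How the hardware handles a later failure is outside the scope of the sketch.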