IBM Power 570 - Page 113


Chapter 4. Continuous availability and manageability 99

Draft Document for Review September 2, 2008 5:05 pm    4405ch04 Continuous availability and manageability.fm

Memory protection

Memory and cache arrays comprise data bit lines that feed into a memory word. A memory word is addressed by the system as a single element. Depending on the size and addressability of the memory element, each data bit line may include thousands of individual bits or memory cells. For example:

- A single memory module on a Dual Inline Memory Module (DIMM) can have a capacity of 1 Gb, and supply eight bit lines of data for an ECC word. In this case, each bit line in the ECC word holds 128 Mb behind it, corresponding to more than 128 million memory cell addresses.
- A 32 KB L1 cache with a 16-byte memory word, on the other hand, would have only 2 Kb behind each memory bit line.
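The two figures above follow from dividing the array capacity by the number of bit lines in the word. The short sketch below reproduces that arithmetic; it assumes binary prefixes (1 Gb = 2^30 bits), and the function name `bits_per_bit_line` is illustrative only.

```python
KB_BITS = 1024            # bits in 1 Kb (binary prefix, an assumption)
MB_BITS = 1024 * KB_BITS  # bits in 1 Mb
GB_BITS = 1024 * MB_BITS  # bits in 1 Gb


def bits_per_bit_line(capacity_bits: int, bit_lines: int) -> int:
    """Number of memory cells that sit behind each bit line of an array."""
    return capacity_bits // bit_lines


# 1 Gb memory module that supplies eight bit lines to an ECC word.
dram_cells = bits_per_bit_line(1 * GB_BITS, 8)

# 32 KB L1 cache with a 16-byte memory word (128 bit lines).
l1_cells = bits_per_bit_line(32 * 1024 * 8, 16 * 8)

print(dram_cells // MB_BITS, "Mb per bit line")  # 128 Mb per bit line
print(l1_cells // KB_BITS, "Kb per bit line")    # 2 Kb per bit line
print(dram_cells // l1_cells)                    # 65536
```

The last line shows the scale gap: each main-store bit line in this example covers 65,536 times as many cells as an L1 bit line.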

A memory protection architecture that provides good error resilience for a relatively small L1 cache might be inadequate for protecting the much larger system main store. Therefore, a variety of protection methods are used in POWER6 processor-based systems to avoid uncorrectable errors in memory.

Memory protection plans must take into account many factors, including:

- Size
- Desired performance
- Memory array manufacturing characteristics

POWER6 processor-based systems have a number of protection schemes designed to prevent, protect against, or limit the effect of errors in main memory. These capabilities include:

Hardware scrubbing
Hardware scrubbing is a method used to deal with soft errors. IBM POWER6 processor-based systems periodically address all memory locations, and any location with an ECC error is rewritten with the correct data.

Error correcting code
Error correcting code (ECC) allows a system to detect up to two bit errors in a memory word and to correct a single-bit error. However, without additional correction techniques, a system will fail if more than one bit is corrupted.
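The detect-two, correct-one behavior can be illustrated with a toy extended Hamming code over a 4-bit data word. This is a minimal sketch only: the actual ECC word width and code used in POWER6 memory are not described here, and the names `encode` and `decode` are hypothetical.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming (SEC-DED) word."""
    d = [(nibble >> i) & 1 for i in range(4)]
    w = [0] * 8                      # index 0 holds the overall parity bit
    w[3], w[5], w[6], w[7] = d       # data bits sit at positions 3, 5, 6, 7
    w[1] = w[3] ^ w[5] ^ w[7]        # parity over positions 1, 3, 5, 7
    w[2] = w[3] ^ w[6] ^ w[7]        # parity over positions 2, 3, 6, 7
    w[4] = w[5] ^ w[6] ^ w[7]        # parity over positions 4, 5, 6, 7
    w[0] = sum(w[1:]) % 2            # makes the parity of all 8 bits even
    return w


def decode(word: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, data); status is 'ok', 'corrected' or 'uncorrectable'."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos          # XOR of the positions that hold a 1
    overall = sum(w) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd parity: treat as a single-bit error; the syndrome is its position
        # (0 means the overall parity bit itself was flipped).
        w[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: two bits are wrong.
        return "uncorrectable", None
    return status, w[3] | (w[5] << 1) | (w[6] << 2) | (w[7] << 3)


good = encode(0b1011)
one_flip = list(good)
one_flip[5] ^= 1
two_flips = list(one_flip)
two_flips[6] ^= 1

print(decode(good))       # ('ok', 11)
print(decode(one_flip))   # ('corrected', 11)
print(decode(two_flips))  # ('uncorrectable', None)
```

The double-bit case is detected but the data cannot be recovered, which is why the additional techniques described next are needed for larger failures.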

Chipkill™
Chipkill is an enhancement to ECC that enables a system to sustain the failure of an entire DRAM chip. Chipkill spreads the bit lines from a DRAM over multiple ECC words, so that a catastrophic DRAM failure would affect at most one bit in each word. Barring a future single-bit error, the system can continue indefinitely in this state with no performance degradation until the failed DIMM can be replaced.
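The effect of spreading a DRAM's bit lines over multiple ECC words can be sketched by comparing two mappings from (chip, bit line) to (ECC word, bit position). The chip count and both layout functions below are hypothetical and chosen only for illustration; the real POWER6 mapping is not given here.

```python
from collections import Counter

CHIPS = 8            # hypothetical number of DRAM chips feeding the ECC words
LINES_PER_CHIP = 8   # bit lines supplied by each chip, as in the 1 Gb example


def chipkill_layout(chip: int, line: int) -> tuple:
    """Spread a chip's bit lines so each one lands in a different ECC word."""
    return line, chip          # (ECC word index, bit position within the word)


def naive_layout(chip: int, line: int) -> tuple:
    """Place all of a chip's bit lines in the same ECC word."""
    return chip, line


def worst_word_damage(layout, failed_chip: int) -> int:
    """Largest number of bad bits any single ECC word sees if one chip fails."""
    hits = Counter(layout(failed_chip, line)[0] for line in range(LINES_PER_CHIP))
    return max(hits.values())


print(worst_word_damage(chipkill_layout, 3))  # 1
print(worst_word_damage(naive_layout, 3))     # 8
```

With the spread layout, a whole-chip failure shows up as one bad bit per word, which single-bit ECC can correct; with the naive layout the same failure would put eight bad bits in one word.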

Redundant bit steering
IBM systems use redundant bit steering to avoid situations where multiple single-bit errors align to create a multi-bit error. In the event that an IBM POWER6 processor-based system detects an abnormal number of errors on a bit line, it can dynamically steer the data stored at this bit line into one of a number of spare lines.
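The steering decision can be sketched as a per-line error counter with a routing table. Everything in this example is an assumption made for illustration: the class name `SteeredArray`, the threshold value, and the number of spare lines are not taken from the POWER6 design, and the sketch models only the routing change, not the copying of data to the spare.

```python
class SteeredArray:
    """Toy model of bit steering: logical bit lines routed to physical lines."""

    def __init__(self, data_lines: int, spare_lines: int, threshold: int):
        self.route = list(range(data_lines))   # logical line -> physical line
        self.spares = list(range(data_lines, data_lines + spare_lines))
        self.errors = [0] * data_lines         # corrected-error count per line
        self.threshold = threshold             # hypothetical "abnormal" level

    def record_error(self, line: int) -> bool:
        """Count a corrected error; return True if the line was steered."""
        self.errors[line] += 1
        if self.errors[line] >= self.threshold and self.spares:
            self.route[line] = self.spares.pop(0)  # switch to a spare line
            self.errors[line] = 0                  # the spare starts clean
            return True
        return False


array = SteeredArray(data_lines=8, spare_lines=2, threshold=3)
results = [array.record_error(5) for _ in range(3)]

print(results)         # [False, False, True]
print(array.route[5])  # 8
print(array.spares)    # [9]
```

Steering the faulty line away before a second, unrelated error appears in the same word keeps two single-bit errors from lining up into an uncorrectable multi-bit error.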
