Theory of Operation 2-19
Error Detection and Correction
The Memory Controller detects and corrects single-bit errors in data read from RAM. It also detects double-bit errors. Each 32-bit word written to memory has a 7-bit Hamming code attached to it. When the data word is read, the code is checked and any single-bit or double-bit error is detected. Any single-bit error that is detected is automatically corrected before the word is transferred to the MPB. No additional time is required for the correction.
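The following Python sketch illustrates how 7 check bits on a 32-bit word can correct single-bit errors and detect double-bit errors. It is a generic extended Hamming code (6 Hamming check bits plus 1 overall parity bit); the bit layout, the function names encode and decode, and the status strings are assumptions made for illustration, since the text does not give the code matrix used by the Memory Controller.

```python
# Illustrative SEC-DED code: 32 data bits + 6 Hamming check bits + 1 parity bit.
# The layout below is an assumption, not the Memory Controller's actual code.

CODE_LEN = 39                                   # bit 0 = overall parity, bits 1..38 = Hamming
CHECK_POSITIONS = [1, 2, 4, 8, 16, 32]          # power-of-two positions hold check bits
DATA_POSITIONS = [p for p in range(1, CODE_LEN) if p not in CHECK_POSITIONS]  # 32 positions


def encode(word):
    """Return the 39-bit codeword for a 32-bit data word."""
    code = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> i) & 1:
            code |= 1 << pos
    # Each check bit makes the parity of its position group even.
    for c in CHECK_POSITIONS:
        parity = 0
        for pos in range(1, CODE_LEN):
            if pos & c and (code >> pos) & 1:
                parity ^= 1
        if parity:
            code |= 1 << c
    # Overall parity bit makes the whole codeword even.
    if bin(code).count("1") & 1:
        code |= 1
    return code


def decode(code):
    """Return (status, word); status is 'ok', 'corrected', or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, CODE_LEN):
        if (code >> pos) & 1:
            syndrome ^= pos                     # zero for a valid codeword
    odd = bin(code).count("1") & 1              # 1 if overall parity is violated
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd and syndrome < CODE_LEN:
        code ^= 1 << syndrome                   # syndrome 0 means the parity bit flipped
        status = "corrected"
    else:
        return "uncorrectable", None            # double-bit error: detect only
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (code >> pos) & 1:
            word |= 1 << i
    return status, word
```

In this sketch a single flipped bit yields a nonzero syndrome together with odd overall parity, so the bit can be located and inverted. Two flipped bits leave the overall parity even while the syndrome is nonzero, which is why such an error can be reported but not repaired.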
At the same time that a single-bit error is detected and corrected, the corrected data word is written into a healer location. Subsequent accesses of that location are to the healer location rather than to the memory location.
A double-bit error is detected but not corrected. A double-bit error causes the system to halt, preventing continuation of a program with bad data. Healing is described in the following paragraphs.
Memory Healing
Each Memory Controller contains 32 words each of CAM and RAM, which are used to replace up to 32 failed memory words (Figure 2-11). The physical address of a failed word is put in the healer CAM, the corrected data is placed in the corresponding healer RAM location, and bit 0 in the CAM is set to enable the address to match. Subsequent accesses to this physical address are to the healer RAM instead of the failed memory, causing the healer word to replace the failed word. Healing in no way affects the timing of the memory operation. Except for the healer, no record of healing is stored in memory.
All mapped addresses are sent to the healer CAM, which compares each mapped address to all addresses of failed locations. Healer CAM addresses of failed locations have bit 0 set.
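The address match and redirection described above can be modeled in software as follows. This is a minimal sketch, not the hardware design: the class HealedMemory, its method names, and the use of a dictionary for main memory are all hypothetical, and the sequential loop stands in for a CAM comparison that the hardware performs on all entries at once.

```python
HEALER_SIZE = 32  # healer words per Memory Controller, per the text


class HealedMemory:
    """Toy model of main memory fronted by a 32-entry healer (CAM + RAM)."""

    def __init__(self):
        self.memory = {}                       # main memory: physical address -> word
        self.cam = [None] * HEALER_SIZE        # failed physical addresses
        self.enabled = [False] * HEALER_SIZE   # models CAM bit 0 (match enable)
        self.healer_ram = [0] * HEALER_SIZE    # replacement words

    def _match(self, addr):
        """Return the healer slot whose enabled CAM entry equals addr, else None."""
        for slot in range(HEALER_SIZE):
            if self.enabled[slot] and self.cam[slot] == addr:
                return slot
        return None

    def heal(self, addr, corrected_word):
        """Record a corrected word for addr; return False if the healer is full."""
        slot = self._match(addr)
        if slot is None:
            if False not in self.enabled:
                return False
            slot = self.enabled.index(False)
        self.cam[slot] = addr
        self.healer_ram[slot] = corrected_word
        self.enabled[slot] = True
        return True

    def read(self, addr):
        """Healed addresses are served from healer RAM, others from main memory."""
        slot = self._match(addr)
        if slot is not None:
            return self.healer_ram[slot]
        return self.memory.get(addr, 0)

    def write(self, addr, word):
        """Writes to a healed address go to healer RAM instead of the failed word."""
        slot = self._match(addr)
        if slot is not None:
            self.healer_ram[slot] = word
        else:
            self.memory[addr] = word
```

For example, after heal(0x100, 7), a read of address 0x100 returns 7 from the healer RAM regardless of what the failed memory word holds, and a later write to 0x100 updates only the healer word.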
When all 32 locations of the healer are full, no further healing or automatic changes to the healer CAM occur on that Memory Controller. When all healer locations have been used, the healer overflow bit in the MC status register is set.
When all healer locations are full, the operating system checks each healed location to determine whether the location is still faulty. If the location fault was caused by a momentary failure, the overflowed CAM can be cleared and reused by the operating system.
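The overflow and recovery behavior can be sketched as below. The function names heal and reclaim, the status dictionary, and the still_faulty callback are hypothetical. The sketch also assumes that entries are released one at a time and that freeing an entry clears the overflow bit; the text does not specify either detail.

```python
HEALER_SIZE = 32  # healer locations per Memory Controller, per the text


def heal(entries, addr, status):
    """Add addr to the healer unless it is full; return True if addr is healed."""
    if addr not in entries and len(entries) < HEALER_SIZE:
        entries.append(addr)
    if len(entries) == HEALER_SIZE:
        status["healer_overflow"] = True       # all healer locations used
    return addr in entries


def reclaim(entries, status, still_faulty):
    """Operating-system pass: release entries whose memory word now tests good.

    still_faulty(addr) is a caller-supplied test of the underlying memory word.
    Returns the number of healer locations freed.
    """
    before = len(entries)
    entries[:] = [addr for addr in entries if still_faulty(addr)]
    if len(entries) < HEALER_SIZE:
        status["healer_overflow"] = False      # assumption: freeing a slot clears the bit
    return before - len(entries)
```

A typical use would fill the healer, observe the overflow bit, then call reclaim with a memory test so that locations hit by momentary failures give their healer entries back for later faults.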