Table 2-12 RAS features
| Module | Feature | Description |
|--------|---------|-------------|
| CPU | Corrected machine check interrupt (CMCI) | Generates an interrupt when a hardware error has been corrected, so that corrected errors can be reported. |
| DIMM | Failed DIMM isolation | Identifies a faulty dual in-line memory module (DIMM) and isolates it from the other DIMMs so that it can be replaced. |
| DIMM | Memory thermal throttling | Automatically adjusts DIMM temperatures to avoid damage due to overheating. |
| DIMM | Rank sparing | Allocates some memory ranks as backup ranks to prevent the system from crashing due to uncorrectable errors. |
| DIMM | Memory address parity protection | Detects memory command and address errors. |
| DIMM | Memory demand and patrol scrubbing | Scrubs memory to correct correctable errors promptly upon detection. If these errors are not corrected promptly, uncorrectable errors may occur. |
| DIMM | Memory mirroring | Keeps a mirrored copy of memory data to improve system reliability. |
| DIMM | Single device data correction (SDDC) | Provides single-device, multi-bit error correction to improve memory reliability. |
| DIMM | Device tagging | Marks a faulty DIMM device and corrects its errors in degraded mode to improve DIMM availability. |
| DIMM | Data scrambling | Optimizes data stream distribution and reduces the probability of errors, which improves the reliability of data streams in memory and the ability to detect address errors. |
| PCIe | PCIe advanced error reporting | Reports PCIe errors in detail to improve server serviceability. |
| QPI | Intel QPI link level retry | Retries a transfer when an error occurs to improve QPI reliability. |
| QPI | Intel QPI protocol protection via CRC | Provides cyclic redundancy check (CRC) protection for QPI packets to improve system reliability. |
| OS | Core disable for fault resilient boot (FRB) | Isolates a faulty CPU core during startup to improve system reliability and availability. |
| OS | Corrupt data containment mode | Identifies the memory storage unit that contains corrupted data to minimize the impact on running programs and improve system reliability. |
| OS | Socket disable for FRB | Isolates a faulty socket during startup to improve system reliability. |
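
The two QPI features in the table (CRC protection and link level retry) work together: a checksum lets the receiver detect a corrupted transfer, and a retry resends it. The following Python sketch only illustrates that general idea. It is not the QPI implementation: `zlib.crc32` stands in for the CRC that the link hardware actually uses, and `MAX_RETRIES`, `frame`, `check`, `send_with_retry`, and `flaky_channel` are hypothetical names introduced for this example.

```python
import zlib
from typing import Callable, Optional, Tuple

MAX_RETRIES = 3  # hypothetical retry budget, not a documented QPI value


def frame(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload (a stand-in for the link's real CRC)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(framed: bytes) -> Optional[bytes]:
    """Return the payload if its CRC matches, otherwise None."""
    payload, received = framed[:-4], int.from_bytes(framed[-4:], "big")
    return payload if zlib.crc32(payload) == received else None


def send_with_retry(
    payload: bytes, channel: Callable[[bytes], bytes]
) -> Tuple[Optional[bytes], int]:
    """Send over `channel`, retransmitting while the CRC check fails.

    Returns (payload or None, number of attempts made).
    """
    for attempt in range(1, MAX_RETRIES + 1):
        result = check(channel(frame(payload)))
        if result is not None:
            return result, attempt
    return None, MAX_RETRIES


def flaky_channel(fail_first: int) -> Callable[[bytes], bytes]:
    """Build a channel that flips one bit in the first `fail_first` transfers."""
    state = {"count": 0}

    def channel(framed: bytes) -> bytes:
        state["count"] += 1
        if state["count"] <= fail_first:
            corrupted = bytearray(framed)
            corrupted[0] ^= 0x01  # simulate a single-bit transient error
            return bytes(corrupted)
        return framed

    return channel
```

In this sketch, a transient error on the first transfer is detected by the CRC mismatch and hidden by one retry, while a persistent fault exhausts the retry budget and is surfaced to the caller as `None`.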
RH2288 V3 Server
User Guide
2 Overview
Issue 32 (2019-03-28) Copyright © Huawei Technologies Co., Ltd.