© Copyright IBM Corp. 2013. All rights reserved.
Chapter 4. Continuous availability and manageability
This chapter describes IBM reliability, availability, and serviceability (RAS)
design and features. This set of technologies, implemented on IBM Power Systems servers,
can help lower your architecture’s total cost of ownership (TCO) by
reducing unplanned downtime.
RAS can be described as follows:
Reliability: Indicates how infrequently a defect or fault in a server manifests itself.
Availability: Indicates how infrequently the functionality of a system or application is
affected by a fault or defect.
Serviceability: Indicates how well faults and their effects are communicated to users and
services, and how efficiently and nondisruptively the faults are repaired.
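The definitions above are qualitative. As a rough illustration of how availability relates to unplanned downtime, the following minimal sketch uses the common textbook steady-state formula, availability = MTBF / (MTBF + MTTR). The formula, the function names, and the sample figures are assumptions for illustration only; they are not taken from this chapter and do not describe any IBM system.

```python
HOURS_PER_YEAR = 24 * 365  # non-leap year, used for a rough annual estimate


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a system is usable.

    mtbf_hours: mean time between failures (hypothetical input)
    mttr_hours: mean time to repair (hypothetical input)
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail: float) -> float:
    """Expected unplanned downtime per year for a given availability."""
    if not 0.0 <= avail <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Made-up numbers: one failure every 999 hours, one hour to repair.
    a = availability(mtbf_hours=999, mttr_hours=1)
    print(f"availability: {a:.4f}")
    print(f"downtime per year: {annual_downtime_hours(a):.2f} hours")
```

In this simple model, fewer faults (a higher MTBF) and faster, less disruptive repair (a lower MTTR) both raise availability, which is one way to see how the reliability and serviceability aspects of RAS feed into it.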
Each successive generation of IBM servers is designed to be more reliable than the previous
server family. POWER7 and POWER7+ processor-based servers have features to support
new levels of virtualization, help ease administrative burden, and increase system utilization.
Reliability starts with components, devices, and subsystems designed to be fault-tolerant.
POWER7 and POWER7+ use lower-voltage technology, improving reliability with stacked
latches that reduce susceptibility to soft errors (soft error rate, SER). During the design and development process,
subsystems go through rigorous verification and integration testing processes. During system
manufacturing, systems go through a thorough testing process to help ensure high product
quality levels.
The processor and memory subsystems contain a number of features designed to avoid or
correct environmentally induced, single-bit, intermittent failures and to handle solid faults in
components. These features include selective redundancy to tolerate certain faults without requiring an
outage or parts replacement.
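This passage does not say which mechanisms correct single-bit failures. As a general illustration of the idea, the following sketch implements a textbook Hamming(7,4) error-correcting code, which stores three parity bits alongside four data bits so that any single flipped bit can be located and repaired. It is an assumption chosen for teaching purposes, not a description of the codes used in POWER7 hardware, and all function names are hypothetical.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming codeword.

    Codeword layout by position 1..7: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def correct(codeword):
    """Return (repaired codeword, error position), where position 0 means no error.

    The three syndrome bits, read as a binary number, give the 1-based
    position of a single flipped bit.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3
    if position:
        c[position - 1] ^= 1  # flip the faulty bit back
    return c, position


def decode(codeword):
    """Extract the 4 data bits from a (corrected) codeword."""
    return [codeword[2], codeword[4], codeword[5], codeword[6]]


if __name__ == "__main__":
    word = encode([1, 0, 1, 1])
    damaged = list(word)
    damaged[4] ^= 1  # simulate a single-bit soft error at position 5
    repaired, position = correct(damaged)
    print("error position:", position)
    print("data recovered:", decode(repaired))
```

A code like this corrects exactly one flipped bit per codeword; two simultaneous flips would be miscorrected, which is why practical designs typically add further check bits or other safeguards.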