IBM Power 780 - Chapter 4. Continuous Availability and Manageability

Chapter 4. Continuous availability and manageability

This chapter provides information about IBM reliability, availability, and serviceability (RAS) design and features. This set of technologies, implemented on IBM Power Systems servers, can help improve your architecture’s total cost of ownership (TCO) by reducing unplanned downtime.

RAS can be described as follows:

- Reliability: Indicates how infrequently a defect or fault in a server manifests itself.
- Availability: Indicates how infrequently the functionality of a system or application is affected by a fault or defect.
- Serviceability: Indicates how well faults and their effects are communicated to users and services, and how efficiently and nondisruptively the faults are repaired.
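As a rough illustration of how these three terms relate, the sketch below uses the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). Mean time between failures (MTBF) stands in for reliability, and mean time to repair (MTTR) stands in for serviceability. The function names and the sample figures are hypothetical and are not IBM specifications or measurements for any Power Systems server.

```python
HOURS_PER_YEAR = 24 * 365  # non-leap year, used only for illustration


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a system is usable.

    mtbf_hours: mean time between failures (a proxy for reliability).
    mttr_hours: mean time to repair (a proxy for serviceability).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail: float) -> float:
    """Expected unplanned downtime per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical figures: the same failure rate, two different repair times.
    for mttr in (8.0, 4.0):
        a = availability(mtbf_hours=10_000.0, mttr_hours=mttr)
        print(f"MTTR={mttr:>4} h  availability={a:.5f}  "
              f"downtime/year={annual_downtime_hours(a):.2f} h")
```

Under these assumptions, downtime falls either when faults occur less often (higher MTBF) or when they are repaired faster (lower MTTR), which is why reliability and serviceability both contribute to availability.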

Each successive generation of IBM servers is designed to be more reliable than the previous server family. POWER7 and POWER7+ processor-based servers have features to support new levels of virtualization, help ease administrative burden, and increase system utilization.

Reliability starts with components, devices, and subsystems designed to be fault-tolerant. POWER7 and POWER7+ use lower voltage technology, improving reliability with stacked latches that reduce susceptibility to soft errors (soft error rate, SER). During the design and development process, subsystems go through rigorous verification and integration testing. During system manufacturing, systems go through a thorough testing process to help ensure high product quality levels.

The processor and memory subsystems contain a number of features designed to avoid or correct environmentally induced, single-bit, intermittent failures, and to handle solid faults in components. These features include selective redundancy to tolerate certain faults without requiring an outage or parts replacement.
