IBM Power 720 and 740 Technical Overview and Introduction

The service processor provides the following features:

• Placing calls

On systems without a Hardware Management Console, the service processor can place calls to report surveillance failures with the POWER Hypervisor, critical environmental faults, and critical processing faults, even when the main processing unit is inoperable.

• Mutual surveillance

The service processor monitors the operation of the firmware during the boot process and also monitors the hypervisor for termination. The hypervisor, in turn, monitors the service processor and performs a reset/reload operation if it detects the loss of the service processor. If the reset/reload operation does not correct the problem with the service processor, the hypervisor notifies the operating system, which can then take appropriate action, including calling for service.

• Availability

The POWER7+ family of systems continues to offer and introduce significant enhancements designed to increase system availability.

As in POWER6, POWER6+, and POWER7, the POWER7+ processor can perform processor instruction retry and alternate processor recovery for several core-related faults. This significantly reduces exposure to both hard (logic) and soft (transient) errors in the processor core. Soft failures in the processor core are transient (intermittent) errors, often caused by cosmic rays or other sources of radiation, and generally are not repeatable. When an error is encountered in the core, the POWER7+ processor first automatically retries the instruction. If the source of the error was truly transient, the instruction succeeds and the system continues as before. On IBM systems prior to POWER6, this error would have caused a checkstop.

Hard failures are more difficult to handle because they are true logic errors that recur each time the instruction is repeated. Retrying the instruction on the same core does not help in this situation. As in POWER6, POWER6+, and POWER7, all POWER7+ processors can, for several faults, extract the failing instruction from the faulty core and retry it elsewhere in the system, after which the failing core is dynamically deconfigured and called out for replacement. These systems are designed to avoid a full system outage.

• Uncorrectable error recovery

The auto-restart (reboot) option, when enabled, can reboot the system automatically following an unrecoverable firmware error, firmware hang, hardware failure, or environmentally induced (AC power) failure.

The auto-restart (reboot) option must be enabled from the Advanced System Management Interface (ASMI) or from the Control (Operator) Panel.
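
The instruction retry and alternate processor recovery sequence described under Availability can be modeled in a few lines. The following Python sketch is illustrative only: the names `Fault`, `Core`, and `run_with_recovery` are hypothetical and do not correspond to any IBM firmware or hypervisor interface, because the real mechanism is implemented in the processor hardware and system firmware. The sketch assumes that a soft fault disturbs a single attempt, that a hard fault repeats on every attempt on the same core, and that running out of healthy cores ends in a checkstop-like failure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Fault(Enum):
    NONE = "none"
    SOFT = "soft"  # transient: affects one attempt, then clears
    HARD = "hard"  # persistent: repeats on every attempt on this core


@dataclass
class Core:
    """Hypothetical model of a processor core (not an IBM interface)."""
    core_id: int
    fault: Fault = Fault.NONE
    configured: bool = True

    def execute(self, instruction: str) -> bool:
        """Return True if the instruction completes without error."""
        if self.fault is Fault.SOFT:
            # A transient error disturbs this attempt only.
            self.fault = Fault.NONE
            return False
        return self.fault is not Fault.HARD


def run_with_recovery(instruction: str, cores: List[Core],
                      start: int = 0) -> Tuple[int, List[str]]:
    """Run one instruction and return (id of completing core, event log)."""
    log: List[str] = []
    core = cores[start]

    # First attempt on the original core.
    if core.execute(instruction):
        log.append(f"core {core.core_id}: completed")
        return core.core_id, log

    # Step 1: processor instruction retry on the same core.
    log.append(f"core {core.core_id}: error detected, retrying")
    if core.execute(instruction):
        log.append(f"core {core.core_id}: retry succeeded (soft error)")
        return core.core_id, log

    # Step 2: alternate processor recovery on another configured core.
    log.append(f"core {core.core_id}: retry failed (hard error)")
    for alt in cores:
        if alt is core or not alt.configured:
            continue
        if alt.execute(instruction):
            # The failing core is deconfigured after the work has moved.
            core.configured = False
            log.append(f"core {alt.core_id}: completed; "
                       f"core {core.core_id} deconfigured for replacement")
            return alt.core_id, log

    # Assumption of this model: no healthy core left means a checkstop.
    raise RuntimeError("no healthy core available")
```

In this model, a core with a soft fault completes the instruction on its first retry and stays configured, whereas a core with a hard fault hands the instruction to another core and is then marked as deconfigured.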
