IBM Power 570 User Manual

IBM Power 570
142 pages
IBM Power 570 Technical Overview and Introduction
Draft Document for Review September 2, 2008 5:05 pm
4405ch04 Continuous availability and manageability.fm
In cases where the data cannot be recovered from another source, a technique called Special
Uncorrectable Error (SUE) handling is used to determine whether the corruption is truly a
threat to the system. If the data is never actually used but is simply overwritten, as is
sometimes the case, the error condition can safely be voided and the system continues to
operate normally.
When an uncorrectable error is detected, the system modifies the associated ECC word,
thereby signaling to the rest of the system that the “standard” ECC is no longer valid. The
Service Processor is then notified and takes appropriate action. When the system is running
AIX V5.2 or later, or Linux¹, and a process attempts to use the data, the OS is informed of
the error and terminates only the specific user program.
Only when the corrupt data is used by the POWER Hypervisor must the entire system be
rebooted, thereby preserving overall system integrity.
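The three outcomes described above can be summarized as a small decision function. This is only an illustrative sketch: the names `Consumer`, `Outcome`, and `sue_outcome` are hypothetical and do not correspond to any IBM firmware or operating system interface, and the process-level outcome assumes an operating system level that supports SUE handling, as listed in the text.

```python
from enum import Enum


class Consumer(Enum):
    """Who (if anyone) touches the data marked with the special ECC code."""
    NONE = "overwritten before use"
    USER_PROCESS = "user process"
    HYPERVISOR = "POWER Hypervisor"


class Outcome(Enum):
    ERROR_VOIDED = "error voided, system continues normally"
    PROCESS_TERMINATED = "only the affected user program is terminated"
    SYSTEM_REBOOT = "entire system is rebooted"


def sue_outcome(consumer: Consumer) -> Outcome:
    """Map the consumer of SUE-marked data to the outcome described in the text."""
    if consumer is Consumer.NONE:
        # The data was never used, so the corruption is no threat.
        return Outcome.ERROR_VOIDED
    if consumer is Consumer.USER_PROCESS:
        # Assumes an OS level that is informed of the error (hypothetical flag omitted).
        return Outcome.PROCESS_TERMINATED
    # Corrupt data consumed by the hypervisor affects the whole system.
    return Outcome.SYSTEM_REBOOT


if __name__ == "__main__":
    for consumer in Consumer:
        print(f"{consumer.value}: {sue_outcome(consumer).value}")
```

The point of the sketch is that the severity of the response depends on who consumes the marked data, not on the moment the error is detected.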
Depending upon system configuration and source of the data, errors encountered during I/O
operations may not result in a machine check. Instead, the incorrect data is handled by the
processor host bridge (PHB) chip. When the PHB chip detects a problem, it rejects the data,
preventing it from being written to the I/O device. The PHB then enters a freeze mode,
halting normal operations. Depending on the model and type of I/O being used, the freeze may
include the entire PHB chip or only a single bridge. All I/O operations that use the frozen
hardware are lost until a power-on reset of the PHB. The impact on partitions depends on
how the I/O is configured for redundancy. In a server configured for
fail-over availability, redundant adapters spanning multiple PHB chips could enable the
system to recover transparently, without partition loss.
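The redundancy argument can be illustrated with a short check. This is a simplified sketch under stated assumptions: the adapter and PHB names are made up, the freeze is assumed to cover an entire PHB chip rather than a single bridge, and `partition_keeps_io` is a hypothetical helper, not part of any IBM tool.

```python
from typing import Mapping


def partition_keeps_io(adapter_to_phb: Mapping[str, str], frozen_phb: str) -> bool:
    """Return True if at least one of the partition's redundant adapters
    sits behind a PHB chip that is not frozen."""
    return any(phb != frozen_phb for phb in adapter_to_phb.values())


if __name__ == "__main__":
    # Hypothetical layouts: two redundant adapters for the same storage path.
    spread = {"adapter0": "PHB0", "adapter1": "PHB1"}   # spans two PHB chips
    stacked = {"adapter0": "PHB0", "adapter1": "PHB0"}  # both behind one PHB chip

    print(partition_keeps_io(spread, "PHB0"))   # a path survives the freeze
    print(partition_keeps_io(stacked, "PHB0"))  # both paths are lost
```

Placing redundant adapters behind the same PHB chip provides no protection against a freeze of that chip, which is why the text specifies adapters spanning multiple PHB chips.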
4.2.3 Cache protection mechanisms
POWER6 processor-based systems are designed with cache protection mechanisms,
including cache line delete in both L2 and L3 arrays, Processor Instruction Retry and
Alternate Processor Recovery protection on L1-I and L1-D, and redundant “Repair” bits in
L1-I, L1-D, and L2 caches, as well as L2 and L3 directories.
L1 instruction and data array protection
The POWER6 processor’s instruction and data caches are protected against temporary
errors using the POWER6 Processor Instruction Retry feature and against solid failures by
Alternate Processor Recovery, both mentioned earlier. In addition, faults in the SLB array are
recoverable by the POWER Hypervisor.
L2 array protection
On a POWER6 processor-based system, the L2 cache is protected by ECC, which provides
single-bit error correction and double-bit error detection. Single-bit errors are corrected
before the data is forwarded to the processor, and the corrected data is subsequently written
back to L2. As with the other data caches and main memory, uncorrectable errors are handled
at run time by the Special Uncorrectable Error handling mechanism. Correctable cache errors
are logged, and if the error count reaches a threshold, a Dynamic Processor Deallocation
event is initiated.
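Single-bit correction with double-bit detection (SEC-DED) can be demonstrated with a toy extended Hamming (8,4) code. This is an illustration of the principle only: the actual ECC word size and code used in the POWER6 L2 cache are not given in the text, and the `encode` and `decode` helpers below are hypothetical.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into 8 bits: index 0 is overall parity,
    indexes 1..7 are the classic Hamming(7,4) positions."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # makes the total number of 1 bits even
    return code


def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return (status, data) where status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos            # XOR of the positions holding a 1 bit
    parity_error = sum(code) % 2 == 1  # overall parity violated?
    fixed = list(code)
    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:
        fixed[syndrome] ^= 1           # single-bit error; syndrome 0 is the parity bit
        status = "corrected"
    else:
        return "uncorrectable", None   # nonzero syndrome, parity intact: two bits flipped
    data = fixed[3] | (fixed[5] << 1) | (fixed[6] << 2) | (fixed[7] << 3)
    return status, data


if __name__ == "__main__":
    word = encode(0b1011)
    word[5] ^= 1                       # inject a single-bit error
    print(decode(word))
    word[6] ^= 1                       # inject a second error
    print(decode(word))
```

A word that decodes as uncorrectable is the case that would be handed to the Special Uncorrectable Error mechanism described above.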
Starting with POWER6 processor-based systems, the L2 cache is further protected by
incorporating a dynamic cache line delete algorithm similar to the feature used in the L3
cache. Up to six L2 cache lines may be automatically deleted. Deleting a few cache lines is
unlikely to adversely affect server performance. When six cache lines have been
repaired, the L2 is marked for persistent deconfiguration on subsequent system reboots until
it can be replaced.
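The line-delete limit can be pictured as a simple counter. The sketch below is a model of the behavior described in the text, not firmware logic: the class name, the line identifiers, and the reading that the sixth deleted line is what triggers persistent deconfiguration are all assumptions.

```python
MAX_DELETED_LINES = 6  # limit stated in the text


class L2LineDeleteTracker:
    """Hypothetical model of dynamic L2 cache line delete bookkeeping."""

    def __init__(self) -> None:
        self.deleted_lines = set()
        self.persistent_deconfig = False  # applied on subsequent reboots

    def delete_line(self, line_id: int) -> bool:
        """Record a line delete. Return False if no delete capacity remains."""
        if line_id in self.deleted_lines:
            return True  # already deleted, nothing more to do
        if len(self.deleted_lines) >= MAX_DELETED_LINES:
            return False
        self.deleted_lines.add(line_id)
        if len(self.deleted_lines) == MAX_DELETED_LINES:
            # Limit reached: mark the L2 for deconfiguration until it is replaced.
            self.persistent_deconfig = True
        return True


if __name__ == "__main__":
    tracker = L2LineDeleteTracker()
    for line in (10, 42, 77, 130, 200, 311):  # made-up line identifiers
        tracker.delete_line(line)
        print(len(tracker.deleted_lines), tracker.persistent_deconfig)
```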
¹ SLES 10 SP1 or later, and RHEL 4.5 or later (including RHEL 5.1).

IBM Power 570 Specifications

General
Brand: IBM
Model: Power 570
Category: Server
Language: English

Summary

Chapter 1. General description

1.1 System specifications

Lists general system specifications including operating temperature, humidity, noise, and altitude.

1.2 Physical package

Details the physical attributes and dimensions of the CEC drawer building blocks.

1.3 System features

Outlines key features like core configurations, memory capacity, and disk drive support.

1.3.1 Processor card features

Describes processor card types, frequencies, cache, and Capacity on Demand (CoD) options.

1.3.2 Memory features

Details memory feature codes, capacities, frequencies, and population rules.

1.3.4 I/O drawers

Explains the types of I/O drawers, their slots, and connectivity options.

1.4 System racks

Covers rack compatibility, features, and installation considerations for the system.

1.4.1 IBM 7014 Model T00 rack

Describes the features and specifications of the 1.8-meter IBM 7014 Model T00 rack.

1.4.4 Intelligent Power Distribution Unit (iPDU)

Details the characteristics and function of the Intelligent Power Distribution Unit.

Chapter 2. Architecture and technical overview

2.1 The POWER6 processor

Explains the POWER6 processor's enhancements, core architecture, and advanced features.

2.1.1 Decimal floating point

Details the decimal floating-point processor's support for data types and instructions.

2.3 Processor cards

Describes the POWER6 processor cards, their layout, and memory interfaces.

2.4 Memory subsystem

Covers the memory controller, DIMM slots, and memory architecture.

2.4.1 Fully buffered DIMM

Explains the fully buffered DIMM technology for enhanced memory performance.

2.7 Integrated Virtual Ethernet adapter

Details the IVE adapter, its features, ports, and system integration.

2.8 PCI adapters

Discusses PCI and PCIe adapter types, slots, and general support.

2.8.1 LAN adapters

Lists available LAN adapters for connecting to a local area network.

2.8.3 iSCSI

Explains the iSCSI protocol for storage transport over IP networks.

2.9 Internal storage

Covers the internal disk subsystem using SAS interface and DASD backplane.

2.10 External I/O subsystems

Describes external I/O drawers like 7311-D11, 7311-D20, and 7314-G30.

2.10.1 7311 Model D11 I/O drawers

Details the 7311 Model D11 I/O drawer's features and slot configurations.

2.12 Hardware Management Console

Explains the HMC's role in managing system tasks and partitions.

Chapter 3. Virtualization

3.1 POWER Hypervisor

Introduces the POWER Hypervisor as a core component for system virtualization.

Virtual SCSI

Describes the virtual SCSI mechanism for storage virtualization using VIO Server.

Virtual Ethernet

Explains the virtual Ethernet switch function for secure inter-partition communication.

3.2 Logical partitioning

Discusses LPARs and virtualization for resource utilization and configuration.

3.2.2 Micro-Partitioning

Details Micro-Partitioning for allocating processor fractions to logical partitions.

3.3 PowerVM

Covers the PowerVM platform for industry-leading virtualization.

3.3.1 PowerVM editions

Outlines the functional elements of PowerVM Standard and Enterprise editions.

3.3.2 Virtual I/O Server

Explains the VIO Server's role in sharing physical resources among logical partitions.

3.3.4 PowerVM Live Partition Mobility

Describes moving running logical partitions between systems without disruption.

3.4 System Planning Tool

Explains the SPT for designing system configurations and planning partitions.

Chapter 4. Continuous availability and manageability

4.1 Reliability

Discusses the design principles for achieving high system reliability.

4.1.1 Designed for reliability

Covers design choices that reduce failure opportunities and improve reliability.

4.2 Availability

Details features that prevent unexpected application loss due to outages.

4.2.1 Detecting and deallocating failing components

Explains monitoring and deconfiguring faulty hardware to avoid system outages.

4.3 Serviceability

Outlines the strategy for efficient system service and repair.

4.3.1 Detecting errors

Covers the critical ability to accurately detect system errors.

4.3.2 Diagnosing problems

Explains how systems perform self-diagnosis using hardware and OS logic.

4.3.5 Locating and repairing the problem

Details methods for quickly identifying and replacing service parts.

4.5 Manageability

Covers functions and tools for efficient system management.

4.5.1 Service processor

Describes the service processor's role in monitoring, managing, and error detection.

4.5.6 IBM System p firmware maintenance

Explains the process of managing and installing microcode updates.

Related publications

IBM Redbooks

Lists IBM Redbooks relevant for detailed discussion of topics.

Online resources

Provides links to relevant IBM websites for further information.
