Chapter 7. Advanced maintenance, troubleshooting, and diagnostics
7.9 Replacement and maintenance procedures
In this section, we discuss how to resolve some hardware failures on the DS4000 or DS5000
storage subsystem. See the Installation, User’s and Maintenance Guide for your DS4000 or
DS5000 storage subsystem for detailed parts replacement procedures. These publications
can be downloaded from the Documents section on the IBM Storage Support Web site at the
following address:
http://www.storage.ibm.com/support
7.9.1 Managing disk failures
The DS4000 and DS5000 controllers constantly monitor the status of the disk drives.
When a drive exceeds an error threshold, the drive is marked as failed. The audible
enclosure alarm sounds (unless it is disabled) and the subsystem appears in a
non-optimal state. A critical event is logged in the Major Event Log (MEL) and the Recovery Guru button starts
flashing. All critical events are sent to the SNMP management console or to the e-mail
recipient that you have configured to receive alert notifications by selecting Edit → Configure
Alerts in the Enterprise Management window. The amber FAULT LED is illuminated on the
faulty drive.
If the array is configured with redundancy protection (RAID 1, 3, 5, or 6), the
drive failure causes the array and its associated logical drives to change to a Degraded state.
This indicates that the array has lost RAID redundancy. For RAID 1 or 6 arrays, this is only a
partial loss of redundancy: a RAID 6 array can still survive one more drive failure, and in a
RAID 1 array with several mirrored pairs, the other pairs remain protected.
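
The effect of a first drive failure on a redundant array can be summarized in a short sketch. This is an illustrative model only, not controller firmware or Storage Manager code: the function name and the returned strings are hypothetical, and only the redundant RAID levels named above are covered.

```python
from typing import Tuple

# Illustrative model only: the names and strings here are hypothetical and
# are not part of the DS4000/DS5000 firmware or Storage Manager.
REDUNDANT_LEVELS = {1, 3, 5, 6}
PARTIAL_LOSS_LEVELS = {1, 6}  # per the text: only a partial loss of redundancy


def state_after_first_drive_failure(raid_level: int) -> Tuple[str, str]:
    """Return (array_state, redundancy) after one drive fails in an optimal array."""
    if raid_level not in REDUNDANT_LEVELS:
        # Non-redundant or unlisted levels are outside the scope of this sketch.
        raise ValueError(f"RAID {raid_level} is not covered by this sketch")
    redundancy = "partially lost" if raid_level in PARTIAL_LOSS_LEVELS else "lost"
    return ("Degraded", redundancy)


for level in (1, 3, 5, 6):
    state, redundancy = state_after_first_drive_failure(level)
    print(f"RAID {level}: {state}, redundancy {redundancy}")
```

The sketch deliberately rejects RAID levels that the text does not discuss rather than guessing at their behavior.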
If a standby hotspare drive with the same (or greater) capacity and performance
characteristics is available, it takes over from the failed drive. Reconstruction of data
onto the hotspare starts automatically. After reconstruction of all associated logical drives is
complete, the array returns to an Optimal state. At this point, the failed drive slot is still
associated with the array. The hotspare drive remains assigned as a hotspare, but assumes a
temporary association with the array.
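
The hotspare eligibility rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the controller's actual selection algorithm: the `Drive` class and `pick_hot_spare` function are hypothetical, rotational speed (`rpm`) is used as a stand-in for "performance characteristics", and the preference for the smallest adequate spare is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Drive:
    slot: str            # hypothetical "enclosure,slot" label
    capacity_gb: int
    rpm: int             # assumed stand-in for performance characteristics
    in_use: bool = False  # True if the spare is already covering another drive


def pick_hot_spare(failed: Drive, spares: List[Drive]) -> Optional[Drive]:
    """Return a standby spare that can cover the failed drive, or None."""
    candidates = [
        s for s in spares
        if not s.in_use
        and s.capacity_gb >= failed.capacity_gb  # same or greater capacity
        and s.rpm >= failed.rpm                  # same or greater performance
    ]
    # Assumption: prefer the smallest adequate spare so that larger spares
    # stay available for larger drives.
    return min(candidates, key=lambda s: s.capacity_gb, default=None)


failed = Drive("1,4", capacity_gb=300, rpm=15000)
spares = [
    Drive("1,15", capacity_gb=146, rpm=15000),  # too small
    Drive("1,16", capacity_gb=450, rpm=15000),  # adequate, but larger than needed
    Drive("2,16", capacity_gb=300, rpm=15000),  # exact match
]
chosen = pick_hot_spare(failed, spares)
print(chosen.slot if chosen else "no suitable hotspare")
```

If no spare qualifies, the function returns `None`; in that situation the array would stay in a degraded state until the failed drive is replaced.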