432 IBM Midrange System Storage Hardware Guide
The Replace Drives option can also be used as a concurrent method of migrating individual
drives to new locations within the same storage subsystem. To do so, we need to ensure that
the target drive is the only hotspare of the same capacity and type as the source. Then we
can fail the source drive from Storage Manager by selecting it and navigating to Advanced →
Recovery → Fail drive. When the target hotspare drive takes over, we can use the same
procedure described in step #2 to make it the permanent replacement.
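The precondition above (the target must be the only hotspare matching the source in capacity and type) can be checked against a drive inventory before failing the source drive. The following is a minimal sketch only: the Drive record, its field names, and both helper functions are hypothetical and are not part of Storage Manager or its scripting interface. The inventory is assumed to have been collected by other means.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Drive:
    """Hypothetical inventory record; not a Storage Manager object."""
    location: str        # assumed "enclosure,slot" label
    capacity_gb: int
    drive_type: str      # for example "FC" or "SATA"
    is_hotspare: bool = False


def matching_hotspares(source: Drive, drives: List[Drive]) -> List[Drive]:
    """Return every hotspare with the same capacity and type as the source."""
    return [
        d for d in drives
        if d.is_hotspare
        and d.capacity_gb == source.capacity_gb
        and d.drive_type == source.drive_type
    ]


def can_migrate_to(source: Drive, target: Drive, drives: List[Drive]) -> bool:
    """True only if the target is the sole matching hotspare for the source."""
    return matching_hotspares(source, drives) == [target]
```

If a second matching hotspare exists, the check returns False, because we could not predict which hotspare would take over when the source drive is failed.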
Disk replacement
The amber fault LED on the front of the drive indicates that it is in a powered-down state and
ready for replacement. To replace the disk, perform these steps:
1. Release the latch on the disk by pressing on the inside of the bottom of the tray handle.
2. Pull the tray handle out into the open position and slide the drive out.
3. Wait for at least 60 seconds before inserting the replacement drive. Gently push the new
disk into the empty bay until the hinge of the tray handle latches beneath the storage
subsystem enclosure bezel and then push the tray handle down into the closed (latched)
position.
The amber fault LED on the front of the drive flashes while the drive is spinning up. When
spin-up is complete, the new drive appears in the Storage Manager physical view and copyback
from the hotspare starts automatically.
In cold climates, we recommend allowing the replacement disk to acclimatize within the drive
slot for at least one hour before pushing it in fully. This reduces the risk of early life failures as
the CRU drive is introduced into a controlled data center environment from a delivery vehicle.
The sudden change in temperature and humidity can result in a buildup of condensation. The
drive bays must never be left empty for an extended period, as this affects the internal airflow
within the enclosure.
7.9.2 Managing disks with an impending drive failure error
The DS4000 and DS5000 controllers constantly monitor the status of the disk drives. A
Predictive Failure Analysis (PFA) error is logged against a drive when the level of errors
detected on it is high enough to be a concern, yet the drive remains usable. Regard this
as a warning that the drive is deteriorating and is likely to fail in the near future.
A PFA error causes the audible enclosure alarm to sound (unless disabled) and the subsystem
to appear in a non-optimal state. A critical event is logged in the MEL and the Recovery Guru
button starts flashing. All critical events are sent to the SNMP management console or to the
e-mail recipient that you have configured to receive alert notifications (you set these
notifications by selecting Edit → Configure Alerts in the Enterprise Management window).
Recovery Guru reports three levels of impending drive failure:
Low risk PFA      A PFA threshold is exceeded on an unassigned drive or
                  standby hotspare drive. Replace the suspect drive
                  whenever possible.
Medium risk PFA   A PFA threshold is exceeded on a drive that is a member of a
                  RAID 1, 3, 5, or 6 array. If the drive fails, you might lose
                  redundancy. Replace the suspect drive at the earliest
                  opportunity.