Thermal sensors
Thermal sensors throughout the enclosure and its components monitor the thermal health of the storage
system. If a monitored temperature exceeds its critical limit, the Over-temperature alarm is raised.
Troubleshooting
The following sections describe problems that can occur with your storage systems and some possible
solutions. The module fault LED on the ops panel lights solid amber to indicate a fault. All alarms are
also reported through SCSI Enclosure Services (SES). See the Elastic Storage Server Spectrum Scale RAID
Administration Guide and the "Maintenance Procedures" section of the Elastic Storage Server Problem
Determination Guide.
Table 7. Alarm Conditions
Status                                         | Severity
-----------------------------------------------+--------------------------------------
PSU alert (loss of DC power from a single PSU) | Fault: loss of redundancy
Cooling module fan failure                     | Fault: loss of redundancy
SBB I/O module detected PSU fault              | Fault
PSU removed                                    | Configuration error
Enclosure configuration error (VPD)            | Fault: critical
Low temperature warning                        | Warning
High temperature warning                       | Warning
Over-temperature alarm                         | Fault: critical
Under-temperature alarm                        | Fault: critical
I2C bus failure                                | Fault: loss of redundancy
Ops panel communication error (I2C)            | Fault: critical
SBB I/O module fault                           | Fault: critical
SBB I/O module removed                         | Warning
Drive power control fault                      | Warning; no loss of drive power
Drive power control fault                      | Fault: critical; loss of drive power
Insufficient power available                   | Warning
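For scripted triage, the rows of Table 7 can be held as plain data and grouped by severity. The sketch below is a hypothetical encoding: the names ALARM_CONDITIONS and alarms_with_severity are illustrative, and the strings are copied from the table rather than read from SES, whose reporting interface is not described here. A list of pairs is used instead of a dictionary because "Drive power control fault" appears twice with different severities.

```python
# Hypothetical encoding of Table 7 (Alarm Conditions); this is not an SES or ESS API.
# A list of (status, severity) pairs keeps both "Drive power control fault" rows.
ALARM_CONDITIONS: list[tuple[str, str]] = [
    ("PSU alert (loss of DC power from a single PSU)", "Fault: loss of redundancy"),
    ("Cooling module fan failure", "Fault: loss of redundancy"),
    ("SBB I/O module detected PSU fault", "Fault"),
    ("PSU removed", "Configuration error"),
    ("Enclosure configuration error (VPD)", "Fault: critical"),
    ("Low temperature warning", "Warning"),
    ("High temperature warning", "Warning"),
    ("Over-temperature alarm", "Fault: critical"),
    ("Under-temperature alarm", "Fault: critical"),
    ("I2C bus failure", "Fault: loss of redundancy"),
    ("Ops panel communication error (I2C)", "Fault: critical"),
    ("SBB I/O module fault", "Fault: critical"),
    ("SBB I/O module removed", "Warning"),
    ("Drive power control fault", "Warning; no loss of drive power"),
    ("Drive power control fault", "Fault: critical; loss of drive power"),
    ("Insufficient power available", "Warning"),
]


def alarms_with_severity(prefix: str) -> list[str]:
    """Return the status of every alarm whose severity text starts with prefix."""
    return [status for status, severity in ALARM_CONDITIONS if severity.startswith(prefix)]


if __name__ == "__main__":
    # List the alarms that Table 7 classifies as critical faults.
    for status in alarms_with_severity("Fault: critical"):
        print(status)
```

Matching on a severity prefix lets "Fault" select every fault row, while "Fault: critical" narrows the result to the critical ones.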
For information on how to remove and replace a module, see "Module Replacement."
Thermal monitoring and control
The system uses extensive thermal monitoring and takes a number of actions to keep component
temperatures low while also minimizing acoustic noise. Air flows from the front of the enclosure to the
rear.
Symptom
If the ambient air temperature is below 77 °F (25 °C) and the fans are observed to speed up, a
restriction in airflow may be causing an additional rise in internal temperature.
Note: This is not a fault condition.
Cause
The first stage in the thermal control process is for the fans to increase speed automatically when a
thermal threshold is reached. This may be caused by higher ambient temperatures in the local
environment, in which case the speed increase is normal.
Note: This threshold changes according to the number of drives and power supplies fitted.
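The Symptom and Cause guidance amounts to a simple decision rule around the 77 °F (25 °C) boundary. The following sketch assumes the ambient temperature and an observed fan speed-up are already available from some monitoring source; the function names are hypothetical and no real enclosure interface is implied. It only suggests where to look, and neither outcome is a fault condition.

```python
# Hypothetical triage helper for the fan speed-up symptom; not an enclosure API.
AMBIENT_LIMIT_C = 25.0  # 77 °F boundary stated in the Symptom text


def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def likely_cause(ambient_c: float, fans_sped_up: bool) -> str:
    """Suggest a likely explanation for the observed fan behavior."""
    if not fans_sped_up:
        return "no thermal response observed"
    if ambient_c < AMBIENT_LIMIT_C:
        # Fans speeding up in cool ambient air points at restricted airflow.
        return "check for airflow restriction"
    # At or above 25 °C, the first-stage fan response may simply reflect the environment.
    return "likely normal response to higher ambient temperature"


if __name__ == "__main__":
    print(likely_cause(fahrenheit_to_celsius(68.0), fans_sped_up=True))
```

The actual fan threshold depends on the number of drives and power supplies fitted, so the sketch compares only the ambient temperature against the documented 25 °C boundary rather than guessing at internal threshold values.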