82 Troubleshooting
• Use the CLI
• Monitor event notification
• View the enclosure LEDs
Use the SMC or RAIDar
The SMC and RAIDar use health icons to show OK, Degraded, Fault, or Unknown status for the system and
its components, so you can monitor their health from either interface.
If any component has a problem, the system health will be Degraded, Fault, or Unknown. Use the web
application’s GUI to drill down to find each component that has a problem, and follow actions in the
component Health Recommendations field to resolve the problem.
Use the CLI
As an alternative to using the SMC or RAIDar, you can run the show system command in the CLI to
view the health of the system and its components. If any component has a problem, the system health will
be Degraded, Fault, or Unknown, and those components will be listed as Unhealthy Components. Follow
the recommended actions in the component Health Recommendation field to resolve the problem.
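
The check described above can be sketched in a few lines of Python. This is only an illustration: the ComponentHealth record, its field names, and the sample data are hypothetical and do not represent the CLI's actual output format. The sketch assumes the health information has already been collected into such records, and it treats any value other than OK as a problem, as this section describes.

```python
from dataclasses import dataclass

# Health values named in this guide; OK is the only healthy state.
HEALTH_STATES = ("OK", "Degraded", "Fault", "Unknown")


@dataclass
class ComponentHealth:
    """Hypothetical record; not the CLI's actual output format."""
    name: str
    health: str
    recommendation: str = ""


def unhealthy_components(components):
    """Return the components whose health is anything other than OK."""
    for c in components:
        if c.health not in HEALTH_STATES:
            raise ValueError(f"unexpected health value: {c.health!r}")
    return [c for c in components if c.health != "OK"]


# Illustrative data only; names and recommendation text are placeholders.
sample = [
    ComponentHealth("component-1", "OK"),
    ComponentHealth("component-2", "Fault", "(recommendation text)"),
    ComponentHealth("component-3", "Degraded", "(recommendation text)"),
]

for c in unhealthy_components(sample):
    print(f"{c.name}: {c.health} -> {c.recommendation}")
```

Rejecting unexpected health values, rather than silently treating them as healthy, keeps a parsing mistake from hiding a real fault.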
Monitor event notification
With event notification configured and enabled, you can view event logs to monitor the health of the
system and its components. If a message tells you to check whether an event has been logged, or to view
information about an event in the log, you can do so using the SMC, RAIDar, or the CLI. Using either the
SMC or RAIDar, you would view the event log and then click on the event message to see detail about that
event. Using the CLI, you would run the show events detail command (with additional parameters to
filter the output) to see the detail for an event.
View the enclosure LEDs
You can view the LEDs on the hardware (while referring to LED descriptions for your enclosure model) to
identify component status. If a problem prevents access to the SMC, RAIDar, or the CLI, this is the only
option available. However, monitoring/management is often done at a management console using
storage management interfaces, rather than relying on line-of-sight to LEDs of racked hardware
components.
Performing basic steps
You can use any of the options described above to perform the basic steps of the fault isolation
methodology.
Gather fault information
When a fault occurs, it is important to gather as much information as possible. Doing so will help you
determine the correct action needed to remedy the fault.
Begin by reviewing the reported fault:
• Is the fault related to an internal data path or an external data path?
• Is the fault related to a hardware component such as a disk drive module, controller module, or power
supply unit?
By isolating the fault to one of the components within the storage system, you will be able to determine the
necessary corrective action more quickly.
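
One way to keep track of the answers to the review questions above is a small record like the following Python sketch. It is a hypothetical aid, not part of the product: the FaultReport class and its field names are assumptions made for illustration, and the component list contains only the examples named above, so it is not exhaustive.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataPath(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"


class HardwareComponent(Enum):
    # Only the examples named in this section; not an exhaustive list.
    DISK_DRIVE_MODULE = "disk drive module"
    CONTROLLER_MODULE = "controller module"
    POWER_SUPPLY_UNIT = "power supply unit"


@dataclass
class FaultReport:
    """Hypothetical record of what is known so far about a reported fault."""
    description: str
    data_path: Optional[DataPath] = None
    component: Optional[HardwareComponent] = None

    def open_questions(self):
        """List the review questions that have not been answered yet."""
        questions = []
        if self.data_path is None:
            questions.append(
                "Is the fault related to an internal or an external data path?")
        if self.component is None:
            questions.append("Is the fault related to a hardware component?")
        return questions


report = FaultReport("(fault description)")
report.data_path = DataPath.EXTERNAL
for question in report.open_questions():
    print(question)
```

In this sketch, a field left as None means the question is still open. It does not distinguish an unanswered question from a fault that has been confirmed as unrelated to hardware.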
Determine where the fault is occurring
Once you understand the reported fault, review the enclosure LEDs. The enclosure LEDs are
designed to alert users to system faults immediately, and might be what alerted you to the fault in
the first place.
When a fault occurs, the Fault ID status LED on an enclosure’s right ear illuminates [see the diagram
pertaining to your product’s front panel components on page 14 (2U24); page 15 (2U12); page 16
(2U48); or page 18 (4U56)]. Check the LEDs on the back of the enclosure to narrow the fault to a FRU,
connection, or both. The LEDs also help you identify the location of a FRU reporting a fault.