NVIDIA DGX A100 Service Manual
3.2.2. Identifying the Failed Power Supply from the Console
There are several ways to identify the failed PSU from the DGX A100 console.
▶ Use the NVSM CLI as follows.
$ sudo nvsm show psus
The output shows information for each PSU. Look for any PSU that does not report Status_Health=OK.
▶ View the PSU status from the BMC.
Click Sensor in the left-side menu and inspect the PSU information in the Normal Sensors section.
▶ Use ipmitool.
$ sudo ipmitool sdr | grep -i psu
Look for power supplies with no temperature reading or an output reading close to or equal to
zero.
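The ipmitool check above can be partly automated. The following is a minimal sketch, not an NVIDIA-provided tool: flag_suspect_psus is a hypothetical helper name, and it assumes the usual three-column "name | reading | status" layout of ipmitool sdr output. The threshold for "close to zero" defaults to 0 and is an illustrative assumption; hex-style readings, which some discrete sensors report, are skipped rather than interpreted.

```shell
#!/bin/sh
# Hypothetical helper (not part of NVSM or ipmitool).
# Reads `ipmitool sdr`-style lines ("name | reading | status") from stdin and
# prints PSU rows that have no numeric reading or a reading at or below a limit.
# $1 is the optional limit; the default of 0 is an assumption for illustration.
flag_suspect_psus() {
    awk -F'|' -v limit="${1:-0}" '
        tolower($1) ~ /psu/ {
            name = $1
            reading = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)      # trim surrounding whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", reading)
            if (reading ~ /^0x/) {
                next                               # hex-style value: skip, do not interpret
            }
            if (reading !~ /^-?[0-9]/) {
                print name ": " reading            # e.g. "no reading"
            } else if (reading + 0 <= limit + 0) {
                print name ": " reading            # numeric reading at or below the limit
            }
        }'
}

# Usage on the system:
#   sudo ipmitool sdr | flag_suspect_psus
```

Any PSU printed by the helper is only a candidate. Confirm it against the NVSM or BMC status before replacing hardware.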
Both NVSM and the BMC identify each power supply as PSUx, where x is from 0 to 5. The following
diagram shows the physical location of each PSU.