Nvidia DGX A100 Service Manual

3.2.2. Identifying the Failed Power Supply from the Console
There are several ways to identify the failed PSU from the DGX A100 console.
Use the NVSM CLI as follows.
$ sudo nvsm show psus
The output shows information for each PSU. Look for any that do not report Status_Health=OK.
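The check described above can be scripted. The following is a minimal sketch, not an NVIDIA-provided tool: the function name unhealthy_psus is hypothetical, and the filter assumes that each PSU identifier (PSU0 to PSU5) appears in the output before its Status_Health property, and that the property is written as Status_Health followed by "=" and the value, with or without surrounding spaces. Verify these assumptions against the actual output on your system before relying on it.

```shell
#!/bin/sh
# Sketch: read NVSM PSU output on stdin and list PSUs whose
# Status_Health value is anything other than OK.
# Assumptions (not confirmed by the manual): the PSU name (PSU0..PSU5)
# appears on or before the Status_Health line it belongs to, and the
# property is formatted as "Status_Health = VALUE" (spacing may vary).
unhealthy_psus() {
    awk '
        {
            # Remember the most recently seen PSU identifier.
            if (match($0, /PSU[0-5]/)) psu = substr($0, RSTART, RLENGTH)
        }
        /Status_Health/ {
            value = $0
            sub(/^.*Status_Health[ \t]*=[ \t]*/, "", value)  # keep text after "="
            sub(/[ \t\r]+$/, "", value)                      # trim trailing blanks
            if (value != "OK") {
                printf "%s %s\n", (psu == "" ? "unknown" : psu), value
            }
        }
    '
}
```

A possible invocation is: sudo nvsm show psus | unhealthy_psus. No output means every PSU reported Status_Health=OK under the assumptions above.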
View the PSU status from the BMC.
Click Sensor from the left side menu and inspect the PSU information from the Normal Sensors section.
Use ipmitool.
$ sudo ipmitool sdr | grep -i psu
Look for power supplies with no temperature reading or an output reading close to or equal to zero.
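The ipmitool check can also be automated. The sketch below is an illustration under stated assumptions, not part of the manual: the function name suspect_psu_sensors and the default threshold of 1 are hypothetical, the filter relies on the usual pipe-separated ipmitool sdr layout (name | reading | status), and it assumes PSU sensor names contain "psu" in some letter case. Sensor names and units differ between firmware versions, so treat the result as a hint and confirm it by reading the raw output.

```shell
#!/bin/sh
# Sketch: read "ipmitool sdr" output on stdin and flag PSU sensors that
# report "no reading" or a numeric reading at or below a threshold.
# Assumptions: columns are "name | reading | status"; PSU sensor names
# contain "psu" (any case); the threshold (default 1) is hypothetical
# and is compared without regard to the sensor unit.
suspect_psu_sensors() {
    awk -F'|' -v limit="${1:-1}" '
        tolower($1) ~ /psu/ {
            name = $1
            reading = $2
            gsub(/^[ \t]+|[ \t]+$/, "", name)      # trim both ends
            gsub(/^[ \t]+|[ \t]+$/, "", reading)
            if (reading == "no reading") {
                print name ": no reading"
            } else if (reading ~ /^-?[0-9]+(\.[0-9]+)?( |$)/ && reading + 0 <= limit + 0) {
                # Plain decimal readings only; hex values such as 0x01 are skipped.
                print name ": " reading
            }
        }
    '
}
```

A possible invocation is: sudo ipmitool sdr | suspect_psu_sensors. Pass a different threshold as the first argument if a value of 1 is too strict or too loose for the sensors on your system.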
Both NVSM and the BMC identify each power supply as PSUx, where x is from 0 to 5. The following diagram shows the physical location of each PSU.
Nvidia DGX A100 Specifications

General
GPU: 8 x NVIDIA A100 Tensor Core GPUs
System Memory: 1 TB DDR4
Storage: 15 TB NVMe SSD
GPU Memory: 320 GB total (40 GB per GPU)
CPU: 2 x 64-Core AMD EPYC 7742
Networking: 8 x 200 Gb/s InfiniBand or Ethernet
Interconnect: NVIDIA NVLink
