NVIDIA DGX A100 Service Manual
7.2.2. Identifying the Failed NVMe from the Console
To identify the failed NVMe drive from the DGX A100 console, enter the following command and
look for drive alerts in the output.
$ sudo nvsm show health
The command returns the PCIe bus ID. Refer to the following figure to find the slot ID that corresponds
to the PCIe bus ID for the faulty drive.
Fig. 1: NVMe Drives: PCIe to Slot Mapping
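As a cross-check on a running system, the PCIe bus ID of each NVMe controller can usually be read from Linux sysfs. The following is a minimal sketch and not part of the NVIDIA procedure. It assumes the standard /sys/class/nvme layout, in which each controller's "device" link points at its PCIe device directory, and list_nvme_pci is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "<controller> <PCIe bus ID>" for each NVMe controller.
# Assumes the standard Linux sysfs layout; the optional argument overrides the sysfs root.
list_nvme_pci() {
    root="${1:-/sys/class/nvme}"
    for ctrl in "$root"/nvme*; do
        # Skip entries without a device link (also covers an unmatched glob).
        [ -e "$ctrl/device" ] || continue
        # The resolved link target's last path component is the PCIe address.
        pci=$(basename "$(readlink -f "$ctrl/device")")
        printf '%s %s\n' "$(basename "$ctrl")" "$pci"
    done
}

list_nvme_pci
```

Each output line pairs a controller name with a bus ID, which can then be matched against the PCIe-to-slot mapping in the figure above. A failed drive may not enumerate at all, in which case it is simply missing from the list.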
Alternatively, you can use the BMC dashboard to access the Sensor screen, the IPMI event log, and
the System log to identify issues with the U.2 drives.
Note: The PCIe bus IDs for slots 6 and 7 depend on the firmware version.