U.2 NVMe Cache Drive Replacement
DGX-2 System DU-09224-001 _v09|8
Identifying the Failed NVMe from the Console
To identify the failed NVMe drive from the DGX-2 console, enter the following command and then
look for a device name that is missing from the output.
$ sudo mdadm -D /dev/md1
    Number   Major   Minor   RaidDevice State
       0     259        8        0      active sync   /dev/nvme9n1
       1     259       13        1      active sync   /dev/nvme5n1
       2     259        7        2      active sync   /dev/nvme6n1
       3     259       10        3      active sync   /dev/nvme3n1
       4     259       12        4      active sync   /dev/nvme2n1
       5     259       11        5      active sync   /dev/nvme7n1
       6     259        9        6      active sync   /dev/nvme8n1
       7     259        6        7      active sync   /dev/nvme4n1
The list should include device names from nvme2n1 through nvme9n1 for systems with 8 NVMe
drives, and from nvme2n1 through nvme17n1 for systems with 16 NVMe drives.
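Scanning the list by eye is error-prone because the devices are not listed in numeric order. The following is a minimal sketch, not part of the documented procedure, that reads the mdadm output on standard input and prints every expected device name that does not appear in it. The function name missing_nvme and its two arguments (first and last device index of the expected range) are hypothetical and chosen only for illustration.

```shell
# Hypothetical helper: print each expected NVMe device name that is
# absent from `mdadm -D` output supplied on standard input.
# Arguments: first and last device index of the expected range.
missing_nvme() {
    first=$1
    last=$2
    listing=$(cat)          # capture the mdadm output once
    i=$first
    while [ "$i" -le "$last" ]; do
        # -F: fixed string, -w: whole word, so nvme2n1 does not match nvme2n1p1
        if ! printf '%s\n' "$listing" | grep -qwF "/dev/nvme${i}n1"; then
            echo "nvme${i}n1"
        fi
        i=$((i + 1))
    done
}
```

On a system with 8 NVMe drives, this could be invoked as: sudo mdadm -D /dev/md1 | missing_nvme 2 9. The sketch only detects names that are absent from the listing; it does not interpret the State column.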
To map the device name to the physical slot ID, enter the following, where nvmeX is the
missing device name (for example, nvme4).
$ ls -l /dev/disk/by-path |grep nvmeX |cut -d':' -f3
The command returns the PCIe bus ID. Refer to the following figure to find the slot ID that
corresponds to the PCIe bus ID for the faulty drive.
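As an alternative way to obtain the bus ID before consulting the figure, the sketch below reads the symbolic links directly instead of parsing ls -l output, which avoids depending on the timestamp column. It assumes the usual udev naming for these links, pci-DOMAIN:BUS:DEVICE.FUNCTION-nvme-N, and that the device still has a link under /dev/disk/by-path. The function name nvme_bus_id and its optional directory argument are hypothetical.

```shell
# Hypothetical helper: print the PCIe bus number for an NVMe device name.
# Arguments: device name (for example nvme4n1) and, optionally, the
# directory that holds the by-path symbolic links.
nvme_bus_id() {
    dev=$1
    dir=${2:-/dev/disk/by-path}
    for link in "$dir"/pci-*-nvme-*; do
        [ -L "$link" ] || continue
        target=$(readlink "$link")
        # Partition links point at names such as nvme4n1p1 and are skipped
        if [ "${target##*/}" = "$dev" ]; then
            name=${link##*/}      # link name without the directory
            addr=${name#pci-}     # drop the leading "pci-"
            addr=${addr%%-*}      # keep DOMAIN:BUS:DEVICE.FUNCTION
            bus=${addr#*:}        # drop the domain
            echo "${bus%%:*}"     # keep only the bus number
            return 0
        fi
    done
    return 1                      # no link found for this device
}
```

For example, nvme_bus_id nvme4n1 prints the bus number to look up in the figure, and returns a non-zero status if no matching link exists.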