Nvidia DGX A100 Service Manual

Nvidia DGX A100
108 pages
Chapter 10. M.2 NVMe Boot Drive Replacement
10.1. M.2 NVMe Boot Drive Replacement Overview
This is a high-level overview of the procedure to replace a boot drive.
1. With the help of NVIDIA Enterprise Support, determine which M.2 drive needs to be replaced.
2. Obtain a replacement drive from NVIDIA Enterprise Support.
3. Power down the system.
4. Label all cables and unplug them from the motherboard tray.
5. Slide the motherboard out until it locks in place.
6. Open the rear compartment and pull out the M.2 riser card with both M.2 disks attached.
7. Replace the failed M.2 device on the riser card.
8. Install the M.2 riser card with both M.2 disks.
9. Close the rear motherboard compartment and then slide the motherboard back into the system.
10. Plug in all cables using the labels as a reference.
11. Power on the system.
12. Conrm the M.2 RAID 1 mirror is synchronizing.
13. Ship back the failed unit to NVIDIA Enterprise Support using the packaging provided.
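
For step 12, one way to check the mirror from the command line is sketched below. It assumes the M.2 RAID 1 mirror is a Linux software RAID (md) array whose status is exposed in /proc/mdstat; the function name md_sync_state is hypothetical and is not part of NVSM or the manual's own procedure.

```shell
#!/bin/sh
# Hypothetical helper (not from the service manual): classify the state of
# Linux md arrays from an mdstat-format file.
# Usage: md_sync_state [FILE]    FILE defaults to /proc/mdstat (assumption:
# the boot mirror is a Linux md array).
md_sync_state() {
    file="${1:-/proc/mdstat}"
    if grep -Eq 'recovery|resync' "$file"; then
        # A rebuild or resync line is present for at least one array.
        echo "syncing"
    elif grep -Eq '\[[U_]*_[U_]*\]' "$file"; then
        # A member slot is shown as "_" (missing) and no rebuild is running.
        echo "degraded"
    else
        # All member slots are shown as "U" (up).
        echo "clean"
    fi
}
```

Calling md_sync_state with no argument reads /proc/mdstat on the running system. A result of "syncing" after the drive swap would be consistent with the mirror rebuilding; "degraded" would suggest the replacement has not yet been added to the array.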
10.2. Identifying the Failed M.2 NVMe
The DGX A100 system automatically sets the failed M.2 drive offline when it detects the failure.
1. Identify which of the M.2 drives has failed (nvme0n1 or nvme1n1).
$ sudo nvsm show health
2. You can conrm this by issuing the following.

Nvidia DGX A100 Specifications

General
GPU: 8 x NVIDIA A100 Tensor Core GPUs
System Memory: 1 TB DDR4
Storage: 15 TB NVMe SSD
GPU Memory: 320 GB total (40 GB per GPU)
CPU: 2 x 64-Core AMD EPYC 7742
Networking: 8 x 200 Gb/s InfiniBand or Ethernet
Interconnect: NVIDIA NVLink