Nvidia DGX A100

108 pages
Chapter 12. DIMM Replacement
12.1. DIMM Replacement Overview
This is a high-level overview of the procedure to replace a dual inline memory module (DIMM) on the
DGX A100 system.
1. Use the nvsm health commands to identify the failed DIMM.
2. Get a replacement DIMM from NVIDIA Enterprise Support.
3. Shut down the system.
4. Label all motherboard tray cables and unplug them.
5. Remove the motherboard tray and place it on a solid, flat surface.
6. Remove the motherboard tray lid.
7. Use the reference diagram on the lid of the motherboard tray to identify the failed DIMM.
8. Replace the bad DIMM with the new one.
9. Close the lid on the motherboard tray.
10. Insert the motherboard tray into the system.
11. Plug in all cables using the labels as a reference.
12. Power on the system.
13. Verify that all DIMMs are now healthy with nvsm.
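Steps 1 and 13 both come down to reading nvsm output, so it can help to save the memory alert listing before shutdown and compare it with a fresh listing after power-on. The following is a minimal sketch, not part of the NVIDIA procedure. It assumes the listing has the shape shown in Section 12.2 (a Targets: line followed by entries such as alert0, one per line). The helper names list_alert_targets and memory_is_clean are hypothetical, and the exact nvsm output may differ between releases.

```shell
#!/bin/sh
# Hypothetical helpers for reading saved nvsm alert listings.
# Assumption: the listing contains a "Targets:" line followed by
# entries such as "alert0", one per line, as in the manual's example.

# Print each alert name found under "Targets:" on standard input.
list_alert_targets() {
    awk '
        /^[ \t]*Targets:[ \t]*$/ { in_targets = 1; next }
        in_targets && /^[ \t]*alert[0-9]+[ \t]*$/ {
            gsub(/[ \t]/, "")    # drop indentation around the name
            print
            next
        }
        in_targets { in_targets = 0 }   # any other line ends the section
    '
}

# Succeed (exit status 0) only if the listing on standard input
# contains no alert entries.
memory_is_clean() {
    [ -z "$(list_alert_targets)" ]
}

# Example use on the DGX A100 console (left commented out because it
# needs the real system):
#   sudo nvsm show /systems/localhost/memory/alerts > alerts_before.txt
#   list_alert_targets < alerts_before.txt
#   ... replace the DIMM, power on ...
#   sudo nvsm show /systems/localhost/memory/alerts | memory_is_clean \
#       && echo "no memory alerts reported"
```

An empty result from list_alert_targets only means that no alertN entries were found in the text it was given. Treat it as a convenience check and confirm DIMM health with nvsm itself.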
12.2. Identifying the Failed DIMM
1. From the console, run the following nvsm command to identify memory alerts.
$ sudo nvsm show /systems/localhost/memory/alerts
Alerts appear under the Targets section, for example:
Targets:
alert0
2. Get specic information about the memory alert.
The following example obtains information for alert0.