Chapter 14. Network Card Replacement
14.1. Network Card Replacement Overview
The following steps summarize the procedure for replacing one or more network cards on the DGX A100
system.
1. Use the nvsm show commands to identify the failed network card.
2. Get a replacement card from NVIDIA Enterprise Support.
3. Shut down the system.
4. Label all motherboard tray cables and unplug them.
5. Remove the motherboard tray and open the lid.
6. Locate the failed network card and remove it.
7. Insert the new card into the slot and secure it with the screw.
8. Close the lid on the motherboard tray, then insert the tray into the system.
9. Plug in all cables using the labels as a reference.
10. Power on the system.
11. Verify that the network card is healthy using nvsm.
14.2. Identifying the Failed Network Card
Before replacing any network card, complete the following steps to identify which card has failed:
1. Run the following command to display the system health report.
$ sudo nvsm show health
2. Match the PCIe bus ID for the failed card with the slot ID using the following diagram.
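To cross-check the PCIe bus ID used in step 2, the bus IDs of the installed network cards can also be listed from the operating system. The following is a minimal sketch and not part of the official procedure. It assumes the standard Linux lspci utility is installed and that the network cards report a Mellanox vendor string in its output. The function name extract_bus_ids is hypothetical.

```shell
# Sketch only. Assumption: the network cards appear in lspci output with
# "Mellanox" in their description. extract_bus_ids is a hypothetical name.
# Reads `lspci -D` style lines on stdin and prints the first field
# (the PCIe bus ID in domain:bus:device.function form) of each matching line.
extract_bus_ids() {
    awk 'tolower($0) ~ /mellanox/ { print $1 }'
}
```

On the system, the helper could be used as follows, and the printed bus IDs compared with the one reported for the failed card:
$ lspci -D | extract_bus_ids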