Nvidia DGX A100 User Manual

Quick Start and Basic Operation
NVIDIA DGX A100 DU-09821-001_v01 | 25
4.4.1. Startup Considerations
To keep your DGX A100 running smoothly, allow up to a minute of idle time after reaching the
login prompt. This ensures that all components can complete their initialization.
4.4.2. Shutdown Considerations
When shutting down the DGX A100, always initiate the shutdown from the operating system,
with a momentary press of the power button, or by using Graceful Shutdown from the BMC, and
wait until the system enters a powered-off state before performing any maintenance.
WARNING: Risk of Danger - Removing power cables or using Power Distribution Units (PDUs)
to shut off the system while the Operating System is running may cause damage to sensitive
components in the DGX A100 server.
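As one possible way to confirm the powered-off state before maintenance, the sketch below polls the BMC over IPMI. It assumes that ipmitool is installed on an administration host and that the BMC accepts IPMI-over-LAN connections; neither is stated in this manual. The function names and the BMC_HOST and BMC_USER values are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: BMC_HOST and BMC_USER are placeholders, and ipmitool access
# to the BMC is an assumption, not something this manual documents.

# Return 0 when an ipmitool status line reports that chassis power is off.
is_powered_off() {
    case "$1" in
        *"Chassis Power is off"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Poll the BMC until the chassis reports power off, or give up after
# max_tries attempts (default 60) spaced 5 seconds apart.
wait_for_power_off() {
    local host="$1" user="$2" max_tries="${3:-60}" tries=0 status
    while [ "$tries" -lt "$max_tries" ]; do
        # -E makes ipmitool read the password from the IPMI_PASSWORD variable.
        status=$(ipmitool -I lanplus -H "$host" -U "$user" -E chassis power status 2>/dev/null)
        if is_powered_off "$status"; then
            echo "System is powered off."
            return 0
        fi
        tries=$((tries + 1))
        sleep 5
    done
    echo "Timed out waiting for power off." >&2
    return 1
}

# Typical use, after 'sudo shutdown -h now' has been issued on the DGX A100
# and IPMI_PASSWORD has been exported on the administration host:
#   wait_for_power_off "$BMC_HOST" "$BMC_USER"
```

The script only reports the power state. It never cuts power itself, which keeps it consistent with the warning above.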
4.5. Verifying Functionality - Quick Health Check
NVIDIA provides customers with a diagnostics and management tool called NVIDIA System
Management, or NVSM. The nvsm command can be used to determine the system's health,
identify component issues and alerts, or run a stress test to make sure all components are
in working order while under load. Docker is also key to getting the most performance
out of the system, since NVIDIA has optimized containers for all the major frameworks and
workloads used on DGX systems.
The following are the steps for performing a health check on the DGX A100 System, and
verifying the Docker and NVIDIA driver installation.
1. Establish an SSH connection to the DGX A100 System.
2. Run a basic system check.
$ sudo nvsm show health
3. Verify that the output summary shows that all checks are Healthy and that the overall
system status is Healthy.
4. Verify that Docker is installed by viewing the installed Docker version.
$ sudo docker --version
This should return a version string such as “Docker version 19.03.5-ce”; the actual version
may differ depending on the specific release of the DGX OS Server software.
5. Verify connection to the NVIDIA repository and that the NVIDIA Driver is installed.
$ sudo docker run --gpus all --rm nvcr.io/nvidia/cuda:11.0-base nvidia-smi
Docker pulls the nvidia/cuda container image layer by layer, then runs nvidia-smi.
When completed, the output should show the NVIDIA Driver version and a description of
each installed GPU.
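The commands in steps 2 through 5 can also be chained in a small script. The following is a minimal sketch, not an NVIDIA-provided tool: the function names run_step and quick_health_check and the RUN_HEALTH_CHECK variable are hypothetical. It only checks that each command exits successfully, so the nvsm and nvidia-smi output still needs to be read as described in the steps above.

```shell
#!/usr/bin/env bash
# Sketch only: wraps the three commands from the steps above.
# run_step, quick_health_check, and RUN_HEALTH_CHECK are illustrative names.

# Run a command and print PASS or FAIL based on its exit status.
run_step() {
    local label="$1"
    shift
    if "$@"; then
        echo "PASS: $label"
    else
        echo "FAIL: $label"
        return 1
    fi
}

# Run each check in order and count how many commands exited with an error.
quick_health_check() {
    local failures=0
    run_step "NVSM health" sudo nvsm show health || failures=$((failures + 1))
    run_step "Docker installed" sudo docker --version || failures=$((failures + 1))
    run_step "GPU container" sudo docker run --gpus all --rm \
        nvcr.io/nvidia/cuda:11.0-base nvidia-smi || failures=$((failures + 1))
    echo "$failures step(s) failed"
    return "$failures"
}

# Run the checks only when explicitly requested, for example:
#   RUN_HEALTH_CHECK=1 bash quick_health.sh
if [ "${RUN_HEALTH_CHECK:-0}" = "1" ]; then
    quick_health_check
fi
```

A PASS line means only that the command ran without error. The overall Healthy status from step 3 must still be confirmed in the nvsm output itself.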
