Nvidia DGX-2 SYSTEM User Manual

Network Configuration
DGX-2 System User Guide
5.8.3 Switching the Port from InfiniBand to Ethernet
Make sure that you have started the Mellanox Software Tools (MST) services as explained
in the section Starting the Mellanox Software Tools, and have identified the correct ports
to change.
1. Change the configuration for the network cluster ports to Ethernet by setting LINK_TYPE_P1=2 for each port.
The following example configures the 8 network cluster ports.
~$ sudo mlxconfig -y -d /dev/mst/mt4119_pciconf0 set LINK_TYPE_P1=2
~$ sudo mlxconfig -y -d /dev/mst/mt4119_pciconf1 set LINK_TYPE_P1=2
~$ sudo mlxconfig -y -d /dev/mst/mt4119_pciconf2 set LINK_TYPE_P1=2
~$ sudo mlxconfig -y -d /dev/mst/mt4119_pciconf3 set LINK_TYPE_P1=2
~$ sudo mlxconfig -y -d /dev/mst/mt4119_pciconf4 set LINK_TYPE_P1=2
~$ sudo mlxconfig -y -d /dev/mst/mt4119_pciconf5 set LINK_TYPE_P1=2
~$ sudo mlxconfig -y -d /dev/mst/mt4119_pciconf6 set LINK_TYPE_P1=2
~$ sudo mlxconfig -y -d /dev/mst/mt4119_pciconf7 set LINK_TYPE_P1=2
2. Reboot the server.
3. Verify the configuration changes have been applied.
$ sudo mlxconfig query | egrep -e Device\|LINK_TYPE
Device #1:
Device type: ConnectX4
Device: 0000:bd:00.0
LINK_TYPE_P1 ETH (2)
Device #2:
Device type: ConnectX4
Device: 0000:b8:00.0
LINK_TYPE_P1 ETH (2)
Device #3:
Device type: ConnectX4
Device: 0000:3a:00.0
LINK_TYPE_P1 ETH (2)
Device #4:
Device type: ConnectX4
Device: 0000:e1:00.0
LINK_TYPE_P1 ETH (2)
Device #5:
Device type: ConnectX4
Device: 0000:35:00.0
LINK_TYPE_P1 ETH (2)
Device #6:
Device type: ConnectX4
Device: 0000:5d:00.0
LINK_TYPE_P1 ETH (2)
Device #7:
Device type: ConnectX4
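
The set and verify steps above can be scripted. The following is a minimal sketch, not part of the original guide. It assumes the eight device names mt4119_pciconf0 through mt4119_pciconf7 from the example in step 1; the function names switch_ports_to_eth and check_link_types and the DRY_RUN variable are hypothetical helpers introduced here for illustration. By default the first function only prints the commands it would run, so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are illustrative assumptions.
# Device paths follow the example in step 1; adjust them to match your system.

# Print (DRY_RUN=1, the default) or run (DRY_RUN=0) the mlxconfig command
# that sets LINK_TYPE_P1=2 on each of the 8 network cluster ports.
switch_ports_to_eth() {
  for i in 0 1 2 3 4 5 6 7; do
    dev="/dev/mst/mt4119_pciconf${i}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "sudo mlxconfig -y -d ${dev} set LINK_TYPE_P1=2"
    else
      sudo mlxconfig -y -d "${dev}" set LINK_TYPE_P1=2
    fi
  done
}

# Read `mlxconfig query` output on stdin and fail if any LINK_TYPE_P1
# line does not report ETH. Exit status: 0 all Ethernet, 1 at least one
# port is not Ethernet, 2 no LINK_TYPE_P1 lines were found.
check_link_types() {
  awk '
    $1 == "Device:" { dev = $2 }
    /LINK_TYPE_P1/ {
      seen++
      if ($0 !~ /ETH/) {
        printf "not Ethernet: %s\n", dev
        bad++
      }
    }
    END {
      if (seen == 0) {
        print "no LINK_TYPE_P1 lines found"
        exit 2
      }
      exit (bad > 0)
    }
  '
}
```

A possible usage: run switch_ports_to_eth to review the commands, run it again with DRY_RUN=0 to apply them, reboot the server, and then run `sudo mlxconfig query | check_link_types` to confirm the result.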