How to Defend: Disadvantages
Disadvantage: The G5500 supports PCIe GPU cards and does not support NVLink GPU cards.
Key points: 1. NVLink is mainly used in specific scenarios to improve performance, and its price is high. 2. The specifications are being developed.
Details: Currently, the roadmap and guidance are based on PCIe GPUs, mainly for the following reasons:
(1) NVLink GPUs add high-speed NVLink interconnection between GPUs, increasing the GPU-to-GPU transmission bandwidth and improving performance for certain applications that require high-bandwidth communication between GPUs; in other applications the performance improvement is not obvious (see the bandwidth sketch below).
(2) NVLink GPU cards cost more than PCIe GPU cards (estimated to be about 30% higher) and their application scope is narrow. The PCIe GPU card specifications are therefore developed first; the 8x NVLink card specifications are expected to be delivered in Q2 2018.
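To illustrate point (1), here is a minimal back-of-the-envelope sketch. The bandwidth figures are assumptions (roughly 16 GB/s per direction for PCIe 3.0 x16, roughly 80 GB/s per direction for first-generation NVLink on a P100) and the workload is hypothetical; they are not G5500 measurements.

# Sketch: NVLink only helps communication-bound workloads.
# Bandwidth figures below are assumptions, not measured G5500 data.
PCIE3_X16_GBPS = 16.0    # ~16 GB/s per direction, PCIe 3.0 x16 (assumed)
NVLINK_P100_GBPS = 80.0  # ~4 links x 20 GB/s per direction on P100 (assumed)

def step_time(compute_s: float, bytes_exchanged: float, link_gbps: float) -> float:
    """Time for one training step: compute plus GPU-to-GPU transfer."""
    return compute_s + bytes_exchanged / (link_gbps * 1e9)

# Hypothetical workload: 100 ms of compute, 1 GB exchanged per step.
for name, bw in (("PCIe 3.0 x16", PCIE3_X16_GBPS), ("NVLink", NVLINK_P100_GBPS)):
    t = step_time(0.100, 1e9, bw)
    print(f"{name:>12}: {t * 1000:.1f} ms/step")
# PCIe: 162.5 ms/step, NVLink: 112.5 ms/step -> ~31% lower step time here.
# With only 0.1 GB exchanged per step the gap shrinks to ~5%: the benefit
# depends entirely on how communication-heavy the application is.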
Disadvantage: The G5500 supports up to 8 GPUs per chassis, and the density is not high.
Key points: 1. As cabinet power supply in normal equipment rooms is limited, heat dissipation for some components is difficult if the density is too high. 2. When the density is too high, the chassis power modules are constrained and it is difficult to support N+N redundancy.
Details:
1. The power consumption of a GPU server is high (up to 250–300 W per GPU). According to preliminary tests, with eight P100 GPUs and 145 W CPUs pushed to maximum load, the power consumption exceeds 3200 W. In most equipment rooms, the power supply capability of a single cabinet is 6–15 kW; even at 15 kW, at most 4 GPU servers can be configured. High density therefore brings no significant benefit for GPU servers, and it also makes cabinet heat dissipation harder and increases the difficulty of engineering design. (A worked power-budget example follows this list.)
2. When a 2U chassis has to support 8 GPUs, powering the entire chassis is difficult. For example, a 2U 8-GPU server generally uses two power modules. Even with a 16 A feed, the maximum output of a power module is 3000 W at 220 V AC, so when the power consumption exceeds 3000 W the supply cannot work in 1+1 mode and reliability cannot be ensured. An alternative is to move the power modules out of the chassis and use external power modules, but this occupies additional rack space and the actual density decreases.
3. When an 8-GPU chassis is limited to 2U, the CPUs and GPUs have to be arranged in a front-and-rear layout because of the GPU card size. As a result, the heat dissipation channels are cascaded, and at full load the CPUs and GPUs cannot support a 35°C ambient temperature. (Some vendors explicitly state that the temperature specification is reduced to 25°C when the GPU server is fully configured.)
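The power-budget arithmetic behind points 1 and 2 above, as a short sketch. The server draw (>3200 W at full load), the 6–15 kW cabinet range, and the 16 A / 220 V AC power-module figures are the ones cited in the text; the rest is simple division.

# Sketch of the cabinet power-budget arithmetic (figures from the text).
SERVER_PEAK_W = 3200              # 8x P100 + 145 W CPUs at full load (tested)
CABINET_LIMITS_W = (6000, 15000)  # typical single-cabinet supply range

for limit in CABINET_LIMITS_W:
    print(f"{limit / 1000:>4.0f} kW cabinet -> {limit // SERVER_PEAK_W} GPU servers")
# 6 kW -> 1 server, 15 kW -> 4 servers: even the best case caps out at 4,
# so packing 8 GPUs into 2U instead of 4U does not raise per-cabinet density.

# PSU side: a 16 A feed at 220 V AC supplies at most 16 * 220 = 3520 VA,
# i.e. roughly 3000 W usable output per module. Above 3000 W, a two-module
# chassis can no longer run in 1+1 redundant mode.
PSU_MAX_W = 3000
print("1+1 redundancy possible:", SERVER_PEAK_W <= PSU_MAX_W)  # False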
Disadvantage: The G5500 (G560) does not support Scalable processors.
Key points: 1. The GPU server focuses on GPU acceleration. 2. The specifications are being developed.
Details:
1. The GPU server targets AI and HPC applications, which rely mainly on GPU acceleration and do not require high CPU performance; support for the latest GPU cards was therefore prioritized.
2. With the modular design, the CPUs and GPUs can be upgraded and replaced independently. The model supporting Scalable processors is being developed.
Disadvantage: When the G5500 (G560) is configured with 4 or more SATA SSDs, the performance cannot reach the line rate.
Key points: 1. For configurations of no more than 4 drives, consider using the SATA SSDs to form RAID 5. 2. Consider using NVMe SSDs.
Details:
1. When 4 SATA SSDs form a RAID 5 array, the performance can reach the line rate, which meets the requirements of most applications (see the sketch below).
2. The G560 supports 6 NVMe SSDs, which deliver higher performance.
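A rough illustration of why a 4-drive RAID 5 array can saturate the per-drive line rate. The ~550 MB/s usable throughput per SATA 6 Gbit/s SSD is an assumed figure, and real scaling depends on the controller, stripe size, and workload.

# Sketch: aggregate throughput of 4 SATA SSDs in RAID 5.
# ~550 MB/s usable per SATA 6 Gbit/s SSD is an assumption, not a spec.
SATA_SSD_MBPS = 550
DRIVES = 4

# Sequential reads stripe across all member drives, so RAID 5 can
# read from all four in parallel.
read_mbps = DRIVES * SATA_SSD_MBPS
# Full-stripe sequential writes carry N-1 drives' worth of data
# (one drive's worth of each stripe holds parity).
write_mbps = (DRIVES - 1) * SATA_SSD_MBPS

print(f"RAID 5 read  ~ {read_mbps} MB/s")   # ~2200 MB/s
print(f"RAID 5 write ~ {write_mbps} MB/s")  # ~1650 MB/s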
Disadvantage: The G5500 (G560) does not support the 2 GB cache 3108 RAID card.
Key points: 1. Only a few customers have such requirements; guide them to use the 1 GB cache 3108 card.
Details:
1. According to the 2 GB vs. 1 GB cache test results, with RAID 5 the read performance is not improved, and for writes above 16 KB the random write performance improves by 10–20%. With RAID 10, the read performance improves noticeably for operations above 16 KB, while the write performance is not improved. Confirm the customer's application scenario and guide the customer to use the 1 GB cache card.