NCCL


The NVIDIA Collective Communication Library (NCCL) implements multi-GPU and multi-node communication primitives optimized for NVIDIA GPUs and networking. NCCL provides routines such as all-gather, all-reduce, broadcast, reduce, and reduce-scatter, as well as point-to-point send and receive, that are optimized to achieve high bandwidth and low latency over PCIe and NVLink high-speed interconnects within a node and over NVIDIA Mellanox networking across nodes.
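As an illustration of the all-reduce primitive mentioned above, the sketch below uses NCCL's single-process, multi-GPU pattern: one communicator per GPU, with the per-device calls wrapped in a group so they progress together. This is a minimal sketch, assuming CUDA and NCCL are installed (compile with something like `nvcc allreduce.c -lnccl`); error checking is omitted for brevity.

```c
/* Minimal sketch: sum-all-reduce across 2 GPUs in one process.
   Assumes CUDA and NCCL headers/libraries are available. */
#include <stdlib.h>
#include <cuda_runtime.h>
#include <nccl.h>

int main(void) {
    const int ndev = 2;          /* matches --gres=gpu:2 in the job script */
    const size_t count = 1024;   /* elements per buffer */
    ncclComm_t comms[2];
    cudaStream_t streams[2];
    float *sendbuf[2], *recvbuf[2];

    /* allocate device buffers and a stream on each GPU */
    for (int i = 0; i < ndev; i++) {
        cudaSetDevice(i);
        cudaMalloc((void **)&sendbuf[i], count * sizeof(float));
        cudaMalloc((void **)&recvbuf[i], count * sizeof(float));
        cudaStreamCreate(&streams[i]);
    }

    /* one communicator per GPU, all managed by this process */
    ncclCommInitAll(comms, ndev, NULL);

    /* group the per-GPU calls so NCCL can launch them together */
    ncclGroupStart();
    for (int i = 0; i < ndev; i++)
        ncclAllReduce(sendbuf[i], recvbuf[i], count, ncclFloat, ncclSum,
                      comms[i], streams[i]);
    ncclGroupEnd();

    /* wait for completion, then clean up */
    for (int i = 0; i < ndev; i++) {
        cudaSetDevice(i);
        cudaStreamSynchronize(streams[i]);
        ncclCommDestroy(comms[i]);
        cudaFree(sendbuf[i]);
        cudaFree(recvbuf[i]);
    }
    return 0;
}
```

Multi-node runs instead use one communicator per rank, created with `ncclCommInitRank` after exchanging a unique ID (typically via MPI); the all_reduce_perf test below exercises the same primitive and reports bandwidth.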

Set up the environment:

ml nvidia/nccl

Available version: 2.18.1


Run NCCL tests on a GPU server.

  • Create a SLURM batch script with the following content:
#SBATCH --job-name=nccl_test
#SBATCH --partition=bigpu
#SBATCH --gres=gpu:2
#SBATCH --time=0:10:00
#SBATCH --output=job-%j.out
#SBATCH --nodes=1

ml nvidia/nccl
git clone https://github.com/NVIDIA/nccl-tests.git
cd nccl-tests
make
# -b min message size, -e max message size, -f size multiplication factor, -g number of GPUs
./build/all_reduce_perf -b 8 -e 256M -f 2 -g 2
#./build/all_reduce_perf -b 8 -e 256M -f 2 -g <ngpus>
  • Submit the job with sbatch