author | Sylvain Jeaugey <sjeaugey@nvidia.com> | 2020-09-05 00:35:05 +0300 |
---|---|---|
committer | Sylvain Jeaugey <sjeaugey@nvidia.com> | 2020-11-17 22:08:52 +0300 |
commit | 920dbe5b359fe5817b8ba874476ca4ba2dc5f1ef (patch) | |
tree | da539cb823c9e11e4fa8e7e6de88dd4a662c7128 /src/graph/rings.cc | |
parent | 084207e685c4587e7d0aa2f1f7f148d3e0e68da6 (diff) |
2.8.3-1
Optimization for Tree allreduce on A100.
Improve aggregation performance.
Use shared buffers for inter-node send/recv.
Add NVTX profiling hooks.
Accelerate alltoall connections by merging communication for all
channels.
Add support for one hop communication through NVLink, for faster
send/recv communication on cubemesh topologies like DGX-1.
Improve alltoall scheduling to better balance intra/inter node
communication.
Increase send/recv parallelism by 8x, each warp sending or
receiving to a different peer.
Net: move to v4.
Net: make flush operation asynchronous to accelerate alltoall.
Net: define maximum number of requests.
Fix hang when using LL128 protocol after 2^31 steps.
Fix #379 : topology injection failing when using less GPUs than
described in the XML.
Fix #394 : protocol mismatch causing hangs or crashes when using
one GPU per node.
Diffstat (limited to 'src/graph/rings.cc')
-rw-r--r-- | src/graph/rings.cc | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/src/graph/rings.cc b/src/graph/rings.cc
index 5aacbb5..53130d1 100644
--- a/src/graph/rings.cc
+++ b/src/graph/rings.cc
@@ -21,7 +21,7 @@ void dumpLine(int* values, int nranks, const char* prefix) {

 ncclResult_t ncclBuildRings(int nrings, int* rings, int rank, int nranks, int* prev, int* next) {
   for (int r=0; r<nrings; r++) {
-    char prefix[30];
+    char prefix[40];
     /*sprintf(prefix, "[%d] Channel %d Prev : ", rank, r);
     dumpLine(prev+r*nranks, nranks, prefix);
     sprintf(prefix, "[%d] Channel %d Next : ", rank, r);
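The commit message does not say why `prefix` grew from 30 to 40 bytes, and the `sprintf` calls that write into it are commented out in this hunk. One plausible reading is that 40 bytes is the worst case for the format string when both integers are non-negative. The sketch below checks that arithmetic, assuming a 32-bit `int`. The helpers `prefixLength`, `prefixFits`, and `checkPrefixBudget` are hypothetical names used only for illustration and are not part of NCCL.

```cpp
#include <cassert>
#include <climits>
#include <cstdio>

// Hypothetical helper (not in NCCL): number of characters, excluding the
// terminating NUL, that the debug format from the diff would produce.
// snprintf with a null buffer and size 0 only measures; it writes nothing.
int prefixLength(int rank, int channel) {
  return std::snprintf(nullptr, 0, "[%d] Channel %d Prev : ", rank, channel);
}

// Hypothetical helper: true if the formatted prefix plus its NUL terminator
// fits in a buffer of `capacity` bytes.
bool prefixFits(int rank, int channel, int capacity) {
  int len = prefixLength(rank, channel);
  return len >= 0 && len + 1 <= capacity;
}

// Sizing check, assuming a 32-bit int (INT_MAX has 10 decimal digits).
void checkPrefixBudget() {
  // 19 fixed characters + 10 + 10 digits = 39, so 40 bytes with the NUL.
  assert(prefixLength(INT_MAX, INT_MAX) == 39);
  assert(prefixFits(INT_MAX, INT_MAX, 40));
  // The old 30-byte buffer cannot hold that worst case.
  assert(!prefixFits(INT_MAX, INT_MAX, 30));
}
```

Under the same assumption, the "Next" variant of the format string has the same length, so the same bound applies. Negative values would add a sign character and exceed 40 bytes, but ranks and channel indices are not expected to be negative here.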