github.com/marian-nmt/nccl.git - Unnamed repository; edit this file 'description' to name the repository.
Age  Commit message  Author
2020-11-20Merge remote-tracking branch 'nccl/master' into HEAD (HEAD, master)Marcin Junczys-Dowmunt
2020-11-172.8.3-1Sylvain Jeaugey
Optimization for Tree allreduce on A100. Improve aggregation performance. Use shared buffers for inter-node send/recv. Add NVTX profiling hooks. Accelerate alltoall connections by merging communication for all channels. Add support for one hop communication through NVLink, for faster send/recv communication on cubemesh topologies like DGX-1. Improve alltoall scheduling to better balance intra/inter node communication. Increase send/recv parallelism by 8x, each warp sending or receiving to a different peer. Net: move to v4. Net: make flush operation asynchronous to accelerate alltoall. Net: define maximum number of requests. Fix hang when using LL128 protocol after 2^31 steps. Fix #379 : topology injection failing when using fewer GPUs than described in the XML. Fix #394 : protocol mismatch causing hangs or crashes when using one GPU per node.
2020-10-21fix proxyArgs for trace logxietingwew
2020-10-14Fix affinity moveSylvain Jeaugey
2020-10-14Make sure proxy threads inherit the CPU affinity.Sylvain Jeaugey
2020-08-05Setting type when gpu sub node is discoveredJack Snyder
2020-08-05Merge pull request #364 from badgerious/net-classSylvain Jeaugey
Add GPUs and NICs based on XML sub tags instead of PCI class.
2020-08-05Don't require NIC devices to have specific PCI classEric Badger
If a PCI node is the parent of a NIC, treat it as such, regardless of the PCI class code for the device. This allows non-traditional devices to act as NICs via the net plugin mechanism. For consistency, treat GPUs similarly.
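The rule described above can be sketched as follows. This is an illustrative Python sketch, not NCCL's implementation: the function name `classify_pci_node`, the element names (`pci`, `nic`, `net`, `gpu`), and the class codes in the example are assumptions made for the illustration. The point is only that the decision looks at the child tags and never at the `class` attribute.

```python
import xml.etree.ElementTree as ET


def classify_pci_node(pci):
    """Classify a PCI node by its XML sub tags, ignoring its PCI class code."""
    if pci.find("nic") is not None:   # parent of a NIC: treat it as a NIC
        return "NIC"
    if pci.find("gpu") is not None:   # parent of a GPU: treat it as a GPU
        return "GPU"
    return "PCI"                      # anything else stays a plain PCI node


# Hypothetical topology fragment; the class values are illustrative only.
EXAMPLE_XML = """
<system>
  <pci busid="0000:01:00.0" class="0x120000">
    <nic><net name="custom0"/></nic>
  </pci>
  <pci busid="0000:02:00.0" class="0x030200">
    <gpu dev="0"/>
  </pci>
  <pci busid="0000:03:00.0" class="0x060400"/>
</system>
"""

root = ET.fromstring(EXAMPLE_XML.strip())
kinds = [classify_pci_node(p) for p in root.iter("pci")]
print(kinds)
```

The first device carries a non-network class code but is still classified as a NIC because it has a `nic` sub tag.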
2020-07-282.7.8-1David Addison
Fix collective mismatch error when using ncclSend/ncclRecv
2020-07-07Fix build action orderRiatre Foo
Add $(INCTARGETS) to build dependencies of %.o and $(DEVICELIB). As there were no dep files during the first build, Make may kick off source compilation before nccl.h got generated, which leads to occasional build failures on systems with high core count. The build failure could be reproduced reliably with a `sleep 5` in $(INCDIR)/nccl.h rule.
2020-06-272.7.6-1Sylvain Jeaugey
Fix crash when NVswitch is not visible inside a VM.
2020-06-272.7.5-1Sylvain Jeaugey
Minor fixes for A100 platforms. Add a WARN for invalid GroupEnd call.
2020-06-082.7.3-1Sylvain Jeaugey
Add support for A100 GPU and related platforms. Add support for CUDA 11. Add support for send/receive operations (beta).
2020-04-17Fix crash when only a subset of GPUs are visible within a container.Sylvain Jeaugey
Fixes #326.
2020-04-17Improve robustness of PCI detectionSylvain Jeaugey
Fallback to default values when class/speed is unknown.
2020-04-15Fix wrong variable name "slice" to "chunk"aokomoriuta
https://github.com/NVIDIA/nccl/issues/287
2020-04-10Fix bug #307 : wrong NIC selection on the reduction tree.Sylvain Jeaugey
The reduction tree (tree up) was inverting the NICs to use, causing performance issues in cases where different NICs are used on a given channel.
2020-03-27Merge pull request #314 from NVIDIA/v2.6Sylvain Jeaugey
2.6.4-1
2020-03-212.6.4-1Sylvain Jeaugey
Add support for network collectives. Add support for XML topology dump/injection. Add text values for GDR and P2P Levels, including "NVL". Add speed detection for PCI, Infiniband and Ethernet cards. Add CPU detection for ARM and AMD CPUs. Add support for adaptive routing on Infiniband. Change NET plugin API to v3 : merge PCI path and GPU pointer capability into a single structure and add other properties.
2020-03-17Check return code for Flush operationRashika Kheria
The current NCCL code does not abort when a Flush operation fails in the underlying network. This may compromise data integrity. Signed-off-by: Rashika Kheria <rashika@amazon.com>
2020-02-12Fix Allgather operations above 4G with multiple GPUs per process.Sylvain Jeaugey
Fixes nccl-tests#37. Direct offsets were still on 32 bits in the low-level primitives.
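To see why 32-bit offsets break transfers above 4G, consider the small sketch below. It is not NCCL code: the function names are hypothetical, the offset is assumed to be unsigned, and the truncation is simulated with a bit mask because Python integers do not overflow.

```python
MASK_32 = 0xFFFFFFFF            # largest value of an unsigned 32-bit integer
MASK_64 = 0xFFFFFFFFFFFFFFFF    # largest value of an unsigned 64-bit integer
GIB = 1 << 30


def offset_32bit(rank, bytes_per_rank):
    """Byte offset of a rank's slot, truncated to 32 bits (simulated)."""
    return (rank * bytes_per_rank) & MASK_32


def offset_64bit(rank, bytes_per_rank):
    """Byte offset of a rank's slot, kept in 64 bits."""
    return (rank * bytes_per_rank) & MASK_64


# Each rank contributes 1.5 GiB, so rank 3 should start at 4.5 GiB.
per_rank = 3 * GIB // 2
print(offset_32bit(3, per_rank))  # wraps around to 0.5 GiB
print(offset_64bit(3, per_rank))  # the intended 4.5 GiB
```

With the 32-bit offset, rank 3 would write into the region that belongs to rank 0, which is the kind of corruption a wider offset type avoids.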
2020-01-172.5.7-1Sylvain Jeaugey
2020-01-17Merge pull request #283 from lukeyeager/topo-trim-net-linksSylvain Jeaugey
Topo trim net links
2020-01-08[topology] remove NET links when trimming systemLuke Yeager
This fixes a memory leak.
2020-01-08[build] Allow setting CXXFLAGS on the command lineLuke Yeager
2020-01-07merge with newest masterMarcin Junczys-Dowmunt
2019-12-09Fix clang build (#274)Christian Sigg
The attribute is called `optnone`, not `noopt`.
2019-12-07Merge branch 'master' into HEADKe Wen
2019-12-072.5.6-2Ke Wen
Fix PPC64 Debian packaging
2019-12-06Fix clang compilationSylvain Jeaugey
2019-12-06Fix clang build (#271)Christian Sigg
Clang doesn't understand `optimize("O0")`. It has `noopt`, which GCC doesn't understand. Wrap the difference in a macro.
2019-11-202.5.6-1 (#255)Sylvain Jeaugey
Add LL128 Protocol. Rewrite the topology detection and tree/ring creation (#179). Improve tree performance by sending/receiving from different GPUs. Add model-based tuning to switch between the different algorithms and protocols. Rework P2P/SHM detection in containers (#155, #248). Detect duplicated devices and return an error (#231). Add tuning for GCP
2019-08-14Merge branch 'lowintelligence-shm'David Addison
PR#196
2019-08-14Updated PR#196 to use a common hash functionDavid Addison
2019-08-14Merge branch 'shm' of git://github.com/lowintelligence/nccl into lowintelligence-shmDavid Addison
2019-08-14Make use of SO_REUSEPORT conditionalDavid Addison
Fixes: #244. SO_REUSEPORT was introduced in Linux 3.9. This change allows NCCL to compile against older releases. The functionality is only required if the user is specifying a NCCL bootstrap address via an environment variable.
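The commit makes the option conditional at compile time. A rough Python analogue of the same idea, using a runtime check instead, is sketched below. The function name `make_listen_socket` and its `reuse_port` parameter are hypothetical; only the standard `socket` module calls are real.

```python
import socket


def make_listen_socket(reuse_port=False):
    """Create a TCP socket, enabling SO_REUSEPORT only where it is available."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # The constant is absent on platforms that predate the option, so check
    # for it before use instead of assuming it exists.
    if reuse_port and hasattr(socket, "SO_REUSEPORT"):
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        except OSError:
            pass  # constant defined, but the running kernel rejects the option
    return sock


sock = make_listen_socket(reuse_port=True)
reuse_addr_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR) != 0
sock.close()
```

Callers that never request port reuse take the same code path on every platform, which mirrors the note that the feature is only needed when a bootstrap address is set explicitly.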
2019-07-31Refine RPM package building spec file.Cao Zongyan
Add /sbin/ldconfig into RPM package install operations.
2019-07-17Fix NIC distances for 11+ NICsKe Wen
2019-07-17Fix #224: prevent number of IB devices from going out of boundKe Wen
2019-07-12Size up IPC buffers to multiples of 2MBKe Wen
Avoid potential CUDA error in concurrent communicator initialization
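Sizing a buffer up to a multiple of 2MB is a standard round-up computation. The sketch below shows the arithmetic only; the names `IPC_GRANULARITY` and `size_up` are hypothetical and do not come from the NCCL sources.

```python
IPC_GRANULARITY = 2 * 1024 * 1024  # 2 MB, the multiple named in the commit


def size_up(nbytes, granularity=IPC_GRANULARITY):
    """Round nbytes up to the next multiple of granularity."""
    if nbytes < 0:
        raise ValueError("size must be non-negative")
    # Adding granularity - 1 before the floor division rounds up rather
    # than down; sizes that are already multiples are left unchanged.
    return ((nbytes + granularity - 1) // granularity) * granularity


for request in (1, IPC_GRANULARITY, IPC_GRANULARITY + 1):
    print(request, "->", size_up(request))
```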
2019-07-10Add the exact matching modifier support "=" to the NCCL_IB_HCA variable (#236)Hirochika Asai
Perform exact matching when the prefix "=" is specified in the NCCL_IB_HCA variable to exclude HCAs mlx5_X[0-9]+ when mlx5_X is specified.
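The difference between the default prefix matching and the "=" modifier can be sketched as follows. This is a simplified illustration, not the NCCL parser: the function name `hca_selected` is hypothetical, and the sketch handles only a comma-separated list of names with an optional leading "=", ignoring any other syntax the variable may accept.

```python
def hca_selected(spec, device):
    """Return True if device matches the NCCL_IB_HCA-style spec."""
    exact = spec.startswith("=")
    names = [n for n in (spec[1:] if exact else spec).split(",") if n]
    if exact:
        # "=" prefix: the device name must equal one of the listed names.
        return device in names
    # Default: a listed name matches any device that starts with it.
    return any(device.startswith(n) for n in names)


print(hca_selected("mlx5_1", "mlx5_10"))   # prefix match also selects mlx5_10
print(hca_selected("=mlx5_1", "mlx5_10"))  # exact match excludes mlx5_10
print(hca_selected("=mlx5_1", "mlx5_1"))   # exact match keeps mlx5_1
```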
2019-06-25Merge branch 'master' into HEADKe Wen
2019-06-252.4.8-1Ke Wen
Fix #209: improve socket transport performance. Split transfers over multiple sockets. Launch multiple threads to drive sockets. Detect AWS NICs and set nsockets/nthreads accordingly.
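One way to picture splitting a transfer over multiple sockets is the sketch below. The commit message does not say how NCCL divides the data, so the near-even split used here is an assumption, and the function name `split_transfer` is hypothetical.

```python
from typing import List, Tuple


def split_transfer(nbytes: int, nsockets: int) -> List[Tuple[int, int]]:
    """Return one (offset, length) pair per socket, covering nbytes in order."""
    if nsockets <= 0:
        raise ValueError("need at least one socket")
    base, extra = divmod(nbytes, nsockets)
    chunks = []
    offset = 0
    for i in range(nsockets):
        # The first `extra` sockets carry one additional byte each, so the
        # chunk lengths differ by at most one.
        length = base + (1 if i < extra else 0)
        chunks.append((offset, length))
        offset += length
    return chunks


print(split_transfer(10, 4))
```

Each chunk could then be handed to a separate socket and driven by its own thread, which is the parallelism the commit describes.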
2019-06-21Fix out-of-bounds read in ncclStrToCpuset (#233)Felix Abecassis
The affinityStr string was not null-terminated but was passed to strlen(3). Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
2019-05-23Update debian dependencies in README (#228)Rajat Chopra
'fakeroot' is needed for building deb packages
2019-05-10NCCL 2.4.7-1David Addison
Performance tweaks for PowerPC builds only: set default NCCL_MIN_NRINGS to 4; disable PCI-E NUMA distance detection.
2019-05-08Allow CUDA runtime library selection (#220)jakirkham
Makes a change to allow the user to select between the static CUDA runtime library (default) and the dynamic CUDA runtime library. Does this by allowing `CUDARTLIB` to be overridden.
2019-04-08Add pkgconfig file (#190)Gustavo Alvarez
2019-04-05NCCL 2.4.6-1David Addison
Added detection of IBM/Power NVLink bridge device. Add NUMA support to PCI distance calculations. Added NCCL_IGNORE_CPU_AFFINITY env var. Fix memory leaks (GithubIssue#180). Fix compiler warning (GithubIssue#178). Replace non-standard variable length arrays (GithubIssue#171). Fix Tree+Shared Memory crash (GithubPR#185). Fix LL cleanup hang during long running DL jobs. Fix NCCL_RINGS environment variable handling. Added extra checks to catch repeat calls to ncclCommDestroy() (GithubIssue#191). Improve bootstrap socket connection reliability at scale. Fix hostname hashing issue (GithubIssue#187). Code cleanup to rename all non device files from *.cu to *.cc.
2019-03-15Fix shared memory collision in multi-communicator case.Cao Zongyan
The SHM object name previously used only pidHash and the ranks for identification, so names collided when a program ran with multiple communicators. This change adds the commId info into pidHash, so that the pidHash values of different communicators in the same process are distinct from each other.
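The collision and its fix can be illustrated with the sketch below. It is not NCCL's code: the name format, the function names `shm_name` and `mix_comm_id`, and the use of SHA-256 as the mixing hash are all assumptions chosen to keep the example deterministic.

```python
import hashlib


def shm_name(pid_hash, send_rank, recv_rank):
    """Build a shared-memory object name (hypothetical format)."""
    return "nccl-shm-%x-%d-%d" % (pid_hash, send_rank, recv_rank)


def mix_comm_id(pid_hash, comm_id):
    """Fold a communicator id into the process hash (illustrative hash choice)."""
    digest = hashlib.sha256(pid_hash.to_bytes(8, "little") + comm_id).digest()
    return int.from_bytes(digest[:8], "little")


pid_hash = 0x1234ABCD  # same process, so the same pidHash for both communicators

# Before: two communicators in one process produce the same name for ranks 0 -> 1.
old_a = shm_name(pid_hash, 0, 1)
old_b = shm_name(pid_hash, 0, 1)

# After: the communicator id is mixed into the hash, so the names differ.
new_a = shm_name(mix_comm_id(pid_hash, b"comm-A"), 0, 1)
new_b = shm_name(mix_comm_id(pid_hash, b"comm-B"), 0, 1)

print(old_a == old_b, new_a == new_b)
```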