
github.com/marian-nmt/nccl.git
path: root/src
Age  Commit message  Author
2020-03-27  Merge pull request #314 from NVIDIA/v2.6  Sylvain Jeaugey
2.6.4-1
2020-03-21  2.6.4-1  Sylvain Jeaugey
Add support for network collectives. Add support for XML topology dump/injection. Add text values for GDR and P2P Levels, including "NVL". Add speed detection for PCI, Infiniband and Ethernet cards. Add CPU detection for ARM and AMD CPUs. Add support for adaptive routing on Infiniband. Change NET plugin API to v3: merge PCI path and GPU pointer capability into a single structure and add other properties.
2020-03-17  Check return code for Flush operation  Rashika Kheria
Current NCCL code does not abort when a Flush operation fails in the underlying network. This may compromise data integrity. Signed-off-by: Rashika Kheria <rashika@amazon.com>
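The idea of propagating a failed flush instead of ignoring it can be sketched as follows. All names here (result_t, flush_fn, finish_receive, the stubs) are hypothetical and chosen for illustration; this is not the NCCL NET plugin API.

```c
/* Hypothetical names throughout; this is not the NCCL NET plugin API. */
typedef enum { RES_OK = 0, RES_ERROR = 1 } result_t;
typedef result_t (*flush_fn)(void *data, int size);

/* Mark the receive buffer as ready only if the flush succeeded. */
static result_t finish_receive(flush_fn flush, void *data, int size, int *ready) {
  *ready = 0;
  result_t res = flush(data, size);
  if (res != RES_OK) return res; /* propagate the failure instead of ignoring it */
  *ready = 1;
  return RES_OK;
}

/* Stub flush implementations used to exercise both paths. */
static result_t flush_ok(void *data, int size) { (void)data; (void)size; return RES_OK; }
static result_t flush_fail(void *data, int size) { (void)data; (void)size; return RES_ERROR; }
```

With flush_fail the caller gets RES_ERROR and the buffer is never marked ready, which is the behavior the commit message asks for.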
2020-02-12  Fix Allgather operations above 4G with multiple GPUs per process.  Sylvain Jeaugey
Fixes nccl-tests#37. Direct offsets were still on 32 bits in the low-level primitives.
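A minimal sketch of why a 32-bit offset breaks above 4G. The helper names are hypothetical and the code is not taken from the NCCL primitives; it only contrasts a 32-bit and a 64-bit offset computation.

```c
#include <stdint.h>

/* Hypothetical helpers, not NCCL code: byte offset of a rank's chunk in an
 * allgather buffer. */
static uint32_t chunk_offset_32(uint32_t rank, uint32_t bytes_per_rank) {
  return rank * bytes_per_rank; /* wraps modulo 2^32 */
}

static uint64_t chunk_offset_64(uint32_t rank, uint32_t bytes_per_rank) {
  return (uint64_t)rank * bytes_per_rank; /* widened before multiplying */
}
```

With 1 GiB per rank, the offset of rank 4 is exactly 2^32 bytes: the 32-bit version wraps to 0, while the 64-bit version keeps the full value.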
2020-01-08  [topology] remove NET links when trimming system  Luke Yeager
This fixes a memory leak.
2019-12-09  Fix clang build (#274)  Christian Sigg
The attribute is called `optnone`, not `noopt`.
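Together with the macro approach described in #271 below, the result might look like the following sketch. NO_OPT is a hypothetical macro name, and sum_to exists only to show the macro in use.

```c
/* NO_OPT is a hypothetical macro name chosen for this sketch. */
#if defined(__clang__)
#define NO_OPT __attribute__((optnone))
#elif defined(__GNUC__)
#define NO_OPT __attribute__((optimize("O0")))
#else
#define NO_OPT
#endif

/* Example function compiled without optimization on Clang and GCC. */
static NO_OPT int sum_to(int n) {
  int total = 0;
  for (int i = 1; i <= n; ++i) total += i;
  return total;
}
```

The __clang__ check comes first because Clang also defines __GNUC__.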
2019-12-06  Fix clang compilation  Sylvain Jeaugey
2019-12-06  Fix clang build (#271)  Christian Sigg
Clang doesn't understand `optimize("O0")`. It has `noopt`, which GCC doesn't understand. Wrap the difference in a macro.
2019-11-20  2.5.6-1 (#255)  Sylvain Jeaugey
Add LL128 Protocol. Rewrite the topology detection and tree/ring creation (#179). Improve tree performance by sending/receiving from different GPUs. Add model-based tuning to switch between the different algorithms and protocols. Rework P2P/SHM detection in containers (#155, #248). Detect duplicated devices and return an error (#231). Add tuning for GCP.
2019-08-14  Updated PR#196 to use a common hash function  David Addison
2019-08-14  Merge branch 'shm' of git://github.com/lowintelligence/nccl into lowintelligence-shm  David Addison
2019-08-14  Make use of SO_REUSEPORT conditional  David Addison
Fixes: #244. SO_REUSEPORT was introduced in Linux 3.9. This change allows NCCL to compile against older releases. The functionality is only required if the user specifies a NCCL bootstrap address via an environment variable.
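A conditional use of SO_REUSEPORT can be sketched as below. The helper name enable_reuseport is hypothetical, and this is not the code from the commit; it only shows the compile-time guard.

```c
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: set SO_REUSEPORT only when the system headers define
 * it. Returns 0 on success (or when the option is unavailable at compile
 * time) and -1 on failure, with errno set by setsockopt. */
static int enable_reuseport(int fd) {
#if defined(SO_REUSEPORT)
  int opt = 1;
  return setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &opt, sizeof(opt));
#else
  (void)fd; /* option not known to these headers: nothing to do */
  return 0;
#endif
}
```

On headers that predate the option, the function compiles to a no-op, so the build no longer depends on the kernel version.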
2019-07-17  Fix NIC distances for 11+ NICs  Ke Wen
2019-07-17  Fix #224: prevent number of IB devices from going out of bound  Ke Wen
2019-07-12  Size up IPC buffers to multiples of 2MB  Ke Wen
Avoid potential CUDA error in concurrent communicator initialization
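Rounding a size up to a multiple of 2MB is a standard computation; a minimal sketch follows. The names IPC_ALIGN and round_up_2mb are hypothetical and not taken from the NCCL source.

```c
#include <stddef.h>

#define IPC_ALIGN ((size_t)2 << 20) /* 2 MB; hypothetical constant name */

/* Hypothetical helper: round size up to the next multiple of 2 MB. */
static size_t round_up_2mb(size_t size) {
  return (size + IPC_ALIGN - 1) / IPC_ALIGN * IPC_ALIGN;
}
```

A size that is already a multiple of 2 MB is returned unchanged.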
2019-07-10  Add the exact matching modifier support "=" to the NCCL_IB_HCA variable (#236)  Hirochika Asai
Perform exact matching when the prefix "=" is specified in the NCCL_IB_HCA variable to exclude HCAs mlx5_X[0-9]+ when mlx5_X is specified.
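The difference between prefix and exact matching can be sketched as follows. hca_matches is a hypothetical function that handles a single device name only; it is not NCCL's parser for NCCL_IB_HCA.

```c
#include <string.h>

/* Hypothetical matcher, not NCCL's parser: returns 1 if device name dev is
 * selected by spec. A leading '=' requests exact matching; otherwise spec
 * is treated as a prefix. */
static int hca_matches(const char *spec, const char *dev) {
  if (spec[0] == '=') return strcmp(spec + 1, dev) == 0;
  return strncmp(spec, dev, strlen(spec)) == 0;
}
```

Under these assumptions, "mlx5_1" selects both mlx5_1 and mlx5_10, while "=mlx5_1" selects only mlx5_1.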
2019-06-25  Merge branch 'master' into HEAD  Ke Wen
2019-06-25  2.4.8-1  Ke Wen
Fix #209: improve socket transport performance. Split transfers over multiple sockets. Launch multiple threads to drive sockets. Detect AWS NICs and set nsockets/nthreads accordingly.
2019-06-21  Fix out-of-bounds read in ncclStrToCpuset (#233)  Felix Abecassis
The affinityStr string was not null-terminated but was passed to strlen(3). Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
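One common way to avoid calling strlen(3) on an unterminated buffer is to copy it with an explicit length and terminate the copy. The sketch below assumes that approach; copy_terminated is a hypothetical helper and not necessarily how the commit fixed ncclStrToCpuset.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most src_len bytes of a possibly
 * unterminated buffer into dst and always null-terminate dst, so that
 * strlen(dst) is safe afterwards. Returns the number of bytes copied. */
static size_t copy_terminated(char *dst, size_t dst_size,
                              const char *src, size_t src_len) {
  if (dst_size == 0) return 0;
  size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
  size_t i = 0;
  while (i < n && src[i] != '\0') {
    dst[i] = src[i];
    ++i;
  }
  dst[i] = '\0';
  return i;
}
```

The copy never reads past src_len and never writes past dst_size, whichever is reached first.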
2019-05-10  NCCL 2.4.7-1  David Addison
Performance tweaks for PowerPC builds only: set default NCCL_MIN_NRINGS to 4; disable PCI-E NUMA distance detection.
2019-05-08  Allow CUDA runtime library selection (#220)  jakirkham
Makes a change to allow the user to select between the static CUDA runtime library (default) and the dynamic CUDA runtime library. Does this by allowing `CUDARTLIB` to be overridden.
2019-04-08  Add pkgconfig file (#190)  Gustavo Alvarez
2019-04-05  NCCL 2.4.6-1  David Addison
Added detection of IBM/Power NVLink bridge device. Add NUMA support to PCI distance calculations. Added NCCL_IGNORE_CPU_AFFINITY env var. Fix memory leaks (GithubIssue#180). Compiler warning fix (GithubIssue#178). Replace non-standard variable length arrays (GithubIssue#171). Fix Tree+Shared Memory crash (GithubPR#185). Fix LL cleanup hang during long running DL jobs. Fix NCCL_RINGS environment variable handling. Added extra checks to catch repeat calls to ncclCommDestroy() (GithubIssue#191). Improve bootstrap socket connection reliability at scale. Fix hostname hashing issue (GithubIssue#187). Code cleanup to rename all non device files from *.cu to *.cc.
2019-03-15  Fix shared memory collision in multi-communicator case.  Cao Zongyan
The SHM object name previously used only pidHash and the ranks for identification, so names collided when a program ran with multiple communicators. This change adds commId information to pidHash, so that the pidHash values of different communicators within the same process are distinct.
2019-03-04  Fix crash during shared memory creation (#185)  Rong Ou
The shared memory filename was only based on the destination. While this was OK for rings since only one rank would send data to a given rank, it would crash with trees because they communicate in both directions. Co-authored-by: Rong Ou <rong.ou@gmail.com>
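A direction-aware name can be sketched as follows. The format string and the function shm_name are hypothetical and do not reproduce NCCL's actual naming scheme; id_hash stands in for an identifier that is already unique per process and communicator.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical naming scheme, not NCCL's actual format: include both the
 * sending and the receiving rank so that the two directions of a tree link
 * map to different shared memory objects. Returns the snprintf result. */
static int shm_name(char *out, size_t out_size, uint64_t id_hash,
                    int send_rank, int recv_rank) {
  return snprintf(out, out_size, "nccl-shm-%llx-%d-%d",
                  (unsigned long long)id_hash, send_rank, recv_rank);
}
```

With only the destination in the name, the links 0 to 1 and 2 to 1 would share an object; with both ranks, each directed pair gets its own.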
2019-01-30  2.4.2-1  Sylvain Jeaugey
Add tree algorithms for allreduce to improve performance at scale. Add ncclCommAbort() and ncclCommGetAsyncError() to properly handle network errors and permit recovery. Detect initial CPU affinity and no longer escape it.
2019-01-08  Fix memory leak in bootstrapRoot()  Christian Sigg
2018-12-14  Replace CUDA_VERSION by CUDART_VERSION  Sylvain Jeaugey
2018-12-14  Qualify nullptr_t with std::  Christian Sigg
2018-12-14  Two temporary workarounds for cuda-clang issues.  Christian Sigg
2018-12-14  Change __CUDACC_VER_*__ preprocessor directives to CUDA_VERSION because clang doesn't define the former.  Christian Sigg
2018-12-11  Fix #163 : remove warnings  Sylvain Jeaugey
2018-12-05  Remove error logging from a normal path  Sylvain Jeaugey
When initNet fails, we should not print the backtrace as it is supposed to be normal operation (falling back to sockets)
2018-12-05  Fix GPU Direct RDMA detection.  Sylvain Jeaugey
Whether the network supported GPU Direct RDMA or not was ignored, causing sockets to break when cards were local enough that NCCL tried to use it.
2018-12-05  Add NCCL_NET flag to many debug lines.  Sylvain Jeaugey
2018-12-04  Improve INFO message when external network is not found.  Sylvain Jeaugey
Fix #162
2018-11-30  Fixed some compilation errors when TRACE=1 set  David Addison
2018-11-29  Rework shared memory code to use SYSCHECK macros.  Sylvain Jeaugey
This is to handle EINTR/EAGAIN properly (issue #137), and also make the code consistent with the rest. Unfortunately posix_fallocate and mmap do not follow the classic return code/errno pattern, so we need to write wrappers around those functions.
2018-11-29  Rework SYSCHECK macros to better handle retries.  Sylvain Jeaugey
SYSCHECKVAL was not retrying when a retry was needed. Since not all calls are inside a loop, that means we could silently miss an EINTR/EAGAIN return code. Also rework the socket connection code and improve error reporting.
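A retry-on-EINTR/EAGAIN wrapper can be sketched as below. RETRY_SYSCALL is a hypothetical macro written for this illustration; it is not the SYSCHECK or SYSCHECKVAL macro from the NCCL source.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical macro, not NCCL's SYSCHECK: evaluate call again while it
 * fails with EINTR, EAGAIN or EWOULDBLOCK, and store the final result in
 * retval. Any other error ends the loop with retval == -1 and errno set. */
#define RETRY_SYSCALL(call, retval)                                  \
  do {                                                               \
    (retval) = (call);                                               \
  } while ((retval) == -1 &&                                         \
           (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK))
```

Because the loop lives inside the macro, a call site that is not itself inside a loop still retries. Note that this sketch spins on EAGAIN, so a real implementation for non-blocking descriptors would probably want to wait or bound the retries.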
2018-11-27  Improve net API description  Sylvain Jeaugey
2018-11-27  Make network isend/irecv non blocking  Sylvain Jeaugey
2018-11-27  Add support for external network.  Sylvain Jeaugey
Dynamically load external network from libnccl-net.so. Add init function in networks. Move PCI scoring to net.cu, only ask transport to provide a path. Simplify CUDA PCI path detection. Add dummy external network.
2018-11-20  Generate host-hash for P2P and SHM based on $(readlink /proc/self/ns/uts) + $(readlink /proc/self/ns/mnt) (#156)  Alex Sergeev
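Deriving a hash from the two namespace links can be sketched as follows. The function names and the djb2-style hash are assumptions made for this example; the commit's actual hash function is not shown in this log.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* djb2-style string hash; the choice of hash is an assumption. */
static uint64_t hash_string(const char *s) {
  uint64_t h = 5381;
  for (; *s; ++s) h = h * 33 + (unsigned char)*s;
  return h;
}

/* Append the target of the symlink at path to buf. If readlink fails or
 * buf is full, buf is left unchanged. */
static void append_link(char *buf, size_t buf_size, const char *path) {
  size_t used = strlen(buf);
  if (used + 1 >= buf_size) return;
  ssize_t n = readlink(path, buf + used, buf_size - used - 1);
  if (n > 0) buf[used + (size_t)n] = '\0'; /* readlink does not terminate */
}

/* Hypothetical helper: hash of the UTS and mount namespace identifiers. */
static uint64_t namespace_hash(void) {
  char id[256] = "";
  append_link(id, sizeof(id), "/proc/self/ns/uts");
  append_link(id, sizeof(id), "/proc/self/ns/mnt");
  return hash_string(id);
}
```

Two processes that see the same UTS and mount namespaces read the same link targets and therefore compute the same value, while processes in different containers typically do not.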
2018-11-10  Generate nccl.h in build instead of src  Sylvain Jeaugey
Generating nccl.h in src makes source directories dirty after builds.
2018-10-25  2.3.7-1 (tag v2.3.7-1)  David Addison
Improved LL tuning for multi-node jobs. Improved bootstrap for large job scaling. Fixed a hang during bootstrap due to socket reuse. Added operation name to the COLL INFO logging.
2018-09-26  2.3.5-5 (tag v2.3.5-5)  Sylvain Jeaugey
Add support for inter-node communication using sockets and InfiniBand/RoCE. Improve latency. Add support for aggregation. Improve LL/regular tuning. Remove tests as those are now at github.com/nvidia/nccl-tests.
2017-06-14  Add support for CUDA9 half semantics  Sylvain Jeaugey
2017-03-16  Fix compilation error when compiling with 'clang -x cuda'.  Ilya Biryukov
Functions vFetch and vStore are not found by ADL with clang, so they need to be declared before usage in ReduceCopy.
2017-03-02  Only enable peer access for ring neighbors.  Nathan Luehr
This enables support for systems with more than 9 GPUs attached to a single PCIe root complex.
2017-03-02  Fix copy/paste typo in error message  Sylvain Jeaugey