github.com/marian-nmt/nccl.git

Age	Commit message	Author
2020-03-27	Merge pull request #314 from NVIDIA/v2.6	Sylvain Jeaugey
	2.6.4-1
2020-03-21	2.6.4-1	Sylvain Jeaugey
	Add support for network collectives. Add support for XML topology dump/injection. Add text values for GDR and P2P Levels, including "NVL". Add speed detection for PCI, Infiniband and Ethernet cards. Add CPU detection for ARM and AMD CPUs. Add support for adaptive routing on Infiniband. Change NET plugin API to v3: merge PCI path and GPU pointer capability into a single structure and add other properties.
2020-03-17	Check return code for Flush operation	Rashika Kheria
	Current NCCL code does not abort when a Flush operation fails in the underlying network. This may compromise data integrity. Signed-off-by: Rashika Kheria <rashika@amazon.com>
2020-02-12	Fix Allgather operations above 4G with multiple GPUs per process.	Sylvain Jeaugey
	Fixes nccl-tests#37. Direct offsets were still on 32 bits in the low-level primitives.
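The message says direct offsets were still 32-bit in the low-level primitives. The sketch below shows that failure mode in isolation. The helpers are hypothetical and assume a simple `rank * count * elemSize` offset formula. They are not NCCL's primitives: once the byte offset passes 4 GiB, the 32-bit product wraps silently.

```c
#include <stdint.h>

/* Hypothetical: byte offset of rank `rank`'s slice in an Allgather output
 * buffer where each rank contributes `count` elements of `elem_size` bytes. */

/* Buggy variant: the product is computed in 32-bit unsigned arithmetic,
 * so it wraps modulo 2^32 once the offset reaches 4 GiB. */
static uint64_t slice_offset_32(uint32_t rank, uint32_t count, uint32_t elem_size) {
    uint32_t off = rank * count * elem_size; /* wraps above 4 GiB */
    return off;
}

/* Fixed variant: widen to 64 bits before the first multiplication. */
static uint64_t slice_offset_64(uint32_t rank, uint32_t count, uint32_t elem_size) {
    return (uint64_t)rank * count * elem_size;
}
```

With 2^30 four-byte elements per rank, rank 2 starts at 8 GiB. The 32-bit version reports offset 0 for that rank, which silently aliases rank 0's slice.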
2020-01-08	[topology] remove NET links when trimming system	Luke Yeager
	This fixes a memory leak.
2019-12-09	Fix clang build (#274)	Christian Sigg
	The attribute is called `optnone`, not `noopt`.
2019-12-06	Fix clang compilation	Sylvain Jeaugey

2019-12-06	Fix clang build (#271)	Christian Sigg
	Clang doesn't understand `optimize("O0")`. It has `noopt`, which GCC doesn't understand. Wrap the difference in a macro.
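One way to wrap the GCC/clang difference in a macro is sketched below. It uses `optnone`, the spelling that the later fix (#274) in this log corrects the clang attribute to. The macro name `NO_OPTIMIZE` and the example function are hypothetical. Clang is tested first because clang also defines `__GNUC__`.

```c
/* Hypothetical macro: disable optimization for one function, portably. */
#if defined(__clang__)
#define NO_OPTIMIZE __attribute__((optnone))          /* clang spelling */
#elif defined(__GNUC__)
#define NO_OPTIMIZE __attribute__((optimize("O0")))   /* GCC spelling */
#else
#define NO_OPTIMIZE                                   /* other compilers: no-op */
#endif

/* Example use: this function is compiled without optimization. */
NO_OPTIMIZE static int add_unoptimized(int a, int b) {
    return a + b;
}
```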
2019-11-20	2.5.6-1 (#255)	Sylvain Jeaugey
	Add LL128 Protocol. Rewrite the topology detection and tree/ring creation (#179). Improve tree performance by sending/receiving from different GPUs. Add model-based tuning to switch between the different algorithms and protocols. Rework P2P/SHM detection in containers (#155, #248). Detect duplicated devices and return an error (#231). Add tuning for GCP
2019-08-14	Updated PR#196 to use a common hash function	David Addison

2019-08-14	Merge branch 'shm' of git://github.com/lowintelligence/nccl into lowintelligence-shm	David Addison
2019-08-14	Make use of SO_REUSEPORT conditional	David Addison
	Fixes: #244. SO_REUSEPORT was introduced in Linux 3.9. This change allows NCCL to compile against older releases. The functionality is only required if the user specifies a NCCL bootstrap address via an environment variable.
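Making the option conditional usually means guarding it with the preprocessor, so the code still builds when the system headers do not define `SO_REUSEPORT`. The sketch below assumes that approach. The function name is hypothetical, and setting `SO_REUSEADDR` alongside it is an illustrative choice, not something the commit states.

```c
#include <sys/socket.h>

/* Hypothetical helper: allow address/port reuse on a socket.
 * SO_REUSEPORT is only requested when the headers define it,
 * so this compiles against pre-3.9 Linux headers. Returns 0 on success. */
static int enable_port_reuse(int fd) {
    int one = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) != 0)
        return -1;
#ifdef SO_REUSEPORT
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) != 0)
        return -1;
#endif
    return 0;
}
```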
2019-07-17	Fix NIC distances for 11+ NICs	Ke Wen

2019-07-17	Fix #224: prevent number of IB devices from going out of bound	Ke Wen

2019-07-12	Size up IPC buffers to multiples of 2MB	Ke Wen
	Avoid potential CUDA error in concurrent communicator initialization
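The log does not say how the rounding is implemented. A common way to round a size up to a multiple of a power of two such as 2 MiB is the mask trick sketched here. The macro and function names are hypothetical.

```c
#include <stddef.h>

/* 2 MiB granularity, as stated in the commit title. */
#define IPC_ALIGN ((size_t)2 << 20)

/* Round `size` up to the next multiple of `align`.
 * Requires `align` to be a power of two. */
static size_t round_up_pow2(size_t size, size_t align) {
    return (size + align - 1) & ~(align - 1);
}
```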
2019-07-10	Add the exact matching modifier support "=" to the NCCL_IB_HCA variable (#236)	Hirochika Asai
	Perform exact matching when the prefix "=" is specified in the NCCL_IB_HCA variable, so that specifying mlx5_X no longer also selects HCAs matching mlx5_X[0-9]+.
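The sketch below contrasts default prefix matching with the "=" exact-match modifier for a single HCA name. It is a hypothetical, simplified matcher. It ignores comma-separated lists, port suffixes, and any other syntax the real variable accepts.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical single-entry matcher:
 *   "=name" -> device must equal name exactly
 *   "name"  -> any device whose name starts with name (prefix match) */
static bool hca_matches(const char *spec, const char *dev) {
    if (spec[0] == '=')
        return strcmp(spec + 1, dev) == 0;
    return strncmp(spec, dev, strlen(spec)) == 0;
}
```

Under prefix matching, `mlx5_1` also selects `mlx5_10`. The `=mlx5_1` form selects only `mlx5_1`.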
2019-06-25	Merge branch 'master' into HEAD	Ke Wen

2019-06-25	2.4.8-1	Ke Wen
	Fix #209: improve socket transport performance. Split transfers over multiple sockets. Launch multiple threads to drive sockets. Detect AWS NICs and set nsockets/nthreads accordingly.
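The commit does not describe how a transfer is divided across sockets. The sketch below shows one plausible even split, where each socket gets a contiguous slice and the leftover bytes go to the first sockets. The helper is hypothetical and is not NCCL's actual scheme.

```c
#include <stddef.h>

/* Hypothetical helper: compute the slice [*offset, *offset + *len) of a
 * `total`-byte message handled by socket `i` of `nsockets`. The first
 * (total % nsockets) sockets take one extra byte, so every byte is
 * covered exactly once and slices are contiguous. */
static void socket_slice(size_t total, int nsockets, int i,
                         size_t *offset, size_t *len) {
    size_t base = total / (size_t)nsockets;
    size_t rem  = total % (size_t)nsockets;
    size_t idx  = (size_t)i;
    *len    = base + (idx < rem ? 1 : 0);
    *offset = idx * base + (idx < rem ? idx : rem);
}
```

For a 10-byte message over 4 sockets, this gives slices of 3, 3, 2 and 2 bytes at offsets 0, 3, 6 and 8.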
2019-06-21	Fix out-of-bounds read in ncclStrToCpuset (#233)	Felix Abecassis
	The affinityStr string was not null-terminated but was passed to strlen(3). Signed-off-by: Felix Abecassis <fabecassis@nvidia.com>
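The bug class here is calling `strlen(3)` on a buffer that has no terminating NUL. A general defensive pattern is to copy a known-length buffer and always terminate it, as sketched below. The helper is hypothetical and is not the actual ncclStrToCpuset fix.

```c
#include <string.h>

/* Hypothetical helper: copy `len` bytes of a possibly unterminated buffer
 * into `dst` (capacity `cap`), truncating if needed, and always write a
 * trailing NUL so strlen() on the result stays in bounds.
 * Returns the number of bytes copied (excluding the NUL). */
static size_t copy_terminated(char *dst, size_t cap, const char *src, size_t len) {
    if (cap == 0)
        return 0;
    size_t n = len < cap - 1 ? len : cap - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```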
2019-05-10	NCCL 2.4.7-1	David Addison
	Performance tweaks for PowerPC builds only: set default NCCL_MIN_NRINGS to 4; disable PCI-E NUMA distance detection.
2019-05-08	Allow CUDA runtime library selection (#220)	jakirkham
	Makes a change to allow the user to select between the static CUDA runtime library (default) and the dynamic CUDA runtime library. Does this by allowing `CUDARTLIB` to be overridden.
2019-04-08	Add pkgconfig file (#190)	Gustavo Alvarez

2019-04-05	NCCL 2.4.6-1	David Addison
	Added detection of IBM/Power NVLink bridge device. Add NUMA support to PCI distance calculations. Added NCCL_IGNORE_CPU_AFFINITY env var. Fix memory leaks (GithubIssue#180). Compiler warning fix (GithubIssue#178). Replace non-standard variable length arrays (GithubIssue#171). Fix Tree+Shared Memory crash (GithubPR#185). Fix LL cleanup hang during long running DL jobs. Fix NCCL_RINGS environment variable handling. Added extra checks to catch repeat calls to ncclCommDestroy() (GithubIssue#191). Improve bootstrap socket connection reliability at scale. Fix hostname hashing issue (GithubIssue#187). Code cleanup to rename all non-device files from .cu to .cc.
2019-03-15	Fix share memory collision in multi-communicator case.	Cao Zongyan
	The SHM object name used only pidHash and ranks for identification, so names collided when a program ran multiple communicators. This change adds commId info into pidHash, so that communicators in the same process get distinct pidHash values.
2019-03-04	Fix crash during shared memory creation (#185)	Rong Ou
	The shared memory filename was only based on the destination. While this was OK for rings since only one rank would send data to a given rank, it would crash with trees because they communicate in both directions. Co-authored-by: Rong Ou <rong.ou@gmail.com>
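Together with the multi-communicator fix above, this entry implies that a segment name needs a per-communicator component and both endpoints of a connection. The sketch below assumes a simple `snprintf` naming scheme to show this. The name format and function are hypothetical, not the strings NCCL actually uses.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical naming scheme: a per-communicator hash plus BOTH the sending
 * and receiving rank, so that (a) two communicators in one process and
 * (b) the two directions of a tree edge each map to a distinct segment. */
static int shm_segment_name(char *buf, size_t cap, uint64_t comm_hash,
                            int send_rank, int recv_rank) {
    return snprintf(buf, cap, "nccl-shm-%llx-%d-%d",
                    (unsigned long long)comm_hash, send_rank, recv_rank);
}
```

A name keyed only on the destination rank would give `0 -> 1` and `1 -> 0` different names, but any two senders to the same rank (as with trees) would collide.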
2019-01-30	2.4.2-1	Sylvain Jeaugey
	Add tree algorithms for allreduce to improve performance at scale. Add ncclCommAbort() and ncclCommGetAsyncError() to properly handle network errors and permit recovery. Detect initial CPU affinity and no longer escape it.
2019-01-08	Fix memory leak in bootstrapRoot()	Christian Sigg

2018-12-14	Replace CUDA_VERSION by CUDART_VERSION	Sylvain Jeaugey

2018-12-14	Qualify nullptr_t with std::	Christian Sigg

2018-12-14	Two temporary workarounds for cuda-clang issues.	Christian Sigg

2018-12-14	Change __CUDACC_VER_*__ preprocessor directives to CUDA_VERSION because clang doesn't define the former.	Christian Sigg
2018-12-11	Fix #163 : remove warnings	Sylvain Jeaugey

2018-12-05	Remove error logging from a normal path	Sylvain Jeaugey
	When initNet fails, we should not print the backtrace, since this is normal operation (NCCL falls back to sockets).
2018-12-05	Fix GPU Direct RDMA detection.	Sylvain Jeaugey
	Whether the network supported GPU Direct RDMA or not was ignored, causing sockets to break when cards were local enough that NCCL tried to use it.
2018-12-05	Add NCCL_NET flag to many debug lines.	Sylvain Jeaugey

2018-12-04	Improve INFO message when external network is not found.	Sylvain Jeaugey
	Fix #162
2018-11-30	Fixed some compilation errors when TRACE=1 set	David Addison

2018-11-29	Rework shared memory code to use SYSCHECK macros.	Sylvain Jeaugey
	This is to handle EINTR/EAGAIN properly (issue #137), and also make the code consistent with the rest. Unfortunately posix_fallocate and mmap do not follow the classic return code/errno pattern, so we need to write wrappers around those functions.
2018-11-29	Rework SYSCHECK macros to better handle retries.	Sylvain Jeaugey
	SYSCHECKVAL was not retrying when a retry was needed. Since not all calls are inside a loop, that means we could silently miss an EINTR/EAGAIN return code. Also rework the socket connection code and improve error reporting.
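The retry behaviour described here can be sketched as a macro that re-issues a call while it fails with EINTR or EAGAIN. This is an illustration, not the actual SYSCHECK/SYSCHECKVAL macros. `SYSCALL_RETRY` and the `flaky_call` stub are hypothetical. Production code would usually bound EAGAIN retries or wait (for example with poll) between them rather than spin.

```c
#include <errno.h>

/* Hypothetical retry macro: store the result of `call` in `ret`, and
 * re-issue the call while it fails with EINTR or EAGAIN. */
#define SYSCALL_RETRY(ret, call)                                  \
    do {                                                          \
        (ret) = (call);                                           \
    } while ((ret) == -1 && (errno == EINTR || errno == EAGAIN))

/* Demonstration stub: fails with EINTR while *remaining > 0, then returns 42. */
static int flaky_call(int *remaining) {
    if (*remaining > 0) {
        (*remaining)--;
        errno = EINTR;
        return -1;
    }
    return 42;
}
```

A call that is not wrapped in such a loop returns the first -1/EINTR to its caller, which is the silent miss this commit describes.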
2018-11-27	Improve net API description	Sylvain Jeaugey

2018-11-27	Make network isend/irecv non blocking	Sylvain Jeaugey

2018-11-27	Add support for external network.	Sylvain Jeaugey
	Dynamically load external network from libnccl-net.so. Add init function in networks. Move PCI scoring to net.cu, only ask transport to provide a path. Simplify CUDA PCI path detection. Add dummy external network.
2018-11-20	Generate host-hash for P2P and SHM based on $(readlink /proc/self/ns/uts) + $(readlink /proc/self/ns/mnt) (#156)	Alex Sergeev
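The commit title names the two `readlink` values that feed the host hash. The sketch below hashes them. It also mixes in the hostname, which is an assumption (the title mentions only the namespace links). FNV-1a is a stand-in hash chosen for illustration, not necessarily the function NCCL uses. Processes that share the host, UTS namespace, and mount namespace get the same value, while containers with their own namespaces get different ones.

```c
#include <stdint.h>
#include <unistd.h>

/* FNV-1a (64-bit) over a NUL-terminated string, continuing from hash `h`.
 * Stand-in hash for illustration. */
static uint64_t fnv1a(const char *s, uint64_t h) {
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 1099511628211ULL;  /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Hypothetical host hash: hostname (assumption) plus the UTS and mount
 * namespace links, e.g. "uts:[4026531838]". Unreadable entries are skipped. */
static uint64_t host_hash(void) {
    char buf[256];
    uint64_t h = 14695981039346656037ULL;  /* FNV-1a offset basis */
    if (gethostname(buf, sizeof buf) == 0) {
        buf[sizeof buf - 1] = '\0';
        h = fnv1a(buf, h);
    }
    const char *links[] = { "/proc/self/ns/uts", "/proc/self/ns/mnt" };
    for (int i = 0; i < 2; i++) {
        ssize_t n = readlink(links[i], buf, sizeof buf - 1);
        if (n >= 0) {
            buf[n] = '\0';  /* readlink() does not NUL-terminate */
            h = fnv1a(buf, h);
        }
    }
    return h;
}
```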
2018-11-10	Generate nccl.h in build instead of src	Sylvain Jeaugey
	Generating nccl.h in src makes source directories dirty after builds.
2018-10-25	2.3.7-1 [v2.3.7-1]	David Addison
	Improved LL tuning for multi-node jobs. Improved bootstrap for large job scaling. Fixed a hang during bootstrap due to socket reuse. Added operation name to the COLL INFO logging.
2018-09-26	2.3.5-5 [v2.3.5-5]	Sylvain Jeaugey
	Add support for inter-node communication using sockets and InfiniBand/RoCE. Improve latency. Add support for aggregation. Improve LL/regular tuning. Remove tests as those are now at github.com/nvidia/nccl-tests.
2017-06-14	Add support for CUDA9 half semantics	Sylvain Jeaugey

2017-03-16	Fix compilation error when compiling with 'clang -x cuda'.	Ilya Biryukov
	Functions vFetch and vStore are not found by ADL with clang, so they need to be declared before usage in ReduceCopy.
2017-03-02	Only enable peer access for ring neighbors.	Nathan Luehr
	This enables support for systems with more than 9 GPUs attached to a single PCIe root complex.
2017-03-02	Fix copy/paste typo in error message	Sylvain Jeaugey