author | Sylvain Jeaugey <sjeaugey@nvidia.com> | 2019-11-20 01:57:39 +0300 |
---|---|---|
committer | GitHub <noreply@github.com> | 2019-11-20 01:57:39 +0300 |
commit | 299c554dccf923230321ad7495946543f3e9b457 (patch) | |
tree | 6a70b52080f0570fc87285b3b2300dbd2f2918ad /src/collectives/device/common_kernel.h | |
parent | ccb1298148327bacb9b83452ed6ae0b29417e7e2 (diff) |
2.5.6-1 (#255)
Add LL128 Protocol.
Rewrite the topology detection and tree/ring creation (#179). Improve
tree performance by sending/receiving from different GPUs. Add
model-based tuning to switch between the different algorithms and
protocols.
Rework P2P/SHM detection in containers (#155, #248).
Detect duplicated devices and return an error (#231).
Add tuning for GCP
Diffstat (limited to 'src/collectives/device/common_kernel.h')
-rw-r--r-- | src/collectives/device/common_kernel.h | 2 |
1 files changed, 0 insertions, 2 deletions
diff --git a/src/collectives/device/common_kernel.h b/src/collectives/device/common_kernel.h
index 435a598..aa1e936 100644
--- a/src/collectives/device/common_kernel.h
+++ b/src/collectives/device/common_kernel.h
@@ -263,8 +263,6 @@ __device__ __forceinline__ void ReduceCopyMulti(const int tid, const int nthread
   }
 }
 
-#define WARP_SIZE 32
-
 template<class FUNC, typename T, int UNROLL, int MINSRCS, int MAXSRCS, int MINDSTS, int MAXDSTS>
 __device__ __forceinline__ void ReduceCopy128bMulti( const int w, const int nw, const int t,
     int nsrcs, const T* s[MAXSRCS], int ndsts, T* d[MAXDSTS],