author | Brecht Van Lommel <brechtvanlommel@gmail.com> | 2016-10-02 15:48:39 +0300 |
---|---|---|
committer | Brecht Van Lommel <brechtvanlommel@gmail.com> | 2016-10-03 23:15:25 +0300 |
commit | a3abb020e37a072eb71fd30de9ab125d1c16623a (patch) | |
tree | b525be7f8a0792eedecb2b95802ede88dc3f330e /intern/cycles/kernel/kernel_compat_cuda.h | |
parent | 49ad4215baf16d850d0e367f003ab688e4a3d08e (diff) |
Fix Cycles CUDA performance on CUDA 8.0.
This mostly makes inlining match CUDA 7.5 in a few performance-critical
places. The end result is that performance is now better than before, possibly
due to less register spilling or other CUDA 8.0 compiler improvements.
On benchmark scenes, render times are reduced by 3% to 35%. Stack memory
usage is reduced a little too.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D2269
Diffstat (limited to 'intern/cycles/kernel/kernel_compat_cuda.h')
-rw-r--r-- | intern/cycles/kernel/kernel_compat_cuda.h | 1 |
1 files changed, 1 insertions, 0 deletions
diff --git a/intern/cycles/kernel/kernel_compat_cuda.h b/intern/cycles/kernel/kernel_compat_cuda.h
index 9a96cb9f438..e0c7b17c6a0 100644
--- a/intern/cycles/kernel/kernel_compat_cuda.h
+++ b/intern/cycles/kernel/kernel_compat_cuda.h
@@ -37,6 +37,7 @@
 /* Qualifier wrappers for different names on different devices */
 
 #define ccl_device __device__ __inline__
+# define ccl_device_forceinline __device__ __forceinline__
 #if (__KERNEL_CUDA_VERSION__ == 80) && (__CUDA_ARCH__ < 500)
 # define ccl_device_inline __device__ __forceinline__
 #else
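
For illustration only, here is a minimal sketch of how a qualifier wrapper like `ccl_device_forceinline` might be applied to a small, hot helper. It is not taken from this commit. The names `example_float3`, `example_dot` and `example_len_squared` are hypothetical. The non-CUDA fallback branch is an assumption added so the sketch also compiles with an ordinary C++ compiler, and it does not reflect how Cycles defines these macros for other devices.

```cpp
/* Qualifier wrappers, using the same names as the CUDA header in the diff.
 * The #else branch is an assumed host fallback for this sketch only. */
#ifdef __CUDACC__
#  define ccl_device __device__ __inline__
#  define ccl_device_forceinline __device__ __forceinline__
#else
#  define ccl_device static inline
#  if defined(__GNUC__) || defined(__clang__)
#    define ccl_device_forceinline static inline __attribute__((always_inline))
#  elif defined(_MSC_VER)
#    define ccl_device_forceinline static __forceinline
#  else
#    define ccl_device_forceinline static inline
#  endif
#endif

/* Hypothetical vector type, standing in for a real kernel type. */
struct example_float3 {
    float x, y, z;
};

/* Small helper on a hot path: ask the compiler to inline it unconditionally
 * instead of relying on its own heuristics. */
ccl_device_forceinline float example_dot(example_float3 a, example_float3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Caller with the plain qualifier: inlining is left to the compiler. */
ccl_device float example_len_squared(example_float3 v)
{
    return example_dot(v, v);
}
```

In a CUDA build these functions would be called from a `__global__` kernel or from other device functions. Whether forcing inlining helps or hurts depends on register pressure and the compiler version, so a change of this kind is best checked against render benchmarks, as the commit message describes.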