git.blender.org/blender.git
author Brecht Van Lommel <brechtvanlommel@gmail.com> 2016-10-02 15:48:39 +0300
committer Brecht Van Lommel <brechtvanlommel@gmail.com> 2016-10-03 23:15:25 +0300
commit a3abb020e37a072eb71fd30de9ab125d1c16623a
tree b525be7f8a0792eedecb2b95802ede88dc3f330e /intern/cycles/util
parent 49ad4215baf16d850d0e367f003ab688e4a3d08e
Fix Cycles CUDA performance on CUDA 8.0.

Mostly this is making inlining match CUDA 7.5 in a few performance-critical places. The end result is that performance is now better than before, possibly due to less register spilling or other CUDA 8.0 compiler improvements.

On benchmark scenes, there are 3% to 35% render time reductions. Stack memory usage is reduced a little too.

Reviewed By: sergey

Differential Revision: https://developer.blender.org/D2269

Diffstat (limited to 'intern/cycles/util')
-rw-r--r-- intern/cycles/util/util_types.h | 2
1 files changed, 2 insertions, 0 deletions
diff --git a/intern/cycles/util/util_types.h b/intern/cycles/util/util_types.h
index 257c6ad7491..6af65f88a02 100644
--- a/intern/cycles/util/util_types.h
+++ b/intern/cycles/util/util_types.h
@@ -42,6 +42,7 @@
#if defined(_WIN32) && !defined(FREE_WINDOWS)
#define ccl_device_inline static __forceinline
+#define ccl_device_forceinline static __forceinline
#define ccl_align(...) __declspec(align(__VA_ARGS__))
#ifdef __KERNEL_64_BIT__
#define ccl_try_align(...) __declspec(align(__VA_ARGS__))
@@ -56,6 +57,7 @@
#else
#define ccl_device_inline static inline __attribute__((always_inline))
+#define ccl_device_forceinline static inline __attribute__((always_inline))
#define ccl_align(...) __attribute__((aligned(__VA_ARGS__)))
#ifndef FREE_WINDOWS64
#define __forceinline inline __attribute__((always_inline))
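
As a rough illustration of how the new macro might be used, the sketch below copies the two non-Windows definitions from the hunk above and applies them to two small helpers. The helper names (lerp_example, average_example) are hypothetical and do not come from the Cycles sources, and the _MSC_VER check is a simplified stand-in for the original `defined(_WIN32) && !defined(FREE_WINDOWS)` condition. In this file both macros expand to the same thing; the CUDA-side definitions, where the two presumably differ, are outside this diff view.

```cpp
// Sketch only: the macro bodies mirror the diff above; everything else is
// a hypothetical example, not code from the Cycles kernel.
#if defined(_MSC_VER)
#  define ccl_device_inline static __forceinline
#  define ccl_device_forceinline static __forceinline
#else
#  define ccl_device_inline static inline __attribute__((always_inline))
#  define ccl_device_forceinline static inline __attribute__((always_inline))
#endif

// Hypothetical hot-path helper: marked forceinline so the call is always
// expanded in place, whatever the compiler's own inlining heuristics say.
ccl_device_forceinline float lerp_example(float a, float b, float t)
{
	return a + t * (b - a);
}

// Hypothetical caller using the pre-existing macro. On the host both
// macros behave identically, so this is also always inlined.
ccl_device_inline float average_example(float a, float b)
{
	return lerp_example(a, b, 0.5f);
}
```

Keeping two separate names lets each backend decide how strongly to request inlining for a given function without touching the call sites.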