Fix Cycles CUDA performance on CUDA 8.0.

Mostly this makes inlining match CUDA 7.5 in a few performance-critical places. The end result is that performance is now better than before, possibly due to less register spilling or other CUDA 8.0 compiler improvements.

On benchmark scenes, render times are reduced by 3% to 35%. Stack memory usage is reduced a little too.

Reviewed By: sergey

Differential Revision: https://developer.blender.org/D2269
author: Brecht Van Lommel <brechtvanlommel@gmail.com> 2016-10-02 15:48:39 +0300
committer: Brecht Van Lommel <brechtvanlommel@gmail.com> 2016-10-03 23:15:25 +0300
commit: a3abb020e37a072eb71fd30de9ab125d1c16623a (patch)
tree: b525be7f8a0792eedecb2b95802ede88dc3f330e /intern/cycles/kernel/bvh/bvh_traversal.h
parent: 49ad4215baf16d850d0e367f003ab688e4a3d08e (diff)
1 files changed, 8 insertions, 13 deletions
diff --git a/intern/cycles/kernel/bvh/bvh_traversal.h b/intern/cycles/kernel/bvh/bvh_traversal.h
index b1a52968a26..a0e478e972b 100644
--- a/intern/cycles/kernel/bvh/bvh_traversal.h
+++ b/intern/cycles/kernel/bvh/bvh_traversal.h
@@ -40,21 +40,16 @@
  *
  */
 
-#ifndef __KERNEL_GPU__
-ccl_device
-#else
-ccl_device_inline
-#endif
-bool BVH_FUNCTION_FULL_NAME(BVH)(KernelGlobals *kg,
-                                 const Ray *ray,
-                                 Intersection *isect,
-                                 const uint visibility
+ccl_device_noinline bool BVH_FUNCTION_FULL_NAME(BVH)(KernelGlobals *kg,
+                                                     const Ray *ray,
+                                                     Intersection *isect,
+                                                     const uint visibility
 #if BVH_FEATURE(BVH_HAIR_MINIMUM_WIDTH)
-                                 , uint *lcg_state,
-                                 float difl,
-                                 float extmax
+                                                     , uint *lcg_state,
+                                                     float difl,
+                                                     float extmax
 #endif
-                                 )
+                                                     )
 {
 	/* todo:
 	 * - test if pushing distance on the stack helps (for non shadow rays)
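
The hunk above replaces a per-platform choice between `ccl_device` and `ccl_device_inline` with a single `ccl_device_noinline` qualifier. As a rough illustration of how such qualifiers can map to compiler attributes, here is a minimal sketch. The `demo_`-prefixed macros and functions are hypothetical and are not the Cycles definitions. The mapping to `__noinline__` / `__forceinline__` under nvcc and to GCC-style attributes on the host is an assumption made for this example only.

```cpp
// Hypothetical qualifier macros, named after the pattern used in the diff.
// The definitions below are assumptions for illustration, not Cycles code.
#if defined(__CUDACC__)
/* nvcc: __host__ is added only so this sketch can also be called from host code. */
#  define demo_device_inline   __host__ __device__ __forceinline__
#  define demo_device_noinline __host__ __device__ __noinline__
#elif defined(__GNUC__) || defined(__clang__)
#  define demo_device_inline   static inline __attribute__((always_inline))
#  define demo_device_noinline static __attribute__((noinline))
#elif defined(_MSC_VER)
#  define demo_device_inline   static __forceinline
#  define demo_device_noinline static __declspec(noinline)
#else
#  define demo_device_inline   static inline
#  define demo_device_noinline static
#endif

/* Small helper: cheap enough that inlining it at every call site is fine. */
demo_device_inline float demo_slab_distance(float origin, float inv_dir, float plane)
{
	return (plane - origin) * inv_dir;
}

/* Stand-in for a large routine with several call sites. Keeping it out of
 * line means its body is emitted once instead of once per caller. */
demo_device_noinline bool demo_intersect(float origin,
                                         float dir,
                                         float lo,
                                         float hi,
                                         float *t_out)
{
	if(dir == 0.0f) {
		return false;
	}
	const float inv_dir = 1.0f / dir;
	const float t0 = demo_slab_distance(origin, inv_dir, lo);
	const float t1 = demo_slab_distance(origin, inv_dir, hi);
	const float t_near = (t0 < t1) ? t0 : t1;
	const float t_far = (t0 < t1) ? t1 : t0;
	if(t_far < 0.0f) {
		return false; /* interval lies entirely behind the origin */
	}
	*t_out = (t_near > 0.0f) ? t_near : 0.0f;
	return true;
}
```

Whether forcing a function out of line helps or hurts depends on the compiler version and the target GPU, which is consistent with the commit message attributing the speedup only tentatively to reduced register spilling. Measuring on the actual kernels is the only reliable guide.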