
git.blender.org/blender.git
author     Alaska <Alaska>  2021-11-25 11:20:28 +0300
committer  William Leeson <william@blender.org>  2021-11-25 11:32:26 +0300
commit     b41c72b710d4013fd6d67dc49a8ebb2a416b4462 (patch)
tree       9e8097c772e69325d072acceb13a67a8b9c38f7d /intern/cycles/kernel
parent     8f2db94627d50df0d8c40b3b8f17db3e429bbb8d (diff)
Fix performance decrease with Scrambling Distance on
With the current code in master, non-hardware-accelerated ray tracing devices with scrambling distance enabled see a measurable performance decrease compared to scrambling distance off. From testing, this performance decrease comes from the large tile sizes scheduled in `tile.cpp`.

This patch addresses the performance decrease by using different algorithms to calculate the tile size for devices with hardware-accelerated ray traversal and devices without: large tile sizes for hardware-accelerated devices, small tile sizes for the others.

Most of this code is based on proposals from @brecht and @leesonw.

Reviewed By: brecht, leesonw

Differential Revision: https://developer.blender.org/D13042
Diffstat (limited to 'intern/cycles/kernel')
-rw-r--r--  intern/cycles/kernel/device/gpu/work_stealing.h | 25
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/intern/cycles/kernel/device/gpu/work_stealing.h b/intern/cycles/kernel/device/gpu/work_stealing.h
index fab0915c38e..c3083948057 100644
--- a/intern/cycles/kernel/device/gpu/work_stealing.h
+++ b/intern/cycles/kernel/device/gpu/work_stealing.h
@@ -29,17 +29,20 @@ ccl_device_inline void get_work_pixel(ccl_global const KernelWorkTile *tile,
ccl_private uint *y,
ccl_private uint *sample)
{
-#if 0
- /* Keep threads for the same sample together. */
- uint tile_pixels = tile->w * tile->h;
- uint sample_offset = global_work_index / tile_pixels;
- uint pixel_offset = global_work_index - sample_offset * tile_pixels;
-#else
- /* Keeping threads for the same pixel together.
- * Appears to improve performance by a few % on CUDA and OptiX. */
- uint sample_offset = global_work_index % tile->num_samples;
- uint pixel_offset = global_work_index / tile->num_samples;
-#endif
+ uint sample_offset, pixel_offset;
+
+ if (kernel_data.integrator.scrambling_distance < 0.9f) {
+ /* Keep threads for the same sample together. */
+ uint tile_pixels = tile->w * tile->h;
+ sample_offset = global_work_index / tile_pixels;
+ pixel_offset = global_work_index - sample_offset * tile_pixels;
+ }
+ else {
+ /* Keeping threads for the same pixel together.
+ * Appears to improve performance by a few % on CUDA and OptiX. */
+ sample_offset = global_work_index % tile->num_samples;
+ pixel_offset = global_work_index / tile->num_samples;
+ }
uint y_offset = pixel_offset / tile->w;
uint x_offset = pixel_offset - y_offset * tile->w;