
git.blender.org/blender.git
author     Alaska <Alaska>  2021-11-25 11:20:28 +0300
committer  William Leeson <william@blender.org>  2021-11-25 11:32:26 +0300
commit     b41c72b710d4013fd6d67dc49a8ebb2a416b4462 (patch)
tree       9e8097c772e69325d072acceb13a67a8b9c38f7d /intern/cycles/kernel
parent     8f2db94627d50df0d8c40b3b8f17db3e429bbb8d (diff)
Fix performance decrease with Scrambling Distance on
With the current code in master, non-hardware-accelerated ray tracing devices with scrambling distance enabled see a measurable performance decrease compared to scrambling distance off. From testing, this performance decrease comes from the large tile sizes scheduled in `tile.cpp`.

This patch addresses the performance decrease by using different algorithms to calculate the tile size for devices with hardware-accelerated ray traversal and devices without: large tile sizes for hardware-accelerated devices, small tile sizes for the others.

Most of this code is based on proposals from @brecht and @leesonw.

Reviewed By: brecht, leesonw

Differential Revision: https://developer.blender.org/D13042
Diffstat (limited to 'intern/cycles/kernel')
-rw-r--r--  intern/cycles/kernel/device/gpu/work_stealing.h | 25
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/intern/cycles/kernel/device/gpu/work_stealing.h b/intern/cycles/kernel/device/gpu/work_stealing.h
index fab0915c38e..c3083948057 100644
--- a/intern/cycles/kernel/device/gpu/work_stealing.h
+++ b/intern/cycles/kernel/device/gpu/work_stealing.h
@@ -29,17 +29,20 @@ ccl_device_inline void get_work_pixel(ccl_global const KernelWorkTile *tile,
ccl_private uint *y,
ccl_private uint *sample)
{
-#if 0
- /* Keep threads for the same sample together. */
- uint tile_pixels = tile->w * tile->h;
- uint sample_offset = global_work_index / tile_pixels;
- uint pixel_offset = global_work_index - sample_offset * tile_pixels;
-#else
- /* Keeping threads for the same pixel together.
- * Appears to improve performance by a few % on CUDA and OptiX. */
- uint sample_offset = global_work_index % tile->num_samples;
- uint pixel_offset = global_work_index / tile->num_samples;
-#endif
+ uint sample_offset, pixel_offset;
+
+ if (kernel_data.integrator.scrambling_distance < 0.9f) {
+ /* Keep threads for the same sample together. */
+ uint tile_pixels = tile->w * tile->h;
+ sample_offset = global_work_index / tile_pixels;
+ pixel_offset = global_work_index - sample_offset * tile_pixels;
+ }
+ else {
+ /* Keeping threads for the same pixel together.
+ * Appears to improve performance by a few % on CUDA and OptiX. */
+ sample_offset = global_work_index % tile->num_samples;
+ pixel_offset = global_work_index / tile->num_samples;
+ }
uint y_offset = pixel_offset / tile->w;
uint x_offset = pixel_offset - y_offset * tile->w;