git.blender.org/blender.git
author Michael Jones <michael_p_jones@apple.com> 2022-07-13 22:56:57 +0300
committer Michael Jones <michael_p_jones@apple.com> 2022-07-14 16:26:18 +0300
commit 4b1d315017ef103f3034160d349b3c3c21a4cd6a
tree 779dd8c27d37e710d3014911e962027b56049084 /intern/cycles/device/metal/util.mm
parent 47d4ce498e3f5a11a0210b1efd57053f0b1c85bd
Cycles: Improve cache usage on Apple GPUs by chunking active indices
This patch partitions the active indices into chunks prior to sorting by material, in order to trade off some material coherence for better locality. On Apple Silicon GPUs (particularly higher-end M1-family GPUs), we observe overall render time speedups of up to 15%.

The partitioning is implemented by repeating the range of `shader_sort_key` for each partition, and encoding a "locator" key which distributes the indices into sorted chunks.

Reviewed By: brecht

Differential Revision: https://developer.blender.org/D15331
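
The key scheme described above can be illustrated with a small host-side sketch. This is not the code from the patch: `ActivePath`, `partitioned_sort_key`, and `sorted_active_indices` are hypothetical names, and the key layout (partition number times the shader count, plus the shader id) is an assumption about how "repeating the range of `shader_sort_key` for each partition" might be encoded.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical record for one active path: its slot in the state array and
// the shader (material) it will execute next.
struct ActivePath {
  int state_index;
  int shader_id;
};

// Assumed key layout: each partition gets its own copy of the shader key range,
// so keys from partition p fall in [p * num_shaders, (p + 1) * num_shaders).
// partition_elements <= 0 disables partitioning (plain sort by shader).
int partitioned_sort_key(const ActivePath &path, int num_shaders, int partition_elements)
{
  if (partition_elements <= 0) {
    return path.shader_id;
  }
  const int partition = path.state_index / partition_elements;
  return partition * num_shaders + path.shader_id;
}

// Returns the state indices ordered by partition first, then by shader within
// each partition. A stable sort keeps the original order for equal keys.
std::vector<int> sorted_active_indices(std::vector<ActivePath> paths,
                                       int num_shaders,
                                       int partition_elements)
{
  std::stable_sort(paths.begin(), paths.end(), [&](const ActivePath &a, const ActivePath &b) {
    return partitioned_sort_key(a, num_shaders, partition_elements) <
           partitioned_sort_key(b, num_shaders, partition_elements);
  });
  std::vector<int> indices;
  indices.reserve(paths.size());
  for (const ActivePath &p : paths) {
    indices.push_back(p.state_index);
  }
  return indices;
}
```

With six paths alternating between two shaders and a partition size of 3, the sort groups by shader inside each chunk of three indices instead of across the whole array, which is the locality-for-coherence trade the commit message describes.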
Diffstat (limited to 'intern/cycles/device/metal/util.mm')
-rw-r--r-- intern/cycles/device/metal/util.mm 18
1 files changed, 18 insertions, 0 deletions
diff --git a/intern/cycles/device/metal/util.mm b/intern/cycles/device/metal/util.mm
index a7a5b596b8f..c336dc310c8 100644
--- a/intern/cycles/device/metal/util.mm
+++ b/intern/cycles/device/metal/util.mm
@@ -72,6 +72,24 @@ MetalGPUVendor MetalInfo::get_device_vendor(id<MTLDevice> device)
   return METAL_GPU_UNKNOWN;
 }
 
+int MetalInfo::optimal_sort_partition_elements(id<MTLDevice> device)
+{
+  if (auto str = getenv("CYCLES_METAL_SORT_PARTITION_ELEMENTS")) {
+    return atoi(str);
+  }
+
+  /* On M1 and M2 GPUs, we see better cache utilization if we partition the active indices before
+   * sorting each partition by material. Partitioning into chunks of 65536 elements results in an
+   * overall render time speedup of up to 15%. */
+  if (get_device_vendor(device) == METAL_GPU_APPLE) {
+    AppleGPUArchitecture arch = get_apple_gpu_architecture(device);
+    if (arch == APPLE_M1 || arch == APPLE_M2) {
+      return 65536;
+    }
+  }
+  return 0;
+}
+
 vector<id<MTLDevice>> const &MetalInfo::get_usable_devices()
 {
   static vector<id<MTLDevice>> usable_devices;
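
The selection logic added in this hunk can be exercised without a Metal device using a portable sketch like the one below. It mirrors the order of checks in the diff (environment override first, then the M1/M2 default of 65536, otherwise 0), but `GpuArch`, `choose_sort_partition_elements`, and `sort_partition_elements_from_env` are hypothetical stand-ins and not part of Cycles; the vendor and architecture checks are collapsed into one enum for brevity.

```cpp
#include <cstdlib>

// Hypothetical stand-in for the vendor/architecture queries in the patch.
enum class GpuArch { Unknown, AppleM1, AppleM2 };

// Same decision order as the diff: an explicit override wins, then the
// M1/M2 default, and 0 (no partitioning) for everything else.
int choose_sort_partition_elements(const char *env_override, GpuArch arch)
{
  if (env_override != nullptr) {
    // As in the patch, atoi is used, so a non-numeric value yields 0.
    return std::atoi(env_override);
  }
  if (arch == GpuArch::AppleM1 || arch == GpuArch::AppleM2) {
    return 65536;
  }
  return 0;
}

// Reads the environment variable named in the patch and applies the logic above.
int sort_partition_elements_from_env(GpuArch arch)
{
  return choose_sort_partition_elements(std::getenv("CYCLES_METAL_SORT_PARTITION_ELEMENTS"), arch);
}
```

Separating the pure decision from the `getenv` call keeps the behavior easy to check: setting `CYCLES_METAL_SORT_PARTITION_ELEMENTS=0` disables partitioning even on M1/M2, and any other value overrides the built-in default on all devices.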