Cycles: refactor to move part of KernelData definition to template header
arcpatch-D14645

To be used for specialization on Metal in a following commit, turning these members into compile-time constants.

Cycles: keep track of SVM nodes used in kernels

To be used for specialization in Metal, to automatically leave out unused nodes from the kernel.

Cycles: Apple Silicon optimizations (~20% uplift on M1 Max)

M1 Max samples/min over 30 seconds (macOS 13.0):

```
                     PSO_GENERIC  PSO_SPECIALIZED_INTERSECT  PSO_SPECIALIZED_SHADE
barbershop_interior         83.4                       89.5                   93.7
bmw27                     1486.1                     1671.0                 1825.8
classroom                  175.2                      196.8                  206.3
fishy_cat                  674.2                      704.3                  719.3
junkshop                   205.4                      212.0                  257.7
koro                       310.1                      336.1                  342.8
monster                    376.7                      418.6                  424.1
pabellon                   273.5                      325.4                  339.8
sponza                     830.6                      929.6                 1142.4
victor                      86.7                       96.4                   96.3
wdas_cloud                 111.8                      112.7                  183.1
```

Next steps:

[ ] ~~Include SHADER_EVAL kernels in the "must cache" list~~ //(limited benefit to specializing one-off shade steps)//
[ ] Adapt / merge with dynamic kernel compilation caching patch (D14754)
[x] Separate specialization of intersection (fast building) and shading (slow building) kernels
[x] Rate-limiting and invalidation of kernel compilation requests
[ ] UI for enabling / disabling background compilation

---

With this patch, the Metal backend compiles and caches a second set of kernels that are optimized for scene content. This is enabled for Apple Silicon. The optimized kernels render faster but are slower to compile, so they are compiled in the background and swapped in when ready.

The optimizations are:

- ~~Aggressive inlining. This is not scene-specific, but hasn't been enabled for the generic kernels because it inflates compile time quite a lot. It results in better register usage, reducing the spill that we're seeing in some kernels. Possible adjustments: 1) take the compile hit for generic kernels since they're only compiled once (and it helps in general), or 2) add a _second_ set of generic_kernels with aggressive inlining enabled.~~ //(enabled by D14923)//
  - ~8% uplift in isolation for 3 benchmarking scenes
- Substitution of KernelData constants. Select members of the KernelData struct are replaced with macros that are #defined at the top of the source. Only constants pertaining to the rendering algorithm are specialized, not constants that might affect the artistic look.
  - ~13% uplift in isolation for 3 benchmarking scenes
- Removal of unused SVM nodes in `svm_eval_nodes`. In combination with the other optimizations, this results in a further drop in register usage by eliminating dead code that can't be identified by static analysis.

Code contributed by Jason Fielder, Morteza Mostajabodaveh and Michael Jones

Differential Revision: https://developer.blender.org/D15456
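
As a rough illustration of the constant substitution and unused-node removal described above, the sketch below builds a preamble of `#define` lines that could be prepended to kernel source before compilation. It is not the code from this patch: `SpecializedConstants`, `build_specialization_preamble`, the `KERNEL_DATA_*` macro names and `SVM_USED_NODE_MASK` are all hypothetical, and the real Metal backend may expose these values to the compiler differently.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-in for the KernelData members that are safe to specialize
// (rendering-algorithm settings only, nothing that affects the artistic look).
struct SpecializedConstants {
  int max_bounces;
  int use_caustics;
};

// Returns text to prepend to the kernel source. Each selected constant becomes
// a macro, and the set of SVM node ids used by the scene becomes a 64-bit mask.
std::string build_specialization_preamble(const SpecializedConstants &constants,
                                          const std::set<int> &used_svm_nodes)
{
  std::ostringstream out;
  out << "#define KERNEL_DATA_MAX_BOUNCES " << constants.max_bounces << "\n";
  out << "#define KERNEL_DATA_USE_CAUSTICS " << constants.use_caustics << "\n";

  std::uint64_t mask = 0;
  for (int node : used_svm_nodes) {
    // Assumption of this sketch: node ids fit in a single 64-bit mask.
    assert(node >= 0 && node < 64);
    mask |= (1ull << node);
  }
  out << "#define SVM_USED_NODE_MASK 0x" << std::hex << mask << std::dec << "ull\n";
  return out.str();
}
```

With a preamble like this, a branch in the kernel that tests a bit of `SVM_USED_NODE_MASK` has a compile-time constant condition, so the compiler can drop the code for nodes the scene never uses. That is the kind of dead code the commit message says static analysis alone cannot identify.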
author: Michael Jones <michael_p_jones@apple.com> 2022-07-14 19:40:21 +0300
committer: Michael Jones <michael_p_jones@apple.com> 2022-07-14 19:47:55 +0300
commit: fd19555be3d78575375aa990de60f1ad375e1f06 (patch)
tree: 191e02d1bddcdd6c90b1a42d475dd9fd928e4e34 /intern/cycles/scene/scene.cpp
parent: 0b53b43f19f66f8a841157b713273ff223d2a5a9 (diff)
1 files changed, 2 insertions, 0 deletions
diff --git a/intern/cycles/scene/scene.cpp b/intern/cycles/scene/scene.cpp
index 1fcc3331337..5a33f4a6cd1 100644
--- a/intern/cycles/scene/scene.cpp
+++ b/intern/cycles/scene/scene.cpp
@@ -369,6 +369,8 @@ void Scene::device_update(Device *device_, Progress &progress)
     device->const_copy_to("data", &dscene.data, sizeof(dscene.data));
   }
 
+  device->optimize_for_scene(this);
+
   if (print_stats) {
     size_t mem_used = util_guarded_get_mem_used();
     size_t mem_peak = util_guarded_get_mem_peak();