Realtime Compositor: Implement parallel reduction

This patch implements generic parallel reduction for the realtime compositor and implements the Levels operation as an example. This patch also introduces the notion of a "Compositor Algorithm", which is a reusable operation that can be used to construct other operations.

Differential Revision: https://developer.blender.org/D16184

Reviewed By: Clement Foucault
author: Omar Emara <mail@OmarEmara.dev> 2022-10-11 14:22:52 +0300
committer: Omar Emara <mail@OmarEmara.dev> 2022-10-11 14:22:52 +0300
commit: 0037411f55ec3da4cfad79575d5531869ae5dc38 (patch)
tree: 92070a51c4ee039729fe2192649602b76b3af244 /source/blender/gpu
parent: f6a69920313255277ad1c21fcb813ee6ac774db7 (diff)
3 files changed, 176 insertions, 0 deletions
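Each 16x16 work group of the shader added below writes a single pixel. A full reduction is therefore a chain of dispatches. On every pass, the image shrinks by a factor of 16 per axis, rounded up, until one pixel remains. The following is a minimal C++ sketch of that size arithmetic. `reduced_extent` and `reduction_pass_sizes` are hypothetical helpers, not Blender functions. The sketch also assumes that at least one pass always runs, so that INITIALIZE is applied even to a 1x1 input.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: the output extent of one pass along one axis, where every
// `group_size` invocations along that axis form one work group and write one pixel.
int reduced_extent(int extent, int group_size) {
  assert(extent > 0 && group_size > 0);
  return (extent + group_size - 1) / group_size;  // ceil(extent / group_size)
}

// Hypothetical helper: the image size after every pass of a full reduction,
// assuming the 16x16 local group size of compositor_parallel_reduction_shared.
// A do-while loop is used so that even a 1x1 input gets one pass, which is
// assumed necessary for INITIALIZE to be applied.
std::vector<std::pair<int, int>> reduction_pass_sizes(int width, int height,
                                                      int group_size = 16) {
  std::vector<std::pair<int, int>> sizes;
  do {
    width = reduced_extent(width, group_size);
    height = reduced_extent(height, group_size);
    sizes.emplace_back(width, height);
  } while (width > 1 || height > 1);
  return sizes;
}
```

For a 1920x1080 input, the passes produce 120x68, then 8x5, then 1x1, so the shader would be dispatched three times.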
diff --git a/source/blender/gpu/CMakeLists.txt b/source/blender/gpu/CMakeLists.txt
index e2285a3fd3e..34d53eaa230 100644
--- a/source/blender/gpu/CMakeLists.txt
+++ b/source/blender/gpu/CMakeLists.txt
@@ -346,6 +346,7 @@ set(GLSL_SRC
   shaders/compositor/compositor_morphological_distance_feather.glsl
   shaders/compositor/compositor_morphological_distance_threshold.glsl
   shaders/compositor/compositor_morphological_step.glsl
+  shaders/compositor/compositor_parallel_reduction.glsl
   shaders/compositor/compositor_projector_lens_distortion.glsl
   shaders/compositor/compositor_realize_on_domain.glsl
   shaders/compositor/compositor_screen_lens_distortion.glsl
@@ -626,6 +627,7 @@ set(SRC_SHADER_CREATE_INFOS
   shaders/compositor/infos/compositor_morphological_distance_info.hh
   shaders/compositor/infos/compositor_morphological_distance_threshold_info.hh
   shaders/compositor/infos/compositor_morphological_step_info.hh
+  shaders/compositor/infos/compositor_parallel_reduction_info.hh
   shaders/compositor/infos/compositor_projector_lens_distortion_info.hh
   shaders/compositor/infos/compositor_realize_on_domain_info.hh
   shaders/compositor/infos/compositor_screen_lens_distortion_info.hh
diff --git a/source/blender/gpu/shaders/compositor/compositor_parallel_reduction.glsl b/source/blender/gpu/shaders/compositor/compositor_parallel_reduction.glsl
new file mode 100644
index 00000000000..f6f84aa24c1
--- /dev/null
+++ b/source/blender/gpu/shaders/compositor/compositor_parallel_reduction.glsl
@@ -0,0 +1,98 @@
+#pragma BLENDER_REQUIRE(gpu_shader_compositor_texture_utilities.glsl)
+
+/* This shader reduces the given texture into a smaller texture of a size equal to the number of
+ * work groups. In particular, each work group reduces its contents into a single value and writes
+ * that value to a single pixel in the output image. The shader can be dispatched multiple times to
+ * eventually reduce the image into a single pixel.
+ *
+ * The shader works by loading the data of each work group into a linear array, then reducing the
+ * second half of the array onto the first half, then the second quarter onto the first quarter,
+ * and so on until only one element remains. The following figure illustrates the process for sum
+ * reduction on 8 elements.
+ *
+ *     .---. .---. .---. .---. .---. .---. .---. .---.
+ *     | 0 | | 1 | | 2 | | 3 | | 4 | | 5 | | 6 | | 7 |  Original data.
+ *     '---' '---' '---' '---' '---' '---' '---' '---'
+ *       |.____|_____|_____|_____|     |     |     |
+ *       ||    |.____|_____|___________|     |     |
+ *       ||    ||    |.____|_________________|     |
+ *       ||    ||    ||    |.______________________|  <--First reduction. Stride = 4.
+ *       ||    ||    ||    ||
+ *     .---. .---. .---. .----.
+ *     | 4 | | 6 | | 8 | | 10 |                       <--Data after first reduction.
+ *     '---' '---' '---' '----'
+ *       |.____|_____|     |
+ *       ||    |.__________|                          <--Second reduction. Stride = 2.
+ *       ||    ||
+ *     .----. .----.
+ *     | 12 | | 16 |                                  <--Data after second reduction.
+ *     '----' '----'
+ *       |.____|
+ *       ||                                           <--Third reduction. Stride = 1.
+ *     .----.
+ *     | 28 |
+ *     '----'                                         <--Data after third reduction.
+ *
+ *
+ * The shader is generic enough to implement many types of reductions. This is done through macros
+ * that the developer should define to implement a certain reduction operation, namely TYPE,
+ * IDENTITY, INITIALIZE, LOAD, and REDUCE. See the implementation below for more information, and
+ * see compositor_parallel_reduction_info.hh for example reduction operations. */
+
+/* Doing the reduction in shared memory is faster than reading and writing global memory, so create
+ * a shared array where the data of the whole work group will be loaded and reduced. The 2D
+ * structure of the work group is irrelevant for reduction, so we just load the data in a 1D array
+ * to simplify reduction. The developer is expected to define the TYPE macro to be a float or a
+ * vec4, depending on the type of data being reduced. */
+const uint reduction_size = gl_WorkGroupSize.x * gl_WorkGroupSize.y;
+shared TYPE reduction_data[reduction_size];
+
+void main()
+{
+  /* Load the data from the texture, while returning IDENTITY for out of bound coordinates. The
+   * developer is expected to define the IDENTITY macro to be a vec4 that does not affect the
+   * output of the reduction. For instance, sum reductions have an identity of vec4(0.0), while
+   * max value reductions have an identity of vec4(-FLT_MAX). */
+  vec4 value = texture_load(input_tx, ivec2(gl_GlobalInvocationID.xy), IDENTITY);
+
+  /* Initialize the shared array given the previously loaded value. This step differs depending
+   * on whether this is the initial reduction pass or a later one. Indeed, the input texture of
+   * the initial pass is the source texture itself, while the input texture of a later pass is an
+   * intermediate texture produced after one or more reductions have happened. This is
+   * significant because the reduced data might be computed from the original data and differ
+   * from it. For instance, when summing the luminance of an image, the original data is a vec4
+   * color, while the reduced data is a float luminance value. So for the initial reduction pass,
+   * the luminance is computed from the color, reduced, then stored into an intermediate float
+   * texture. On the other hand, for later reduction passes, the luminance is loaded directly and
+   * reduced without extra processing. So the developer is expected to define the INITIALIZE and
+   * LOAD macros to be expressions that derive the needed value from the loaded value for the
+   * initial reduction pass and later ones respectively. */
+  reduction_data[gl_LocalInvocationIndex] = is_initial_reduction ? INITIALIZE(value) : LOAD(value);
+
+  /* Reduce the reduction data by half on every iteration until only one element remains. See the
+   * above figure for an intuitive understanding of the stride value. */
+  for (uint stride = reduction_size / 2; stride > 0; stride /= 2) {
+    barrier();
+
+    /* Only the threads whose index is less than the current stride should be active, as can be
+     * seen in the diagram above. */
+    if (gl_LocalInvocationIndex >= stride) {
+      continue;
+    }
+
+    /* Reduce each two elements that are stride apart, writing the result to the element with the
+     * lower index, as can be seen in the diagram above. The developer is expected to define the
+     * REDUCE macro to be a commutative and associative binary operator suitable for parallel
+     * reduction. */
+    reduction_data[gl_LocalInvocationIndex] = REDUCE(
+        reduction_data[gl_LocalInvocationIndex], reduction_data[gl_LocalInvocationIndex + stride]);
+  }
+
+  /* Finally, the result of the reduction is available as the first element in the reduction data,
+   * so write it to the pixel corresponding to the work group, making sure only one thread writes
+   * it. */
+  barrier();
+  if (gl_LocalInvocationIndex == 0) {
+    imageStore(output_img, ivec2(gl_WorkGroupID.xy), vec4(reduction_data[0]));
+  }
+}
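The shader logic can be checked without a GPU by emulating a dispatch on the CPU. The following C++ sketch is a hypothetical mirror and not Blender code. `FloatReduction` stands in for the macros with TYPE set to float. `reduction_pass` runs one dispatch serially: it performs each work group's load step, then the halving loop from the figure. `reduce_image` repeats passes until one texel remains. The sketch assumes that an R32F intermediate texture is read back as vec4(value, 0, 0, 1).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <functional>
#include <limits>
#include <vector>

using Vec4 = std::array<float, 4>;

// Hypothetical CPU mirror of the shader macros for TYPE = float.
struct FloatReduction {
  Vec4 identity;                                  // IDENTITY
  std::function<float(const Vec4 &)> initialize;  // INITIALIZE, initial pass only
  std::function<float(const Vec4 &)> load;        // LOAD, later passes
  std::function<float(float, float)> reduce;      // REDUCE
};

struct FloatImage {
  int width;
  int height;
  std::vector<float> texels;  // Row-major.
};

// Emulates one dispatch serially: each group_size x group_size work group writes one texel.
// fetch(x, y) is only called for in-bounds coordinates.
FloatImage reduction_pass(int width, int height, const std::function<Vec4(int, int)> &fetch,
                          bool is_initial_reduction, const FloatReduction &op,
                          int group_size = 16)
{
  const int groups_x = (width + group_size - 1) / group_size;
  const int groups_y = (height + group_size - 1) / group_size;
  const unsigned reduction_size = unsigned(group_size * group_size);
  // Halving the stride only visits every element if the size is a power of two.
  assert((reduction_size & (reduction_size - 1)) == 0);

  FloatImage output{groups_x, groups_y, std::vector<float>(groups_x * groups_y)};
  std::vector<float> reduction_data(reduction_size);  // Stands in for shared memory.
  for (int group_y = 0; group_y < groups_y; group_y++) {
    for (int group_x = 0; group_x < groups_x; group_x++) {
      // Load step: local index i maps to local coordinates (i % group_size, i / group_size).
      for (unsigned i = 0; i < reduction_size; i++) {
        const int x = group_x * group_size + int(i % unsigned(group_size));
        const int y = group_y * group_size + int(i / unsigned(group_size));
        const Vec4 value = (x < width && y < height) ? fetch(x, y) : op.identity;
        reduction_data[i] = is_initial_reduction ? op.initialize(value) : op.load(value);
      }
      // Tree step: fold the upper half of the active range onto the lower half.
      for (unsigned stride = reduction_size / 2; stride > 0; stride /= 2) {
        for (unsigned i = 0; i < stride; i++) {
          reduction_data[i] = op.reduce(reduction_data[i], reduction_data[i + stride]);
        }
      }
      output.texels[group_y * groups_x + group_x] = reduction_data[0];
    }
  }
  return output;
}

// Dispatches passes until one texel remains; later passes read the previous
// float output as vec4(value, 0, 0, 1), like sampling an R32F texture.
float reduce_image(int width, int height, const std::vector<Vec4> &pixels,
                   const FloatReduction &op, int group_size = 16)
{
  FloatImage image = reduction_pass(
      width, height, [&](int x, int y) { return pixels[y * width + x]; }, true, op, group_size);
  while (image.width > 1 || image.height > 1) {
    const FloatImage input = image;
    image = reduction_pass(
        input.width, input.height,
        [&](int x, int y) { return Vec4{input.texels[y * input.width + x], 0.0f, 0.0f, 1.0f}; },
        false, op, group_size);
  }
  return image.texels[0];
}

// Mirrors compositor_sum_red: sums the red channel.
FloatReduction sum_red_reduction()
{
  return {Vec4{0.0f, 0.0f, 0.0f, 0.0f},
          [](const Vec4 &v) { return v[0]; },
          [](const Vec4 &v) { return v[0]; },
          [](float a, float b) { return a + b; }};
}

// Hypothetical max reduction: its identity must be the lowest float, not FLT_MIN.
FloatReduction max_red_reduction()
{
  const float lowest = std::numeric_limits<float>::lowest();
  return {Vec4{lowest, lowest, lowest, lowest},
          [](const Vec4 &v) { return v[0]; },
          [](const Vec4 &v) { return v[0]; },
          [](float a, float b) { return std::max(a, b); }};
}
```

Summing the red channel over the values 0 to 7 reproduces the 28 from the figure. A small `group_size` such as 2 forces several passes on tiny images, which exercises the out-of-bound identity path of the later passes. If `max_red_reduction` used vec4(0.0) as its identity, a max over all-negative data would wrongly return 0.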
diff --git a/source/blender/gpu/shaders/compositor/infos/compositor_parallel_reduction_info.hh b/source/blender/gpu/shaders/compositor/infos/compositor_parallel_reduction_info.hh
new file mode 100644
index 00000000000..2e661f280af
--- /dev/null
+++ b/source/blender/gpu/shaders/compositor/infos/compositor_parallel_reduction_info.hh
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include "gpu_shader_create_info.hh"
+
+GPU_SHADER_CREATE_INFO(compositor_parallel_reduction_shared)
+    .local_group_size(16, 16)
+    .push_constant(Type::BOOL, "is_initial_reduction")
+    .sampler(0, ImageType::FLOAT_2D, "input_tx")
+    .compute_source("compositor_parallel_reduction.glsl");
+
+/* --------------------------------------------------------------------
+ * Sum Reductions.
+ */
+
+GPU_SHADER_CREATE_INFO(compositor_sum_float_shared)
+    .additional_info("compositor_parallel_reduction_shared")
+    .image(0, GPU_R32F, Qualifier::WRITE, ImageType::FLOAT_2D, "output_img")
+    .define("TYPE", "float")
+    .define("IDENTITY", "vec4(0.0)")
+    .define("LOAD(value)", "value.x")
+    .define("REDUCE(lhs, rhs)", "lhs + rhs");
+
+GPU_SHADER_CREATE_INFO(compositor_sum_red)
+    .additional_info("compositor_sum_float_shared")
+    .define("INITIALIZE(value)", "value.r")
+    .do_static_compilation(true);
+
+GPU_SHADER_CREATE_INFO(compositor_sum_green)
+    .additional_info("compositor_sum_float_shared")
+    .define("INITIALIZE(value)", "value.g")
+    .do_static_compilation(true);
+
+GPU_SHADER_CREATE_INFO(compositor_sum_blue)
+    .additional_info("compositor_sum_float_shared")
+    .define("INITIALIZE(value)", "value.b")
+    .do_static_compilation(true);
+
+GPU_SHADER_CREATE_INFO(compositor_sum_luminance)
+    .additional_info("compositor_sum_float_shared")
+    .push_constant(Type::VEC3, "luminance_coefficients")
+    .define("INITIALIZE(value)", "dot(value.rgb, luminance_coefficients)")
+    .do_static_compilation(true);
+
+/* --------------------------------------------------------------------
+ * Sum Of Squared Difference Reductions.
+ */
+
+GPU_SHADER_CREATE_INFO(compositor_sum_squared_difference_float_shared)
+    .additional_info("compositor_parallel_reduction_shared")
+    .image(0, GPU_R32F, Qualifier::WRITE, ImageType::FLOAT_2D, "output_img")
+    .push_constant(Type::FLOAT, "subtrahend")
+    .define("TYPE", "float")
+    .define("IDENTITY", "vec4(is_initial_reduction ? subtrahend : 0.0)")
+    .define("LOAD(value)", "value.x")
+    .define("REDUCE(lhs, rhs)", "lhs + rhs");
+
+GPU_SHADER_CREATE_INFO(compositor_sum_red_squared_difference)
+    .additional_info("compositor_sum_squared_difference_float_shared")
+    .define("INITIALIZE(value)", "pow(abs(value.r - subtrahend), 2.0)")
+    .do_static_compilation(true);
+
+GPU_SHADER_CREATE_INFO(compositor_sum_green_squared_difference)
+    .additional_info("compositor_sum_squared_difference_float_shared")
+    .define("INITIALIZE(value)", "pow(abs(value.g - subtrahend), 2.0)")
+    .do_static_compilation(true);
+
+GPU_SHADER_CREATE_INFO(compositor_sum_blue_squared_difference)
+    .additional_info("compositor_sum_squared_difference_float_shared")
+    .define("INITIALIZE(value)", "pow(abs(value.b - subtrahend), 2.0)")
+    .do_static_compilation(true);
+
+GPU_SHADER_CREATE_INFO(compositor_sum_luminance_squared_difference)
+    .additional_info("compositor_sum_squared_difference_float_shared")
+    .push_constant(Type::VEC3, "luminance_coefficients")
+    .define("INITIALIZE(value)", "pow(abs(dot(value.rgb, luminance_coefficients) - subtrahend), 2.0)")
+    .do_static_compilation(true);
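The two families of reductions above supply what a mean and standard deviation computation needs. A sum divided by the pixel count gives the mean. A sum of squared differences, with `subtrahend` set to that mean and divided by the count, gives the variance. The Levels operation's host code is not part of this diff, so the following C++ sketch only illustrates the arithmetic serially on the CPU. `luminance_statistics` is a hypothetical name, and dividing by N (population variance) is an assumption. The Rec. 709 coefficients in the test are assumed values for `luminance_coefficients`.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Color = std::array<float, 4>;  // RGBA, like the vec4 the shader loads.

struct MeanAndDeviation {
  float mean;
  float standard_deviation;
};

// Mirrors INITIALIZE of compositor_sum_luminance: dot(value.rgb, luminance_coefficients).
float luminance(const Color &color, const std::array<float, 3> &coefficients)
{
  return color[0] * coefficients[0] + color[1] * coefficients[1] + color[2] * coefficients[2];
}

// Hypothetical helper. On the GPU, each of the two sums below would be one full
// parallel reduction; here they are plain serial loops accumulated in double.
MeanAndDeviation luminance_statistics(const std::vector<Color> &pixels,
                                      const std::array<float, 3> &coefficients)
{
  assert(!pixels.empty());
  const double count = double(pixels.size());

  // compositor_sum_luminance: the mean is the sum divided by the pixel count.
  double sum = 0.0;
  for (const Color &color : pixels) {
    sum += luminance(color, coefficients);
  }
  const double mean = sum / count;

  // compositor_sum_luminance_squared_difference with subtrahend = mean.
  double squared_difference_sum = 0.0;
  for (const Color &color : pixels) {
    const double difference = luminance(color, coefficients) - mean;
    squared_difference_sum += difference * difference;
  }

  // Population variance (divide by N) is assumed here.
  return {float(mean), float(std::sqrt(squared_difference_sum / count))};
}
```

For one black pixel and one white pixel, both the mean and the standard deviation come out as 0.5, up to rounding in the coefficient sum.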