diff options
author | Howard Trickey <howard.trickey@gmail.com> | 2022-10-24 20:33:11 +0300 |
---|---|---|
committer | Howard Trickey <howard.trickey@gmail.com> | 2022-10-24 20:33:11 +0300 |
commit | a41a1bfc494e4015406549e137114ef5a450aaf0 (patch) | |
tree | dbdc95584f91aded4b777bac30074f9f78d8c89c /source/blender/gpu/shaders/compositor/compositor_parallel_reduction.glsl | |
parent | fc8f9e420426570dcb3e026ecbe8145cd0fae5ca (diff) | |
parent | 53795877727d67185de858a480c8090ca7eb8e36 (diff) |
Merge branch 'master' into bevelv2
Diffstat (limited to 'source/blender/gpu/shaders/compositor/compositor_parallel_reduction.glsl')
-rw-r--r-- | source/blender/gpu/shaders/compositor/compositor_parallel_reduction.glsl | 98 |
1 files changed, 98 insertions, 0 deletions
diff --git a/source/blender/gpu/shaders/compositor/compositor_parallel_reduction.glsl b/source/blender/gpu/shaders/compositor/compositor_parallel_reduction.glsl new file mode 100644 index 00000000000..f6f84aa24c1 --- /dev/null +++ b/source/blender/gpu/shaders/compositor/compositor_parallel_reduction.glsl @@ -0,0 +1,98 @@ +#pragma BLENDER_REQUIRE(gpu_shader_compositor_texture_utilities.glsl) + +/* This shader reduces the given texture into a smaller texture of a size equal to the number of + * work groups. In particular, each work group reduces its contents into a single value and writes + * that value to a single pixel in the output image. The shader can be dispatched multiple times to + * eventually reduce the image into a single pixel. + * + * The shader works by loading the whole data of each work group into a linear array, then it + * reduces the second half of the array onto the first half of the array, then it reduces the + * second quarter of the array onto the first quarter or the array, and so on until only one + * element remains. The following figure illustrates the process for sum reduction on 8 elements. + * + * .---. .---. .---. .---. .---. .---. .---. .---. + * | 0 | | 1 | | 2 | | 3 | | 4 | | 5 | | 6 | | 7 | Original data. + * '---' '---' '---' '---' '---' '---' '---' '---' + * |.____|_____|_____|_____| | | | + * || |.____|_____|___________| | | + * || || |.____|_________________| | + * || || || |.______________________| <--First reduction. Stride = 4. + * || || || || + * .---. .---. .---. .----. + * | 4 | | 6 | | 8 | | 10 | <--Data after first reduction. + * '---' '---' '---' '----' + * |.____|_____| | + * || |.__________| <--Second reduction. Stride = 2. + * || || + * .----. .----. + * | 12 | | 16 | <--Data after second reduction. + * '----' '----' + * |.____| + * || <--Third reduction. Stride = 1. + * .----. + * | 28 | + * '----' <--Data after third reduction. + * + * + * The shader is generic enough to implement many types of reductions. This is done by using macros + * that the developer should define to implement a certain reduction operation. Those include, + * TYPE, IDENTITY, INITIALIZE, LOAD, and REDUCE. See the implementation below for more information + * as well as the compositor_parallel_reduction_info.hh for example reductions operations. */ + +/* Doing the reduction in shared memory is faster, so create a shared array where the whole data + * of the work group will be loaded and reduced. The 2D structure of the work group is irrelevant + * for reduction, so we just load the data in a 1D array to simplify reduction. The developer is + * expected to define the TYPE macro to be a float or a vec4, depending on the type of data being + * reduced. */ +const uint reduction_size = gl_WorkGroupSize.x * gl_WorkGroupSize.y; +shared TYPE reduction_data[reduction_size]; + +void main() +{ + /* Load the data from the texture, while returning IDENTITY for out of bound coordinates. The + * developer is expected to define the IDENTITY macro to be a vec4 that does not affect the + * output of the reduction. For instance, sum reductions have an identity of vec4(0.0), while + * max value reductions have an identity of vec4(FLT_MIN). */ + vec4 value = texture_load(input_tx, ivec2(gl_GlobalInvocationID.xy), IDENTITY); + + /* Initialize the shared array given the previously loaded value. This step can be different + * depending on whether this is the initial reduction pass or a latter one. Indeed, the input + * texture for the initial reduction is the source texture itself, while the input texture to a + * latter reduction pass is an intermediate texture after one or more reductions have happened. + * This is significant because the data being reduced might be computed from the original data + * and different from it, for instance, when summing the luminance of an image, the original data + * is a vec4 color, while the reduced data is a float luminance value. So for the initial + * reduction pass, the luminance will be computed from the color, reduced, then stored into an + * intermediate float texture. On the other hand, for latter reduction passes, the luminance will + * be loaded directly and reduced without extra processing. So the developer is expected to + * define the INITIALIZE and LOAD macros to be expressions that derive the needed value from the + * loaded value for the initial reduction pass and latter ones respectively. */ + reduction_data[gl_LocalInvocationIndex] = is_initial_reduction ? INITIALIZE(value) : LOAD(value); + + /* Reduce the reduction data by half on every iteration until only one element remains. See the + * above figure for an intuitive understanding of the stride value. */ + for (uint stride = reduction_size / 2; stride > 0; stride /= 2) { + barrier(); + + /* Only the threads up to the current stride should be active as can be seen in the diagram + * above. */ + if (gl_LocalInvocationIndex >= stride) { + continue; + } + + /* Reduce each two elements that are stride apart, writing the result to the element with the + * lower index, as can be seen in the diagram above. The developer is expected to define the + * REDUCE macro to be a commutative and associative binary operator suitable for parallel + * reduction. */ + reduction_data[gl_LocalInvocationIndex] = REDUCE( + reduction_data[gl_LocalInvocationIndex], reduction_data[gl_LocalInvocationIndex + stride]); + } + + /* Finally, the result of the reduction is available as the first element in the reduction data, + * write it to the pixel corresponding to the work group, making sure only the one thread writes + * it. */ + barrier(); + if (gl_LocalInvocationIndex == 0) { + imageStore(output_img, ivec2(gl_WorkGroupID.xy), vec4(reduction_data[0])); + } +} |