This replaces the previous square rings approach by sampling a disk
covering the footprint of the search area. This avoids sampling the
corners, where there isn't any weight, and results in far fewer samples
needed to achieve a good enough result.
The maximum number of samples for an 11x11 px area is hard-coded to 16
and still gives good results when combined with the final clamp.
The number of samples is adaptive and scales with the search area (max
CoC).
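The disk sampling with an adaptive count can be sketched as follows. This is a minimal illustration, not the actual shader: it assumes an 11x11 px search area maps to a radius of 5.5 px, assumes the count scales linearly with the disk area (radius = max CoC), and swaps in a golden-angle (Vogel) spiral as the disk pattern. `sample_count`, `disk_samples`, `MAX_SAMPLES` and `MAX_SEARCH_RADIUS` are hypothetical names.

```python
import math

MAX_SAMPLES = 16         # hard-coded cap from the description above
MAX_SEARCH_RADIUS = 5.5  # assumption: an 11x11 px search area ~ radius 5.5 px


def sample_count(max_coc: float) -> int:
    """Hypothetical rule: scale the sample count with the disk area."""
    r = min(max(max_coc, 0.0), MAX_SEARCH_RADIUS)
    area_ratio = (r * r) / (MAX_SEARCH_RADIUS * MAX_SEARCH_RADIUS)
    # Always take at least one sample, never more than the cap.
    return max(1, math.ceil(MAX_SAMPLES * area_ratio))


def disk_samples(max_coc: float) -> list[tuple[float, float]]:
    """Golden-angle spiral covering the disk; no samples land in square corners."""
    n = sample_count(max_coc)
    radius = min(max(max_coc, 0.0), MAX_SEARCH_RADIUS)
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        # sqrt() keeps the sample density uniform over the disk area.
        rho = radius * math.sqrt((i + 0.5) / n)
        theta = i * golden_angle
        points.append((rho * math.cos(theta), rho * math.sin(theta)))
    return points
```

For scale: a disk of radius 5.5 px covers about 95 px², versus 121 px² for the 11x11 square, so roughly 21% of the square (its corners) is never sampled.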

The High Quality Slight Defocus option is no longer required. If a
quality parameter were to be added, it would be a sample count option.
However, I consider the temporal stability sufficient for viewport work,
and final renders can still accumulate many full scene samples, so I
don't see a need for it yet.

This adds an anti-flicker pass to the slight focus region by using the
temporally stable output of the stabilize pass.
This also fixes the bilateral weight factor, which was reversed.
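The expected orientation of a bilateral weight can be sketched as below. This is an assumption-laden illustration, not the actual fix: it assumes the weight is driven by the CoC difference between the center pixel and a neighbor, with a linear falloff. The point is only the direction: similar CoC gives a weight near 1, and the weight drops toward 0 as the difference grows. `bilateral_weight`, `bilateral_filter` and `sharpness` are hypothetical.

```python
def bilateral_weight(center_coc: float, sample_coc: float,
                     sharpness: float = 1.0) -> float:
    """Hypothetical CoC-based weight: 1 for identical CoC, falling linearly to 0."""
    diff = abs(center_coc - sample_coc)
    return max(0.0, 1.0 - diff * sharpness)


def bilateral_filter(center_coc: float,
                     samples: list[tuple[float, float]]) -> float:
    """Weighted average of (coc, value) pairs using bilateral_weight()."""
    total_weight = 0.0
    accum = 0.0
    for coc, value in samples:
        w = bilateral_weight(center_coc, coc)
        accum += w * value
        total_weight += w
    # Guard against every sample being rejected.
    return accum / total_weight if total_weight > 0.0 else 0.0
```

With this orientation, a neighbor whose CoC differs strongly from the center contributes nothing, so it cannot bleed into the filtered result.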

This is a port of the previous implementation to compute shaders,
instead of using the raster pipeline for every step.
Only the scatter passes are kept as raster passes, for obvious
performance reasons.
Many steps have been rewritten to take advantage of LDS (shared memory),
which allows faster and simpler downsampling and filtering in some
passes.
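The LDS-based downsampling idea can be emulated on the CPU like this. It is a sketch under stated assumptions, not the shader code: a workgroup loads one square, power-of-two tile into shared memory once, then produces several 2x2-averaged levels from it without going back to a texture between levels. The list `shared` stands in for a GLSL `shared` array, and `downsample_tile_lds` is a hypothetical name.

```python
def downsample_tile_lds(tile: list[list[float]],
                        levels: int) -> list[list[list[float]]]:
    """Emulate one workgroup producing several box-filtered levels from LDS."""
    # One load from "global memory" into "shared memory".
    shared = [row[:] for row in tile]
    mips = []
    for _ in range(levels):
        size = len(shared) // 2
        # In a shader, each thread would write one output texel here,
        # followed by a barrier() before the next level reads `shared`.
        shared = [
            [
                0.25 * (shared[2 * y][2 * x] + shared[2 * y][2 * x + 1]
                        + shared[2 * y + 1][2 * x] + shared[2 * y + 1][2 * x + 1])
                for x in range(size)
            ]
            for y in range(size)
        ]
        mips.append(shared)
    return mips
```

The saving comes from reading the source tile once per workgroup instead of once per downsampling pass.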
A new stabilize pass has been split out of another setup pass so that
it can be improved in the future with better stabilization.
The scatter pass shaders and pipeline have also changed. We now use
indirect draw calls to draw quads as triangle strip primitives. This
reduces the fragment shader invocation count and overdraw compared to a
bounding triangle. It also drastically reduces the number of vertex
shader invocations to the bare minimum, instead of always having 3
vertices per 4 pixels (for each ground: foreground and background).
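The indirect quad draw and the vertex-count arithmetic can be sketched as follows. Assumptions: each 4-pixel group that actually needs scattering emits one quad instance, the command uses the field order of OpenGL's `DrawArraysIndirectCommand` (count, instanceCount, first, baseInstance), and a setup pass fills `instance_count` (an `atomicAdd` in a real shader). `build_scatter_draw` and the two `vertex_invocations_*` helpers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DrawArraysIndirectCommand:
    # Same field order as OpenGL's DrawArraysIndirectCommand.
    count: int = 4           # 4 vertices: one quad as a 2-triangle strip
    instance_count: int = 0  # filled by the setup pass
    first: int = 0
    base_instance: int = 0


def build_scatter_draw(
        scatter_flags: list[bool]) -> tuple[DrawArraysIndirectCommand, list[int]]:
    """One flag per 4-pixel group; only flagged groups emit a quad instance."""
    cmd = DrawArraysIndirectCommand()
    quads = []
    for group_index, needs_scatter in enumerate(scatter_flags):
        if needs_scatter:
            quads.append(group_index)
            cmd.instance_count += 1
    return cmd, quads


def vertex_invocations_old(pixel_count: int) -> int:
    # Previous approach: always 3 vertices per 4 pixels.
    return 3 * (pixel_count // 4)


def vertex_invocations_new(cmd: DrawArraysIndirectCommand) -> int:
    # Indirect draw: 4 vertices per quad actually emitted.
    return cmd.count * cmd.instance_count
```

For example, if 10 out of 100 groups (400 pixels) scatter, the indirect draw runs 40 vertex invocations instead of 300. The win depends on only a fraction of groups scattering: if every group scattered, 4 vertices per quad would exceed 3 per group.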