Age | Commit message (Collapse) | Author |
|
With the new volume rendering code this was no longer accurate, we always
need to use a new dimension for the next volume segment.
|
|
|
|
For baking, replace transparent BSDF with holdout for baking. This ensure no
objects behind are baked, and that the baked image has alpha.
|
|
MSL requires that constant address space literals be declared at program
scope. This patch moves the `blackbody_table_r/g/b` and `cie_colour_match`
constants into separate files so they can be declared at the appropriate scope.
Ref T92212
Differential Revision: https://developer.blender.org/D13241
|
|
This patch contains many small leftover fixes and additions that are
required for Metal-enablement:
- Address space fixes and a few other small compile fixes
- Addition of missing functionality to the Metal adapter headers
- Addition of various scattered `__KERNEL_METAL__` blocks (e.g. for
atomic support & maths functions)
Ref T92212
Differential Revision: https://developer.blender.org/D13263
|
|
|
|
Build HIP kernels with NanoVDB, and patch NanoVDB to work with HIP.
This is a header only library so no rebuild is needed. The changes are being
submitted upstream to openvdb, so this patch should be temporary.
Thanks Thomas for help testing this.
|
|
|
|
This patch adds a CMake option "WITH_CYCLES_DEBUG" which builds cycles with
a feature that allows debugging/selecting the direct-light sampling strategy.
The same option may later be used to add other debugging features that could
affect performance in release builds.
The three options are:
* Forward path tracing (e.g., via BSDF or phase function)
* Next-event estimation
* Multiple importance sampling combination of the previous two methods
Such a feature is useful for debugging light different sampling, evaluation,
and pdf methods (e.g., for light sources and BSDFs).
Differential Revision: https://developer.blender.org/D13152
|
|
Depends on D13243
Differential Revision: https://developer.blender.org/D13244
|
|
Introduce a packed_float3 type for smaller storage that is exactly 3
floats, instead of 4. For computation float3 is still used since it can
use SIMD instructions.
Ref T92212
Differential Revision: https://developer.blender.org/D13243
|
|
|
|
|
|
This patch adapts the existing volumetric read/write lambda functions for Metal. Lambda expressions are not supported on MSL, so two new macros `VOLUME_READ_LAMBDA` and `VOLUME_WRITE_LAMBDA` have been defined with a default implementation which, on Metal, is overridden to use inline function objects.
This patch also removes the last remaining mention of the now-unused `ccl_addr_space`.
Ref T92212
Reviewed By: leesonw
Maniphest Tasks: T92212
Differential Revision: https://developer.blender.org/D13234
|
|
|
|
|
|
|
|
We need to increase GPU memory usage a bit. Unfortunately we can't get away
with writing either reflection or transmission passes because these BSDFs may
scatter in either direction but still must be in a fixed reflection or
transmission category to match up with the color passes.
|
|
|
|
|
|
The issue was caused by splitting happening twice.
Fixed by checking for split flag which is assigned to the both states
during split.
The tricky part was to write catcher data at the moment of split: the
transparency and shadow catcher sample count is to be accumulated at
that point. Now it is happening in the `intersect_closest` kernel.
The downside is that render buffer is to be passed to the kernel, but
the benefit is that extra split bounce check is not needed now.
Had to move the passes write to shadow catcher header, since include
of `film/passes.h` causes all the fun of requirement to have BSDF
data structures available.
Differential Revision: https://developer.blender.org/D13177
|
|
This patch exposes the sampling offset option to Blender. It is located in the "Sampling > Advanced" panel.
For example, this can be useful to parallelize rendering and distribute different chunks of samples for each computer to render.
---
I also had to add this option to `RenderWork` and `RenderScheduler` classes so that the sample count in the status string can be calculated correctly.
Reviewed By: leesonw
Differential Revision: https://developer.blender.org/D13086
|
|
|
|
It's unclear why this fails. Maybe the size of half4 is not the expected
8 bytes and adjacent pixels are overwritten. Or there is some bug in the
HIP compiler writing a struct into global memory, which we probably don't
do elsewhere in the kernel.
Thanks to Thomas, William and Jeroen for helping investigate this.
|
|
rB3a4c8f406a3a3bf0627477c6183a594fa707a6e2 changed the macros that create the film
convert kernel entry points, but in the process accidentally changed the parameter definition
to one of those (which caused CUDA launch and misaligned address errors) and changed the
implementation as well. This restores the correct implementation from before.
In addition, the `ccl_gpu_kernel_threads` macro did not work as intended and caused the
generated launch bounds to end up with an incorrect input for the second parameter (it was
set to "thread_num_registers", rather than the result of the block number calculation). I'm
not entirely sure why, as the macro definition looked sound to me. Decided to simply go with
two separate macros instead, to simplify and solve this.
Also changed how state is captured with the `ccl_gpu_kernel_lambda` macro slightly, to avoid
a compiler warning (expression has no effect) that otherwise occurred.
Maniphest Tasks: T92985
Differential Revision: https://developer.blender.org/D13175
|
|
This patch adapts the shared kernel entrypoints so that they can be compiled as MSL (Metal Shading Language). Where possible, the adaptations avoid changes in common code.
In MSL, kernel function inputs are explicitly bound to resources. In the case of argument buffers, we declare a struct containing the kernel arguments, accessible via device pointer. This differs from CUDA and HIP where kernel function arguments are declared as traditional C-style function parameters. This patch adapts the entrypoints declared in kernel.h so that they can be translated via a new `ccl_gpu_kernel_signature` macro into the required parameter struct + kernel entrypoint pairing for MSL.
MSL buffer attribution must be applied to function parameters or non-static class data members. To allow universal access to the integrator state, kernel data, and texture fetch adapters, we wrap all of the shared kernel code in a `MetalKernelContext` class. This is achieved by bracketing the appropriate kernel headers with "context_begin.h" and "context_end.h" on Metal. When calling deeper into the kernel code, we must reference the context class (e.g. `context.integrator_init_from_camera`). This extra prefixing is performed by a set of defines in "context_end.h". These will require explicit maintenance if entrypoints change. We invite discussion on more maintainable ways to enforce correctness.
Lambda expressions are not supported on MSL, so a new `ccl_gpu_kernel_lambda` macro generates an inline function object and optionally capturing any required state. This yields the same behaviour. This approach is applied to all parallel_... implementations which are templated by operation. The lambda expressions in the film_convert... kernels don't adapt cleanly to use function objects. However, these entrypoints can be macro-generated more concisely to avoid lambda expressions entirely, instead relying on constant folding to handle the pixel/channel conversions.
A separate implementation of `gpu_parallel_active_index_array` is provided for Metal to workaround some subtle differences in SIMD width, and also to encapsulate some required thread parameters which must be declared as explicit entrypoint function parameters.
Ref T92212
Reviewed By: brecht
Maniphest Tasks: T92212
Differential Revision: https://developer.blender.org/D13109
|
|
Thanks to Ilja Razinkov for identifying the problem and solution.
|
|
is destroyed
Adds a pass before denoising that calculates the intensity of the image, which can be
passed into the OptiX denoiser for more optimal results for very dark or very bright images.
In addition this also fixes a crash that sometimes occurred on exit. The OptiX denoiser object
has to be destroyed before the OptiX device context object (since it references that). But in
C++ the destructor function of a class is called before its fields are destructed, so
"~OptiXDevice" was always called before "OptiXDevice::~Denoiser" and therefore
"optixDeviceContextDestroy" was called before "optixDenoiserDestroy", hence the crash.
Differential Revision: https://developer.blender.org/D13160
|
|
|
|
Adds a bunch of CPU kernel function to process on row of pixels, and use those
instead of calling unoptimized implementations.
Fixes T92598
|
|
Changes:
* After hitting a shadow catcher, re-initialize the volume stack taking
into account shadow catcher ray visibility. This ensures that volume objects
are included in the stack only if they are shadow catchers.
* If there is a volume to be shaded in front of the shadow catcher, the split
is now performed in the shade_volume kernel after volume shading is done.
* Previously the background pass behind a shadow catcher was done as part of
the regular path, now it is done as part of the shadow catcher path.
For a shadow catcher path with volumes and visible background, operations are
done in this order now:
* intersect_closest
* shade_volume
* shadow catcher split
* intersect_volume_stack
* shade_background
* shade_surface
The world volume is currently assumed to be CG, that is it does not exist in
the footage. We may consider adding an option to control this, or change the
default. With a volume object this control is already possible.
This includes refactoring to centralize the logic for next kernel scheduling
in intersect_closest.h.
Differential Revision: https://developer.blender.org/D13093
|
|
|
|
Can't cast to float4 because it might not have correct alignment.
|
|
volume
|
|
We need to store the continuation probability used to make the termination
decision in intersect_closest, instead of recomputing it in shade_surface.
Because otherwise a shade_volume in between can change the throughput and
change the probability.
|
|
We run into float precision issues here, clamp the number of octaves to
one less, which has little to no visual difference. This was empirically
determined to work up to 16 before, but with additional inputs like
roughness only 15 appears to work.
Also adds misisng clamp for the geometry nodes implementation.
|
|
Differential Revision: https://developer.blender.org/D13039
|
|
|
|
|
|
|
|
Adds scrambling distance to the PMJ sampler. This is based
on the work by Mathieu Menuet in D12318 who created the original
implementation for the Sobol sampler.
Reviewed By: brecht
Maniphest Tasks: T92181
Differential Revision: https://developer.blender.org/D12854
|
|
saturate is depricated in favour of __saturatef this replaces saturate
with __saturatef on CUDA by createing a saturatef function which replaces
all instances of saturate and are hooked up to the correct function on all
platforms.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D13010
|
|
|
|
|
|
Cycles:Distance Scrambling for Cycles Sobol Sampler
This option implements micro jittering an is based on the INRIA
research paper [[ https://hal.inria.fr/hal-01325702/document | on micro jittering ]]
and work by Lukas Stockner for implementing the scrambling distance.
It works by controlling the correlation between pixels by either using
a user supplied value or an adaptive algorithm to limit the maximum
deviation of the sample values between pixels.
This is a follow up of https://developer.blender.org/D12316
The PMJ version can be found here: https://developer.blender.org/D12511
Reviewed By: leesonw
Differential Revision: https://developer.blender.org/D12318
|
|
Remove prefix of filenames that is the same as the folder name. This used
to help when #includes were using individual files, but now they are always
relative to the cycles root directory and so the prefixes are redundant.
For patches and branches, git merge and rebase should be able to detect the
renames and move over code to the right file.
|
|
* Split render/ into scene/ and session/. The scene/ folder now contains the
scene and its nodes. The session/ folder contains the render session and
associated data structures like drivers and render buffers.
* Move top level kernel headers into new folders kernel/camera/, kernel/film/,
kernel/light/, kernel/sample/, kernel/util/
* Move integrator related kernel headers into kernel/integrator/
* Move OSL shaders from kernel/shaders/ to kernel/osl/shaders/
For patches and branches, git merge and rebase should be able to detect the
renames and move over code to the right file.
|
|
Add a Fast GI Method, either Replace for the existing behavior, or Add
to add ambient occlusion like the old world settings.
This replaces the old Ambient Occlusion settings in the world properties.
|
|
This is still useful in some cases even if not used by OpenImageDenoise. In
the future this may be replaced with a more generic system to control render
passes and filtering, but for now this just does what it did before.
|
|
Similar to the Depth, for compositing the interpolated values between a far
and near object can be non-sensical.
|