Age | Commit message (Collapse) | Author |
|
|
|
|
|
|
|
We need to increase GPU memory usage a bit. Unfortunately we can't get away
with writing either reflection or transmission passes because these BSDFs may
scatter in either direction but still must be in a fixed reflection or
transmission category to match up with the color passes.
|
|
|
|
|
|
Partially reverts commit rB440a3475b8f5410e5c41bfbed5ce82771b41356f because
"optixDenoiserComputeIntensity" does not currently support input images that are not packed (the
"pixelStrideInBytes" field is not zero). As a result the intensity calculation would take into account
data from other passes in the image, some of which was scaled by the number of samples still and
therefore produce widely incorrect results that then caused artifacts in the denoised image.
Maniphest Tasks: T93029
|
|
|
|
|
|
|
|
The root of the problem lies in bug in OIIO which we can work around
from our side (which does not affect pack memory usage).
Thanks Brecht for finding the root cause!
Differential Revision: https://developer.blender.org/D13186
|
|
Adds a method to profiler that can be used to check if it is active.
This is used to determine if stop_profiling and start_profiling
should be called.
| patch | Juans Scene UI 256 samples | Juans Scene bg 256 samples | junkshop UI | junkshop bg |
| No patch | 6:16.59 | 4:05.37 | 2:08.48 | 1:59.7 |
| D13187 | 4:12.15 | 3:57.36 | 2:07.25 | 1:58.16 |
| D13185 | 4.11.18 |3:54.74 | 2:07.44 | 1:58.03 |
| D13190 | 4:12.39 | 3:55.42 | 2:07.62 | 1:58.68 |
UI - means rendered from within Blender
bg - means rendered from the command line using ##blender -b scene.blend -f 1##
Reviewed By: sergey, brecht
Maniphest Tasks: T92601
Differential Revision: https://developer.blender.org/D13190
|
|
Adds a method to profiler that can be used to check if it is active.
This is used to determine if stop_profiling and start_profiling
should be called.
| patch | Juans Scene UI 256 samples | Juans Scene bg 256 samples | junkshop UI | junkshop bg |
| No patch | 6:16.59 | 4:05.37 | 2:08.48 | 1:59.7 |
| D13187 | 4:12.15 | 3:57.36 | 2:07.25 | 1:58.16 |
| D13185 | 4.11.18 |3:54.74 | 2:07.44 | 1:58.03 |
| D13190 | 4:12.39 | 3:55.42 | 2:07.62 | 1:58.68 |
UI - means rendered from within Blender
bg - means rendered from the command line using ##blender -b scene.blend -f 1##
Reviewed By: sergey, brecht
Maniphest Tasks: T92601
Differential Revision: https://developer.blender.org/D13190
|
|
GHOST API only has a header definition. No implementation or usage.
|
|
|
|
|
|
Remove outdated CUDA comments for bindless textures and cleanup some HIP comments that still mentioned CUDA.
Differential Revision: https://developer.blender.org/D13189
|
|
|
|
The issue was caused by splitting happening twice.
Fixed by checking for split flag which is assigned to the both states
during split.
The tricky part was to write catcher data at the moment of split: the
transparency and shadow catcher sample count is to be accumulated at
that point. Now it is happening in the `intersect_closest` kernel.
The downside is that render buffer is to be passed to the kernel, but
the benefit is that extra split bounce check is not needed now.
Had to move the passes write to shadow catcher header, since include
of `film/passes.h` causes all the fun of requirement to have BSDF
data structures available.
Differential Revision: https://developer.blender.org/D13177
|
|
This patch exposes the sampling offset option to Blender. It is located in the "Sampling > Advanced" panel.
For example, this can be useful to parallelize rendering and distribute different chunks of samples for each computer to render.
---
I also had to add this option to `RenderWork` and `RenderScheduler` classes so that the sample count in the status string can be calculated correctly.
Reviewed By: leesonw
Differential Revision: https://developer.blender.org/D13086
|
|
|
|
Also cleanup some related code, that was falsely copied from CUDA.
Differential Revision: https://developer.blender.org/D13180
|
|
|
|
This is due to a driver bug, so disable it for now until it gets resolved
in a future driver release.
Ref T92972
Differential Revision: https://developer.blender.org/D13167
|
|
It's unclear why this fails. Maybe the size of half4 is not the expected
8 bytes and adjacent pixels are overwritten. Or there is some bug in the
HIP compiler writing a struct into global memory, which we probably don't
do elsewhere in the kernel.
Thanks to Thomas, William and Jeroen for helping investigate this.
|
|
|
|
rB3a4c8f406a3a3bf0627477c6183a594fa707a6e2 changed the macros that create the film
convert kernel entry points, but in the process accidentally changed the parameter definition
to one of those (which caused CUDA launch and misaligned address errors) and changed the
implementation as well. This restores the correct implementation from before.
In addition, the `ccl_gpu_kernel_threads` macro did not work as intended and caused the
generated launch bounds to end up with an incorrect input for the second parameter (it was
set to "thread_num_registers", rather than the result of the block number calculation). I'm
not entirely sure why, as the macro definition looked sound to me. Decided to simply go with
two separate macros instead, to simplify and solve this.
Also changed how state is captured with the `ccl_gpu_kernel_lambda` macro slightly, to avoid
a compiler warning (expression has no effect) that otherwise occurred.
Maniphest Tasks: T92985
Differential Revision: https://developer.blender.org/D13175
|
|
|
|
The issue was that the `object_is_geometry` method was used in two different
contexts that expected the function to behave differently. So a recent change
that fixed `object_is_geometry` for one context, broke it for the other context.
The two contexts are:
* Check if a "real" object can contain a geometry to check if it has to be tagged
for sync after an update.
* Check if an object/instance actually is a geometry that cycles can work with.
I created a new `object_can_have_geometry` method for the first use case, instead
of trying to adapt the existing object_is_geometry method to serve both uses.
Additionally, I changed it so that a BObjectInfo is passed into `object_is_geometry`
to make it more explicit when this method is supposed to be used.
Differential Revision: https://developer.blender.org/D13135
|
|
This patch adapts the shared kernel entrypoints so that they can be compiled as MSL (Metal Shading Language). Where possible, the adaptations avoid changes in common code.
In MSL, kernel function inputs are explicitly bound to resources. In the case of argument buffers, we declare a struct containing the kernel arguments, accessible via device pointer. This differs from CUDA and HIP where kernel function arguments are declared as traditional C-style function parameters. This patch adapts the entrypoints declared in kernel.h so that they can be translated via a new `ccl_gpu_kernel_signature` macro into the required parameter struct + kernel entrypoint pairing for MSL.
MSL buffer attribution must be applied to function parameters or non-static class data members. To allow universal access to the integrator state, kernel data, and texture fetch adapters, we wrap all of the shared kernel code in a `MetalKernelContext` class. This is achieved by bracketing the appropriate kernel headers with "context_begin.h" and "context_end.h" on Metal. When calling deeper into the kernel code, we must reference the context class (e.g. `context.integrator_init_from_camera`). This extra prefixing is performed by a set of defines in "context_end.h". These will require explicit maintenance if entrypoints change. We invite discussion on more maintainable ways to enforce correctness.
Lambda expressions are not supported on MSL, so a new `ccl_gpu_kernel_lambda` macro generates an inline function object and optionally capturing any required state. This yields the same behaviour. This approach is applied to all parallel_... implementations which are templated by operation. The lambda expressions in the film_convert... kernels don't adapt cleanly to use function objects. However, these entrypoints can be macro-generated more concisely to avoid lambda expressions entirely, instead relying on constant folding to handle the pixel/channel conversions.
A separate implementation of `gpu_parallel_active_index_array` is provided for Metal to workaround some subtle differences in SIMD width, and also to encapsulate some required thread parameters which must be declared as explicit entrypoint function parameters.
Ref T92212
Reviewed By: brecht
Maniphest Tasks: T92212
Differential Revision: https://developer.blender.org/D13109
|
|
|
|
|
|
|
|
|
|
Thanks to Ilja Razinkov for identifying the problem and solution.
|
|
|
|
is destroyed
Adds a pass before denoising that calculates the intensity of the image, which can be
passed into the OptiX denoiser for more optimal results for very dark or very bright images.
In addition this also fixes a crash that sometimes occurred on exit. The OptiX denoiser object
has to be destroyed before the OptiX device context object (since it references that). But in
C++ the destructor function of a class is called before its fields are destructed, so
"~OptiXDevice" was always called before "OptiXDevice::~Denoiser" and therefore
"optixDeviceContextDestroy" was called before "optixDenoiserDestroy", hence the crash.
Differential Revision: https://developer.blender.org/D13160
|
|
newer on Linux
Adds a workaround for a driver bug in r495 that causes artifacts with OptiX denoising.
`optixDenoiserSetup` is not working properly there when called with a stream other than the
default stream, so use the default stream for now and force synchronization across the entire
context afterwards to ensure the other stream Cycles uses to enqueue the actual denoising
command cannot execute before the denoising setup has finished.
Maniphest Tasks: T92472
Differential Revision: https://developer.blender.org/D13158
|
|
|
|
|
|
|
|
|
|
|
|
Adds a bunch of CPU kernel function to process on row of pixels, and use those
instead of calling unoptimized implementations.
Fixes T92598
|
|
Changes:
* After hitting a shadow catcher, re-initialize the volume stack taking
into account shadow catcher ray visibility. This ensures that volume objects
are included in the stack only if they are shadow catchers.
* If there is a volume to be shaded in front of the shadow catcher, the split
is now performed in the shade_volume kernel after volume shading is done.
* Previously the background pass behind a shadow catcher was done as part of
the regular path, now it is done as part of the shadow catcher path.
For a shadow catcher path with volumes and visible background, operations are
done in this order now:
* intersect_closest
* shade_volume
* shadow catcher split
* intersect_volume_stack
* shade_background
* shade_surface
The world volume is currently assumed to be CG, that is it does not exist in
the footage. We may consider adding an option to control this, or change the
default. With a volume object this control is already possible.
This includes refactoring to centralize the logic for next kernel scheduling
in intersect_closest.h.
Differential Revision: https://developer.blender.org/D13093
|
|
|
|
|
|
Can't cast to float4 because it might not have correct alignment.
|
|
Evaluated meshes from curves are presented to render engines as
separate instance objects now, just like evaluated meshes from other
object types like point clouds and volumes. For that reason, cycles
should not consider curve objects as geometry (previously it did,
meaning it retrieved a second mesh from the curve object as well
as the temporary evaluated mesh geometry).
Further, avoid adding a curve object's evaluated mesh as data_eval,
since that is special behavior for meshes that is arbitrary. Adding an
evaluated mesh there but not an evalauted pointcloud is arbitrary,
for example. Retrieve the evaluated mesh in from the geometry set
in BKE_object_get_evaluated_mesh now, to support that change.
This gets us closer to a place where all of an object's evaluated data
is stored in geometry_set_eval, and we just have helper functions
to access specific geometry components.
Differential Revision: https://developer.blender.org/D13118
|
|
Evaluated meshes from curves are presented to render engines as
separate instance objects now, just like evaluated meshes from other
object types like point clouds and volumes. For that reason, cycles
should not consider curve objects as geometry (previously it did,
meaning it retrieved a second mesh from the curve object as well
as the temporary evaluated mesh geometry).
Further, avoid adding a curve object's evaluated mesh as data_eval,
since that is special behavior for meshes that is arbitrary. Adding an
evaluated mesh there but not an evalauted pointcloud is arbitrary,
for example. Retrieve the evaluated mesh in from the geometry set
in BKE_object_get_evaluated_mesh now, to support that change.
This gets us closer to a place where all of an object's evaluated data
is stored in geometry_set_eval, and we just have helper functions
to access specific geometry components.
Differential Revision: https://developer.blender.org/D13118
|