Age | Commit message (Collapse) | Author |
|
This is due to a driver bug, so disable it for now until it gets resolved
in a future driver release.
Ref T92972
Differential Revision: https://developer.blender.org/D13167
|
|
|
|
|
|
|
|
is destroyed
Adds a pass before denoising that calculates the intensity of the image, which can be
passed into the OptiX denoiser for more optimal results for very dark or very bright images.
In addition this also fixes a crash that sometimes occurred on exit. The OptiX denoiser object
has to be destroyed before the OptiX device context object (since it references that). But in
C++ the destructor function of a class is called before its fields are destructed, so
"~OptiXDevice" was always called before "OptiXDevice::~Denoiser" and therefore
"optixDeviceContextDestroy" was called before "optixDenoiserDestroy", hence the crash.
Differential Revision: https://developer.blender.org/D13160
|
|
newer on Linux
Adds a workaround for a driver bug in r495 that causes artifacts with OptiX denoising.
`optixDenoiserSetup` is not working properly there when called with a stream other than the
default stream, so use the default stream for now and force synchronization across the entire
context afterwards to ensure the other stream Cycles uses to enqueue the actual denoising
command cannot execute before the denoising setup has finished.
Maniphest Tasks: T92472
Differential Revision: https://developer.blender.org/D13158
|
|
|
|
Adds a bunch of CPU kernel function to process on row of pixels, and use those
instead of calling unoptimized implementations.
Fixes T92598
|
|
|
|
|
|
RDNA2 only for now to be conservative, but testing more hardware is underway.
Ref T92393
Differential Revision: https://developer.blender.org/D12958
|
|
The kernel file names are search for based on the arch name, for example
gfx1010. However HIP's gcnArchName can contain options such as xnack- in
the name. For example gfx1010:sramecc-:xnack-.
This revision tokenizes the info from gcnArchName and just uses the first
token for choosing the Kernel file to use. Kernels are portable across those
features in the arch name.
Also remove the bit for recompiling ptx as clearly that is not relevant.
Differential Revision: https://developer.blender.org/D13117
|
|
|
|
All supported devices support half float now, so we can remove the check.
Differential Revision: https://developer.blender.org/D13021
|
|
Instead of printing debug flags listing various CPU and GPU settings that
may or may not be used, print when we are using them. This include CPU
kernel types, OptiX debugging and CUDA and HIP adaptive compilation. BVH
type was already printed.
|
|
|
|
Remove prefix of filenames that is the same as the folder name. This used
to help when #includes were using individual files, but now they are always
relative to the cycles root directory and so the prefixes are redundant.
For patches and branches, git merge and rebase should be able to detect the
renames and move over code to the right file.
|
|
* Split render/ into scene/ and session/. The scene/ folder now contains the
scene and its nodes. The session/ folder contains the render session and
associated data structures like drivers and render buffers.
* Move top level kernel headers into new folders kernel/camera/, kernel/film/,
kernel/light/, kernel/sample/, kernel/util/
* Move integrator related kernel headers into kernel/integrator/
* Move OSL shaders from kernel/shaders/ to kernel/osl/shaders/
For patches and branches, git merge and rebase should be able to detect the
renames and move over code to the right file.
|
|
* Additional structs added to the hipew loader for device props
* Adds hipRTC functions to the loader for future usage
* Enables CPU+GPU usage for HIP
* Cleanup to the adaptive kernel compilation process
* Fix for kernel compilation failures with HIP with latest master
Ref T92393, D12958
|
|
This triggered a compiler bug where it does not handle the sub.s16 PTX
instruction. Instead refactor the code so we don't need to do uint16_t
subtraction at all.
Also update OptiX device to remove the AO pass direct callable.
Thanks Patrick Mours for figuring this out.
|
|
Similar to main path compaction that happens before adding work tiles, this
compacts shadow paths before launching kernels that may add shadow paths.
Only do it when more than 50% of space is wasted.
It's not a clear win in all scenes, some are up to 1.5% slower. Likely caused
by different order of scheduling kernels having an unpredictable performance
impact. Still feels like compaction is just the right thing to avoid cases
where a few shadow paths can hold up a lot of main paths.
Differential Revision: https://developer.blender.org/D12944
|
|
Ref D12834
|
|
The conversion from double to float was causing a build failure.
Differential Revision: https://developer.blender.org/D12946
|
|
|
|
Ref T87836
|
|
This patch cleans up code for HIP device and makes it more consistent with the CUDA code.
It also fixes the issue with high VRAM usage on AMD cards using HIP allowing better performance and usage on cards like 6600XT.
Added a check in intern/cycles/kernel/bvh/bvh_util.h to prevent compiler error with hipcc
Reviewed By: brecht, leesonw
Maniphest Tasks: T92124
Differential Revision: https://developer.blender.org/D12834
|
|
These transparent shadows can be expansive to evaluate. Especially on the
GPU they can lead to poor occupancy when only some pixels require many kernel
launches to trace and evaluate many layers of transparency.
Baked transparency allows tracing a single ray in many cases by accumulating
the throughput directly in the intersection program without recording hits
or evaluating shaders. Transparency is baked at curve vertices and
interpolated, for most shaders this will look practically the same as actual
shader evaluation.
Fixes T91428, performance regression with spring demo file due to transparent
hair, and makes it render significantly faster than Blender 2.93.
Differential Revision: https://developer.blender.org/D12880
|
|
* Rename struct KernelGlobals to struct KernelGlobalsCPU
* Add KernelGlobals, IntegratorState and ConstIntegratorState typedefs
that every device can define in its own way.
* Remove INTEGRATOR_STATE_ARGS and INTEGRATOR_STATE_PASS macros and
replace with these new typedefs.
* Add explicit state argument to INTEGRATOR_STATE and similar macros
In preparation for decoupling main and shadow paths.
Differential Revision: https://developer.blender.org/D12888
|
|
|
|
Previously the storage here was optimized to avoid indirections in BVH2
traversal. This helps improve performance a bit, but makes performance
and memory usage of Embree and OptiX BVHs a bit worse also. It also adds
code complexity in other parts of the code.
Now decouple triangle and curve primitive storage from BVH2.
* Reduced peak memory usage on all devices
* Bit better performance for OptiX and Embree
* Bit worse performance for CUDA
* Simplified code:
** Intersection.prim/object now matches ShaderData.prim/object
** No more offset manipulation for mesh displacement before a BVH is built
** Remove primitive packing code and flags for Embree and OptiX
** Curve segments are now stored in a KernelCurve struct
* Also happens to fix a bug in baking with incorrect prim/object
Fixes T91968, T91770, T91902
Differential Revision: https://developer.blender.org/D12766
|
|
Implement an overscan support for tiles, so that adaptive sampling can
rely on the pixels neighbourhood.
Differential Revision: https://developer.blender.org/D12599
|
|
|
|
* Split GPUDisplay into two classes. PathTraceDisplay to implement the Cycles side,
and DisplayDriver to implement the host application side. The DisplayDriver is now
a fully abstract base class, embedded in the PathTraceDisplay.
* Move copy_pixels_to_texture implementation out of the host side into the Cycles side,
since it can be implemented in terms of the texture buffer mapping.
* Move definition of DeviceGraphicsInteropDestination into display driver header, so
that we do not need to expose private device headers in the public API.
* Add more detailed comments about how the DisplayDriver should be implemented.
The "driver" terminology might not be obvious, but is also used in other renderers.
Differential Revision: https://developer.blender.org/D12626
|
|
|
|
|
|
NOTE: this feature is not ready for user testing, and not yet enabled in daily
builds. It is being merged now for easier collaboration on development.
HIP is a heterogenous compute interface allowing C++ code to be executed on
GPUs similar to CUDA. It is intended to bring back AMD GPU rendering support
on Windows and Linux.
https://github.com/ROCm-Developer-Tools/HIP.
As of the time of writing, it should compile and run on Linux with existing
HIP compilers and driver runtimes. Publicly available compilers and drivers
for Windows will come later.
See task T91571 for more details on the current status and work remaining
to be done.
Credits:
Sayak Biswas (AMD)
Arya Rafii (AMD)
Brian Savery (AMD)
Differential Revision: https://developer.blender.org/D12578
|
|
Before the visibility test against the visibility flags was performed in an any-hit program in OptiX
(called `__anyhit__kernel_optix_visibility_test`), which was using the `__prim_visibility` array.
This is not entirely correct however, since `__prim_visibility` is filled with the merged visibility
flags of all objects that reference that primitive, so if one object uses different visibility flags
than another object, but they both are instances of the same geometry, they would appear the same
way. The reason that the any-hit program was used rather than the OptiX instance visibility mask is
that the latter is currently limited to 8 bits only, which is not sufficient to contain all Cycles
visibility flags (12 bits).
To mostly fix the problem with multiple instances and different visibility flags, I changed things to
use the OptiX instance visibility mask for a subset of the Cycles visibility flags (`PATH_RAY_CAMERA`
to `PATH_RAY_VOLUME_SCATTER`, which fit into 8 bits) and only fall back to the visibility test any-hit
program if that isn't enough (e.g. the ray visibility mask exceeds 8 bits or when using the built-in
curves from OptiX, since the any-hit program is then also used to skip the curve endcaps).
This may also improve performance in some cases, since by default OptiX can now perform the normal
scene intersection trace calls entirely on RT cores without having to jump back to the SM on every
hit to execute the any-hit program.
Fixes T89801
Differential Revision: https://developer.blender.org/D12604
|
|
|
|
|
|
Protect against integer overflow.
|
|
|
|
This includes much improved GPU rendering performance, viewport interactivity,
new shadow catcher, revamped sampling settings, subsurface scattering anisotropy,
new GPU volume sampling, improved PMJ sampling pattern, and more.
Some features have also been removed or changed, breaking backwards compatibility.
Including the removal of the OpenCL backend, for which alternatives are under
development.
Release notes and code docs:
https://wiki.blender.org/wiki/Reference/Release_Notes/3.0/Cycles
https://wiki.blender.org/wiki/Source/Render/Cycles
Credits:
* Sergey Sharybin
* Brecht Van Lommel
* Patrick Mours (OptiX backend)
* Christophe Hery (subsurface scattering anisotropy)
* William Leeson (PMJ sampling pattern)
* Alaska (various fixes and tweaks)
* Thomas Dinges (various fixes)
For the full commit history, see the cycles-x branch. This squashes together
all the changes since intermediate changes would often fail building or tests.
Ref T87839, T87837, T87836
Fixes T90734, T89353, T80267, T80267, T77185, T69800
|
|
|
|
WITH_CYCLES_DEBUG was used for rendering BVH debugging passes. But since we
mainly use Embree an OptiX now, this information is no longer important.
WITH_CYCLES_DEBUG_NAN will enable additional checks for NaNs and invalid values
in the kernel, for Cycles developers. Previously these asserts where enabled in
all debug builds, but this is too likely to crash Blender in scenes that render
fine regardless of the NaNs. So this is behind a CMake option now.
Fixes T90240
|
|
This fixes a performance regression on Ampere cards, on specific scenes like
classroom. For cycles-x there is little difference, but this is still helpful
for LTS releases, and we need to upgrade at some point anyway.
|
|
|
|
|
|
Currently, the OptiX BVH build options are selected based on whether
we are in background mode (final renders) or not (viewport renders).
In background mode, the BVH is built for fast path tracing and low
memory footprint, while in viewport, it is built for fast updates.
However, on platforms without OpenGL support, the background flag is
always set to true and prevents using fast BVH builds in the viewport.
Now, the BVH options derive from the Scene BVH settings:
* if BVH is static, a fast to trace BVH is built
* if BVH is dynamic, a fast to update BVH is built
Reviewed By: #cycles, brecht
Differential Revision: https://developer.blender.org/D11154
|
|
Using displacement runs the shader eval kernel, but since OptiX modules are not loaded when
baking is active, those were not available and therefore failed to launch. This fixes that by falling
back to the CUDA kernels.
|
|
|