Age | Commit message (Collapse) | Author |
|
This patch cleans up code for HIP device and makes it more consistent with the CUDA code.
It also fixes the issue with high VRAM usage on AMD cards using HIP, allowing better performance and usage on cards like the 6600XT.
Added a check in intern/cycles/kernel/bvh/bvh_util.h to prevent a compiler error with hipcc.
Reviewed By: brecht, leesonw
Maniphest Tasks: T92124
Differential Revision: https://developer.blender.org/D12834
|
|
These transparent shadows can be expensive to evaluate. Especially on the
GPU they can lead to poor occupancy when only some pixels require many kernel
launches to trace and evaluate many layers of transparency.
Baked transparency allows tracing a single ray in many cases by accumulating
the throughput directly in the intersection program without recording hits
or evaluating shaders. Transparency is baked at curve vertices and
interpolated; for most shaders this will look practically the same as actual
shader evaluation.
Fixes T91428, performance regression with spring demo file due to transparent
hair, and makes it render significantly faster than Blender 2.93.
Differential Revision: https://developer.blender.org/D12880
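As a rough illustration of the accumulation described above, the following C++ sketch multiplies interpolated per-vertex transparency into a single throughput value for every curve hit along one ray. The `CurveHit` struct and both function names are hypothetical and not taken from the Cycles kernel; the real intersection program runs on the device and is more involved.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record of one curve hit (not a Cycles type): the two segment
// vertices carry baked transparency, and u is the position along the segment.
struct CurveHit {
  float transparency_v0;  // baked transparency at the first vertex, in [0, 1]
  float transparency_v1;  // baked transparency at the second vertex, in [0, 1]
  float u;                // parametric position along the segment, in [0, 1]
};

// Linearly interpolate the baked transparency between the two vertices.
inline float interpolate_transparency(const CurveHit &hit)
{
  assert(hit.u >= 0.0f && hit.u <= 1.0f);
  return (1.0f - hit.u) * hit.transparency_v0 + hit.u * hit.transparency_v1;
}

// Accumulate throughput over all hits of a single ray, without recording the
// hits or evaluating any shader. Stops early once the ray is fully blocked.
inline float shadow_throughput(const std::vector<CurveHit> &hits)
{
  float throughput = 1.0f;
  for (const CurveHit &hit : hits) {
    throughput *= interpolate_transparency(hit);
    if (throughput <= 0.0f) {
      break;
    }
  }
  return throughput;
}
```

With no hits the throughput stays at 1, and a fully opaque hit ends the loop immediately.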
|
|
Helps save one OptiX payload and is a bit more efficient.
Differential Revision: https://developer.blender.org/D12909
|
|
* isect Ng is no longer needed for shadows, for main path needed for SSS only
* Reduce rng_offset and queued_kernel to 16 bits
Ref D12889
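To show why the 16-bit reduction mentioned above saves state memory, here is a minimal C++ sketch. The struct names and layouts are made up for illustration and do not reflect the actual integrator state definition; only the field names come from the message.

```cpp
#include <cstdint>

// Illustrative layouts only, not the real Cycles integrator state.
struct PathStateWide {
  uint32_t rng_offset;
  uint32_t queued_kernel;
};

struct PathStateNarrow {
  uint16_t rng_offset;
  uint16_t queued_kernel;
};

// Two 16-bit fields pack into half the space of two 32-bit fields.
static_assert(sizeof(PathStateWide) == 8, "two 32-bit fields take 8 bytes");
static_assert(sizeof(PathStateNarrow) == 4, "two 16-bit fields take 4 bytes");

// Helper to check that a value survives the narrowing without truncation.
inline bool fits_in_16_bits(uint32_t value)
{
  return value <= UINT16_MAX;
}
```

The narrowing is only safe while every stored value stays at or below 65535.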
|
|
* Rename struct KernelGlobals to struct KernelGlobalsCPU
* Add KernelGlobals, IntegratorState and ConstIntegratorState typedefs
that every device can define in its own way.
* Remove INTEGRATOR_STATE_ARGS and INTEGRATOR_STATE_PASS macros and
replace with these new typedefs.
* Add explicit state argument to INTEGRATOR_STATE and similar macros
In preparation for decoupling main and shadow paths.
Differential Revision: https://developer.blender.org/D12888
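A minimal sketch of the typedef approach, assuming a CPU-style device where the state is a plain struct. The struct members and the `next_bounce` function are hypothetical; a GPU device could instead define `IntegratorState` as, for example, an index into its own state arrays.

```cpp
#include <cassert>

// Hypothetical CPU-side definitions; the real Cycles structs hold far more.
struct KernelGlobalsCPU {
  int max_bounce;
};

struct IntegratorStateCPU {
  int bounce;
};

// Each device provides its own meaning for these three names.
typedef const KernelGlobalsCPU *KernelGlobals;
typedef IntegratorStateCPU *IntegratorState;
typedef const IntegratorStateCPU *ConstIntegratorState;

// Kernel code takes the typedefs as explicit arguments instead of relying on
// argument-list macros, so the same signature works for every device.
inline int next_bounce(KernelGlobals kg, ConstIntegratorState state)
{
  assert(kg != nullptr && state != nullptr);
  const int next = state->bounce + 1;
  return (next > kg->max_bounce) ? kg->max_bounce : next;
}
```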
|
|
This is the first of a sequence of changes to support compiling Cycles kernels as MSL (Metal Shading Language) in preparation for a Metal GPU device implementation.
MSL requires that all pointer types be declared with explicit address space attributes (device, thread, etc...). There is already precedent for this with Cycles' address space macros (ccl_global, ccl_private, etc...), therefore the first step of MSL-enablement is to apply these consistently. Line-for-line this represents the largest change required to enable MSL. Applying this change first will simplify future patches as well as offering the emergent benefit of enhanced descriptiveness.
The vast majority of deltas in this patch fall into one of two cases:
- Ensuring ccl_private is specified for thread-local pointer types
- Ensuring ccl_global is specified for device-wide pointer types
Additionally, the ccl_addr_space qualifier can be removed. Prior to Cycles X, ccl_addr_space was used as a context-dependent address space qualifier, but now it is either redundant (e.g. in struct typedefs), or can be replaced by ccl_global in the case of pointer types. Associated function variants (e.g. lcg_step_float_addrspace) are also redundant.
In cases where address space qualifiers are chained with "const", this patch places the address space qualifier first. The rationale for this is that the choice of address space is likely to have the greater impact on runtime performance and overall architecture.
The final part of this patch is the addition of a metal/compat.h header. This is partially complete and will be extended in future patches, paving the way for the full Metal implementation.
Ref T92212
Reviewed By: brecht
Maniphest Tasks: T92212
Differential Revision: https://developer.blender.org/D12864
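The qualifier convention described above can be sketched in plain C++ as follows. Here the macros expand to nothing, which is an assumption for a host build without address spaces; for MSL they would presumably map to the `device` and `thread` address spaces. The function `sum_and_max` is a made-up example, not Cycles code.

```cpp
#include <cassert>

// Assumed host-side definitions: no address spaces exist, so both qualifiers
// are empty. A Metal compat header would define them differently.
#define ccl_global
#define ccl_private

// The address space qualifier comes first, before "const", matching the
// ordering chosen in the patch.
inline float sum_and_max(ccl_global const float *values,
                         int count,
                         ccl_private float *r_max)
{
  assert(count > 0);
  float sum = 0.0f;
  *r_max = values[0];
  for (int i = 0; i < count; i++) {
    sum += values[i];
    if (values[i] > *r_max) {
      *r_max = values[i];
    }
  }
  return sum;
}
```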
|
|
There is not enough time before the release to improve Random Walk to handle
all cases this was used for, so restore it for now.
Since there is no more path splitting in cycles-x, this can increase noise in
non-flat areas for the same number of samples, though fewer rays will be traced
also. This is fundamentally a trade-off we made in the new design and why Random
Walk is a better fit. However the importance resampling we do now does help to
reduce noise.
Differential Revision: https://developer.blender.org/D12800
|
|
Previously the storage here was optimized to avoid indirections in BVH2
traversal. This helps improve performance a bit, but makes performance
and memory usage of Embree and OptiX BVHs a bit worse also. It also adds
code complexity in other parts of the code.
Now decouple triangle and curve primitive storage from BVH2.
* Reduced peak memory usage on all devices
* Bit better performance for OptiX and Embree
* Bit worse performance for CUDA
* Simplified code:
** Intersection.prim/object now matches ShaderData.prim/object
** No more offset manipulation for mesh displacement before a BVH is built
** Remove primitive packing code and flags for Embree and OptiX
** Curve segments are now stored in a KernelCurve struct
* Also happens to fix a bug in baking with incorrect prim/object
Fixes T91968, T91770, T91902
Differential Revision: https://developer.blender.org/D12766
|
|
Previously, the visibility test against the visibility flags was performed in an any-hit program in OptiX
(called `__anyhit__kernel_optix_visibility_test`), which was using the `__prim_visibility` array.
This is not entirely correct however, since `__prim_visibility` is filled with the merged visibility
flags of all objects that reference that primitive, so if one object uses different visibility flags
than another object, but they both are instances of the same geometry, they would appear the same
way. The reason that the any-hit program was used rather than the OptiX instance visibility mask is
that the latter is currently limited to 8 bits only, which is not sufficient to contain all Cycles
visibility flags (12 bits).
To mostly fix the problem with multiple instances and different visibility flags, I changed things to
use the OptiX instance visibility mask for a subset of the Cycles visibility flags (`PATH_RAY_CAMERA`
to `PATH_RAY_VOLUME_SCATTER`, which fit into 8 bits) and only fall back to the visibility test any-hit
program if that isn't enough (e.g. the ray visibility mask exceeds 8 bits or when using the built-in
curves from OptiX, since the any-hit program is then also used to skip the curve endcaps).
This may also improve performance in some cases, since by default OptiX can now perform the normal
scene intersection trace calls entirely on RT cores without having to jump back to the SM on every
hit to execute the any-hit program.
Fixes T89801
Differential Revision: https://developer.blender.org/D12604
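The decision between the 8-bit instance mask and the any-hit fallback can be sketched like this. All function names are hypothetical, and the bit values in the usage are arbitrary rather than the actual `PATH_RAY_*` constants; the mask test mirrors the usual rule that an instance is visited when the ray mask and instance mask share at least one bit.

```cpp
#include <cstdint>

// True when every set bit of the ray visibility fits in the low 8 bits, so
// the instance visibility mask alone is sufficient.
inline bool fits_instance_mask(uint32_t ray_visibility)
{
  return (ray_visibility & ~0xFFu) == 0;
}

// Keep only the subset of visibility flags that fits into 8 bits.
inline uint8_t to_instance_mask(uint32_t visibility)
{
  return static_cast<uint8_t>(visibility & 0xFFu);
}

// Mask test: the instance is considered when any bit is shared.
inline bool instance_visible(uint8_t instance_mask, uint8_t ray_mask)
{
  return (instance_mask & ray_mask) != 0;
}

// Hypothetical policy: fall back to the any-hit visibility test only when
// the ray visibility does not fit in 8 bits.
inline bool needs_anyhit_fallback(uint32_t ray_visibility)
{
  return !fits_instance_mask(ray_visibility);
}
```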
|
|
This includes much improved GPU rendering performance, viewport interactivity,
new shadow catcher, revamped sampling settings, subsurface scattering anisotropy,
new GPU volume sampling, improved PMJ sampling pattern, and more.
Some features have also been removed or changed, breaking backwards compatibility.
This includes the removal of the OpenCL backend, for which alternatives are under
development.
Release notes and code docs:
https://wiki.blender.org/wiki/Reference/Release_Notes/3.0/Cycles
https://wiki.blender.org/wiki/Source/Render/Cycles
Credits:
* Sergey Sharybin
* Brecht Van Lommel
* Patrick Mours (OptiX backend)
* Christophe Hery (subsurface scattering anisotropy)
* William Leeson (PMJ sampling pattern)
* Alaska (various fixes and tweaks)
* Thomas Dinges (various fixes)
For the full commit history, see the cycles-x branch. This squashes together
all the changes since intermediate changes would often fail building or tests.
Ref T87839, T87837, T87836
Fixes T90734, T89353, T80267, T77185, T69800
|
|
The goal: make it easy to use the AO approximation in scenes which combine
both small and large scale objects.
The idea: use a per-object AO distance which overrides the world
settings. An instancer object will "propagate" its AO distance to all its
instances unless the instance defines its own distance (this allows
modifying the AO distance in the shot files without having to modify the props
used in the shots).
Available from the new Fast GI Approximation panel in object properties.
Differential Revision: https://developer.blender.org/D12112
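The override order described above (instance, then instancer, then world) can be sketched as a small lookup. The function name and the convention that a distance of 0 means "not set" are assumptions made for this example, not confirmed details of the implementation.

```cpp
#include <cassert>

// Resolve the AO distance for one instance. Assumes, for illustration only,
// that a value of 0 means the object does not define its own distance.
inline float effective_ao_distance(float world_distance,
                                   float instancer_distance,
                                   float instance_distance)
{
  assert(world_distance >= 0.0f);
  if (instance_distance > 0.0f) {
    return instance_distance;  // the instance defines its own distance
  }
  if (instancer_distance > 0.0f) {
    return instancer_distance;  // propagated from the instancer
  }
  return world_distance;  // fall back to the world setting
}
```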
|
|
WITH_CYCLES_DEBUG was used for rendering BVH debugging passes. But since we
mainly use Embree and OptiX now, this information is no longer important.
WITH_CYCLES_DEBUG_NAN will enable additional checks for NaNs and invalid values
in the kernel, for Cycles developers. Previously these asserts were enabled in
all debug builds, but this is too likely to crash Blender in scenes that render
fine regardless of the NaNs. So this is behind a CMake option now.
Fixes T90240
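A minimal sketch of gating such checks behind a build option, assuming the CMake option ends up as a preprocessor define of the same name. The `example_assert_finite` macro and `sanitize_value` function are hypothetical and only illustrate the pattern.

```cpp
#include <cassert>
#include <cmath>

// Only assert on NaN or infinity when the developer option is compiled in.
// Ordinary debug builds get a no-op instead, so scenes that render fine
// despite NaNs do not crash.
#ifdef WITH_CYCLES_DEBUG_NAN
#  define example_assert_finite(x) assert(std::isfinite(x))
#else
#  define example_assert_finite(x) ((void)0)
#endif

// Hypothetical helper: check the value when enabled, and otherwise replace
// non-finite values with zero so they do not propagate further.
inline float sanitize_value(float value)
{
  example_assert_finite(value);
  return std::isfinite(value) ? value : 0.0f;
}
```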
|
|
Also correctly use the inverse transposed matrix for the normal transform.
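For reference, a small C++ sketch of why normals need the inverse transpose: under a non-uniform scale, transforming a normal with the matrix itself no longer keeps it perpendicular to the surface. The types and function names are illustrative, not the Cycles transform API.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;
using Mat3 = std::array<Vec3, 3>;  // three rows

inline float dot(const Vec3 &a, const Vec3 &b)
{
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

inline Vec3 cross(const Vec3 &a, const Vec3 &b)
{
  return Vec3{{a[1] * b[2] - a[2] * b[1],
               a[2] * b[0] - a[0] * b[2],
               a[0] * b[1] - a[1] * b[0]}};
}

inline Vec3 scale(const Vec3 &a, float s)
{
  return Vec3{{a[0] * s, a[1] * s, a[2] * s}};
}

// The rows of the inverse transpose are the cross products of the other two
// rows of the matrix, divided by the determinant.
inline Mat3 inverse_transpose(const Mat3 &m)
{
  const Vec3 c0 = cross(m[1], m[2]);
  const Vec3 c1 = cross(m[2], m[0]);
  const Vec3 c2 = cross(m[0], m[1]);
  const float det = dot(m[0], c0);
  assert(det != 0.0f);
  const float inv_det = 1.0f / det;
  return Mat3{{scale(c0, inv_det), scale(c1, inv_det), scale(c2, inv_det)}};
}

// Transform a normal so it stays perpendicular to transformed tangents.
inline Vec3 transform_normal(const Mat3 &m, const Vec3 &n)
{
  const Mat3 it = inverse_transpose(m);
  return Vec3{{dot(it[0], n), dot(it[1], n), dot(it[2], n)}};
}
```

For a scale of 2 along X, the tangent (1, 1, 0) becomes (2, 1, 0), and the normal (1, -1, 0) becomes (0.5, -1, 0), which is still perpendicular to it. The result is not renormalized here.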
|
|
Offset rays from the flat surface to match where they would be for a smooth
surface as specified by the normals. In the shading panel there is now a
Shading Offset (existing option) and Geometry Offset (new).
The Geometry Offset works as follows:
* 0: disabled
* 0.001: only terminated triangles (normal points to the light, geometry
doesn't) are affected
* 0.1 (default): triangles at grazing angles are affected, and the effect
fades out
* 1: all triangles are affected
Limitations:
* The artifact is still visible in some cases, it could be that some quads
need to be treated specifically as quads.
* Inconsistent normals cause artifacts.
* If small objects cast shadows onto a big low poly surface, the shadows can
appear to be in the wrong place, because the surface moved slightly above
the geometry. This can be noticed only at grazing angles to light.
* Approximated surfaces of two non-intersecting low-poly objects can overlap,
which causes off-the-wall shadows.
Generally, using one or a few levels of subdivision can get rid of artifacts
faster than before.
Differential Revision: https://developer.blender.org/D11065
|
|
Adds support for building multiple BVH types in order to support using both CPU and OptiX
devices for rendering simultaneously. Primitive packing for Embree and OptiX is now
standalone, so it only needs to be run once and can be shared between the two. Additionally,
BVH building was made a device call, so that each device backend can decide how to
perform the building. The multi-device for instance creates a special multi-BVH that holds
references to several sub-BVHs, one for each sub-device.
Reviewed By: brecht, kevindietrich
Differential Revision: https://developer.blender.org/D9718
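The structure described above, where BVH building is a device call and the multi-device produces a BVH holding one sub-BVH per sub-device, can be sketched roughly as follows. Every class name here is hypothetical and prefixed with `Example`; the real Cycles classes have different interfaces.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct ExampleBVH {
  virtual ~ExampleBVH() = default;
  virtual std::string describe() const = 0;
};

// A BVH built by a single backend, identified here only by a layout name.
struct ExampleLeafBVH : ExampleBVH {
  std::string layout;
  explicit ExampleLeafBVH(std::string name) : layout(std::move(name)) {}
  std::string describe() const override { return layout; }
};

// A multi-BVH that only holds references to one sub-BVH per sub-device.
struct ExampleMultiBVH : ExampleBVH {
  std::vector<std::unique_ptr<ExampleBVH>> sub_bvhs;
  std::string describe() const override
  {
    std::string out = "multi(";
    for (std::size_t i = 0; i < sub_bvhs.size(); i++) {
      out += (i ? "," : "") + sub_bvhs[i]->describe();
    }
    return out + ")";
  }
};

// Building is a device call, so each backend decides how to build.
struct ExampleDevice {
  virtual ~ExampleDevice() = default;
  virtual std::unique_ptr<ExampleBVH> build_bvh() const = 0;
};

struct ExampleSingleDevice : ExampleDevice {
  std::string layout;
  explicit ExampleSingleDevice(std::string name) : layout(std::move(name)) {}
  std::unique_ptr<ExampleBVH> build_bvh() const override
  {
    return std::make_unique<ExampleLeafBVH>(layout);
  }
};

struct ExampleMultiDevice : ExampleDevice {
  std::vector<std::unique_ptr<ExampleDevice>> sub_devices;
  std::unique_ptr<ExampleBVH> build_bvh() const override
  {
    auto multi = std::make_unique<ExampleMultiBVH>();
    for (const auto &device : sub_devices) {
      multi->sub_bvhs.push_back(device->build_bvh());
    }
    return std::unique_ptr<ExampleBVH>(std::move(multi));
  }
};
```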
|
|
This patch adds support for the curve primitive from OptiX to Cycles. It's currently hidden
behind a debug option, since there can be some slight rendering differences still (because no
backface culling is performed and something seems off with endcaps). The curve primitive
was added with the OptiX 7.1 SDK and requires a r450 driver or newer, so this also updates
the codebase to be able to build with the new SDK.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D8223
|
|
Also removing the curve system manager which only stored a few curve intersection
settings. These are all changes towards making shape and subdivision settings
per-object instead of per-scene, but there is more work to do here.
Ref T73778
Depends on D8013
Maniphest Tasks: T73778
Differential Revision: https://developer.blender.org/D8014
|
|
Ref T73778
Depends on D8011
Maniphest Tasks: T73778
Differential Revision: https://developer.blender.org/D8012
|
|
The kernel did not work correctly when these were disabled anyway. The
optimized BVH traversal for the no instances case was also only used on
the CPU, so no longer makes sense to keep.
Ref T73778
Depends on D8010
Maniphest Tasks: T73778
Differential Revision: https://developer.blender.org/D8011
|
|
Triangles were very memory intensive. The only reason they were not removed yet
is that they gave more accurate results, but there will be an accurate 3D curve
primitive added for this.
Line rendering was always poor quality since the ends do not match up. To keep CPU
and GPU compatibility we just remove them entirely. They could be brought back if
an Embree compatible implementation is added, but it's not clear to me that there
is a use case for these that we'd consider important.
Ref T73778
Reviewers: #cycles
Subscribers:
|
|
Differential Revision: https://developer.blender.org/D7890
|
|
Ref T73778
|
|
Ref T73778
|
|
Embree's local intersection routine was not prepared
for local intersections without per-object BVH.
Now it should be able to handle any kind of local
intersection, such as AO, bevel and SSS.
Differential Revision: https://developer.blender.org/D6602
|
|
The Embree backend did not properly recognize when the camera was
inside a volume and ended up ignoring those.
|
|
This adds all the kernel side changes for the Optix backend.
Ref D5363
|
|
CUDA is working correctly without it now, and it's more efficient not to do this.
Ref D5363
|
|
Ref D5363
|
|
Was caused by ray direction becoming NaN after some of the bounces.
|
|
This never really worked as it was supposed to. The main goal of this is to
turn noise from sampling tiny hairs into multiple layers of transparency that
do not need to be sampled stochastically. However the implementation of this
worked by randomly discarding hair intersections in BVH traversal, which
defeats the purpose.
If it ever comes back, it's best implemented outside the kernel as a preprocess
that changes hair radius before BVH building. This would also make it work with
Embree, where it's not supported now. But it's not so clear anymore that with
many AA samples and GPU rendering this feature is as helpful as it once was for
CPU raytracers with few AA samples.
The benefit of removing this feature is improved hair ray tracing performance,
tested on NVIDIA Titan Xp:
bmw27: +0.37%
classroom: +0.26%
fishy_cat: -7.36%
koro: -12.98%
pabellon: -0.12%
Differential Revision: https://developer.blender.org/D4532
|
|
Apply clang format as proposed in T53211.
For details on usage and instructions for migrating branches
without conflicts, see:
https://wiki.blender.org/wiki/Tools/ClangFormat
|
|
Was happening when looking for all intersections for transparent shadow rays
in the case the ray is degenerate.
Still questionable whether we should consider this a transparent or opaque
configuration. Ideally, we should prevent such rays from happening, but that
is another vector of debugging.
|
|
BF-admins agree to remove header information that isn't useful,
to reduce noise.
- BEGIN/END license blocks
Developers should add non license comments as separate comment blocks.
No need for separator text.
- Contributors
This is often invalid, outdated or misleading
especially when splitting files.
It's more useful to git-blame to find out who has developed the code.
See P901 for script to perform these edits.
|
|
various parts of the CPU kernel
This commit adds a sample-based profiler that runs during CPU rendering and collects statistics on time spent in different parts of the kernel (ray intersection, shader evaluation etc.) as well as time spent per material and object.
The results are not yet exposed in the user interface or via Python. To see the stats on the console, pass the "--cycles-print-stats" argument to Cycles (e.g. "./blender -- --cycles-print-stats").
Unfortunately, there is no clear way to extend this functionality to CUDA or OpenCL, so it is CPU-only for now.
Reviewers: brecht, sergey, swerner
Reviewed By: brecht, swerner
Differential Revision: https://developer.blender.org/D3892
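The general idea of a sample-based profiler can be sketched as follows: the render thread publishes what it is currently doing, and a sampler periodically reads that value and counts it. The names below are hypothetical and the sketch leaves out the separate sampling thread and the per-material and per-object counters.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical kernel activities that can be attributed time.
enum ExampleEvent {
  EXAMPLE_EVENT_NONE = 0,
  EXAMPLE_EVENT_INTERSECT,
  EXAMPLE_EVENT_SHADER_EVAL,
  EXAMPLE_EVENT_NUM
};

// Written by the render thread whenever it enters a new part of the kernel.
struct ExampleProfilingState {
  std::atomic<int> event{EXAMPLE_EVENT_NONE};
};

// Read by the sampler. In a real profiler, sample() would be called at a
// fixed interval from its own thread, so counts approximate time spent.
struct ExampleProfiler {
  std::array<uint64_t, EXAMPLE_EVENT_NUM> samples{};

  void sample(const ExampleProfilingState &state)
  {
    const int event = state.event.load(std::memory_order_relaxed);
    assert(event >= 0 && event < EXAMPLE_EVENT_NUM);
    samples[event]++;
  }
};
```

Because the render thread only stores an integer, the overhead on the hot path stays small, at the cost of statistical rather than exact timings.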
|
|
While we shouldn't have logic in an entry point, and one should not be
making typos when moving lines around, there is a bigger entanglement
issue with BVH host code using kernel functions. This is a bad violation,
but it is tricky to solve moments before the weekly.
To leave things in a less broken state than my own cleanup did,
revert the changes.
This reverts commit 2bad10be96540ff50a149230d656e599775b3f47.
This reverts commit ddabb21d0584e9874e8e5c62c04abe496ec7334b
|
|
There is some more sanitization which would be good to do
in the neighbourhood of those functions, but that could also happen
later.
|