Age | Commit message | Author |
|
This includes much improved GPU rendering performance, viewport interactivity,
new shadow catcher, revamped sampling settings, subsurface scattering anisotropy,
new GPU volume sampling, improved PMJ sampling pattern, and more.
Some features have also been removed or changed, breaking backwards compatibility,
including the removal of the OpenCL backend, for which alternatives are under
development.
Release notes and code docs:
https://wiki.blender.org/wiki/Reference/Release_Notes/3.0/Cycles
https://wiki.blender.org/wiki/Source/Render/Cycles
Credits:
* Sergey Sharybin
* Brecht Van Lommel
* Patrick Mours (OptiX backend)
* Christophe Hery (subsurface scattering anisotropy)
* William Leeson (PMJ sampling pattern)
* Alaska (various fixes and tweaks)
* Thomas Dinges (various fixes)
For the full commit history, see the cycles-x branch. This squashes together
all the changes, since intermediate commits would often fail to build or pass tests.
Ref T87839, T87837, T87836
Fixes T90734, T89353, T80267, T77185, T69800
|
|
Something in this update broke the floor() function in CUDA; use floorf()
instead, as we do everywhere else in the kernel code. Thanks to Ray
Molenkamp for identifying the solution.
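A minimal sketch of the pattern, with a hypothetical helper (not the actual
kernel code):

  #include <math.h>

  /* floorf() keeps the computation in single precision; floor() resolves
     to the double-precision overload that this CUDA update broke. */
  inline float fractional_part(float x)
  {
    return x - floorf(x);
  }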
|
|
Double floating point precision is an extension of OpenCL, which might
not be implemented by certain drivers, such as Intel Xe graphics.
Cycles does not use double floating point precision, and there is no
need to keep doubles unless there is an explicit decision to use them.
This is a simple fix on the Cycles side to replace the double floating
point type with a type of the same size and alignment rules. Inspired by
Brecht and Patrick.
Tested on NVIDIA Titan V, Radeon RX Vega M, and a TGL laptop.
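A sketch of the approach, assuming a hypothetical type name (not the actual
Cycles typedef):

  /* OpenCL guarantees ulong is 8 bytes with 8-byte alignment, matching
     double, so struct layouts stay identical without requiring the
     cl_khr_fp64 extension. */
  #ifdef __KERNEL_OPENCL__
  typedef ulong double_storage_t;
  #else
  typedef double double_storage_t;
  #endif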
Differential Revision: https://developer.blender.org/D10143
|
|
In my testing this works, but it requires me to remove the min(start_sample...) part in the
adaptive sampling kernel, and I assume there's a reason why it was there?
Reviewed By: brecht
Maniphest Tasks: T82351
Differential Revision: https://developer.blender.org/D9445
|
|
Commit d259e7dcfbbd37cec5a45fdfb554f24de10d0268 increased the instance limit, but only provided
a fallback in the host code for older OptiX SDKs, not in the kernel code. This caused a mismatch when
an old SDK was used (as is currently the case on the buildbot) and subsequent rendering artifacts. This
fixes that by moving the bit that is checked to a common location that works with both old and new
SDK versions.
|
|
For a while now OptiX has had support for 28 bits of instance IDs, instead of the initial 24 bits (see also
the value reported by OPTIX_DEVICE_PROPERTY_LIMIT_MAX_INSTANCE_ID). This change makes use of
that and also adds an error that is reported when the number of instances an OptiX acceleration structure
is created with goes beyond the limit, to make this clear instead of just rendering an image with artifacts.
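A sketch of the added check (error reporting simplified, names illustrative):

  #include <cstddef>
  #include <cstdio>

  /* Fail loudly instead of rendering artifacts when the instance count
     exceeds the 2^28 IDs OptiX can address. */
  bool check_instance_limit(size_t num_instances)
  {
    const size_t max_instances = size_t(1) << 28;
    if (num_instances > max_instances) {
      fprintf(stderr, "Too many instances for OptiX acceleration structure\n");
      return false;
    }
    return true;
  }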
Maniphest Tasks: T81431
|
|
The SVM AO node calls "scene_intersect_local" with a NULL pointer for the intersection
information, which caused a crash with OptiX, since it did not check for this case and
always dereferenced the pointer. This fixes that by first checking whether any hit
information was requested (as is done in the BVH2 intersection routines).
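A sketch of the guard, with simplified types (illustrative, not the exact patch):

  #include <cstddef>

  struct LocalIntersection {
    int num_hits;
  };

  /* Only record hit details when the caller requested them, mirroring
     the NULL check in the BVH2 routines. */
  void record_local_hit(LocalIntersection *local_isect)
  {
    if (local_isect != NULL) {
      local_isect->num_hits += 1;
    }
  }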
|
|
NanoVDB includes "assert.h" and makes use of "assert" in several places, and since the compile
pipeline for CUDA/OptiX kernels does not define "NDEBUG" for release builds, those debug
checks were always enabled. This is not intended, so this patch disables "assert" for CUDA/OptiX
by defining "NDEBUG" before including the NanoVDB headers.
This also fixes a warning about unknown pragmas in NanoVDB thrown by the CUDA compiler.
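The pattern, roughly (a sketch; the header path follows NanoVDB's layout):

  /* Compile out NanoVDB's assert()s, since the CUDA/OptiX kernel build
     does not define NDEBUG for release builds. */
  #ifndef NDEBUG
  #  define NDEBUG
  #endif
  #include <nanovdb/NanoVDB.h>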
|
|
Recent changes introduced an `acc` parameter into the texture read
functions. When NanoVDB isn't enabled, this leads to compilation errors,
as the `acc` variable isn't defined. OpenCL only compiles the features
that are needed, which made the issue more prominent.
Reviewed By: Patrick Mours
Differential Revision: https://developer.blender.org/D9629
|
|
There were some changes to the NanoVDB API that broke the way Cycles was previously using it.
With these changes it compiles successfully again and also still compiles with the NanoVDB revision
that is currently part of the Blender dependencies. Ref T81454.
|
|
The forceinline attribute is only applicable to functions which are
marked inline. Interestingly, it can be used on class methods without
an explicit inline statement, but for free functions it is another
story.
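A sketch of the distinction, assuming GCC/Clang semantics and a macro modeled
on the Cycles one:

  #define ccl_always_inline __attribute__((always_inline))

  struct Accessor {
    /* In-class definitions are implicitly inline, so the attribute
       applies without an explicit keyword. */
    ccl_always_inline float get() const { return 1.0f; }
  };

  /* A free function must be marked inline explicitly for the attribute
     to be applicable. */
  ccl_always_inline inline float half_of(float x) { return 0.5f * x; }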
|
|
Volumes using tricubic sampling were producing different results with NanoVDB compared
to dense textures. This fixes that by using the same tricubic sampling algorithm in both
cases. It also fixes some remaining offset issues and some minor things that broke OpenCL
kernel compilation on NVIDIA.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D9491
|
|
The NanoVDB sampling implementation behaves differently from dense texture sampling, so this
adds a small offset to the voxel indices to correct for that.
Also removes the need to modify the sampling coordinates by moving all the necessary
transformations into the image transform. See also T81454.
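A sketch of the idea (the half-voxel value here is illustrative): dense GPU
textures interpolate around voxel centers, so the NanoVDB index coordinate
is shifted to line the two schemes up.

  /* Map a normalized texture coordinate to a NanoVDB voxel index,
     compensating for center-based dense-texture filtering. */
  inline float dense_to_nanovdb_index(float p, float resolution)
  {
    return p * resolution - 0.5f;
  }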
|
|
NanoVDB is a platform-independent sparse volume data structure that makes it possible to
use OpenVDB volumes on the GPU. This patch uses it for volume rendering in Cycles,
replacing the previous usage of dense 3D textures.
Since it has a big impact on memory usage and performance, and also changes the OpenVDB
branch used for the rest of Blender, this is not enabled by default yet; that will happen
only after 2.82 has been branched off. To enable it, build both the dependencies and Blender
itself with the "WITH_NANOVDB" CMake option.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D8794
|
|
Running Blender on Ampere cards was already possible with PTX; this fix is
needed to support building CUDA binaries.
Note that the CUDA version used for official Blender builds is still 10; this
change merely makes it possible for those using CUDA 11 to compile the
sm_8x kernels.
Found by Milan Jaros.
|
|
Clean up the old tracker task format to the new one, e.g. [#34039] to T34039.
Ref D8718
|
|
Differential Revision: https://developer.blender.org/D8372
|
|
We forgot to update this code as part of D3108. I'd like to include this in 2.90;
it's entirely broken now, so it can't really get any worse.
Differential Revision: https://developer.blender.org/D8738
|
|
This patch adds support for the curve primitive from OptiX to Cycles. It's currently hidden
behind a debug option, since there can be some slight rendering differences still (because no
backface culling is performed and something seems off with endcaps). The curve primitive
was added with the OptiX 7.1 SDK and requires a r450 driver or newer, so this also updates
the codebase to be able to build with the new SDK.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D8223
|
|
This keeps render results compatible for combined CPU + GPU rendering.
Performance and quality of the primitives are quite different than before. There
are now two options:
* Rounded Ribbon: render hair as flat ribbon with (fake) rounded normals, for
fast rendering. Hair curves are subdivided with a fixed number of
user-specified subdivisions.
This gives relatively good results, especially when used with the Principled
Hair BSDF and hair viewed from a typical distance. There are artifacts when
viewed close up, though this was also the case with all previous primitives
(but different ones).
* 3D Curve: render hair as 3D curve, for accurate results when viewing hair
close up. This automatically subdivides the curve until it is smooth.
This gives higher quality than any of the previous primitives, but does come
at a performance cost and is somewhat slower than our previous Thick curves.
The main problem here is performance. For CPU and OpenCL rendering, performance
is usually quite close or better for similar quality results.
However, for CUDA and OptiX, performance of 3D curve intersection is problematic,
with e.g. 1.45x longer render time in Koro (though there is no equivalent quality
and rounded ribbons seem fine for that scene). Any help or ideas to optimize this
are welcome.
Ref T73778
Depends on D8012
Maniphest Tasks: T73778
Differential Revision: https://developer.blender.org/D8013
|
|
Triangles were very memory intensive. The only reason they were not removed yet
is that they gave more accurate results, but there will be an accurate 3D curve
primitive added for this.
Line rendering was always poor quality since the ends do not match up. To keep CPU
and GPU compatibility we just remove them entirely. They could be brought back if
an Embree compatible implementation is added, but it's not clear to me that there
is a use case for these that we'd consider important.
Ref T73778
Reviewers: #cycles
|
|
The input data to the OptiX denoiser was clamped to 0..10000 as required, but with very
bright objects it could easily exceed that range with a high number of samples (since the
data contains the overall sum). To fix that, divide by the number of samples first and
multiply it back in after the denoiser has run.
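A sketch of the scaling, with the denoiser call abstracted away (the wrapper
parameter is illustrative, not a real API):

  /* Bring the accumulated sum into the denoiser's expected range by
     averaging, then restore the original scale afterwards. */
  void denoise_accumulated(float *pixels, int size, int num_samples,
                           void (*run_denoiser)(float *, int))
  {
    const float inv = 1.0f / float(num_samples);
    for (int i = 0; i < size; i++)
      pixels[i] *= inv; /* sum -> per-sample average */
    run_denoiser(pixels, size);
    for (int i = 0; i < size; i++)
      pixels[i] *= float(num_samples); /* average -> sum */
  }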
|
|
There should be no user-visible change from this, except that tile size
now affects performance. The goal here is to simplify bake denoising in
D3099, letting it reuse more of the denoising tile and pass code.
A lot of code is now shared with regular rendering, with the two main
differences being that we read some render result passes from the bake API
when starting to render a tile, and call the bake kernel instead of the
path trace kernel.
With this kind of design where Cycles asks for tiles from the bake API,
it should eventually be easier to reduce memory usage, show tiles as
they are baked, or bake multiple passes at once, though there's still
quite some work needed for that.
Reviewers: #cycles
Subscribers: monio, wmatyjewicz, lukasstockner97, michaelknubben
Differential Revision: https://developer.blender.org/D3108
|
|
This is not yet fully supported by automatic volume bounds, but works fine in
most cases, which will have mostly matching bounds.
Ref T73201
|
|
There was too much image texture specific stuff in device_memory, and too
much code duplication between devices.
|
|
This is legacy code from when we had a fixed number of textures.
|
|
This feature takes some inspiration from
"RenderMan: An Advanced Path Tracing Architecture for Movie Rendering" and
"A Hierarchical Automatic Stopping Condition for Monte Carlo Global Illumination"
The basic principle is as follows:
While samples are being added to a pixel, the adaptive sampler writes half
of the samples to a separate buffer. This gives it two separate estimates
of the same pixel, and by comparing their difference it estimates convergence.
Once convergence drops below a given threshold, the pixel is considered done.
When a pixel has not converged yet and needs more samples than the minimum,
its immediate neighbors are also set to take more samples. This is done in order
to more reliably detect sharp features such as caustics. A 3x3 box filter that
is run periodically over the tile buffer is used for that purpose.
After a tile has finished rendering, the values of all passes are scaled as if
they were rendered with the full number of samples. This way, any code operating
on these buffers, for example the denoiser, does not need to be changed for
per-pixel sample counts.
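A sketch of the convergence test described above (names and the exact error
metric are illustrative):

  #include <math.h>

  /* The main buffer holds the mean of all samples, the auxiliary buffer
     the mean of half of them; their difference estimates the remaining
     error of the pixel. */
  bool pixel_converged(float full_mean, float half_mean, float threshold)
  {
    const float error = fabsf(full_mean - half_mean) / fmaxf(full_mean, 1e-4f);
    return error < threshold;
  }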
Reviewed By: brecht, #cycles
Differential Revision: https://developer.blender.org/D4686
|
|
The OptiX denoiser can be a great help when rendering in the viewport, since it is really fast
and needs few samples to produce convincing results. This patch therefore adds support for
using any Cycles denoiser in the viewport also (but only the OptiX one is selectable because
the NLM one is too slow to be usable currently). It also adds support for denoising on a
different device than rendering (so one can e.g. render with the CPU but denoise with OptiX).
Reviewed By: #cycles, brecht
Differential Revision: https://developer.blender.org/D6554
|
|
This patch adds support for the OptiX denoiser as an alternative to the existing NLM denoiser in Cycles. It's re-using the same denoising architecture based on tiles and therefore implicitly also works with multiple GPUs.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D6395
|
|
The OptiX intersection program for curves uses "optixGetObjectRayDirection"
to get the ray direction in object space (which was inverse-transformed
with the current transformation matrix). OptiX performs no additional
operations on it, so if there is a scaling transform, the direction is not
normalized, but the curve intersection routine expects a normalized direction.
In addition, the distances used in "optixGetRayTmax()" and
"optixReportIntersection()" are in world space, so they need to be adjusted
accordingly.
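A sketch of the adjustment in plain C++, with the OptiX calls replaced by
parameters (in the real program, dir comes from optixGetObjectRayDirection()
and tmax from optixGetRayTmax()):

  #include <math.h>

  struct float3 { float x, y, z; };

  inline float length3(const float3 &v)
  {
    return sqrtf(v.x * v.x + v.y * v.y + v.z * v.z);
  }

  /* Normalize the direction for the curve intersector and rescale the
     maximum ray distance to match the new parameterization; a reported
     hit distance would be divided by len again before calling
     optixReportIntersection(). */
  inline void normalize_for_curve(float3 &dir, float &tmax)
  {
    const float len = length3(dir);
    dir = {dir.x / len, dir.y / len, dir.z / len};
    tmax *= len;
  }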
|
|
This adds all the kernel-side changes for the OptiX backend.
Ref D5363
|
|
Differential Revision: https://developer.blender.org/D5326
|
|
Apply clang-format as proposed in T53211.
For details on usage and instructions for migrating branches
without conflicts, see:
https://wiki.blender.org/wiki/Tools/ClangFormat
|
|
Part of the cleanup of the OpenCL codebase.
A single program is not effective when using OpenCL; it is slower
to compile and slower during rendering (when used in, for example,
`barbershop` or `victor`).
Reviewers: brecht, #cycles
Maniphest Tasks: T62267
Differential Revision: https://developer.blender.org/D4481
|
|
The goal of this patch is to limit the number of times kernels
need to be compiled and to reuse them where possible, as kernels
with different compile directives can lead to identical binaries.
The implementation does this by stripping the compile directives
and reshuffling kernels so the output is more likely to be the same.
We focused on the kernels where it was easy to detect and maintain
(bundle, bake, displace, do_volume and background). More optimizations
could be done, but they are probably less obvious.
Merged the data_init and state_buffer_size kernels into split_bundle.
This patch also removes the empty kernels for do_volume and bake
when their features are not enabled.
When using the benchmark files, fewer background, bake and
do_volume kernels are compiled.
Fix: T61576, T61501, T61466
Reviewed By: brecht, #cycles
Differential Revision: https://developer.blender.org/D4390
|
|
The bake kernels are also used during mesh displacement and light
importance sampling, but we had disabled the implementation of these
kernels when baking was not enabled.
|
|
Using the OpenCL MegaKernel has been slow and therefore not useful.
This patch removes the mega kernel from the OpenCL codebase
and the OpenCLDeviceBase class.
T61736: removal of mega kernel
T61703: baking does not work with mega kernel
Tags: #cycles
Differential Revision: https://developer.blender.org/D4383
|
|
Cycles OpenCL: Split baking kernels into their own program
Fix T61463. Before this patch, baking was part of the base kernels. There
are 3 baking kernels, and all 3 use shader evaluation. Only for one
of these kernels was the functionality wrapped in the __NO_BAKING__
compile directive.
When you start baking, this leads to long compile times. Separating them
into individual programs reduces the compile times.
Also wrapped all baking kernels with __NO_BAKING__ to reduce the
compilation times.
Impact on compilation time
job | scene_name | previous (s) | new (s) | reduction
--------+-----------------+----------+-------+------------
T61463 | empty | 10.63 | 7.27 | 32%
T61463 | bmw | 17.91 | 14.24 | 20%
T61463 | fishycat | 19.57 | 15.08 | 23%
T61463 | barbershop | 54.10 | 48.18 | 11%
T61463 | classroom | 17.55 | 14.42 | 18%
T61463 | koro | 18.92 | 17.15 | 9%
T61463 | pavillion | 17.43 | 14.23 | 18%
T61463 | splash279 | 16.48 | 15.33 | 7%
T61463 | volume_emission | 36.22 | 34.19 | 6%
Impact on render time
job | scene_name | previous (s) | new (s) | reduction
--------+-----------------+----------+---------+------------
T61463 | empty | 21.06 | 20.54 | 2%
T61463 | bmw | 198.44 | 189.59 | 4%
T61463 | fishycat | 394.20 | 388.50 | 1%
T61463 | barbershop | 1188.16 | 1185.49 | 0%
T61463 | classroom | 341.08 | 339.27 | 1%
T61463 | koro | 472.43 | 360.70 | 24%
T61463 | pavillion | 905.77 | 902.14 | 0%
T61463 | splash279 | 55.26 | 54.92 | 1%
T61463 | volume_emission | 62.59 | 39.09 | 38%
I don't have a grounded explanation for why koro and volume_emission are this much
faster; I have done several tests though...
Maniphest Tasks: T61463
Differential Revision: https://developer.blender.org/D4376
|
|
This patch implements a workaround to get the multithreaded compilation from D2231 working.
So far, it only works for Blender, not for Cycles Standalone. Also, I have only tested the Linux codepath in the helper function.
Depends on D2231.
Patch by lukasstockner97, jbakker, brecht
job | scene_name | compilation_time (s)
----------+-----------------+------------------
Baseline | empty | 22.73
D2264 | empty | 13.94
Baseline | bmw | 56.44
D2264 | bmw | 41.32
Baseline | fishycat | 59.50
D2264 | fishycat | 45.19
Baseline | barbershop | 212.28
D2264 | barbershop | 169.81
Baseline | victor | 67.51
D2264 | victor | 53.60
Baseline | classroom | 51.46
D2264 | classroom | 39.02
Baseline | koro | 62.48
D2264 | koro | 49.03
Baseline | pavillion | 54.37
D2264 | pavillion | 38.82
Baseline | splash279 | 47.43
D2264 | splash279 | 37.94
Baseline | volume_emission | 145.22
D2264 | volume_emission | 121.10
This patch reduces compilation time, as the split kernels and base
kernels are compiled in parallel. In Cycles debug mode (256) you can
unmark the OpenCL single-program option, which reduces the compilation
time even further (bmw 17 seconds, barbershop 53 seconds).
Reviewers: brecht, dingto, sergey, juicyfruit, lukasstockner97
Reviewed By: brecht
Subscribers: Loner, jbakker, candreacchio, 3dLuver, LazyDodo, bliblubli
Differential Revision: https://developer.blender.org/D2264
|
|
This is the internal implementation, not available from the API or
interface yet. The algorithm takes into account past and future frames,
both to get more coherent animation and reduce noise.
Ref D3889.
|
|
Prefiltering of feature passes now happens during rendering; the result can
then be used for denoising immediately or written as a render pass for
later (animation) denoising.
The number of denoising data passes written is reduced because of this,
leaving out the feature variance passes. The passes are now Normal,
Albedo, Depth, Shadowing, Variance and Intensity.
Ref D3889.
|
|
There may still be rendering errors when used with older graphics cards.
|