Age | Commit message | Author |
|
This reduces code duplication between the CUDA and OptiX device implementations: The CUDA device
class is now split into declaration and definition (similar to the OpenCL device) and the OptiX device
class implements that and only overrides the functions it actually has to change, while using the CUDA
implementation for everything else.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D6814
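As a rough illustration of the structure described in this commit, the sketch below shows a derived device class that overrides only what it must and inherits everything else. The names `CudaLikeDevice`, `OptixLikeDevice`, `alloc_memory` and `build_bvh` are hypothetical and do not come from the Cycles sources.

```cpp
#include <string>

// Hypothetical base device; stands in for the CUDA device class.
class CudaLikeDevice {
 public:
  virtual ~CudaLikeDevice() = default;

  // Shared behaviour that the derived device reuses unchanged.
  virtual std::string alloc_memory() const
  {
    return "cuda alloc";
  }

  // Behaviour that the derived device needs to change.
  virtual std::string build_bvh() const
  {
    return "cuda bvh";
  }
};

// Hypothetical derived device; stands in for the OptiX device class.
class OptixLikeDevice : public CudaLikeDevice {
 public:
  // Only the function that differs is overridden.
  std::string build_bvh() const override
  {
    return "optix bvh";
  }
};
```

Calling `alloc_memory()` on an `OptixLikeDevice` falls through to the base implementation, while `build_bvh()` dispatches to the override.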
|
|
The OptiX denoiser can be a great help when rendering in the viewport, since it is really fast
and needs few samples to produce convincing results. This patch therefore adds support for
using any Cycles denoiser in the viewport as well (but only the OptiX one is selectable because
the NLM one is too slow to be usable currently). It also adds support for denoising on a
different device than rendering (so one can e.g. render with the CPU but denoise with OptiX).
Reviewed By: #cycles, brecht
Differential Revision: https://developer.blender.org/D6554
|
|
In the current OpenCL implementation we have a work-around for platforms
that didn't support NULL pointers. We used to replace all NULLs and
empty arrays with a pointer to a single byte on the OpenCL Device.
During investigation of {T65924} it was asked to remove this work-around
for testing. This change improves the render times.
SCENE | BEFORE | AFTER
--------------------+--------+-------
bmw27 | 108 | 89
barbershop_interior | 867 | 673
classroom | 270 | 173
fishy_cat | 244 | 196
koro | 249 | 207
pavillon_barcelona | 582 | 414
Note that this change does not fix T65924; it only improves the
rendering performance for OpenCL. We haven't tested this patch on all
platforms, so we should keep an eye on the tracker.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D6391
|
|
OpenCL parallel compilation only works inside Blender. When using Cycles in a different setup (standalone or other software), compiling kernels failed because those setups don't have the appropriate Python API and command line arguments.
This change introduces a `running_inside_blender` debug flag that triggers out-of-process compilation of the kernels. Compilation still happens in a subthread, which enables the preview kernels and compilation of the kernels during BVH building.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D5439
|
|
|
|
|
|
Apply clang format as proposed in T53211.
For details on usage and instructions for migrating branches
without conflicts, see:
https://wiki.blender.org/wiki/Tools/ClangFormat
|
|
|
|
No functional changes, logic here got too complex after many changes over
the years.
|
|
The main goal of this change is faster startup when using foreground
rendering.
This patch will build kernels in parallel to the update process of
the scene. When these optimized kernels are not available (yet) an AO
kernel will be used.
These AO kernels are fast to compile (3-7 seconds) and can be
reused by all scenes. When the final kernels become available we
will switch to these kernels.
In background mode the AO kernels will not be used.
Some kernels are being used during Scene update (displace, background
light). When these kernels are being used the process can halt until
these become available.
Reviewed By: brecht, #cycles
Maniphest Tasks: T61752
Differential Revision: https://developer.blender.org/D4428
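The kernel selection rule described in this commit can be sketched as below. This is a simplified model under stated assumptions, not the actual implementation: `KernelState` and `select_kernel` are hypothetical names, the parallel build itself is left out, and readiness is represented by plain flags.

```cpp
#include <string>

// Hypothetical view of which kernels are ready; not a real Cycles type.
struct KernelState {
  bool ao_ready;         // quick fallback kernel, reusable by all scenes
  bool optimized_ready;  // scene-specific kernel, built alongside the scene update
};

// Returns the name of the kernel to render with, or an empty string if
// the caller has to wait for compilation to finish.
inline std::string select_kernel(const KernelState &state, bool background_mode)
{
  if (state.optimized_ready) {
    return "optimized";
  }
  // Background renders never use the fallback, so they wait instead.
  if (!background_mode && state.ao_ready) {
    return "ao";
  }
  return "";
}
```

In foreground mode the caller would keep polling and switch over once `optimized_ready` becomes true.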
|
|
The functions that determine the program name + filename of kernels
were missing some base kernels like denoising and base. For completeness
I added those kernels so the function returns the correct results.
|
|
This patch will reduce the number of times that we need to
recompile kernels. It does this by enabling or disabling features
by default, so that when the user needs them the kernels are
already available.
Other features are enabled by default for background and foreground
rendering. When in background rendering the user wants the best
render performance. When in foreground rendering the user wants
the least amount of recompilations.
Enabling volumetrics or subdivision evaluation will still trigger
a recompilation during foreground rendering.
Reviewed By: #cycles, brecht
Differential Revision: https://developer.blender.org/D4485
|
|
Part of the cleanup of the OpenCL codebase.
Single program is not effective when using OpenCL: it is slower
to compile and slower during rendering (for example when used in
`barbershop` or `victor`).
Reviewers: brecht, #cycles
Maniphest Tasks: T62267
Differential Revision: https://developer.blender.org/D4481
|
|
|
|
|
|
introduced by rBdabe5cd31add8aa55b9ad4bce1b591ed4e98f1a1
|
|
Displacement and Background kernels are selectively used, but always compiled. This patch will not compile these kernels when they are not needed.
Displacement kernel is only used for true displacement.
Background kernel is only used when there is a (Cycles)Light of type `LIGHT_BACKGROUND`.
Reviewed By: brecht, #cycles
Tags: #cycles
Maniphest Tasks: T61971
Differential Revision: https://developer.blender.org/D4412
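A minimal sketch of the selection rule described above, assuming a hypothetical `SceneFeatures` struct and kernel names chosen for illustration (the real code queries the scene differently):

```cpp
#include <string>
#include <vector>

// Hypothetical summary of what the scene needs; not a real Cycles type.
struct SceneFeatures {
  bool uses_true_displacement;  // at least one object uses true displacement
  bool has_background_light;    // a light of type LIGHT_BACKGROUND exists
};

// Returns the list of kernels that would be compiled for the scene.
inline std::vector<std::string> kernels_to_compile(const SceneFeatures &features)
{
  std::vector<std::string> kernels = {"base"};
  if (features.uses_true_displacement) {
    kernels.push_back("displace");
  }
  if (features.has_background_light) {
    kernels.push_back("background");
  }
  return kernels;
}
```

A scene with neither feature compiles only the base kernel in this model.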
|
|
The goal of this patch is to limit the number of times
kernels need to be compiled. Kernels can be reused, since
different compile directives can lead to identical
binaries.
The implementation does this by stripping the compile directives
and reshuffling kernels so the output is more likely to be the
same.
We focussed on the kernels where it was easy to detect and maintain
(bundle, bake, displace, do_volume and background). More optimizations
could be done but they are probably less obvious.
Merged the data_init and state_buffer_size kernels to split_bundle.
This patch will also remove empty kernels for do_volume and bake
when their features are not enabled.
When using the benchmark files, fewer background, bake and
do_volume kernels are compiled.
Fix: T61576, T61501, T61466
Reviewed By: brecht, #cycles
Differential Revision: https://developer.blender.org/D4390
|
|
|
|
|
|
|
|
Using the OpenCL mega kernel has been slow and therefore not useful.
This patch will remove the mega kernel from the OpenCL codebase
and the OpenCLDeviceBase class.
T61736: removal of mega kernel
T61703: baking does not work with mega kernel
Tags: #cycles
Differential Revision: https://developer.blender.org/D4383
|
|
Cycles OpenCL: Split baking kernels in own program
Fix T61463. Before this patch baking was part of the base kernels. There
are 3 baking kernels and all 3 use shader evaluation. For only one
of these kernels was the functionality wrapped in the __NO_BAKING__
compile directive.
When you start baking this leads to long compile times. Separating
them into individual programs reduces the compile times.
Also wrapped all baking kernels with __NO_BAKING__ to reduce the
compilation times.
Impact on compilation time
job | scene_name | previous | new | percentage
--------+-----------------+----------+-------+------------
T61463 | empty | 10.63 | 7.27 | 32%
T61463 | bmw | 17.91 | 14.24 | 20%
T61463 | fishycat | 19.57 | 15.08 | 23%
T61463 | barbershop | 54.10 | 48.18 | 11%
T61463 | classroom | 17.55 | 14.42 | 18%
T61463 | koro | 18.92 | 17.15 | 9%
T61463 | pavillion | 17.43 | 14.23 | 18%
T61463 | splash279 | 16.48 | 15.33 | 7%
T61463 | volume_emission | 36.22 | 34.19 | 6%
Impact on render time
job | scene_name | previous | new | percentage
--------+-----------------+----------+---------+------------
T61463 | empty | 21.06 | 20.54 | 2%
T61463 | bmw | 198.44 | 189.59 | 4%
T61463 | fishycat | 394.20 | 388.50 | 1%
T61463 | barbershop | 1188.16 | 1185.49 | 0%
T61463 | classroom | 341.08 | 339.27 | 1%
T61463 | koro | 472.43 | 360.70 | 24%
T61463 | pavillion | 905.77 | 902.14 | 0%
T61463 | splash279 | 55.26 | 54.92 | 1%
T61463 | volume_emission | 62.59 | 39.09 | 38%
I don't have a grounded explanation for why koro and volume_emission are this much
faster; I have done several tests though...
Maniphest Tasks: T61463
Differential Revision: https://developer.blender.org/D4376
|
|
This patch implements a workaround to get the multithreaded compilation from D2231 working.
So far, it only works for Blender, not for Cycles Standalone. Also, I have only tested the Linux codepath in the helper function.
Depends on D2231.
Patch by lukasstockner97, jbakker, brecht
job | scene_name | compilation_time
----------+-----------------+------------------
Baseline | empty | 22.73
D2264 | empty | 13.94
Baseline | bmw | 56.44
D2264 | bmw | 41.32
Baseline | fishycat | 59.50
D2264 | fishycat | 45.19
Baseline | barbershop | 212.28
D2264 | barbershop | 169.81
Baseline | victor | 67.51
D2264 | victor | 53.60
Baseline | classroom | 51.46
D2264 | classroom | 39.02
Baseline | koro | 62.48
D2264 | koro | 49.03
Baseline | pavillion | 54.37
D2264 | pavillion | 38.82
Baseline | splash279 | 47.43
D2264 | splash279 | 37.94
Baseline | volume_emission | 145.22
D2264 | volume_emission | 121.10
This patch reduces compilation time, as the split kernels and base
kernels are compiled in parallel. In Cycles debug mode (256) you can
unmark the OpenCL single program file, which reduces the compilation time
even further (bmw 17 seconds, barbershop 53 seconds).
Reviewers: brecht, dingto, sergey, juicyfruit, lukasstockner97
Reviewed By: brecht
Subscribers: Loner, jbakker, candreacchio, 3dLuver, LazyDodo, bliblubli
Differential Revision: https://developer.blender.org/D2264
|
|
Multi-device was not passing along profiler to the CPU.
|
|
There are supposed to be two spaces before the comment stating which if
an else/endif statement corresponds to. This was mainly violated in the
header guards.
|
|
Mainly useful for debugging. Previously, when AVX2 was disabled
in the debug panel but BVH layout was kept on BVH8 nothing was
rendered.
This required querying the supported BVH layout mask for devices
dynamically, so that it is possible to use DebugFlags there.
|
|
This deduplicates the calls for tile (un)mapping and allows to have a target buffer that is different from the source buffer (needed for baking and animation denoising).
|
|
|
|
Also try to move them from headers to implementation files as much as possible.
|
|
Reviewers: sergey, brecht
Differential Revision: https://developer.blender.org/D2920
|
|
|
|
Some drivers may report very large allocation sizes, which could cause
unnecessary memory usage. This is now limited to 2 GB, which should
still be enough to get the needed performance benefits without waste.
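The limit described above amounts to a clamp on the driver-reported size. A minimal sketch, assuming the limit means 2 GiB (2 × 1024³ bytes) and using a hypothetical helper name `clamp_alloc_size`:

```cpp
#include <algorithm>
#include <cstdint>

// Clamps a driver-reported allocation size (in bytes) to an upper limit.
// The 2 GiB value is an assumption about what "2gb" means in the commit.
inline std::uint64_t clamp_alloc_size(std::uint64_t reported_bytes)
{
  const std::uint64_t limit_bytes = 2ull * 1024 * 1024 * 1024;
  return std::min(reported_bytes, limit_bytes);
}
```

Sizes below the limit pass through unchanged; anything larger is reduced to the limit.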
|
|
|
|
* Remove tex_* and pixels_* functions, replace by mem_*.
* Add MEM_TEXTURE and MEM_PIXELS as memory types recognized by devices.
* No longer create device_memory and call mem_* directly, always go
through device_only_memory, device_vector and device_pixels.
|
|
|
|
|
|
|
|
* Use common TextureInfo struct for all devices, except CUDA fermi.
* Move image sampling code to kernels/*/kernel_*_image.h files.
* Use arrays for data textures on Fermi too, so device_vector<Struct> works.
|
|
A little faster on some benchmark scenes, a little slower on others, seems
about performance neutral on average and saves a little memory.
|
|
This is a bit confusing, especially when one mixes OpenCL code, where ulong is
equal to uint64_t, with CPU-side code, where the name ulong suggests
something else.
This commit makes it so we use an explicit name that is common to all platforms.
|
|
Image textures were being packed into a single buffer for OpenCL, which
limited the amount of memory available for images to the size of one
buffer (usually 4gb on AMD hardware). By packing textures into multiple
buffers that limit is removed, while simultaneously reducing the number
of buffers that need to be passed to each kernel.
Benchmarks were within 2%.
Fixes T51554.
Differential Revision: https://developer.blender.org/D2745
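One way to picture packing textures into multiple buffers is a sequential fill that opens a new buffer whenever the next texture no longer fits. This is only an illustrative sketch, not the packing scheme used in the patch; `Placement` and `pack_textures` are hypothetical names, and alignment is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Where a texture ends up: which buffer, and at what byte offset.
struct Placement {
  std::size_t buffer;
  std::size_t offset;
};

// Places textures one after another, starting a new buffer when the
// current one cannot hold the next texture.
inline std::vector<Placement> pack_textures(const std::vector<std::size_t> &sizes,
                                            std::size_t buffer_capacity)
{
  std::vector<Placement> placements;
  std::size_t buffer = 0;
  std::size_t used = 0;
  for (std::size_t size : sizes) {
    // Assumption: no single texture exceeds the capacity of one buffer.
    assert(size <= buffer_capacity);
    if (used + size > buffer_capacity) {
      ++buffer;
      used = 0;
    }
    placements.push_back({buffer, used});
    used += size;
  }
  return placements;
}
```

With three textures of size 3 and a capacity of 6, the first two share buffer 0 and the third starts buffer 1.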
|
|
|
|
|
|
|
|
Some of the functions might have been inlined, but for others I don't see
how that was possible (I don't think virtual functions can be inlined here).
In any case, it is better to be explicitly optimal in the code.
|
|
Technically not passing all buffers used by a kernel is undefined
behavior. We haven't had any issues with this so far on AMD or
Nvidia, but it's known to be a problem with Intel and we received
a report from AMD that this is a problem on newer hardware, so we
need to make this change at some point.
Unfortunately there is a cost to being correct, about 5% for the
benchmark scenes. For low sample counts it's even worse, I've
seen up to 50% slowdown. For the latter case I think adjusting
tile updating logic can help, but not sure what that would look
like yet (it would be just a few lines change however).
|
|
This commit contains the first part of the new Cycles denoising option,
which filters the resulting image using information gathered during rendering
to get rid of noise while preserving visual features as well as possible.
To use the option, enable it in the render layer options. The default settings
fit a wide range of scenes, but the user can tweak individual settings to
control the tradeoff between a noise-free image, image details, and calculation
time.
Note that the denoiser may still change in the future and that some features
are not implemented yet. The most important missing feature is animation
denoising, which uses information from multiple frames at once to produce a
flicker-free and smoother result. These features will be added in the future.
Finally, thanks to all the people who supported this project:
- Google (through the GSoC) and Theory Studios for sponsoring the development
- The authors of the papers I used for implementing the denoiser (more details
on them will be included in the technical docs)
- The other Cycles devs for feedback on the code, especially Sergey for
mentoring the GSoC project and Brecht for the code review!
- And of course the users who helped with testing, reported bugs and things
that could and/or should work better!
|
|
|
|
|