Age | Commit message (Collapse) | Author |
|
The calculation based on preserving device occupancy was conflicting
with the fact that time limit needs to render less samples at the last
round of render work.
For example, rendering BMW27 for 30sec on i9-11900k was actually
rendering for almost a minute. Now the render time limit is respected
much more close.
Differential Revision: https://developer.blender.org/D13269
|
|
Avoid integer overflow.
|
|
Need to clamp scaled render buffers window to be above zero
when applying resolution divider.
|
|
Adds a method to profiler that can be used to check if it is active.
This is used to determine if stop_profiling and start_profiling
should be called.
| patch | Juans Scene UI 256 samples | Juans Scene bg 256 samples | junkshop UI | junkshop bg |
| No patch | 6:16.59 | 4:05.37 | 2:08.48 | 1:59.7 |
| D13187 | 4:12.15 | 3:57.36 | 2:07.25 | 1:58.16 |
| D13185 | 4.11.18 |3:54.74 | 2:07.44 | 1:58.03 |
| D13190 | 4:12.39 | 3:55.42 | 2:07.62 | 1:58.68 |
UI - means rendered from within Blender
bg - means rendered from the command line using ##blender -b scene.blend -f 1##
Reviewed By: sergey, brecht
Maniphest Tasks: T92601
Differential Revision: https://developer.blender.org/D13190
|
|
The issue was caused by splitting happening twice.
Fixed by checking for split flag which is assigned to the both states
during split.
The tricky part was to write catcher data at the moment of split: the
transparency and shadow catcher sample count is to be accumulated at
that point. Now it is happening in the `intersect_closest` kernel.
The downside is that render buffer is to be passed to the kernel, but
the benefit is that extra split bounce check is not needed now.
Had to move the passes write to shadow catcher header, since include
of `film/passes.h` causes all the fun of requirement to have BSDF
data structures available.
Differential Revision: https://developer.blender.org/D13177
|
|
|
|
Adds a bunch of CPU kernel function to process on row of pixels, and use those
instead of calling unoptimized implementations.
Fixes T92598
|
|
When reading pixels for virtual passes like diffuse, that sum diffuse direct
and indirect passes, we do not need them to exist with an offset in the render
buffer.
|
|
|
|
Instead of printing debug flags listing various CPU and GPU settings that
may or may not be used, print when we are using them. This include CPU
kernel types, OptiX debugging and CUDA and HIP adaptive compilation. BVH
type was already printed.
|
|
Adds scrambling distance to the PMJ sampler. This is based
on the work by Mathieu Menuet in D12318 who created the original
implementation for the Sobol sampler.
Reviewed By: brecht
Maniphest Tasks: T92181
Differential Revision: https://developer.blender.org/D12854
|
|
Cycles:Distance Scrambling for Cycles Sobol Sampler
This option implements micro jittering an is based on the INRIA
research paper [[ https://hal.inria.fr/hal-01325702/document | on micro jittering ]]
and work by Lukas Stockner for implementing the scrambling distance.
It works by controlling the correlation between pixels by either using
a user supplied value or an adaptive algorithm to limit the maximum
deviation of the sample values between pixels.
This is a follow up of https://developer.blender.org/D12316
The PMJ version can be found here: https://developer.blender.org/D12511
Reviewed By: leesonw
Differential Revision: https://developer.blender.org/D12318
|
|
Remove prefix of filenames that is the same as the folder name. This used
to help when #includes were using individual files, but now they are always
relative to the cycles root directory and so the prefixes are redundant.
For patches and branches, git merge and rebase should be able to detect the
renames and move over code to the right file.
|
|
* Split render/ into scene/ and session/. The scene/ folder now contains the
scene and its nodes. The session/ folder contains the render session and
associated data structures like drivers and render buffers.
* Move top level kernel headers into new folders kernel/camera/, kernel/film/,
kernel/light/, kernel/sample/, kernel/util/
* Move integrator related kernel headers into kernel/integrator/
* Move OSL shaders from kernel/shaders/ to kernel/osl/shaders/
For patches and branches, git merge and rebase should be able to detect the
renames and move over code to the right file.
|
|
|
|
Add a Fast GI Method, either Replace for the existing behavior, or Add
to add ambient occlusion like the old world settings.
This replaces the old Ambient Occlusion settings in the world properties.
|
|
Panning in camera view makes the border to be modified, which was causing
the Cycles display to believe the rendered result is unusable.
The solution is to draw the render result at the display parameters it was
updated for. This allows to avoid flickering during panning, zooming, and
camera FOV changes. The suboptimal aspect of this is that it has some jelly
effect, although it is on the same level as jelly effect of object outline
so it is not terrible.
Differential Revision: https://developer.blender.org/D12970
|
|
|
|
Similar to main path compaction that happens before adding work tiles, this
compacts shadow paths before launching kernels that may add shadow paths.
Only do it when more than 50% of space is wasted.
It's not a clear win in all scenes, some are up to 1.5% slower. Likely caused
by different order of scheduling kernels having an unpredictable performance
impact. Still feels like compaction is just the right thing to avoid cases
where a few shadow paths can hold up a lot of main paths.
Differential Revision: https://developer.blender.org/D12944
|
|
Easy now thanks to the main and shadow path decoupling. Doesn't help
in an benchmark scene except Spring, where it reduces render time by
maybe 2-3%.
Ref T87836
|
|
|
|
Taking advantage of the new decoupled main and shadow paths. For CPU we
just store two nested structs in the integrator state, one for direct light
shadows and one for AO. For the GPU we restrict the number of shade surface
states to be executed based on available space in the shadow paths queue.
This also helps improve performance in benchmark scenes with an AO pass,
since it is no longer needed to use the shader raytracing kernel there,
which has worse performance.
Differential Revision: https://developer.blender.org/D12900
|
|
These transparent shadows can be expansive to evaluate. Especially on the
GPU they can lead to poor occupancy when only some pixels require many kernel
launches to trace and evaluate many layers of transparency.
Baked transparency allows tracing a single ray in many cases by accumulating
the throughput directly in the intersection program without recording hits
or evaluating shaders. Transparency is baked at curve vertices and
interpolated, for most shaders this will look practically the same as actual
shader evaluation.
Fixes T91428, performance regression with spring demo file due to transparent
hair, and makes it render significantly faster than Blender 2.93.
Differential Revision: https://developer.blender.org/D12880
|
|
The motivation for this is twofold. It improves performance (5-10% on most
benchmark scenes), and will help to bring back transparency support for the
ambient occlusion pass.
* Duplicate some members from the main path state in the shadow path state.
* Add shadow paths incrementally to the array similar to what we do for
the shadow catchers.
* For the scheduling, allow running shade surface and shade volume kernels
as long as there is enough space in the shadow paths array. If not, execute
shadow kernels until it is empty.
* Add IntegratorShadowState and ConstIntegratorShadowState typedefs that
can be different between CPU and GPU. For GPU both main and shadow paths
juse have an integer for SoA access. Bt with CPU it's a different pointer
type so we get type safety checks in code shared between CPU and GPU.
* For CPU, add a separate IntegratorShadowStateCPU struct embedded in
IntegratorShadowState.
* Update various functions to take the shadow state, and make SVM take either
type of state using templates.
Differential Revision: https://developer.blender.org/D12889
|
|
The thread affinity setting in OIDN can break multithreading on some CPUs.
While this leads to somewhat worse performance on CPUs that do work correctly,
it's better than having some CPUs use only half the cores.
|
|
This is only used by a single device, not need for thread safety.
|
|
* Rename struct KernelGlobals to struct KernelGlobalsCPU
* Add KernelGlobals, IntegratorState and ConstIntegratorState typedefs
that every device can define in its own way.
* Remove INTEGRATOR_STATE_ARGS and INTEGRATOR_STATE_PASS macros and
replace with these new typedefs.
* Add explicit state argument to INTEGRATOR_STATE and similar macros
In preparation for decoupling main and shadow paths.
Differential Revision: https://developer.blender.org/D12888
|
|
|
|
|
|
The assumption about absent shadow path was wrong.
The rest of the changes are to ensure shadow paths are finished prior
to the split, so that they write to the proper passes.
The issue was caught by running regression tests on OptiX.
Differential Revision: https://developer.blender.org/D12857
|
|
The pixel accessor was not aware of possible offset in the
pixel padding causing some slices of the result not being
properly padded.
|
|
Ensure math happens on size_t type instead of int followed by a cast
to the size_t.
|
|
Need to check allocation size, as the features do not change with
volume stack depth detection.
|
|
It got missed in some of previous development.
Can not see a reason why the line needed to be removed, maybe just some
accident.
|
|
The state template iteration had difficult time dealing with 0-sized
arrays, causing iteration for until integer overflows.
|
|
The overscan change from D12599 lacked proper handling of window
when slicing buffer for multiple devices.
|
|
Make volume stack allocated conditionally, potentially based on the
actual nested level of objects in the scene.
Currently the nested level is estimated by number of volume objects.
This is a non-expensive check which is probably enough in practice
to get almost perfect memory usage and performance.
The conditional allocation is a bit tricky.
For the CPU we declare and define maximum possible volume stack,
because there are only that many integrator states on the CPU.
On the GPU we declare outer SoA to have all volume stack elements,
but only allocate actually needed ones. The actually used volume
stack size is passed as a pre-processor, which seems to be easiest
and fastest for the GPU state copy.
There seems to be no speed regression in the demo files on RTX6000.
Note that scenes with high nested level of volume will now be slower
but correct.
Differential Revision: https://developer.blender.org/D12759
|
|
Implement an overscan support for tiles, so that adaptive sampling can
rely on the pixels neighbourhood.
Differential Revision: https://developer.blender.org/D12599
|
|
Currently was only used for logging, but better to fix the size so
that it matches reality.
The issue was caused by decoupling number of shadow intersections
and using much higher number for CPU. This caused the total state
on GPU to be logged as 10s of gigabytes instead of 100s of megabytes.
Differential Revision: https://developer.blender.org/D12755
|
|
This previously only work for CPU rendering, and isn't that practical to get
working in the new architecture.
|
|
* Add OutputDriver, replacing function callbacks in Session.
* Add PathTraceTile, replacing tile access methods in Session.
* Add more detailed comments about how this driver should be implemented.
* Add OIIOOutputDriver for Cycles standalone to output an image.
Differential Revision: https://developer.blender.org/D12627
|
|
* Split GPUDisplay into two classes. PathTraceDisplay to implement the Cycles side,
and DisplayDriver to implement the host application side. The DisplayDriver is now
a fully abstract base class, embedded in the PathTraceDisplay.
* Move copy_pixels_to_texture implementation out of the host side into the Cycles side,
since it can be implemented in terms of the texture buffer mapping.
* Move definition of DeviceGraphicsInteropDestination into display driver header, so
that we do not need to expose private device headers in the public API.
* Add more detailed comments about how the DisplayDriver should be implemented.
The "driver" terminology might not be obvious, but is also used in other renderers.
Differential Revision: https://developer.blender.org/D12626
|
|
This struct is much bigger now, and does not actually need to be fully zero
initialized.
|
|
So we can do fewer intersection calls, only on the GPU do we need to save
memory and do this in small steps.
Ref T87836
|
|
Noticed while looking into flickering issues in viewport.
Doesn't seem to solve the flicker issue for me, but is something
what is supposed to be happening anyway.
Differential Revision: https://developer.blender.org/D12673
|
|
NOTE: this feature is not ready for user testing, and not yet enabled in daily
builds. It is being merged now for easier collaboration on development.
HIP is a heterogenous compute interface allowing C++ code to be executed on
GPUs similar to CUDA. It is intended to bring back AMD GPU rendering support
on Windows and Linux.
https://github.com/ROCm-Developer-Tools/HIP.
As of the time of writing, it should compile and run on Linux with existing
HIP compilers and driver runtimes. Publicly available compilers and drivers
for Windows will come later.
See task T91571 for more details on the current status and work remaining
to be done.
Credits:
Sayak Biswas (AMD)
Arya Rafii (AMD)
Brian Savery (AMD)
Differential Revision: https://developer.blender.org/D12578
|
|
Only do denoising on the full-frame result. Saves render time.
Can re-consider in the future when/if we'll want to support
denoising during rendering (similar to viewport) to allow artists
to stop rendering when they see image to be good enough. Until
there is a design for that workflow stick to a more time efficient
rendering.
Differential Revision: https://developer.blender.org/D12662
|
|
Expose them to the interface, and stop rendering as soon as possible.
Differential Revision: https://developer.blender.org/D12617
|
|
Protect against integer overflow.
|
|
|