Age | Commit message (Collapse) | Author |
|
OptiX always uses record-all behavior for transparent shadow rays, but did not check
whether the maximum number of hits exceeded the shadow hit stack. This fixes that.
|
|
Custom render passes are added in the Shader AOVs panel in the view layer
settings, with a name and data type. In shader nodes, an AOV Output node
is then used to output either a value or color to the pass.
Arbitrary names can be used for these passes, as long as they don't conflict
with built-in passes that are enabled. The AOV Output node can be used in both
material and world shader nodes.
Implemented by Lukas, with tweaks by Brecht.
Differential Revision: https://developer.blender.org/D4837
|
|
This adds all the kernel side changes for the Optix backend.
Ref D5363
|
|
CUDA is working correct without it now, and it's more efficient not to do this.
Ref D5363
|
|
|
|
This never really worked as it was supposed to. The main goal of this is to
turn noise from sampling tiny hairs into multiple layers of transparency that
do not need to be sampled stochastically. However the implementation of this
worked by randomly discarding hair intersections in BVH traversal, which
defeats the purpose.
If it ever comes back, it's best implemented outside the kernel as a preprocess
that changes hair radius before BVH building. This would also make it work with
Embree, where it's not supported now. But it's not so clear anymore that with
many AA samples and GPU rendering this feature is as helpful as it once was for
CPU raytracers with few AA samples.
The benefit of removing this feature is improved hair ray tracing performance,
tested on NVIDIA Titan Xp:
bmw27: +0.37%
classroom: +0.26%
fishy_cat: -7.36%
koro: -12.98%
pabellon: -0.12%
Differential Revision: https://developer.blender.org/D4532
|
|
Apply clang format as proposed in T53211.
For details on usage and instructions for migrating branches
without conflicts, see:
https://wiki.blender.org/wiki/Tools/ClangFormat
|
|
It is supposed to be two spaces before comment stating which if
else/endif statements corresponds to. Was mainly violated in the
header guards.
|
|
We now continue transparent paths after diffuse/glossy/transmission/volume
bounces are exceeded. This avoids unexpected boundaries in volumes with
transparent boundaries. It is also required for MIS to work correctly with
transparent surfaces, as we also continue through these in shadow rays.
The main visible changes is that volumes will now be lit by the background
even at volume bounces 0, same as surfaces.
Fixes T53914 and T54103.
|
|
With a Titan Xp, reduces path trace local memory from 1092MB to 840MB.
Benchmark performance was within 1% with both RX 480 and Titan Xp.
Original patch was implemented by Sergey.
Differential Revision: https://developer.blender.org/D2249
|
|
|
|
Similar to what we did for area lights previously, this should help
preserve stratification when using multiple BSDFs in theory. Improvements
are not easily noticeable in practice though, because the number of BSDFs
is usually low. Still nice to eliminate one sampling dimension.
|
|
|
|
Also moved code out of deep-inside ifdef block, otherwise it was quite confusing.
|
|
|
|
Need to exit the volume stack when shadow ray laves the medium.
Thanks Brecht for review and help in troubleshooting!
|
|
|
|
This was needed when we accessed OSL closure memory after shader evaluation,
which could get overwritten by another shader evaluation. But all closures
are immediatley converted to ShaderClosure now, so no longer needed.
|
|
Also pass by value and don't write back now that it is just a hash for seeding
and no longer an LCG state. Together this makes CUDA a tiny bit faster in my
tests, but mainly simplifies code.
|
|
Since all the shadow catchers are already assumed to be in the footage,
the shadows they cast on each other are already in the footage too. So
don't just let shadow catchers skip self, but all shadow catchers.
Another justification is that it should not matter if the shadow catcher
is modeled as one object or multiple separate objects, the resulting
render should be the same.
Differential Revision: https://developer.blender.org/D2763
|
|
|
|
|
|
It uses an idea of accumulating all possible light reachable across the
light path (without taking shadow blocked into account) and accumulating
total shaded light across the path. Dividing second figure by first one
seems to be giving good estimate of the shadow.
In fact, to my knowledge, it's something really similar to what is
happening in the denoising branch, so we are aligned here which is good.
The workflow is following:
- Create an object which matches real-life object on which shadow is
to be catched.
- Create approximate similar material on that object.
This is needed to make indirect light properly affecting CG objects
in the scene.
- Mark object as Shadow Catcher in the Object properties.
Ideally, after doing that it will be possible to render the image and
simply alpha-over it on top of real footage.
|
|
This commit enables record-all transparent shadows rays.
Perfromance results:
R9 290 render time (without synchronization), seconds
Before After Change
BMW 261.5 262.5 +0.4 %
Classroom 869.6 867.3 -0.3 %
Fishy Cat 657.4 639.8 -2.7 %
Koro 1909.8 692.8 -63.7 %
Pabellon Barcelona 1633.3 1238.0 -24.2 %
Pabellon Barcelona(*) 1158.1 903.8 -22.0 %
(*) without glossy connected to volume
|
|
Decoupled ray marching is not supported yet.
Transparent shadows are always enabled for volume rendering.
Changes in kernel/bvh and kernel/geom are from Sergey.
This simiplifies code significantly, and prepares it for
record-all transparent shadow function in split kernel.
|
|
|
|
This does a few things at once:
- Refactors host side split kernel logic into a new device
agnostic class `DeviceSplitKernel`.
- Removes tile splitting, a new work pool implementation takes its place and
allows as many threads as will fit in memory regardless of tile size, which
can give performance gains.
- Refactors split state buffers into one buffer, as well as reduces the
number of arguments passed to kernels. Means there's less code to deal
with overall.
- Moves kernel logic out of OpenCL kernel files so they can later be used by
other device types.
- Replaced OpenCL specific APIs with new generic versions
- Tiles can now be seen updating during rendering
|
|
|
|
|
|
|
|
|
|
Seems CUDA failed to de-duplicate the array across multiple inlined
versions of the shadow_blocked(). Helped it a bit with that now.
Gives about 100MB memory improvement on a scenes after previous
commit and brings up memory "regression" to only 100MB comparing to
the master branch now.
|
|
The idea is to record all possible transparent intersections when
shooting transparent ray on GPU (similar to what we were doing on
CPU already).
This avoids need of doing whole ray-to-scene intersections queries
for each intersection and speeds up a lot cases like transparent
hair in the cost of extra memory.
This commit is a base ground for now and this feature is kept
disabled for until some further tweaks.
|
|
|
|
Now we break the traversal cycle and then perform volume attenuation
and check with zero throughput. Not sure it makes any measurable sense
at this moment, but in the future it might help de-duplicating some
extra logic here.
|
|
Fair amount of code was duplicated for CPU and GPU, now we are
using inlined function to avoid such duplication.
|
|
Mostly this is making inlining match CUDA 7.5 in a few performance critical
places. The end result is that performance is now better than before, possibly
due to less register spilling or other CUDA 8.0 compiler improvements.
On benchmarks scenes, there are 3% to 35% render time reductions. Stack memory
usage is reduced a little too.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D2269
|
|
Just to make it easier to research ways of possible code de-duplication.
|
|
|
|
While they prevent legit write past the array boundary error
those fixes introduced regression in behavior when having exact
max_hits transparent intersections and nothing else.
Previous code would have considered such case a totally opaque,
but it's not correct.
Fixes T48941: Some materials don't get transparent shadows anymore
|
|
It was possible to miss bounces termination criteria in this functions,
mainly when max_hits was set to 0.
Made the check more robust in traversal functions (which should not
affect performance, it's an operation of same complexity AFAIK).
Also avoid doing ray-scene intersection from shadow_blocked when
limit of transparent bounces was already reached.
|
|
Glossy, Anisotropic and Glass BSDFs
This commit adds a new distribution to the Glossy, Anisotropic and Glass BSDFs that implements the
multiple-scattering microfacet model described in the paper "Multiple-Scattering Microfacet BSDFs with the Smith Model".
Essentially, the improvement is that unlike classical GGX, which only models single scattering and assumes
the contribution of multiple bounces to be zero, this new model performs a random walk on the microsurface until
the ray leaves it again, which ensures perfect energy conservation.
In practise, this means that the "darkening problem" - GGX materials becoming darker with increasing
roughness - is solved in a physically correct and efficient way.
The downside of this model is that it has no (known) analytic expression for evalation. However, it can be
evaluated stochastically, and although the correct PDF isn't known either, the properties of MIS and the
balance heuristic guarantee an unbiased result at the cost of slightly higher noise.
Reviewers: dingto, #cycles, brecht
Reviewed By: dingto, #cycles, brecht
Subscribers: bliblubli, ace_dragon, gregzaal, brecht, harvester, dingto, marcog, swerner, jtheninja, Blendify, nutel
Differential Revision: https://developer.blender.org/D2002
|
|
|
|
57% less for path and 48% less for branched path.
|
|
This commit makes it so malloc() is only happening once per volume and
once per transparent shadow query (per thread), improving scalability of
the code to multiple CPU cores.
Hard to measure this with a low-bottom i7 here currently, but from quick
tests seems volume sampling gave about 3-5% speedup.
The idea is to store allocated memory in kernel globals, which are per
thread on CPU already.
Reviewers: dingto, juicyfruit, lukasstockner97, maiself, brecht
Reviewed By: brecht
Subscribers: Blendify, nutel
Differential Revision: https://developer.blender.org/D1996
|
|
The idea is to switch from allocating separate buffers for shader data's
structure of arrays to allocating one huge memory block and do some index
trickery to make it accessed as SOA.
This saves quite reasonable amount of lines of code in device_opencl and
also makes it possible to get rid of special declaration of ShaderData
structure.
As a side effect it also makes it easier to experiment with SOA vs. AOS
for split kernel.
Works fine here on NVidia GTX580, Intel CPU amd AMD Fiji cards.
Reviewers: #cycles, brecht, juicyfruit, dingto
Differential Revision: https://developer.blender.org/D1593
|
|
Use KernelGlobals to access all the global arrays for the intermediate
storage instead of passing all this storage things explicitly.
Tested here with Intel OpenCL, NVIDIA GTX580 and AMD Fiji, didn't see
any artifacts, so guess it's all good.
Reviewers: juicyfruit, dingto, lukasstockner97
Differential Revision: https://developer.blender.org/D1736
|
|
The goal is to make Experimental kernel closer in performance to the
official kernel, avoiding spills and such.
There should not be big impact on official kernel, own tests showed
few percent performance drop on laptop's GPU. CPU was always the
same speed on AVX, AVX2 and SSE4.1 CPUs i've been testing here.
This seems to be the last essential step before we can get rid of
Experimental kernel and enable SSS officially on GPU without causing
some major performance issues.
Surely some more tweaks are possibly required, but that we can do
for until cows go home anyway.
|
|
This commit changes the way how we pass bounce information to the Light
Path node. Instead of manualy copying the bounces into ShaderData, we now
directly pass PathState. This reduces the arguments that we need to pass
around and also makes it easier to extend the feature.
This commit also exposes the Transmission Bounce Depth to the Light Path
node. It works similar to the Transparent Depth Output: Replace a
Transmission lightpath after X bounces with another shader, e.g a Diffuse
one. This can be used to avoid black surfaces, due to low amount of max
bounces.
Reviewed by Sergey and Brecht, thanks for some hlp with this.
I tested compilation and usage on CPU (SVM and OSL), CUDA, OpenCL Split
and Mega kernel. Hopefully this covers all devices. :)
|
|
|