Age | Commit message (Collapse) | Author |
|
The idea is to make include statements more explicit and obvious where the
file is coming from, additionally reducing chance of wrong header being
picked up.
For example, it was not obvious whether bvh.h was refferring to builder
or traversal, whenter node.h is a generic graph node or a shader node
and cases like that.
Surely this might look obvious for the active developers, but after some
time of not touching the code it becomes less obvious where file is coming
from.
This was briefly mentioned in T50824 and seems @brecht is fine with such
explicitness, but need to agree with all active developers before committing
this.
Please note that this patch is lacking changes related on GPU/OpenCL
support. This will be solved if/when we all agree this is a good idea to move
forward.
Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner
Reviewed By: lukasstockner97, maiself, nirved, dingto
Subscribers: brecht
Differential Revision: https://developer.blender.org/D2586
|
|
The intention of this commit it to address issues mentioned in the
reports T43865,T50164 and T50452.
The code is based on Embree code with some extra vectorization
to speed up single ray to single triangle intersection.
Unfortunately, such a fix is not coming for free. There is some
slowdown for AVX2 processors, mainly due to different vectorization
code, which caused different number of instructions to be executed
and different instructions-per-cycle counters. But on another hand
this commit makes pre-AVX2 platforms such as AVX and SSE4.1 a bit
faster. The prerformance goes as following:
2.78c AVX2 2.78c AVX Patch AVX2 Patch AVX
BMW 05:21.09 06:05.34 05:32.97 (+3.5%) 05:34.97 (-8.5%)
Classroom 16:55.36 18:24.51 17:10.41 (+1.4%) 17:15.87 (-6.3%)
Fishy Cat 08:08.49 08:36.26 08:09.19 (+0.2%) 08:12.25 (-4.7%
Koro 11:22.54 11:45.24 11:13.25 (-1.5%) 11:43.81 (-0.3%)
Barcelone 14:18.32 16:09.46 14:15.20 (-0.4%) 14:25.15 (-10.8%)
On GPU the performance is about 1.5-2% slower in my tests on GTX1080
but afraid we can't do much as a part of this chaneg here and
consider it a price to pay for more proper intersection check.
Made in collaboration with Maxym Dmytrychenko, big thanks to him!
Reviewers: brecht, juicyfruit, lukasstockner97, dingto
Differential Revision: https://developer.blender.org/D1574
|
|
It uses an idea of accumulating all possible light reachable across the
light path (without taking shadow blocked into account) and accumulating
total shaded light across the path. Dividing second figure by first one
seems to be giving good estimate of the shadow.
In fact, to my knowledge, it's something really similar to what is
happening in the denoising branch, so we are aligned here which is good.
The workflow is following:
- Create an object which matches real-life object on which shadow is
to be catched.
- Create approximate similar material on that object.
This is needed to make indirect light properly affecting CG objects
in the scene.
- Mark object as Shadow Catcher in the Object properties.
Ideally, after doing that it will be possible to render the image and
simply alpha-over it on top of real footage.
|
|
|
|
The title says it all actually. Gives up to 10% speedup on test scenes here
on i7-6800K.
Render times on GPU are unreliable here, but there might be some slowdown
caused by watertight nature of intersections.
|
|
This is a preparation work for the followup commit which wil l move
remaining parts of Woop intersection logic to an utility file.
Doing it as a separate commit to keep changes more atomic and easier
to bisect when/if needed.
|
|
|
|
Decoupled ray marching is not supported yet.
Transparent shadows are always enabled for volume rendering.
Changes in kernel/bvh and kernel/geom are from Sergey.
This simiplifies code significantly, and prepares it for
record-all transparent shadow function in split kernel.
|
|
|
|
Was a bug in recent optimization commit.
|
|
|
|
The idea is to record all possible transparent intersections when
shooting transparent ray on GPU (similar to what we were doing on
CPU already).
This avoids need of doing whole ray-to-scene intersections queries
for each intersection and speeds up a lot cases like transparent
hair in the cost of extra memory.
This commit is a base ground for now and this feature is kept
disabled for until some further tweaks.
|
|
We started to run out of bits there, so now we separate flags
which came from __object_flags and which are either runtime or
coming from __shader_flags.
Rule now is: SD_OBJECT_* flags are to be tested against new
object_flags field of ShaderData, all the rest flags are to
be tested against flags field of ShaderData.
There should be no user-visible changes, and time difference
should be minimal. In fact, from tests here can only see hardly
measurable difference and sometimes the new code is somewhat
faster (all within a noise floor, so hard to tell for sure).
Reviewers: brecht, dingto, juicyfruit, lukasstockner97, maiself
Differential Revision: https://developer.blender.org/D2428
|
|
|
|
This way we can stop traversing BVH node early on.
Gives about 2-2.5x times render time improvement with 3 BVH steps.
Hopefully this gives no measurable performance loss for scenes with
single BVH step.
Traversal is currently only implemented for QBVH, meaning old CPUs
and GPU do not benefit from this change.
|
|
This was a missing bit from b53ce9a.
|
|
per node
|
|
|
|
This way it's more clear whether some issue is caused by lots of geometry in
the node or by lots of "transparent" BVH nodes.
|
|
Gives up to ~1% speedup again.
While it seems to be small, still nice since the code now is actually more
clean that it used to be before.
|
|
Just preparing for new optimization to be used in all traversal implementation.
Should be no measurable difference.
|
|
Several ideas here:
- Optimize calculation of near_{x,y,z} in a way that does not require
3 if() statements per update, which avoids negative effect of wrong
branch prediction.
- Optimization of direction clamping for BVH.
- Optimization of point/direction transform.
Brings ~1.5% speedup again depending on a scene (unfortunately, this
speedup can't be sum across all previous commits because speedup of
each of the changes varies from scene to scene, but it still seems to
be nice solid speedup of few percent on Linux and bigger speedup was
reported on Windows).
Once again ,thanks Maxym for inspiration!
Still TODO: We have multiple places where we need to calculate near
x,y,z indices in BVH, for now it's only done for main BVH traversal.
Will try to move this calculation to an utility function and see if
that can be easily re-used across all the BVH flavors.
|
|
Mostly this is making inlining match CUDA 7.5 in a few performance critical
places. The end result is that performance is now better than before, possibly
due to less register spilling or other CUDA 8.0 compiler improvements.
On benchmarks scenes, there are 3% to 35% render time reductions. Stack memory
usage is reduced a little too.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D2269
|
|
Some of the files were wrongly attributing code to some other
organizations and in few places proper attribution was missing.
This is mainly either a copy-paste error (when new file was
created from an existing one and header wasn't updated) or due
to some refactor which split non-original-BF code with purely
BF code.
Should solve some confusion around.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Makes it simpler to compare different traversal algorithms.
|
|
|
|
Reviewers: brecht, sergey, dingto, juicyfruit
Differential Revision: https://developer.blender.org/D2220
|
|
for QBVH
|
|
|
|
All the changes are mainly giving explicit tips on inlining functions,
so they match how inlining worked with previous toolkit.
This make kernel compiled by CUDA 8 render in average with same speed
as previous kernels. Some scenes are somewhat faster, some of them are
somewhat slower. But slowdown is within 1% so far.
On a positive side it allows us to enable newer generation cards on
buildbots (so GTX 10x0 will be officially supported soon).
|
|
While they prevent legit write past the array boundary error
those fixes introduced regression in behavior when having exact
max_hits transparent intersections and nothing else.
Previous code would have considered such case a totally opaque,
but it's not correct.
Fixes T48941: Some materials don't get transparent shadows anymore
|
|
It was possible to miss bounces termination criteria in this functions,
mainly when max_hits was set to 0.
Made the check more robust in traversal functions (which should not
affect performance, it's an operation of same complexity AFAIK).
Also avoid doing ray-scene intersection from shadow_blocked when
limit of transparent bounces was already reached.
|
|
Seems there's some conflict around `near` identifier in that configuration.
|
|
|
|
Code might have writing past the array boundaries.
|
|
This way restrict can be used for CUDA and OpenCL as well.
From quick tests in areas i've been testing this it might give some
barely measurable %% of speedup, but it increases registers pressure.
So use of this qualifier is still really limited.
|
|
Using camel case for variables is something what didn't came from our original
code, but rather from third party libraries. Let's avoid those as much as possible.
|
|
Matches better naming of volume traversal files, where we've got
optimized versions of a single step of volume intersection and
traversal which will gather all volume intersections.
|
|
BVH traversal is not really that much a geometry and we've got
quite some traversals now. Makes sense to keep them separate in
the name of source structure clarity.
|