Age | Commit message (Collapse) | Author |
|
This is an initial implementation of BVH8 optimization structure
and packated triangle intersection. The aim is to get faster ray
to scene intersection checks.
Scene BVH4 BVH8
barbershop_interior 10:24.94 10:10.74
bmw27 02:41.25 02:38.83
classroom 08:16.49 07:56.15
fishy_cat 04:24.56 04:17.29
koro 06:03.06 06:01.45
pavillon_barcelona 09:21.26 09:02.98
victor 23:39.65 22:53.71
As memory goes, peak usage raises by about 4.7% in a complex
scenes.
Note that BVH8 is disabled when using OSL, this is because OSL
kernel does not get per-microarchitecture optimizations and
hence always considers BVH3 is used.
Original BVH8 patch from Anton Gavrikov.
Batched triangles intersection from Victoria Zhislina.
Extra work and tests and fixes from Maxym Dmytrychenko.
|
|
|
|
This means the shader can now be used for procedural texturing. New
settings on the node are Samples, Inside, Local Only and Distance.
Original patch by Lukas with further changes by Brecht.
Differential Revision: https://developer.blender.org/D3479
|
|
Not sure why exactly it is called a cleanup, the code was much more clear
and robust against possible missing return statements which are MANDATORY.
Missing return statement will:
- Cause two different BVH traversals to be run.
Not is happening currently, but if more BVH layouts are added, it will
become a problem.
- It is already causing assert() statements to fail, since functions are
no longer returning when they are supposed to.
If there is any measurable reason to keep this change, let me know.
Otherwise just stick to reliable/tested/robust code.
This reverts commit ba65f7093b39a8e5f1fb869cbc347fb810a05ab9.
|
|
|
|
This save a little memory and copying in the kernel by storing only a 4x3
matrix instead of a 4x4 matrix. We already did this in a few places, and
those don't need to be special exceptions anymore now.
|
|
Original patch by Stefan with modifications by Brecht.
|
|
This was we can introduce other types of BVH, for example, wider ones, without
causing too much mess around boolean flags.
Thoughs:
- Ideally device info should probably return bitflag of what BVH types it
supports.
It is possible to implement based on simple logic in device/ and mesh.cpp,
rest of the changes will stay the same.
- Not happy with workarounds in util_debug and duplicated enum in kernel.
Maybe enbum should be stores in kernel, but then it's kind of weird to include
kernel types from utils. Soudns some cyclkic dependency.
Reviewers: brecht, maxim_d33
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D3011
|
|
|
|
Empty BVH nodes are set to NaN which must be preserved all the way to the
tnear <= tfar test which can then give false for empty nodes. This needs
strict semantices and careful argument ordering for min() and max(), so
the second argument is used if either of the arguments is NaN.
Fixes T52635: crash in BVH traversal with SSE4.1.
Differential Revision: https://developer.blender.org/D2828
|
|
|
|
Fishy cat benchmark was rendering with wrong shadows. Cause is unclear,
adding printf or rearranging code seems to avoid this issue, possibly a
compiler bug. This reverts the fix and solves the OSL bug elsewhere.
|
|
We should only early out with any hit in BVH traversal if the only visibility
bits used are opaque shadow. Not when opaque shadow is one of multiple bits.
|
|
Those are nothing to do with BVH, and can be used separately.
|
|
Since all the shadow catchers are already assumed to be in the footage,
the shadows they cast on each other are already in the footage too. So
don't just let shadow catchers skip self, but all shadow catchers.
Another justification is that it should not matter if the shadow catcher
is modeled as one object or multiple separate objects, the resulting
render should be the same.
Differential Revision: https://developer.blender.org/D2763
|
|
|
|
|
|
The idea is to make include statements more explicit and obvious where the
file is coming from, additionally reducing chance of wrong header being
picked up.
For example, it was not obvious whether bvh.h was refferring to builder
or traversal, whenter node.h is a generic graph node or a shader node
and cases like that.
Surely this might look obvious for the active developers, but after some
time of not touching the code it becomes less obvious where file is coming
from.
This was briefly mentioned in T50824 and seems @brecht is fine with such
explicitness, but need to agree with all active developers before committing
this.
Please note that this patch is lacking changes related on GPU/OpenCL
support. This will be solved if/when we all agree this is a good idea to move
forward.
Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner
Reviewed By: lukasstockner97, maiself, nirved, dingto
Subscribers: brecht
Differential Revision: https://developer.blender.org/D2586
|
|
The intention of this commit it to address issues mentioned in the
reports T43865,T50164 and T50452.
The code is based on Embree code with some extra vectorization
to speed up single ray to single triangle intersection.
Unfortunately, such a fix is not coming for free. There is some
slowdown for AVX2 processors, mainly due to different vectorization
code, which caused different number of instructions to be executed
and different instructions-per-cycle counters. But on another hand
this commit makes pre-AVX2 platforms such as AVX and SSE4.1 a bit
faster. The prerformance goes as following:
2.78c AVX2 2.78c AVX Patch AVX2 Patch AVX
BMW 05:21.09 06:05.34 05:32.97 (+3.5%) 05:34.97 (-8.5%)
Classroom 16:55.36 18:24.51 17:10.41 (+1.4%) 17:15.87 (-6.3%)
Fishy Cat 08:08.49 08:36.26 08:09.19 (+0.2%) 08:12.25 (-4.7%
Koro 11:22.54 11:45.24 11:13.25 (-1.5%) 11:43.81 (-0.3%)
Barcelone 14:18.32 16:09.46 14:15.20 (-0.4%) 14:25.15 (-10.8%)
On GPU the performance is about 1.5-2% slower in my tests on GTX1080
but afraid we can't do much as a part of this chaneg here and
consider it a price to pay for more proper intersection check.
Made in collaboration with Maxym Dmytrychenko, big thanks to him!
Reviewers: brecht, juicyfruit, lukasstockner97, dingto
Differential Revision: https://developer.blender.org/D1574
|
|
It uses an idea of accumulating all possible light reachable across the
light path (without taking shadow blocked into account) and accumulating
total shaded light across the path. Dividing second figure by first one
seems to be giving good estimate of the shadow.
In fact, to my knowledge, it's something really similar to what is
happening in the denoising branch, so we are aligned here which is good.
The workflow is following:
- Create an object which matches real-life object on which shadow is
to be catched.
- Create approximate similar material on that object.
This is needed to make indirect light properly affecting CG objects
in the scene.
- Mark object as Shadow Catcher in the Object properties.
Ideally, after doing that it will be possible to render the image and
simply alpha-over it on top of real footage.
|
|
|
|
The title says it all actually. Gives up to 10% speedup on test scenes here
on i7-6800K.
Render times on GPU are unreliable here, but there might be some slowdown
caused by watertight nature of intersections.
|
|
This is a preparation work for the followup commit which wil l move
remaining parts of Woop intersection logic to an utility file.
Doing it as a separate commit to keep changes more atomic and easier
to bisect when/if needed.
|
|
|
|
Decoupled ray marching is not supported yet.
Transparent shadows are always enabled for volume rendering.
Changes in kernel/bvh and kernel/geom are from Sergey.
This simiplifies code significantly, and prepares it for
record-all transparent shadow function in split kernel.
|
|
|
|
Was a bug in recent optimization commit.
|
|
|
|
The idea is to record all possible transparent intersections when
shooting transparent ray on GPU (similar to what we were doing on
CPU already).
This avoids need of doing whole ray-to-scene intersections queries
for each intersection and speeds up a lot cases like transparent
hair in the cost of extra memory.
This commit is a base ground for now and this feature is kept
disabled for until some further tweaks.
|
|
We started to run out of bits there, so now we separate flags
which came from __object_flags and which are either runtime or
coming from __shader_flags.
Rule now is: SD_OBJECT_* flags are to be tested against new
object_flags field of ShaderData, all the rest flags are to
be tested against flags field of ShaderData.
There should be no user-visible changes, and time difference
should be minimal. In fact, from tests here can only see hardly
measurable difference and sometimes the new code is somewhat
faster (all within a noise floor, so hard to tell for sure).
Reviewers: brecht, dingto, juicyfruit, lukasstockner97, maiself
Differential Revision: https://developer.blender.org/D2428
|
|
|
|
This way we can stop traversing BVH node early on.
Gives about 2-2.5x times render time improvement with 3 BVH steps.
Hopefully this gives no measurable performance loss for scenes with
single BVH step.
Traversal is currently only implemented for QBVH, meaning old CPUs
and GPU do not benefit from this change.
|
|
This was a missing bit from b53ce9a.
|
|
per node
|
|
|
|
This way it's more clear whether some issue is caused by lots of geometry in
the node or by lots of "transparent" BVH nodes.
|
|
Gives up to ~1% speedup again.
While it seems to be small, still nice since the code now is actually more
clean that it used to be before.
|
|
Just preparing for new optimization to be used in all traversal implementation.
Should be no measurable difference.
|
|
Several ideas here:
- Optimize calculation of near_{x,y,z} in a way that does not require
3 if() statements per update, which avoids negative effect of wrong
branch prediction.
- Optimization of direction clamping for BVH.
- Optimization of point/direction transform.
Brings ~1.5% speedup again depending on a scene (unfortunately, this
speedup can't be sum across all previous commits because speedup of
each of the changes varies from scene to scene, but it still seems to
be nice solid speedup of few percent on Linux and bigger speedup was
reported on Windows).
Once again ,thanks Maxym for inspiration!
Still TODO: We have multiple places where we need to calculate near
x,y,z indices in BVH, for now it's only done for main BVH traversal.
Will try to move this calculation to an utility function and see if
that can be easily re-used across all the BVH flavors.
|
|
Mostly this is making inlining match CUDA 7.5 in a few performance critical
places. The end result is that performance is now better than before, possibly
due to less register spilling or other CUDA 8.0 compiler improvements.
On benchmarks scenes, there are 3% to 35% render time reductions. Stack memory
usage is reduced a little too.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D2269
|
|
Some of the files were wrongly attributing code to some other
organizations and in few places proper attribution was missing.
This is mainly either a copy-paste error (when new file was
created from an existing one and header wasn't updated) or due
to some refactor which split non-original-BF code with purely
BF code.
Should solve some confusion around.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Makes it simpler to compare different traversal algorithms.
|
|
|
|
Reviewers: brecht, sergey, dingto, juicyfruit
Differential Revision: https://developer.blender.org/D2220
|