Age | Commit message (Collapse) | Author |
|
|
|
This makes us more sure that header files are more self-sufficient.
|
|
|
|
The idea is to make include statements more explicit and obvious where the
file is coming from, additionally reducing chance of wrong header being
picked up.
For example, it was not obvious whether bvh.h was refferring to builder
or traversal, whenter node.h is a generic graph node or a shader node
and cases like that.
Surely this might look obvious for the active developers, but after some
time of not touching the code it becomes less obvious where file is coming
from.
This was briefly mentioned in T50824 and seems @brecht is fine with such
explicitness, but need to agree with all active developers before committing
this.
Please note that this patch is lacking changes related on GPU/OpenCL
support. This will be solved if/when we all agree this is a good idea to move
forward.
Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner
Reviewed By: lukasstockner97, maiself, nirved, dingto
Subscribers: brecht
Differential Revision: https://developer.blender.org/D2586
|
|
|
|
Solves memory regression by the default configuration.
|
|
The issue here was mainly coming from minimal pixel width feature
which is quite commonly enabled in production shots.
This feature will use some probabilistic heuristic in the curve
intersection function to check whether we need to return intersection
or not. This probability is calculated for every intersection check.
Now, when we use multiple BVH nodes for curve primitives we increase
probability of that primitive to be considered a good intersection
for us. This is similar to increasing minimal width of curve.
What is worst here is that change in the intersection probability
fully depends on exact layout of BVH, meaning probability might
change differently depending on a view angle, the way how builder
binned the primitives and such. This makes it impossible to do
simple check like dividing probability by number of BVH steps.
Other solution might have been to split BVH into fully independent
trees, but that will increase memory usage of all the static
objects in the scenes, which is also not something desirable.
For now used most simple but robust approach: store BVH primitives
time and test it in curve intersection functions. This solves the
regression, but has two downsides:
- Uses more memory.
which isn't surprising, and ANY solution to this problem will
use more memory.
What we still have to do is to avoid this memory increase for
cases when we don't use BVH motion steps.
- Reduces number of maximum available textures on pre-kepler cards.
There is not much we can do here, hardware gets old but we need
to move forward on more modern hardware..
|
|
|
|
|
|
|
|
This way we can stop traversing BVH node early on.
Gives about 2-2.5x times render time improvement with 3 BVH steps.
Hopefully this gives no measurable performance loss for scenes with
single BVH step.
Traversal is currently only implemented for QBVH, meaning old CPUs
and GPU do not benefit from this change.
|
|
Similar to the previous commit, the statistics goes as:
BVH Steps Render time (sec) Memory usage (MB)
0 46 260
1 27 373
2 18 598
3 15 826
Scene used for the tests is the agent's body from one of the barber
shop scenes (no textures or anything, just a diffuse material).
Once again this is limited to regular (non-spatial split) BVH,
Support of spatial split to this feature will come later.
|
|
The idea is to create several smaller BVH nodes for each of the motion
curve primitives. This acts as a forced spatial split for the single
primitive.
This gives up render time speedup of motion blurred hair in the cost
of extra memory usage. The numbers goes as:
BVH Steps Render time (sec) Memory usage (MB)
0 258 191
1 123 278
2 69 453
3 43 627
Scene used for the tests is the agent's hair from one of the barber
shop scenes.
Currently it's only limited to scenes without spatial split enabled,
since the spatial split builder requires some changes to work properly
with motion steps coordinates.
|
|
|
|
|
|
|
|
|
|
makes code slightly shorter and uses idea of const qualifiers.
|
|
Run into some nasty bugs while trying various things here.
Wouldn't mind enabling -Wshadow for Cycles actually..
|
|
This way we can have different limits for regular and motion curves
which we'll do in one of the upcoming commits in order to gain some
percents of speedup.
The reasoning here is that motion curves are usually intersecting
lots of others bounding boxes, which makes it inefficient to have
single primitive in the leaf node.
|
|
Maximal number of elements is supposed to be inclusive. That is what
it was always meant in this file and what @brecht considered still
the case in 6974b69c6172.
In fact, the commit message to that change mentions that we allowed
up to 2 curve primitives per leaf while in fact it was doing up to 1
curve primitive.
Making it real 2 primitives at a max gives about 5% slowdown for the
koro.blend scene. This is a reason why BVHParams.max_curve_leaf_size
was changed to 1 by this change.
|
|
|
|
It was possible to have non-initialized unaligned BVH split
to be used when regular BVH split SAH was inf. Now we ensure
that unaligned splitter is only used when it's really initialized.
It's a regression and should be in 2.78a.
|
|
|
|
|
|
This is a special builder type which is allowed to orient nodes to
strands direction, hence minimizing their surface area in comparison
with axis-aligned nodes. Such nodes are much more efficient for hair
rendering.
Implementation of BVH builder is based on Embree, and generally idea
there is to calculate axis-aligned SAH and oriented SAH and if SAH
of oriented node is smaller than axis-aligned SAH we create unaligned
node.
We store both aligned and unaligned nodes in the same tree (which
seems to be different from what Embree is doing) so we don't have
any any extra calculations needed to set up hair ray for BVH
traversal, hence avoiding any possible negative effect of this new
BVH nodes type.
This new builder is currently not in use, still need to make BVH
traversal code aware of unaligned nodes.
|
|
|
|
In certain types of animation it's possible to have some objects
scaling to zero. In this case we can save render times by avoid
traversing such instances.
Better to do ti ahead of a time, so traversal stays simple.
Reviewers: lukasstockner97, dingto, brecht
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2048
|
|
Some of these values can get quite large and are hard to read, adding this
makes it easy to read them at a glance.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D2039
|
|
separate some arrays.
Differential Revision: https://developer.blender.org/D2016
|
|
|
|
|
|
|
|
|
|
Simple fix: release lock earlier.
Reduces spatial split build time from 96 to 53sec on the Bunny.blend
(using studio Intel for benchmark).
NOTE: Timing difference is not that spectacular when comparing numbers
with builds before memory optimization, but even then it's about 20%
faster build.
|
|
This was only visible on systems with lots of threads and root of the issue
was that we've been pre-allocating too much memory for all the threads.
Now we only pre-allocate data for the main thread and rest of the threads
does allocation on-demand.
This brings down memory usage from 36Gig to 6.9Gig when building spatial
split for the Bunny.blend file on our Intel beast.
Originally regression was happened by the threaded spacial split builder
commit.
|
|
|
|
This was caused by recent threading commit. Now because of all children
are set when they're ready need to explicitly update all parent's visibility.
|
|
The title actually covers it all, This commit exploits all the work
being done in previous changes to make it possible to build spatial
splits in threads.
Works quite nicely, but has a downside of some extra memory usage.
In practice it doesn't seem to be a huge problem and that we can
always look into later if it becomes a real showstopper.
In practice it shows some nice speedup:
- BMW27 scene takes 3 now (used to be 4)
- Agent shot takes 5 sec (used to be 80)
Such non-linear speedup is most likely coming from much less amount
of heap re-allocations. A a downside, there's a bit of extra memory
used by BVH arrays. From the tests amount of extra memory is below
0.001% so far, so it's not that bad at all.
Reviewers: brecht, juicyfruit, dingto, lukasstockner97
Differential Revision: https://developer.blender.org/D1820
|
|
Policy here is a bit more complicated, if tree becomes too deep we're
forced to create a leaf node and size of that leaf wouldn't be so well
predicted, which means it's quite tricky to use single stack array for
that.
Made it more official feature that StackAllocator will fall-back to
heap when running out of stack memory.
It's still much better than always using heap allocator.
|
|
Currently they're staying at 1 (actual size over capacity), but we
will be changing it quite soon in order to avoid having too much
memory re-allocation happening at a BVH build time and will be
playing with different policies for that.
|
|
In some files stack memory was overruning the pre-allocated stack.
Perhaps we should fall-back to a hep-allocated stack so release builds
don't crash in works case but just becoming slower.
|
|
There are in fact some missing parts to it (Split BVH builder should
be creating bins from result of Object Split constructor).
Doable, but need to quickly fix issue for the studio here, easier to
revert for now.
|
|
Need to revisit utility headers a bit more carefully and perhaps
move such utilities outside of simd-related headers.
|
|
|
|
|
|
This reduces amount of data being moved back and forth, which should
have positive effect on the performance.
|
|
This has following advantages:
- Localizes all the run-time storage into a single structure,
which could easily be extended further.
- Storage could be created per-thread, so once builder is
threaded we wouldn't have any conflicts between threads.
- Global nature of the storage avoids memory re-allocation
on the runtime, keeping builder as fast as possible.
Currently it's just API changes, which don't affect user at all.
|
|
Uses new StackAllocator from util_stack_allocator. Some tweaks to the stack
storage size are possible, read notes in the code about this.
At this point we might want to rename allocator files to util_allocator_foo.c,
so the stay nicely grouped in the folder.
|
|
This way we can use bitscan() from both vectorized and non-vectorized
code, which applies to both kernel and host code.
|