Age | Commit message (Collapse) | Author |
|
This way we can avoid building the split kernel with NODE_FEATURE_BUMP enabled, in case we don't need it.
|
|
Quite straightforward implementation, but still needs some work for the split
kernel. Includes both regular and split kernel implementation for that.
The pass is not exposed to the interface yet because it's currently not really
easy to have same pass listed in the menu multiple times.
|
|
This features are now based on the scene settings, so scenes without those features
used are rendered even faster.
This gives about 30% speedup on the AMD A10 APU here, but at the same time it does
not mean such an improvement will happen on all the hardware. That being said, the
Tonga device here seems to have no measurable difference.
In any case it seems handy to have for the future, when we'll want to support SSS
in the kernel or to port selective compilation/split kernel to CUDA devices.
|
|
Was missing feature detection in the BumpNode in the previous selective nodes
compilation commit.
|
|
The issue was caused by the reshuffle needed to make objects flags have proper
object's bounding box to solve regressions in SSS objects intersecting volumes.
There's actually a feedback loop happening here, which is now solved in quite
naive way -- for the true displacement we consider all objects are capable of
intersecting volumes, synchronize object flags prior to displacement shader
tasks runs and then re-update object flags for proper bounding box.
Not sure what will be the proper solution here, we can't do preliminary check
of intersection for displacement shader, but on the other hand we don't really
need this flag for displacement shader anyway.
|
|
|
|
This commits finishes initial selective nodes compilation into kernel, which
helps a lot performance-wise for AMD OpenCL kernels.
Split by node groups is based on statistics from simple scenes like BMW and
more complex scenes like mango and gooseberry production files. Further
tweaks are always possible, but it should be a good starting point.
TODO: Still need to ignore unused nodes when calculating requested shader
features.
|
|
|
|
|
|
For now it is unused in the kernel, actual usage will come with
the next commits.
|
|
|
|
In certain configurations (for example when start resolution is set to small
value for background render and progressive refine enabled) number of tiles
might change in the tile manager. This situation will confuse progressive
refine feature and likely cause crash.
We might also add some settings verification in the session constructor, but
having an assert with brief explanation about what's wrong should already be
much better than nothing.
|
|
This changes the progressive refine part, regular update was already checking
for whether callbacks are set.
|
|
configurations
|
|
|
|
|
|
Simple integer overflow issue.
TODO(sergey): Check on CPU cubic sampling, it might also need size_t.
|
|
|
|
This commit contains all the work related on the AMD megakernel split work
which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus
some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely
someone else which we're forgetting to mention.
Currently only AMD cards are enabled for the new split kernel, but it is
possible to force split opencl kernel to be used by setting the following
environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.
Not all the features are supported yet, and that being said no motion blur,
camera blur, SSS and volumetrics for now. Also transparent shadows are
disabled on AMD device because of some compiler bug.
This kernel is also only implements regular path tracing and supporting
branched one will take a bit. Branched path tracing is exposed to the
interface still, which is a bit misleading and will be hidden there soon.
More feature will be enabled once they're ported to the split kernel and
tested.
Neither regular CPU nor CUDA has any difference, they're generating the
same exact code, which means no regressions/improvements there.
Based on the research paper:
https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf
Here's the documentation:
https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit
Design discussion of the patch:
https://developer.blender.org/T44197
Differential Revision: https://developer.blender.org/D1200
|
|
This way device can actually make a decision of how it can optimize the kernel
in order to make it most efficient.
|
|
The goal is to be able to compile kernel with nodes which are actually needed
to render current scene, hence improving performance of the kernel,
The idea is:
- Have few node groups, starting with a group which contains nodes are used
really often, and then couple of groups which will be extension of this one.
- Have feature-based nodes disabling, so it's possible to disable nodes related
to features which are not used with the currently used nodes group.
This commit only lays down needed routines for this approach, actual split will
happen later after gathering statistics from bunch of production scenes.
|
|
This will be used by split kernel in order to compile most optimal kernel.
Maximum number of closures is actually being cached in the session, so viewport
rendering will not trigger kernel re-loading when number of closures goes down.
|
|
Currently unused but will be needed soon for the split kernel work.
|
|
This is currently unused but crucial for things like calculating amount of
device memory required to deal with the tasks.
Maybe not really best place to store it, but consider it good enough for now.
|
|
Previously we only had experimental flag passed to device's load_kernel() which
was all fine. But since we're gonna to have some extra parameters passed there
it makes sense to wrap them into a single struct, which will make it easier to
pass stuff around.
|
|
Previously it was only set at compilation time which is all fine but does
not let us to check which closure the node corresponds to prior to the
compilation.
|
|
They were added for completeness, but it seems we don't need them.
|
|
Now we calculate color in range 800..12000 using an approximation a/x+bx+c for R and G and ((at + b)t + c)t + d) for B.
Max absolute error for RGB for non-lut function is less than 0.0001, which is enough to get the same 8 bit/channel color as for OSL with a noticeable performance difference.
However there is a slight visible difference between previous non-OSL implementation because of lookup table interpolation and offset-by-one mistake.
The previous implementation gave black color outside of soft range (t > 12000), now it gives the same color as for 12000.
Also blackbody node without input connected is being converted to value input at shader compile time.
Reviewers: dingto, sergey
Reviewed By: dingto
Subscribers: nutel, brecht, juicyfruit
Differential Revision: https://developer.blender.org/D1280
|
|
Object flags are depending on bounding box which is only available after
mesh synchronization.
This was broken since 7fd4c44 which happened quite close to the release
and oddly enough was not sopped by anyone. Render test is coming for this.
Was spotted by Thomas Dinges while working on another patch.
|
|
This patch adds support for light portals: objects that help sampling the
environment light, therefore improving convergence. Using them tor other
lights in a unidirectional pathtracer is virtually useless.
The sampling is done with the area-preserving code already used for area lamps.
MIS is used both for combination of different portals and for combining portal-
and envmap-sampling.
The direction of portals is considered, they aren't used if the sampling point
is behind them.
Reviewers: sergey, dingto, #cycles
Reviewed By: dingto, #cycles
Subscribers: Lapineige, nutel, jtheninja, dsisco11, januz, vitorbalbio, candreacchio, TARDISMaker, lichtwerk, ace_dragon, marcog, mib2berlin, Tunge, lopataasdf, lordodin, sergey, dingto
Differential Revision: https://developer.blender.org/D1133
|
|
This more a workaround for CUDA optimizer which can't optimize clamp(x, 0, 1)
into a single instruction and uses 4 instructions instead.
Original patch by @lockal with own modification:
Don't make changes outside of the kernel. They don't make any difference
anyway and term saturate() has a bit different meaning outside of kernel.
This gives around 2% of speedup in Barcelona file, but in more complex shader
setups with lots of math nodes with clamping speedup could be much nicer.
Subscribers: dingto
Projects: #cycles
Differential Revision: https://developer.blender.org/D1224
|
|
|
|
|
|
This way we can get rid of inefficient memory usage caused by BVH boundbox
part being unused by leaf nodes but still being allocated for them. Doing
such split allows to save 6 of float4 values for QBVH per leaf node and 3
of float4 values for regular BVH per leaf node.
This translates into following memory save using 01.01.01.G rendered
without hair:
Device memory size Device memory peak Global memory peak
Before the patch: 4957 5051 7668
With the patch: 4467 4562 7332
The measurements are done against current master. Still need to run speed tests
and it's hard to predict if it's faster or not: on the one hand leaf nodes are
now much more coherent in cache, on the other hand they're not so much coherent
with regular nodes anymore.
Reviewers: brecht, juicyfruit
Subscribers: venomgfx, eyecandy
Differential Revision: https://developer.blender.org/D1236
|
|
This way memory overhead caused by the BVH building is not so visible and peak
memory usage will be reduced.
Implementing this idea is not so straightforward actually, because we need to
synchronize images used for true displacement before meshes. Detecting whether
image is used for true displacement is not so striaghtforward, so for now all
all displacement types will synchronize images used for them.
Such change brings memory usage from 4.1G to 4.0G with the 01_01_01_D scene
from gooseberry. With 01_01_01_G scene it's 7.6G vs. 6.8G (before and after
the patch).
Reviewers: campbellbarton, juicyfruit, brecht
Subscribers: eyecandy
Differential Revision: https://developer.blender.org/D1217
|
|
Combine all the highpoly pixel arrays into a single array with a lookup
object_id for each of the highpoly objects.
Note: This changes the Bake API, external engines should refer to the
bake_api.c for the latest API.
Many thanks for Sergey Sharybin for the complete review, changes
suggestion and feedback. (you rock!)
Reviewers: sergey
Subscribers: pildanovak, marcclintdion, monio, metalliandy, brecht
Maniphest Tasks: T41092
Differential Revision: https://developer.blender.org/D772
|
|
|
|
private/public
|
|
|
|
Covers number of entities in the scene (objects, meshes etc), also reports
sizes of textures being allocated.
|
|
|
|
surface/hair
There were some synchronization missing in cases when only one of those settings
was disabled.
Also added a render test for such configurations now.
|
|
|
|
Main purpose of this change is to make material preview appearing more
instant after the shader tweaks.
|
|
There were two major problems with the interactivity of material previews:
- Beckmann tables were re-generated on every material tweak.
This is because preview scene is not set to be persistent, so re-triggering
the render leads to the full scene re-sync.
- Images could take rather noticeable time to load with OIIO from the disk
on every tweak.
This patch addressed this two issues in the following way:
- Beckmann tables are now static on CPU memory.
They're couple of hundred kilobytes only, so wouldn't expect this to be
an issue. And they're needed for almost every render anyway.
This actually also makes blackbody table to be static, but it's even smaller
than beckmann table.
Not totally happy with this approach, but others seems to complicate things
quite a bit with all this render engine life time and so..
- For preview rendering all images are considered to be built-in. This means
instead of OIIO which re-loads images on every re-render they're coming
from ImBuf cache which is fully manageable from blender side and unused
images gets freed later.
This would make it impossible to have mipmapping with OSL for now, but we'll
be working on that later anyway and don't think mipmaps are really so crucial
for the material preview.
This seems to be a better alternative to making preview scene persistent,
because of much optimal memory control from blender side.
Reviewers: brecht, juicyfruit, campbellbarton, dingto
Subscribers: eyecandy, venomgfx
Differential Revision: https://developer.blender.org/D1132
|
|
This attribute is not really supported for volumes, so it get's converted to
constant 0 at shader compile time.
TODO: We should consider doing the same for tangent attribute in order to save
some annoying checks at tracing time.
|
|
and continue to depend on boost though
Reviewers: dingto, sergey
Reviewed By: sergey
Subscribers: #cycles
Differential Revision: https://developer.blender.org/D1185
|
|
For as long as code stays in official folders it should follow
our code style.
|
|
This inconsistency drove me totally crazy, it's really confusing
when it's inconsistent especially when you work on both Cycles and
Blender sides.
Shouldn;t cause merge PITA, it's whitespace changes only, Git should
be able to merge it nicely.
|
|
|