Age | Commit message (Collapse) | Author |
|
Goal is to reduce OpenCL kernel recompilations.
Currently viewport renders are still set to use 64 closures as this seems to
be faster and we don't want to cause a performance regression there. Needs
to be investigated.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2775
|
|
The algorithm averages normals from nearby surfaces. It uses the same
sampling strategy as BSSRDFs, casting rays along the normal and two
orthogonal axes, and combining the samples with MIS.
The main concern here is that we are introducing raytracing inside
shader evaluation, which could be quite bad for GPU performance and
stack memory usage. In practice it doesn't seem so bad though.
Note that using this feature can easily slow down renders 20%, and
that if you care about performance then it's better to use a bevel
modifier. Mainly this is useful for baking, and for cases where the
mesh topology makes it difficult for the bevel modifier to work well.
Differential Revision: https://developer.blender.org/D2803
|
|
|
|
This causes some difference in the classroom scene, where ray visibility
tricks are used and break the MIS balance. Otherwise there doesn't seem
to be much effect, but better to use the right formulas. Problem originally
identified by Lukas.
|
|
With a Titan Xp, reduces path trace local memory from 1092MB to 840MB.
Benchmark performance was within 1% with both RX 480 and Titan Xp.
Original patch was implemented by Sergey.
Differential Revision: https://developer.blender.org/D2249
|
|
|
|
|
|
|
|
We are already using the AO distance, so might as well offer this extra
control over the intensity. Useful when an interior scene is supposed to
be significantly darker than the background shader.
|
|
Convention was only followed loosely,
apply to DNA where changes aren't likely to conflict.
(Skipped ModifierType for eg).
|
|
|
|
A little faster on some benchmark scenes, a little slower on others, seems
about performance neutral on average and saves a little memory.
|
|
This makes sharing some code between mega/split in following commits a bit
easier, and also paves the way for rendering multiple tiles later.
|
|
|
|
This is done by storing only a subset of PathRadiance, and by storing
direct light immediately in the main PathRadiance. Saves about 10% of
CUDA stack memory, and simplifies subsurface indirect ray code.
|
|
Similar to what we did for area lights previously, this should help
preserve stratification when using multiple BSDFs in theory. Improvements
are not easily noticeable in practice though, because the number of BSDFs
is usually low. Still nice to eliminate one sampling dimension.
|
|
|
|
|
|
Previously we used a 1D sequence to select a light, and another 2D sequence
to sample a point on the light. For multiple lights this meant each light
would get a random subset of a 2D stratified sequence, which is not
guaranteed to be stratified anymore.
Now we use only a 2D sequence, split into segments along the X axis, one for
each light. The samples that fall within a segment then each are a stratified
sequence, at least in the limit. So for example for two lights, we split up
the unit square into two segments [0,0.5[ x [0,1[ and [0.5,1[ x [0,1[.
This doesn't make much difference in most scenes, mainly helps if you have a
few large area lights or some types of HDR backgrounds.
|
|
This was needed when we accessed OSL closure memory after shader evaluation,
which could get overwritten by another shader evaluation. But all closures
are immediatley converted to ShaderClosure now, so no longer needed.
|
|
|
|
Also pass by value and don't write back now that it is just a hash for seeding
and no longer an LCG state. Together this makes CUDA a tiny bit faster in my
tests, but mainly simplifies code.
|
|
|
|
|
|
|
|
Added some extra tirckery to avoid background being tinted dark with transparent
surface. Maybe a bit hacky, but seems to work fine.
|
|
Since all the shadow catchers are already assumed to be in the footage,
the shadows they cast on each other are already in the footage too. So
don't just let shadow catchers skip self, but all shadow catchers.
Another justification is that it should not matter if the shadow catcher
is modeled as one object or multiple separate objects, the resulting
render should be the same.
Differential Revision: https://developer.blender.org/D2763
|
|
Differential Revision: https://developer.blender.org/D2766
|
|
|
|
transparent object
Tweaked the path radiance summing and alpha to accommodate for possible contribution of
light by transparent surface bounces happening prior to shadow catcher intersection.
This commit will change the way how shadow catcher results looks when was behind semi
transparent object, but the old result seemed to be fully wrong: there were big artifacts
when alpha-overing the result on some actual footage.
|
|
|
|
|
|
threads
Unlike regular path tracing, branched path tracing is usually used with lower
sample counts, at least for primary rays. This means that are less samples for
the GPU to work on in parallel and rendering is slower. As there is less work
overall there is also more inactive threads during rendering with BPT. This
patch makes use of those inactive rays to render branched samples in parallel
with other samples.
Each thread that is preparing for a branched sample will attempt to find an
inactive thread and if one is found the state for the sample is copied to that
thread. Potentially, if there are enough inactive threads, 100s of branched
samples could be generated from the same originating thread and ran in
parallel giving large speed ups.
Gives 70% faster render for pavillion midday scene. 20-60% faster on BMW
with car paint replaced with SSS/volumes.
|
|
The queue will be used to make reuse of inactive threads to keep
the GPU more busy.
|
|
|
|
This commit contains the first part of the new Cycles denoising option,
which filters the resulting image using information gathered during rendering
to get rid of noise while preserving visual features as well as possible.
To use the option, enable it in the render layer options. The default settings
fit a wide range of scenes, but the user can tweak individual settings to
control the tradeoff between a noise-free image, image details, and calculation
time.
Note that the denoiser may still change in the future and that some features
are not implemented yet. The most important missing feature is animation
denoising, which uses information from multiple frames at once to produce a
flicker-free and smoother result. These features will be added in the future.
Finally, thanks to all the people who supported this project:
- Google (through the GSoC) and Theory Studios for sponsoring the development
- The authors of the papers I used for implementing the denoiser (more details
on them will be included in the technical docs)
- The other Cycles devs for feedback on the code, especially Sergey for
mentoring the GSoC project and Brecht for the code review!
- And of course the users who helped with testing, reported bugs and things
that could and/or should work better!
|
|
Reduce thread divergence in kernel_shader_eval.
Rays are sorted in blocks of 2048 according to shader->id.
On R9 290 Classroom is ~30% faster, and Pabellon Barcelone is ~8% faster.
No sorting for CUDA split kernel.
Reviewers: sergey, maiself
Reviewed By: maiself
Differential Revision: https://developer.blender.org/D2598
|
|
This implements branched path tracing for the split kernel.
General approach is to store the ray state at a branch point, trace the
branched ray as normal, then restore the state as necessary before iterating
to the next part of the path. A state machine is used to advance the indirect
loop state, which avoids the need to add any new kernels. Each iteration the
state machine recreates as much state as possible from the stored ray to keep
overall storage down.
Its kind of hard to keep all the different integration loops in sync, so this
needs lots of testing to make sure everything is working correctly. We should
probably start trying to deduplicate the integration loops more now.
Nonbranched BMW is ~2% slower, while classroom is ~2% faster, other scenes
could use more testing still.
Reviewers: sergey, nirved
Reviewed By: nirved
Subscribers: Blendify, bliblubli
Differential Revision: https://developer.blender.org/D2611
|
|
Testing showed no issues so there's no reason to not have this.
|
|
This way we can skip it from compiling into OpenCL kernels by making
this shader compile-time feature.
|
|
|
|
|
|
The idea is to make include statements more explicit and obvious where the
file is coming from, additionally reducing chance of wrong header being
picked up.
For example, it was not obvious whether bvh.h was refferring to builder
or traversal, whenter node.h is a generic graph node or a shader node
and cases like that.
Surely this might look obvious for the active developers, but after some
time of not touching the code it becomes less obvious where file is coming
from.
This was briefly mentioned in T50824 and seems @brecht is fine with such
explicitness, but need to agree with all active developers before committing
this.
Please note that this patch is lacking changes related on GPU/OpenCL
support. This will be solved if/when we all agree this is a good idea to move
forward.
Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner
Reviewed By: lukasstockner97, maiself, nirved, dingto
Subscribers: brecht
Differential Revision: https://developer.blender.org/D2586
|
|
Solves majority of speed regression on AMD OpenCL.
|
|
The title says it all actually.
|
|
It uses an idea of accumulating all possible light reachable across the
light path (without taking shadow blocked into account) and accumulating
total shaded light across the path. Dividing second figure by first one
seems to be giving good estimate of the shadow.
In fact, to my knowledge, it's something really similar to what is
happening in the denoising branch, so we are aligned here which is good.
The workflow is following:
- Create an object which matches real-life object on which shadow is
to be catched.
- Create approximate similar material on that object.
This is needed to make indirect light properly affecting CG objects
in the scene.
- Mark object as Shadow Catcher in the Object properties.
Ideally, after doing that it will be possible to render the image and
simply alpha-over it on top of real footage.
|
|
|
|
|
|
|
|
This commit enables record-all transparent shadows rays.
Perfromance results:
R9 290 render time (without synchronization), seconds
Before After Change
BMW 261.5 262.5 +0.4 %
Classroom 869.6 867.3 -0.3 %
Fishy Cat 657.4 639.8 -2.7 %
Koro 1909.8 692.8 -63.7 %
Pabellon Barcelona 1633.3 1238.0 -24.2 %
Pabellon Barcelona(*) 1158.1 903.8 -22.0 %
(*) without glossy connected to volume
|