Age | Commit message (Collapse) | Author |
|
* Rename struct KernelGlobals to struct KernelGlobalsCPU
* Add KernelGlobals, IntegratorState and ConstIntegratorState typedefs
that every device can define in its own way.
* Remove INTEGRATOR_STATE_ARGS and INTEGRATOR_STATE_PASS macros and
replace with these new typedefs.
* Add explicit state argument to INTEGRATOR_STATE and similar macros
In preparation for decoupling main and shadow paths.
Differential Revision: https://developer.blender.org/D12888
|
|
This is the first of a sequence of changes to support compiling Cycles kernels as MSL (Metal Shading Language) in preparation for a Metal GPU device implementation.
MSL requires that all pointer types be declared with explicit address space attributes (device, thread, etc...). There is already precedent for this with Cycles' address space macros (ccl_global, ccl_private, etc...), therefore the first step of MSL-enablement is to apply these consistently. Line-for-line this represents the largest change required to enable MSL. Applying this change first will simplify future patches as well as offering the emergent benefit of enhanced descriptiveness.
The vast majority of deltas in this patch fall into one of two cases:
- Ensuring ccl_private is specified for thread-local pointer types
- Ensuring ccl_global is specified for device-wide pointer types
Additionally, the ccl_addr_space qualifier can be removed. Prior to Cycles X, ccl_addr_space was used as a context-dependent address space qualifier, but now it is either redundant (e.g. in struct typedefs), or can be replaced by ccl_global in the case of pointer types. Associated function variants (e.g. lcg_step_float_addrspace) are also redundant.
In cases where address space qualifiers are chained with "const", this patch places the address space qualifier first. The rationale for this is that the choice of address space is likely to have the greater impact on runtime performance and overall architecture.
The final part of this patch is the addition of a metal/compat.h header. This is partially complete and will be extended in future patches, paving the way for the full Metal implementation.
Ref T92212
Reviewed By: brecht
Maniphest Tasks: T92212
Differential Revision: https://developer.blender.org/D12864
|
|
|
|
This includes much improved GPU rendering performance, viewport interactivity,
new shadow catcher, revamped sampling settings, subsurface scattering anisotropy,
new GPU volume sampling, improved PMJ sampling pattern, and more.
Some features have also been removed or changed, breaking backwards compatibility.
Including the removal of the OpenCL backend, for which alternatives are under
development.
Release notes and code docs:
https://wiki.blender.org/wiki/Reference/Release_Notes/3.0/Cycles
https://wiki.blender.org/wiki/Source/Render/Cycles
Credits:
* Sergey Sharybin
* Brecht Van Lommel
* Patrick Mours (OptiX backend)
* Christophe Hery (subsurface scattering anisotropy)
* William Leeson (PMJ sampling pattern)
* Alaska (various fixes and tweaks)
* Thomas Dinges (various fixes)
For the full commit history, see the cycles-x branch. This squashes together
all the changes since intermediate changes would often fail building or tests.
Ref T87839, T87837, T87836
Fixes T90734, T89353, T80267, T80267, T77185, T69800
|
|
|
|
Work around what appears to be a compiler bug, just changing the code a bit
without any functional changes.
|
|
|
|
|
|
This sampling pattern is particularly suited to adaptive sampling, and will
be used for that upcoming feature.
Based on "Progressive Multi-Jittered Sample Sequences" by Per Christensen,
Andrew Kensler and Charlie Kilpatrick.
Ref D4686
|
|
Ref D5363
|
|
The White Noise node hashes the input and returns a random number in the
range [0, 1]. The input can be a 1D, 2D, 3D, or a 4D vector.
Reviewers: brecht, JacquesLucke
Differential Revision: https://developer.blender.org/D5550
|
|
|
|
Apply clang format as proposed in T53211.
For details on usage and instructions for migrating branches
without conflicts, see:
https://wiki.blender.org/wiki/Tools/ClangFormat
|
|
It is supposed to be two spaces before comment stating which if
else/endif statements corresponds to. Was mainly violated in the
header guards.
|
|
|
|
Random numbers for step offset were correlated, now use stratified samples
which reduces noise as well for some types of volumes, mainly procedural
ones where the step size is bigger than the volume features.
|
|
|
|
|
|
A little faster on some benchmark scenes, a little slower on others, seems
about performance neutral on average and saves a little memory.
|
|
|
|
Previously the Sobol pattern suffered from some correlation issues that
made the outline of objects like a smoke domain visible. This helps
simplify the code and also makes some other optimizations possible.
|
|
|
|
|
|
Also pass by value and don't write back now that it is just a hash for seeding
and no longer an LCG state. Together this makes CUDA a tiny bit faster in my
tests, but mainly simplifies code.
|
|
|
|
|
|
Was doing lots of investigation recently, with need to have lots of things
side by side.
|
|
The idea is to make include statements more explicit and obvious where the
file is coming from, additionally reducing chance of wrong header being
picked up.
For example, it was not obvious whether bvh.h was refferring to builder
or traversal, whenter node.h is a generic graph node or a shader node
and cases like that.
Surely this might look obvious for the active developers, but after some
time of not touching the code it becomes less obvious where file is coming
from.
This was briefly mentioned in T50824 and seems @brecht is fine with such
explicitness, but need to agree with all active developers before committing
this.
Please note that this patch is lacking changes related on GPU/OpenCL
support. This will be solved if/when we all agree this is a good idea to move
forward.
Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner
Reviewed By: lukasstockner97, maiself, nirved, dingto
Subscribers: brecht
Differential Revision: https://developer.blender.org/D2586
|
|
The title says it all actually.
|
|
Simplifies code quite a bit, making it shorter and easier to extend.
Currently no functional changes for users, but is required for the
upcoming work of shadow catcher support with OpenCL.
|
|
|
|
There were two cases where correlation issues were obvious:
- File from T38710 was giving issues in 2.78a again
- File from T50116 was having totally different shadow between
sample 1 and sample 32.
Use some more simplified version of CMJ hash which seems to give
nice randomized value which solves the correlation.
This commit will break all unit test files, but it's a bug fix
so perhaps OK to commit this.
This also fixes T41143: Sobol gives nonuniform noise
Proper science paper about hash function is coming.
Reviewers: brecht
Reviewed By: brecht
Subscribers: lukasstockner97
Differential Revision: https://developer.blender.org/D2385
|
|
|
|
their expected contribution
In scenes with many lights, some of them might have a very small contribution to some pixels, but the shadow rays are traced anyways.
To avoid that, this patch adds probabilistic termination to light samples - if the contribution before checking for shadowing is below a user-defined threshold, the sample will be discarded with probability (1 - (contribution / threshold)) and otherwise kept, but weighted more to remain unbiased.
This is the same approach that's also used in path termination based on length.
Note that the rendering remains unbiased with this option, it just adds a bit of noise - but if the setting is used moderately, the speedup gained easily outweighs the additional noise.
Reviewers: #cycles
Subscribers: sergey, brecht
Differential Revision: https://developer.blender.org/D2217
|
|
Mostly this is making inlining match CUDA 7.5 in a few performance critical
places. The end result is that performance is now better than before, possibly
due to less register spilling or other CUDA 8.0 compiler improvements.
On benchmarks scenes, there are 3% to 35% render time reductions. Stack memory
usage is reduced a little too.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D2269
|
|
|
|
|
|
Make sure we don't perform any implicit address space conversion.
A bit annoying, but less intrusive approaches (like using temp private
variable in .cl kernel) do not work correct here.
Using generic address space will help from code side here, but will
be somewhat slower due to extra things happening as far as i know.
|
|
As far as I can see, the second issue there was that the functions receive a pointer to a member variable of the
ShaderData, which is stored in global memory. However, this means that the pointer points to global memory as well,
therefore OpenCL requires the ccl_addr_space "keyword" in front of the pointer.
With this commit, the OpenCL kernels build on Linux with the Intel CPU OpenCL runtime - however, they already did
without the change and I don't have an AMD card, so I can't really test whether the AMD runtime is happy as well now.
|
|
This commit contains all the work related on the AMD megakernel split work
which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus
some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely
someone else which we're forgetting to mention.
Currently only AMD cards are enabled for the new split kernel, but it is
possible to force split opencl kernel to be used by setting the following
environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.
Not all the features are supported yet, and that being said no motion blur,
camera blur, SSS and volumetrics for now. Also transparent shadows are
disabled on AMD device because of some compiler bug.
This kernel is also only implements regular path tracing and supporting
branched one will take a bit. Branched path tracing is exposed to the
interface still, which is a bit misleading and will be hidden there soon.
More feature will be enabled once they're ported to the split kernel and
tested.
Neither regular CPU nor CUDA has any difference, they're generating the
same exact code, which means no regressions/improvements there.
Based on the research paper:
https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf
Here's the documentation:
https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit
Design discussion of the patch:
https://developer.blender.org/T44197
Differential Revision: https://developer.blender.org/D1200
|
|
This was already mixed a bit, but the dot belongs there.
|
|
* Volume multiple importace sampling support to combine equiangular and distance
sampling, for both homogeneous and heterogeneous volumes.
* Branched path "Sample All Direct Lights" and "Sample All Indirect Lights" now
apply to volumes as well as surfaces.
Implementation note:
For simplicity this is all done with decoupled ray marching, the only case we do
not use decoupled is for distance only sampling with one light sample. The
homogeneous case should still compile on the GPU because it only requires fixed
size storage, but the heterogeneous case will be trickier to get working.
|
|
Transparent objects could become subtly visible by the different sampling
patterns for pixels covered and not covered by the object. It still converged
to the right solution but that can take a while. Now we try to use the same
sampling pattern here.
|
|
|
|
|
|
This makes it easier to pass this state around, and wraps some common RNG
dimension computations in utility functions.
|
|
Volumes can now have textured colors and density. There is a Volume Sampling
panel in the Render properties with these settings:
* Step size: distance between volume shader samples when rendering the volume.
Lower values give more accurate and detailed results but also increased render
time.
* Max steps: maximum number of steps through the volume before giving up, to
protect from extremely long render times with big objects or small step sizes.
This is much more compute intensive than homogeneous volume, so when you are not
using a texture you should enable the Homogeneous Volume option in the material
or world for faster rendering.
One important missing feature is that Generated texture coordinates are not yet
working in volumes, and they are the default coordinates for nearly all texture
nodes. So until that works you need to plug in object texture coordinates or a
world space position.
This is work by "storm", Stuart Broadfoot, Thomas Dinges and myself.
|
|
This to avoids build conflicts with libc++ on FreeBSD, these __ prefixed values
are reserved for compilers. I apologize to anyone who has patches or branches
and has to go through the pain of merging this change, it may be easiest to do
these same replacements in your code and then apply/merge the patch.
Ref T37477.
|
|
More information in this post:
http://code.blender.org/
Thanks to all contributes for giving their permission!
|
|
New features:
* Bump mapping now works with SSS
* Texture Blur factor for SSS, see the documentation for details:
http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Nodes/Shaders#Subsurface_Scattering
Work in progress for feedback:
Initial implementation of the "BSSRDF Importance Sampling" paper, which uses
a different importance sampling method. It gives better quality results in
many ways, with the availability of both Cubic and Gaussian falloff functions,
but also tends to be more noisy when using the progressive integrator and does
not give great results with some geometry. It works quite well for the
non-progressive integrator and is often less noisy there.
This code may still change a lot, so unless you're testing it may be best to
stick to the Compatible falloff function.
Skin test render and file that takes advantage of the gaussian falloff:
http://www.pasteall.org/pic/show.php?id=57661
http://www.pasteall.org/pic/show.php?id=57662
http://www.pasteall.org/blend/23501
|