Age | Commit message (Collapse) | Author |
|
* Rename struct KernelGlobals to struct KernelGlobalsCPU
* Add KernelGlobals, IntegratorState and ConstIntegratorState typedefs
that every device can define in its own way.
* Remove INTEGRATOR_STATE_ARGS and INTEGRATOR_STATE_PASS macros and
replace with these new typedefs.
* Add explicit state argument to INTEGRATOR_STATE and similar macros
In preparation for decoupling main and shadow paths.
Differential Revision: https://developer.blender.org/D12888
|
|
This is the first of a sequence of changes to support compiling Cycles kernels as MSL (Metal Shading Language) in preparation for a Metal GPU device implementation.
MSL requires that all pointer types be declared with explicit address space attributes (device, thread, etc...). There is already precedent for this with Cycles' address space macros (ccl_global, ccl_private, etc...), therefore the first step of MSL-enablement is to apply these consistently. Line-for-line this represents the largest change required to enable MSL. Applying this change first will simplify future patches as well as offering the emergent benefit of enhanced descriptiveness.
The vast majority of deltas in this patch fall into one of two cases:
- Ensuring ccl_private is specified for thread-local pointer types
- Ensuring ccl_global is specified for device-wide pointer types
Additionally, the ccl_addr_space qualifier can be removed. Prior to Cycles X, ccl_addr_space was used as a context-dependent address space qualifier, but now it is either redundant (e.g. in struct typedefs), or can be replaced by ccl_global in the case of pointer types. Associated function variants (e.g. lcg_step_float_addrspace) are also redundant.
In cases where address space qualifiers are chained with "const", this patch places the address space qualifier first. The rationale for this is that the choice of address space is likely to have the greater impact on runtime performance and overall architecture.
The final part of this patch is the addition of a metal/compat.h header. This is partially complete and will be extended in future patches, paving the way for the full Metal implementation.
Ref T92212
Reviewed By: brecht
Maniphest Tasks: T92212
Differential Revision: https://developer.blender.org/D12864
|
|
|
|
causing rendering artifacts on simple scenes.
Fix T91632: Stops the sample correlation between dimensions which was causing rendering artefacts on simple scenes.
This is done by increasing the amount of jitter the Cranley Patterson Rotation is allowed to add. Also, it uses the y dimension of the of the sample table for 1D sampling which causes further decorrelation between dimensions. As an additional measure the x and y dimensions are swapped randomly to provide further decorrelation.
Maniphest Tasks: T91632
Differential Revision: https://developer.blender.org/D12610
|
|
This includes much improved GPU rendering performance, viewport interactivity,
new shadow catcher, revamped sampling settings, subsurface scattering anisotropy,
new GPU volume sampling, improved PMJ sampling pattern, and more.
Some features have also been removed or changed, breaking backwards compatibility.
Including the removal of the OpenCL backend, for which alternatives are under
development.
Release notes and code docs:
https://wiki.blender.org/wiki/Reference/Release_Notes/3.0/Cycles
https://wiki.blender.org/wiki/Source/Render/Cycles
Credits:
* Sergey Sharybin
* Brecht Van Lommel
* Patrick Mours (OptiX backend)
* Christophe Hery (subsurface scattering anisotropy)
* William Leeson (PMJ sampling pattern)
* Alaska (various fixes and tweaks)
* Thomas Dinges (various fixes)
For the full commit history, see the cycles-x branch. This squashes together
all the changes since intermediate changes would often fail building or tests.
Ref T87839, T87837, T87836
Fixes T90734, T89353, T80267, T80267, T77185, T69800
|
|
|
|
Work around what appears to be a compiler bug, just changing the code a bit
without any functional changes.
|
|
|
|
This sampling pattern is particularly suited to adaptive sampling, and will
be used for that upcoming feature.
Based on "Progressive Multi-Jittered Sample Sequences" by Per Christensen,
Andrew Kensler and Charlie Kilpatrick.
Ref D4686
|
|
|
|
Apply clang format as proposed in T53211.
For details on usage and instructions for migrating branches
without conflicts, see:
https://wiki.blender.org/wiki/Tools/ClangFormat
|
|
|
|
Old code was working quite unreliable in combination with fast math
flag, especially when compiling with Clang. It seems we were hitting
result of the following bug submitted to Clang [1].
Basically, it was happening so that (int)sqrtf(64) was 7 when Cycles
is built with Clang but was correct 8 when built with GCC.
This commit works this around. Annoying, but don't see other way to
keep sampling pattern the same for Clang and GCC.
[1] https://bugs.llvm.org//show_bug.cgi?id=24063
|
|
There were two cases where correlation issues were obvious:
- File from T38710 was giving issues in 2.78a again
- File from T50116 was having totally different shadow between
sample 1 and sample 32.
Use some more simplified version of CMJ hash which seems to give
nice randomized value which solves the correlation.
This commit will break all unit test files, but it's a bug fix
so perhaps OK to commit this.
This also fixes T41143: Sobol gives nonuniform noise
Proper science paper about hash function is coming.
Reviewers: brecht
Reviewed By: brecht
Subscribers: lukasstockner97
Differential Revision: https://developer.blender.org/D2385
|
|
Max samples 2147483647 was causing integer overflow.
|
|
It was wrongly considering 1 is a power of 2. While it is a correct thing
(1 == 2^0) it's not what the math in some later formulas expects.
|
|
With current formulation of cmj_fast_div_pow2() it should always return 0
in the case of first argument is zero and no assert really needed anymore.
|
|
This inconsistency drove me totally crazy, it's really confusing
when it's inconsistent especially when you work on both Cycles and
Blender sides.
Shouldn;t cause merge PITA, it's whitespace changes only, Git should
be able to merge it nicely.
|
|
Two things:
- Use intrinsics for clz/ctz (ctz is implemented via ffs()).
- Use faster sqrt() function which precision is enough for
integer values.
|
|
This was already mixed a bit, but the dot belongs there.
|
|
Issue was caused by the precision issues which made sdivm by 1 under
it's actual value. We can try to do some eps magic, but from the tests
on laptop and desktop doing integer division is not slower than using
floats here.
|
|
|
|
|
|
|
|
This to avoids build conflicts with libc++ on FreeBSD, these __ prefixed values
are reserved for compilers. I apologize to anyone who has patches or branches
and has to go through the pain of merging this change, it may be easiest to do
these same replacements in your code and then apply/merge the patch.
Ref T37477.
|
|
sampling and subsurface scattering.
|
|
More information in this post:
http://code.blender.org/
Thanks to all contributes for giving their permission!
|
|
* Add CUDA compiler version detection to cmake/scons/runtime
* Remove noinline in kernel_shader.h and reenable --use_fast_math if CUDA 5.x
is used, these were workarounds for CUDA 4.2 bugs
* Change max number of registers to 32 for sm 2.x (based on performance tests
from Martijn Berger and confirmed here), and also for NVidia OpenCL.
Overall it seems that with these changes and the latest CUDA 5.0 download, that
performance is as good as or better than the 2.67b release with the scenes and
graphics cards I tested.
|
|
|
|
and sm_30 cards, so hopefully it should all work now.
Also includes some warnings fixes related to nvcc compiler arguments, should make
no difference otherwise.
|
|
instead of sobol. So far one doesn't seem to be consistently better or worse than
the other for the same number of samples but more testing is needed.
The random number generator itself is slower than sobol for most number of samples,
except 16, 64, 256, .. because they can be computed faster. This can probably be
optimized, but we can do that when/if this actually turns out to be useful.
Paper this implementation is based on:
http://graphics.pixar.com/library/MultiJitteredSampling/
Also includes some refactoring of RNG code, fixing a Sobol correlation issue with
the first BSDF and < 16 samples, skipping some unneeded RNG calls and using a
simpler unit square to unit disk function.
|