Age | Commit message (Collapse) | Author |
|
Since all the shadow catchers are already assumed to be in the footage,
the shadows they cast on each other are already in the footage too. So
don't just let shadow catchers skip self, but all shadow catchers.
Another justification is that it should not matter if the shadow catcher
is modeled as one object or multiple separate objects, the resulting
render should be the same.
Differential Revision: https://developer.blender.org/D2763
|
|
|
|
|
|
Differential Revision: https://developer.blender.org/D2764
|
|
* Remove some unnecessary SSE emulation defines.
* Use full precision float division so we can enable it.
* Add sqrt(), sqr(), fabs(), shuffle variations, mask().
* Optimize reduce_add(), select().
Differential Revision: https://developer.blender.org/D2764
|
|
I need to use some macros defined in util_simd.h for float3/float4, to emulate
SSE4 instructions on SSE2. But due to issues with order of header includes this
was not possible, this does some refactoring to make it work.
Differential Revision: https://developer.blender.org/D2764
|
|
We already detect this automatically based on shading nodes and per shader
settings, and performance of this option is ok now all devices.
Differential Revision: https://developer.blender.org/D2767
|
|
|
|
On Pabellon, 25.8s mega, 35.4s split before, 32.7s split after.
|
|
Differential Revision: https://developer.blender.org/D2766
|
|
Two main things here:
1. Replace all unsafe for #line directive characters into a single loop,
avoiding multiple iterations and multiple temporary strings created.
2. Don't merge token char by char but calculate start and end point and
then copy all substring at once.
This gives about 15% speedup of source processing time. At this point
(with all previous commits from today) we've shrinked down compiled
sources size from 108 MB down to ~5.5 MB and lowered processing time
from 4.5 sec down to 0.047 sec on my laptop running Linux (this was a
constant time which Blender will always spent first time loading kernel,
even if we've got compiled clbin).
|
|
Add a safe version of normalize since all uses of normalize
did zero length checks, move this into a function.
Also avoid unnecessary conversion.
Gives minor speedup here (approx 3-5%).
|
|
Basically gather lines as-is during traversal, avoiding allocating
memory for all the lines in headers.
Brings additional performance improvement abut 20%.
|
|
The idea here is that it is possible to mark certain include statements
as "precompiled" which means all subsequent includes of that file will
be replaced with an empty string.
This is a way to deal with tricky include pattern happening in single
program OpenCL split kernel which was including bunch of headers about
10 times.
This brings preprocessing time from ~1sec to ~0.1sec on my laptop.
|
|
The idea is to re-use files which were already processed. Gives about 4x speedup
of processing time (~4.5sec vs 1.0sec) on my laptop for the whole OpenCL kernel.
For users it will mean lower delay before OpenCL rendering might start.
|
|
|
|
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2747
|
|
Generalizes current conditions, QT implements it the same way.
|
|
The order of evaluation of function arguments is undefined, and the order
was reversed between these compilers. This was causing regressions tests
to give different results between Linux and macOS.
|
|
|
|
This was causing different render results with different compilers. We
can't do much useful with 1 sample, but better for debugging.
|
|
|
|
GCC seems to detect uninitialized into function calls now, but then isn't
always smart enough to see that it is actually initialized. Disabling this
warning entirely seems a bit too much, so initialize a bit more now.
|
|
render.
|
|
|
|
|
|
|
|
GPUs together
This commit unifies the flattened texture slot names for bindless and regular CUDA textures. Texture indices are now identical across all CUDA architectures, where before Fermi used different indices, which lead to problems when rendering on multi-GPU setups mixing Fermi with newer hardware.
|
|
Change the implementation so it no longer takes over the mouse cursor motion
from the OS, instead only move it when warping, similar to Windows and X11.
Probably the reason it was not done this way originally is that you then get
a 500ms delay after warping, but we can use a trick to avoid that and get much
smoother mouse motion than before.
|
|
transparent object
Tweaked the path radiance summing and alpha to accommodate for possible contribution of
light by transparent surface bounces happening prior to shadow catcher intersection.
This commit will change the way how shadow catcher results looks when was behind semi
transparent object, but the old result seemed to be fully wrong: there were big artifacts
when alpha-overing the result on some actual footage.
|
|
In branched path tracing main loop is always a camera ray, with varying
number of transparent bounces.
|
|
This gives speed up for the split kernel in scenes using the principled BSDF
but without subsurface scattering.
|
|
Could have lead to black pixels.
|
|
|
|
This is something which was reported to work fine by Mai, Benjamin and
confirmed by myself. Disabling this workaround gains us some speedup:
Before Now
bmw27 04:28.42 04:07.79
classroom 09:26.48 08:54.53
fishy_cat 08:44.01 08:18.70
koro 09:17.98 08:57.18
pavillon_barcelone 12:26.64 11:52.81
Test environment is:
- Ubuntu 16.04, with all updates installed
- AMD RX 480 GPU
- amdgpu pro driver version 17.10-450821
|
|
|
|
Unfortunately this means disabling the code that ensures the title
bar is properly scaled with DPI, however better to have that as a
cosmetic issue than Blender being unusable with a lot of Intel GPUs.
|
|
|
|
|
|
directory when using refine
Residue of debug code remained form some older bug fix.
|
|
|
|
We already have this in util_algorithm.h
This reverts commit cff172c7621d89773baa99a9460f19056efb5f1e.
|
|
|
|
|
|
|
|
This file uses std::ostream for helper << operators, so need to make sure
corresponding header is included.
|
|
Some of the functions might have been inlined, but others i don't see
how that was possible (don't think virtual functions can be inlined here).
In any case, better be explicitly optimal in the code.
|
|
|
|
panoramic camera settings
The problem here was that when a "invalid" path is generated by the panoramic camera, it was tagged
as RAY_TO_REGENERATE with the intention of generating a new path in kernel_buffer_update.
However, since that state was not handled in kernel_queue_enqueue, kernel_buffer_update did not
process the path which resulted in an infinite loop.
|
|
|