Age | Commit message (Collapse) | Author |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Added some extra tirckery to avoid background being tinted dark with transparent
surface. Maybe a bit hacky, but seems to work fine.
|
|
This is a bit confusing, especially when one mixes OpenCL code where ulong equals
to uint64_t with CPU side code where ulong is expected to be something else from
the naming.
This commit makes it so we use explicit name, common on all platforms.
|
|
We don't enable global SSE optimizations in regular kernel, and we
keep those disabled on Linux 32bit.
One possible workaround would be to pass arguments by ccl_ref, but
that is quite a few of code which better be done accurately.
|
|
Was only needed to sue const reference on CPU. Now it is done using ccl_ref.
|
|
It is defined to & for CPU side compilation, and defined to an empty for any GPU
platform. The idea here is to use this macro instead of #ifdef block with bunch
of duplicated lines just to make it so CPU code is efficient.
Eventually we might switch to references on CUDA as well, but that would require
some intensive testing.
|
|
Image textures were being packed into a single buffer for OpenCL, which
limited the amount of memory available for images to the size of one
buffer (usually 4gb on AMD hardware). By packing textures into multiple
buffers that limit is removed, while simultaneously reducing the number
of buffers that need to be passed to each kernel.
Benchmarks were within 2%.
Fixes T51554.
Differential Revision: https://developer.blender.org/D2745
|
|
This way curve file becomes much shorter and it's also easier to write a
benchmark application to check performance before/after future changes.
|
|
|
|
Those are nothing to do with BVH, and can be used separately.
|
|
Still need to verify this is proper thing to do for AMD OpenCL. At least now
i can compile OpenCL kernel on my laptop with sm21 card.
|
|
Since all the shadow catchers are already assumed to be in the footage,
the shadows they cast on each other are already in the footage too. So
don't just let shadow catchers skip self, but all shadow catchers.
Another justification is that it should not matter if the shadow catcher
is modeled as one object or multiple separate objects, the resulting
render should be the same.
Differential Revision: https://developer.blender.org/D2763
|
|
|
|
Differential Revision: https://developer.blender.org/D2764
|
|
* Remove some unnecessary SSE emulation defines.
* Use full precision float division so we can enable it.
* Add sqrt(), sqr(), fabs(), shuffle variations, mask().
* Optimize reduce_add(), select().
Differential Revision: https://developer.blender.org/D2764
|
|
I need to use some macros defined in util_simd.h for float3/float4, to emulate
SSE4 instructions on SSE2. But due to issues with order of header includes this
was not possible, this does some refactoring to make it work.
Differential Revision: https://developer.blender.org/D2764
|
|
|
|
On Pabellon, 25.8s mega, 35.4s split before, 32.7s split after.
|
|
Differential Revision: https://developer.blender.org/D2766
|
|
The idea here is that it is possible to mark certain include statements
as "precompiled" which means all subsequent includes of that file will
be replaced with an empty string.
This is a way to deal with tricky include pattern happening in single
program OpenCL split kernel which was including bunch of headers about
10 times.
This brings preprocessing time from ~1sec to ~0.1sec on my laptop.
|
|
The order of evaluation of function arguments is undefined, and the order
was reversed between these compilers. This was causing regressions tests
to give different results between Linux and macOS.
|
|
|
|
This was causing different render results with different compilers. We
can't do much useful with 1 sample, but better for debugging.
|
|
|
|
|
|
GPUs together
This commit unifies the flattened texture slot names for bindless and regular CUDA textures. Texture indices are now identical across all CUDA architectures, where before Fermi used different indices, which lead to problems when rendering on multi-GPU setups mixing Fermi with newer hardware.
|
|
transparent object
Tweaked the path radiance summing and alpha to accommodate for possible contribution of
light by transparent surface bounces happening prior to shadow catcher intersection.
This commit will change the way how shadow catcher results looks when was behind semi
transparent object, but the old result seemed to be fully wrong: there were big artifacts
when alpha-overing the result on some actual footage.
|
|
In branched path tracing main loop is always a camera ray, with varying
number of transparent bounces.
|
|
Could have lead to black pixels.
|
|
|
|
panoramic camera settings
The problem here was that when a "invalid" path is generated by the panoramic camera, it was tagged
as RAY_TO_REGENERATE with the intention of generating a new path in kernel_buffer_update.
However, since that state was not handled in kernel_queue_enqueue, kernel_buffer_update did not
process the path which resulted in an infinite loop.
|
|
|
|
black.
|
|
As the title says, the normal wasn't set for the Hair BSDF because it wasn't
needed before. However, the denoiser uses it to store the feature passes, so
it needs to be set now.
|
|
If there was any specularity in the Principled BSDF, it would get a sampling
weight of one regardless of its actual impact.
This commit makes Cycles estimate the contribution of the component and adjust
the weighting accordingly, which greatly improves the noise characteristics of
the Principled BSDF in many cases.
Note that this commit might slightly change the brightness of areas when using
MultiGGX and high roughnesses, but the new brightness is more accurate and
closer to the result of Branched Path Tracing. See T51836 for details.
Differential Revision: https://developer.blender.org/D2677
|
|
The PDF of the MultiGGX sampling is approximated by the singlescattering GGX
term as well as a scaled diffuse term that makes up for the energy in the
multiscattering component that's missed by GGX.
However, there were two problems with the glossy terms: The diffuse term missed
a normalization factor, and the singlescattering term was not properly scaled
down based on the albedo estimate.
The glass term was completely wrong and has been rewritten. It uses the fresnel
factor to weight reflection vs. refraction and uses the glossy MultiGGX model
for reflection.
For refraction, the correct singlescattering term is now used, and a new
albedo approximation is used that was derived by evaluating GGX albedo for
roughnesses from 0 to 1 and IORs from 1 to 3 and fitting numerical
approximations to it. The resulting model has a mean relative error of 9e-5,
but could probably be simplified without losing noticable accuracy in the
final render.
The improved PDFs help with glossy highlights (due to better light sampling vs.
closure sampling MIS) and fix the situation described in T51836 where mixing
MultiGGX with other closures (as it happens in e.g. the Principled
BSDF) causes incorrect darkening.
|
|
This is compatible with UE4 and more consistent with specular and transmission
roughness, even if it deviates from the original Disney BRDF.
|
|
|
|
|
|
Was some mismatch in address space. Seems to be caused by recent additions.
Additionally, moved decoupled ray marching functions under ifdef, so they
don't try to use malloc() functions.
Thanks Mai for testing the patch!
|
|
Now, when there is no usable neighboring pixel for denoising, the noisy value
is preserved instead of producing a NaN.
Also, negative results are clamped to zero.
Note that there are just workarounds that don't fix the underlying problems,
but these issues are very rare and I'm not sure if it's even possible to fix
the underlying problems without introducing a significant slowdown or quality
decrease in other situations.
Because of that and since 2.79 is happening very soon, I just went for these
workarounds for now.
|
|
|
|
Technically not passing all buffers used by a kernel is undefined
behavior. We haven't had any issues with this so far on AMD or
Nvidia, but it's known to be a problem with Intel and we received
a report from AMD that this is a problem on newer hardware, so we
need to make this change at some point.
Unfortunately there a cost to being correct, about 5% for the
benchmark scenes. For low sample counts it's even worse, I've
seen up to 50% slowdown. For the latter case I think adjusting
tile updating logic can help, but not sure what that would look
like yet (it would be just a few lines change however).
|
|
threads
Unlike regular path tracing, branched path tracing is usually used with lower
sample counts, at least for primary rays. This means that are less samples for
the GPU to work on in parallel and rendering is slower. As there is less work
overall there is also more inactive threads during rendering with BPT. This
patch makes use of those inactive rays to render branched samples in parallel
with other samples.
Each thread that is preparing for a branched sample will attempt to find an
inactive thread and if one is found the state for the sample is copied to that
thread. Potentially, if there are enough inactive threads, 100s of branched
samples could be generated from the same originating thread and ran in
parallel giving large speed ups.
Gives 70% faster render for pavillion midday scene. 20-60% faster on BMW
with car paint replaced with SSS/volumes.
|