Age | Commit message (Collapse) | Author |
|
A single program generally compiles kernels 2-3 times faster, loads faster,
takes 2-3 times less drive space, and reduces the number of cached kernels.
|
|
Reduces memory allocation for split kernel.
This allows for faster rendering due to a bigger global size,
especially when GPU memory is limited.
Performance results:
R9 290 total render time (min:sec)
Scene                    Before    After    Change
BMW                        4:37     4:34    -1.1 %
Classroom                 14:43    14:30    -1.5 %
Fishy Cat                 11:20    11:04    -2.4 %
Koro                      12:11    12:04    -1.0 %
Pabellon Barcelona        22:01    20:44    -5.8 %
Pabellon Barcelona(*)     15:32    15:09    -2.5 %
(*) without glossy connected to volume
|
|
This commit enables record-all transparent shadow rays.
Performance results:
R9 290 render time (without synchronization), seconds
Scene                    Before     After    Change
BMW                       261.5     262.5    +0.4 %
Classroom                 869.6     867.3    -0.3 %
Fishy Cat                 657.4     639.8    -2.7 %
Koro                     1909.8     692.8   -63.7 %
Pabellon Barcelona       1633.3    1238.0   -24.2 %
Pabellon Barcelona(*)    1158.1     903.8   -22.0 %
(*) without glossy connected to volume
|
|
Decoupled ray marching is not supported yet.
Transparent shadows are always enabled for volume rendering.
Changes in kernel/bvh and kernel/geom are from Sergey.
This simplifies the code significantly and prepares it for the
record-all transparent shadow function in the split kernel.
|
|
|
|
|
|
|
|
By calculating the size of the state buffer in the kernel rather than on the
host, less code is needed and the size actually reflects the requested features.
This will also be a little faster in some cases because of the larger global
work size.
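As a rough illustration of why a feature-aware size helps, here is a minimal sketch. The feature flags, byte counts, and function names are all made up for this example and are not the actual Cycles definitions; the point is only that a smaller per-path state lets more paths fit in the same memory budget.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical feature flags; the real kernel feature set is not shown here.
enum SplitFeature : uint32_t {
    FEATURE_NONE       = 0,
    FEATURE_VOLUME     = 1u << 0,
    FEATURE_SUBSURFACE = 1u << 1,
};

// Illustrative per-path state sizes in bytes (assumed numbers).
const std::size_t BASE_STATE_BYTES       = 64;
const std::size_t VOLUME_STATE_BYTES     = 32;
const std::size_t SUBSURFACE_STATE_BYTES = 48;

// Bytes of state needed for one path, counting only the requested features.
inline std::size_t state_bytes_per_path(uint32_t features)
{
    std::size_t bytes = BASE_STATE_BYTES;
    if (features & FEATURE_VOLUME) {
        bytes += VOLUME_STATE_BYTES;
    }
    if (features & FEATURE_SUBSURFACE) {
        bytes += SUBSURFACE_STATE_BYTES;
    }
    return bytes;
}

// Largest number of paths (global work size) that fits in the memory budget.
inline std::size_t max_global_size(std::size_t budget_bytes, uint32_t features)
{
    return budget_bytes / state_bytes_per_path(features);
}
```

With these assumed numbers, a scene that requests no optional features fits roughly twice as many paths into the same budget as one that requests both.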
|
|
Pointers to kernels were uninitialized leading to freeing of random memory
addresses. Another reason it would be good to use smart pointers.
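A minimal sketch of the smart pointer idea, assuming a stand-in `Kernel` type and a hypothetical `KernelSet` holder (neither is the actual Cycles class). `std::unique_ptr` members start out as `nullptr`, so destroying the holder before any kernel is loaded never frees a random address.

```cpp
#include <memory>

// Stand-in for a compiled kernel object (hypothetical).
struct Kernel {
    int id;
};

// Hypothetical holder for the kernels of one device.
struct KernelSet {
    // unique_ptr members default to nullptr, so the implicit destructor
    // is safe even when no kernel was ever loaded.
    std::unique_ptr<Kernel> data_init;
    std::unique_ptr<Kernel> path_trace;

    // Replaces the data_init kernel; any previous one is freed automatically.
    void load_data_init(int id)
    {
        data_init.reset(new Kernel{id});
    }

    // True only when both kernels have been loaded.
    bool ready() const
    {
        return data_init && path_trace;
    }
};
```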
|
|
Simple change to make it so that only kernels that have been modified are
rebuilt. Might only be useful during development.
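One way such a check could work is to remember a hash of each kernel's source and rebuild only when it changes. This is a sketch under that assumption; the class name and the use of `std::hash` are illustrative, and the commit itself may detect modifications differently.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Hypothetical cache that remembers the source hash of each built kernel.
class KernelBuildCache {
public:
    // Returns true when the kernel is new or its source changed, and records
    // the new hash. Returns false when the stored hash matches the source.
    bool needs_rebuild(const std::string &name, const std::string &source)
    {
        const std::size_t hash = std::hash<std::string>()(source);
        std::map<std::string, std::size_t>::iterator it = built_.find(name);
        if (it != built_.end() && it->second == hash) {
            return false;
        }
        built_[name] = hash;
        return true;
    }

private:
    std::map<std::string, std::size_t> built_;
};
```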
|
|
Because the split kernel can render multiple samples in parallel it is
necessary to have everything initialized before rendering of any samples
begins. The code that normally handles initialization of
`rng_state` (`kernel_path_trace_setup()`) only does so for the first sample,
which was causing artifacts in the split kernel due to uninitialized
`rng_state` for some samples.
Note that because the split kernel can render samples in parallel this
means that the split kernel is incompatible with the LCG.
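The idea of seeding everything up front can be sketched as follows. The hash function and the one-state-per-pixel layout are assumptions made for illustration, not the code Cycles uses; what matters is that every state is written before any sample starts, so no sample can read an uninitialized value.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative integer hash (an assumption, not the hash Cycles uses).
inline uint32_t mix(uint32_t v)
{
    v ^= v >> 16;
    v *= 0x7feb352dU;
    v ^= v >> 15;
    v *= 0x846ca68bU;
    v ^= v >> 16;
    return v;
}

// Seeds one RNG state per pixel before any sample is rendered.
inline std::vector<uint32_t> init_rng_states(uint32_t width, uint32_t height, uint32_t seed)
{
    std::vector<uint32_t> states(static_cast<std::size_t>(width) * height);
    for (uint32_t y = 0; y < height; ++y) {
        for (uint32_t x = 0; x < width; ++x) {
            const uint32_t pixel = y * width + x;
            states[pixel] = mix(seed ^ mix(pixel));
        }
    }
    return states;
}
```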
|
|
This was only needed for the previous implementation of parallel samples. As
we don't have that any more it can be removed.
The real reason for removal, though, is this: `per_sample_output_buffers` was being
calculated too small and artifacts resulted. The tile buffer is already
the correct size and calculating the size for `per_sample_output_buffers`
is a bit difficult with the current layout of the code. As
`per_sample_output_buffers` was only needed for `sum_all_radiance`,
removing that kernel and writing output to the tile buffer directly
fixes the artifacts.
|
|
This makes it easier to initialize things correctly in the data_init kernel
before they are needed by path tracing.
|
|
|
|
This is to help debug and track memory usage for generic buffers. We
already have something similar for textures, since those require a name,
but for buffers the name is only for debugging purposes.
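A minimal sketch of what name-based tracking makes possible, assuming a hypothetical `MemoryTracker` class; the buffer names in the usage note are examples only and the real device code is organized differently.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical tracker that accounts for device memory per buffer name.
class MemoryTracker {
public:
    MemoryTracker() : total_(0) {}

    // Records an allocation under the given debug name.
    void alloc(const std::string &name, std::size_t bytes)
    {
        usage_[name] += bytes;
        total_ += bytes;
    }

    // Forgets everything allocated under the given name, if anything.
    void release(const std::string &name)
    {
        std::map<std::string, std::size_t>::iterator it = usage_.find(name);
        if (it == usage_.end()) {
            return;
        }
        total_ -= it->second;
        usage_.erase(it);
    }

    // Bytes currently attributed to one name (0 if unknown).
    std::size_t used_by(const std::string &name) const
    {
        std::map<std::string, std::size_t>::const_iterator it = usage_.find(name);
        return it == usage_.end() ? 0 : it->second;
    }

    // Bytes currently attributed to all names together.
    std::size_t total() const
    {
        return total_;
    }

private:
    std::map<std::string, std::size_t> usage_;
    std::size_t total_;
};
```

For example, after `alloc("split_state", 1024)` and `alloc("ray_state", 256)`, a debug log could print each name next to its byte count instead of an anonymous total.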
|
|
|
|
|
|
|
|
|
|
Simple workaround for some issues we've been having with AMD drivers hanging
and making systems unresponsive. Unfortunately this makes things a bit
slower, but it's better than having to do hard reboots. Will be removed when
the drivers have been fixed.
Define CYCLES_DISABLE_DRIVER_WORKAROUNDS to disable for testing purposes.
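The define can be pictured as a compile-time switch like the one below. Only the macro name comes from the commit message; the helper function is hypothetical and the workaround itself is not shown.

```cpp
// Reports whether the driver workaround is compiled in.
// Build with -DCYCLES_DISABLE_DRIVER_WORKAROUNDS to turn it off for testing.
inline bool driver_workarounds_enabled()
{
#ifdef CYCLES_DISABLE_DRIVER_WORKAROUNDS
    return false;
#else
    return true;
#endif
}
```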
|
|
This does a few things at once:
- Refactors host side split kernel logic into a new device
agnostic class `DeviceSplitKernel`.
- Removes tile splitting, a new work pool implementation takes its place and
allows as many threads as will fit in memory regardless of tile size, which
can give performance gains.
- Refactors split state buffers into one buffer, as well as reduces the
number of arguments passed to kernels. Means there's less code to deal
with overall.
- Moves kernel logic out of OpenCL kernel files so they can later be used by
other device types.
- Replaces OpenCL specific APIs with new generic versions.
- Tiles can now be seen updating during rendering.
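The work pool idea from the list above can be sketched with a single atomic counter that hands out ranges of work items, so any number of threads can pull work regardless of tile size. This is a simplified illustration with hypothetical names, not the actual `DeviceSplitKernel` code.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>

// Half-open range [begin, end) of work item indices.
struct WorkRange {
    uint64_t begin;
    uint64_t end;

    bool empty() const
    {
        return begin == end;
    }
};

// Hypothetical pool that hands out work items from one shared counter.
class WorkPool {
public:
    explicit WorkPool(uint64_t total) : total_(total), next_(0) {}

    // Atomically claims up to `count` items. Returns an empty range once
    // all work has been handed out.
    WorkRange acquire(uint64_t count)
    {
        const uint64_t begin = next_.fetch_add(count);
        if (begin >= total_) {
            WorkRange none = {total_, total_};
            return none;
        }
        WorkRange range = {begin, std::min(begin + count, total_)};
        return range;
    }

private:
    const uint64_t total_;
    std::atomic<uint64_t> next_;
};
```

Because each caller gets a disjoint range from `fetch_add`, no lock is needed and the last range is simply clamped to the amount of remaining work.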
|
|
Transferring memory to the device was very slow and there's really no
need when only zeroing a buffer.
|
|
|
|
This is needed so devices can know the size of a tile buffer before any
tiles are acquired.
|
|
This is useful for when there's no host side memory attached to the buffer.
|
|
This was fixed ages ago for the interface case but not for the
command line. The thing here is that external engines currently
rely on the reports system to indicate that an error happened,
so suppressing report storage in background mode prevented the
render pipeline from detecting that errors had happened.
This is all weak and I don't like it, but it is better than
delivering black frames from the farm.
|
|
New logic of split_faces was leaving mesh in a proper state
from Blender's point of view, but Cycles wanted loop normals
to be "flushed" to vertex normals.
Now we do such a flush from Cycles side again, so we don't
leave bad meshes behind.
Thanks Bastien for assistance here!
|
|
This way we can control exact spaces and such added to the cflags
which is crucial to troubleshoot certain drivers.
|
|
Finding which loop should share its vertex with which others is not easy
with regular Mesh data (mostly due to the lack of advanced topology info,
as opposed to the BMesh case).
Custom loop normals computing already does that, and can return 'loop
normal spaces', which among other things contain definitions of 'smooth
fans' of loops around vertices.
Using those makes it easy to find the vertices (and then edges) that need
splitting.
This commit also adds support of non-autosmooth meshes, where we want to
split out flat faces from smooth ones.
|
|
|
|
Noise texture is now faster when the color socket is unused. Potential for
speedup spotted by @nutel.
Some performance results:
Render Time                    Before      After       Difference
Gooseberry benchmark           47:51.34    45:55.57    -4%
Koro                           12:24.92    12:18.46    -0.8%
Simple cube (Color socket)        48.53       48.72    +0.3%
Simple cube (Fac socket)          48.74       32.78    -32.7%
Goethe displacement             1:21.18     1:08.47    -15.6%
Cycles brick displacement       3:02.38     2:16.76    -25.0%
Large displacement scene       23:54.12    20:09.62    -15.6%
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D2513
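The noise texture speedup in this commit comes from skipping work for an unused output. A minimal sketch of that pattern follows; the `toy_noise` function, the evaluation counter, and the assumption that color needs two extra noise lookups are all illustrative and do not reproduce the actual Cycles node.

```cpp
#include <cmath>

// Counts noise evaluations so the saving is visible (illustration only).
static int g_noise_evals = 0;

// Toy stand-in for a noise function; not the Cycles implementation.
inline float toy_noise(float x, float y, float z)
{
    ++g_noise_evals;
    return 0.5f + 0.5f * std::sin(x * 12.9898f + y * 78.233f + z * 37.719f);
}

struct NoiseOutput {
    float fac;
    float r, g, b;
};

// Always computes `fac`; computes the color channels only when the color
// socket is actually connected.
inline NoiseOutput eval_noise(float x, float y, float z, bool color_used)
{
    NoiseOutput out = {};
    out.fac = toy_noise(x, y, z);
    if (color_used) {
        out.r = out.fac;
        out.g = toy_noise(y, x, z);
        out.b = toy_noise(y, z, x);
    }
    return out;
}
```

In this sketch the Fac-only case costs one noise evaluation instead of three, which is the kind of saving the "Simple cube (Fac socket)" row reflects.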
|
|
|
|
The issue seems to be caused by the vertex normal being re-calculated
to something other than the loop normal, which also caused wrong loop
normals after re-calculation.
For now the issue is solved by preserving CD_NORMAL for loops after
split_faces() is finished, so the render engine can access the original
proper value.
|
|
|
|
cards
Was only visible with Persistent Images option ON.
|
|
|
|
|
|
Logic of handling shapekeys when entering and leaving edit mode for
curves was... utterly broken.
Was leaving actual curve data with edited shapekey applied to it.
|
|
|
|
|
|
of bvhutils
The release of these arrays should be at the programmer's discretion, since these arrays can continue to be used.
Only the expanded functions `bvhtree_from_mesh_edges_ex` and `bvhtree_from_mesh_looptri_ex` are currently being used in Blender (in mesh_remap.c), and from what I could analyze, these changes can prevent a crash.
|
|
|
|
|
|
Differential Revision: https://developer.blender.org/D2218
|
|
progress when baking with high samples
|
|
|
|
This is supposed to be a temporary layer.
If someone needs loop normals after split it should explicitly
ask for that.
|
|
|
|
This solves assert failure in CustomData_from_bmeshpoly() happening with
broom.blend file from barber shop SVN.
|
|
|