Age | Commit message (Collapse) | Author |
|
declarations
|
|
Mimics how regular triangles are working and makes it more clear where
the stuff is located in the kernel.
Needed to have some forward declarations because of the current placement
of things in the kernel.
|
|
This allows to save a memory copy, which will be particularly useful for network rendering.
Reviewers: sergey, brecht, dingto, juicyfruit, maiself
Differential Revision: https://developer.blender.org/D2323
|
|
Note that volume rendering is not supported yet, this is a step towards that.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2299
|
|
|
|
|
|
Object coordinates can now be used in the displacement shader and will give
correct results, where as before bump mapping was calculated from the displace
positions and resulted in incorrect shading.
This works by evaluating the shader in two parts, first bump then surface, and
setting the shader state to match what it would be if the surface was
undisplaced for the bump shader evaluation. Currently only `P` is set as if
undisplaced, but other shader variables could be set as well, such as `I` or
`time`. Since these aren't set to anything meaningful for displacement I left
them out of this patch, we can decide what to do with them separately.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2156
|
|
|
|
Enables Catmull-Clark subdivision meshes with support for creases and attribute
subdivision. Still waiting on OpenSubdiv to fully support face varying
interpolation for subdividing uv coordinates tho. Also there may be some
inconsistencies with Blender's subdivision which will be resolved at a
later time.
Code for reading patch tables and creating patch maps is borrowed
from OpenSubdiv.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2111
|
|
This is for until we'll solve issues with toolkit 8.0.
|
|
Reviewed By: dingto, sergey
Differential Revision: https://developer.blender.org/D2127
|
|
These are complex nodes, and it's conceivable they may end up constant
in some circumstances within node groups, so folding support is useful.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2084
|
|
This adds support for ngons and attributes on subdivision meshes. Ngons are
needed for proper attribute interpolation as well as correct Catmull-Clark
subdivision. Several changes are made to achieve this:
- new primitive `SubdFace` added to `Mesh`
- 3 more textures are used to store info on patches from subd meshes
- Blender export uses loop interface instead of tessface for subd meshes
- `Attribute` class is updated with a simplified way to pass primitive counts
around and to support ngons.
- extra points for ngons are generated for O(1) attribute interpolation
- curves are temporally disabled on subd meshes to avoid various bugs with
implementation
- old unneeded code is removed from `subd/`
- various fixes and improvements
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2108
|
|
Matches better naming of volume traversal files, where we've got
optimized versions of a single step of volume intersection and
traversal which will gather all volume intersections.
|
|
BVH traversal is not really that much a geometry and we've got
quite some traversals now. Makes sense to keep them separate in
the name of source structure clarity.
|
|
This commit implements traversal of unaligned BVH nodes.
QBVH traversal is fully SIMD optimized and calculates orientation
for all 4 children at a time, regular BVH might probably be optimized
a bit more.
|
|
Glossy, Anisotropic and Glass BSDFs
This commit adds a new distribution to the Glossy, Anisotropic and Glass BSDFs that implements the
multiple-scattering microfacet model described in the paper "Multiple-Scattering Microfacet BSDFs with the Smith Model".
Essentially, the improvement is that unlike classical GGX, which only models single scattering and assumes
the contribution of multiple bounces to be zero, this new model performs a random walk on the microsurface until
the ray leaves it again, which ensures perfect energy conservation.
In practise, this means that the "darkening problem" - GGX materials becoming darker with increasing
roughness - is solved in a physically correct and efficient way.
The downside of this model is that it has no (known) analytic expression for evalation. However, it can be
evaluated stochastically, and although the correct PDF isn't known either, the properties of MIS and the
balance heuristic guarantee an unbiased result at the cost of slightly higher noise.
Reviewers: dingto, #cycles, brecht
Reviewed By: dingto, #cycles, brecht
Subscribers: bliblubli, ace_dragon, gregzaal, brecht, harvester, dingto, marcog, swerner, jtheninja, Blendify, nutel
Differential Revision: https://developer.blender.org/D2002
|
|
The file wasn't included in CMake and therefore not installed into the addon folder.
|
|
the standard cmake CUDA/NVCC arguments flag allowing 2015 build to use
msvc 2013 for cuda
|
|
Compile time per kernel increased alot after recent image commits, re-shuffle some code to fix this.
Patch by "LazyDodo".
Differential Revision: https://developer.blender.org/D2012
|
|
Should we also check whether toolkit exist perhaps?
|
|
after recent refactory.
|
|
This was a hard decision, because going newer CUDA toolkit makes
rendering up to 5% slower. But on another hand, it solves major
speed regressions (up to 30%) with branched path tracing on a
top level cards.
Neither of those regressions have a meaningful and sane workaround
from the code itself.
Toolkit 6.5 could still be used, but it's no longer recommended one.
|
|
It is initialized based on size of pointer, which matches our previous
behavior, but using it in Cycles side allows to cross-compile CUDA
binaries.
|
|
The idea is to switch from allocating separate buffers for shader data's
structure of arrays to allocating one huge memory block and do some index
trickery to make it accessed as SOA.
This saves quite reasonable amount of lines of code in device_opencl and
also makes it possible to get rid of special declaration of ShaderData
structure.
As a side effect it also makes it easier to experiment with SOA vs. AOS
for split kernel.
Works fine here on NVidia GTX580, Intel CPU amd AMD Fiji cards.
Reviewers: #cycles, brecht, juicyfruit, dingto
Differential Revision: https://developer.blender.org/D1593
|
|
This commit removes the experimental CUDA kernel, making SSS and CMJ
regular features.
Several improvements have been made in the past few
weeks (thanks Sergey!) which make SSS render several times faster (2-3x
compared to 2.76b) on the GPU, and the increased VRAM usage has also been
fixed. Therefore the experimental kernel is no longer needed.
Differential Revision: https://developer.blender.org/D1726
Manual has been updated: too:
https://www.blender.org/manual/render/cycles/features.html
|
|
|
|
Main goal is to make kernel signatures editing easier and less prone to the
errors caused by missing function signature update or so.
This will also make it easier to add new CPU architectures.
Reviewers: juicyfruit, dingto, lukasstockner97, brecht
Reviewed By: dingto, lukasstockner97, brecht
Differential Revision: https://developer.blender.org/D1703
|
|
|
|
The idea of this node is to sampling of 3D voxels at a given coordinate
supporting different mapping strategies (world space mapping, object
local space etc).
Currently not in use, it's a preparation step for supporting point density
textures.
|
|
|
|
Code there started becoming a bit too big, by splitting it up it'll make it
easier to do improvements or extending the features in there.
The layout is not totally final yet, would need to try de-duplicating parts
of code from split kernel with non-split integrators,
|
|
This commit re-shuffles code in split kernel once again and makes it so common
parts which is in the headers is only responsible to making all the work needed
for specified ray index. Getting ray index, checking for it's validity and
enqueuing tasks are now happening in the device specified part of the kernel.
This actually makes sense because enqueuing is indeed device-specified and i.e.
with CUDA we'll want to enqueue kernels from kernel and avoid CPU roundtrip.
TODO:
- Kernel comments are still placed in the common header files, but since queue
related stuff is not passed to those functions those comments might need to
be split as well.
Just currently read them considering that they're also covering the way how
all devices are invoking the common code path.
- Arguments might need to be wrapped into KernelGlobals, so we don't ened to
pass all them around as function arguments.
|
|
Since the kernel split work we're now having quite a few of new files, majority
of which are related on the kernel entry points. Keeping those files in the
root kernel folder will eventually make it really hard to follow which files are
actual implementation of Cycles kernel.
Those files are now moved to kernel/kernels/<device_type>. This way adding extra
entry points will be less noisy. It is also nice to have all device-specific
files grouped together.
Another change is in the way how split kernel invokes logic. Previously all the
logic was implemented directly in the .cl files, which makes it a bit tricky to
re-use the logic across other devices. Since we'll likely be looking into doing
same split work for CUDA devices eventually it makes sense to move logic from
.cl files to header files. Those files are stored in kernel/split. This does not
mean the header files will not give error messages when tried to be included
from other devices and their arguments will likely be changed, but having such
separation is a good start anyway.
There should be no functional changes.
Reviewers: juicyfruit, dingto
Differential Revision: https://developer.blender.org/D1314
|
|
No functional changes, just better to keep all atomic function in a single place,
they might become handy later.
|
|
Previously it was explicitly mentioning it's NVidia kernel related option,
but in fact it's also handy for the OpenCL kernel.
|
|
This commit contains all the work related on the AMD megakernel split work
which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus
some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely
someone else which we're forgetting to mention.
Currently only AMD cards are enabled for the new split kernel, but it is
possible to force split opencl kernel to be used by setting the following
environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.
Not all the features are supported yet, and that being said no motion blur,
camera blur, SSS and volumetrics for now. Also transparent shadows are
disabled on AMD device because of some compiler bug.
This kernel is also only implements regular path tracing and supporting
branched one will take a bit. Branched path tracing is exposed to the
interface still, which is a bit misleading and will be hidden there soon.
More feature will be enabled once they're ported to the split kernel and
tested.
Neither regular CPU nor CUDA has any difference, they're generating the
same exact code, which means no regressions/improvements there.
Based on the research paper:
https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf
Here's the documentation:
https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit
Design discussion of the patch:
https://developer.blender.org/T44197
Differential Revision: https://developer.blender.org/D1200
|
|
This replaces sequential ray moving followed with scene intersection with
single BVH traversal, which gives us all possible intersections.
Only implemented for CPU, due to qsort and a bigger memory usage on GPU
which we rather avoid. GPU still uses the regular bvh volume intersection code, while CPU now uses the new code.
This improves render performance for scenes with:
a) Camera inside volume mesh
b) SSS mesh intersecting a volume mesh/domain
In simple volume files (not much geometry) performance is roughly the same
(slightly faster). In files with a lot of geometry, the performance
increase is larger. bmps.blend with a volume shader and camera inside the
mesh, it renders ~10% faster here.
Patch by Sergey and myself.
Differential Revision: https://developer.blender.org/D1264
|
|
|
|
|
|
It is based on fmath.h from OIIO and could be used to give some speedup
in areas where absolute accuracy is not so critical.
|
|
For SSE checks still could be decoupled to be able to compile SSE2
kernel and not SSE4 depending on the CPU or so.
|
|
This commit implements traversal for QBVH tree, which is based on the old loop
code for traversal itself and Embree for node intersection.
This commit also does some changes to the loop inspired by Embree:
- Visibility flags are only checked for primitives.
Doing visibility check for every node cost quite reasonable amount of time
and in most cases those checks are true-positive.
Other idea here would be to do visibility checks for leaf nodes only, but
this would need to be investigated further.
- For minimum hair width we extend all the nodes' bounding boxes.
Again doing curve visibility check is quite costly for each of the nodes and
those checks returns truth for most of the hierarchy anyway.
There are number of possible optimization still, but current state is good
enough in terms it makes rendering faster a little bit after recent watertight
commit.
Currently QBVH is only implemented for CPU with SSE2 support at least. All
other devices would need to be supported later (if that'd make sense from
performance point of view).
The code is enabled for compilation in kernel. but blender wouldn't use it
still.
|
|
This way extending intersection routines with some pre-calculation step wouldn't
explode the single file size, hopefully keeping them all in a nice maintainable
state.
|
|
Pretty straightforward implementation. Just needed to move some functions
around to make them available at shader compile time.
|
|
|
|
Currently only summed number of traversal steps and intersections used by the
camera ray intersection pass is implemented, but in the future we will support
more debug passes which would help checking what things makes the scene slow.
Example of such extra passes could be number of bounces, time spent on the
shader tree evaluation and so.
Implementation from the Cycles side is pretty much straightforward, could only
mention here that it's a build-time option disabled by default.
From the blender side it's implemented as a PASS_DEBUG with several subtypes
possible. This way we don't need to create an extra DNA pass type for each of
the debug passes, saving us a bits.
Reviewers: campbellbarton
Reviewed By: campbellbarton
Differential Revision: https://developer.blender.org/D813
|
|
|
|
Was hooked up last year for testing purposes, as we already had some code for it, but the closure itself is not really good nor really useful, so let's remove it.
|
|
Now we build 2 .cubins per architecture (e.g. kernel_sm_21.cubin, kernel_experimental_sm_21.cubin).
The experimental kernel can be used by switching to the Experimental Feature Set: http://wiki.blender.org/index.php/Doc:2.6/Manual/Render/Cycles/Experimental_Features
This enables Subsurface Scattering and Correlated Multi Jitter Sampling on GPU, while keeping the stability and performance of the regular kernel.
Differential Revision: https://developer.blender.org/D762
Patch by Sergey and myself.
Developer / Builder Note:
CUDA Toolkit 6.5 is highly recommended for this, also note that building the experimental kernel requires a lot of system memory (~7-8GB).
|