Age | Commit message (Collapse) | Author |
|
This commit implements traversal of unaligned BVH nodes.
QBVH traversal is fully SIMD optimized and calculates orientation
for all 4 children at a time; the regular BVH traversal could probably
be optimized a bit more.
|
|
This is a special builder type which is allowed to orient nodes along
the strand direction, hence minimizing their surface area in comparison
with axis-aligned nodes. Such nodes are much more efficient for hair
rendering.
The implementation of the BVH builder is based on Embree. The general
idea is to calculate both the axis-aligned SAH and the oriented SAH, and
if the SAH of the oriented node is smaller than the axis-aligned SAH we
create an unaligned node.
We store both aligned and unaligned nodes in the same tree (which
seems to be different from what Embree is doing), so no extra
calculations are needed to set up a hair ray for BVH traversal,
hence avoiding any possible negative effect of this new BVH node
type.
This new builder is currently not in use; the BVH traversal code still
needs to be made aware of unaligned nodes.
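The aligned-versus-oriented decision described above can be illustrated with a minimal sketch. It is not the Cycles or Embree builder: it only compares bounding box surface areas (a stand-in for the full SAH cost), and all function names here are hypothetical.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(v):
    length = math.sqrt(dot(v, v))
    return (v[0] / length, v[1] / length, v[2] / length)

def box_area(points):
    # Surface area of the axis-aligned box enclosing the points.
    lo = [min(p[i] for p in points) for i in range(3)]
    hi = [max(p[i] for p in points) for i in range(3)]
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def strand_frame(direction):
    # Orthonormal basis whose first axis follows the strand direction.
    x = normalize(direction)
    helper = (0.0, 0.0, 1.0) if abs(x[2]) < 0.9 else (1.0, 0.0, 0.0)
    y = normalize(cross(helper, x))
    z = cross(x, y)
    return (x, y, z)

def choose_node_type(points, direction):
    # Assumption: plain surface area is used instead of the full SAH.
    aligned = box_area(points)
    frame = strand_frame(direction)
    local = [tuple(dot(p, axis) for axis in frame) for p in points]
    oriented = box_area(local)
    kind = "unaligned" if oriented < aligned else "aligned"
    return kind, aligned, oriented
```

For a strand running diagonally, e.g. `choose_node_type([(0, 0, 0), (1, 1, 0.1), (2, 2, 0)], (1, 1, 0))`, the oriented box is much tighter and the sketch picks an unaligned node; for a strand that already follows a coordinate axis the two areas match and the aligned node is kept.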
|
|
This seems to be a straightforward way to support heterogeneous nodes
in the same tree.
There is some penalty related to the 4 GB limit of the address space now,
but here's the thing:
The traversal code was already using ints to store the final offset, so
there can't really be any regressions.
This commit is required to make it possible to encode both aligned
and unaligned nodes in the same array. Also, in the future we can use
this to get rid of the __leaf_nodes array (which is a bit tricky to do
because of the trickery in pack_instances()).
|
|
There are several internal changes for this:
The first idea is to make __tri_verts behave similarly to __tri_storage,
meaning the __tri_verts array now contains all vertices of all triangles
instead of just the mesh vertices. This saves a lookup when reading
triangle coordinates in functions like triangle_normal().
To make this efficient, the global triangle offset needs to be stored
somewhere. So now __tri_vindex.w contains a global triangle index which
can be used to read the triangle vertices.
Additionally, the order of vertices in that array is aligned with
the primitives from the BVH. This is needed to keep the cache as coherent
as possible during BVH traversal. It requires some extra tricks to
fill in the array and to deal with True Displacement, but this trickery
is required to prevent a noticeable slowdown.
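A rough sketch of the layout described above, under stated assumptions: `pack_triangles`, `triangle_corners` and `prim_order` are hypothetical names, and the real packing code also has to handle displacement and instancing, which is ignored here.

```python
def pack_triangles(mesh_verts, triangles, prim_order):
    """Expand per-mesh vertices into a flat per-triangle array.

    mesh_verts: list of (x, y, z) tuples.
    triangles:  list of (v0, v1, v2) indices into mesh_verts.
    prim_order: triangle ids in the order the BVH stores its primitives.
    """
    tri_verts = []
    tri_vindex = [None] * len(triangles)
    for tri_id in prim_order:
        v0, v1, v2 = triangles[tri_id]
        offset = len(tri_verts)  # plays the role of __tri_vindex.w
        tri_verts.extend(mesh_verts[v] for v in (v0, v1, v2))
        tri_vindex[tri_id] = (v0, v1, v2, offset)
    return tri_verts, tri_vindex

def triangle_corners(tri_verts, tri_vindex, tri_id):
    # One indirection (the stored offset) instead of three vertex lookups.
    w = tri_vindex[tri_id][3]
    return tri_verts[w], tri_verts[w + 1], tri_verts[w + 2]
```

Because the flat array is filled in BVH primitive order, triangles that are visited together during traversal also sit next to each other in memory.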
The next idea was to use __tri_verts instead of __tri_storage in the
intersection code. Unfortunately, this is quite tricky to do without a
noticeable speed loss. The loss is mainly caused by the extra lookup
needed to access the vertex coordinates.
Fortunately, tricks here and there (i.e. some type changes to avoid
casts, which do not really come for free) reduce those losses to
an acceptable level, so now they are within a couple of percent.
On the positive side we've achieved:
- A few percent of memory savings in triangle-only scenes. The actual
saving in this case is close to the size of all vertices.
On more finely subdivided scenes this benefit might become more
obvious.
- Huge memory savings in hairy scenes. For example, on koro.blend
there is about a 20% memory saving. Similar figure for bunny.blend.
This memory saving was the main goal of this commit, in order to move
forward with Hair BVH, which requires more memory per BVH node. So while
this sounds exciting, the memory optimization will become invisible
with the upcoming Hair BVH work.
But again on the positive side, we can add an option to NOT use Hair
BVH and then we'll have same-ish render times as we have currently,
but with this 20% memory benefit on hairy scenes.
|
|
It was initially unsupported because the initial idea of checking visibility
of all children was slowing scenes down a lot. Now the idea has changed
and we only perform a visibility check of the current node. This avoids the huge
slowdown (from tests here it seems to be within 1-2%, but more tests
would never hurt) and gives a nice speedup of ray traversal for complex
scenes which utilize ray visibility.
Here are the timings for koro.blend:
                    Without visibility check    With visibility check
  Original file     4min 20sec                  4min 23sec
  Camera rays only  1min 43sec                  55sec
Unfortunately, this doesn't come for free and requires extra data in
the BVH node, which increases memory usage of BVH nodes by 15%. This we
can solve with some future trickery to avoid creating __tri_storage
for curve segments.
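The per-node check described above can be sketched as follows. This is only an illustration, assuming each node stores a visibility mask covering everything below it; the `Node` class and the flag names are hypothetical, and the ray/box intersection test is left out entirely.

```python
from dataclasses import dataclass, field

CAMERA = 1 << 0  # hypothetical ray visibility flags
SHADOW = 1 << 1

@dataclass
class Node:
    visibility: int                               # union of visibility below this node
    children: list = field(default_factory=list)
    prims: list = field(default_factory=list)     # (name, visibility) pairs

def traverse(root, ray_visibility):
    """Return (hit primitive names, number of nodes popped from the stack)."""
    hits = []
    visited = 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1
        # Only the current node is tested; its children are never inspected
        # before they are popped themselves.
        if not (node.visibility & ray_visibility):
            continue
        for name, visibility in node.prims:
            if visibility & ray_visibility:
                hits.append(name)
        stack.extend(node.children)
    return hits, visited
```

With a subtree that is visible to shadow rays only, a camera ray pops that subtree's root once and skips everything beneath it, which matches the kind of speedup seen in the "Camera rays only" row.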
|
|
|
|
inclusion list.
This shortens the list, and Blender render specific panels are added less often
than other panels anyway, so less chance to miss things.
|
|
full sample.
Differential Revision: https://developer.blender.org/D2080
|
|
Differential Revision: https://developer.blender.org/D2079
|
|
|
|
|
|
|
|
(See T48720).
|
|
Make sure we don't perform any implicit address space conversion.
A bit annoying, but less intrusive approaches (like using a temp private
variable in the .cl kernel) do not work correctly here.
Using the generic address space would help on the code side here, but would
be somewhat slower due to the extra things happening, as far as I know.
|
|
Allows plugging/unplugging different tablets while Blender is running.
Also fixes a crash when unplugging a tablet while Blender runs (T48750).
|
|
|
|
As far as I can see, the second issue there was that the functions receive a pointer to a member variable of the
ShaderData, which is stored in global memory. However, this means that the pointer points to global memory as well,
therefore OpenCL requires the ccl_addr_space "keyword" in front of the pointer.
With this commit, the OpenCL kernels build on Linux with the Intel CPU OpenCL runtime - however, they already did
without the change and I don't have an AMD card, so I can't really test whether the AMD runtime is happy as well now.
|
|
Those (one per ID type!) were uselessly duplicated, and badly inconsistent
(some types were actually unlinking before deletion, others were only working if already unlinked!).
Now we use the same func and the same API for all types. By default, deletion is performed only if the ID is no longer used;
set the `do_unlink` parameter to True to always delete the ID even if it is still in use.
The only exception now is Scene, since we always want to keep at least one!
Note that this will change the default behavior of some types (since unlinking is never done anymore by default).
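To make the new default concrete, here is a toy model of the behavior described above. It is not the Blender API: `IDBlock`, `Collection` and the `False` return value for a refused deletion are assumptions made for illustration; only the `do_unlink` parameter name comes from the commit message.

```python
class IDBlock:
    def __init__(self, name):
        self.name = name
        self.users = set()  # names of whatever still references this block

class Collection:
    def __init__(self):
        self.items = {}

    def new(self, name):
        block = IDBlock(name)
        self.items[name] = block
        return block

    def remove(self, block, do_unlink=False):
        # Default: refuse to delete a block that is still in use.
        if block.users and not do_unlink:
            return False
        block.users.clear()  # unlink from every user first
        del self.items[block.name]
        return True
```

Calling `remove(block)` on a block that still has users leaves it in place, while `remove(block, do_unlink=True)` unlinks it first and then deletes it.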
|
|
Use OpenCL "all" builtin type for conversion, according to OpenCL 1.1 spec 6.3e.
|
|
Glossy, Anisotropic and Glass BSDFs
This commit adds a new distribution to the Glossy, Anisotropic and Glass BSDFs that implements the
multiple-scattering microfacet model described in the paper "Multiple-Scattering Microfacet BSDFs with the Smith Model".
Essentially, the improvement is that unlike classical GGX, which only models single scattering and assumes
the contribution of multiple bounces to be zero, this new model performs a random walk on the microsurface until
the ray leaves it again, which ensures perfect energy conservation.
In practice, this means that the "darkening problem" - GGX materials becoming darker with increasing
roughness - is solved in a physically correct and efficient way.
The downside of this model is that it has no (known) analytic expression for evaluation. However, it can be
evaluated stochastically, and although the correct PDF isn't known either, the properties of MIS and the
balance heuristic guarantee an unbiased result at the cost of slightly higher noise.
Reviewers: dingto, #cycles, brecht
Reviewed By: dingto, #cycles, brecht
Subscribers: bliblubli, ace_dragon, gregzaal, brecht, harvester, dingto, marcog, swerner, jtheninja, Blendify, nutel
Differential Revision: https://developer.blender.org/D2002
|
|
The function that assigns names to socket types missed an entry, therefore all entries after it were mapped to the wrong name.
Long-term, it might be a better solution to use a map to avoid issues like these, but for now this fix works.
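The failure mode and the suggested map-based alternative can be illustrated with a small sketch. The socket types and names below are made up for the example and do not reflect the actual Cycles enum.

```python
from enum import Enum

class SocketType(Enum):  # hypothetical subset of socket types
    FLOAT = 0
    INT = 1
    COLOR = 2
    VECTOR = 3

# Positional table with the "int" entry accidentally left out:
# every entry after the gap is shifted by one.
NAMES_BY_POSITION = ["float", "color", "vector"]

# Map keyed by the type itself: a missing entry cannot shift the others.
NAMES_BY_TYPE = {
    SocketType.FLOAT: "float",
    SocketType.COLOR: "color",
    SocketType.VECTOR: "vector",
}

def name_from_position(socket_type):
    return NAMES_BY_POSITION[socket_type.value]

def name_from_map(socket_type):
    # Raises KeyError for a forgotten type instead of returning a wrong name.
    return NAMES_BY_TYPE[socket_type]
```

With the positional table, `name_from_position(SocketType.INT)` silently returns the color name, whereas the map lookup fails loudly for the forgotten type and stays correct for all others.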
|
|
In the OSL node compilation code for the Environment Texture, is_linear was used as a socket.
However, there was no socket for it, which caused Blender to crash.
Adding a socket doesn't really make sense since it's an internal value and not a parameter
of the node, so it now just uses the variable directly.
|
|
undefined type for lamp objects
The problem here was that there are five path types internally (diffuse, glossy, transmission, subsurface and volume scatter), but subsurface isn't exposed to the user.
This caused some weird behaviour - if all four types are disabled on the lamp, Cycles doesn't even try sampling it, but if any type was active, the lamp would illuminate
the cube since none of the options set subsurface to zero.
In the future, it might be reasonable to add subsurface visibility as an option - but for now the weird and inconsistent behaviour can be fixed simply by setting both
diffuse and subsurface to zero if the user disables diffuse visibility.
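As a sketch of the fix described above, assuming visibility is stored as a bitmask with one bit per path type (the flag names and the function are hypothetical):

```python
PATH_RAY_DIFFUSE = 1 << 0
PATH_RAY_GLOSSY = 1 << 1
PATH_RAY_TRANSMIT = 1 << 2
PATH_RAY_SUBSURFACE = 1 << 3      # not exposed to the user
PATH_RAY_VOLUME_SCATTER = 1 << 4
PATH_RAY_ALL = (PATH_RAY_DIFFUSE | PATH_RAY_GLOSSY | PATH_RAY_TRANSMIT |
                PATH_RAY_SUBSURFACE | PATH_RAY_VOLUME_SCATTER)

def lamp_visibility_mask(use_diffuse, use_glossy, use_transmission, use_scatter):
    mask = PATH_RAY_ALL
    if not use_diffuse:
        # Subsurface follows the diffuse toggle since it has no option of its own.
        mask &= ~(PATH_RAY_DIFFUSE | PATH_RAY_SUBSURFACE)
    if not use_glossy:
        mask &= ~PATH_RAY_GLOSSY
    if not use_transmission:
        mask &= ~PATH_RAY_TRANSMIT
    if not use_scatter:
        mask &= ~PATH_RAY_VOLUME_SCATTER
    return mask
```

With all four options disabled the mask is zero, so no hidden subsurface bit is left that could make the lamp contribute unexpectedly.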
|
|
The OpenCL texture code didn't offset the coordinates by half a pixel like the CPU code does.
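To show why the half-pixel offset matters, here is a one-dimensional linear lookup, assuming texel centers sit at (i + 0.5) / width and edges are clamped. It is an illustration only, not the kernel code, and the function name is hypothetical.

```python
import math

def sample_linear_1d(texels, u, half_pixel_offset=True):
    width = len(texels)
    x = u * width
    if half_pixel_offset:
        x -= 0.5  # move from texel-edge to texel-center coordinates
    i0 = math.floor(x)
    t = x - i0
    # Clamp both taps to the valid range (assumed extension mode).
    a = texels[min(max(i0, 0), width - 1)]
    b = texels[min(max(i0 + 1, 0), width - 1)]
    return (1.0 - t) * a + t * b
```

With the offset, sampling exactly at a texel center returns that texel's value; without it, the same coordinate lands halfway between two texels, which shifts the whole image by half a pixel.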
|
|
Since most of the code for these two nodes was identical, this commit
now instead uses a common base class that implements all the functionality.
|
|
The file wasn't included in CMake and therefore not installed into the addon folder.
|
|
|
|
Invert, brightness & contrast, separate/combine and Mix RGB blend modes
and clamping.
|
|
It is not possible to use a set split by name as valid input to
check_node_input_traversed - it needs a complete set of all nodes visited so
far. On the other hand, the merge comparison loop should only check nodes that
were not just visited, but found unique. This means that there should really be
two separate data structures.
Without the fix, check_node_input_traversed actually never returns true, so
only nodes without any inputs are processed.
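The need for two data structures can be sketched like this. It is a simplified illustration rather than the actual graph code: the `Node` class, `deduplicate` and `inputs_traversed` are hypothetical stand-ins, with `inputs_traversed` playing the role of check_node_input_traversed.

```python
class Node:
    def __init__(self, name, params=(), inputs=()):
        self.name = name
        self.params = tuple(params)
        self.inputs = list(inputs)

def inputs_traversed(node, done):
    # Needs the complete set of visited nodes, unique or not.
    return all(inp in done for inp in node.inputs)

def deduplicate(nodes):
    done = set()          # every node visited so far
    unique_by_name = {}   # only nodes that were found unique, split by name
    replacement = {}      # duplicate node -> node it was merged into
    remaining = list(nodes)
    while remaining:
        ready = [n for n in remaining if inputs_traversed(n, done)]
        if not ready:
            break  # unreachable inputs or a cycle; stop instead of looping
        for node in ready:
            node.inputs = [replacement.get(i, i) for i in node.inputs]
            bucket = unique_by_name.setdefault(node.name, [])
            twin = next((o for o in bucket
                         if o.params == node.params and o.inputs == node.inputs),
                        None)
            if twin is None:
                bucket.append(node)
            else:
                replacement[node] = twin
            done.add(node)
        remaining = [n for n in remaining if n not in done]
    unique = [n for bucket in unique_by_name.values() for n in bucket]
    return unique, replacement
```

If `done` only held the unique nodes, a node fed by a merged duplicate would never count as ready, so nothing downstream of it would be processed.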
|
|
|
|
|
|
|
|
registers.
For non-branched path tracing with a GTX 960 and CUDA 7.5, this gives a small reduction
in stack usage but mainly: 8% faster render on BMW, 5% on pabellon, 13% on classroom.
|
|
|
|
|
|
This is an initial commit for half texture support in Cycles.
It adds the basic infrastructure inside of the ImageManager and support for these textures on CPU.
Supported:
* Half Float OpenEXR images (can be used for e.g. HDRs or normal maps) now use 1/2 the memory when loaded from disk (via OIIO).
ToDo:
Various things like support for inbuilt half textures, GPU... will come later, step by step.
Part of my GSoC 2016.
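The memory claim follows directly from the storage size per channel: 2 bytes for a half float versus 4 bytes for a single-precision float. A small sketch using Python's struct module (the helper names are hypothetical):

```python
import struct

def image_bytes(width, height, channels, fmt):
    # fmt is a struct format character: "e" = half float, "f" = float.
    return width * height * channels * struct.calcsize("<" + fmt)

def to_half_and_back(value):
    # Round-trip through 16-bit storage to see the precision that remains.
    return struct.unpack("<e", struct.pack("<e", value))[0]
```

For a 2048x1024 RGBA image, `image_bytes(2048, 1024, 4, "e")` is exactly half of `image_bytes(2048, 1024, 4, "f")`. The round-trip helper shows the trade-off: values keep roughly three significant decimal digits after conversion to half.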
|
|
|
|
|
|
The issue was caused by some numerical instability.
|
|
The sockets of the RGB to BW node were set to the wrong type after the recent node refactor.
|
|
|
|
|
|
output of the LightPath node
|
|
Unsigned int is not supported by OSL as far as I'm aware, so it should not
really matter here. However, I might be wrong, and perhaps a more proper idea
would be to set it as a regular int?
|
|
|
|
this in the future.
|
|
|
|
|
|
accordingly.
|
|
You can capture and stream video in the BGE using the DeckLink video
cards from Blackmagic Design. You need a card and Desktop Video software
version 10.4 or above to use these features in the BGE.
Many thanks to Nuno Estanquiero who tested the patch extensively
on a variety of DeckLink products; it wouldn't have been possible without
his help.
You can find a brief summary of the DeckLink features here: https://wiki.blender.org/index.php/Dev:Source/GameEngine/Decklink
The full API details and samples are in the Python API documentation.
bge.texture.VideoDeckLink(format, capture=0):
Use this object to capture a video stream. The format argument describes
the video and pixel formats, and the capture argument the card number.
This object can be used as a source for bge.texture.Texture so that the frame
is sent to the GPU, or by itself using the new refresh method to get the video
frame in a buffer.
The frames are usually not in RGB but in YUV format (8bit or 10bit); they
require a shader to extract the RGB components on the GPU. Details and sample
shaders are in the documentation.
3D video capture is supported: the frames are double height with left and right
eyes in top-bottom order. The 'eye' uniform (see setUniformEyef) can be used to
sample the 3D frame when the BGE is also in stereo mode. This allows compositing
a 3D video stream with a 3D scene and rendering it in stereo.
On Windows, if you have an NVIDIA Quadro GPU, you can benefit from an additional
performance boost by using 'GPUDirect': a method to send a video frame to the GPU
without going through the OGL driver. The 'pinned memory' OGL extension is also
supported (only on high-end AMD GPUs) with the same effect.
bge.texture.DeckLink(cardIdx=0, format=""):
Use this object to send video frames to a DeckLink card. Only the immediate mode
is supported; the scheduled mode is not implemented.
This object is similar to bge.texture.Texture: you need to attach an image source
and call refresh() to compute and send the frame to the card.
This object is best suited for video keying: a video stream (not captured) flows
through the card and the frames you send to the card are displayed above it (the
card does the compositing automatically based on the alpha channel).
At the time of this commit, 3D video keying is supported in the BGE but not in the
DeckLink card due to a color space issue.
|