Age | Commit message (Collapse) | Author |
|
Reshuffle cast intrinsics to make XOR to operate on __m128i rather
than on __m128.
Hopefully this does not affect performance.
|
|
This reverts commit d53093953f8f3b58600cb19020ecbe0b5f254b52.
|
|
The latest clang compiler (at least the one in Xcode 9.4.1) warns about the register keyword and macro expansions using defined().
Since these warnings come from third party code, we can't address them directly in Blender. Silencing them via #pramgas will
at least keep the warnings during a build down to the ones that are relevant to Blender code.
|
|
assuming sRGB
I've limited it to just the RGB<->XYZ stuff for now, correct image handling is the next step.
Reviewers: brecht, sergey
Differential Revision: https://developer.blender.org/D3478
|
|
There is one legit place in the code where memcpy was used as an
optimization trick. Was needed for older version of GCC, but now
it should be re-evaluated and checked if it still helps to have
that trick.
In other places it's somewhat lazy programming to zero out all
object members. That is absolutely unsafe, at the moment when
less trivial class is used as a member in that object things
will break.
Other cases were using memcpy into an object which comes from
an external library. We don't control that object, and we can
not guarantee it will always be safe for such memory tricks
and debugging bugs caused by such low level access is far fun.
Ideally we need to use more proper C++, but needs to be done with
big care, including benchmarks of each change, For now do
annoying but simple cast to void*.
|
|
|
|
This patch adds support for IES files, a file format that is commonly used to store the directional intensity distribution of light sources.
The new IES node is supposed to be plugged into the Strength input of the Emission node of the lamp.
Since people generating IES files do not really seem to care about the standard, the parser is flexible enough to accept all test files I have tried.
Some common weirdnesses are distributing values over multiple lines that should go into one line, using commas instead of spaces as delimiters and adding various useless stuff at the end of the file.
The user interface of the node is similar to the script node, the user can either select an internal Text or load a file.
Internally, IES files are handled similar to Image textures: They are stored in slots by the LightManager and each unique IES is assigned to one slot.
The local coordinate system of the lamp is used, so that the direction of the light can be changed. For UI reasons, it's usually best to add an area light,
rotate it and then change its type, since especially the point light does not immediately show its local coordinate system in the viewport.
Reviewers: #cycles, dingto, sergey, brecht
Reviewed By: #cycles, dingto, brecht
Subscribers: OgDEV, crazyrobinhood, secundar, cardboard, pisuke, intrah, swerner, micah_denn, harvester, gottfried, disnel, campbellbarton, duarteframos, Lapineige, brecht, juicyfruit, dingto, marek, rickyblender, bliblubli, lockal, sergey
Differential Revision: https://developer.blender.org/D1543
|
|
The GPU kernel needs to use atomics for accumulation since all offsets are processed in
parallel, but on CPUs that's not the case, so we can disable them there for a considerable speedup.
|
|
|
|
Differential Revision: https://developer.blender.org/D3109
|
|
Use C++11 threads when available, and native critical section on Windows.
Later on we can remove pthread code when C+11 becomes required.
Differential Revision: https://developer.blender.org/D3116
|
|
This is currently unused code, but causes gcc-8 to fail.
|
|
With better directory layout and more proper include
statements we can avoid several local modifications,
such as changing config.h for Windows Glog and the
ones related on pass-through statements in logging
headers in Glog.
This commit also makes unused functions not-a-warning
for external code.
|
|
|
|
|
|
This save a little memory and copying in the kernel by storing only a 4x3
matrix instead of a 4x4 matrix. We already did this in a few places, and
those don't need to be special exceptions anymore now.
|
|
This is in preparation of making Transform affine only, and also gives us
a little extra type safety so we don't accidentally treat it as a regular
4x4 matrix.
|
|
This is in preparation of making Transform affine only.
|
|
The purpose of the previous code refactoring is to make the code more readable,
but combined with this change benchmarks also render about 2-3% faster with an
NVIDIA Titan Xp.
|
|
OpenCL is C based, so no support for operators.
Related commit: 7377d411b47d50cd943cd
|
|
around the volume.
We generate a tight mesh around the active voxels of the volume in order
to effectively skip empty space, and start volume ray marching as close
to interesting volume data as possible. See code comments for details on
how the mesh generation algorithm works.
This gives up to 2x speedups in some scenes.
Reviewed by: brecht, dingto
Reviewers: #cycles
Subscribers: lvxejay, jtheninja, brecht
Differential Revision: https://developer.blender.org/D3038
|
|
|
|
This should be the last Fermi removal commit, unless I missed something.
It's been a pleasure Fermi!
|
|
Did not touch Texture related defines, that comes next.
|
|
These are used for randomization, so it's convenient if the index is already
hashed and consistent with the Object Info node.
|
|
It is basically brute force volume scattering within the mesh, but part
of the SSS code for faster performance. The main difference with actual
volume scattering is that we assume the boundaries are diffuse and that
all lighting is coming through this boundary from outside the volume.
This gives much more accurate results for thin features and low density.
Some challenges remain however:
* Significantly more noisy than BSSRDF. Adding Dwivedi sampling may help
here, but it's unclear still how much it helps in real world cases.
* Due to this being a volumetric method, geometry like eyes or mouth can
darken the skin on the outside. We may be able to reduce this effect,
or users can compensate for it by reducing the scattering radius in
such areas.
* Sharp corners are quite bright. This matches actual volume rendering
and results in some other renderers, but maybe not so much real world
objects.
Differential Revision: https://developer.blender.org/D3054
|
|
This patch changes the huge list of projects in visual studio into a nice tree matching the source folder structure. see D2823 for details.
Differential Revision: http://developer.blender.org/D2823
|
|
|
|
It was doing bit search in an opposite direction comparing to a
vectorized version.
|
|
This was disabled to avoid updating the geometry every time when the
material includes displacement, because there was no way to distinguish
between surface shader and displacement updates.
As a solution, we now compute an MD5 hash of the nodes linked to the
displacement socket, and only update the mesh if that changes.
Differential Revision: https://developer.blender.org/D3018
|
|
|
|
This was we can introduce other types of BVH, for example, wider ones, without
causing too much mess around boolean flags.
Thoughs:
- Ideally device info should probably return bitflag of what BVH types it
supports.
It is possible to implement based on simple logic in device/ and mesh.cpp,
rest of the changes will stay the same.
- Not happy with workarounds in util_debug and duplicated enum in kernel.
Maybe enbum should be stores in kernel, but then it's kind of weird to include
kernel types from utils. Soudns some cyclkic dependency.
Reviewers: brecht, maxim_d33
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D3011
|
|
Mimics to checks in system_cpu_support() checks.
|
|
Debug flags are to be controlling render behavior, nothing to do with low level
system utilities.
it was simple to hack, but logically is wrong. Lets do things where they are
supposed to be done!
|
|
Also try to move them from headers to implementation files as much as possible.
|
|
This was probably harmless besides some unnecessary memory usage due to
aligning allocations too much.
|
|
This was disabled previously due to CUDA compiler bugs, see T32900.
Differential Revision: https://developer.blender.org/D2937
|
|
In that case it can now fall back to CPU memory, at the cost of reduced
performance. For scenes that fit in GPU memory, this commit should not
cause any noticeable slowdowns.
We don't use all physical system RAM, since that can cause OS instability.
We leave at least half of system RAM or 4GB to other software, whichever
is smaller.
For image textures in host memory, performance was maybe 20-30% slower
in our tests (although this is highly hardware and scene dependent). Once
other type of data doesn't fit on the GPU, performance can be e.g. 10x
slower, and at that point it's probably better to just render on the CPU.
Differential Revision: https://developer.blender.org/D2056
|
|
Previously, the NLM kernels would be launched once per offset with one thread per pixel.
However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown.
Therefore, the kernels are now launched in a single call that handles all offsets at once.
This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory.
On the other hand, of course, the smaller tiles significantly reduce the size of the memory.
The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum.
I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere.
To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now.
Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.
|
|
Reduces render time by about 1-2% in benchmark scenes.
Differential Revision: https://developer.blender.org/D2911
|
|
Denoising of specular shaders
|
|
|
|
|
|
There was some changes about namespaces, which causes ambiguities.
Replaces using namespace with an explicit symbols we need. Is good idea to NOT
pull in the whole namespace anyway!
|
|
This matches Blender Release 2.79.
|
|
* Remove tex_* and pixels_* functions, replace by mem_*.
* Add MEM_TEXTURE and MEM_PIXELS as memory types recognized by devices.
* No longer create device_memory and call mem_* directly, always go
through device_only_memory, device_vector and device_pixels.
|
|
|
|
|
|
* Use common TextureInfo struct for all devices, except CUDA fermi.
* Move image sampling code to kernels/*/kernel_*_image.h files.
* Use arrays for data textures on Fermi too, so device_vector<Struct> works.
|
|
|