Age | Commit message (Collapse) | Author |
|
|
|
This deduplicates the calls for tile (un)mapping and allows to have a target buffer that is different from the source buffer (needed for baking and animation denoising).
|
|
|
|
This reverts commit d53093953f8f3b58600cb19020ecbe0b5f254b52.
|
|
The latest clang compiler (at least the one in Xcode 9.4.1) warns about the register keyword and macro expansions using defined().
Since these warnings come from third party code, we can't address them directly in Blender. Silencing them via #pramgas will
at least keep the warnings during a build down to the ones that are relevant to Blender code.
|
|
Work around what might be a compiler bug.
|
|
Reviewers: sergey, lukasstockner97
Reviewed By: lukasstockner97
Tags: #cycles, #bf_blender
Differential Revision: https://developer.blender.org/D3472
|
|
|
|
Hainan was missing from the list of GCN 1 cards.
|
|
|
|
|
|
This should be the last Fermi removal commit, unless I missed something.
It's been a pleasure Fermi!
|
|
device_cuda.cpp.
Fermi code in Cycles kernel and texture system are coming next.
|
|
This brings separate initialization for libcuda and libnvrtc, which
fixes Cycles nvrtc compilation not working on build machines without
CUDA hardware available.
Differential Revision: https://developer.blender.org/D3045
|
|
We should actually be using CL_DEVICE_MEM_BASE_ADDR_ALIGN for sub buffers,
previous change in this code was incorrect. Renamed the function now to
make the specific purpose of this alignment clear, it's not required for
data types in general.
|
|
This patch changes the huge list of projects in visual studio into a nice tree matching the source folder structure. see D2823 for details.
Differential Revision: http://developer.blender.org/D2823
|
|
nvcc is very picky regarding compiler versions, severely limiting the compiler we can use, this commit adds a nvrtc based compiler that'll allow us to build the cubins even if the host compiler is unsupported. for details see D2913.
Differential Revision: http://developer.blender.org/D2913
|
|
Spotted by Ha Hyung-jin, thanks!
|
|
|
|
This was we can introduce other types of BVH, for example, wider ones, without
causing too much mess around boolean flags.
Thoughs:
- Ideally device info should probably return bitflag of what BVH types it
supports.
It is possible to implement based on simple logic in device/ and mesh.cpp,
rest of the changes will stay the same.
- Not happy with workarounds in util_debug and duplicated enum in kernel.
Maybe enbum should be stores in kernel, but then it's kind of weird to include
kernel types from utils. Soudns some cyclkic dependency.
Reviewers: brecht, maxim_d33
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D3011
|
|
Mimics to checks in system_cpu_support() checks.
|
|
Debug flags are to be controlling render behavior, nothing to do with low level
system utilities.
it was simple to hack, but logically is wrong. Lets do things where they are
supposed to be done!
|
|
Also try to move them from headers to implementation files as much as possible.
|
|
This was probably harmless besides some unnecessary memory usage due to
aligning allocations too much.
|
|
This is not expected to fix all issues. Also adds some more details
to error reporting to investigate failures.
|
|
Ensure each OpenCL device has a unique ID even if the hardware ID is not
unique for some reason.
|
|
In that case it can now fall back to CPU memory, at the cost of reduced
performance. For scenes that fit in GPU memory, this commit should not
cause any noticeable slowdowns.
We don't use all physical system RAM, since that can cause OS instability.
We leave at least half of system RAM or 4GB to other software, whichever
is smaller.
For image textures in host memory, performance was maybe 20-30% slower
in our tests (although this is highly hardware and scene dependent). Once
other type of data doesn't fit on the GPU, performance can be e.g. 10x
slower, and at that point it's probably better to just render on the CPU.
Differential Revision: https://developer.blender.org/D2056
|
|
Fixes a few corner cases found while stress testing host mapped memory.
|
|
|
|
|
|
Previously, the NLM kernels would be launched once per offset with one thread per pixel.
However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown.
Therefore, the kernels are now launched in a single call that handles all offsets at once.
This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory.
On the other hand, of course, the smaller tiles significantly reduce the size of the memory.
The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum.
I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere.
To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now.
Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.
|
|
Reviewers: sergey, brecht
Differential Revision: https://developer.blender.org/D2920
|
|
|
|
To help catch cases where adding a new kernel is missed for one of the
device implementations.
|
|
|
|
Goal is to reduce OpenCL kernel recompilations.
Currently viewport renders are still set to use 64 closures as this seems to
be faster and we don't want to cause a performance regression there. Needs
to be investigated.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D2775
|
|
|
|
It should not be needed as far as I know, but just in case it fixes any
of the recent issues like T52572.
|
|
|
|
This way we can log the amount of memory used, and it will be important
for host mapped memory support.
|
|
This is a prequisite for getting host memory allocation to work. There appears
to be no support for 3D textures using host memory. The original version of
this code was written by Stefan Werner for D2056.
|
|
|
|
Some drivers may report very large allocation sizes, which could cause
unnecessary memory usage. This is now limited to 2gb which should
still be enough to get the needed performance benefits without waste.
|
|
|
|
It's unlikely the driver can do useful optimizations with this, and if
we sum multiple samples we are reading from the memory anyway.
|
|
Part due to recent changes, part old bug.
|
|
Please check compilation before committing refactor changes!
|
|
|
|
|
|
* Remove tex_* and pixels_* functions, replace by mem_*.
* Add MEM_TEXTURE and MEM_PIXELS as memory types recognized by devices.
* No longer create device_memory and call mem_* directly, always go
through device_only_memory, device_vector and device_pixels.
|