Age | Commit message (Collapse) | Author |
|
This way we can log the amount of memory used, and it will be important
for host mapped memory support.
|
|
This is a prequisite for getting host memory allocation to work. There appears
to be no support for 3D textures using host memory. The original version of
this code was written by Stefan Werner for D2056.
|
|
|
|
Some drivers may report very large allocation sizes, which could cause
unnecessary memory usage. This is now limited to 2gb which should
still be enough to get the needed performance benefits without waste.
|
|
|
|
It's unlikely the driver can do useful optimizations with this, and if
we sum multiple samples we are reading from the memory anyway.
|
|
Part due to recent changes, part old bug.
|
|
Please check compilation before committing refactor changes!
|
|
|
|
|
|
* Remove tex_* and pixels_* functions, replace by mem_*.
* Add MEM_TEXTURE and MEM_PIXELS as memory types recognized by devices.
* No longer create device_memory and call mem_* directly, always go
through device_only_memory, device_vector and device_pixels.
|
|
|
|
|
|
|
|
|
|
CPU rendering will be restricted to a BVH2, which is not ideal for raytracing
performance but can be shared with the GPU. Decoupled volume shading will be
disabled to match GPU volume sampling.
The number of CPU rendering threads is reduced to leave one core dedicated to
each GPU. Viewport rendering will also only use GPU rendering still. So along
with the BVH2 usage, perfect scaling should not be expected.
Go to User Preferences > System to enable the CPU to render alongside the GPU.
Differential Revision: https://developer.blender.org/D2873
|
|
|
|
|
|
|
|
This change affects CUDA GPUs not connected to a display or connected to a
display but supporting compute preemption so that the display does not
freeze. I couldn't find an official list, but compute preemption seems to be
only supported with GTX 1070+ and Linux (not GTX 1060- or Windows).
This helps improve small tile rendering performance further if there are
sufficient samples x number of pixels in a single tile to keep the GPU busy.
|
|
Best guess is that cuInit() somehow interferes with the AMD graphics driver
on Windows, and switching the initialization order to do OpenCL first seems
to solve the issue.
|
|
|
|
* Use common TextureInfo struct for all devices, except CUDA fermi.
* Move image sampling code to kernels/*/kernel_*_image.h files.
* Use arrays for data textures on Fermi too, so device_vector<Struct> works.
|
|
|
|
|
|
OpenCL.
The work size is still very conservative, and this doesn't help for progressive
refine. For that we will need to render multiple tiles at the same time. But this
should already help for denoising renders that require too much memory with big
tiles, and just generally soften the performance dropoff with small tiles.
Differential Revision: https://developer.blender.org/D2856
|
|
There is no significant difference in denoised benchmark scenes and
denoising ctests, so might as well make it all consistent.
|
|
A little faster on some benchmark scenes, a little slower on others, seems
about performance neutral on average and saves a little memory.
|
|
This makes sharing some code between mega/split in following commits a bit
easier, and also paves the way for rendering multiple tiles later.
|
|
Makes it possible to call a function like mem_alloc() when the context is
already active. Also fixes some missing pops in case of errors.
|
|
|
|
|
|
I don't know if this will actually work, needs testing. Ref T52064.
|
|
|
|
This is a bit confusing, especially when one mixes OpenCL code where ulong equals
to uint64_t with CPU side code where ulong is expected to be something else from
the naming.
This commit makes it so we use explicit name, common on all platforms.
|
|
Problem was that some code checks to see if device_pointer is null or
not and the new allocator wasn't even setting the pointer to anything
as it tracks memory location separately. Setting the pointer to non
null keeps all users of device_pointer happy.
|
|
- Apparently MSVC does not support compound literals
in C++ (at least by the looks of it).
- Not sure how opencl_device_assert was managing to
set protected property of the Device class.
|
|
Common folks, nobody considered master a C++11 only branch. Such decision is to
be done officially and will involve changes in quite a few infrastructure related
areas.
|
|
Image textures were being packed into a single buffer for OpenCL, which
limited the amount of memory available for images to the size of one
buffer (usually 4gb on AMD hardware). By packing textures into multiple
buffers that limit is removed, while simultaneously reducing the number
of buffers that need to be passed to each kernel.
Benchmarks were within 2%.
Fixes T51554.
Differential Revision: https://developer.blender.org/D2745
|
|
|
|
I need to use some macros defined in util_simd.h for float3/float4, to emulate
SSE4 instructions on SSE2. But due to issues with order of header includes this
was not possible, this does some refactoring to make it work.
Differential Revision: https://developer.blender.org/D2764
|
|
On Pabellon, 25.8s mega, 35.4s split before, 32.7s split after.
|
|
This is something which was reported to work fine by Mai, Benjamin and
confirmed by myself. Disabling this workaround gains us some speedup:
Before Now
bmw27 04:28.42 04:07.79
classroom 09:26.48 08:54.53
fishy_cat 08:44.01 08:18.70
koro 09:17.98 08:57.18
pavillon_barcelone 12:26.64 11:52.81
Test environment is:
- Ubuntu 16.04, with all updates installed
- AMD RX 480 GPU
- amdgpu pro driver version 17.10-450821
|
|
|
|
|
|
|
|
|
|
Some of the functions might have been inlined, but others i don't see
how that was possible (don't think virtual functions can be inlined here).
In any case, better be explicitly optimal in the code.
|
|
|
|
|