Age | Commit message (Collapse) | Author |
|
Moved all split kernel related stuff out of `Device` as it doesnt belong
there. Those functions are now apart of `DeviceSplitKernel` which now
must be implemented for each device type supporting the split kernel. No
functional changes.
|
|
Allocating the buffer is the job of the device implementation, not the split
kernel, so makes more sense to separate that code.
|
|
This is to allow devices to suggest a good global work size. Only
implemented for OpenCL devices right now.
|
|
Parallel samples never actually worked, was producing incorrect results
or crashes, and wasn't any faster than work stealing, so removing it.
|
|
This is to help debug and track memory usage for generic buffers. We
have similar for textures already since those require a name, but for
buffers the name is only for debugging proposes.
|
|
|
|
|
|
|
|
These methods shouldn't really be called from anywhere except
`DeviceSplitKernel`.
|
|
It was a bit confusing having two versions of this function, only one is
needed really. Added `device_memory::resize()` to take over the behavior
the overload provided.
|
|
The Progress system in Cycles had two limitations so far:
- It just counted tiles, but ignored their size. For example, when rendering a 600x500 image with 512x512 tiles, the right 88x500 tile would count for 50% of the progress, although it only covers 15% of the image.
- Scene update time was incorrectly counted as rendering time - therefore, the remaining time started very long and gradually decreased.
This patch fixes both problems:
First of all, the Progress now has a function to ignore time spans, and that is used to ignore scene update time.
The larger change is the tile size: Instead of counting samples per tile, so that the final value is num_samples*num_tiles, the code now counts every sample for every pixel, so that the final value is num_samples*num_pixels.
Along with that, some unused variables were removed from the Progress and Session classes.
Reviewers: brecht, sergey, #cycles
Subscribers: brecht, candreacchio, sergey
Differential Revision: https://developer.blender.org/D2214
|
|
Less likely to make mistakes later if something here needs to change.
|
|
|
|
selection
Previously, it was only possible to choose a single GPU or all of that type (CUDA or OpenCL).
Now, a toggle button is displayed for every device.
These settings are tied to the PCI Bus ID of the devices, so they're consistent across hardware addition and removal (but not when swapping/moving cards).
From the code perspective, the more important change is that now, the compute device properties are stored in the Addon preferences of the Cycles addon, instead of directly in the User Preferences.
This allows for a cleaner implementation, removing the Cycles C API functions that were called by the RNA code to specify the enum items.
Note that this change is neither backwards- nor forwards-compatible, but since it's only a User Preference no existing files are broken.
Reviewers: #cycles, brecht
Reviewed By: #cycles, brecht
Subscribers: brecht, juicyfruit, mib2berlin, Blendify
Differential Revision: https://developer.blender.org/D2338
|
|
|
|
CPU device needs to allocate and free thread specific data, these functions
are used to do that.
|
|
The new class `DeviceSplitKernel` will handle all logic for enqueueing of
kernels and memory memory management for the split kernel. Devices that
support the split kernel will create an instance of this class and call
its methods to run the split kernel.
There's still some work to do yet to make this device independent and to
deal with tile splitting.
|
|
The `enqueue_split_kernel_data_init()` function will allow each device type to
set up the various data buffers how ever they need to without concerning the
rest of the split kernel logic.
|
|
SplitKernelFunction can represent a split kernel function for any device its
been implemented for. Currently this is only for OpenCL to simplify the
enqueueing of the split kernels and move another step closer to a split
kernel that can run on any device.
|
|
Working towards using only device agnostic types and methods in the host.
|
|
Kernels can now be built without patch evaluation when not needed by the
scene (Catmull-Clark subdivision not in use), giving a performance boost
for some devices.
|
|
This adds support for CUDA Texture objects (also known as Bindless textures) for Kepler GPUs (Geforce 6xx and above).
This is used for all 2D/3D textures, data still uses arrays as before.
User benefits:
* No more limits of image textures on Kepler.
We had 5 float4 and 145 byte4 slots there before, now we have 1024 float4 and 1024 byte4.
This can be extended further if we need to (just change the define).
* Single channel textures slots (byte and float) are now supported on Kepler as well (1024 slots for each type).
ToDo / Issues:
* 3D textures don't work yet, at least don't show up during render. I have no idea whats wrong yet.
* Dynamically allocate bindless_mapping array?
I hope Fermi still works fine, but that should be tested on a Fermi card before pushing to master.
Part of my GSoC 2016.
Reviewers: sergey, #cycles, brecht
Subscribers: swerner, jtheninja, brecht, sergey
Differential Revision: https://developer.blender.org/D1999
|
|
* When Baking wasn't used we got an error.
* On top of Volume Nodes (NODES_FEATURE_VOLUME), we now also check if we need volume sampling code,
so we can disable that as well and save some further compilation time.
|
|
We don't have vectors re-allocation happening multiple times from inside
a loop anymore, so we can safely switch to a memory guarded allocator for
vectors and keep track on the memory usage at various stages of rendering.
Additionally, when building from inside Blender repository, Cycles will
use Blender's guarded allocator, so actual memory usage will be displayed
in the Space Info header.
There are couple of tricky aspects of the patch:
- TaskScheduler::exit() now explicitly frees memory used by `threads`.
This is needed because `threads` is a static member which destructor
isn't getting called on Blender's exit which caused memory leak print
to happen.
This shouldn't give any measurable speed issues, reallocation of that
vector is only one of fewzillion other allocations happening during
synchronization.
- Use regular guarded malloc (not aligned one). No idea why it was
made to be aligned in the first place. Perhaps some corner case tests
or so. Vector was never expected to be aligned anyway. Let's see if
we'll have actual bugs with this.
Reviewers: dingto, lukasstockner97, juicyfruit, brecht
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D1774
|
|
This panel is only visible when debug_value is set to 256 and has no
affect in other cases. However, if debug value is not set to this
value, environment variables will be used to control which features
are enabled, so there's no visible changes to anyone in fact.
There are some changes needed to prevent devices re-enumeration on
every Cycles session create.
Reviewers: juicyfruit, lukasstockner97, dingto, brecht
Reviewed By: lukasstockner97, dingto
Differential Revision: https://developer.blender.org/D1720
|
|
This gives few percent extra memory saving for the CUDA kernel when
using regular path tracing.
Still more like an experiment, but will be handy in the future.
|
|
This way we'll be able to disable SSS for the scene-adaptive kernel.
|
|
This way it's easier to re-use requested features logic across multiple
device implementations.
|
|
Basically just replace boolean periodic flag with extension type enum in the
device API.
|
|
Useful to have this always logged because otherwise it's needed to remove cached
kernels and check build flags to see which features are enabled.
|
|
This means render devices now might skip building baking kernels in cases when
only actual render-related functionality is used.
For now it's only implemented for OpenCL split kernel device and mainly needed
to work around some compiler-specific bugs which crashes on building the kernel.
Using OpenCL for baking might still crash the driver, but at least there is now
higher probability of that GPU will be usable to render the scene.
Real fix should actually be done in the driver side.
|
|
This features are now based on the scene settings, so scenes without those features
used are rendered even faster.
This gives about 30% speedup on the AMD A10 APU here, but at the same time it does
not mean such an improvement will happen on all the hardware. That being said, the
Tonga device here seems to have no measurable difference.
In any case it seems handy to have for the future, when we'll want to support SSS
in the kernel or to port selective compilation/split kernel to CUDA devices.
|
|
|
|
Not terribly necessary in this case, since we are just drawing a quad,
but makes blender overall more GL 3.x core ready.
|
|
This commit contains all the work related on the AMD megakernel split work
which was mainly done by Varun Sundar, George Kyriazis and Lenny Wang, plus
some help from Sergey Sharybin, Martijn Berger, Thomas Dinges and likely
someone else which we're forgetting to mention.
Currently only AMD cards are enabled for the new split kernel, but it is
possible to force split opencl kernel to be used by setting the following
environment variable: CYCLES_OPENCL_SPLIT_KERNEL_TEST=1.
Not all the features are supported yet, and that being said no motion blur,
camera blur, SSS and volumetrics for now. Also transparent shadows are
disabled on AMD device because of some compiler bug.
This kernel is also only implements regular path tracing and supporting
branched one will take a bit. Branched path tracing is exposed to the
interface still, which is a bit misleading and will be hidden there soon.
More feature will be enabled once they're ported to the split kernel and
tested.
Neither regular CPU nor CUDA has any difference, they're generating the
same exact code, which means no regressions/improvements there.
Based on the research paper:
https://research.nvidia.com/sites/default/files/publications/laine2013hpg_paper.pdf
Here's the documentation:
https://docs.google.com/document/d/1LuXW-CV-sVJkQaEGZlMJ86jZ8FmoPfecaMdR-oiWbUY/edit
Design discussion of the patch:
https://developer.blender.org/T44197
Differential Revision: https://developer.blender.org/D1200
|
|
This way device can actually make a decision of how it can optimize the kernel
in order to make it most efficient.
|
|
Previously we only had experimental flag passed to device's load_kernel() which
was all fine. But since we're gonna to have some extra parameters passed there
it makes sense to wrap them into a single struct, which will make it easier to
pass stuff around.
|
|
and continue to depend on boost though
Reviewers: dingto, sergey
Reviewed By: sergey
Subscribers: #cycles
Differential Revision: https://developer.blender.org/D1185
|
|
|
|
For CPU it gives available instructions set (SSE, AVX and so).
For GPU CUDA it reports most of the attribute values returned by
cuDeviceGetAttribute(). Ideally we need to only use set of those
which are driver-specific (so we don't clutter system info with
values which we can get from GPU specifications and be sure they
stay the same because driver can't affect on them).
|
|
This was already mixed a bit, but the dot belongs there.
|
|
Baking progress preview is not possible, in parts due to the way the API
was designed. But at least you get to see the progress bar while baking.
Reviewers: sergey
Differential Revision: https://developer.blender.org/D656
|
|
Instead of 95, we can use 145 images now. This only affects Kepler and above (sm30, sm_35 and sm_50).
This can be increased further if needed, but let's first test if this does not come with a performance impact.
Originally developed during my GSoC 2013.
|
|
Issue was caused by the wrong usage of OCIO GLSL binding API. To make it
work properly on pre-GLSL-1.3 drivers shader is to be enabled after the
texture is binded to the opengl context. Otherwise it wouldn't know the
proper texture size.
This is actually a regression in 2.70 and to be ported to 'a'.
|
|
All textures are sampled bi-linear currently with the exception of OSL there texture sampling is fixed and set to smart bi-cubic.
This patch adds user control to this setting.
Added:
- bits to DNA / RNA in the form of an enum for supporting multiple interpolations types
- changes to the image texture node drawing code ( add enum)
- to ImageManager (this needs to know to allocate second texture when interpolation type is different)
- to node compiler (pass on interpolation type)
- to device tex_alloc this also needs to get the concept of multiple interpolation types
- implementation for doing non interpolated lookup for cuda and cpu
- implementation where we pass this along to osl ( this makes OSL also do linear untill I add smartcubic to the interface / DNA/ RNA)
Reviewers: brecht, dingto
Reviewed By: brecht
CC: dingto, venomgfx
Differential Revision: https://developer.blender.org/D317
|
|
This actually works somewhat now, although viewport rendering is broken and any
kind of network error or connection failure will kill Blender.
* Experimental WITH_CYCLES_NETWORK cmake option
* Networked Device is shown as an option next to CPU and GPU Compute
* Various updates to work with the latest Cycles code
* Locks and thread safety for RPC calls and tiles
* Refactored pointer mapping code
* Fix error in CPU brand string retrieval code
This includes work by Doug Gale, Martijn Berger and Brecht Van Lommel.
Reviewers: brecht
Differential Revision: http://developer.blender.org/D36
|
|
More information in this post:
http://code.blender.org/
Thanks to all contributes for giving their permission!
|
|
avoid crashes.
|
|
because main thread works as well.
|
|
This commit adds memory usage information while rendering.
It reports memory used by device, meaning:
- For CPU it'll report real memory consumption
- For GPU rendering it'll report GPU memory consumption, but it'll
also mean the same memory is used from host side.
This information displays information about memory requested by Cycles,
not memory really allocated on a device. Real memory usage might be
higher because of memory fragmentation or optimistic memory allocator.
There's really nothing we can do against this.
Also in contrast with blender internal's render cycles memory usage
does not include memory used by scene, only memory needed by cycles
itself will be displayed. So don't freak out if memory usage reported
by cycles would be much lower than blender internal's.
This commit also adds RenderEngine.update_memory_stats callback which
is used to tell memory consumption from external engine to blender.
This information is used to generate information line after rendering
is finished.
|