Age | Commit message (Collapse) | Author |
|
* OneAPI: remove separate float3 definition
* OneAPI: disable operator[] to match other GPUs
* OneAPI: make int3 compact to match other GPUs
* Use #pragma once
* Add __KERNEL_NATIVE_VECTOR_TYPES__ to simplify checks
* Remove unused vector3
|
|
Simplifies intersection code a little and slightly improves precision regarding
self intersection.
The parametric texture coordinate in shader nodes is still the same as before
for compatibility.
|
|
|
|
Runtime switching between Embree and BVH2 got lost.
|
|
The number of Execution Units and resident "threads" (simd width * threads
per EUs) are now exposed and used to select the number of states using
a simplified heuristic.
|
|
|
|
float8 is a reserved type in Metal, but is not implemented. So rename to
float8_t for now.
Also move back intersection handlers to kernel.metal, they can't be in the
class that encapsulates the other Metal kernel functions.
|
|
The issue was introduced by rBad5e3d30a2d2 which made possible to use
unbounded elevation angle.
In order to not touch the shading code, we just remap the value to the
expected range the shading code expects. This means that elevation angles
above +/-PI/2 effectively flip the sun rotation angle.
|
|
This was tested in some places to check if code was being compiled for the
CPU, however this is only defined in the kernel. Checking __KERNEL_GPU__
always works.
|
|
This patch adds required math functions for float8 to make it possible
using float8 instead of float3 for color data.
Differential Revision: https://developer.blender.org/D15525
|
|
Having the OptiX/MetalRT/Embree/MetalRT implementations all in one file with
many #ifdefs became too confusing. Instead split it up per device, and also
move it together with device specific hit/filter/intersect functions and
associated data types.
|
|
No need anymore to have a difference between CPU/GPU, all distances
remain in world space.
|
|
All our intersections functions now work with unnormalized ray direction,
which means we no longer need to transform ray distance between world and
object space, they can all remain in world space.
There doesn't seem to be any real performance difference one way or the
other, but it does simplify the code.
|
|
This helps with debugging, and gives a slightly closer match between CPU
and CUDA/HIP/Metal renders when it comes to ray tracing precision.
|
|
-march=native is not supported for all architectures.
|
|
|
|
|
|
Similar to e9f82d3dc7eebadcc52, but for point clouds instead.
Differential Revision: https://developer.blender.org/D15487
|
|
These mutable pointers present problems with ownership in relation to
proper copy-on-write for attributes. The simplest solution is to just
remove them and retrieve the layers from `CustomData` when they are
needed. This also removes the complexity and redundancy of having to
update the pointers as the curves change. A similar change will apply
to meshes and point clouds.
One downside of this change is that it makes random access with RNA
slower. However, it's simple to just use the RNA attribute API instead,
which is unaffected. In this patch I updated Cycles to do that. With
the future attribute CoW changes, this generic approach makes sense
because Cycles can just request ownership of the existing arrays.
Differential Revision: https://developer.blender.org/D15486
|
|
After becb1530b1c81a408e20 the new curves object type isn't hidden
behind an experimental flag anymore, and other areas depend on this,
so disabling curves at compile time doesn't make sense anymore.
|
|
|
|
For transparency, volume and light intersection rays, adjust these distances
rather than the ray start position. This way we increment the start distance
by the smallest possible float increment to avoid self intersections, and be
sure it works as the distance compared to be will be exactly the same as
before, due to the ray start position and direction remaining the same.
Fix T98764, T96537, hair ray tracing precision issues.
Differential Revision: https://developer.blender.org/D15455
|
|
|
|
|
|
This was added for Metal, but also gives good results with CUDA and OptiX.
Also enable it for future Apple GPUs instead of only M1 and M2, since this has
been shown to help across multiple GPUs so the better bet seems to enable
rather than disable it.
Also moves some of the logic outside of the Metal device code, and always
enables the code in the kernel since other devices don't do dynamic compile.
Time per sample with OptiX + RTX A6000:
new old
barbershop_interior 0.0730s 0.0727s
bmw27 0.0047s 0.0053s
classroom 0.0428s 0.0464s
fishy_cat 0.0102s 0.0108s
junkshop 0.0366s 0.0395s
koro 0.0567s 0.0578s
monster 0.0206s 0.0223s
pabellon 0.0158s 0.0174s
sponza 0.0088s 0.0100s
spring 0.1267s 0.1280s
victor 0.0524s 0.0531s
wdas_cloud 0.0817s 0.0816s
Ref D15331, T87836
|
|
The Metal backend now compiles and caches a second set of kernels which are
optimized for scene contents, enabled for Apple Silicon.
The implementation supports doing this both for intersection and shading
kernels. However this is currently only enabled for intersection kernels that
are quick to compile, and already give a good speedup. Enabling this for
shading kernels would be faster still, however this also causes a long wait
times and would need a good user interface to control this.
M1 Max samples per minute (macOS 13.0):
PSO_GENERIC PSO_SPECIALIZED_INTERSECT PSO_SPECIALIZED_SHADE
barbershop_interior 83.4 89.5 93.7
bmw27 1486.1 1671.0 1825.8
classroom 175.2 196.8 206.3
fishy_cat 674.2 704.3 719.3
junkshop 205.4 212.0 257.7
koro 310.1 336.1 342.8
monster 376.7 418.6 424.1
pabellon 273.5 325.4 339.8
sponza 830.6 929.6 1142.4
victor 86.7 96.4 96.3
wdas_cloud 111.8 112.7 183.1
Code contributed by Jason Fielder, Morteza Mostajabodaveh and Michael Jones
Differential Revision: https://developer.blender.org/D14645
|
|
To be used for specialization in Metal, to automatically leave out unused nodes
from the kernel.
Ref D14645
|
|
To be used for specialization on Metal in a following commit, turning these
members into compile time constants.
Ref D14645
|
|
This is useful when using an armature as a camera rig, to avoid creating and
targetting an empty object.
Differential Revision: https://developer.blender.org/D7012
|
|
|
|
When the solve is successful, the light sample needs to be updated since the
effective shading point is now on the last refractive interface. Spread was
not taken into account, creating false caustics.
Differential Revision: https://developer.blender.org/D15449
|
|
|
|
With choices Default, Lower Memory and Faster Render. For convenience, and
to help communicate what the various settings do.
Differential Revision: https://developer.blender.org/D15446
|
|
This patch partitions the active indices into chunks prior to sorting by material in order to tradeoff some material coherence for better locality. On Apple Silicon GPUs (particularly higher end M1-family GPUs), we observe overall render time speedups of up to 15%. The partitioning is implemented by repeating the range of `shader_sort_key` for each partition, and encoding a "locator" key which distributes the indices into sorted chunks.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D15331
|
|
Also remove duplicate comments in bmesh_log.h, caused by automated
comment relocation in [0].
[0]: c4e041da23b9c45273fcd4874308c536b6a315d1
|
|
Measurements shown on average a 1.08x speedup for a 1.04x increase in
memory usage which is an acceptable trade off for a default setting,
although discoverability of such settings influencing memory usage could
be improved.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D15429
|
|
The current specific CentOS7 workaround we have for AoT, which is to
disable __FAST_MATH__ by using -fhonor-nans, now also fixes the
compilation issue for JIT as well since at least driver 23570.
|
|
Previously it was inactive but still clickable.
Ref D15316
|
|
To avoid Cycles not showing any hair by default, and to avoid very slow render
due to many overlaps with the previous 1 meter default in the node.
Fixes T97584, T99319
Differential Revision: https://developer.blender.org/D15405
|
|
with a very high min-driver version requirement, placeholder until JIT
CentOS runtime compilation issue gets fixed in a defined version.
min-driver version check can be worked around by setting
CYCLES_ONEAPI_ALL_DEVICES environment variable.
|
|
We used it only to access device id for explicitly allowing Arc GPUs.
It made the backend require ze_loader.dll which could be problematic if
we end up using direct linking.
I've replaced filtering based on PCI device id by using other HW properties
instead (EUs, threads per EU), that are now available through Level-Zero.
|
|
|
|
Initially oneAPI implementation have waited after each memory
operation, even if there was no need for this. Now, the implementation
will wait only if it is really necessary - it have improved
performance noticeble for some scenes and a bit for the rest of them.
|
|
Identical Intel GPUs ended up with the same id.
Added PCI BDF to the id to make it unique.
|
|
|
|
Ensure full render report is printed with default verbosity.
|
|
Add more math functions for float4 to make them on par with float3 ones. It
makes it possible to change the types of float3 variables to float4 without
additional work.
Differential Revision: https://developer.blender.org/D15318
|
|
|
|
|
|
This patch adds a new Cycles device with similar functionality to the
existing GPU devices. Kernel compilation and runtime interaction happen
via oneAPI DPC++ compiler and SYCL API.
This implementation is primarly focusing on Intel® Arc™ GPUs and other
future Intel GPUs. The first supported drivers are 101.1660 on Windows
and 22.10.22597 on Linux.
The necessary tools for compilation are:
- A SYCL compiler such as oneAPI DPC++ compiler or
https://github.com/intel/llvm
- Intel® oneAPI Level Zero which is used for low level device queries:
https://github.com/oneapi-src/level-zero
- To optionally generate prebuilt graphics binaries: Intel® Graphics
Compiler All are included in Linux precompiled libraries on svn:
https://svn.blender.org/svnroot/bf-blender/trunk/lib The same goes for
Windows precompiled binaries but for the graphics compiler, available
as "Intel® Graphics Offline Compiler for OpenCL™ Code" from
https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html,
for which path can be set as OCLOC_INSTALL_DIR.
Being based on the open SYCL standard, this implementation could also be
extended to run on other compatible non-Intel hardware in the future.
Reviewed By: sergey, brecht
Differential Revision: https://developer.blender.org/D15254
Co-authored-by: Nikita Sirgienko <nikita.sirgienko@intel.com>
Co-authored-by: Stefan Werner <stefan.werner@intel.com>
|