Age | Commit message (Collapse) | Author |
|
Host device is deprecated in SYCL 2020 spec, cpu device or standard C++
should be used instead.
|
|
|
|
Known Issues:
- Command buffer failures when using binary archives (binary archives is disabled for Intel GPUs as a workaround)
- Wrong texture sampler being applied (to be addressed in the future)
Ref T92212
Reviewed By: brecht
Maniphest Tasks: T92212
Differential Revision: https://developer.blender.org/D16253
|
|
sycl::build triggers compilation even if prebuilt binaries are
available, we'll have to find a better way in this case.
|
|
Since SYCL 2020 API, sycl/sycl.hpp is the way.
|
|
Test kernel will now test functionalities related to kernel execution
with USM memory allocations instead of with SYCL buffers and accessors
as these aren't currently used in the backend.
|
|
JIT compilation of oneAPI kernels now happens during load stage
and proper message gets shown in the GUI during compilation.
Also, this implementation skips kernels that aren't needed for
the used scene, reducing overall (re)compilation time.
|
|
This is a minimal set of changes, allowing a lot of cleanup that can
happen afterward as it allows sycl method and objects to be used outside
of kernel.cpp.
Reviewed By: brecht, sergey
Differential Revision: https://developer.blender.org/D15397
|
|
This patch optimises the Metal inlining policy. It gives a small speedup (2-3% on M1 Max) with no notable compilation slowdown vs what is already in master. Previously noted compilation slowdowns (as reported in T100102) were caused by forcing inlining for `ccl_device`, but we get better rendering perf by relying on compiler heuristics in these cases.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D16081
|
|
This adds path guiding features into Cycles by integrating Intel's Open Path
Guiding Library. It can be enabled in the Sampling > Path Guiding panel in the
render properties.
This feature helps reduce noise in scenes where finding a path to light is
difficult for regular path tracing.
The current implementation supports guiding directional sampling decisions on
surfaces, when the material contains a least one diffuse component, and in
volumes with isotropic and anisotropic Henyey-Greenstein phase functions.
On surfaces, the guided sampling decision is proportional to the product of
the incident radiance and the normal-oriented cosine lobe and in volumes it
is proportional to the product of the incident radiance and the phase function.
The incident radiance field of a scene is learned and updated during rendering
after each per-frame rendering iteration/progression.
At the moment, path guiding is only supported by the CPU backend. Support for
GPU backends will be added in future versions of OpenPGL.
Ref T92571
Differential Revision: https://developer.blender.org/D15286
|
|
Windows drivers 101.3430 fix an important GUI-related crash and it's
best to prevent users from running into it.
Linux drivers weren't affected but still had relevant gpu binary
compatibility fixes, so it makes sense to keep the min-supported version
aligned across OSes.
|
|
Now explicitly including math.h first before #defining funcitons.
This avoids undefined behavior and improves compatibility with
different SYCL compilers and backends.
|
|
The recent revert of Apple silicon inlining changes to avoid long compile times
worked on macOS 12, but in macOS 13 Beta it results in render errors. This may
be a compiler bug and perhaps get fixed in time, but try to be on the safe side
and ensure Blender 3.3.0 works regardless.
This brings part of the inlining back, which brings improved performance but
also longer compiler times again. Compile time is around 2min now, where the
previous full inlining was about 5-7min.
Patch by Michael Jones.
Differential Revision: https://developer.blender.org/D15897
|
|
|
|
|
|
|
|
sycl/L0 runtime reports compute-runtime version since Intel graphics
driver 101.3268 on Windows, when querying driver version from sycl.
Prior to this driver, it was 0. Now we can bump minimum requirement to
this one and filter-out devices returning 0.
Maniphest Tasks: T100648
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This gave a 1.1x speedup, however also leads to very long compile times
that make it seems like Blender has stopped working.
This can be brought back in the future behind an option that users can
explicitly enabled.
Fix T100102
Ref D14923, D14763, T92212
|
|
|
|
|
|
Now all the same ones are available on CPU and GPU, which was previously not
possible due to lack of operator overloadng in OpenCL. Print functions are
no-ops on some GPUs.
Ref D15535
|
|
|
|
|
|
|
|
Emrbee
It should consistently use the Cycles pirmitive ID for self intersection detection,
not the one from the OptiX or Embree acceleration structure.
Differential Revision: https://developer.blender.org/D15632
|
|
|
|
Intel® Arc™ GPUs
Recently, performance with oneAPI have regressed due some recent
changes in Blender itself. This commit's changes is resolving this
and also improve compilation time for oneAPI backend first
execution (or Blender compilation time in case of AoT).
Regression have appeared after 5152c7c152e and not related to the
changes itself, but increase of kernels complexity introduced with
it. Changes in this commit is marking some Blender functions as
noinlined for oneAPI backend, which helps GPU compiler to deal with
this complexity without any negative side-effects on performance.
|
|
* OneAPI: remove separate float3 definition
* OneAPI: disable operator[] to match other GPUs
* OneAPI: make int3 compact to match other GPUs
* Use #pragma once
* Add __KERNEL_NATIVE_VECTOR_TYPES__ to simplify checks
* Remove unused vector3
|
|
Simplifies intersection code a little and slightly improves precision regarding
self intersection.
The parametric texture coordinate in shader nodes is still the same as before
for compatibility.
|
|
Runtime switching between Embree and BVH2 got lost.
|
|
The number of Execution Units and resident "threads" (simd width * threads
per EUs) are now exposed and used to select the number of states using
a simplified heuristic.
|
|
|
|
float8 is a reserved type in Metal, but is not implemented. So rename to
float8_t for now.
Also move back intersection handlers to kernel.metal, they can't be in the
class that encapsulates the other Metal kernel functions.
|
|
This was tested in some places to check if code was being compiled for the
CPU, however this is only defined in the kernel. Checking __KERNEL_GPU__
always works.
|
|
Having the OptiX/MetalRT/Embree/MetalRT implementations all in one file with
many #ifdefs became too confusing. Instead split it up per device, and also
move it together with device specific hit/filter/intersect functions and
associated data types.
|
|
All our intersections functions now work with unnormalized ray direction,
which means we no longer need to transform ray distance between world and
object space, they can all remain in world space.
There doesn't seem to be any real performance difference one way or the
other, but it does simplify the code.
|
|
For transparency, volume and light intersection rays, adjust these distances
rather than the ray start position. This way we increment the start distance
by the smallest possible float increment to avoid self intersections, and be
sure it works as the distance compared to be will be exactly the same as
before, due to the ray start position and direction remaining the same.
Fix T98764, T96537, hair ray tracing precision issues.
Differential Revision: https://developer.blender.org/D15455
|
|
|
|
The Metal backend now compiles and caches a second set of kernels which are
optimized for scene contents, enabled for Apple Silicon.
The implementation supports doing this both for intersection and shading
kernels. However this is currently only enabled for intersection kernels that
are quick to compile, and already give a good speedup. Enabling this for
shading kernels would be faster still, however this also causes a long wait
times and would need a good user interface to control this.
M1 Max samples per minute (macOS 13.0):
PSO_GENERIC PSO_SPECIALIZED_INTERSECT PSO_SPECIALIZED_SHADE
barbershop_interior 83.4 89.5 93.7
bmw27 1486.1 1671.0 1825.8
classroom 175.2 196.8 206.3
fishy_cat 674.2 704.3 719.3
junkshop 205.4 212.0 257.7
koro 310.1 336.1 342.8
monster 376.7 418.6 424.1
pabellon 273.5 325.4 339.8
sponza 830.6 929.6 1142.4
victor 86.7 96.4 96.3
wdas_cloud 111.8 112.7 183.1
Code contributed by Jason Fielder, Morteza Mostajabodaveh and Michael Jones
Differential Revision: https://developer.blender.org/D14645
|
|
The current specific CentOS7 workaround we have for AoT, which is to
disable __FAST_MATH__ by using -fhonor-nans, now also fixes the
compilation issue for JIT as well since at least driver 23570.
|
|
with a very high min-driver version requirement, placeholder until JIT
CentOS runtime compilation issue gets fixed in a defined version.
min-driver version check can be worked around by setting
CYCLES_ONEAPI_ALL_DEVICES environment variable.
|
|
We used it only to access device id for explicitly allowing Arc GPUs.
It made the backend require ze_loader.dll which could be problematic if
we end up using direct linking.
I've replaced filtering based on PCI device id by using other HW properties
instead (EUs, threads per EU), that are now available through Level-Zero.
|
|
|