Age | Commit message (Collapse) | Author |
|
|
|
|
|
|
|
|
|
|
|
After barycentric convention changes, the differentials used for bump mapping
were wrong leading to artifacts with long thin triangles.
|
|
This patch generalizes the OSL support in Cycles to include GPU
device types and adds an implementation for that in the OptiX
device. There are some caveats still, including simplified texturing
due to lack of OIIO on the GPU and a few missing OSL intrinsics.
Note that this is incomplete and missing an update to the OSL
library before being enabled! The implementation is already
committed now to simplify further development.
Maniphest Tasks: T101222
Differential Revision: https://developer.blender.org/D15902
|
|
|
|
|
|
If no OPTIX_ROOT is set, nvcc fails to compile because there is a stray "-I"
in the arguments. Detect if the include path is empty and act accordingly.
Differential Revision: https://developer.blender.org/D16308
|
|
The distinction existed for legacy reasons, to easily port of Embree
intersection code without affecting the main vector types. However we are now
using SIMD for these types as well, so no good reason to keep the distinction.
Also more consistently pass these vector types by value in inline functions.
Previously it was partially changed for functions used by Metal to avoid having
to add address space qualifiers, simple to do it everywhere.
Also removes function declarations for vector math headers, serves no real
purpose.
Differential Revision: https://developer.blender.org/D16146
|
|
This patch tunes maximum threads-per-threadgroup and threads-per-block for faster renders on Apple GPUs. Appropriate tuning is selected based on the GPU architecture (M1 or M2). We see a benchmark uplift of around 5-10% on M1 family chips. Similar uplift is expected on M2 with upcoming OS changes. (Ref T101931)
Reviewed By: brecht
Maniphest Tasks: T101931
Differential Revision: https://developer.blender.org/D16299
|
|
The new Xcode declares the `sprintf()` function deprecated and
suggests to sue `snprintf()` as a safer alternative.
This change actually moves away from any formatted printing and
uses inlined byte-to-hex-string conversion which is also safe
and is (unmesurably) faster.
Differential Revision: https://developer.blender.org/D16378
|
|
|
|
|
|
|
|
This can be used for example for VR video formats that use this projection
instead of perspective projection for cubemap faces.
Differential Revision: https://developer.blender.org/D13525
|
|
|
|
CYCLES_ONEAPI_ALL_DEVICES environment variable wasn't working as
intended after 305b92e05f748a0fd9cb62b9829791d717ba2d57.
|
|
Intel documentation for Ubuntu 22.04 does list all runtime components
needed by the driver and oneAPI Cycles device but end-users getting
drivers from (other) sources can easily end-up missing required
Level-Zero Loader and struggle root causing what's wrong in their
system. Calling this requirement out in the UI will hopefull help them.
oneAPI Level-Zero incl. Loader: https://github.com/oneapi-src/level-zero
Common package names: level-zero, level-zero-loader
|
|
This fixes a 15% performance regression silently introduced by
79ab76e156d4bde937335be784cdf220294600d5 that aligned the compact
float3 on 16 bytes for oneAPI.
Current change is minimalist, there are further cleanup opportunities
such as removing packed_float3 definition for oneAPI but for some
reason, it cuts the recovered speedup in half, so we're starting with
this small fix for now.
Reviewed by: brecht
Differential Revision: https://developer.blender.org/D16340
|
|
This patch fixes T101790 by adding a macOS version check for deciding whether to show the caustics settings in the UI (MNEE kernels don't compile on macOS < 13.0)
Reviewed By: brecht
Maniphest Tasks: T101790
Differential Revision: https://developer.blender.org/D16339
|
|
|
|
This patch tunes the integrator state sizing for Metal (`num_concurrent_states` and `num_concurrent_busy_states`).
On all GPUs architecture, we adjust the busy:total states ratio to be 1:4 which gives better rendering performance than the previous 1:16 ratio (independent of total state count). This gives a small performance uplift (e.g. 2-3% on M1 Ultra).
Additionally for M2 architectures, we double the overall state size if there is available headroom. Inclusive of the first change, we can expect uplift of close to 10% in future, as this results in larger dispatch sizes and minimises work submission overheads. In order to make an accurate determination of available headroom, we defer the calculation of `num_concurrent_states` and `num_concurrent_busy_states` until the time of integrator state allocation (i.e. after all of the scene data has been allocated). We also refactor `alloc_integrator_soa` to calculate an *exact* single-state-size in a first pass, right before allocating the integrator SoA buffers in a second pass.
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D16313
|
|
This is to help ensure buildbot builds are correct, while still gracefully
disabling features in user/developer builds.
* Add WITH_STRICT_BUILD_OPTIONS to give an error when features can't be
enabled due to missing libraries or other reasons. Add new macro
set_and_warn_library_found used everywhere features were being
automatically disabled.
* Remove code from Windows and macOS for various libraries that would
automatically disable features. set_and_warn_library_found could be
used here also, but we are generally assuming the precompiled libraries
are complete and only test for availability when libraries are just
added.
Differential Revision: https://developer.blender.org/D16104
|
|
Buildbot infrastructure relies on the fact that it can enable and
disable `WITH_CYCLES_<COMPUTE>_BINARIES` without affecting speed of
incremental builds. This allows buildbot to skip GPU kernels when
doing CI regression tests which do not need GPU kernels, as well as
it allows to move GPU kernels compilation to a separate step where
all the resources are available to the GPU kernel builders.
For the oneAPI compute enabling and disabling AoT kernels has much
higher implications due to the kernels being a part of the device
implementation from the build target perspective.
This change makes it so different target names are used for JIT and
AoT configurations, which allows CMake to more fully benefit from
"caching" the compiled result.
The end goal of this change is to make it so sequential build of the
same code base on the buildbot happens super fast,
Blender binary still needs to be re-linked when the AOT of oneAPI
option is toggled, but that's already the case in the buildbot due
to the WITH_BUILDINFO.
Differential Revision: https://developer.blender.org/D16312
|
|
sycl::info::device::ext_intel_* descriptors are deprecated,
replaced with sycl::ext::intel::info::device:: that are available from
6.0+, for which we now check version in CMake.
|
|
Host device is deprecated in SYCL 2020 spec, cpu device or standard C++
should be used instead.
|
|
Patch by Xavier Hallade. Committing next to the actual libraries update
in the svn.
|
|
It was already done for (TM) which is present in some GPU names on Windows.
Names on Linux ended up using (tm) instead:
https://gitlab.freedesktop.org/mesa/mesa/-/commit/91fec2657a0dcf9bc2e6d080d84a114ecde44382
|
|
|
|
|
|
In T93382, the problem was that the Blender-side rendering code was
still generating the subsurface passes because the old render pass
flags were set, even though Cycles doesn't generate them anymore.
After a closer look, it turns out that the entire hardcoded pass
creation code can be removed. We already have an Engine API function
to query the list of render passes from the engine, so we might as
well just call that and create the returned passes.
Turns out that Eevee already did this anyways. On the Cycles side, it
allows to deduplicate a lot of `BlenderSync::sync_render_passes`.
Before, passes were defined in engine.py and in sync.cpp. Now, all
passes that engine.py returns are created automatically, so sync.cpp
only needs to handle a few special cases.
I'm not really concerned about affecting external renderer addons,
since they already needed to handle the old "builtin passes" in
their Engine API implementation anyways to make them show up in the
compositor. So, unless they missed that for like 10 releases, they
should not notice any difference.
Differential Revision: https://developer.blender.org/D16295
|
|
This will only work once we upgrade to the macOS 13 SDK.
Ref D16253
|
|
Known Issues:
- Command buffer failures when using binary archives (binary archives is disabled for Intel GPUs as a workaround)
- Wrong texture sampler being applied (to be addressed in the future)
Ref T92212
Reviewed By: brecht
Maniphest Tasks: T92212
Differential Revision: https://developer.blender.org/D16253
|
|
Previously, a first build using ninja would throw "ninja: error:
'intern/cycles/kernel/cycles_kernel_oneapi.lib', needed by
'bin/blender.exe', missing and no known rule to make it".
|
|
sycl::build triggers compilation even if prebuilt binaries are
available, we'll have to find a better way in this case.
|
|
Since SYCL 2020 API, sycl/sycl.hpp is the way.
|
|
|
|
MSVC Tools version doesn't match MSVC Redist version on some systems and
it's not populated when using Ninja outside of Visual Studio shell,
trying another way.
|
|
Remove redundant separators & redundant references to parent paths.
|
|
|
|
DDS files coming through OIIO needed a similar treatment as TGA in
T99565; just for DDS OIIO just never set the "unassociated alpha"
attribute. Fixes T101850.
Reviewed By: Brecht Van Lommel
Differential Revision: https://developer.blender.org/D16270
|
|
Thanks to Lukas for tracking down the cause in OIIO.
|
|
|
|
This was a floating point precision issue - or, to be more precise,
an issue with how Cycles split floats into the integer and fractional
parts for Perlin noise.
For coordinates below -2^24, the integer could be wrong, leading to
the fractional part being outside of 0-1 range, which breaks all sorts
of other things. 2^24 sounds like a lot, but due to how the detail
octaves work, it's not that hard to reach when combined with a large
scale.
Since this code is originally based on OSL, I checked if they changed
it in the meantime, and sure enough, there's a fix for it:
https://github.com/OpenImageIO/oiio/commit/5c9dc68391e9
So, this basically just ports over that change to Cycles.
The original code mentions being faster, but as pointed out in the
linked commit, the performance impact is actually irrelevant.
I also checked in a simple scene with eight Noise textures at
detail 15 (with >90% of render time being spent on the noise), and
the render time went from 13.06sec to 13.05sec. So, yeah, no issue.
|
|
Missing from recent commit to change the default in a6db2c2.
|
|
Test kernel will now test functionalities related to kernel execution
with USM memory allocations instead of with SYCL buffers and accessors
as these aren't currently used in the backend.
|
|
This change removes CMake code for automatic calculation of the number
of offline device compiler instances, to hand over control to developers
instead as it incurs a rather large memory usage with around 8GB per
instance at peak.
Use SYCL_OFFLINE_COMPILER_PARALLEL_JOBS CMake variable to configure it.
|
|
Some unused parameters were left after changing the oneAPI device code
to be a direclty linked shared library.
|