Age | Commit message (Collapse) | Author |
|
Add more math functions for float4 to make them on par with float3 ones. It
makes it possible to change the types of float3 variables to float4 without
additional work.
Differential Revision: https://developer.blender.org/D15318
|
|
|
|
|
|
This patch adds a new Cycles device with similar functionality to the
existing GPU devices. Kernel compilation and runtime interaction happen
via oneAPI DPC++ compiler and SYCL API.
This implementation is primarly focusing on Intel® Arc™ GPUs and other
future Intel GPUs. The first supported drivers are 101.1660 on Windows
and 22.10.22597 on Linux.
The necessary tools for compilation are:
- A SYCL compiler such as oneAPI DPC++ compiler or
https://github.com/intel/llvm
- Intel® oneAPI Level Zero which is used for low level device queries:
https://github.com/oneapi-src/level-zero
- To optionally generate prebuilt graphics binaries: Intel® Graphics
Compiler All are included in Linux precompiled libraries on svn:
https://svn.blender.org/svnroot/bf-blender/trunk/lib The same goes for
Windows precompiled binaries but for the graphics compiler, available
as "Intel® Graphics Offline Compiler for OpenCL™ Code" from
https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html,
for which path can be set as OCLOC_INSTALL_DIR.
Being based on the open SYCL standard, this implementation could also be
extended to run on other compatible non-Intel hardware in the future.
Reviewed By: sergey, brecht
Differential Revision: https://developer.blender.org/D15254
Co-authored-by: Nikita Sirgienko <nikita.sirgienko@intel.com>
Co-authored-by: Stefan Werner <stefan.werner@intel.com>
|
|
This could result in wrong skipping of SVM nodes in the graph. Now make the
logic consistent with the clamping in the OSL implementation and constant
folding.
Thanks to Christophe Hery for finding the problem and providing the fix.
|
|
Enables Vega and Vega II GPUs as well as Vega APU, using changes in HIP code
to support 64-bit waves and a new HIP SDK version.
Tested with Radeon WX9100, Radeon VII GPUs and Ryzen 7 PRO 5850U with Radeon
Graphics APU.
Ref T96740, T91571
Differential Revision: https://developer.blender.org/D15242
|
|
If there is an error we should stop rendering, instead of finishing with a
wrong render result or reporting a wrong benchmark time.
Ref T96519
Differential Revision: https://developer.blender.org/D15287
|
|
|
|
This change helps decrease Intel GPU binaries compile time by 5-10
minutes without impacting other backends.
Reviewed By: sergey, brecht
Differential Revision: http://developer.blender.org/D15273
|
|
This patch unifies the names of math functions for different data types and uses
overloading instead. The goal is to make it possible to swap out all the float3
variables containing RGB data with something else, with as few as possible
changes to the code. It's a requirement for future spectral rendering patches.
Differential Revision: https://developer.blender.org/D15276
|
|
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D15268
|
|
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D15267
|
|
This patch suffixes Apple GPU device names with `(GPU - # cores)` so that variant GPUs with the same chipset can be distinguished. Currently benchmark scores for these M1 family GPUs are being incorrectly merged:
- M1: 7 or 8 cores
- M1 Pro: 14 or 16 cores
- M1 Max: 24 or 32 cores
- M1 Ultra: 48 or 64 cores
Reviewed By: brecht, sergey
Differential Revision: https://developer.blender.org/D15257
|
|
* Rename "texture" to "data array". This has not used textures for a long time,
there are just global memory arrays now. (On old CUDA GPUs there was a cache
for textures but not global memory, so we used to put all data in textures.)
* For CUDA and HIP, put globals in KernelParams struct like other devices.
* Drop __ prefix for data array names, no possibility for naming conflict now that
these are in a struct.
|
|
And use them more consistently than before.
|
|
|
|
This patch adds a new mode of gpu capture (env var `CYCLES_DEBUG_METAL_CAPTURE_SAMPLES`) to capture a block of dispatches between "reset" calls. It also fixes member data naming inconsistencies and adds some missing OS version checks.
Screenshot showing .gputrace capture in Xcode 14.0 beta (using `CYCLES_DEBUG_METAL_CAPTURE_SAMPLES="1"` and `CYCLES_DEBUG_METAL_CAPTURE_LIMIT="10"`):
{F13155703}
Reviewed By: sergey, brecht
Differential Revision: https://developer.blender.org/D15179
|
|
|
|
|
|
This patch adds some useful debugging & profiling env vars to the Metal backend:
- `CYCLES_METAL_PROFILING`: output a per-kernel timing report at the end of the render
- `CYCLES_METAL_DEBUG`: enable per-dispatch tracing (very verbose)
- `CYCLES_DEBUG_METAL_CAPTURE_KERNEL`: enable programatic .gputrace capture for a specified kernel index
Here's an example of the timing report with `CYCLES_METAL_PROFILING` enabled:
```
---------------------------------------------------------------------------------------------------
Kernel name Total threads Dispatches Avg. T/D Time Time%
---------------------------------------------------------------------------------------------------
integrator_init_from_camera 657,407,232 161 4,083,274 0.24s 0.51%
integrator_intersect_closest 1,629,288,440 681 2,392,494 15.18s 32.12%
integrator_intersect_shadow 751,652,291 470 1,599,260 5.80s 12.28%
integrator_shade_background 304,612,074 263 1,158,220 1.16s 2.45%
integrator_shade_surface 1,159,764,041 676 1,715,627 20.57s 43.52%
integrator_shade_shadow 598,885,847 418 1,432,741 1.27s 2.69%
integrator_queued_paths_array 2,969,650,130 805 3,689,006 0.35s 0.74%
integrator_queued_shadow_paths_array 593,936,619 379 1,567,115 0.14s 0.29%
integrator_terminated_paths_array 22,205,417 155 143,260 0.05s 0.10%
integrator_sorted_paths_array 2,517,140,043 676 3,723,579 1.65s 3.50%
integrator_compact_paths_array 648,912,748 155 4,186,533 0.03s 0.07%
integrator_compact_states 20,872,687 155 134,662 0.14s 0.29%
integrator_terminated_shadow_paths_array 374,100,675 438 854,111 0.16s 0.33%
integrator_compact_shadow_paths_array 503,768,657 438 1,150,156 0.05s 0.10%
integrator_compact_shadow_states 37,664,941 202 186,460 0.23s 0.50%
integrator_reset 25,165,824 6 4,194,304 0.06s 0.12%
film_convert_combined_half_rgba 3,110,400 6 518,400 0.00s 0.01%
prefix_sum 676 676 1 0.19s 0.40%
---------------------------------------------------------------------------------------------------
6,760 47.27s 100.00%
---------------------------------------------------------------------------------------------------
```
Reviewed By: brecht
Differential Revision: https://developer.blender.org/D15044
|
|
|
|
|
|
Rendering directly to a resource using OpenGL interop and Hgi
doesn't work in Houdini, since it never uses the resulting resource
(it does not call `HdRenderBuffer::GetResource`). But since doing
that simultaneously disables mapping (`HdRenderBuffer::Map` is
not implemented then), nothing was displayed. To fix this, keep
track of whether a Hydra viewport does support displaying a Hgi
resource directly, by checking whether
`HdRenderBuffer::GetResource` is ever called and only enable use
of OpenGL interop if that is the case.
Differential Revision: https://developer.blender.org/D15090
|
|
|
|
Must include the AOV writing feature in background shader evaluation.
Differential Revision: https://developer.blender.org/D15114
|
|
The packed image loader was not aware of the fact that UDIM tiles
can be of a different size.
Exposed Python API required to access this information. It has the
same complexity as the "regular" packed files: in both cases the
ImBuf will be acquired and released to access the information.
While the current workflow of packing UDIMs is not very streamlined,
it is still possible and is something what the studio is using here.
Test file:
{F13130516}
Expected behavior achieved with this patch: a bigger checker board
pattern in viewport render
Actual behavior prior to this patch: either memory corruption, or
wrong/black render result on the plane
Differential Revision: https://developer.blender.org/D15111
|
|
Cycles and OptiX 7.4
Acceleration structures in the viewport default to building with the fast
build flag, but the intersection program used for curves was queried with
the fast trace flag. The resulting mismatch caused an exception in the
intersection kernel. Since it's difficult to predict whether dynamic or static
acceleration structures are going to be built at the time of kernel loading,
this fixes the mismatch by always using the fast trace flag for curves.
|
|
|
|
For non-raw, non-sRGB color spaces, always use half float even if that uses
more memory. Otherwise the precision loss from conversion to scene linear or
sRGB (as natively understood by the texture sampling) can be too much.
This also required a change to do alpha association ourselves instead of OIIO,
because in OIIO alpha multiplication happens before conversion to half float
and that gives too much precision loss.
Ref T68926
|
|
Differential Revision: https://developer.blender.org/D15102
|
|
It can be assumed that all scripts comply with basic pep8 formatting
regarding white-space, indentation etc.
Also remove note in best practices page & update `tests/python/pep8.py`.
If we want to exclude some scripts from make format,
this can be done by adding them to `ignore_files` in:
source/tools/utils_maintenance/autopep8_format_paths.py
Or using `# nopep8` for to ignore for individual lines.
Ref T98554
|
|
|
|
|
|
Move MNEE to own kernel, separate from shader ray-tracing. This does introduce
the limitation that a shader can't use both MNEE and AO/bevel, but that seems
like the better trade-off for now.
We can experiment with bigger kernel organization changes later.
Differential Revision: https://developer.blender.org/D15070
|
|
Contributed by luzpaz
Differential Revision: https://developer.blender.org/D15057
|
|
|
|
After recent changes to Nishita sky to clamp negative colors, the pixels ended
up a bit brighter which lead to them exceeding the half float max value. The
CUDA float to half function seems to need clamping.
|
|
|
|
Code was not yet updated to use generic attributes for vertex colors. This also
makes generic attributes work for adaptive subdivision.
|
|
Fix "parameter unused" warning that shows up when building without NanoVDB.
No functional changes.
|
|
|
|
|
|
Regenerate blackbody RGB curve fit to not clamp values, and extend down to
800K since it does now change below 965K.
Note that as before, blackbody is only defined in the range 800K to 12000K
and has a fixed value outside of that. But within that range there should
be no more unnecessary gamut clamping.
|
|
This patch makes it possible to change the precision with which to
store volume data in the NanoVDB data structure (as float, half, or
using variable bit quantization) via the previously unused precision
field in the volume data block.
It makes it possible to further reduce memory usage during
rendering, at a slight cost to the visual detail of a volume.
Differential Revision: https://developer.blender.org/D10023
|
|
|
|
Found those missing casts while looking into a crash report made in
the Blender Chat. Was unable to reproduce the crash, but the casts
should totally be there to avoid integer overflow.
|
|
|
|
|
|
Not sure why constructing a ustring inside [] is causing issues here, but
it's slightly more efficient to construct it once anyway.
|
|
|