git.blender.org/blender.git

Age	Commit message	Author
2022-10-24	Cycles: Metal integrator state size tuning	Michael Jones
	This patch tunes the integrator state sizing for Metal (`num_concurrent_states` and `num_concurrent_busy_states`). On all GPU architectures, we adjust the busy:total states ratio to 1:4, which gives better rendering performance than the previous 1:16 ratio (independent of total state count). This gives a small performance uplift (e.g. 2-3% on M1 Ultra). Additionally, for M2 architectures, we double the overall state size if there is available headroom. Inclusive of the first change, we can expect uplift of close to 10% in future, as this results in larger dispatch sizes and minimises work submission overheads. In order to make an accurate determination of available headroom, we defer the calculation of `num_concurrent_states` and `num_concurrent_busy_states` until the time of integrator state allocation (i.e. after all of the scene data has been allocated). We also refactor `alloc_integrator_soa` to calculate an exact single-state-size in a first pass, right before allocating the integrator SoA buffers in a second pass. Reviewed By: brecht Differential Revision: https://developer.blender.org/D16313
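The sizing rule described in this commit can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Metal device code: the function name, its parameters and the headroom test (the doubled state buffer must fit in the free memory) are hypothetical. Only the 1:4 busy:total ratio and the doubling on M2 come from the commit message.

```python
def tune_state_counts(base_states, state_size_bytes, free_bytes, is_m2):
    """Return (num_concurrent_states, num_concurrent_busy_states).

    Hypothetical sketch: `base_states`, `state_size_bytes` and `free_bytes`
    are assumed inputs, measured after all scene data has been allocated.
    """
    num_states = base_states
    # Assumed headroom check: double only if the doubled buffer still fits.
    if is_m2 and 2 * base_states * state_size_bytes <= free_bytes:
        num_states = 2 * base_states
    # busy:total ratio of 1:4, as stated in the commit message.
    num_busy_states = num_states // 4
    return num_states, num_busy_states


if __name__ == "__main__":
    print(tune_state_counts(1_048_576, 1024, 4 * 1024**3, is_m2=True))
```

Passing the single-state size in as an argument mirrors the two-pass approach in the commit: the exact size is computed first, and the counts are derived from it just before allocation.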
2022-10-21	Avoid re-compilation of oneAPI AoT kernels when configuration changes	Sergey Sharybin
	Buildbot infrastructure relies on being able to enable and disable `WITH_CYCLES_<COMPUTE>_BINARIES` without affecting the speed of incremental builds. This allows the buildbot to skip GPU kernels when running CI regression tests that do not need them, and to move GPU kernel compilation to a separate step where all the resources are available to the GPU kernel builders. For the oneAPI backend, enabling and disabling AoT kernels has much larger implications, because the kernels are part of the device implementation from the build target perspective. This change makes it so different target names are used for JIT and AoT configurations, which allows CMake to more fully benefit from "caching" the compiled result. The end goal of this change is to make sequential builds of the same code base on the buildbot fast. The Blender binary still needs to be re-linked when the AoT option of oneAPI is toggled, but that is already the case on the buildbot due to WITH_BUILDINFO. Differential Revision: https://developer.blender.org/D16312
2022-10-21	Cycles: oneAPI: migrate from deprecated APIs, require libSYCL 6.0+	Xavier Hallade
	sycl::info::device::ext_intel_* descriptors are deprecated, replaced with sycl::ext::intel::info::device:: that are available from 6.0+, for which we now check version in CMake.
2022-10-21	Cycles: oneAPI: remove use of SYCL host device	Xavier Hallade
	Host device is deprecated in SYCL 2020 spec, cpu device or standard C++ should be used instead.
2022-10-21	Cycles: Bump versions of DPC++, IGC, and dependencies	Sergey Sharybin
	Patch by Xavier Hallade. Committing next to the actual libraries update in the svn.
2022-10-20	Cleanup: format	Campbell Barton

2022-10-19	Fix macOS build error after recent changes to enable Intel GPUs	Brecht Van Lommel
	This will only work once we upgrade to the macOS 13 SDK. Ref D16253
2022-10-19	Cycles: Allow Intel GPUs under Metal	Morteza Mostajab
	Known Issues:
	- Command buffer failures when using binary archives (binary archives are disabled for Intel GPUs as a workaround)
	- Wrong texture sampler being applied (to be addressed in the future)
	Ref T92212 Reviewed By: brecht Maniphest Tasks: T92212 Differential Revision: https://developer.blender.org/D16253
2022-10-19	Cycles: oneAPI: include sycl/sycl.hpp instead of CL/sycl.hpp	Xavier Hallade
	Since SYCL 2020 API, sycl/sycl.hpp is the way.
2022-10-19	Cycles: oneAPI: fix check_usm for debug builds	Xavier Hallade

2022-10-13	Cleanup: Fixed some warnings	Werner, Stefan
	Some unused parameters were left after changing the oneAPI device code to be a directly linked shared library.
2022-10-12	Cycles: Enable MNEE on Metal (macOS >= 13)	Michael Jones
	This patch enables MNEE on macOS >= 13. There was an inefficiency in the calculation of spill requirements, fixed as of macOS 13. This patch also adds a temporary inlining workaround for a Metal compiler bug which causes `mnee_compute_constraint_derivatives` to behave incorrectly. Reviewed By: brecht Differential Revision: https://developer.blender.org/D16235
2022-10-10	Cycles: oneAPI: Trigger compilation of used kernels only	Nikita Sirgienko
	JIT compilation of oneAPI kernels now happens during load stage and proper message gets shown in the GUI during compilation. Also, this implementation skips kernels that aren't needed for the used scene, reducing overall (re)compilation time.
2022-10-07	Cycles: link oneAPI backend with debug version of sycl when in Debug	Xavier Hallade
	It fixes SYCL runtime issues in Debug builds that were due to mixing Release and Debug MSVC runtimes. This commit also removes specific handling of dpcpp compiler executable to simplify the CMake implementation. Using it like clang++ works and clang++ executable is also available from Intel oneAPI DPC++ compiler in case it doesn't.
2022-10-07	Cycles: use direct linking for oneAPI backend	Xavier Hallade
	This is a minimal set of changes, allowing a lot of cleanup that can happen afterward as it allows sycl method and objects to be used outside of kernel.cpp. Reviewed By: brecht, sergey Differential Revision: https://developer.blender.org/D15397
2022-09-28	Cleanup: spelling in comments	Campbell Barton
	Also add missing task ID.
2022-09-28	Cleanup: format	Campbell Barton

2022-09-27	Cycles: Add optional per-kernel performance statistics	Nikita Sirgienko
	When verbose level 4 is enabled, Blender prints kernel performance data for Cycles on GPU backends (except Metal, which doesn't use debug_enqueue_* methods) for groups of kernels. These changes introduce a new CYCLES_DEBUG_PER_KERNEL_PERFORMANCE environment variable to allow getting timings for each kernel separately, not grouped with others. This is done by adding explicit synchronization after each kernel execution. Differential Revision: https://developer.blender.org/D15971
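The difference between grouped and per-kernel timing can be sketched as below. This is a hedged illustration, not the Cycles queue code: `run_kernels`, its `synchronize` callback and the kernel names in the usage line are hypothetical stand-ins. Only the idea of synchronizing after each kernel to get separate timings comes from the commit message.

```python
import time


def run_kernels(kernels, synchronize, per_kernel=False):
    """Launch (name, launch_fn) pairs and return (timings, sync_count).

    Hypothetical sketch: with per_kernel=True we synchronize after every
    launch so each kernel gets its own timing; otherwise we synchronize
    once and report a single timing for the whole group.
    """
    timings = {}
    sync_count = 0
    start = time.perf_counter()
    for name, launch in kernels:
        launch()
        if per_kernel:
            synchronize()  # explicit sync isolates this kernel's time
            sync_count += 1
            now = time.perf_counter()
            timings[name] = timings.get(name, 0.0) + (now - start)
            start = now
    if not per_kernel and kernels:
        synchronize()  # one sync for the whole group
        sync_count += 1
        group = "+".join(name for name, _ in kernels)
        timings[group] = time.perf_counter() - start
    return timings, sync_count


if __name__ == "__main__":
    demo = [("intersect_closest", lambda: None), ("shade_surface", lambda: None)]
    print(run_kernels(demo, lambda: None, per_kernel=True))
```

The extra synchronization points serialize the work, which is presumably why the per-kernel mode is opt-in through an environment variable.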
2022-09-27	Cycles: Disable binary archives on macOS < 13.0	Michael Jones
	A bug with binary archives was fixed in macOS 13.0 which stops some spurious kernel recompilations. In older macOS versions, falling back on the system shader cache will prevent recompilations in most instances (this is the same behaviour as in Blender 3.1.x and 3.2.x). Reviewed By: brecht Differential Revision: https://developer.blender.org/D16082
2022-09-27	Cycles: add Path Guiding on CPU through Intel OpenPGL	Sebastian Herhoz
	This adds path guiding features into Cycles by integrating Intel's Open Path Guiding Library. It can be enabled in the Sampling > Path Guiding panel in the render properties. This feature helps reduce noise in scenes where finding a path to light is difficult for regular path tracing. The current implementation supports guiding directional sampling decisions on surfaces, when the material contains at least one diffuse component, and in volumes with isotropic and anisotropic Henyey-Greenstein phase functions. On surfaces, the guided sampling decision is proportional to the product of the incident radiance and the normal-oriented cosine lobe, and in volumes it is proportional to the product of the incident radiance and the phase function. The incident radiance field of a scene is learned and updated during rendering after each per-frame rendering iteration/progression. At the moment, path guiding is only supported by the CPU backend. Support for GPU backends will be added in future versions of OpenPGL. Ref T92571 Differential Revision: https://developer.blender.org/D15286
2022-09-13	Fix compilation error on Windows after recent change	Sergey Sharybin

2022-09-13	Cycles: Make OSL implementation independent from SVM	Patrick Mours
	Cleans up the file structure to be more similar to that of the SVM and also makes it possible to build kernels with OSL support, but without having to include SVM support. This patch was split from D15902. Differential Revision: https://developer.blender.org/D15949
2022-09-13	Cycles: Include reason the oneAPI library could not be loaded	Sergey Sharybin
	Additionally, stick to purely stating the error. Such messages are aimed at developers, and it is implied that oneAPI rendering will be disabled.
2022-09-06	Fix T100845: wrong Cycles OptiX runtime compilation include path	Josh Whelchel
	Causing OptiX kernel build errors on Arch Linux. Differential Revision: https://developer.blender.org/D15891
2022-09-06	Merge branch 'blender-v3.3-release'	Nikita Sirgienko

2022-09-06	Cycles: Fix crashes in oneAPI backend for scenes not fitting in dGPU memory	Nikita Sirgienko
	Differential Revision: https://developer.blender.org/D15889
2022-09-06	Cleanup: spelling in comments, formatting, move comments into headers	Campbell Barton

2022-08-29	Cycles: add option to specify OptiX runtime root directory	Brecht Van Lommel
	This allows individual users or Linux distributions to specify a directory in which Cycles will automatically look for the OptiX include folder, to compile kernels at runtime. It is still possible to override this with the OPTIX_ROOT_DIR environment variable at runtime. Based on patch by Sebastian Parborg. Ref D15792
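The lookup order described here can be sketched as follows. It is a minimal illustration under assumptions, not the Cycles implementation: the function name, the `configured_root` parameter and the `include` subfolder join are hypothetical. Only the `OPTIX_ROOT_DIR` variable and its role as a runtime override come from the commit message.

```python
from pathlib import PurePosixPath


def resolve_optix_include(configured_root, env):
    """Return the OptiX include directory to use, or None if unknown.

    Hypothetical sketch: `configured_root` stands for the directory set by
    the user or distribution, `env` for the process environment mapping.
    """
    # The environment variable takes precedence over the configured root.
    root = env.get("OPTIX_ROOT_DIR") or configured_root
    if not root:
        return None
    return str(PurePosixPath(root) / "include")


if __name__ == "__main__":
    print(resolve_optix_include("/opt/optix", {}))
    print(resolve_optix_include("/opt/optix", {"OPTIX_ROOT_DIR": "/home/user/optix"}))
```

Passing the environment in as a mapping, rather than reading `os.environ` directly, keeps the precedence rule easy to check in isolation.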
2022-08-15	Cleanup OpenGL linking and related code after libepoxy merge	Sebastian Parborg
	This cleans up the OpenGL build flags and linking. It also removes some dead code. One of these dead code paths is WITH_X11_ALPHA, which was never active even with the build flag on: the default initializer for GHOST had it off by default, and nothing called the function with a boolean value to enable it. These cleanups are needed to support true headless OpenGL rendering. Without them, libepoxy will fail to load the correct OpenGL libraries, as we have already linked them to the blender binary. Reviewed By: Brecht, Campbell, Jeroen Differential Revision: http://developer.blender.org/D15554
2022-08-15	GPU: replace GLEW with libepoxy	Christian Rauch
	With libepoxy we can choose between EGL and GLX at runtime, as well as dynamically open EGL and GLX libraries without linking to them. This will make it possible to build with Wayland, EGL, GLVND support while still running on systems that only have X11, GLX and libGL. It also paves the way for headless rendering through EGL. libepoxy is a new library dependency, and is included in the precompiled libraries. GLEW is no longer a dependency, and WITH_SYSTEM_GLEW was removed. Includes contributions by Brecht Van Lommel, Ray Molenkamp, Campbell Barton and Sergey Sharybin. Ref T76428 Differential Revision: https://developer.blender.org/D15291
2022-08-12	Cycles: Improve denoiser update performance when rendering with multiple GPUs	Patrick Mours
	This patch causes the render buffers to be copied to the denoiser device only once before denoising and output/display is then fed from that single buffer on the denoiser device. That way usually all but one copy (from all the render devices to the denoiser device) can be eliminated, provided that the denoiser device is also the display device (in which case interop is used to update the display). As such this patch also adds some logic that tries to ensure the chosen denoiser device is the same as the display device. Differential Revision: https://developer.blender.org/D15657
2022-07-27	Cycles oneAPI: simplify num_concurrent_states selection	Xavier Hallade
	The number of Execution Units and resident "threads" (simd width * threads per EUs) are now exposed and used to select the number of states using a simplified heuristic.
2022-07-25	Cleanup: remove __KERNEL_CPU__	Brecht Van Lommel
	This was tested in some places to check if code was being compiled for the CPU, however this is only defined in the kernel. Checking __KERNEL_GPU__ always works.
2022-07-15	Cleanup: compiler warning	Brecht Van Lommel

2022-07-15	Cycles: generalize shader sorting / locality heuristic to all GPU devices	Brecht Van Lommel
	This was added for Metal, but also gives good results with CUDA and OptiX. Also enable it for future Apple GPUs instead of only M1 and M2, since this has been shown to help across multiple GPUs so the better bet seems to enable rather than disable it. Also moves some of the logic outside of the Metal device code, and always enables the code in the kernel since other devices don't do dynamic compile.
	Time per sample with OptiX + RTX A6000:
	                     new      old
	barbershop_interior  0.0730s  0.0727s
	bmw27                0.0047s  0.0053s
	classroom            0.0428s  0.0464s
	fishy_cat            0.0102s  0.0108s
	junkshop             0.0366s  0.0395s
	koro                 0.0567s  0.0578s
	monster              0.0206s  0.0223s
	pabellon             0.0158s  0.0174s
	sponza               0.0088s  0.0100s
	spring               0.1267s  0.1280s
	victor               0.0524s  0.0531s
	wdas_cloud           0.0817s  0.0816s
	Ref D15331, T87836
2022-07-15	Cycles: Apple Silicon optimization to specialize intersection kernels	Michael Jones
	The Metal backend now compiles and caches a second set of kernels which are optimized for scene contents, enabled for Apple Silicon. The implementation supports doing this both for intersection and shading kernels. However, this is currently only enabled for intersection kernels, which are quick to compile and already give a good speedup. Enabling this for shading kernels would be faster still, but it also causes long wait times and would need a good user interface to control it.
	M1 Max samples per minute (macOS 13.0):
	                     PSO_GENERIC  PSO_SPECIALIZED_INTERSECT  PSO_SPECIALIZED_SHADE
	barbershop_interior  83.4         89.5                       93.7
	bmw27                1486.1       1671.0                     1825.8
	classroom            175.2        196.8                      206.3
	fishy_cat            674.2        704.3                      719.3
	junkshop             205.4        212.0                      257.7
	koro                 310.1        336.1                      342.8
	monster              376.7        418.6                      424.1
	pabellon             273.5        325.4                      339.8
	sponza               830.6        929.6                      1142.4
	victor               86.7         96.4                       96.3
	wdas_cloud           111.8        112.7                      183.1
	Code contributed by Jason Fielder, Morteza Mostajabodaveh and Michael Jones Differential Revision: https://developer.blender.org/D14645
2022-07-15	Cycles: refactor to move part of KernelData definition to template header	Brecht Van Lommel
	To be used for specialization on Metal in a following commit, turning these members into compile time constants. Ref D14645
2022-07-14	Cycles: Improve cache usage on Apple GPUs by chunking active indices	Michael Jones
	This patch partitions the active indices into chunks prior to sorting by material in order to trade off some material coherence for better locality. On Apple Silicon GPUs (particularly higher end M1-family GPUs), we observe overall render time speedups of up to 15%. The partitioning is implemented by repeating the range of `shader_sort_key` for each partition, and encoding a "locator" key which distributes the indices into sorted chunks. Reviewed By: brecht Differential Revision: https://developer.blender.org/D15331
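One way to read "repeating the range of `shader_sort_key` for each partition" is sketched below. This is an interpretation under assumptions, not the kernel code: the function names, the fixed `partition_size` and the use of the index position to pick the partition are all hypothetical. The combined key simply offsets each shader key by its partition number times the number of shaders, so sorting by it groups by material only within a chunk.

```python
def partitioned_sort_keys(shader_keys, num_shaders, partition_size):
    """Build one combined key per active index.

    Hypothetical sketch: index i falls in partition i // partition_size,
    and each partition gets its own copy of the shader key range.
    """
    return [
        (i // partition_size) * num_shaders + key
        for i, key in enumerate(shader_keys)
    ]


def sort_active_indices(shader_keys, num_shaders, partition_size):
    """Return indices sorted by material within each chunk (stable sort)."""
    keys = partitioned_sort_keys(shader_keys, num_shaders, partition_size)
    return sorted(range(len(shader_keys)), key=lambda i: keys[i])


if __name__ == "__main__":
    print(sort_active_indices([2, 0, 1, 0, 2, 1, 0, 1], num_shaders=3, partition_size=4))
```

With a partition size as large as the whole array this reduces to a plain sort by material. Smaller partitions keep neighbouring indices together at the cost of some material coherence, which is the trade-off the commit describes.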
2022-07-01	Cycles: fix support for multiple Intel GPUs	Xavier Hallade
	Identical Intel GPUs ended up with the same id. Added PCI BDF to the id to make it unique.
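A small sketch of why appending the PCI bus/device/function (BDF) address makes the id unique. The id layout, the `ONEAPI_` prefix and the function name are hypothetical, chosen for illustration only. The commit message states only that the PCI BDF was added to the id.

```python
def unique_device_id(name, bus, device, function):
    """Compose a device id from the device name and its PCI BDF address.

    Hypothetical sketch: the BDF is rendered in the common bb:dd.f
    hexadecimal form; the surrounding id format is an assumption.
    """
    bdf = f"{bus:02x}:{device:02x}.{function:x}"
    return f"ONEAPI_{name}_{bdf}"


if __name__ == "__main__":
    # Two identical cards in different slots now get distinct ids.
    print(unique_device_id("Intel Arc A770", 3, 0, 0))
    print(unique_device_id("Intel Arc A770", 4, 0, 0))
```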
2022-06-30	Cleanup: spelling in comments	Campbell Barton

2022-06-29	Cycles: Add support for rendering on Intel GPUs using oneAPI	Xavier Hallade
	This patch adds a new Cycles device with similar functionality to the existing GPU devices. Kernel compilation and runtime interaction happen via oneAPI DPC++ compiler and SYCL API. This implementation is primarily focusing on Intel® Arc™ GPUs and other future Intel GPUs. The first supported drivers are 101.1660 on Windows and 22.10.22597 on Linux. The necessary tools for compilation are:
	- A SYCL compiler such as oneAPI DPC++ compiler or https://github.com/intel/llvm
	- Intel® oneAPI Level Zero which is used for low level device queries: https://github.com/oneapi-src/level-zero
	- To optionally generate prebuilt graphics binaries: Intel® Graphics Compiler
	All are included in Linux precompiled libraries on svn: https://svn.blender.org/svnroot/bf-blender/trunk/lib The same goes for Windows precompiled binaries, except for the graphics compiler, which is available as "Intel® Graphics Offline Compiler for OpenCL™ Code" from https://www.intel.com/content/www/us/en/developer/articles/tool/oneapi-standalone-components.html and whose path can be set as OCLOC_INSTALL_DIR. Being based on the open SYCL standard, this implementation could also be extended to run on other compatible non-Intel hardware in the future. Reviewed By: sergey, brecht Differential Revision: https://developer.blender.org/D15254 Co-authored-by: Nikita Sirgienko <nikita.sirgienko@intel.com> Co-authored-by: Stefan Werner <stefan.werner@intel.com>
2022-06-28	Cycles: enable Vega GPU/APU support	Sayak Biswas
	Enables Vega and Vega II GPUs as well as Vega APU, using changes in HIP code to support 64-bit waves and a new HIP SDK version. Tested with Radeon WX9100, Radeon VII GPUs and Ryzen 7 PRO 5850U with Radeon Graphics APU. Ref T96740, T91571 Differential Revision: https://developer.blender.org/D15242
2022-06-24	Cycles: stop Metal rendering on command buffer error	Brecht Van Lommel
	If there is an error we should stop rendering, instead of finishing with a wrong render result or reporting a wrong benchmark time. Ref T96519 Differential Revision: https://developer.blender.org/D15287
2022-06-23	Cleanup: make format	Brecht Van Lommel

2022-06-23	Cycles: Add diagnostic tracing of MTLLibrary compilation time	Michael Jones
	Reviewed By: sergey Differential Revision: https://developer.blender.org/D15268
2022-06-23	Cycles: Tidy of KernelData patchup code	Michael Jones
	Reviewed By: sergey Differential Revision: https://developer.blender.org/D15267
2022-06-23	Cycles: Distinguish Apple GPUs by core count	Michael Jones
	This patch suffixes Apple GPU device names with `(GPU - # cores)` so that variant GPUs with the same chipset can be distinguished. Currently benchmark scores for these M1 family GPUs are being incorrectly merged:
	- M1: 7 or 8 cores
	- M1 Pro: 14 or 16 cores
	- M1 Max: 24 or 32 cores
	- M1 Ultra: 48 or 64 cores
	Reviewed By: brecht, sergey Differential Revision: https://developer.blender.org/D15257
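The naming scheme can be illustrated with a short sketch. The function name is hypothetical, and the exact string is inferred from the `(GPU - # cores)` pattern in the commit message rather than taken from the Metal device code.

```python
def apple_gpu_device_name(chip_name, core_count):
    """Append the GPU core count so variants of one chip stay distinct.

    Hypothetical helper; the suffix format follows the pattern quoted
    in the commit message.
    """
    return f"{chip_name} (GPU - {core_count} cores)"


if __name__ == "__main__":
    # The two M1 Max variants no longer share one name.
    print(apple_gpu_device_name("Apple M1 Max", 24))
    print(apple_gpu_device_name("Apple M1 Max", 32))
```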
2022-06-20	Cleanup: renaming and consistency for kernel data	Brecht Van Lommel
	* Rename "texture" to "data array". This has not used textures for a long time, there are just global memory arrays now. (On old CUDA GPUs there was a cache for textures but not global memory, so we used to put all data in textures.)
	* For CUDA and HIP, put globals in KernelParams struct like other devices.
	* Drop __ prefix for data array names, no possibility for naming conflict now that these are in a struct.
2022-06-17	Cleanup: add verbose logging category names instead of numbers	Brecht Van Lommel
	And use them more consistently than before.
2022-06-13	Cycles: MetalDeviceQueue - capture of multiple dispatches, and some tidying	Michael Jones
	This patch adds a new mode of gpu capture (env var `CYCLES_DEBUG_METAL_CAPTURE_SAMPLES`) to capture a block of dispatches between "reset" calls. It also fixes member data naming inconsistencies and adds some missing OS version checks. Screenshot showing .gputrace capture in Xcode 14.0 beta (using `CYCLES_DEBUG_METAL_CAPTURE_SAMPLES="1"` and `CYCLES_DEBUG_METAL_CAPTURE_LIMIT="10"`): {F13155703} Reviewed By: sergey, brecht Differential Revision: https://developer.blender.org/D15179