git.blender.org/blender.git

Age	Commit message	Author
2017-10-04	Code refactor: zero render buffers outside of kernel.	Brecht Van Lommel
	This was originally done with the first sample in the kernel for better performance, but it doesn't work anymore with atomics. Any benefit was very minor anyway, too small to measure it seems.
2017-10-04	Code refactor: remove rng_state buffer and compute hash on the fly.	Brecht Van Lommel
	A little faster on some benchmark scenes, a little slower on others, seems about performance neutral on average and saves a little memory.
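The idea of replacing a stored per-pixel `rng_state` buffer with a hash computed on the fly can be sketched as below. This is only an illustration: the commit does not say which hash is used, so `mix32`, `pixel_rng_hash`, and the `scene_seed` parameter are hypothetical names, and the mixer is a generic xorshift-multiply function chosen for the example.

```cpp
#include <cassert>
#include <cstdint>

// Generic 32-bit integer mixer built from xorshift and odd-multiply steps.
// Every step is invertible, so distinct inputs give distinct outputs.
// Illustrative choice only, not necessarily the hash used by Cycles.
inline uint32_t mix32(uint32_t v)
{
    v ^= v >> 16;
    v *= 0x7feb352dU;
    v ^= v >> 15;
    v *= 0x846ca68bU;
    v ^= v >> 16;
    return v;
}

// Hypothetical helper: derive the per-pixel RNG seed from the pixel
// coordinates whenever it is needed, instead of reading it from a buffer
// that stores one value per pixel.
inline uint32_t pixel_rng_hash(uint32_t x, uint32_t y, uint32_t image_width, uint32_t scene_seed)
{
    assert(x < image_width);
    const uint32_t pixel_index = y * image_width + x;
    return mix32(pixel_index ^ scene_seed);
}
```

The trade-off matches the commit message: a few extra integer operations per lookup in exchange for not allocating or initializing one seed per pixel.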
2017-10-04	Code refactor: add WorkTile struct for passing work to kernel.	Brecht Van Lommel
	This makes sharing some code between mega/split in following commits a bit easier, and also paves the way for rendering multiple tiles later.
2017-08-12	Cycles: optimize CPU split kernel data init.	Brecht Van Lommel

2017-08-08	Cycles: Pack kernel textures into buffers for OpenCL	Mai Lavelle
	Image textures were being packed into a single buffer for OpenCL, which limited the amount of memory available for images to the size of one buffer (usually 4 GB on AMD hardware). By packing textures into multiple buffers that limit is removed, while simultaneously reducing the number of buffers that need to be passed to each kernel. Benchmarks were within 2%. Fixes T51554. Differential Revision: https://developer.blender.org/D2745
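A rough sketch of packing textures into several size-limited buffers follows. It assumes a simple next-fit strategy (fill the current buffer, open a new one when the next texture would not fit); the commit does not describe the actual packing scheme, and `TextureSlot` and `pack_textures` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Where one texture ended up: which buffer, and the byte offset inside it.
struct TextureSlot {
    size_t buffer;
    size_t offset;
};

// Hypothetical next-fit packer: append each texture to the current buffer and
// open a new buffer when the next texture would exceed the size limit.
// Assumes every texture individually fits in max_buffer_size.
inline std::vector<TextureSlot> pack_textures(const std::vector<size_t>& sizes,
                                              size_t max_buffer_size,
                                              size_t* num_buffers)
{
    std::vector<TextureSlot> slots;
    size_t buffer = 0;
    size_t used = 0;
    for (size_t size : sizes) {
        assert(size <= max_buffer_size);
        if (used + size > max_buffer_size) {
            buffer++;
            used = 0;
        }
        slots.push_back({buffer, used});
        used += size;
    }
    *num_buffers = sizes.empty() ? 0 : buffer + 1;
    return slots;
}
```

With this layout a kernel only needs the handful of packed buffers plus a per-texture (buffer, offset) table, rather than one argument per texture.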
2017-05-16	Cycles: Fix building with native only option	Mai Lavelle
	Approach suggested by Lukas S.
2017-05-02	Cycles: Branched path tracing for the split kernel	Mai Lavelle
	This implements branched path tracing for the split kernel. General approach is to store the ray state at a branch point, trace the branched ray as normal, then restore the state as necessary before iterating to the next part of the path. A state machine is used to advance the indirect loop state, which avoids the need to add any new kernels. Each iteration the state machine recreates as much state as possible from the stored ray to keep overall storage down. It's kind of hard to keep all the different integration loops in sync, so this needs lots of testing to make sure everything is working correctly. We should probably start trying to deduplicate the integration loops more now. Nonbranched BMW is ~2% slower, while classroom is ~2% faster; other scenes could still use more testing. Reviewers: sergey, nirved Reviewed By: nirved Subscribers: Blendify, bliblubli Differential Revision: https://developer.blender.org/D2611
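The store, trace, restore cycle described above might look roughly like this in miniature. It is a sketch under assumptions, not the Cycles implementation: `PathState`, `BranchState`, `branch_begin`, and `branch_advance` are hypothetical names, and the real per-ray state is far larger.

```cpp
#include <cassert>

// Minimal stand-in for the per-ray path state (hypothetical fields).
struct PathState {
    int bounce;
    float throughput;
};

// Per-ray bookkeeping for one branch point (hypothetical layout).
struct BranchState {
    PathState saved;   // state stored at the branch point
    int next_branch;   // index of the next branched ray to trace
    int num_branches;  // how many branched rays to trace from this point
    bool active;       // true while branched rays remain
};

// Store the state at the branch point before tracing any branched rays.
inline void branch_begin(BranchState& b, const PathState& at_branch, int num_branches)
{
    assert(num_branches >= 0);
    b.saved = at_branch;
    b.next_branch = 0;
    b.num_branches = num_branches;
    b.active = num_branches > 0;
}

// One state machine step: restore the stored state, then either set up the
// next branched ray (returns true) or finish so the main path can resume
// (returns false).
inline bool branch_advance(BranchState& b, PathState& current)
{
    current = b.saved;
    if (b.active && b.next_branch < b.num_branches) {
        b.next_branch++;
        return true;
    }
    b.active = false;
    return false;
}
```

Because the step function is called from the existing loop and only flips per-ray bookkeeping, a scheme like this needs no extra kernel, which is the property the commit message highlights.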
2017-03-29	Cycles: Make all #include statements relative to cycles source directory	Sergey Sharybin
	The idea is to make include statements more explicit and obvious where the file is coming from, additionally reducing chance of wrong header being picked up. For example, it was not obvious whether bvh.h was referring to builder or traversal, whether node.h is a generic graph node or a shader node and cases like that. Surely this might look obvious for the active developers, but after some time of not touching the code it becomes less obvious where file is coming from. This was briefly mentioned in T50824 and seems @brecht is fine with such explicitness, but need to agree with all active developers before committing this. Please note that this patch is lacking changes related to GPU/OpenCL support. This will be solved if/when we all agree this is a good idea to move forward. Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner Reviewed By: lukasstockner97, maiself, nirved, dingto Subscribers: brecht Differential Revision: https://developer.blender.org/D2586
2017-03-13	Cycles: Cleanup, wipe obviously outdated parts of split kernel comments	Sergey Sharybin

2017-03-08	Cycles: Calculate size of split state buffer kernel side	Mai Lavelle
	By calculating the size of the state buffer in the kernel rather than the host less code is needed and the size actually reflects the requested features. Will also be a little faster in some cases because of larger global work size.
2017-03-08	Cycles: Initialize rng_state for split kernel	Mai Lavelle
	Because the split kernel can render multiple samples in parallel it is necessary to have everything initialized before rendering of any samples begins. The code that normally handles initialization of `rng_state` (`kernel_path_trace_setup()`) only does so for the first sample, which was causing artifacts in the split kernel due to uninitialized `rng_state` for some samples. Note that because the split kernel can render samples in parallel this means that the split kernel is incompatible with the LCG.
2017-03-08	Cycles: Remove sum_all_radiance kernel	Mai Lavelle
	This was only needed for the previous implementation of parallel samples. As we don't have that any more it can be removed. The real reason for removal, though, is this: `per_sample_output_buffers` was being calculated too small and artifacts resulted. The tile buffer is already the correct size and calculating the size for `per_sample_output_buffers` is a bit difficult with the current layout of the code. As `per_sample_output_buffers` was only needed for `sum_all_radiance`, removing that kernel and writing output to the tile buffer directly fixes the artifacts.
2017-03-08	Cycles: Split path initialization into own kernel	Mai Lavelle
	This makes it easier to initialize things correctly in the data_init kernel before they are needed by path tracing.
2017-03-08	Cycles: CPU implementation of split kernel	Mai Lavelle

2017-03-08	Cycles: Remove ccl_fetch and SOA	Mai Lavelle

2017-03-08	Cycles: OpenCL split kernel refactor	Mai Lavelle
	This does a few things at once:
	- Refactors host side split kernel logic into a new device agnostic class `DeviceSplitKernel`.
	- Removes tile splitting; a new work pool implementation takes its place and allows as many threads as will fit in memory regardless of tile size, which can give performance gains.
	- Refactors split state buffers into one buffer, and reduces the number of arguments passed to kernels. This means there's less code to deal with overall.
	- Moves kernel logic out of OpenCL kernel files so they can later be used by other device types.
	- Replaces OpenCL specific APIs with new generic versions.
	- Tiles can now be seen updating during rendering.
2016-09-19	Cycles: Cleanup code style in split kernel	Sergey Sharybin

2016-05-23	Cycles CUDA: reduce stack memory by reusing ShaderData.	Brecht Van Lommel
	57% less for path and 48% less for branched path.
2016-01-30	Cycles: Cleanup of OpenCL split kernel routines	Sergey Sharybin
	The idea is to switch from allocating separate buffers for shader data's structure of arrays to allocating one huge memory block and doing some index trickery to make it accessed as SOA. This saves a reasonable number of lines of code in device_opencl and also makes it possible to get rid of special declaration of ShaderData structure. As a side effect it also makes it easier to experiment with SOA vs. AOS for split kernel. Works fine here on NVidia GTX580, Intel CPU and AMD Fiji cards. Reviewers: #cycles, brecht, juicyfruit, dingto Differential Revision: https://developer.blender.org/D1593
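The "one huge memory block plus index trickery" layout can be sketched as follows. This is an assumed layout for illustration: each field's array is placed back to back inside one allocation, and element `index` of a field sits at `field_offset + index * field_size`. The names `SoaBlock`, `soa_allocate`, `soa_at`, `soa_store`, and `soa_load` are hypothetical, and alignment padding is ignored (values are copied with `memcpy`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// One contiguous allocation holding several fields, each stored as its own
// array of num_elements entries (structure of arrays).
struct SoaBlock {
    std::vector<unsigned char> memory;
    std::vector<size_t> field_offset;  // byte offset where each field's array starts
    std::vector<size_t> field_size;    // bytes per element of each field
    size_t num_elements;
};

// Lay the per-field arrays out back to back in a single block.
inline SoaBlock soa_allocate(const std::vector<size_t>& field_sizes, size_t num_elements)
{
    SoaBlock block;
    block.num_elements = num_elements;
    block.field_size = field_sizes;
    size_t total = 0;
    for (size_t size : field_sizes) {
        block.field_offset.push_back(total);
        total += size * num_elements;
    }
    block.memory.resize(total);
    return block;
}

// Address of element `index` of field `field` inside the block.
inline unsigned char* soa_at(SoaBlock& block, size_t field, size_t index)
{
    assert(field < block.field_offset.size() && index < block.num_elements);
    return block.memory.data() + block.field_offset[field] + index * block.field_size[field];
}

// Copy a trivially copyable value into / out of the block.
template<typename T> void soa_store(SoaBlock& block, size_t field, size_t index, const T& value)
{
    assert(sizeof(T) == block.field_size[field]);
    std::memcpy(soa_at(block, field, index), &value, sizeof(T));
}

template<typename T> T soa_load(SoaBlock& block, size_t field, size_t index)
{
    assert(sizeof(T) == block.field_size[field]);
    T value;
    std::memcpy(&value, soa_at(block, field, index), sizeof(T));
    return value;
}
```

Switching between SOA and AOS in a scheme like this only means changing the address computation in `soa_at`, which is presumably why the commit notes that experimenting with the two layouts becomes easier.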
2016-01-28	Cycles: Remove few function arguments needed only for the split kernel	Sergey Sharybin
	Use KernelGlobals to access all the global arrays for the intermediate storage instead of passing all this storage explicitly. Tested here with Intel OpenCL, NVIDIA GTX580 and AMD Fiji, didn't see any artifacts, so guess it's all good. Reviewers: juicyfruit, dingto, lukasstockner97 Differential Revision: https://developer.blender.org/D1736
2016-01-07	Cycles: Refactor how we pass bounce info to light path node.	Thomas Dinges
	This commit changes how we pass bounce information to the Light Path node. Instead of manually copying the bounces into ShaderData, we now directly pass PathState. This reduces the arguments that we need to pass around and also makes it easier to extend the feature. This commit also exposes the Transmission Bounce Depth to the Light Path node. It works similarly to the Transparent Depth Output: Replace a Transmission lightpath after X bounces with another shader, e.g. a Diffuse one. This can be used to avoid black surfaces caused by a low maximum number of bounces. Reviewed by Sergey and Brecht, thanks for some help with this. I tested compilation and usage on CPU (SVM and OSL), CUDA, OpenCL Split and Mega kernel. Hopefully this covers all devices. :)
2015-10-29	Cycles: OpenCL split kernel cleanup, move casts from .h files to .cl files	Sergey Sharybin
	Ideally we shouldn't use char* at all, but for now we have to, so at least let's assume common .h files are free from pointer magic.
2015-08-23	Cleanup: spelling/style	Campbell Barton

2015-07-03	Cycles: Code cleanup in split kernel, whitespaces	Sergey Sharybin

2015-05-27	Cycles: Code cleanup, split kernel	Sergey Sharybin

2015-05-26	Fix T44833: Can't use ccl_local space in non-kernel functions	Sergey Sharybin
	This commit re-shuffles code in the split kernel once again so that the common parts in the headers are only responsible for doing all the work needed for the specified ray index. Getting the ray index, checking its validity and enqueuing tasks now happen in the device-specific part of the kernel. This actually makes sense because enqueuing is indeed device-specific; e.g. with CUDA we'll want to enqueue kernels from the kernel and avoid a CPU roundtrip.
	TODO:
	- Kernel comments are still placed in the common header files, but since queue related stuff is not passed to those functions those comments might need to be split as well. For now, read them keeping in mind that they also cover how all devices invoke the common code path.
	- Arguments might need to be wrapped into KernelGlobals, so we don't need to pass them all around as function arguments.
2015-05-25	Fix T44833, OpenCL compile error on AMD.	Thomas Dinges
	This was broken after the kernel file restructure. Variables allocated in the __local address space can only be defined inside a __kernel function. We probably need to solve this a bit differently once we do the CUDA kernel split, but this fix should be good enough until then.
2015-05-22	Cycles: Restructure kernel files organization	Sergey Sharybin
	Since the kernel split work we now have quite a few new files, the majority of which are related to the kernel entry points. Keeping those files in the root kernel folder will eventually make it really hard to follow which files are the actual implementation of the Cycles kernel. Those files are now moved to kernel/kernels/<device_type>. This way adding extra entry points will be less noisy. It is also nice to have all device-specific files grouped together. Another change is in how the split kernel invokes logic. Previously all the logic was implemented directly in the .cl files, which makes it a bit tricky to re-use the logic across other devices. Since we'll likely be looking into doing the same split work for CUDA devices eventually, it makes sense to move logic from .cl files to header files. Those files are stored in kernel/split. This does not mean the header files will not give error messages when included from other devices, and their arguments will likely be changed, but having such separation is a good start anyway. There should be no functional changes. Reviewers: juicyfruit, dingto Differential Revision: https://developer.blender.org/D1314