Welcome to mirror list, hosted at ThFree Co, Russian Federation.

git.blender.org/blender.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2020-03-18Cycles: support for different 3D transform per volume gridBrecht Van Lommel
This is not yet fully supported by automatic volume bounds but works fine in most cases that will have mostly matching bounds. Ref T73201
2020-03-11Cleanup: stop encoding image data type in slot indexBrecht Van Lommel
This is legacy code from when we had a fixed number of textures.
2020-03-05Adaptive Sampling for Cycles.Stefan Werner
This feature takes some inspiration from "RenderMan: An Advanced Path Tracing Architecture for Movie Rendering" and "A Hierarchical Automatic Stopping Condition for Monte Carlo Global Illumination" The basic principle is as follows: While samples are being added to a pixel, the adaptive sampler writes half of the samples to a separate buffer. This gives it two separate estimates of the same pixel, and by comparing their difference it estimates convergence. Once convergence drops below a given threshold, the pixel is considered done. When a pixel has not converged yet and needs more samples than the minimum, its immediate neighbors are also set to take more samples. This is done in order to more reliably detect sharp features such as caustics. A 3x3 box filter that is run periodically over the tile buffer is used for that purpose. After a tile has finished rendering, the values of all passes are scaled as if they were rendered with the full number of samples. This way, any code operating on these buffers, for example the denoiser, does not need to be changed for per-pixel sample counts. Reviewed By: brecht, #cycles Differential Revision: https://developer.blender.org/D4686
2019-04-17ClangFormat: apply to source, most of internCampbell Barton
Apply clang format as proposed in T53211. For details on usage and instructions for migrating branches without conflicts, see: https://wiki.blender.org/wiki/Tools/ClangFormat
2019-03-08Cycles OpenCL: Remove single programJeroen Bakker
Part of the cleanup of the OpenCL codebase. Single program is not effective when using OpenCL, it is slower to compile and slower during rendering (when used in for example `barbershop` or `victor`). Reviewers: brecht, #cycles Maniphest Tasks: T62267 Differential Revision: https://developer.blender.org/D4481
2019-02-26T61576: Do Not (Re-)Compile OpenCL kernelsJeroen Bakker
The goal of this patch is to have limit the number of times kernels needs to be compiled and are reused as kernels with different compile directives can lead to identical same binaries. The implementation does this by stripping the compile directives. and reshuffling kernels so the output is more likely to be the same. We focussed on the kernels where it was easy to detect and maintain (bundle, bake, displace, do_volume and background). More optimizations could be done but they are probably less obvious. Merged the data_init and state_buffer_size kernels to split_bundle. This patch will also remove empty kernels for do_volume and bake when their features are not enabled. When using the benchmark files there are less background, bake and do_volume kernels compiled. Fix: T61576, T61501, T61466 Reviewed By: brecht, #cycles Differential Revision: https://developer.blender.org/D4390
2019-02-21Fix: Missing closing brackets in includeJeroen Bakker
2019-02-21Fix: OpenCL Displacement and light samplingJeroen Bakker
The bake kernels are also used during mesh displacement and light importance sampling. We disabled the implementation of these kernels when baking was not enabled.
2019-02-20Cycles OpenCL: Remove OpenCL MegaKernelJeroen Bakker
Using OpenCL MegaKernel has been slow and therefore not usefull. This patch will remove the mega kernel from the OpenCL codebase and the OpenCLDeviceBase class. T61736: removal of mega kernel T61703: baking does not work with mega kernel Tags: #cycles Differential Revision: https://developer.blender.org/D4383
2019-02-19T61463: Separate Baking kernelsJeroen Bakker
Cycles OpenCL: Split baking kernels in own program Fix T61463. Before this patch baking was part of the base kernels. There are 3 baking kernels that and all 3 uses shader evaluation. Only for one of these kernels the functionality was wrapped in the __NO_BAKING__ compile directive. When you start baking this leads to long compile times. By separating in individual programs will reduce the compile times. Also wrapped all baking kernels with __NO_BAKING__ to reduce the compilation times. Impact on compilation time job | scene_name | previous | new | percentage --------+-----------------+----------+-------+------------ T61463 | empty | 10.63 | 7.27 | 32% T61463 | bmw | 17.91 | 14.24 | 20% T61463 | fishycat | 19.57 | 15.08 | 23% T61463 | barbershop | 54.10 | 48.18 | 11% T61463 | classroom | 17.55 | 14.42 | 18% T61463 | koro | 18.92 | 17.15 | 9% T61463 | pavillion | 17.43 | 14.23 | 18% T61463 | splash279 | 16.48 | 15.33 | 7% T61463 | volume_emission | 36.22 | 34.19 | 6% Impact on render time job | scene_name | previous | new | percentage --------+-----------------+----------+---------+------------ T61463 | empty | 21.06 | 20.54 | 2% T61463 | bmw | 198.44 | 189.59 | 4% T61463 | fishycat | 394.20 | 388.50 | 1% T61463 | barbershop | 1188.16 | 1185.49 | 0% T61463 | classroom | 341.08 | 339.27 | 1% T61463 | koro | 472.43 | 360.70 | 24% T61463 | pavillion | 905.77 | 902.14 | 0% T61463 | splash279 | 55.26 | 54.92 | 1% T61463 | volume_emission | 62.59 | 39.09 | 38% I don't have a grounded explanation why koro and volume_emission is this much faster; I have done several tests though... Maniphest Tasks: T61463 Differential Revision: https://developer.blender.org/D4376
2019-02-15Cycles: Support multithreaded compilation of kernelsBrecht Van Lommel
This patch implements a workaround to get the multithreaded compilation from D2231 working. So far, it only works for Blender, not for Cycles Standalone. Also, I have only tested the Linux codepath in the helper function. Depends on D2231. Patch by lukasstockner97, jbakker, brecht job | scene_name | compilation_time ----------+-----------------+------------------ Baseline | empty | 22.73 D2264 | empty | 13.94 Baseline | bmw | 56.44 D2264 | bmw | 41.32 Baseline | fishycat | 59.50 D2264 | fishycat | 45.19 Baseline | barbershop | 212.28 D2264 | barbershop | 169.81 Baseline | victor | 67.51 D2264 | victor | 53.60 Baseline | classroom | 51.46 D2264 | classroom | 39.02 Baseline | koro | 62.48 D2264 | koro | 49.03 Baseline | pavillion | 54.37 D2264 | pavillion | 38.82 Baseline | splash279 | 47.43 D2264 | splash279 | 37.94 Baseline | volume_emission | 145.22 D2264 | volume_emission | 121.10 This patch reduced compilation time as the split kernels and base kernels are compiled in parallel. In cycles debug mode (256) you can set unmark the opencl single program file, what reduces the compilation time even further (bmw 17 seconds, barbershop 53 seconds). Reviewers: brecht, dingto, sergey, juicyfruit, lukasstockner97 Reviewed By: brecht Subscribers: Loner, jbakker, candreacchio, 3dLuver, LazyDodo, bliblubli Differential Revision: https://developer.blender.org/D2264
2019-02-06Cycles: animation denoising support in the kernel.Lukas Stockner
This is the internal implementation, not available from the API or interface yet. The algorithm takes into account past and future frames, both to get more coherent animation and reduce noise. Ref D3889.
2019-02-06Cycles: prefilter feature passes separate from denoising.Lukas Stockner
Prefiltering of feature passes will happen during rendering, which can then be used for denoising immediately or written as a render pass for later (animation) denoising. The number of denoising data passes written is reduced because of this, leaving out the feature variance passes. The passes are now Normal, Albedo, Depth, Shadowing, Variance and Intensity. Ref D3889.
2018-11-09Cycles: Cleanup, space after (void)Sergey Sharybin
It was used in like 95% of places.
2018-10-28Cycles: Added Cryptomatte output.Stefan Werner
This allows for extra output passes that encode automatic object and material masks for the entire scene. It is an implementation of the Cryptomatte standard as introduced by Psyop. A good future extension would be to add a manifest to the export and to do plenty of testing to ensure that it is fully compatible with other renderers and compositing programs that use Cryptomatte. Internally, it adds the ability for Cycles to have several passes of the same type that are distinguished by their name. Differential Revision: https://developer.blender.org/D3538
2018-10-08Cycles: Use existing shared temporary memory in reconstruction step of the ↵Lukas Stockner
denoiser Previously the code allocated its own temporary memory, but it's possible to just use the existing shared one instead.
2018-07-06Cycles: Enabled half precision textures for OpenCL devices that support the ↵Stefan Werner
cl_khr_fp16 extension.
2018-07-06Cleanup: strip trailing space for cyclesCampbell Barton
2018-07-05Cycles: Adding native support for UINT16 textures.Stefan Werner
Textures in 16 bit integer format are sometimes used for displacement, bump and normal maps and can be exported by tools like Substance Painter. Without this patch, Cycles would promote those textures to single precision floating point, causing them to take up twice as much memory as needed. Reviewers: #cycles, brecht, sergey Reviewed By: #cycles, brecht, sergey Subscribers: sergey, dingto, #cycles Tags: #cycles Differential Revision: https://developer.blender.org/D3523
2018-07-04Cycles Denoising: Pass tile buffers to every OpenCL kernel to conform to ↵Lukas Stockner
standard and get rid of set_tile_info
2018-07-04Cycles Denoising: Cleanup: Rename tiles to tile_infoLukas Stockner
2018-06-14Cycles: Query XYZ to/from Scene Linear conversion from OCIO instead of ↵Lukas Stockner
assuming sRGB I've limited it to just the RGB<->XYZ stuff for now, correct image handling is the next step. Reviewers: brecht, sergey Differential Revision: https://developer.blender.org/D3478
2017-11-30Cycles: Improve denoising speed on GPUs with small tile sizesLukas Stockner
Previously, the NLM kernels would be launched once per offset with one thread per pixel. However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown. Therefore, the kernels are now launched in a single call that handles all offsets at once. This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory. On the other hand, of course, the smaller tiles significantly reduce the size of the memory. The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum. I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere. To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now. Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.
2017-10-15Fix OpenCL performance regression after cubic interpolation.Brecht Van Lommel
Reorganize code to reduce register pressure.
2017-10-08Cycles: OpenCL bicubic and tricubic texture interpolation support.Brecht Van Lommel
2017-10-07Cycles: CUDA bicubic and tricubic texture interpolation support.Brecht Van Lommel
While cubic interpolation is quite expensive on the CPU compared to linear interpolation, the difference on the GPU is quite small.
2017-10-07Code refactor: make texture code more consistent between devices.Brecht Van Lommel
* Use common TextureInfo struct for all devices, except CUDA fermi. * Move image sampling code to kernels/*/kernel_*_image.h files. * Use arrays for data textures on Fermi too, so device_vector<Struct> works.
2017-10-05Code refactor: split displace/background into separate kernels, remove luma.Brecht Van Lommel
2017-10-04Code refactor: use split variance calculation for mega kernels too.Brecht Van Lommel
There is no significant difference in denoised benchmark scenes and denoising ctests, so might as well make it all consistent.
2017-10-04Code refactor: remove rng_state buffer and compute hash on the fly.Brecht Van Lommel
A little faster on some benchmark scenes, a little slower on others, seems about performance neutral on average and saves a little memory.
2017-10-04Code refactor: add WorkTile struct for passing work to kernel.Brecht Van Lommel
This makes sharing some code between mega/split in following commits a bit easier, and also paves the way for rendering multiple tiles later.
2017-09-25Cycles: Cleanup, indentationSergey Sharybin
2017-08-09Cycles: Remove ulong usageSergey Sharybin
This is a bit confusing, especially when one mixes OpenCL code where ulong equals to uint64_t with CPU side code where ulong is expected to be something else from the naming. This commit makes it so we use explicit name, common on all platforms.
2017-08-08Cycles: Pack kernel textures into buffers for OpenCLMai Lavelle
Image textures were being packed into a single buffer for OpenCL, which limited the amount of memory available for images to the size of one buffer (usually 4gb on AMD hardware). By packing textures into multiple buffers that limit is removed, while simultaneously reducing the number of buffers that need to be passed to each kernel. Benchmarks were within 2%. Fixes T51554. Differential Revision: https://developer.blender.org/D2745
2017-08-02Cycles: Support "precompiled" headers in include expansion algorithmSergey Sharybin
The idea here is that it is possible to mark certain include statements as "precompiled" which means all subsequent includes of that file will be replaced with an empty string. This is a way to deal with tricky include pattern happening in single program OpenCL split kernel which was including bunch of headers about 10 times. This brings preprocessing time from ~1sec to ~0.1sec on my laptop.
2017-06-10Cycles: Pass all buffers to each kernel call for OpenCLMai Lavelle
Technically not passing all buffers used by a kernel is undefined behavior. We haven't had any issues with this so far on AMD or Nvidia, but it's known to be a problem with Intel and we received a report from AMD that this is a problem on newer hardware, so we need to make this change at some point. Unfortunately there a cost to being correct, about 5% for the benchmark scenes. For low sample counts it's even worse, I've seen up to 50% slowdown. For the latter case I think adjusting tile updating logic can help, but not sure what that would look like yet (it would be just a few lines change however).
2017-06-10Cycles: Add kernel to enqueue inactive raysMai Lavelle
The queue will be used to make reuse of inactive threads to keep the GPU more busy.
2017-06-09Cycles Denoising: Merge outlier heuristic and confidence interval testLukas Stockner
The previous outlier heuristic only checked whether the pixel is more than twice as bright compared to the 75% quantile of the 5x5 neighborhood. While this detected fireflies robustly, it also incorrectly marked a lot of legitimate small highlights as outliers and filtered them away. This commit adds an additional condition for marking a pixel as a firefly: In addition to being above the reference brightness, the lower end of the 3-sigma confidence interval has to be below it. Since the lower end approximates how low the true value of the pixel might be, this test separates pixels that are supposed to be very bright from pixels that are very bright due to random fireflies. Also, since there is now a reliable outlier filter as a preprocessing step, the additional confidence interval test in the reconstruction kernel is no longer needed.
2017-05-19Cycles: Cleanup, variable namesSergey Sharybin
Don't use camel case for variable names. Leave that for the structures.
2017-05-19Cycles: Cleanup, braces after function definitionSergey Sharybin
I wouldn't mind switching fully to Google style, but i am against of mixing two different styles in same project. So just stick to brace at the new line after function definition.
2017-05-19\0;115;0cCycles: Cleanup, use ccl_restrict instead of ccl_restrict_ptrSergey Sharybin
There were following issues with ccl_restrict_ptr: - We already had ccl_restrict for all platforms. - It was secretly adding `const` qualifier to the declaration, which is quite weird since non-const pointer can also be declared as restricted. - We never in Blender are using foo_ptr or FooPtr type definitions, so not sure why we should introduce such a thing here. - It is absolutely wrong from semantic point of view to put pointer into the restrict macro -- const is a part of type, not part of hint for compiler that some pointer is never aliased.
2017-05-18Cycles Denoising: Add more robust outlier heuristic to avoid artifactsLukas Stockner
Extremely bright pixels in the rendered image cause the denoising algorithm to produce extremely noticable artifacts. Therefore, a heuristic is needed to exclude these pixels from the filtering process. The new approach calculates the 75% percentile of the 5x5 neighborhood of each pixel and flags the pixel if it is more than twice as bright. During the reconstruction process, flagged pixels are skipped. Therefore, they don't cause any problems for neighboring pixels, and the outlier pixels themselves are replaced by a prediction of their actual value based on their feature pass values and the neighboring pixels. Therefore, the denoiser now also works as a smarter despeckling filter that uses a more accurate prediction of the pixel instead of a simple average. This can be used even if denoising isn't wanted by setting the denoising radius to 1.
2017-05-07Cycles: Implement denoising option for reducing noise in the rendered imageLukas Stockner
This commit contains the first part of the new Cycles denoising option, which filters the resulting image using information gathered during rendering to get rid of noise while preserving visual features as well as possible. To use the option, enable it in the render layer options. The default settings fit a wide range of scenes, but the user can tweak individual settings to control the tradeoff between a noise-free image, image details, and calculation time. Note that the denoiser may still change in the future and that some features are not implemented yet. The most important missing feature is animation denoising, which uses information from multiple frames at once to produce a flicker-free and smoother result. These features will be added in the future. Finally, thanks to all the people who supported this project: - Google (through the GSoC) and Theory Studios for sponsoring the development - The authors of the papers I used for implementing the denoiser (more details on them will be included in the technical docs) - The other Cycles devs for feedback on the code, especially Sergey for mentoring the GSoC project and Brecht for the code review! - And of course the users who helped with testing, reported bugs and things that could and/or should work better!
2017-05-03Cycles: Split kernel - sort shadersHristo Gueorguiev
Reduce thread divergence in kernel_shader_eval. Rays are sorted in blocks of 2048 according to shader->id. On R9 290 Classroom is ~30% faster, and Pabellon Barcelone is ~8% faster. No sorting for CUDA split kernel. Reviewers: sergey, maiself Reviewed By: maiself Differential Revision: https://developer.blender.org/D2598
2017-05-02Cycles: Branched path tracing for the split kernelMai Lavelle
This implements branched path tracing for the split kernel. General approach is to store the ray state at a branch point, trace the branched ray as normal, then restore the state as necessary before iterating to the next part of the path. A state machine is used to advance the indirect loop state, which avoids the need to add any new kernels. Each iteration the state machine recreates as much state as possible from the stored ray to keep overall storage down. Its kind of hard to keep all the different integration loops in sync, so this needs lots of testing to make sure everything is working correctly. We should probably start trying to deduplicate the integration loops more now. Nonbranched BMW is ~2% slower, while classroom is ~2% faster, other scenes could use more testing still. Reviewers: sergey, nirved Reviewed By: nirved Subscribers: Blendify, bliblubli Differential Revision: https://developer.blender.org/D2611
2017-03-29Cycles: Make all #include statements relative to cycles source directorySergey Sharybin
The idea is to make include statements more explicit and obvious where the file is coming from, additionally reducing chance of wrong header being picked up. For example, it was not obvious whether bvh.h was refferring to builder or traversal, whenter node.h is a generic graph node or a shader node and cases like that. Surely this might look obvious for the active developers, but after some time of not touching the code it becomes less obvious where file is coming from. This was briefly mentioned in T50824 and seems @brecht is fine with such explicitness, but need to agree with all active developers before committing this. Please note that this patch is lacking changes related on GPU/OpenCL support. This will be solved if/when we all agree this is a good idea to move forward. Reviewers: brecht, lukasstockner97, maiself, nirved, dingto, juicyfruit, swerner Reviewed By: lukasstockner97, maiself, nirved, dingto Subscribers: brecht Differential Revision: https://developer.blender.org/D2586
2017-03-16Cycles: Define ccl_local variables in kernel functionsSergey Sharybin
Declaring ccl_local in a device function is not supported by certain compilers.
2017-03-16Cycles: Workaround for compilation error caused by passing KernelGlobalsSergey Sharybin
Pass globals as a bare pointer, same as it sued to be prior to split kernel rework. AMD CPU platform and Intel OpenCL were complaining about this. Perhaps we shouldn't pass globals as pointer at all, this isn't something what is really portable and can cause issues on 32 bit perhaps.
2017-03-11Fix T50888: Numeric overflow in split kernel state buffer size calculationMai Lavelle
Overflow led to the state buffer being too small and the split kernel to get stuck doing nothing forever.
2017-03-09Cycles: split kernel_shadow_blocked to AO & DL partsHristo Gueorguiev
Reduces memory allocation for split kernel. This allows for faster rendering due to bigger global size, specially when GPU memory is limited. Perfromance results: R9 290 total render time Before After Change BMW 4:37 4:34 -1.1 % Classroom 14:43 14:30 -1.5 % Fishy Cat 11:20 11:04 -2.4 % Koro 12:11 12:04 -1.0 % Pabellon Barcelona 22:01 20:44 -5.8 % Pabellon Barcelona(*) 15:32 15:09 -2.5 % (*) without glossy connected to volume