git.blender.org/blender.git

Age	Commit message	Author
2018-11-23	Cycles: Improved robustness of hair motion blur. [motion_curve_fix]	Stefan Werner
	In some instances, the number of control vertices of a hair could change mid-frame. Cycles would then be unable to calculate proper motion blur for those hairs. This adds interpolated CVs to fill in for the missing data. While this will not necessarily result in a fully accurate reconstruction of the guide hair, it preserves motion blur instead of disabling it. Reviewers: #cycles, sergey Reviewed By: #cycles, sergey Subscribers: sergey, brecht, #cycles Tags: #cycles Differential Revision: https://developer.blender.org/D3695
2018-08-25	Cycles Denoiser: Allocate a single temporary buffer for the entire denoising process	Lukas Stockner
	With small tiles, the repeated allocations on GPUs can actually slow down the denoising quite a lot. Allocating the buffer just once reduces render time for the default cube with 16x16 tiles and denoising on a mobile 1050 from 22.7sec to 14.0sec.
2018-07-05	Cycles: Adding native support for UINT16 textures.	Stefan Werner
	Textures in 16 bit integer format are sometimes used for displacement, bump and normal maps and can be exported by tools like Substance Painter. Without this patch, Cycles would promote those textures to single precision floating point, causing them to take up twice as much memory as needed. Reviewers: #cycles, brecht, sergey Reviewed By: #cycles, brecht, sergey Subscribers: sergey, dingto, #cycles Tags: #cycles Differential Revision: https://developer.blender.org/D3523
2018-07-04	Cycles Denoising: Pass tile buffers to every OpenCL kernel to conform to standard and get rid of set_tile_info	Lukas Stockner

2018-07-04	Cycles Denoising: Cleanup: Rename tiles to tile_info	Lukas Stockner

2018-07-04	Cycles Denoising: Refactor denoiser tile handling	Lukas Stockner
	This deduplicates the calls for tile (un)mapping and makes it possible to have a target buffer that is different from the source buffer (needed for baking and animation denoising).
2018-07-04	Cycles Denoising: Split main function into logical steps	Lukas Stockner

2018-06-21	Fix Cycles CUDA render errors with CUDA 9.2.	Brecht Van Lommel
	Work around what might be a compiler bug.
2018-06-12	Fix T55448: Typo in Cycles CUDA debug output	Lukas Stockner
	Reviewers: sergey, lukasstockner97 Reviewed By: lukasstockner97 Tags: #cycles, #bf_blender Differential Revision: https://developer.blender.org/D3472
2018-04-29	Cycles: Cleanup: Remove double semicolons	Lukas Stockner

2018-02-18	Cycles: tweak CUDA messages and avoid build errors with existing sm_2x configs.	Brecht Van Lommel

2018-02-18	Code cleanup: remove some more unused code after recent CUDA changes.	Brecht Van Lommel

2018-02-18	Cycles: Remove Fermi texture code.	Thomas Dinges
	This should be the last Fermi removal commit, unless I missed something. It's been a pleasure Fermi!
2018-02-17	Cycles: Remove Fermi support from CMake and update runtime checks in device_cuda.cpp.	Thomas Dinges
	Fermi code in Cycles kernel and texture system are coming next.
2018-02-07	Update CUEW to latest version	Brecht Van Lommel
	This brings separate initialization for libcuda and libnvrtc, which fixes Cycles nvrtc compilation not working on build machines without CUDA hardware available. Differential Revision: https://developer.blender.org/D3045
2018-02-03	cycles: Add an nvrtc based cubin cli compiler.	Ray Molenkamp
	nvcc is very picky regarding compiler versions, severely limiting the compilers we can use. This commit adds an nvrtc based compiler that'll allow us to build the cubins even if the host compiler is unsupported. For details see D2913. Differential Revision: http://developer.blender.org/D2913
2018-01-22	Cycles: Replace use_qbvh boolean flag with an enum-based property	Sergey Sharybin
	This way we can introduce other types of BVH, for example, wider ones, without causing too much mess around boolean flags. Thoughts: - Ideally device info should probably return a bitflag of what BVH types it supports. It is possible to implement based on simple logic in device/ and mesh.cpp, rest of the changes will stay the same. - Not happy with workarounds in util_debug and duplicated enum in kernel. Maybe the enum should be stored in kernel, but then it's kind of weird to include kernel types from utils. Sounds like some cyclic dependency. Reviewers: brecht, maxim_d33 Reviewed By: brecht Differential Revision: https://developer.blender.org/D3011
2018-01-11	Fix issue with moving CUDA memory to host and multiple devices.	Brecht Van Lommel
	This is not expected to fix all issues. Also adds some more details to error reporting to investigate failures.
2018-01-03	Cycles: CUDA support for rendering scenes that don't fit on GPU.	Brecht Van Lommel
	In that case it can now fall back to CPU memory, at the cost of reduced performance. For scenes that fit in GPU memory, this commit should not cause any noticeable slowdowns. We don't use all physical system RAM, since that can cause OS instability. We leave at least half of system RAM or 4GB to other software, whichever is smaller. For image textures in host memory, performance was maybe 20-30% slower in our tests (although this is highly hardware and scene dependent). Once other types of data don't fit on the GPU, performance can be e.g. 10x slower, and at that point it's probably better to just render on the CPU. Differential Revision: https://developer.blender.org/D2056
2018-01-03	Cycles: make CUDA code a bit more robust to host/device alloc failures.	Brecht Van Lommel
	Fixes a few corner cases found while stress testing host mapped memory.
2017-12-20	Cycles: Cleanup, indentation	Sergey Sharybin

2017-11-30	Cycles: Improve denoising speed on GPUs with small tile sizes	Lukas Stockner
	Previously, the NLM kernels would be launched once per offset with one thread per pixel. However, with the smaller tile sizes that are now feasible, there wasn't enough work to fully occupy GPUs which results in a significant slowdown. Therefore, the kernels are now launched in a single call that handles all offsets at once. This has two downsides: Memory accesses to accumulating buffers are now atomic, and more importantly, the temporary memory now has to be allocated for every shift at once, increasing the required memory. On the other hand, of course, the smaller tiles significantly reduce the size of the memory. The main bottleneck right now is the construction of the transformation - there is nothing to be parallelized there, one thread per pixel is the maximum. I tried to parallelize the SVD implementation by storing the matrix in shared memory and launching one block per pixel, but that wasn't really going anywhere. To make the new code somewhat readable, the handling of rectangular regions was cleaned up a bit and commented, it should be easier to understand what's going on now. Also, some variables have been renamed to make the difference between buffer width and stride more apparent, in addition to some general style cleanup.
2017-11-17	Cycles: Add per-tile render time debug pass	Lukas Stockner
	Reviewers: sergey, brecht Differential Revision: https://developer.blender.org/D2920
2017-11-12	Fix T53289: CUDA missing textures not showing pink, after recent changes.	Brecht Van Lommel

2017-11-09	Cycles: avoid reallocating tile denoising memory many times during render.	Brecht Van Lommel

2017-11-09	Cycles: Replace __MAX_CLOSURE__ build option with runtime integrator variable	Mai Lavelle
	Goal is to reduce OpenCL kernel recompilations. Currently viewport renders are still set to use 64 closures as this seems to be faster and we don't want to cause a performance regression there. Needs to be investigated. Reviewed By: brecht Differential Revision: https://developer.blender.org/D2775
2017-11-08	Cycles: add an extra CUDA synchronize before rendering.	Brecht Van Lommel
	It should not be needed as far as I know, but just in case it fixes any of the recent issues like T52572.
2017-11-05	Code refactor: device memory cleanups, preparing for mapped host memory.	Brecht Van Lommel

2017-11-05	Cycles: reserve CUDA local memory ahead of time.	Brecht Van Lommel
	This way we can log the amount of memory used, and it will be important for host mapped memory support.
2017-11-04	Code refactor: replace CUDA array with linear memory for 1D and 2D textures.	Brecht Van Lommel
	This is a prerequisite for getting host memory allocation to work. There appears to be no support for 3D textures using host memory. The original version of this code was written by Stefan Werner for D2056.
2017-11-03	Fix T53247: mixed CPU + GPU render wrong texture limits.	Brecht Van Lommel

2017-10-24	Code refactor: move more memory allocation logic into device API.	Brecht Van Lommel
	* Remove tex_* and pixels_* functions, replace by mem_. Add MEM_TEXTURE and MEM_PIXELS as memory types recognized by devices. * No longer create device_memory and call mem_* directly, always go through device_only_memory, device_vector and device_pixels.
2017-10-24	Code refactor: use device_only_memory and device_vector in more places.	Brecht Van Lommel

2017-10-24	Code refactor: store device/interp/extension/type in each device_memory.	Brecht Van Lommel

2017-10-21	Code refactor: avoid some unnecessary device memory copying.	Brecht Van Lommel

2017-10-19	Cycles: Add extra logging in CUDA device detection code	Sergey Sharybin

2017-10-08	Cycles: schedule more work for non-display and compute preemption CUDA cards.	Brecht Van Lommel
	This change affects CUDA GPUs not connected to a display or connected to a display but supporting compute preemption so that the display does not freeze. I couldn't find an official list, but compute preemption seems to be only supported with GTX 1070+ and Linux (not GTX 1060- or Windows). This helps improve small tile rendering performance further if there are sufficient samples x number of pixels in a single tile to keep the GPU busy.
2017-10-08	Code refactor: use DeviceInfo to enable QBVH and decoupled volume shading.	Brecht Van Lommel

2017-10-07	Code refactor: make texture code more consistent between devices.	Brecht Van Lommel
	* Use common TextureInfo struct for all devices, except CUDA Fermi. * Move image sampling code to kernels/*/kernel_*_image.h files. * Use arrays for data textures on Fermi too, so device_vector<Struct> works.
2017-10-05	Code refactor: split displace/background into separate kernels, remove luma.	Brecht Van Lommel

2017-10-05	Fix incorrect CUDA remaining time estimate after previous commit.	Brecht Van Lommel

2017-10-04	Cycles: CUDA faster rendering of small tiles, using multiple samples like OpenCL.	Brecht Van Lommel
	The work size is still very conservative, and this doesn't help for progressive refine. For that we will need to render multiple tiles at the same time. But this should already help for denoising renders that require too much memory with big tiles, and just generally soften the performance dropoff with small tiles. Differential Revision: https://developer.blender.org/D2856
2017-10-04	Code refactor: use split variance calculation for mega kernels too.	Brecht Van Lommel
	There is no significant difference in denoised benchmark scenes and denoising ctests, so might as well make it all consistent.
2017-10-04	Code refactor: remove rng_state buffer and compute hash on the fly.	Brecht Van Lommel
	A little faster on some benchmark scenes, a little slower on others, seems about performance neutral on average and saves a little memory.
2017-10-04	Code refactor: add WorkTile struct for passing work to kernel.	Brecht Van Lommel
	This makes sharing some code between mega/split in following commits a bit easier, and also paves the way for rendering multiple tiles later.
2017-09-27	Code refactor: simplify CUDA context push/pop.	Brecht Van Lommel
	Makes it possible to call a function like mem_alloc() when the context is already active. Also fixes some missing pops in case of errors.
2017-08-21	Cycles: attempt to recover from crashing CUDA/OpenCL drivers on Windows.	Brecht Van Lommel
	I don't know if this will actually work, needs testing. Ref T52064.
2017-08-08	Cycles: Pack kernel textures into buffers for OpenCL	Mai Lavelle
	Image textures were being packed into a single buffer for OpenCL, which limited the amount of memory available for images to the size of one buffer (usually 4gb on AMD hardware). By packing textures into multiple buffers that limit is removed, while simultaneously reducing the number of buffers that need to be passed to each kernel. Benchmarks were within 2%. Fixes T51554. Differential Revision: https://developer.blender.org/D2745
2017-08-05	Cycles: CUDA split performance tweaks, still far from megakernel.	Brecht Van Lommel
	On Pabellon, 25.8s mega, 35.4s split before, 32.7s split after.
2017-07-05	Cycles: Pass string by const reference rather than by value	Sergey Sharybin
	Some of the functions might have been inlined, but for others I don't see how that was possible (don't think virtual functions can be inlined here). In any case, better be explicitly optimal in the code.