git.blender.org/blender.git

Age	Commit message	Author
2017-11-05	Cycles: reserve CUDA local memory ahead of time.	Brecht Van Lommel
	This way we can log the amount of memory used, and it will be important for host mapped memory support.
2017-11-04	Code refactor: replace CUDA array with linear memory for 1D and 2D textures.	Brecht Van Lommel
	This is a prerequisite for getting host memory allocation to work. There appears to be no support for 3D textures using host memory. The original version of this code was written by Stefan Werner for D2056.
2017-11-03	Fix T53247: mixed CPU + GPU render wrong texture limits.	Brecht Van Lommel

2017-11-02	Cycles: Add another limit to OpenCL memory usage	Mai Lavelle
	Some drivers may report very large allocation sizes, which could cause unnecessary memory usage. This is now limited to 2 GB, which should still be enough to get the needed performance benefits without waste.
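The limit described above can be sketched as a simple clamp. This is a minimal illustration, not the actual Cycles code: the helper name `clamp_max_alloc_size` is hypothetical, and reading "2gb" as 2 GiB is an assumption.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not the actual Cycles function): cap the maximum
// allocation size reported by an OpenCL driver.
// Assumption: the "2gb" limit from the commit message is taken as 2 GiB.
inline uint64_t clamp_max_alloc_size(uint64_t reported_bytes)
{
	const uint64_t limit_bytes = uint64_t(2) * 1024 * 1024 * 1024;
	return std::min(reported_bytes, limit_bytes);
}
```

A driver that reports 16 GiB would be treated as offering 2 GiB, while smaller reported sizes pass through unchanged.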
2017-10-25	Fix one more assert being triggered due to recent changes.	Brecht Van Lommel

2017-10-25	Code refactor: remove MEM_WRITE_ONLY, always use MEM_READ_WRITE.	Brecht Van Lommel
	It's unlikely the driver can do useful optimizations with this, and if we sum multiple samples we are reading from the memory anyway.
2017-10-24	Fix T53146: incomplete multi GPU and CPU + GPU memory statistics.	Brecht Van Lommel
	Part due to recent changes, part old bug.
2017-10-24	Cycles: Fix compilation in debug mode	Sergey Sharybin
	Please check compilation before committing refactor changes!
2017-10-24	Cycles: Fix compilation error without C++11	Sergey Sharybin

2017-10-24	Fix T53134: denoising with CPU + GPU render leaves some tiles noisy.	Brecht Van Lommel

2017-10-24	Code refactor: move more memory allocation logic into device API.	Brecht Van Lommel
	* Remove tex_* and pixels_* functions, replace by mem_*. Add MEM_TEXTURE and MEM_PIXELS as memory types recognized by devices.
	* No longer create device_memory and call mem_* directly, always go through device_only_memory, device_vector and device_pixels.
2017-10-24	Code refactor: use device_only_memory and device_vector in more places.	Brecht Van Lommel

2017-10-24	Code refactor: store device/interp/extension/type in each device_memory.	Brecht Van Lommel

2017-10-24	Code refactor: pass device to scene, check OSL with device info.	Brecht Van Lommel

2017-10-21	Code refactor: avoid some unnecessary device memory copying.	Brecht Van Lommel

2017-10-21	Cycles: combined CPU + GPU rendering support.	Brecht Van Lommel
	CPU rendering will be restricted to a BVH2, which is not ideal for raytracing performance but can be shared with the GPU. Decoupled volume shading will be disabled to match GPU volume sampling. The number of CPU rendering threads is reduced to leave one core dedicated to each GPU. Viewport rendering will still use only the GPU. So along with the BVH2 usage, perfect scaling should not be expected. Go to User Preferences > System to enable the CPU to render alongside the GPU. Differential Revision: https://developer.blender.org/D2873
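The thread reduction mentioned above amounts to simple arithmetic. The sketch below uses a hypothetical function name, and the behaviour when there are at least as many GPUs as cores (returning 0) is an assumption, not something the commit message states.

```cpp
#include <algorithm>

// Hypothetical helper: number of CPU render threads left after reserving
// one core per GPU. Assumption: the result is clamped at zero when there
// are at least as many GPUs as CPU threads.
inline int cpu_render_threads(int num_cpu_threads, int num_gpus)
{
	return std::max(num_cpu_threads - num_gpus, 0);
}
```

With 8 CPU threads and 2 GPUs, this leaves 6 threads for CPU rendering.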
2017-10-19	Cycles: Add extra logging in CUDA device detection code	Sergey Sharybin

2017-10-18	Fix T53098, T53079: OpenCL world texture errors after recent changes.	Brecht Van Lommel

2017-10-11	Cycles: Fix possible race condition when initializing devices list	Sergey Sharybin

2017-10-08	Cycles: schedule more work for non-display and compute preemption CUDA cards.	Brecht Van Lommel
	This change affects CUDA GPUs not connected to a display or connected to a display but supporting compute preemption so that the display does not freeze. I couldn't find an official list, but compute preemption seems to be only supported with GTX 1070+ and Linux (not GTX 1060- or Windows). This helps improve small tile rendering performance further if there are sufficient samples x number of pixels in a single tile to keep the GPU busy.
2017-10-08	Fix T53017: Cycles not detecting AMD GPU when there is an NVidia GPU too.	Mathieu Menuet
	Best guess is that cuInit() somehow interferes with the AMD graphics driver on Windows, and switching the initialization order to do OpenCL first seems to solve the issue.
2017-10-08	Code refactor: use DeviceInfo to enable QBVH and decoupled volume shading.	Brecht Van Lommel

2017-10-07	Code refactor: make texture code more consistent between devices.	Brecht Van Lommel
	* Use common TextureInfo struct for all devices, except CUDA fermi.
	* Move image sampling code to kernels//kernel__image.h files.
	* Use arrays for data textures on Fermi too, so device_vector<Struct> works.
2017-10-05	Code refactor: split displace/background into separate kernels, remove luma.	Brecht Van Lommel

2017-10-05	Fix incorrect CUDA remaining time estimate after previous commit.	Brecht Van Lommel

2017-10-04	Cycles: CUDA faster rendering of small tiles, using multiple samples like OpenCL.	Brecht Van Lommel
	The work size is still very conservative, and this doesn't help for progressive refine. For that we will need to render multiple tiles at the same time. But this should already help for denoising renders that require too much memory with big tiles, and just generally soften the performance dropoff with small tiles. Differential Revision: https://developer.blender.org/D2856
2017-10-04	Code refactor: use split variance calculation for mega kernels too.	Brecht Van Lommel
	There is no significant difference in denoised benchmark scenes and denoising ctests, so might as well make it all consistent.
2017-10-04	Code refactor: remove rng_state buffer and compute hash on the fly.	Brecht Van Lommel
	A little faster on some benchmark scenes, a little slower on others, seems about performance neutral on average and saves a little memory.
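The idea of replacing a stored per-pixel RNG state with a hash computed on the fly can be sketched as follows. The mixing function here is the well-known 32-bit MurmurHash3 finalizer, used purely as a stand-in; the actual hash used by Cycles, and the names `mix32` and `pixel_rng_hash`, are assumptions for illustration.

```cpp
#include <cstdint>

// Stand-in integer mixer (MurmurHash3 32-bit finalizer). Every step is
// invertible, so distinct inputs always give distinct outputs.
inline uint32_t mix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

// Hypothetical helper: derive a per-pixel RNG seed from the pixel position
// instead of reading it from a separate rng_state buffer.
inline uint32_t pixel_rng_hash(uint32_t x, uint32_t y, uint32_t stride, uint32_t seed)
{
	return mix32((y * stride + x) ^ seed);
}
```

Because the value depends only on the pixel position and the seed, it can be recomputed in any kernel without allocating or copying a buffer, which matches the memory saving the commit message mentions.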
2017-10-04	Code refactor: add WorkTile struct for passing work to kernel.	Brecht Van Lommel
	This makes sharing some code between mega/split in following commits a bit easier, and also paves the way for rendering multiple tiles later.
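A struct of this kind might look roughly like the sketch below. The field names and the index layout are assumptions made for illustration and are not taken from the Cycles source.

```cpp
#include <cstdint>

// Hypothetical work description passed to a kernel: a rectangle of pixels
// plus a range of samples to render for each pixel.
struct WorkTile {
	uint32_t x, y, w, h;
	uint32_t start_sample;
	uint32_t num_samples;
};

// One unit of work: a pixel and the sample to render for it.
struct WorkItem {
	uint32_t x, y, sample;
};

// Map a flat work index to a pixel and sample. Assumed layout: all pixels
// of the first sample come first, then all pixels of the next sample.
inline WorkItem get_work_item(const WorkTile& tile, uint32_t index)
{
	const uint32_t num_pixels = tile.w * tile.h;
	const uint32_t pixel_index = index % num_pixels;
	const uint32_t sample_offset = index / num_pixels;

	WorkItem item;
	item.x = tile.x + pixel_index % tile.w;
	item.y = tile.y + pixel_index / tile.w;
	item.sample = tile.start_sample + sample_offset;
	return item;
}
```

Bundling the tile rectangle and sample range in one struct lets mega and split kernels share the same index-to-pixel mapping.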
2017-09-27	Code refactor: simplify CUDA context push/pop.	Brecht Van Lommel
	Makes it possible to call a function like mem_alloc() when the context is already active. Also fixes some missing pops in case of errors.
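One common way to get this behaviour is a scope guard whose destructor always pops the context. The sketch below is not the Cycles implementation: the push and pop functions are stand-ins that only count nesting depth (the real driver calls would be cuCtxPushCurrent() and cuCtxPopCurrent()), and all names are hypothetical.

```cpp
// Stand-ins for the CUDA driver calls; they only track nesting depth so
// the sketch runs without a GPU.
static int g_context_depth = 0;
inline void fake_ctx_push() { ++g_context_depth; }
inline void fake_ctx_pop() { --g_context_depth; }

// Hypothetical scope guard: pushes on construction, pops on destruction,
// so every exit path (including early returns on error) pops exactly once.
class ScopedContext {
public:
	ScopedContext() { fake_ctx_push(); }
	~ScopedContext() { fake_ctx_pop(); }

private:
	// Non-copyable, written without C++11 features.
	ScopedContext(const ScopedContext&);
	ScopedContext& operator=(const ScopedContext&);
};

// Hypothetical allocation function that may be called whether or not the
// caller already holds a ScopedContext.
inline bool fake_mem_alloc(bool simulate_failure)
{
	ScopedContext scope;
	if(simulate_failure) {
		return false;  // early return, the destructor still pops
	}
	return true;
}
```

Nesting is safe because each guard pops only what it pushed, which is what allows a function like mem_alloc() to be called while a context is already active.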
2017-08-30	Cycles: Fix build with networking enabled	Mai Lavelle

2017-08-25	Cycles: Correct logging of used CPU intrinsics	Sergey Sharybin

2017-08-21	Cycles: attempt to recover from crashing CUDA/OpenCL drivers on Windows.	Brecht Van Lommel
	I don't know if this will actually work, needs testing. Ref T52064.
2017-08-12	Code cleanup: fix warning and improve terminology.	Brecht Van Lommel

2017-08-09	Cycles: Remove ulong usage	Sergey Sharybin
	This is a bit confusing, especially when one mixes OpenCL code, where ulong equals uint64_t, with CPU-side code, where the name suggests ulong is something else. This commit makes us use an explicit name that is common to all platforms.
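As a small illustration of why an explicit width helps: in OpenCL C, ulong is always 64 bits, while on the host `unsigned long` is 32 bits on 64-bit Windows and 64 bits on 64-bit Linux. The helper below is hypothetical and only shows the fixed-width type in use.

```cpp
#include <cstdint>

// uint64_t has the same width on every platform, unlike unsigned long.
static_assert(sizeof(uint64_t) == 8, "uint64_t must be 8 bytes");

// Hypothetical helper: size in bytes of a 2D buffer. With a 32-bit type
// the multiplication below would overflow for large images.
inline uint64_t buffer_size_bytes(uint64_t width, uint64_t height, uint64_t bytes_per_pixel)
{
	return width * height * bytes_per_pixel;
}
```

Note that `static_assert` requires C++11; on older compilers that line can simply be dropped.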
2017-08-09	Cycles: Proper fix for recent OpenCL image crash	Mai Lavelle
	Problem was that some code checks to see if device_pointer is null or not and the new allocator wasn't even setting the pointer to anything as it tracks memory location separately. Setting the pointer to non null keeps all users of device_pointer happy.
2017-08-08	Cycles: More fixes for Windows 32 bit	Sergey Sharybin
	- Apparently MSVC does not support compound literals in C++ (at least by the looks of it).
	- Not sure how opencl_device_assert was managing to set protected property of the Device class.
2017-08-08	Cycles: Fix compilation error without C++11	Sergey Sharybin
	Come on folks, nobody considered master a C++11 only branch. Such a decision has to be made officially and will involve changes in quite a few infrastructure related areas.
2017-08-08	Cycles: Pack kernel textures into buffers for OpenCL	Mai Lavelle
	Image textures were being packed into a single buffer for OpenCL, which limited the amount of memory available for images to the size of one buffer (usually 4 GB on AMD hardware). By packing textures into multiple buffers that limit is removed, while simultaneously reducing the number of buffers that need to be passed to each kernel. Benchmarks were within 2%. Fixes T51554. Differential Revision: https://developer.blender.org/D2745
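Packing into multiple buffers can be sketched as a sequential placement that starts a new buffer whenever the next texture would exceed the per-buffer limit. This is an illustrative sketch only; the actual allocator in Cycles may work differently, and the names `TextureSlot` and `pack_textures` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Where a texture ended up: which buffer, and at what byte offset.
struct TextureSlot {
	size_t buffer;
	uint64_t offset;
};

// Hypothetical packer: place textures one after another, opening a new
// buffer when the current one cannot hold the next texture. A texture
// larger than the limit is placed alone in its own buffer.
inline std::vector<TextureSlot> pack_textures(const std::vector<uint64_t>& sizes,
                                              uint64_t max_buffer_size)
{
	std::vector<TextureSlot> slots;
	slots.reserve(sizes.size());

	size_t buffer = 0;
	uint64_t used = 0;
	for(size_t i = 0; i < sizes.size(); i++) {
		if(used > 0 && used + sizes[i] > max_buffer_size) {
			buffer++;
			used = 0;
		}
		TextureSlot slot = {buffer, used};
		slots.push_back(slot);
		used += sizes[i];
	}
	return slots;
}
```

The kernel then only needs one argument per buffer instead of one per texture, and total image memory is no longer bounded by the size of a single buffer.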
2017-08-07	Cycles: Cleanup, space after keyword	Sergey Sharybin

2017-08-07	Code refactor: split defines into separate header, changes to SSE type headers.	Brecht Van Lommel
	I need to use some macros defined in util_simd.h for float3/float4, to emulate SSE4 instructions on SSE2. But due to issues with order of header includes this was not possible, this does some refactoring to make it work. Differential Revision: https://developer.blender.org/D2764
2017-08-05	Cycles: CUDA split performance tweaks, still far from megakernel.	Brecht Van Lommel
	On Pabellon, 25.8s mega, 35.4s split before, 32.7s split after.
2017-07-11	Cycles: Disable OpenCL clFlush workarounds	Sergey Sharybin
	This is something which was reported to work fine by Mai, Benjamin and confirmed by myself. Disabling this workaround gains us some speedup:
	                      Before     Now
	bmw27                 04:28.42   04:07.79
	classroom             09:26.48   08:54.53
	fishy_cat             08:44.01   08:18.70
	koro                  09:17.98   08:57.18
	pavillon_barcelone    12:26.64   11:52.81
	Test environment is:
	- Ubuntu 16.04, with all updates installed
	- AMD RX 480 GPU
	- amdgpu pro driver version 17.10-450821
2017-07-07	Cycles: Fix ambiguity in call of min() function	Sergey Sharybin

2017-07-06	Cycles: Add artificial memory limit debug option for OpenCL	Mai Lavelle

2017-07-06	Cycles: Don't allow global size to fall to zero	Mai Lavelle

2017-07-06	Cycles: Detect out of memory before buffer allocation in OpenCL devices	Mai Lavelle

2017-07-05	Cycles: Pass string by const reference rather than by value	Sergey Sharybin
	Some of the functions might have been inlined, but for others I don't see how that was possible (don't think virtual functions can be inlined here). In any case, better be explicitly optimal in the code.
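The difference between the two parameter styles can be shown with a pair of hypothetical helpers: a const reference refers to the caller's string object, while a by-value parameter is always a separate object initialised from it.

```cpp
#include <string>

// Takes the string by const reference: no copy is made, so the address
// seen inside the function is the caller's own object.
inline const std::string* address_by_reference(const std::string& s)
{
	return &s;
}

// Takes the string by value: the parameter is a distinct object, so its
// address never matches the caller's original.
inline bool by_value_aliases_original(std::string s, const std::string* original)
{
	return &s == original;
}
```

For a virtual function the compiler usually cannot inline the call, so the copy implied by pass-by-value cannot be elided through inlining, which is the concern the commit message raises.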
2017-07-03	Cycles: Add missing split kernel to CPUDevice	Lukas Stockner

2017-06-30	Cycles: Disable baking in mega kernel when not in use to improve build times	Mai Lavelle