Age | Commit message | Author |
|
Reduce thread divergence in kernel_shader_eval.
Rays are sorted in blocks of 2048 according to shader->id.
On an R9 290, Classroom is ~30% faster and Pabellon Barcelona is ~8% faster.
No sorting for CUDA split kernel.
Reviewers: sergey, maiself
Reviewed By: maiself
Differential Revision: https://developer.blender.org/D2598
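The block-wise sort described above can be sketched roughly as follows. This is a minimal illustration in Python, not the kernel code from the commit: the function name `sort_rays_in_blocks` and the flat list of shader ids are assumptions made for the example; only the block size of 2048 and the sort key (the shader id) come from the message.

```python
from typing import List

BLOCK_SIZE = 2048  # block length quoted in the commit message


def sort_rays_in_blocks(shader_ids: List[int],
                        block_size: int = BLOCK_SIZE) -> List[int]:
    """Return ray indices reordered so that, inside each block of
    `block_size` rays, rays with the same shader id are adjacent.

    Hypothetical helper: the real kernel works on device buffers.
    """
    order: List[int] = []
    for start in range(0, len(shader_ids), block_size):
        end = min(start + block_size, len(shader_ids))
        # sorted() is stable, so rays with equal ids keep their order.
        order.extend(sorted(range(start, end), key=lambda i: shader_ids[i]))
    return order
```

Sorting only within fixed-size blocks keeps the cost bounded while still letting neighbouring threads evaluate the same shader, which is where the reduced divergence would come from.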
|
|
It is really confusing to have some functions available on some devices
and not on others.
|
|
viewport is used
Previously the logic was different for duplis and regular objects: regular objects
used render visibility when the Render Layer option was enabled, while duplis
always used viewport visibility when rendering from the viewport.
This was quite confusing because it caused different results in viewport and render
when artists were expecting them to match 1:1.
|
|
Cannot measure any performance difference, so the code seems to be equivalent
and just shorter.
|
|
It will use the SSE2-optimized version when possible.
|
|
These were causing problems with Nvidia OpenCL.
|
|
This implements branched path tracing for the split kernel.
The general approach is to store the ray state at a branch point, trace the
branched ray as normal, then restore the state as necessary before iterating
to the next part of the path. A state machine is used to advance the indirect
loop state, which avoids the need to add any new kernels. Each iteration the
state machine recreates as much state as possible from the stored ray to keep
overall storage down.
It's kind of hard to keep all the different integration loops in sync, so this
needs lots of testing to make sure everything is working correctly. We should
probably start trying to deduplicate the integration loops more now.
Nonbranched BMW is ~2% slower, while Classroom is ~2% faster; other scenes
could still use more testing.
Reviewers: sergey, nirved
Reviewed By: nirved
Subscribers: Blendify, bliblubli
Differential Revision: https://developer.blender.org/D2611
|
|
Spotted by Mai in IRC, thanks!
|
|
Global size y needs to be a multiple of 16.
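As a minimal sketch of the constraint above, a global size can be rounded up to the next multiple of 16 as shown here. The helper name `round_up` and its signature are assumptions for illustration, not the code from the commit.

```python
def round_up(value: int, multiple: int) -> int:
    """Round `value` up to the nearest multiple of `multiple`.

    Hypothetical helper; assumes value >= 0 and multiple > 0.
    """
    return ((value + multiple - 1) // multiple) * multiple


# Example: pad a 1080-row global size to a multiple of 16.
global_size_y = round_up(1080, 16)
```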
|
|
This way we don't re-load kernels for every sample in the viewport.
Additionally, we don't risk the global size changing between samples.
|
|
Not sure if this is a proper fix, but was getting frequent crashes, so
committing this real quick just to make master stable again. Can be
reverted later if there's a better fix. The changes to images really
need a closer look...
|
|
This way moving the Blender bundle around doesn't re-trigger kernel compilation.
|
|
The previous fix did not work for mixed textures. This one will over-allocate
the information array, but it's better than not being able to render at all.
Some more cleanup and improvement is coming.
|
|
unlimited textures commit"
This reverts commit 8f4166ee495531fa38b676b0a5ef4c482e89f9a5.
The fix was not correct for cases when we've got float textures.
|
|
textures commit
The indexing was totally wrong in both the image packing code and the image sampling in the kernel.
Fixes T51341: Cycles OpenCL corruption in todays buildbot
|
|
Whitespace and order of switch/case etc. Let's try to stick to float4/byte4/half4/float/byte/half order as defined in "ImageDataType".
|
|
This patch allows for an unlimited number of textures in Cycles where the hardware allows. It replaces a number of static arrays with dynamic arrays and changes the way the flat_slot indices are calculated. Eventually, I'd like to get to a point where there are only flat slots left and textures of all kinds are stored in a single array.
Note that the arrays in DeviceScene are changed from containing device_vector<T> objects to device_vector<T>* pointers. Ideally, I'd like to store objects, but dynamic resizing of a std::vector in pre-C++11 calls the copy constructor, which for a good reason is not implemented for device_vector. Once we require C++11 for Cycles builds, we can implement a move constructor for device_vector and store objects again.
The limits for CUDA Fermi hardware still apply.
Reviewers: tod_baudais, InsigMathK, dingto, #cycles
Reviewed By: dingto, #cycles
Subscribers: dingto, smellslikedonkey
Differential Revision: https://developer.blender.org/D2650
|
|
canceling
Previously canceling a render done by the split kernel could cause artifacts
such as very bright or dark tiles. This was caused by unfinished samples
being included in the output buffer. To avoid this we now wait till all the
currently rendering samples have finished, up to a limit of twice the
expected time for them to finish (currently this is no more than 20 seconds,
but usually it's much less). If samples still haven't finished by then we
stop anyway in case there's an endless loop occurring.
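The waiting policy described above can be sketched as a bounded polling loop. This is an illustration only: `wait_for_samples`, the `samples_finished` callback and the polling interval are hypothetical names and values, and the real split kernel tracks sample state on the device rather than through a Python callback.

```python
import time
from typing import Callable


def wait_for_samples(samples_finished: Callable[[], bool],
                     expected_seconds: float,
                     poll_interval: float = 0.01) -> bool:
    """Wait until in-flight samples finish, for at most twice the
    expected time. Returns True if they finished, False on timeout.

    Hypothetical sketch of the policy in the commit message.
    """
    deadline = time.monotonic() + 2.0 * expected_seconds
    while not samples_finished():
        if time.monotonic() >= deadline:
            # Give up: the kernel may be stuck in an endless loop.
            return False
        time.sleep(poll_interval)
    return True
```

A caller would only merge the tile into the output buffer when the function returns True, which is what keeps unfinished samples out of the result.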
|
|
Single program builds twice as fast as multi programs, so it's better for
users to have it as the default.
|
|
Testing showed no issues so there's no reason to not have this.
|
|
Was also causing some bad memory access caused by reading data from uninitialized
arrays.
Reported by bzztploink in IRC, thanks!
|
|
It was totally unclear whether the device is enabled or disabled.
Lots of people got fully lost in the current interface.
While the solution is not fully ideal, it at least solves
the ambiguity in the interface.
|
|
Also remove trailing whitespace.
|
|
This works around a long outstanding issue T50176 with Cycles on msvc2015/x86. The root cause is still unknown though; it feels like a game of whack-a-mole.
Reviewers: sergey, dingto
Subscribers: Blendify
Tags: #cycles
Differential Revision: https://developer.blender.org/D2573
|
|
Using -cl-fast-relaxed-math assumes no NaN/Inf values in any expression.
This causes problems on overflow, division by zero, and the square root of a negative number.
Comparisons with NaN or infinite values are affected as well.
This patch causes <2% slowdown on benchmark scenes.
Fix T50985: Rendering volume scatter with GPU OpenCL comes to an halt after a few seconds
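To illustrate why the no-NaN/Inf assumption matters, here is a small Python sketch of the IEEE 754 behaviour that such a flag allows a compiler to discard. The function `is_nan_by_self_compare` is a hypothetical example, not code from the kernel.

```python
import math


def is_nan_by_self_compare(x: float) -> bool:
    """Classic NaN test: under IEEE 754 only NaN is unequal to itself.

    A compiler that assumes no NaN values may fold `x != x` to a
    constant False, so checks like this can silently stop working.
    """
    return x != x


# Overflow-style arithmetic produces infinities, and inf - inf is NaN.
nan_value = math.inf - math.inf
```

With strict IEEE semantics the check catches `nan_value`; under a finite-math assumption the same guard is not reliable.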
|
|
Was doing a lot of investigation recently, which required having lots of things
side by side.
|
|
This file was an even bigger mess than the vectorized types header;
cleaning it up makes it easier to maintain these files and
extend them further.
|
|
The final goal is to make vectorized types much easier to maintain.
The previous design had the following issues:
- Having all type and method implementations in one place made the source
file rather bloated and unfun to navigate.
- It was not possible to quickly glance at the available API for the type you
are interested in.
- Adding more vectorization types would bloat the file even more, making
things even more tricky to follow.
|
|
Since 9d50175 this is no longer needed, at least not with the current
sampler we are using.
|
|
Fixes performance issues of the C++ one with Windows MSVC debug builds...
Merely a translation of the msgfmt.cc code by @sergey, using BLI libs instead of C++'s stdlib.
Reviewers: sergey, campbellbarton, LazyDodo
Subscribers: sergey
Differential Revision: https://developer.blender.org/D2605
|
|
This way we can skip compiling it into OpenCL kernels by making
this shader a compile-time feature.
|
|
The idea is to have some generic BSDF node which is subclassed by
"regular" BSDF nodes and uber shaders.
This way we can access the special type and closure type for making
decisions somewhere else.
|
|
Similar to previous commit for Gflags.
|
|
|