Age | Commit message | Author |
|
|
|
|
|
|
|
Again, 2 times quicker with BLI than with OMP (from about 5ms to 2.5ms
per frame for the parallelized loop, on a rather small video...).
|
|
Watch out for changes to variables passed by value: these changes
don't persist across the split.
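A minimal sketch of that pitfall in plain C, assuming "the split" means one function being divided into separately called stages. `PathState`, `stage_by_value` and `stage_by_pointer` are hypothetical names used only for illustration; they are not taken from the actual code.

```c
#include <assert.h>

/* Hypothetical per-path state; not the actual struct from the source. */
typedef struct PathState {
  int bounce;
} PathState;

/* Takes the state by value: the increment only modifies a local copy. */
static void stage_by_value(PathState state)
{
  state.bounce++;
}

/* Takes a pointer: the increment is visible to the stage that runs next. */
static void stage_by_pointer(PathState *state)
{
  state->bounce++;
}

/* Runs two stages back to back, passing the state by value each time. */
static int run_split_by_value(void)
{
  PathState state = {0};
  stage_by_value(state);
  stage_by_value(state);
  return state.bounce;
}

/* Same two stages, but the state is shared through a pointer. */
static int run_split_by_pointer(void)
{
  PathState state = {0};
  stage_by_pointer(&state);
  stage_by_pointer(&state);
  return state.bounce;
}
```

With the by-value variant the caller's counter never changes, while the pointer variant accumulates both increments.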
|
|
No need to print status for basic and reliable operations:
build systems can output the operations they run if needed,
and developers can change debug output in the source while debugging.
Nice for ninja, since any printed text now hints at a problem to fix.
|
|
|
|
|
|
|
|
Neither Campbell nor I could reproduce the issue, so let's get rid of this
workaround and fix it properly if it is still needed.
|
|
|
|
DoF disabled
|
|
The title says it all, actually: this is controlled with the DoF check box next
to the textured solid check box.
Thanks to Campbell for the review!
|
|
Having lots of boolean arguments scattered across the whole argument list is
completely unreadable. What does `false, true, true` mean in terms of behavior?
Replace those with a bitfield, which has the advantage of a more human-readable
meaning.
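A minimal sketch of the idea, not the code from this commit: the `DrawFlag` names and the `draw_count_enabled` helper are hypothetical and only show how named bits read at a call site compared to `false, true, true`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags for illustration; not the flags from the actual commit. */
typedef enum DrawFlag {
  DRAW_FLAG_NONE = 0,
  DRAW_FLAG_USE_ALPHA = (1 << 0),
  DRAW_FLAG_DO_SKY = (1 << 1),
  DRAW_FLAG_IS_PREVIEW = (1 << 2),
} DrawFlag;

/* Returns true when the given bit is set in flags. */
static bool draw_flag_test(int flags, DrawFlag flag)
{
  return (flags & flag) != 0;
}

/* Stand-in for a function that used to take three booleans:
 * it simply counts how many of the options are enabled. */
static int draw_count_enabled(int flags)
{
  int count = 0;
  if (draw_flag_test(flags, DRAW_FLAG_USE_ALPHA)) {
    count++;
  }
  if (draw_flag_test(flags, DRAW_FLAG_DO_SKY)) {
    count++;
  }
  if (draw_flag_test(flags, DRAW_FLAG_IS_PREVIEW)) {
    count++;
  }
  return count;
}
```

A call such as `draw_count_enabled(DRAW_FLAG_DO_SKY | DRAW_FLAG_IS_PREVIEW)` documents itself, and adding a new option later does not change the function signature.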
|
|
With this, kernels for the BMW and classroom scenes build in half the
time compared to master. Render times are 1% faster as well.
|
|
|
|
|
|
|
|
Produces different results when lamps overlap from the POV of a ray, but allows
this function to be split. The differences probably won't be noticeable
in most scenes. The old behavior could be restored by placing lamps
into the BVH.
|
|
|
|
|
|
Gives another 4 seconds improvement.
|
|
|
|
`BKE_maskrasterize_buffer`!
So this deduplicates and simplifies code, yeah.
Also, as an odd bonus, the new code seems slightly quicker than the previous one
(about 5 to 10% quicker).
|
|
Once again nothing much to say here, except that whole mask rendering
process from VSE is about 25% quicker now. ;)
|
|
Pretty straightforward this time, we already have a single struct
pointer containing all needed data (or nearly).
And we gain about 10-15% speed on tracking! :)
|
|
Two more 'not really useful' cases (OMP only shows some noticeable
speedup above 1M elements, and since this is a quick operation anyway
compared to even other basic operators, the gain is in the 1% area of total
processing time in the best case).
So not worth parallelizing here, we'll gain much more on tackling heavy
operations. ;)
And BMesh is free from OMP now!
|
|
Performance tests on this one are quite surprising actually...
The parallelized loop itself is at least 10 times quicker with the new BLI_task
code than it was with OMP. And subdividing e.g. a heavy mesh with 3
levels of multires (whole process) takes 8 seconds with the new code, versus
10 seconds with the OMP one. And cherry on top, the BLI_task code only uses
about 50% of CPU load, while the OMP one was at nearly 100%!
In fact, I suspect the OMP code was not properly declaring outside vars,
generating a lot of unneeded locks.
Also, raised the minimum level of subdiv to enable parallelization,
tests here showed that we only start to get a significant gain with subdiv
levels of 4; below that, the single-threaded one is quicker.
|
|
Those three were actually giving no significant benefits, in fact
even slowing things down in one case compared to no parallelization at
all (in `BM_mesh_elem_table_ensure()`).
Point being, once more, parallelizing *very* small tasks (like index or
flag setting, etc.) is nearly never worth it.
Also note that we could not easily use per-item parallel looping in
those three cases, since they rely heavily on a valid
loop-generated index (or are doing non-threadable things like allocation
from a mempool)...
|
|
|
|
|
|
This makes it so that path_flag and max_closures are not passed to shader eval
functions, instead a limited number of ShaderEvalIntents are used. This also
removes the need for ShaderEvalTask in the mega kernel and simplifies the
code a bit.
|
|
Don't operate on multiple boundaries at once,
instead keep collapsing from the first selected boundary.
|
|
Previously the outcome depended on the order of edges,
now the longest boundary edges are rotated first,
then the faces' connected edges.
This gives more predictable results, allowing regions containing
a vertex fan to be rotated onto the next vertex.
|
|
|
|
activated immediately instead of upon LMB
|
|
That was a nasty one, Debug build would never have any issue (even tried
with 64 threads!), but Release build would deadlock nearly immediately,
even with only 2 threads!
What happened here (I think) is that the gcc optimizer generated a
specific path that looped endlessly when the initial value of virtual_lock was
FLT_MAX, bypassing the re-assignment from v_no[0] and the atomic cas
completely. That would have been correct, had v_no[0] not been
shared (and modified) by multiple threads. ;)
Idea of that (broken) for loop was to avoid completely calling the
atomic cas as long as v_no[0] was locked by some other thread, but...
Guess the avoided/missing memory barrier was the root of the issue here.
Lesson of the evening: Remember kids, do not trust your compiler to
understand all possible threading-related side effects, and be explicit
rather than elegant when using atomic ops!
Side-effect lesson: do check both release and debug builds when messing
with said atomic ops...
|
|
Using atomic cas correctly is really hairy... ;)
In this case, the returned value from cas needs to validate *two*
conditions, it must not be FLT_MAX (which is our 'locked' value and
would mean another thread has already locked it), but it also must be
equal to previously stored value...
This means we need two steps per loop here, hence using a 'for' loop
instead of a 'while' one now.
Note that collisions are (as expected) very rare, typically fewer than 1 in
10k, so I did not catch the issue initially (also because I was
mostly working with a release build to check performance...).
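A minimal sketch of such a two-step loop, written with C11 `<stdatomic.h>` rather than the atomic helpers used in the actual code, so it is an illustration under assumptions and not the committed implementation. The names `slot_lock`, `slot_unlock` and `slot_add` are hypothetical. The float is stored as its bit pattern so that a standard integer compare-and-swap can be used, and FLT_MAX is assumed never to occur as a legitimate value.

```c
#include <assert.h>
#include <float.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its 32-bit pattern (and back) without aliasing issues. */
static uint32_t float_as_bits(float f)
{
  uint32_t u;
  memcpy(&u, &f, sizeof(u));
  return u;
}

static float bits_as_float(uint32_t u)
{
  float f;
  memcpy(&f, &u, sizeof(f));
  return f;
}

/* Spins until this thread has swapped the FLT_MAX sentinel into the slot,
 * then returns the value the slot held just before. */
static float slot_lock(_Atomic uint32_t *slot)
{
  const uint32_t locked = float_as_bits(FLT_MAX);
  for (;;) {
    /* Step 1: explicit atomic load, so the value is re-read on every iteration. */
    uint32_t expected = atomic_load(slot);
    if (expected == locked) {
      continue; /* Another thread holds the lock, try again. */
    }
    /* Step 2: take the lock only if the slot still holds the value just read. */
    if (atomic_compare_exchange_strong(slot, &expected, locked)) {
      return bits_as_float(expected);
    }
  }
}

/* Releases the lock by storing a regular (non-sentinel) value. */
static void slot_unlock(_Atomic uint32_t *slot, float value)
{
  atomic_store(slot, float_as_bits(value));
}

/* Adds delta to the shared value while holding the sentinel lock. */
static void slot_add(_Atomic uint32_t *slot, float delta)
{
  const float value = slot_lock(slot);
  slot_unlock(slot, value + delta);
}
```

Note that C11's `atomic_compare_exchange_strong` returns a success flag and writes the observed value back into `expected`, whereas a cas that returns the previous value needs the two explicit comparisons described above (not the sentinel, and equal to the value read).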
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Sorry about that...
|
|
|
|
|
|
`BM_mesh_normals_update` was converted from OMP to new parallel iterator code,
basic test with heavily subdivided cube (24.5k faces) gives:
- old OMP code: average 10ms per run.
- new BLI_task code: average 6ms per run.
So new code seems to be easily 40% quicker, in addition to getting rid of OMP. ;)
Reviewers: sergey, campbellbarton
Differential Revision: https://developer.blender.org/D2930
|