Age | Commit message (Collapse) | Author |
|
Together with the extended loop callback and userdata_chunk, this allows to perform
cumulative tasks (like aggregation) in a lockfree way using local userdata_chunk to store temp data,
and once all workers have finished, to merge those userdata_chunks in the finalize callback
(from calling thread, so no need to lock here either).
Note that this changes how userdata_chunk is handled (now fully from 'main' thread,
which means a given worker thread will always get the same userdata_chunk, without
being re-initialized anymore to init value at start of each iter chunk).
|
|
Code by @sergey, with small edits and doc by @mont29.
|
|
This commit implements new function BLI_task_pool_push_from_thread()
who's main goal is to have less parasitic load on the CPU bu avoiding
memory allocations as much as possible, making taks pushing cheaper.
This function expects thread ID, which must be 0 for the thread from
which pool is created from (and from which wait_work() is called) and
for other threads it mush be the ID which was sent to the thread working
function.
This reduces allocations quite a bit in the new dependency graph,
hopefully gaining some visible speedup on a fewzillion core machines
(on my own machine can only see benefit in profiler, which shows
significant reduce of time wasted in the memory allocation).
|
|
Based on usages so far:
- Split callback worker func in two, 'basic' and 'extended' versions. The former goes back
to the simplest verion, while the later keeps the 'userdata_chunk', and gets the thread_id too.
- Add use_threading to simple BLI_task_parallel_range(), turns out we need this pretty much systematically,
and allows to get rid of most usages of BLI_task_parallel_range_ex().
- Now BLI_task_parallel_range() expects 'basic' version of callback, while BLI_task_parallel_range_ex()
expectes 'extended' version of the callback.
All in all, this should make common usage of BLI_task_parallel_range simpler (less verbose), and add
access to advanced callback to thread id, which is mandatory in some (future) cases.
|
|
use threading or not, instead of threshold.
From recent experience, turns out we often do want to use something else than basic
range of parallelized forloop as control parameter over threads usage, so now BLI func
only takes a boolean, and caller defines best check for its own case.
|
|
This mimics OpenMP's 'firstprivate' feature. It is sometimes handy to have some persistent local data during a whole chunk.
Reviewers: sergey
Reviewed By: sergey
Subscribers: campbellbarton
Differential Revision: https://developer.blender.org/D1635
|
|
With current code, in single-threaded context, a pool of task may never be executed
until one calls BLI_task_pool_work_and_wait() on it, this is not acceptable for
asynchronous tasks where you never want to actually lock the main thread.
This commits adds an extra thread in single-threaded case, and a new 'type' of pool,
such that one can create real background pools of tasks. See code for details.
Review: D1565
|
|
Useful in case one needs more complex handling of tasks data than a mere MEM_freeN().
|
|
Allocate statistics array dynamically, so increasing max number of threads does
not increase sloppyness of the memory usage.
For the further cleanups: we can try alloca-ing this array, but it's also not
really safe because we can have quite huge number of threads in the future.
Plus statistics will allocate memory for each individual entry, so using alloca
is not going to give anything beneficial here.
|
|
|
|
This way we can have scheduler capable of scheduling tasks on all the CPUs
but in the same time we can limit tasks like baking (in the future) to use
no more than given number of threads.
|
|
It now supports different scheduling schemas: dynamic and static.
Static one is the default and it splits work into equal number of
range iterations.
Dynamic one allocates chunks of 32 iterations which then being
dynamically send to a thread which is currently idling.
This gives slightly better performance. Still some tricks are
possible to have. For example we can use some smarter static scheduling
when one thread might steal tasks from another threads when it runs
out of work to be done.
Also removed unneeded spin lock in the mesh deform evaluation,
on the first glance it seemed to be a reduction involved here but
int fact threads are just adding value to the original vertex
coordinates. No write access to the same element of vertexCos
happens from separate threads.
|
|
This commit switches meshdeform modifier to use threads to evaluate
the vertices positions using the central task scheduler.
SO now we've got an utility function to help splitting the for loop
into tasks using BLI_task module which is pretty straightforward to
use: it gets range (which is an integer lower and higher bounds) and
the function and userdata to be invoked for each of the iterations.
The only weak point for now is the passing the data to the callback,
this isn't so trivial to improve in pure C.
Reviewers: campbellbarton
Differential Revision: https://developer.blender.org/D838
|
|
|
|
Replaces ThreadedWorker and is gonna to be used
for threaded object update in the future and
some more upcoming changes.
But in general, it's to be used for any task
based subsystem in Blender.
Originally written by Brecht, with some fixes
and tweaks by self.
|