Age | Commit message | Author |
|
|
This code allows pushing a set of different operations, all based on
iterations over a range of indices, and then processing them all at once
over multiple threads.
This commit also adds unit tests for both the old un-pooled and the new
pooled `task_parallel_range` family of functions, as well as some basic
performance tests.
As expected, this is mainly interesting for relatively low numbers of
individual tasks.
E.g. performance tests on a 32-thread machine, for a set of 10 different
tasks, show the following improvements when using the pooled version
instead of ten sequential calls to `BLI_task_parallel_range()`:
| Num Items | Sequential | Pooled | Speed-up |
| --------- | ---------- | ------- | -------- |
| 10K | 365 us | 138 us | 2.5 x |
| 100K | 877 us | 530 us | 1.66 x |
| 1000K | 5521 us | 4625 us | 1.25 x |
Differential Revision: https://developer.blender.org/D6189
Note: Compared to yesterday's commit, this reworks atomic handling in the
parallel iteration code and fixes a silly double-free bug.
We now only use the two critical values returned by the atomic calls
themselves for synchronization, which is the proper way to do things.
Reading a value after an atomic operation does not guarantee you will
get the latest value in all cases (especially on Windows release builds,
it seems).
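As a rough usage sketch (type and function names follow this commit's description and D6189, but the exact signatures, and the `verts_*`/`edges_*` names, are assumptions):

```c
/* Sketch only: pushes several independent range operations, then runs
 * them all at once. verts_task_func/edges_task_func are hypothetical
 * per-index callbacks. */
static void push_and_run_ranges(const int num_verts, const int num_edges,
                                void *verts_data, void *edges_data)
{
  TaskParallelSettings settings;
  BLI_parallel_range_settings_defaults(&settings);

  /* Collect several independent range operations; nothing runs yet. */
  TaskParallelRangePool *range_pool = BLI_task_parallel_range_pool_init(&settings);
  BLI_task_parallel_range_pool_push(
      range_pool, 0, num_verts, verts_data, verts_task_func, &settings);
  BLI_task_parallel_range_pool_push(
      range_pool, 0, num_edges, edges_data, edges_task_func, &settings);

  /* Process all pushed ranges at once over the worker threads. */
  BLI_task_parallel_range_pool_work_and_wait(range_pool);
  BLI_task_parallel_range_pool_free(range_pool);
}
```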
|
|
This reverts commit f9028a3be1f77c01edca44a68894e2ba9d9cfb14.
This is causing a weird heisenbug crash, only on Windows release builds...
Reverting until we understand the issue.
|
|
This code allows pushing a set of different operations, all based on
iterations over a range of indices, and then processing them all at once
over multiple threads.
This commit also adds unit tests for both the old un-pooled and the new
pooled `task_parallel_range` family of functions, as well as some basic
performance tests.
As expected, this is mainly interesting for relatively low numbers of
individual tasks.
E.g. performance tests on a 32-thread machine, for a set of 10 different
tasks, show the following improvements when using the pooled version
instead of ten sequential calls to `BLI_task_parallel_range()`:
| Num Items | Sequential | Pooled | Speed-up |
| --------- | ---------- | ------- | -------- |
| 10K | 365 us | 138 us | 2.5 x |
| 100K | 877 us | 530 us | 1.66 x |
| 1000K | 5521 us | 4625 us | 1.25 x |
Differential Revision: https://developer.blender.org/D6189
|
|
This new function is part of the 'parallel for loops' family. It takes an
iterator callback to generate the items to be processed, in addition to
the usual 'process' func callback.
This allows using common code from BLI_task for a wide range of custom
iterators, without having to re-invent the wheel of the whole tasks &
data chunks handling.
It supports all settings features of `BLI_task_parallel_range()`,
including dynamic and static scheduling (the latter when the total number
of items is known), TLS data and its finalize callback, etc.
One open question here is whether we should provide user code with a
spinlock by default, or force it to always handle its own synchronization
mechanism.
I kept the spinlock, since imho it will be needed very often, and
generating one is pretty cheap even if unused...
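A rough sketch of the resulting shape (all signatures and names here are assumptions based on the description above; the iterator callback runs under the internal synchronization just mentioned):

```c
/* Iterator callback: produce the next item, here by walking a hypothetical
 * linked list; BLI_task handles the tasks & data chunks around it. */
static void iter_next_fn(void *__restrict userdata,
                         const TaskParallelTLS *__restrict tls,
                         void **r_next_item, int *r_next_index, bool *r_do_abort)
{
  MyLink **state = userdata; /* Hypothetical shared iteration state. */
  *r_next_item = *state;
  if (*state != NULL) {
    *state = (*state)->next;
    (*r_next_index)++;
  }
  (void)tls;
  (void)r_do_abort;
}

/* Process callback: consume one generated item, possibly from a worker
 * thread, with the same TLS/finalize features as the range version. */
static void process_fn(void *__restrict userdata, void *item, int index,
                       const TaskParallelTLS *__restrict tls)
{
  MyLink *link = item;
  /* ... per-item work on `link` ... */
}
```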
----------
Additionally, this commit converts the (currently unused)
`BLI_task_parallel_listbase()` to use that generic code. This was done
mostly as a proof of concept, but performance-wise it shows some
interesting data, roughly:
- Very light processing (which should not be threaded anyway) is several
times slower, as expected due to the extra overhead in loop management
code.
- Heavier processing can be up to 10% quicker, probably thanks to the
switch from dynamic to static scheduling, which greatly reduces the
locking needed to fill in the per-task chunks of data. A similar speed-up
in the non-threaded case comes as a surprise though; not sure what
explains that.
While this conversion is not really needed, imho we should keep it
(instead of the existing code for that function): it's easier to maintain
and improve complex handling logic when it lives in as few places as
possible.
Note: This work was initially done to make D5372 possible... Unfortunately, that one proved to be no better than the original code performance-wise.
Reviewed By: sergey
Differential Revision: https://developer.blender.org/D5371
|
|
TLS and Settings can be used by other types of parallel 'for loops', so
removing 'Range' from their names.
No functional changes expected here.
|
|
Previously we were setting it to 1 (i.e. no 'chunking'), to match the
previous behavior. However, this is far from optimal, especially with
CPUs that can have tens of threads nowadays.
Now we take a heuristic approach (inspired by the one already existing
for `BLI_task_parallel_listbase()`), which tries to guesstimate the best
chunk size based on several factors (number of threads/parallel tasks,
total number of items, ...).
I think this is a reasonable starting point; more optimization here is of
course possible.
Note that code that was already explicitly setting some value here is not
affected at all by this change.
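For illustration, a guesstimate of this kind could look like the sketch below (purely illustrative numbers and names, not the actual heuristic):

```c
/* Illustrative only: pick a chunk size so each thread gets several chunks
 * (for load balancing) while keeping per-chunk scheduling overhead low. */
static int guess_chunk_size(const int tot_items, const int num_threads)
{
  /* Aim for roughly four chunks per thread... */
  int chunk_size = tot_items / (num_threads * 4);
  /* ...but never schedule less than a minimal amount of work per chunk. */
  if (chunk_size < 8) {
    chunk_size = 8;
  }
  return chunk_size;
}
```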
|
|
Apply clang-format as proposed in T53211.
For details on usage and instructions for migrating branches
without conflicts, see:
https://wiki.blender.org/wiki/Tools/ClangFormat
|
|
While \file doesn't need an argument, it can't have another doxy
command after it.
|
|
Move \ingroup onto same line to be more compact and
make it clear the file is in the group.
|
|
BF-admins agree to remove header information that isn't useful,
to reduce noise.
- BEGIN/END license blocks
  Developers should add non-license comments as separate comment blocks.
  No need for separator text.
- Contributors
  This is often invalid, outdated or misleading,
  especially when splitting files.
  It's more useful to use git-blame to find out who developed the code.
See P901 for a script to perform these edits.
|
|
Without this, clang-format may wrap them onto a single line.
|
|
To make the pool more usable for running multiple stages of tasks,
fix local queue handling in BLI_task_pool_work_and_wait.
Specifically, after the wait loop the local queue should be empty, or
the 'wait' part of the function's contract isn't fulfilled. Instead,
check and run any tasks in the queue before the wait loop.
Also, add a new function that resets the suspended state of the pool.
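A multi-stage pattern like the sketch below should now behave as expected (`BLI_task_pool_work_and_wait()` is the function named above; `push_stage_tasks()` is a hypothetical helper wrapping the pool's push call, and the name of the new reset function is my assumption):

```c
static void run_two_stages(TaskPool *pool, void *stage1_data, void *stage2_data)
{
  /* Stage 1: the thread-local queue is drained before the wait loop, so
   * the pool really is idle and empty once this returns. */
  push_stage_tasks(pool, stage1_data);
  BLI_task_pool_work_and_wait(pool);

  /* Hypothetical name: reset the suspended state so a suspended pool can
   * be reused for a second batch of tasks. */
  BLI_task_pool_suspended_reset(pool);

  /* Stage 2: safe to reuse the same pool. */
  push_stage_tasks(pool, stage2_data);
  BLI_task_pool_work_and_wait(pool);
}
```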
|
|
Strip unindented comment blocks - mainly headers to avoid conflicts.
|
|
Those pointers are never aliased, so let's be explicit about it and hope
the compiler saves some CPU ticks.
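Concretely, this amounts to qualifying the callback parameters along these lines (a generic illustration, not the exact diff):

```c
/* `__restrict` promises the compiler that the two pointers never alias,
 * so values loaded through one need not be reloaded after stores through
 * the other. */
static void range_func(float *__restrict dst,
                       const float *__restrict src,
                       const int iter)
{
  dst[iter] += src[iter];
}
```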
|
|
The idea is to support the following: allow doing a parallel for over a
small range, where each iteration takes lots of compute power, while
limiting that range to a subset of the threads.
For example, on a machine with 44 threads we can occupy only 4 threads to
handle a range of 64 elements, 16 elements per thread, where each block
of 16 elements is very expensive to compute.
The idea is to use this setting instead of the global use_threading flag,
which is only based on the size of the array. Proper use of the new flag
will improve threadability.
This commit only contains internal task scheduler changes; this setting
is not used by any areas yet.
|
|
Now all the fine-tuning happens via a parallel range settings structure,
which avoids passing long lists of arguments, allows extending the
fine-tuning further, and avoids having lots of different functions that
basically do the same thing.
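A call site then looks roughly like this sketch (the settings struct and defaults call follow this commit's naming era as I understand it; `MyData` and `my_range_func` are hypothetical):

```c
static void run_range(MyData *data, const int tot_items)
{
  ParallelRangeSettings settings;
  BLI_parallel_range_settings_defaults(&settings);
  /* Fine-tuning now lives in one struct instead of a long argument list. */
  settings.use_threading = (tot_items > 1024);

  BLI_task_parallel_range(0, tot_items, data, my_range_func, &settings);
}
```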
|
|
Wrap all such arguments into a single TLS-type argument. This avoids some
branching and also makes it easier to extend things in the future.
|
|
It merely uses the new thread-safe iterator system of mempool; quite
straightforward.
Note that to avoid possible confusion between two void pointers as
parameters of the callback, a dummy opaque struct pointer is used instead
for the second parameter (the pointer generated by iterating over the
mempool); callback functions must explicitly convert it to the expected
real type.
Also added a basic gtest for this new feature.
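A callback then looks like this sketch (`MempoolIterData` is the opaque type described above; `MyElem` and the exact `BLI_task_parallel_mempool()` signature are assumptions):

```c
/* The second parameter is the opaque iteration pointer; user code must
 * explicitly cast it to the real type stored in the mempool. */
static void mempool_item_func(void *userdata, MempoolIterData *iter)
{
  MyElem *elem = (MyElem *)iter; /* Hypothetical element type. */
  const float offset = *(const float *)userdata;
  elem->value += offset; /* Each element is visited exactly once. */
}

/* Assumed call shape: visit every element over multiple threads. */
/* BLI_task_parallel_mempool(mempool, &offset, mempool_item_func, true); */
```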
|
|
The idea is to first accumulate all new tasks in a thread-local queue,
without doing any thread synchronization (i.e. locks and condition
variables), and to move those tasks to the scheduler queue once they are
all ready. This way we avoid the per-task-pool lock and only take one
lock per bunch of tasks.
This is particularly handy when scheduling new dependency graph node
children. Brings the FPS of the cached simulation in the file linked
below from ~30 to ~50.
See the documentation of BLI_task_pool_delayed_push_{begin, end}
and of TaskThreadLocalStorage::do_delayed_push.
Fixes T50027: Rigidbody playback and simulation performance regression with new depsgraph
Thanks Bastien for the review!
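Usage brackets a burst of pushes between the begin/end calls (function names as cited above; the push signature of this era, plus `Node` and `node_task_func`, are assumptions):

```c
static void schedule_children(TaskPool *pool, Node **children,
                              const int num_children, const int thread_id)
{
  /* Accumulate pushes in the thread-local queue, lock-free... */
  BLI_task_pool_delayed_push_begin(pool, thread_id);
  for (int i = 0; i < num_children; i++) {
    BLI_task_pool_push_from_thread(pool, node_task_func, children[i],
                                   false, TASK_PRIORITY_HIGH, thread_id);
  }
  /* ...then hand the whole batch to the scheduler under a single lock. */
  BLI_task_pool_delayed_push_end(pool, thread_id);
}
```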
|
|
Suspended pools allow pushing a huge amount of initial tasks without any
threading synchronization, and hence without its overhead.
This gives a ~50% speedup of the cached rigid body simulation with the
file from T50027, and seems to have no negative effect in other scenes
here.
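Sketch of the intended use (the creation function's exact signature in this era is an assumption; `Body` and `body_task_func` are hypothetical):

```c
static void run_rigidbody_tasks(TaskScheduler *scheduler, Body *bodies,
                                const int num_bodies, void *sim_data)
{
  /* A suspended pool only queues pushed tasks; worker threads are not
   * woken up yet, so the pushes need no synchronization. */
  TaskPool *pool = BLI_task_pool_create_suspended(scheduler, sim_data);
  for (int i = 0; i < num_bodies; i++) {
    BLI_task_pool_push(pool, body_task_func, &bodies[i], false,
                       TASK_PRIORITY_LOW);
  }
  /* Execution only starts here, once we actually wait on the pool. */
  BLI_task_pool_work_and_wait(pool);
  BLI_task_pool_free(pool);
}
```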
|
|
This feature added extra complexity to task scheduling, requiring yet
more variables to worry about and modify in an atomic manner, which
resulted in the following issues:
- More complex code to maintain, which increases the risk of
something going wrong when we modify the code.
- Extra barriers and/or locks during task scheduling, which
cause extra threading overhead.
- Inability to use some other implementation (such as TBB), even for
comparison tests.
Notes about other changes:
There are two places where we really had to use that limit.
One of them is the single-threaded dependency graph. This will
now construct a single-threaded scheduler at evaluation time.
This shouldn't be a problem, because it only happens when using
debugging command-line arguments, and that code simply doesn't
run in regular Blender operation.
The code seems a bit duplicated here across the old and new
depsgraph, but I think that's OK, since the old depsgraph is already
gone in the 2.8 branch and I don't see where else we might want
to use such a single-threaded scheduler.
When/if we want to do so, we can move it to a centralized
single-threaded scheduler in threads.c.
OpenGL render was a bit more tricky to port, but basically we now
use condition variables to wait for the background thread to do all
the work.
|
|
Not really happy with the per-pool threads limit; need to find a better
approach to that. But at least it's possible to get rid of half of the
nastiness here by removing a getter which was only used in an assert
statement.
That piece of code was already well-tested, and it becomes obsolete in
the new depsgraph and no longer exists in the Blender 2.8 branch.
|
|
This was only used for progress reporting, and it's wrong because:
- A pool might in theory be re-used by different tasks.
- We should not make any decisions based on scheduling stats.
The proper way is for the task itself to take care of progress.
|
|
Together with the extended loop callback and userdata_chunk, this allows
performing cumulative tasks (like aggregation) in a lock-free way, using
the local userdata_chunk to store temporary data and, once all workers
have finished, merging those userdata_chunks in the finalize callback
(called from the calling thread, so no need to lock there either).
Note that this changes how userdata_chunk is handled: it is now managed
fully from the 'main' thread, which means a given worker thread always
gets the same userdata_chunk, and it is no longer re-initialized to the
init value at the start of each iteration chunk.
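The resulting pattern is sketched below with modern-style names (the `TaskParallelTLS`/`TaskParallelSettings` wrappers postdate this commit, so the exact names and signatures are assumptions): each worker accumulates into its thread-local chunk without locks, and the finalize callback merges the partials from the calling thread.

```c
typedef struct SumTask {
  const float *values;
  float result;
} SumTask;

static void sum_func(void *__restrict userdata, const int i,
                     const TaskParallelTLS *__restrict tls)
{
  SumTask *task = userdata;
  float *partial = tls->userdata_chunk; /* Thread-local copy: no locking. */
  *partial += task->values[i];
}

static void sum_finalize(void *__restrict userdata,
                         void *__restrict userdata_chunk)
{
  /* Called from the main thread for each used chunk: merge partial sums. */
  SumTask *task = userdata;
  task->result += *(float *)userdata_chunk;
}

static float sum_values(const float *values, const int n)
{
  SumTask task = {values, 0.0f};
  float chunk_template = 0.0f; /* Copied once per worker thread. */

  TaskParallelSettings settings;
  BLI_parallel_range_settings_defaults(&settings);
  settings.userdata_chunk = &chunk_template;
  settings.userdata_chunk_size = sizeof(chunk_template);
  settings.func_finalize = sum_finalize;

  BLI_task_parallel_range(0, n, &task, sum_func, &settings);
  return task.result;
}
```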
|
|
Code by @sergey, with small edits and doc by @mont29.
|
|
This commit implements a new function, BLI_task_pool_push_from_thread(),
whose main goal is to put less parasitic load on the CPU by avoiding
memory allocations as much as possible, making task pushing cheaper.
This function expects a thread ID, which must be 0 for the thread the
pool was created from (and from which work_and_wait() is called); for
other threads it must be the ID which was passed to the thread's working
function.
This reduces allocations quite a bit in the new dependency graph,
hopefully gaining some visible speedup on fewzillion-core machines
(on my own machine I can only see the benefit in the profiler, which
shows a significant reduction of time wasted in memory allocation).
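A sketch of the thread-ID rules (the task run function follows this era's `TaskRunFunction` shape as I recall it; the push call's argument order is an assumption):

```c
static void child_func(TaskPool *pool, void *taskdata, int threadid);

static void parent_func(TaskPool *pool, void *taskdata, int threadid)
{
  /* From inside a running task: reuse the thread ID that was handed to
   * this work function, so the push can skip allocations. */
  BLI_task_pool_push_from_thread(pool, child_func, taskdata, false,
                                 TASK_PRIORITY_HIGH, threadid);
}

static void schedule_root(TaskPool *pool, void *taskdata)
{
  /* From the thread the pool was created from (the one that will call
   * BLI_task_pool_work_and_wait()): the thread ID must be 0. */
  BLI_task_pool_push_from_thread(pool, parent_func, taskdata, false,
                                 TASK_PRIORITY_HIGH, 0);
  BLI_task_pool_work_and_wait(pool);
}
```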
|
|
Based on usages so far:
- Split the callback worker func in two, 'basic' and 'extended' versions. The former goes back
to the simplest version, while the latter keeps the 'userdata_chunk' and gets the thread_id too.
- Add use_threading to the simple BLI_task_parallel_range(); turns out we need this pretty much systematically,
and it allows getting rid of most usages of BLI_task_parallel_range_ex().
- Now BLI_task_parallel_range() expects the 'basic' version of the callback, while BLI_task_parallel_range_ex()
expects the 'extended' version.
All in all, this should make common usage of BLI_task_parallel_range simpler (less verbose), and give the
advanced callback access to the thread id, which is mandatory in some (future) cases.
|
|
use threading or not, instead of a threshold.
From recent experience, it turns out we often want to use something other than the basic
range of the parallelized for-loop as the control parameter for threads usage, so the BLI func
now only takes a boolean, and the caller defines the best check for its own case.
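A call site then reads like this sketch (signature of that era as I understand it; `DeformData`, `deform_vert_func` and the threshold are hypothetical):

```c
static void deform_verts(DeformData *data, const int numVerts)
{
  /* The caller picks whatever criterion fits its case, not just the
   * range size, e.g. also how heavy each iteration is. */
  const bool use_threading = (numVerts > 512) && !data->is_lightweight;
  BLI_task_parallel_range(0, numVerts, data, deform_vert_func, use_threading);
}
```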
|
|
This mimics OpenMP's 'firstprivate' feature. It is sometimes handy to have some persistent local data during a whole chunk.
Reviewers: sergey
Reviewed By: sergey
Subscribers: campbellbarton
Differential Revision: https://developer.blender.org/D1635
|
|
With the current code, in a single-threaded context, a pool of tasks may never be executed
until one calls BLI_task_pool_work_and_wait() on it; this is not acceptable for
asynchronous tasks, where you never want to actually block the main thread.
This commit adds an extra thread in the single-threaded case, and a new 'type' of pool,
such that one can create real background pools of tasks. See the code for details.
Review: D1565
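A sketch of the intended use (the creation function's signature in this era is an assumption; `job_task_func` and the ownership handling are hypothetical):

```c
static void start_async_job(TaskScheduler *scheduler, void *job_data)
{
  /* A background pool runs its tasks without the caller ever blocking in
   * BLI_task_pool_work_and_wait(), even with a single-threaded scheduler. */
  TaskPool *pool = BLI_task_pool_create_background(scheduler, job_data);
  BLI_task_pool_push(pool, job_task_func, NULL, false, TASK_PRIORITY_LOW);
  /* Caller returns immediately; the sketch leaves waiting on/freeing the
   * pool to whoever eventually consumes the job's result. */
}
```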
|
|
Useful in case one needs more complex handling of task data than a mere MEM_freeN().
|
|
Allocate the statistics array dynamically, so that increasing the max number of
threads does not increase the sloppiness of the memory usage.
For further cleanups: we could try alloca-ing this array, but that's also not
really safe, because we can have a quite huge number of threads in the future.
Plus, statistics allocate memory for each individual entry, so using alloca
is not going to give anything beneficial here.
|
|
This way we can have a scheduler capable of scheduling tasks on all the CPUs,
while at the same time limiting tasks like baking (in the future) to use
no more than a given number of threads.
|
|
It now supports different scheduling schemes: dynamic and static.
The static one is the default, and it splits the work into an equal
number of range iterations per thread.
The dynamic one allocates chunks of 32 iterations, which are then
dynamically sent to whichever thread is currently idling.
This gives slightly better performance. Some tricks are still
possible: for example, we could use smarter static scheduling where
one thread may steal tasks from other threads when it runs out
of work to be done.
Also removed an unneeded spin lock in the mesh deform evaluation;
at first glance it seemed a reduction was involved here, but
in fact the threads are just adding values to the original vertex
coordinates. No write access to the same element of vertexCos
happens from separate threads.
|
|
This commit switches the meshdeform modifier to use threads to evaluate
the vertex positions, using the central task scheduler.
So now we've got a utility function in the BLI_task module that helps
split a for-loop into tasks, which is pretty straightforward to use: it
takes a range (integer lower and upper bounds) and the function and
userdata to be invoked for each of the iterations.
The only weak point for now is passing the data to the callback;
this isn't so trivial to improve in pure C.
Reviewers: campbellbarton
Differential Revision: https://developer.blender.org/D838
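The conversion pattern is essentially the following sketch (assuming the original-era signature of just a range, userdata and a per-index callback; `DeformData` and the loop body are hypothetical):

```c
typedef struct DeformData {
  float (*vertexCos)[3]; /* Hypothetical userdata for the callback. */
} DeformData;

static void deform_vert_cb(void *userdata, const int iter)
{
  DeformData *data = userdata;
  /* The former loop body: work for a single index, now potentially
   * called from multiple worker threads. */
  data->vertexCos[iter][2] += 1.0f;
}

static void deform_all(DeformData *data, const int numVerts)
{
  /* Replaces: for (int i = 0; i < numVerts; i++) { ...body... } */
  BLI_task_parallel_range(0, numVerts, data, deform_vert_cb);
}
```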
|
|
Replaces ThreadedWorker, and is going to be used
for threaded object update in the future and
some more upcoming changes.
But in general, it's to be used for any task-based
subsystem in Blender.
Originally written by Brecht, with some fixes
and tweaks by myself.
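For reference, basic usage looks roughly like this sketch (signatures of the original API as I recall them; `TASK_SCHEDULER_AUTO`, the priority value and `work_func`'s body are assumptions):

```c
#include "BLI_task.h"

static void work_func(TaskPool *pool, void *taskdata, int threadid)
{
  /* One unit of work; runs on a worker thread. */
}

static void run_tasks(void)
{
  TaskScheduler *scheduler = BLI_task_scheduler_create(TASK_SCHEDULER_AUTO);
  TaskPool *pool = BLI_task_pool_create(scheduler, NULL);

  for (int i = 0; i < 16; i++) {
    BLI_task_pool_push(pool, work_func, NULL, false, TASK_PRIORITY_HIGH);
  }

  BLI_task_pool_work_and_wait(pool);
  BLI_task_pool_free(pool);
  BLI_task_scheduler_free(scheduler);
}
```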
|