git.blender.org/blender.git

Age	Commit message	Author
2020-08-10	Tests: move remaining gtests into their own module folders	Brecht Van Lommel
	And make them part of the blender_test runner. The one exception is blenlib performance tests, which we don't want to run by default. They remain in their own executable. Differential Revision: https://developer.blender.org/D8498
2020-07-28	Cleanup: correct usage of extern-C blocks in various places	Jacques Lucke
	This removes extern-C blocks around other includes and adds such blocks for some headers that need them.
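The cleanup above puts `extern "C"` guards inside the headers that declare C functions, instead of wrapping `#include` lines at the call site. Here is a minimal single-file sketch of that convention. It uses a hypothetical function `demo_add_ints()`, which is not a Blender API. In a real tree, the guarded declaration would live in a header and the definition in a `.c` file.

```cpp
#include <cassert>

/* --- Part that would normally be the C header (hypothetical demo_math.h) --- */
#ifdef __cplusplus
extern "C" {
#endif

/* Declared with C linkage, so C and C++ translation units agree on the symbol name. */
int demo_add_ints(int a, int b);

#ifdef __cplusplus
}
#endif

/* Avoided pattern, at the include site:
 *   extern "C" {
 *   #include "some_header.h"
 *   }
 * This would also force C linkage onto any C++ declarations the header pulls in
 * (templates, for instance, cannot have C linkage). */

/* --- Part that would normally be the source file --- */
/* No linkage specification is needed here: the definition keeps the C linkage
 * of the earlier declaration. */
int demo_add_ints(int a, int b)
{
  return a + b;
}
```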
2020-04-23	BLI: remove TaskParallelRangePool	Brecht Van Lommel
	This is not currently used and will take some work to support with TBB, so remove it until we have a new implementation based on TBB. Fixes T76005 (parallel range pool tests failing). Ref D7475
2020-03-19	Cleanup: `make format` after SortedIncludes change	Dalai Felinto

2020-01-27	Fix OBJECT_GUARDED_FREE compiler error when type is in namespace	Brecht Van Lommel
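The commit title names only the symptom. One likely cause, assumed here rather than taken from the commit, is that the macro calls the destructor by pasting the type name after `~` (e.g. `((type *)ptr)->~type()`). A namespaced type such as `ns::Foo` then expands to `ptr->~ns::Foo()`, which does not compile, because only an unqualified class name may follow `~` in that position.

A sketch of one way around it uses a function template, so the compiler binds `T` to the full type and `ptr->~T()` works for namespaced types too. It assumes hypothetical names (`guarded_delete`, `make_widget`, `demo::Widget`) and uses `std::malloc`/`std::free` in place of Blender's guarded allocator. Blender's actual fix may differ.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

namespace demo {
struct Widget {
  static int destroyed; /* Counts destructor calls, for demonstration only. */
  int value;
  explicit Widget(int v) : value(v) {}
  ~Widget() { destroyed++; }
};
int Widget::destroyed = 0;
}  // namespace demo

/* Hypothetical helper: destroy an object built with placement new in malloc'ed
 * memory, then free that memory. T is deduced as the full type (e.g.
 * demo::Widget), so no qualified name has to be written after '~'. */
template<typename T> void guarded_delete(T *ptr)
{
  if (ptr != nullptr) {
    ptr->~T();
    std::free(ptr);
  }
}

/* Construct a demo::Widget in raw malloc'ed memory; returns nullptr on failure. */
demo::Widget *make_widget(int value)
{
  void *mem = std::malloc(sizeof(demo::Widget));
  if (mem == nullptr) {
    return nullptr;
  }
  return new (mem) demo::Widget(value);
}
```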

2019-11-26	BLI_task: Add pooled threaded index range iterator, Take II.	Bastien Montagne
	This code allows pushing a set of different operations, all based on iterating over a range of indices, and then processing them all at once over multiple threads. This commit also adds unit tests for both the old un-pooled and the new pooled `task_parallel_range` family of functions, as well as some basic performance tests. As expected, this is mainly interesting for a relatively small number of individual tasks. For example, performance tests on a 32-thread machine, for a set of 10 different tasks, show the following improvements when using the pooled version instead of ten sequential calls to `BLI_task_parallel_range()`:
	| Num Items | Sequential | Pooled  | Speed-up |
	| --------- | ---------- | ------- | -------- |
	| 10K       | 365 us     | 138 us  | 2.5 x    |
	| 100K      | 877 us     | 530 us  | 1.66 x   |
	| 1000K     | 5521 us    | 4625 us | 1.25 x   |
	Differential Revision: https://developer.blender.org/D6189
	Note: Compared to the previous commit from yesterday, this reworks atomic handling in the parallel iterator code and fixes a silly double-free bug. Synchronization on the two critical values now relies only on the results returned by the atomic calls, which is the proper way to do it. Reading a value again after an atomic operation does not guarantee you get the latest value in all cases (apparently especially on Windows release builds).
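Below is a minimal sketch of the pooling idea. It assumes hypothetical names (`RangeTaskPool`, `push()`, `run_all()`) and uses plain `std::thread` rather than Blender's actual `BLI_task` API. All pushed ranges are split into chunks up front, and one set of worker threads drains them. The thread start-up and synchronization cost is then presumably paid once for all tasks, instead of once per sequential call. Chunk claiming follows the rule from the note above: each thread relies only on the value returned by the atomic `fetch_add()`.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

/* Hypothetical pool (not Blender's API): collects several index-range tasks,
 * then processes all of them in a single parallel pass. */
class RangeTaskPool {
 public:
  using RangeFn = std::function<void(int index)>;

  /* Queue fn to be called for every index in [start, stop). */
  void push(int start, int stop, RangeFn fn)
  {
    tasks_.push_back({start, stop, std::move(fn)});
  }

  /* Split all queued tasks into chunks, then let num_threads workers claim
   * chunks from one shared atomic counter until none remain. */
  void run_all(int num_threads, int chunk_size)
  {
    assert(num_threads > 0 && chunk_size > 0);
    std::vector<Chunk> chunks;
    for (const Task &task : tasks_) {
      for (int begin = task.start; begin < task.stop; begin += chunk_size) {
        chunks.push_back({&task, begin, std::min(begin + chunk_size, task.stop)});
      }
    }

    std::atomic<std::size_t> next_chunk{0};
    auto worker = [&]() {
      while (true) {
        /* Only the value returned by fetch_add() identifies the chunk this
         * thread owns; re-reading next_chunk could see other threads' increments. */
        const std::size_t claimed = next_chunk.fetch_add(1);
        if (claimed >= chunks.size()) {
          return;
        }
        const Chunk &chunk = chunks[claimed];
        for (int i = chunk.begin; i < chunk.end; i++) {
          chunk.task->fn(i);
        }
      }
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; t++) {
      threads.emplace_back(worker);
    }
    for (std::thread &thread : threads) {
      thread.join();
    }
    tasks_.clear();
  }

 private:
  struct Task {
    int start, stop;
    RangeFn fn;
  };
  struct Chunk {
    const Task *task;
    int begin, end;
  };
  std::vector<Task> tasks_;
};
```

For a benchmark like the one above, each of the ten operations would be one `push()` call, followed by a single `run_all()`.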
2019-11-25	Revert "BLI_task: Add pooled threaded index range iterator."	Bastien Montagne
	This reverts commit f9028a3be1f77c01edca44a68894e2ba9d9cfb14. This is causing a weird heisenbug crash, only on Windows release builds... Reverting until we understand the issue.
2019-11-25	BLI_task: Add pooled threaded index range iterator.	Bastien Montagne
	This code allows pushing a set of different operations, all based on iterating over a range of indices, and then processing them all at once over multiple threads. This commit also adds unit tests for both the old un-pooled and the new pooled `task_parallel_range` family of functions, as well as some basic performance tests. As expected, this is mainly interesting for a relatively small number of individual tasks. For example, performance tests on a 32-thread machine, for a set of 10 different tasks, show the following improvements when using the pooled version instead of ten sequential calls to `BLI_task_parallel_range()`:
	| Num Items | Sequential | Pooled  | Speed-up |
	| --------- | ---------- | ------- | -------- |
	| 10K       | 365 us     | 138 us  | 2.5 x    |
	| 100K      | 877 us     | 530 us  | 1.66 x   |
	| 1000K     | 5521 us    | 4625 us | 1.25 x   |
	Differential Revision: https://developer.blender.org/D6189
2019-10-30	BLI_task: Add new generic `BLI_task_parallel_iterator()`.	Bastien Montagne
	This new function is part of the 'parallel for loops' functions. It takes an iterator callback that generates the items to be processed, in addition to the usual 'process' callback. This allows common code from BLI_task to be used for a wide range of custom iterators, without reinventing the whole task and data-chunk handling. It supports all the settings of `BLI_task_parallel_range()`, including dynamic and static scheduling (the latter if the total number of items is known), TLS data and its finalize callback, etc. One open question is whether we should provide user code with a spinlock by default, or require it to always handle its own synchronization. I kept it, since imho it will be needed very often, and creating one is pretty cheap even if unused...
	----------
	Additionally, this commit converts the (currently unused) `BLI_task_parallel_listbase()` to use that generic code. This was done mostly as a proof of concept, but performance-wise it shows some interesting data, roughly:
	- Very light processing (which should not be threaded anyway) is several times slower, which is expected due to more overhead in the loop management code.
	- Heavier processing can be up to 10% quicker, probably thanks to the switch from dynamic to static scheduling, which greatly reduces the locking needed to fill in the per-task chunks of data. A similar speed-up in the non-threaded case comes as a surprise though; it is not clear what explains it.
	While this conversion is not really needed, imho we should keep it (instead of the existing code for that function): complex handling logic is easier to maintain and improve when it lives in as few places as possible.
	Note: That work was initially done to make D5372 possible... Unfortunately, that one turned out to perform no better than the original code.
	Reviewed By: sergey
	Differential Revision: https://developer.blender.org/D5371
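Here is a minimal sketch of a generic parallel iterator. It assumes hypothetical names (`parallel_iterator()`, `parallel_list_sum()`, `Node`) and standard C++ threads; it does not follow the real `BLI_task_parallel_iterator()` signature. The pieces work as follows:

- The iterator callback is only ever called under a shared lock, with a `std::mutex` standing in for the spinlock discussed above.
- Each thread fetches up to `chunk_size` items per lock acquisition and processes them into its own TLS value.
- A finalize callback merges each thread's TLS at the end.

Only dynamic scheduling is shown; static scheduling for a known item count is left out. The linked-list sum loosely mirrors the listbase conversion.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

/* Hypothetical generic parallel iterator (not Blender's signature).
 * - next_item: produces the next item, returns false when exhausted. Always
 *   called with `lock` held, so it may advance shared iterator state.
 * - process: handles one item, writing only to the calling thread's TLS.
 * - finalize: merges one thread's TLS into the result, also under `lock`. */
template<typename Item, typename TLS>
void parallel_iterator(const std::function<bool(Item &)> &next_item,
                       const std::function<void(const Item &, TLS &)> &process,
                       const std::function<void(TLS &)> &finalize,
                       int num_threads,
                       int chunk_size)
{
  std::mutex lock;
  auto worker = [&]() {
    TLS tls{};
    std::vector<Item> chunk;
    while (true) {
      chunk.clear();
      {
        /* One lock acquisition fetches up to chunk_size items. */
        std::lock_guard<std::mutex> guard(lock);
        Item item{};
        while (int(chunk.size()) < chunk_size && next_item(item)) {
          chunk.push_back(item);
        }
      }
      if (chunk.empty()) {
        break;
      }
      for (const Item &item : chunk) {
        process(item, tls);
      }
    }
    std::lock_guard<std::mutex> guard(lock);
    finalize(tls);
  };
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; t++) {
    threads.emplace_back(worker);
  }
  for (std::thread &thread : threads) {
    thread.join();
  }
}

/* Singly linked node standing in for a ListBase link (hypothetical). */
struct Node {
  Node *next;
  long value;
};

/* Sum all values of a linked list using the generic iterator. */
long parallel_list_sum(Node *first, int num_threads, int chunk_size)
{
  Node *cursor = first; /* Shared iterator state, only touched under the lock. */
  long total = 0;
  std::function<bool(Node *&)> next_node = [&](Node *&out) {
    if (cursor == nullptr) {
      return false;
    }
    out = cursor;
    cursor = cursor->next;
    return true;
  };
  std::function<void(Node *const &, long &)> process = [](Node *const &node, long &sum) {
    sum += node->value;
  };
  std::function<void(long &)> finalize = [&](long &sum) { total += sum; };
  parallel_iterator<Node *, long>(next_node, process, finalize, num_threads, chunk_size);
  return total;
}
```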
2019-06-05	GTests: BLI_task: Add basic tests for BLI_task_parallel_listbase(), and some performance benchmarks.	Bastien Montagne
	Nothing special to mention about the regression test itself; it basically mimics the one for `BLI_task_parallel_mempool()`... The basic performance benchmarks do not tell us much, except that for very light processing of a listbase, even with 100k items, single-threaded code remains an order of magnitude faster than threaded code. Synchronization is just way too expensive in that case with the current code. This should be partially solvable with much bigger (and configurable) chunk sizes though (the current ones are just ridiculous for such cases ;) )...
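To make the chunk-size remark concrete, assume threads take a shared lock once per chunk of items they fetch. Handing out n items then costs about ceil(n / chunk_size) lock acquisitions, however many threads share the work. The tiny sketch below shows that arithmetic. The function name `chunk_fetches()` and the chunk sizes in the checks are illustrative assumptions, not Blender's actual values.

```cpp
#include <cassert>
#include <cstdint>

/* Hypothetical helper: number of chunk fetches (and so shared-lock
 * acquisitions) needed to hand out num_items items in chunks of chunk_size.
 * Ignores any extra fetch a thread makes to discover that the list is exhausted. */
int64_t chunk_fetches(int64_t num_items, int64_t chunk_size)
{
  assert(chunk_size > 0);
  return (num_items + chunk_size - 1) / chunk_size; /* ceil(num_items / chunk_size) */
}
```

With 100k items, going from one item per fetch to 1024 items per fetch cuts the lock traffic from 100000 acquisitions to 98.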