Age | Commit message (Collapse) | Author |
|
|
|
|
|
This replaces header include guards with `#pragma once`.
A couple of include guards are not removed yet (e.g. `__RNA_TYPES_H__`),
because they are used in other places.
This patch has been generated by P1561 followed by `make format`.
Differential Revision: https://developer.blender.org/D8466
|
|
The MSVC atomic function is defined for an unsigned type.
Not sure why this became an issue after switch to TBB by default,
maybe some CFLAGS changed to be more strict after that.
|
|
When building without TBB use native to the platform spin lock
implementation. For Windows it is an atomic-based busy-wait,
for Linux it is pthreads' spin lock.
For macOS it is a mutex lock. The reason behind this is to stop
using atomics library which has been declared deprecated in SDK
version 10.12. So this changes fixes a lot of noisy warnings on
the newer SDK.
Differential Revision: https://developer.blender.org/D8180
|
|
Previously this would be enabled when threads were used, but threads are now
basically always in use so there is no point. Further, this is only needed for
guarded allocation with --debug-memory which is not performance critical.
|
|
Surrounding includes with an 'extern "C"' block is not necessary anymore.
Also that made it harder to add any C++ code to some headers, or include headers
that have "optional" C++ code like `MEM_guardedalloc.h`.
I tested compilation on linux and windows (and got help from @LazyDodo).
If this still breaks compilation due to some linker error, the header containing
the symbol in question is probably missing an 'extern "C"' block.
Differential Revision: https://developer.blender.org/D7653
|
|
This patch enables TBB as the default task scheduler. TBB stands for Threading Building Blocks and is developed by Intel. The library contains several threading patters. This patch maps blenders BLI_task_* function to their counterpart. After this patch we can add more patterns. A promising one is TBB:graph that can be used for depsgraph, draw manager and compositor.
Performance changes depends on the actual hardware. It was tested on different hardwares from laptops to workstations and we didn't detected any downgrade of the performance.
* Linux Xeon E5-2699 v4 got FPS boost from 12 to 17 using Spring's 04_010_A.anim.blend.
* AMD Ryzen Threadripper 2990WX 32-Core Animation playback goes from 9.5-10.5 FPS to 13.0-14.0 FPS on Agent 327 , 10_03_B.anim.blend.
Reviewed By: brecht, sergey
Differential Revision: https://developer.blender.org/D7475
|
|
|
|
This reverts commit 9d282d7a8d689a17ae58e94453ae99a41e91b701.
os_unfair_lock requires macOS 10.12 or newer.
|
|
OSSplinLock is a deprecated API, os_unfair_lock is its successor.
This reduces the number of warnings when building on macOS.
|
|
Apply clang format as proposed in T53211.
For details on usage and instructions for migrating branches
without conflicts, see:
https://wiki.blender.org/wiki/Tools/ClangFormat
|
|
While \file doesn't need an argument, it can't have another doxy
command after it.
|
|
Move \ingroup onto same line to be more compact and
make it clear the file is in the group.
|
|
BF-admins agree to remove header information that isn't useful,
to reduce noise.
- BEGIN/END license blocks
Developers should add non license comments as separate comment blocks.
No need for separator text.
- Contributors
This is often invalid, outdated or misleading
especially when splitting files.
It's more useful to git-blame to find out who has developed the code.
See P901 for script to perform these edits.
|
|
The idea is to make main thread and job threads to be scheduled
on CPU dies which has direct access to memory (those are NUMA
nodes 0 and 2).
We also do this for new EPYC CPUs since their NUMA nodes 1 and 3
do have access but only to a higher range DDR slots. By preferring
nodes 0 and 2 on EPYC we make it so users with partially filled
DDR slots has fast memory access.
One thing which is not really solved yet is localization of
memory allocation: we do not guarantee that memory is allocated
on the closest to the NUMA node DDR slot and hope that memory
manager of OS is acting in favor of us.
|
|
|
|
|
|
Strip unindented comment blocks - mainly headers to avoid conflicts.
|
|
- Use BLI_threadpool_ prefix for (deprecated)
thread/listbase API.
- Use BLI_thread as prefix for other functions.
See P614 to apply instead of manually resolving conflicts.
|
|
- When returning the number of items in a collection use BLI_*_len()
- Keep _size() for size in bytes.
- Keep _count() for data structures that don't store length
(hint this isn't a simple getter).
See P611 to apply instead of manually resolving conflicts.
|
|
|
|
The issue was caused by SpinLock implementation in old pthreads we ar eusing on
Windows. Using newer one (2.10-rc) demonstrates same exact behavior. But likely
using own atomics and memory barrier based implementation solves the issue.
A bit annoying that we need to change such a core part of Blender just to make
specific CPU happy, but it's better to have artists happy on all computers.
There is no expected downsides of this change, but it is so called "works for
me" category. Let's see how it all goes.
|
|
This commit contains all the changes required for most optimal maximum threads
number bump. This is needed to avoid possibly unneeded initialization or data
allocation on systems with lower threads count.
TODO: Still need to review arrays in render data structures from render_types.h,
P.S. We might remove actual bump of max threads from this patch, so when we'll
be applying the patch we can do all the preparation work and then do actual
bump of max threads.
Reviewers: mont29, campbellbarton
Reviewed By: mont29, campbellbarton
Maniphest Tasks: T43306
Differential Revision: https://developer.blender.org/D1343
|
|
|
|
Also, factorized internal code to get global mutex from its ID.
|
|
Avoids counting the whole queue when we only want to check whether it is empty or not!
|
|
Official Documentation:
http://www.blender.org/manual/render/workflows/multiview.html
Implemented Features
====================
Builtin Stereo Camera
* Convergence Mode
* Interocular Distance
* Convergence Distance
* Pivot Mode
Viewport
* Cameras
* Plane
* Volume
Compositor
* View Switch Node
* Image Node Multi-View OpenEXR support
Sequencer
* Image/Movie Strips 'Use Multiview'
UV/Image Editor
* Option to see Multi-View images in Stereo-3D or its individual images
* Save/Open Multi-View (OpenEXR, Stereo3D, individual views) images
I/O
* Save/Open Multi-View (OpenEXR, Stereo3D, individual views) images
Scene Render Views
* Ability to have an arbitrary number of views in the scene
Missing Bits
============
First rule of Multi-View bug report: If something is not working as it should *when Views is off* this is a severe bug, do mention this in the report.
Second rule is, if something works *when Views is off* but doesn't (or crashes) when *Views is on*, this is a important bug. Do mention this in the report.
Everything else is likely small todos, and may wait until we are sure none of the above is happening.
Apart from that there are those known issues:
* Compositor Image Node poorly working for Multi-View OpenEXR
(this was working prefectly before the 'Use Multi-View' functionality)
* Selecting camera from Multi-View when looking from camera is problematic
* Animation Playback (ctrl+F11) doesn't support stereo formats
* Wrong filepath when trying to play back animated scene
* Viewport Rendering doesn't support Multi-View
* Overscan Rendering
* Fullscreen display modes need to warn the user
* Object copy should be aware of views suffix
Acknowledgments
===============
* Francesco Siddi for the help with the original feature specs and design
* Brecht Van Lommel for the original review of the code and design early on
* Blender Foundation for the Development Fund to support the project wrap up
Final patch reviewers:
* Antony Riakiotakis (psy-fi)
* Campbell Barton (ideasman42)
* Julian Eisel (Severin)
* Sergey Sharybin (nazgul)
* Thomas Dinged (dingto)
Code contributors of the original branch in github:
* Alexey Akishin
* Gabriel Caraballo
|
|
2.70 for non Apple systems.
Also refactored the code that restores the previous openmp thread count.
The logic here was weird, mostly due to all the commit madness with
Apple openmp support. The restored thread count though should not depend
on the on/off state of threaded sculpting (since it has to do with
systems other than sculpting only). For OSX threads are restored to the
system thread count but Jens should recheck here.
|
|
also rename BLI_omp_thread_count -> BLI_system_thread_count_omp
|
|
- autodetect optimal default, which typically avoids HT threads
- can store setting in .blend per scene
- this does not touch general omp max threads, due i found other areas where the calculations are fitting for huge corecount
- Intel notes, some of the older generation processors with HyperThreading would not provide significant performance boost for FPU intensive applications. On those systems you might want to set OMP_NUM_THREADS = total number of cores (not total number of hardware theads).
|
|
Issue of this bug is that most part of fftw is not thread safe, only compute-intensive fftw_execute & co are.
Since smoke was affected by this issue as well, a global fftw mutex was added to BLI_threads.
Audaspace also uses fftw in one of its readers (AUD_BandPassReader.cpp),
but this is not an issue currently since this code is disabled in CMake/scons files.
There was another threading issue with smoke, we need to copy dm used by emit_from_derivedmesh(),
as it is modified by this func.
Reviewers: sergey, brecht
Reviewed By: brecht
CC: brecht
Differential Revision: https://developer.blender.org/D374
|
|
Solved by adding RW lock to BKE_vfont_to_curve.
So now all the threads are allowed to read chars from ghash,
but they'll be locked as soon as one thread would need to load
more chars from font to the ghash.
|
|
Replaces ThreadedWorker and is gonna to be used
for threaded object update in the future and
some more upcoming changes.
But in general, it's to be used for any task
based subsystem in Blender.
Originally written by Brecht, with some fixes
and tweaks by self.
|
|
- Re-arrange locks, so no actual memory allocation
(which is relatively slow) happens from inside
the lock. operation system will take care of locks
which might be needed there on it's own.
- Use spin lock instead of mutex, since it's just
list operations happens from inside lock, no need
in mutex here.
- Use atomic operations for memory in use and total
used blocks counters.
This makes guarded allocator almost the same speed
as non-guarded one in files from Tube project.
There're still MemHead/MemTail overhead which might
be bad for CPU cache utilization
|
|
There's one thing we didn't foresee from the beginning,
which is apparently TLS is only available in OSX starting
from version 10.7, and we still do support of 10.6.
After recent Brecht's changes about locked viewport
while initializing BI render this TLS is not needed
in trunk anymore. So reverting this chunk of base
iteration to use static variable. But leaving all the
other static variables warped into context still, it
should help a bit in the future refactor.
Real fix would be to have some kind of graph context
evaluation structure which would be passing to update
routines (which will solve threaded mballs update) and
making depsgraph responsible for getting a motherball.
But this is all for GSoC project.
|
|
This time issue was caused by static variables used in
BKE_scene_base_iter_next function.
Change is not so much ultimate actually, but didn't
find more clear solution for now. So the changes are:
- Wrap almost all the static variables into own context-
like structure, which is owned by the callee function
and getting passed to the iteration function.
- Recursion detection wasn't possible with such approach,
so recursion detection still uses static in_next_object
variable, but which is now stored in thread local
storage (TLS, or thread variable if this names are more
clear for you).
This makes code thread-safe, but for sure final solution
shall be completely different. Ideally, dependency graph
shall be possible to answer on question "which object is
a motherball for this metaball". This will avoid iterating
via all the bases, objects and duplis just to get needed
motherball.
Further, metaball evaluation ideally will use the same
kind of depsgraph filtering, which will get result for
question like "which objects belongs to this group of
metaballs".
But this ideal things are to be solved in Joshua's and
mind GSoC projects.
Tested on linux (gcc and clang) and windows (msvc2008),
hopefully no compilation error will happen.
Thanks to Brecht for reviewing the change and getting
feedback for other possible ways we've dicussed!
|
|
data.
Now the viewport rendering thread will lock the main thread while it is exporting
objects to render data. This is not ideal if you have big scenes that might block
the UI, but Cycles does the same, and it's fairly quick because the same evaluated
mesh can be used as for viewport drawing. It's the only way to get things stable
until the thread safe dependency graph is here.
This adds a mechanism to the job system for jobs to lock the main thread, using a
new 'ticket mutex lock' which is a mutex lock that gives priority to the first
thread that tries to lock the mutex.
Still to solve: undo/redo crashes.
|
|
Now it works for blender internal, cycles and other multithreading code in
Blender in both background and UI mode.
|
|
Added a mutex lock for smoke data access. The render was already working with a
copy of the volume data, so it's just a short lock to copy things and should not
block the UI much.
|
|
This commit makes BKE_image_acquire_ibuf referencing result, which means once
some area requested for image buffer, it'll be guaranteed this buffer wouldn't
be freed by image signal.
To de-reference buffer BKE_image_release_ibuf should now always be used.
To make referencing working correct we can not rely on result of
image_get_ibuf_threadsafe called outside from thread lock. This is so because
we need to guarantee getting image buffer from list of loaded buffers and it's
referencing happens atomic. Without lock here it is possible that between call
of image_get_ibuf_threadsafe and referencing the buffer IMA_SIGNAL_FREE would
be called. Image signal handling too is blocking now to prevent such a
situation.
Threads are locking by spinlock, which are faster than mutexes. There were some
slowdown reports in the past about render slowdown when using OSX on Xeon CPU.
It shouldn't happen with spin locks, but more tests on different hardware would
be really welcome. So far can not see speed regressions on own computers.
This commit also removes BKE_image_get_ibuf, because it was not so intuitive
when get_ibuf and acquire_ibuf should be used.
Thanks to Ton and Brecht for discussion/review :)
|
|
Replace old color pipeline which was supporting linear/sRGB color spaces
only with OpenColorIO-based pipeline.
This introduces two configurable color spaces:
- Input color space for images and movie clips. This space is used to convert
images/movies from color space in which file is saved to Blender's linear
space (for float images, byte images are not internally converted, only input
space is stored for such images and used later).
This setting could be found in image/clip data block settings.
- Display color space which defines space in which particular display is working.
This settings could be found in scene's Color Management panel.
When render result is being displayed on the screen, apart from converting image
to display space, some additional conversions could happen.
This conversions are:
- View, which defines tone curve applying before display transformation.
These are different ways to view the image on the same display device.
For example it could be used to emulate film view on sRGB display.
- Exposure affects on image exposure before tone map is applied.
- Gamma is post-display gamma correction, could be used to match particular
display gamma.
- RGB curves are user-defined curves which are applying before display
transformation, could be used for different purposes.
All this settings by default are only applying on render result and does not
affect on other images. If some particular image needs to be affected by this
transformation, "View as Render" setting of image data block should be set to
truth. Movie clips are always affected by all display transformations.
This commit also introduces configurable color space in which sequencer is
working. This setting could be found in scene's Color Management panel and
it should be used if such stuff as grading needs to be done in color space
different from sRGB (i.e. when Film view on sRGB display is use, using VD16
space as sequencer's internal space would make grading working in space
which is close to the space using for display).
Some technical notes:
- Image buffer's float buffer is now always in linear space, even if it was
created from 16bit byte images.
- Space of byte buffer is stored in image buffer's rect_colorspace property.
- Profile of image buffer was removed since it's not longer meaningful.
- OpenGL and GLSL is supposed to always work in sRGB space. It is possible
to support other spaces, but it's quite large project which isn't so
much important.
- Legacy Color Management option disabled is emulated by using None display.
It could have some regressions, but there's no clear way to avoid them.
- If OpenColorIO is disabled on build time, it should make blender behaving
in the same way as previous release with color management enabled.
More details could be found at this page (more details would be added soon):
http://wiki.blender.org/index.php/Dev:Ref/Release_Notes/2.64/Color_Management
--
Thanks to Xavier Thomas, Lukas Toene for initial work on OpenColorIO
integration and to Brecht van Lommel for some further development and code/
usecase review!
|
|
Issue was caused by threading conflict between compositor output node which
is freeing buffers used by render result image and image draw code which
could use buffers at the same time as compositor frees this buffers.
Solved by adding adding lock around viewer image invalidation and image
drawing.
Use renamed LOCK_PREVIEW mutex for this, which si not called LOCK_DRAW_IMAGE.
With new compositor locking for preview is not needed so it could be removed.
Added the same lock around viewer operation which also frees buffers used
by viewer image. It's actually quite difficult to check whether this is
indeed needed. This code seems to be using acquire/release technique, but
somehow acquiring ImBuf before invalidating it in compositor operation
doesn't resolve the issue, so probably it's not actually locking acquire
and things should be checked deeper.
|
|
|
|
waiting in scheduling and worker threads instead of continuous loops with sleep times. This should help reduce unnecessary wait times in Tile.
|
|
|
|
|
|
merge it didn't need to be) - now rendering uses its better if its threadsafe.
|
|
without this each face would get a solid color, this is the same method used in object mode.
also copy BLI_array.h fix from bmesh branch.
|
|
without the underscores these clogged up the namespace for autocompleation which was annoying.
|