Age | Commit message (Collapse) | Author |
|
|
|
Function would not clear the static scheduler pointer, which lead to
crash (mem use after free) when trying to re-init and use the task API
again. Should not happen in Blender itself, but could in other cases
(like some future gtests ;) ).
|
|
Apply clang format as proposed in T53211.
For details on usage and instructions for migrating branches
without conflicts, see:
https://wiki.blender.org/wiki/Tools/ClangFormat
|
|
|
|
|
|
|
|
This reverts the changes from ce927e1 to put the main and job threads on
node 0. The problem is that all threads created as children from these
threads will inherit the NUMA node and so will end up on the same node.
This can be fixed case-by-case by assigning the NUMA node for every child
thread, however this is difficult for external libraries and OpenMP, and
out of our control for plugins like external renderers.
|
|
While \file doesn't need an argument, it can't have another doxy
command after it.
|
|
|
|
Move \ingroup onto same line to be more compact and
make it clear the file is in the group.
|
|
BF-admins agree to remove header information that isn't useful,
to reduce noise.
- BEGIN/END license blocks
Developers should add non license comments as separate comment blocks.
No need for separator text.
- Contributors
This is often invalid, outdated or misleading
especially when splitting files.
It's more useful to git-blame to find out who has developed the code.
See P901 for script to perform these edits.
|
|
|
|
The idea is to make main thread and job threads to be scheduled
on CPU dies which has direct access to memory (those are NUMA
nodes 0 and 2).
We also do this for new EPYC CPUs since their NUMA nodes 1 and 3
do have access but only to a higher range DDR slots. By preferring
nodes 0 and 2 on EPYC we make it so users with partially filled
DDR slots has fast memory access.
One thing which is not really solved yet is localization of
memory allocation: we do not guarantee that memory is allocated
on the closest to the NUMA node DDR slot and hope that memory
manager of OS is acting in favor of us.
|
|
|
|
|
|
Strip unindented comment blocks - mainly headers to avoid conflicts.
|
|
Suggested by Percy Ross Tiglao.
|
|
- Use BLI_threadpool_ prefix for (deprecated)
thread/listbase API.
- Use BLI_thread as prefix for other functions.
See P614 to apply instead of manually resolving conflicts.
|
|
- When returning the number of items in a collection use BLI_*_len()
- Keep _size() for size in bytes.
- Keep _count() for data structures that don't store length
(hint this isn't a simple getter).
See P611 to apply instead of manually resolving conflicts.
|
|
|
|
The issue was caused by SpinLock implementation in old pthreads we ar eusing on
Windows. Using newer one (2.10-rc) demonstrates same exact behavior. But likely
using own atomics and memory barrier based implementation solves the issue.
A bit annoying that we need to change such a core part of Blender just to make
specific CPU happy, but it's better to have artists happy on all computers.
There is no expected downsides of this change, but it is so called "works for
me" category. Let's see how it all goes.
|
|
|
|
|
|
Also, factorized internal code to get global mutex from its ID.
|
|
Number of system threads is quite difficult to change without need of blender
restart, so we can cache result of the systcalls (which are not really cheap)
in order to be able to call BLI_system_thread_count() without worrying of
performance issues in that function.
Reviewers: campbellbarton
Differential Revision: https://developer.blender.org/D1342
|
|
Avoids counting the whole queue when we only want to check whether it is empty or not!
|
|
Exposed when checking on T44871
|
|
For a long time this function was only intended to be used from the main thread,
but since out implementation of parallel range (which is currently only used by
mesh deform modifier) we might want to switch to threaded alloc from object
update thread.
Now we're using spinlock around the check, which makes the code safe to be used
from all over the place.
We might consider using a bit of atomics operations magic there, but it's not so
much important for now, this code is not used in the performance critical code
path.
|
|
Official Documentation:
http://www.blender.org/manual/render/workflows/multiview.html
Implemented Features
====================
Builtin Stereo Camera
* Convergence Mode
* Interocular Distance
* Convergence Distance
* Pivot Mode
Viewport
* Cameras
* Plane
* Volume
Compositor
* View Switch Node
* Image Node Multi-View OpenEXR support
Sequencer
* Image/Movie Strips 'Use Multiview'
UV/Image Editor
* Option to see Multi-View images in Stereo-3D or its individual images
* Save/Open Multi-View (OpenEXR, Stereo3D, individual views) images
I/O
* Save/Open Multi-View (OpenEXR, Stereo3D, individual views) images
Scene Render Views
* Ability to have an arbitrary number of views in the scene
Missing Bits
============
First rule of Multi-View bug report: If something is not working as it should *when Views is off* this is a severe bug, do mention this in the report.
Second rule is, if something works *when Views is off* but doesn't (or crashes) when *Views is on*, this is a important bug. Do mention this in the report.
Everything else is likely small todos, and may wait until we are sure none of the above is happening.
Apart from that there are those known issues:
* Compositor Image Node poorly working for Multi-View OpenEXR
(this was working prefectly before the 'Use Multi-View' functionality)
* Selecting camera from Multi-View when looking from camera is problematic
* Animation Playback (ctrl+F11) doesn't support stereo formats
* Wrong filepath when trying to play back animated scene
* Viewport Rendering doesn't support Multi-View
* Overscan Rendering
* Fullscreen display modes need to warn the user
* Object copy should be aware of views suffix
Acknowledgments
===============
* Francesco Siddi for the help with the original feature specs and design
* Brecht Van Lommel for the original review of the code and design early on
* Blender Foundation for the Development Fund to support the project wrap up
Final patch reviewers:
* Antony Riakiotakis (psy-fi)
* Campbell Barton (ideasman42)
* Julian Eisel (Severin)
* Sergey Sharybin (nazgul)
* Thomas Dinged (dingto)
Code contributors of the original branch in github:
* Alexey Akishin
* Gabriel Caraballo
|
|
|
|
physical threads as in 2.70a tag
|
|
also rename BLI_omp_thread_count -> BLI_system_thread_count_omp
|
|
|
|
- autodetect optimal default, which typically avoids HT threads
- can store setting in .blend per scene
- this does not touch general omp max threads, due i found other areas where the calculations are fitting for huge corecount
- Intel notes, some of the older generation processors with HyperThreading would not provide significant performance boost for FPU intensive applications. On those systems you might want to set OMP_NUM_THREADS = total number of cores (not total number of hardware theads).
|
|
Issue of this bug is that most part of fftw is not thread safe, only compute-intensive fftw_execute & co are.
Since smoke was affected by this issue as well, a global fftw mutex was added to BLI_threads.
Audaspace also uses fftw in one of its readers (AUD_BandPassReader.cpp),
but this is not an issue currently since this code is disabled in CMake/scons files.
There was another threading issue with smoke, we need to copy dm used by emit_from_derivedmesh(),
as it is modified by this func.
Reviewers: sergey, brecht
Reviewed By: brecht
CC: brecht
Differential Revision: https://developer.blender.org/D374
|
|
See: http://clang-omp.github.io
+ fix a longstanding bad include in darwin-config
|
|
|
|
|
|
|
|
Replaces ThreadedWorker and is gonna to be used
for threaded object update in the future and
some more upcoming changes.
But in general, it's to be used for any task
based subsystem in Blender.
Originally written by Brecht, with some fixes
and tweaks by self.
|
|
- Re-arrange locks, so no actual memory allocation
(which is relatively slow) happens from inside
the lock. operation system will take care of locks
which might be needed there on it's own.
- Use spin lock instead of mutex, since it's just
list operations happens from inside lock, no need
in mutex here.
- Use atomic operations for memory in use and total
used blocks counters.
This makes guarded allocator almost the same speed
as non-guarded one in files from Tube project.
There're still MemHead/MemTail overhead which might
be bad for CPU cache utilization
|
|
data.
Now the viewport rendering thread will lock the main thread while it is exporting
objects to render data. This is not ideal if you have big scenes that might block
the UI, but Cycles does the same, and it's fairly quick because the same evaluated
mesh can be used as for viewport drawing. It's the only way to get things stable
until the thread safe dependency graph is here.
This adds a mechanism to the job system for jobs to lock the main thread, using a
new 'ticket mutex lock' which is a mutex lock that gives priority to the first
thread that tries to lock the mutex.
Still to solve: undo/redo crashes.
|
|
|
|
Now it works for blender internal, cycles and other multithreading code in
Blender in both background and UI mode.
|
|
Added a mutex lock for smoke data access. The render was already working with a
copy of the volume data, so it's just a short lock to copy things and should not
block the UI much.
|
|
|
|
to include a few more that gcc is using too.
|
|
non-threadsafe usage of guarded allocator.
Also added small chunk of code to check consistency of begin/end
threaded malloc.
All this additional checks are commented and wouldn't affect on
builds, however found them helpful to troubleshoot issues so
decided to commit it to SVN.
|
|
This commit makes BKE_image_acquire_ibuf referencing result, which means once
some area requested for image buffer, it'll be guaranteed this buffer wouldn't
be freed by image signal.
To de-reference buffer BKE_image_release_ibuf should now always be used.
To make referencing working correct we can not rely on result of
image_get_ibuf_threadsafe called outside from thread lock. This is so because
we need to guarantee getting image buffer from list of loaded buffers and it's
referencing happens atomic. Without lock here it is possible that between call
of image_get_ibuf_threadsafe and referencing the buffer IMA_SIGNAL_FREE would
be called. Image signal handling too is blocking now to prevent such a
situation.
Threads are locking by spinlock, which are faster than mutexes. There were some
slowdown reports in the past about render slowdown when using OSX on Xeon CPU.
It shouldn't happen with spin locks, but more tests on different hardware would
be really welcome. So far can not see speed regressions on own computers.
This commit also removes BKE_image_get_ibuf, because it was not so intuitive
when get_ibuf and acquire_ibuf should be used.
Thanks to Ton and Brecht for discussion/review :)
|
|
Replace old color pipeline which was supporting linear/sRGB color spaces
only with OpenColorIO-based pipeline.
This introduces two configurable color spaces:
- Input color space for images and movie clips. This space is used to convert
images/movies from color space in which file is saved to Blender's linear
space (for float images, byte images are not internally converted, only input
space is stored for such images and used later).
This setting could be found in image/clip data block settings.
- Display color space which defines space in which particular display is working.
This settings could be found in scene's Color Management panel.
When render result is being displayed on the screen, apart from converting image
to display space, some additional conversions could happen.
This conversions are:
- View, which defines tone curve applying before display transformation.
These are different ways to view the image on the same display device.
For example it could be used to emulate film view on sRGB display.
- Exposure affects on image exposure before tone map is applied.
- Gamma is post-display gamma correction, could be used to match particular
display gamma.
- RGB curves are user-defined curves which are applying before display
transformation, could be used for different purposes.
All this settings by default are only applying on render result and does not
affect on other images. If some particular image needs to be affected by this
transformation, "View as Render" setting of image data block should be set to
truth. Movie clips are always affected by all display transformations.
This commit also introduces configurable color space in which sequencer is
working. This setting could be found in scene's Color Management panel and
it should be used if such stuff as grading needs to be done in color space
different from sRGB (i.e. when Film view on sRGB display is use, using VD16
space as sequencer's internal space would make grading working in space
which is close to the space using for display).
Some technical notes:
- Image buffer's float buffer is now always in linear space, even if it was
created from 16bit byte images.
- Space of byte buffer is stored in image buffer's rect_colorspace property.
- Profile of image buffer was removed since it's not longer meaningful.
- OpenGL and GLSL is supposed to always work in sRGB space. It is possible
to support other spaces, but it's quite large project which isn't so
much important.
- Legacy Color Management option disabled is emulated by using None display.
It could have some regressions, but there's no clear way to avoid them.
- If OpenColorIO is disabled on build time, it should make blender behaving
in the same way as previous release with color management enabled.
More details could be found at this page (more details would be added soon):
http://wiki.blender.org/index.php/Dev:Ref/Release_Notes/2.64/Color_Management
--
Thanks to Xavier Thomas, Lukas Toene for initial work on OpenColorIO
integration and to Brecht van Lommel for some further development and code/
usecase review!
|