Age | Commit message (Collapse) | Author |
|
This is the conventional way of dealing with unused arguments in C++,
since it works on all compilers.
Regex find and replace: `UNUSED\((\w+)\)` -> `/*$1*/`
|
|
https://wiki.blender.org/wiki/Style_Guide/C_Cpp#C.2B.2B_Type_Cast
This was discussed in https://devtalk.blender.org/t/rfc-style-guide-for-type-casts-in-c-code/25907.
|
|
|
|
|
|
In large node setup the threading overhead was sometimes very significant.
That's especially true when most nodes do very little work.
This commit improves the scheduling by not using multi-threading in many
cases unless it's likely that it will be worth it. For more details see the comments
in `BLI_lazy_threading.hh`.
Differential Revision: https://developer.blender.org/D15976
|
|
|
|
|
|
|
|
This refactors the geometry nodes evaluation system. No changes for the
user are expected. At a high level the goals are:
* Support using geometry nodes outside of the geometry nodes modifier.
* Support using the evaluator infrastructure for other purposes like field evaluation.
* Support more nodes, especially when many of them are disabled behind switch nodes.
* Support doing preprocessing on node groups.
For more details see T98492.
There are fairly detailed comments in the code, but here is a high level overview
for how it works now:
* There is a new "lazy-function" system. It is similar in spirit to the multi-function
system but with different goals. Instead of optimizing throughput for highly
parallelizable work, this system is designed to compute only the data that is actually
necessary. What data is necessary can be determined dynamically during evaluation.
Many lazy-functions can be composed in a graph to form a new lazy-function, which can
again be used in a graph etc.
* Each geometry node group is converted into a lazy-function graph prior to evaluation.
To evaluate geometry nodes, one then just has to evaluate that graph. Node groups are
no longer inlined into their parents.
Next steps for the evaluation system is to reduce the use of threads in some situations
to avoid overhead. Many small node groups don't benefit from multi-threading at all.
This is much easier to do now because not everything has to be inlined in one huge
node tree anymore.
Differential Revision: https://developer.blender.org/D15914
|
|
This potentially overallocates buffers so that they are usable
for more data types, which allows buffers to be reused more
easily. That leads to fewer separate allocations and improved
cache usage (in one of my test files the number of separate
allocations went down from 1826 to 1555).
|
|
This leads to a 5-10% performance improvement in my benchmark
that runs a procedure many times on a single element.
|
|
This makes calculation of selected indices slightly faster when the
input is a virtual array (the direct output of various nodes like
Face Area, etc). The utility can be helpful for other areas that
need to find selected indices besides field evaluation.
With the face area node used as a selection with 4 million faces,
the speedup is 3.51 ms to 3.39 ms, just a slight speedup.
Differential Revision: https://developer.blender.org/D15127
|
|
This improves performance of the procedure executor on secondary metrics
(i.e. not for the main use case when many elements are processed together,
but for the use case when a single element is processed at a time).
In my benchmark I'm measuring a 50-60% improvement:
* Procedure with a single function (executed many times): `5.8s -> 2.7s`.
* Procedure with 1000 functions (executed many times): `2.4 -> 1.0s`.
The speedup is mainly achieved in multiple ways:
* Store an `Array` of variable states, instead of a map. The array is indexed
with indices stored in each variable. This also avoids separately allocating
variable states.
* Move less data around in the scheduler and use a `Stack` instead of `Map`.
`Map` was used before because it allows for some optimizations that might
be more important in the future, but they don't matter right now (e.g. joining
execution paths that diverged earlier).
* Avoid memory allocations by giving the `LinearAllocator` some memory
from the stack.
|
|
The separate geometry and delete geometry nodes often invert the
selection so that deleting elements from a geometry can be implemented
as copying the opposite selection of elements. This should make the two
nodes faster in some cases, since the generic versions of selection
creation functions (i.e. from d3a1e9cbb92cca04e) are used instead
of the single threaded code that was used for this node.
The change also makes the deletion/separation code easier to
understand because it doesn't have to pass around the inversion.
|
|
Goals:
* Better high level control over where devirtualization occurs. There is always
a trade-off between performance and compile-time/binary-size.
* Simplify using array devirtualization.
* Better performance for cases where devirtualization wasn't used before.
Many geometry nodes accept fields as inputs. Internally, that means that the
execution functions have to accept so called "virtual arrays" as inputs. Those
can be e.g. actual arrays, just single values, or lazily computed arrays.
Due to these different possible virtual arrays implementations, access to
individual elements is slower than it would be if everything was just a normal
array (access does through a virtual function call). For more complex execution
functions, this overhead does not matter, but for small functions (like a simple
addition) it very much does. The virtual function call also prevents the compiler
from doing some optimizations (e.g. loop unrolling and inserting simd instructions).
The solution is to "devirtualize" the virtual arrays for small functions where the
overhead is measurable. Essentially, the function is generated many times with
different array types as input. Then there is a run-time dispatch that calls the
best implementation. We have been doing devirtualization in e.g. math nodes
for a long time already. This patch just generalizes the concept and makes it
easier to control. It also makes it easier to investigate the different trade-offs
when it comes to devirtualization.
Nodes that we've optimized using devirtualization before didn't get a speedup.
However, a couple of nodes are using devirtualization now, that didn't before.
Those got a 2-4x speedup in common cases.
* Map Range
* Random Value
* Switch
* Combine XYZ
Differential Revision: https://developer.blender.org/D14628
|
|
Since {rBeae36be372a6b16ee3e76eff0485a47da4f3c230} the distinction
between float and byte colors is more explicit in the ui. So far, geometry
nodes couldn't really deal with byte colors in general. This patch fixes that.
There is still only one color socket, which contains float colors. Conversion
to and from byte colors is done when read from or writing to attributes.
* Support writing to byte color attributes in Store Named Attribute node.
* Support converting to/from byte color in attribute conversion operator.
* Support propagating byte color attributes.
* Add all the implicit conversions from byte colors to the other types.
* Display byte colors as integers in spreadsheet.
Differential Revision: https://developer.blender.org/D14705
|
|
The issue was that the executor would forget about the caller provided
storage if the variable is destructed.
|
|
This improves performance e.g. when creating an integer attribute
based on an index field. For 4 million vertices, I measured a speedup
from 3.5 ms to 1.2 ms.
|
|
Value-initialization has the potential to be more efficient.
Also, the code becomes simpler.
|
|
When boolean fields are evaluated and used as selections, we create
a vector of indices. This process is currently single-threaded, but
226f0c4fef7e7792c added a more optimized multi-threaded version
of this process. It's simple to use this in the field evaluator.
I tested this with the set position node and a random
value node set to boolean mode on a Ryzen 2700x:
| | Before | After | Improvement |
| 10% Selected | 40.5 ms | 29.0 ms | 1.4x |
| 90% Selected | 115 ms | 45.3 ms | 2.5x |
In the future there could be a specialized version for non-span
virtual array selections that uses `materialize` to lower virtual
call overhead.
Differential Revision: https://developer.blender.org/D14436
|
|
This is a follow up to rB2252bc6a5527cd7360d1ccfe7a2d1bc640a8dfa6.
|
|
For more detail about `CPPType`, see `BLI_cpp_type.hh` and D14367.
Differential Revision: https://developer.blender.org/D14367
|
|
|
|
Use a shorter/simpler license convention, stops the header taking so
much space.
Follow the SPDX license specification: https://spdx.org/licenses
- C/C++/objc/objc++
- Python
- Shell Scripts
- CMake, GNUmakefile
While most of the source tree has been included
- `./extern/` was left out.
- `./intern/cycles` & `./intern/atomic` are also excluded because they
use different header conventions.
doc/license/SPDX-license-identifiers.txt has been added to list SPDX all
used identifiers.
See P2788 for the script that automated these edits.
Reviewed By: brecht, mont29, sergey
Ref D14069
|
|
This commit adds infrastructure for 8 bit signed integer attributes.
This can be useful given the discussion in T94193, where we want to
store spline type, Bezier handle type, and other small enums as
attributes.
This is only exposed in the interface in the attribute lists, so it
shouldn't be an option in geometry nodes, at least for now.
I expect that this type won't be used directly very often, it
should mostly be cast to an enum type. However, with support
for 8 bit integers, it also makes sense to add things like mixing
implementations for consistency.
Differential Revision: https://developer.blender.org/D13721
|
|
Every translation unit that included the modified headers generated
some extra code, even though it was not used. This adds unnecessary
compile time overhead and is annoying when investigating the
generated assembly.
|
|
|
|
This patch implements the vector types (i.e:`float2`) by making heavy
usage of templating. All vector functions are now outside of the vector
classes (inside the `blender::math` namespace) and are not vector size
dependent for the most part.
In the ongoing effort to make shaders less GL centric, we are aiming
to share more code between GLSL and C++ to avoid code duplication.
####Motivations:
- We are aiming to share UBO and SSBO structures between GLSL and C++.
This means we will use many of the existing vector types and others
we currently don't have (uintX, intX). All these variations were
asking for many more code duplication.
- Deduplicate existing code which is duplicated for each vector size.
- We also want to share small functions. Which means that vector
functions should be static and not in the class namespace.
- Reduce friction to use these types in new projects due to their
incompleteness.
- The current state of the `BLI_(float|double|mpq)(2|3|4).hh` is a
bit of a let down. Most clases are incomplete, out of sync with each
others with different codestyles, and some functions that should be
static are not (i.e: `float3::reflect()`).
####Upsides:
- Still support `.x, .y, .z, .w` for readability.
- Compact, readable and easilly extendable.
- All of the vector functions are available for all the vectors types
and can be restricted to certain types. Also template specialization
let us define exception for special class (like mpq).
- With optimization ON, the compiler unroll the loops and performance
is the same.
####Downsides:
- Might impact debugability. Though I would arge that the bugs are
rarelly caused by the vector class itself (since the operations are
quite trivial) but by the type conversions.
- Might impact compile time. I did not saw a significant impact since
the usage is not really widespread.
- Functions needs to be rewritten to support arbitrary vector length.
For instance, one can't call `len_squared_v3v3` in
`math::length_squared()` and call it a day.
- Type cast does not work with the template version of the `math::`
vector functions. Meaning you need to manually cast `float *` and
`(float *)[3]` to `float3` for the function calls.
i.e: `math::distance_squared(float3(nearest.co), positions[i]);`
- Some parts might loose in readability:
`float3::dot(v1.normalized(), v2.normalized())`
becoming
`math::dot(math::normalize(v1), math::normalize(v2))`
But I propose, when appropriate, to use
`using namespace blender::math;` on function local or file scope to
increase readability.
`dot(normalize(v1), normalize(v2))`
####Consideration:
- Include back `.length()` method. It is quite handy and is more C++
oriented.
- I considered the GLM library as a candidate for replacement. It felt
like too much for what we need and would be difficult to extend / modify
to our needs.
- I used Macros to reduce code in operators declaration and potential
copy paste bugs. This could reduce debugability and could be reverted.
- This touches `delaunay_2d.cc` and the intersection code. I would like
to know @howardt opinion on the matter.
- The `noexcept` on the copy constructor of `mpq(2|3)` is being removed.
But according to @JacquesLucke it is not a real problem for now.
I would like to give a huge thanks to @JacquesLucke who helped during this
and pushed me to reduce the duplication further.
Reviewed By: brecht, sergey, JacquesLucke
Differential Revision: https://developer.blender.org/D13791
|
|
Includes unwanted changes
This reverts commit 46e049d0ce2bce2f53ddc41a0dbbea2969d00a5d.
|
|
This patch implements the vector types (i.e:`float2`) by making heavy
usage of templating. All vector functions are now outside of the vector
classes (inside the `blender::math` namespace) and are not vector size
dependent for the most part.
In the ongoing effort to make shaders less GL centric, we are aiming
to share more code between GLSL and C++ to avoid code duplication.
####Motivations:
- We are aiming to share UBO and SSBO structures between GLSL and C++.
This means we will use many of the existing vector types and others
we currently don't have (uintX, intX). All these variations were
asking for many more code duplication.
- Deduplicate existing code which is duplicated for each vector size.
- We also want to share small functions. Which means that vector
functions should be static and not in the class namespace.
- Reduce friction to use these types in new projects due to their
incompleteness.
- The current state of the `BLI_(float|double|mpq)(2|3|4).hh` is a
bit of a let down. Most clases are incomplete, out of sync with each
others with different codestyles, and some functions that should be
static are not (i.e: `float3::reflect()`).
####Upsides:
- Still support `.x, .y, .z, .w` for readability.
- Compact, readable and easilly extendable.
- All of the vector functions are available for all the vectors types
and can be restricted to certain types. Also template specialization
let us define exception for special class (like mpq).
- With optimization ON, the compiler unroll the loops and performance
is the same.
####Downsides:
- Might impact debugability. Though I would arge that the bugs are
rarelly caused by the vector class itself (since the operations are
quite trivial) but by the type conversions.
- Might impact compile time. I did not saw a significant impact since
the usage is not really widespread.
- Functions needs to be rewritten to support arbitrary vector length.
For instance, one can't call `len_squared_v3v3` in
`math::length_squared()` and call it a day.
- Type cast does not work with the template version of the `math::`
vector functions. Meaning you need to manually cast `float *` and
`(float *)[3]` to `float3` for the function calls.
i.e: `math::distance_squared(float3(nearest.co), positions[i]);`
- Some parts might loose in readability:
`float3::dot(v1.normalized(), v2.normalized())`
becoming
`math::dot(math::normalize(v1), math::normalize(v2))`
But I propose, when appropriate, to use
`using namespace blender::math;` on function local or file scope to
increase readability.
`dot(normalize(v1), normalize(v2))`
####Consideration:
- Include back `.length()` method. It is quite handy and is more C++
oriented.
- I considered the GLM library as a candidate for replacement. It felt
like too much for what we need and would be difficult to extend / modify
to our needs.
- I used Macros to reduce code in operators declaration and potential
copy paste bugs. This could reduce debugability and could be reverted.
- This touches `delaunay_2d.cc` and the intersection code. I would like
to know @howardt opinion on the matter.
- The `noexcept` on the copy constructor of `mpq(2|3)` is being removed.
But according to @JacquesLucke it is not a real problem for now.
I would like to give a huge thanks to @JacquesLucke who helped during this
and pushed me to reduce the duplication further.
Reviewed By: brecht, sergey, JacquesLucke
Differential Revision: https://developer.blender.org/D13791
|
|
Reverted because the commit removes a lot of commits.
This reverts commit a2c1c368af48644fa8995ecbe7138cc0d7900c30.
|
|
This patch implements the vector types (i.e:float2) by making heavy
usage of templating. All vector functions are now outside of the vector
classes (inside the blender::math namespace) and are not vector size
dependent for the most part.
In the ongoing effort to make shaders less GL centric, we are aiming
to share more code between GLSL and C++ to avoid code duplication.
Motivations:
- We are aiming to share UBO and SSBO structures between GLSL and C++.
This means we will use many of the existing vector types and others we
currently don't have (uintX, intX). All these variations were asking
for many more code duplication.
- Deduplicate existing code which is duplicated for each vector size.
- We also want to share small functions. Which means that vector functions
should be static and not in the class namespace.
- Reduce friction to use these types in new projects due to their
incompleteness.
- The current state of the BLI_(float|double|mpq)(2|3|4).hh is a bit of a
let down. Most clases are incomplete, out of sync with each others with
different codestyles, and some functions that should be static are not
(i.e: float3::reflect()).
Upsides:
- Still support .x, .y, .z, .w for readability.
- Compact, readable and easilly extendable.
- All of the vector functions are available for all the vectors types and
can be restricted to certain types. Also template specialization let us
define exception for special class (like mpq).
- With optimization ON, the compiler unroll the loops and performance is
the same.
Downsides:
- Might impact debugability. Though I would arge that the bugs are rarelly
caused by the vector class itself (since the operations are quite trivial)
but by the type conversions.
- Might impact compile time. I did not saw a significant impact since the
usage is not really widespread.
- Functions needs to be rewritten to support arbitrary vector length. For
instance, one can't call len_squared_v3v3 in math::length_squared() and
call it a day.
- Type cast does not work with the template version of the math:: vector
functions. Meaning you need to manually cast float * and (float *)[3] to
float3 for the function calls.
i.e: math::distance_squared(float3(nearest.co), positions[i]);
- Some parts might loose in readability:
float3::dot(v1.normalized(), v2.normalized())
becoming
math::dot(math::normalize(v1), math::normalize(v2))
But I propose, when appropriate, to use
using namespace blender::math; on function local or file scope to
increase readability. dot(normalize(v1), normalize(v2))
Consideration:
- Include back .length() method. It is quite handy and is more C++
oriented.
- I considered the GLM library as a candidate for replacement.
It felt like too much for what we need and would be difficult to
extend / modify to our needs.
- I used Macros to reduce code in operators declaration and potential
copy paste bugs. This could reduce debugability and could be reverted.
- This touches delaunay_2d.cc and the intersection code. I would like to
know @Howard Trickey (howardt) opinion on the matter.
- The noexcept on the copy constructor of mpq(2|3) is being removed.
But according to @Jacques Lucke (JacquesLucke) it is not a real problem
for now.
I would like to give a huge thanks to @Jacques Lucke (JacquesLucke) who
helped during this and pushed me to reduce the duplication further.
Reviewed By: brecht, sergey, JacquesLucke
Differential Revision: http://developer.blender.org/D13791
|
|
|
|
It is common to have fields that contain a constant value. Before this
commit, such constants were represented by operation nodes which
don't have inputs. Having a special node type for constants makes
working with them a bit cheaper.
It also allows skipping some unnecessary processing when evaluating
fields, because constant fields can be detected more easily.
This commit also generalizes the concept of field node types a bit.
|
|
We often had to use two `FieldEvaluator` instances to first evaluate
the selection and then the remaining fields. Now both can be done
with a single `FieldEvaluator`. This results in less boilerplate code in
many cases.
Performance is not affected by this change. In a separate patch we
could improve performance by reusing evaluated sub-fields that are
used by the selection and the other fields.
Differential Revision: https://developer.blender.org/D13571
|
|
This was an oversight in rB7b88a4a3ba7eb9b839afa6c42d070812a3af7997.
|
|
This implements an optimization pass for multi-function procedures.
It optimizes memory reuse by moving destruct instructions up.
For more details see the in-code comment.
In very large fields with many short lived intermediate values, this change
can improve performance 3-4x. Furthermore, in such cases, peak memory
consumption is reduced significantly (e.g. 100x lower peak memory usage).
Differential Revision: https://developer.blender.org/D13548
|
|
Also move notes about where noise functions come from
into the function body as it's not relavant to the public doc-string.
|
|
This reduces the number of separate memory allocations done
by the multi-function procedure executor (which is used by the
field evaluation).
Now a linear memory allocator is used to allocate all intermediate
values. Furthermore, more buffers are reused when possible. This
reduces the total amount of allocated memory and improves
cache efficiency because the values are more likely to be in cache
already.
The performance improvement of this patch are most noticable
when few elements are processed by many functions. The situation
will improve even more with D13548, because then buffers can actually
be reused in practice. I measured up to 20% faster field evaluation
in extreme cases with this change.
|
|
Calling `foreach_field_input` on a highly nested field (we do that
often) has an exponential running time in the number of nodes.
That is because the same node may be visited many times.
This made Blender freeze on some setups that should work just fine.
Now every field keeps track of its inputs all the time. That replaces
the exponential algorithm with constant time access.
|
|
Ref T92709
|
|
Most of our field inputs are currently specific to geometry. This patch introduces
a new `GeometryFieldInput` that reduces the overhead of adding new geometry
field input.
Differential Revision: https://developer.blender.org/D13489
|
|
For some underlying data (e.g. spans) we had two virtual array
implementations. One for the mutable and one for the immutable
case. Now that most code does not deal with the virtual array
implementations directly anymore (since rBrBd4c868da9f97a),
we can get away with sharing one implementation for both cases.
This means that we have to do a `const_cast` in a few places, but
this is an implementation detail that does not leak into "user code"
(only when explicitly casting a `VArrayImpl` to a `VMutableArrayImpl`,
which should happen nowhere).
|
|
Forgot to actually slice the span in rB6b5e1cfacab4c4605ec2d7bfef360389afe849be.
|
|
Previously, there was a fixed grain size for all multi-functions. That was
not sufficient because some functions could benefit a lot from smaller
grain sizes.
This refactors adds a new `MultiFunction::call_auto` method which has the
same effect as just calling `MultiFunction::call` but additionally figures
out how to execute the specific multi-function efficiently. It determines
a good grain size and decides whether the mask indices should be shifted
or not.
Most multi-function evaluations benefit from this, but medium sized work
loads (1000 - 50000 elements) benefit from it the most. Especially when
expensive multi-functions (e.g. noise) is involved. This is because for
smaller work loads, threading is rarely used and for larger work loads
threading worked fine before already.
With this patch, multi-functions can specify execution hints, that allow
the caller to execute it most efficiently. These execution hints still
have to be added to more functions.
Some performance measurements of a field evaluation involving noise and
math nodes, ordered by the number of elements being evaluated:
```
1,000,000: 133 ms -> 120 ms
100,000: 30 ms -> 18 ms
10,000: 20 ms -> 2.7 ms
1,000: 4 ms -> 0.5 ms
100: 0.5 ms -> 0.4 ms
```
|
|
Under some circumstances that can lead to more than a 2x
performance increase, because math nodes can better optimize
for the case when the slice is a single value or span.
|
|
Previously, `GVArray::ForSingle` would always allocate a copy of the passed
in value. Now it only does so when the value is too large or not trivial.
|
|
Those were not implemented consistently and don't really help in practice.
|
|
Currently the geometry nodes evaluator always stores a field for every
type that supports it, even if it is just a single value. This results in a lot
of overhead when there are many sockets that just contain a single
value, which is often the case.
This introduces a new `ValueOrField<T>` type that is used by the geometry
nodes evaluator. Now a field will only be created when it is actually
necessary. See D13307 for more details. In extrem cases this can speed
up the evaluation 2-3x (those cases are probably never hit in practice
though, but it's good to get rid of unnecessary overhead nevertheless).
Differential Revision: https://developer.blender.org/D13307
|
|
The idea behind this change is the same as in
rB6ee2abde82ef121cd6e927995053ac33afdbb438.
A `MultiFunction::debug_parameter_name` method could be
added separately when necessary.
|