Age | Commit message (Collapse) | Author |
|
Previously there was a special extraction process for "vertex colors"
that copied the color data to the GPU with a special format. Instead,
this patch replaces this with use of the generic attribute extraction.
This reduces the number of code paths, allowing easier optimization
in the future.
To make it possible to use the generic extraction system for attributes
but also assign aliases for use by shaders, some changes are necessary.
First, the GPU material attribute can now store whether it actually refers
to the default color attribute, rather than a specific name. This replaces
the hack to use `CD_MCOL` in the color attribute shader node. Second,
the extraction code checks the names against the default and active
names and assigns aliases if the request corresponds to a special active
attribute. Finally, support for byte color attributes was added to the
generic attribute extraction.
Differential Revision: https://developer.blender.org/D15205
|
|
|
|
After this commit, all mesh data extraction and drawing code is in C++,
including headers, making it possible to use improved types for future
performance improvements and simplifications.
The only non-trivial changes are in `draw_cache_impl_mesh.cc`,
where use of certain features and macros in C necessitated larger
changes.
Differential Revision: https://developer.blender.org/D15088
|
|
This uses the recently introduced evaluator's vertex
data to smoothly interpolate original coordinates instead
of using linear interpolation.
The orcos are interpolated at the same time as positions
and as such, the specific subdivision routine for the
orco extractor has been removed. The patch evaluation
shader uses a definition to enable code specific to
orco evaluation.
Since the orco layer may not have been requested on first
render, and since orco data is now stored in the OpenSubDiv
evaluator, the evaluator needs to be recreated if an
orco layer is suddenly available. For this, a callback
to check if the evaluator has the data was added. This is
added to the evaluator as the `Subdiv` cache stored in the
modifier is invalidated less often than the Mesh batch cache
and so leads to fewer evaluator recreations.
Differential Revision: https://developer.blender.org/D14999
|
|
The mesh drawing code used a different mesh to check whether or not to
draw face dots and to actually retrieve them. The fix is moving the
responsibility of determining whether to use subsurf face dots to the
creation of `MeshRenderData` where the mesh used for drawing is
known, rather than doing it at a higher level.
Differential Revision: https://developer.blender.org/D14855
|
|
Problem is that the orco layer was not taken care of by the GPU
subdivision routines. This only handles the issues for EEVEE/Workbench.
For Cycles, this would need to be handled at the wrapper level somehow.
|
|
The lines paint mask IBO extraction was not implemented for GPU subdivision.
For it to work, we also now need to preserve the subdivision loop to
subdivision edge map, which until now was overwritten to store coarse edges
(the map to coarse edges is still preserved).
Also the paint flag stored in the 4th dimension of the loop normal buffer
was not properly set for flat shaded faces, leading to other kind of
artefacts and render issues.
|
|
|
|
Tangents are computed from UVs on the CPU side when using GPU subdivision
and require that the normals are available there as well (at least for smooth
shading, flat normals can be computed on the fly). This simply adds the missing
normals update call for the `MeshRenderData` setup for the subdivision case.
Differential Revision: https://developer.blender.org/D14278
|
|
|
|
There are two issues revealed in the bug report:
- the GPU subdivision does not support meshes with only loose geometry
- the loose geometry is not subdivided
For the first case, checks are added to ensure we still fill the
buffers with loose geometry even if no polygons are present.
For the second case, this adds
`BKE_subdiv_mesh_interpolate_position_on_edge` which encapsulates the
loose vertex interpolation mechanism previously found in
`subdiv_mesh_vertex_of_loose_edge`.
The subdivided loose geometry is stored in a new specific data structure
`DRWSubdivLooseGeom` so as to not pollute `MeshExtractLooseGeom`. These
structures store the corresponding coarse element data, which will be
used for filling GPU buffers appropriately.
Differential Revision: https://developer.blender.org/D14171
|
|
|
|
A simple case of missing the tangent VBO. The tangents are computed from
the coarse mesh, and interpolated on the GPU for the final mesh. Code for
initializing the tangents, and the vertex format for the VBO was
factored out of the coarse extraction routine, to be shared with the
subdivision routine.
|
|
|
|
The issue has two causes: on one hand origin indices were not handled
properly, on the other hand the extraction type (Mesh, BMesh, or mapped)
was not detected correctly.
For the second case reuse the MeshRenderData creation from the coarse
code path so that we make the same decisions. Loose geometry extraction
had to be updated to properly handle the BMesh cases.
For the origin indices, in some cases (for edges and faces), the arrays
used by the subdivision code already have the origin indices baked into
them, so mapping them a second time through the origin index layer is
wrong, and could cause out of bounds accesses.
For vertices especially, we would use two arrays: one for mapping
subdivision vertices to coarse vertices, and another one to map coarse
vertices to subdivision loops used for the selection index buffer. The
second one is now removed (which saves a bit of memory) as it is did not
have the proper data setup for use with the origin indices and we can
easily compute it using the first array anyway.
|
|
Use a shorter/simpler license convention, stops the header taking so
much space.
Follow the SPDX license specification: https://spdx.org/licenses
- C/C++/objc/objc++
- Python
- Shell Scripts
- CMake, GNUmakefile
While most of the source tree has been included
- `./extern/` was left out.
- `./intern/cycles` & `./intern/atomic` are also excluded because they
use different header conventions.
doc/license/SPDX-license-identifiers.txt has been added to list SPDX all
used identifiers.
See P2788 for the script that automated these edits.
Reviewed By: brecht, mont29, sergey
Ref D14069
|
|
The evaluated mesh is a result of evaluated modifiers, and referencing
other evaluated IDs such as materials.
It can not be stored in the EditMesh structure which is intended to be
re-used by many areas. Such sharing was causing ownership errors causing
bugs like
T93855: Cycles crash with edit mode and simultaneous viewport and final render
The proposed solution is to store the evaluated edit mesh and its cage in
the object's runtime field. The motivation goes as following:
- It allows to avoid ownership problems like the ones in the linked report.
- Object level is chosen over mesh level is because the evaluated mesh
is affected by modifiers, which are on the object level.
This patch allows to have modifier stack of an object which shares mesh with
an object which is in edit mode to be properly taken into account (before
the change the modifier stack from the active object will be used for all
objects which share the mesh).
There is a change in the way how copy-on-write is handled in the edit mode to
allow proper state update when changing active scene (or having two windows
with different scenes). Previously, the copt-on-write would have been ignored
by skipping tagging CoW component. Now it is ignored from within the CoW
operation callback. This allows to update edit pointers for objects which are
not from the current depsgraph and where the edit_mesh was never assigned in
the case when the depsgraph was evaluated prior the active depsgraph.
There is no user level changes changes expected with the CoW handling changes:
should not affect on neither performance, nor memory consumption.
Tested scenarios:
- Various modifiers configurations of objects sharing mesh and be part of the
same scene.
- Steps from the reports: T93855, T82952, T77359
This also fixes T76609, T72733 and perhaps other reports.
Differential Revision: https://developer.blender.org/D13824
|
|
The crash is due to the fact that GPU subdivision extraction routines
for edit data (including UVs) only worked for BMesh. However, a Mesh
based version is still needed for texture painting. This adds the
missing components. This also ensures all data are properly initialized
(at least the ones revealed by the bug).
|
|
This puts the loop over the final subdivision quads outside of the mesh
iteration callback. This can also allow for easier parallel execution in
the future if need be.
|
|
The `lines_adjacency` IBO build in the GPU subdivision case was missing
edges at the boundaries of open meshes. As it is used for the shadow
pass, the shadows were then not clipped properly.
This would also make X-Ray mode render differently in those cases.
To fix this, we can simply reuse the buffer finalization routine from the
non-subdivision case, as such edges are handled there.
|
|
This evaluator is used in order to evaluate subdivision at render time, allowing for
faster renders of meshes with a subdivision surface modifier placed at the last
position in the modifier list.
When evaluating the subsurf modifier, we detect whether we can delegate evaluation
to the draw code. If so, the subdivision is first evaluated on the GPU using our own
custom evaluator (only the coarse data needs to be initially sent to the GPU), then,
buffers for the final `MeshBufferCache` are filled on the GPU using a set of
compute shaders. However, some buffers are still filled on the CPU side, if doing so
on the GPU is impractical (e.g. the line adjacency buffer used for x-ray, whose
logic is hardly GPU compatible).
This is done at the mesh buffer extraction level so that the result can be readily used
in the various OpenGL engines, without having to write custom geometry or tesselation
shaders.
We use our own subdivision evaluation shaders, instead of OpenSubDiv's vanilla one, in
order to control the data layout, and interpolation. For example, we store vertex colors
as compressed 16-bit integers, while OpenSubDiv's default evaluator only work for float
types.
In order to still access the modified geometry on the CPU side, for use in modifiers
or transform operators, a dedicated wrapper type is added `MESH_WRAPPER_TYPE_SUBD`.
Subdivision will be lazily evaluated via `BKE_object_get_evaluated_mesh` which will
create such a wrapper if possible. If the final subdivision surface is not needed on
the CPU side, `BKE_object_get_evaluated_mesh_no_subsurf` should be used.
Enabling or disabling GPU subdivision can be done through the user preferences (under
Viewport -> Subdivision).
See patch description for benchmarks.
Reviewed By: campbellbarton, jbakker, fclem, brecht, #eevee_viewport
Differential Revision: https://developer.blender.org/D12406
|
|
Also add groups in some files.
|
|
This reverts commit e7fedf6dba5fe2ec39260943361915a6b2b8270a.
And also fix a compilation issue on windows.
Differential Revision: https://developer.blender.org/D12969
|
|
This reverts commit 03013d19d16704672f9db93bc62547651b6a5cb8.
This commit broke the windows build pretty badly and I don't
feel confident landing the fix for this without review.
Will post a possible fix in D12969 and we'll take it from there.
|
|
This adds generic attribute rendering support for meshes for Eevee and
Workbench. Each attribute is stored inside of the `MeshBufferList` as a
separate VBO, with a maximum of `GPU_MAX_ATTR` VBOs for consistency with
the GPU shader compilation code.
Since `DRW_MeshCDMask` is not general enough, attribute requests are
stored in new `DRW_AttributeRequest` structures inside of a convenient
`DRW_MeshAttributes` structure. The latter is used in a similar manner
as `DRW_MeshCDMask`, with the `MeshBatchCache` keeping track of needed,
used, and used-over-time attributes. Again, `GPU_MAX_ATTR` is used in
`DRW_MeshAttributes` to prevent too many attributes being used.
To ensure thread-safety when updating the used attributes list, a mutex
is added to the Mesh runtime. This mutex will also be used in the future
for other things when other part of the rendre pre-processing are multi-threaded.
`GPU_BATCH_VBO_MAX_LEN` was increased to 16 in order to accommodate for
this design.
Since `CD_PROP_COLOR` are a valid attribute type, sculpt vertex colors
are now handled using this system to avoid to complicate things. In the
future regular vertex colors will also use this. From this change, bit
operations for DRW_MeshCDMask are now using uint32_t (to match the
representation now used by the compiler).
Due to the difference in behavior for implicit type conversion for scalar types
between OpenGL and what users expect (a scalar `s` is converted to
`vec4(s, 0, 0, 1)` by OpenGL, vs. `vec4(s, s, s, 1)` in Blender's various node graphs) ,
all scalar types are using a float3 internally for now, which increases memory usage.
This will be resolved during or after the EEVEE rewrite as properly handling
this involves much deeper changes.
Ref T85075
Reviewed By: fclem
Maniphest Tasks: T85075
Differential Revision: https://developer.blender.org/D12969
|
|
The cache is used to fill the buffer list.
|
|
Matches the existing `MeshBatchCache`.
|
|
`MeshBufferList` is more specific and can avoid confusion with
`MeshBufferExtractionCache`.
|
|
In the draw module, it's not easy to identify what its header is, and
where the shared functions are.
So move `draw_cache_extract_mesh_extractors.c` and
`draw_cache_extract_mesh_private.h` to the same folder as the extractors
and rename these files to make them more identifiable.
Reviewed By: jbakker
Differential Revision: https://developer.blender.org/D11991
|
|
|
|
The `ibo.tris` extraction in multithread is currently only done if the
mesh has only 1 material.
Now we cache a map indicating the index of each polygon after sort and
thus allow the extraction of tris with materials in multithreaded.
As caching is a heavy operation and was already being performed in
multi-thread for triangle offsets, no significant improvements are
expected.
The benefit will be much greater when we can skip updating the cache
while transforming a geometry.
**Profiling:**
||master:|PATCH:
|---|---|---|
|large_mesh_editing_materials:|Average: 13.855380 FPS|Average: 15.525684 FPS
||rdata 9ms iter 36ms (frame 71ms)|rdata 9ms iter 29ms (frame 64ms)
|subdiv_mesh_final_only_materials:|Average: 28.113742 FPS|Average: 28.633599 FPS
||rdata 0ms iter 1ms (frame 36ms)|rdata 0ms iter 1ms (frame 35ms)
1.1x overall speedup
Differential Revision: https://developer.blender.org/D11445
|
|
This centralizes caching functions.
|
|
The `ibo.lines_paint_mask` extractor doesn't have a callback to iterate
bmesh faces, this made `filter_into` ignore the extractor.
|
|
No functional changes.
|
|
|
|
When having multiple materials in a mesh the triangles are sorted based
on material index. This sorting is done single threaded, but needs two
loops over the data. One to count the bucket size and the second one to
add the triangles to the right position in the buckets.
This patch will do the counting in a multithreaded approach that would
speed up the cache creation. It has been measured that this part is the
most blocking part of the cache creation.
Reviewed By: mano-wii
Differential Revision: https://developer.blender.org/D11615
|
|
There is an attempt to free an illegal pointer in `extract_edge_fac_finish`.
|
|
|
|
When using multiple materials in a single mesh the most time is spend in
counting the offsets of each material for the sorting.
This patch moves the counting of the offsets to render mesh data and
caches it as long as the geometry doesn't change.
This patch doesn't include multithreading of this code.
Reviewed By: mano-wii
Differential Revision: https://developer.blender.org/D11612
|
|
One drawback to trying to predict the number of threads that will be
used in the `task_graph` is that we are only sure of the number when the
threads are running.
Using `BLI_task_parallel_range` allows the driver to
choose the best thread distribution through `parallel_reduce`.
The benefit is most evident on hardware with fewer cores.
This is the result on an 4-core laptop:
||before:|after:
|---|---|---|
|large_mesh_editing:|Average: 5.203638 FPS|Average: 5.398925 FPS
||rdata 15ms iter 43ms (frame 193ms)|rdata 14ms iter 36ms (frame 187ms)
Differential Revision: https://developer.blender.org/D11558
|
|
This is an adaptation of {D11488}.
A disadvantage of manually setting the iter ranges per thread is that
we don't know how many threads are running in the background and so we
don't know how to best distribute the ranges.
To solve this limitation we can use `parallel_reduce` and thus let the
driver choose the best distribution of ranges among the threads.
This proved to be especially beneficial for computers with few cores.
**Benchmarking:**
Here's the result on an 4-core laptop:
||master:|PATCH:
|---|---|---|
|large_mesh_editing:|Average: 5.203638 FPS|Average: 5.398925 FPS
||rdata 15ms iter 43ms (frame 193ms)|rdata 14ms iter 36ms (frame 187ms)
Here's the result on an 8-core PC:
||master:|PATCH:
|---|---|---|
|large_mesh_editing:|Average: 15.267482 FPS|Average: 15.906881 FPS
||rdata 9ms iter 28ms (frame 65ms)|rdata 9ms iter 25ms (frame 63ms)
|large_mesh_editing_ledge: |Average: 15.145966 FPS|Average: 15.520474 FPS
||rdata 9ms iter 29ms (frame 65ms)|rdata 9ms iter 25ms (frame 64ms)
|looptris_test:|Average: 4.001917 FPS|Average: 4.061105 FPS
||rdata 12ms iter 90ms (frame 236ms)|rdata 12ms iter 87ms (frame 230ms)
|subdiv_mesh_cage_and_final:|Average: 1.917769 FPS|Average: 1.971790 FPS
||rdata 7ms iter 37ms (frame 261ms)|rdata 7ms iter 31ms (frame 258ms)
||rdata 7ms iter 38ms (frame 252ms)|rdata 7ms iter 33ms (frame 249ms)
|subdiv_mesh_final_only:|Average: 6.387240 FPS|Average: 6.591251 FPS
||rdata 3ms iter 25ms (frame 151ms)|rdata 3ms iter 16ms (frame 145ms)
|subdiv_mesh_final_only_ledge:|Average: 6.247393 FPS|Average: 6.596024 FPS
||rdata 3ms iter 26ms (frame 158ms)|rdata 3ms iter 16ms (frame 148ms)
**Notes:**
- The improvement can only be noticed if all extracts are multithreaded.
- This patch touches different areas of the code, so it can be split into another patch if the idea is accepted.
These screenshots show how threads behave in a quadcore:
Master:
{F10164664}
Patch:
{F10164666}
Differential Revision: https://developer.blender.org/D11558
|
|
This patch adds a specific extraction method when the mesh has only
one material. This method is multi-threaded.
There is a trade-off in this patch as the ibo isn't compressed (it adds
restart indexes for hidden faces). So it depends if threading is faster
than the additional GPU buffer upload.
# Subdivided cube
I used a cube subdivided 7 times, modifiers applied. that gives around 400000 faces.
The test is selecting some vertices and move them. During this test the next buffers are updated on each frame:
* vbo.pos_nor
* vbo.lnor
* vbo.edit_data
* ibo.tris
* ibo.points
System info:
|platform| Linux-5.11.0-7614-generic-x86_64-with-glibc2.33|
| renderer| AMD SIENNA_CICHLID (DRM 3.40.0, 5.11.0-7614-generic, LLVM 11.0.1)|
|vendor| AMD|
|version| 4.6 (Core Profile) Mesa 21.0.1|
|cpu| Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz|
|compiler| gcc version 10.3.0|
Timing have been measured using DEBUG_TIME in `draw_cache_extract_mesh`.
master: `rdata 8ms iter 45ms (frame 153ms)`
this patch `rdata 6ms iter 36ms (frame 132ms)`
Reviewed By: mano-wii
Maniphest Tasks: T88352
Differential Revision: https://developer.blender.org/D11290
|
|
Current index builder is designed to be used in a single thread.
This makes all index buffer extractions single threaded.
This patch adds a thread safe solution enabling multithreaded
building of index buffers.
To reduce locking the solution would provide a task/thread local
index buffer builder (called sub builder).
When a thread is finished this thread local index buffer builder
can be joined with the initial index buffer builder.
`GPU_indexbuf_subbuilder_init`: Initialized a sub builder. The
index list is shared between the parent and sub buffer, but the
counters are localized. Ensuring that updating counters would
not need any locking.
`GPU_indexbuf_subbuilder_finish`: merge the information of the
sub builder back to the parent builder. Needs to be invoked outside
the worker thread, or when sure that all worker threads have been
finished. Internal the function is not thread safe.
For testing purposes the extract_points extractor has been migrated to
the new API. Herefore changes to the mesh extractor were needed.
* When creating tasks, the task number of current task is stored in
ExtractTaskData including the total number of tasks.
* Adding two functions in `MeshExtract`.
** `task_init` will initialize the task specific userdata.
** `task_finish` should merge back the task specific userdata back.
* adding task_id parameter to the iteration functions so they can
access the correct task data without any need for locking.
There is no noticeable change in end user performance.
Reviewed By: mano-wii
Differential Revision: https://developer.blender.org/D11499
|
|
|
|
|
|
|
|
The `loose_lines`' ibo was not being initialized.
|
|
Some threads were always idle because of this.
|
|
|
|
|