Age | Commit message (Collapse) | Author |
|
|
|
|
|
This evaluator is used in order to evaluate subdivision at render time, allowing for
faster renders of meshes with a subdivision surface modifier placed at the last
position in the modifier list.
When evaluating the subsurf modifier, we detect whether we can delegate evaluation
to the draw code. If so, the subdivision is first evaluated on the GPU using our own
custom evaluator (only the coarse data needs to be initially sent to the GPU), then,
buffers for the final `MeshBufferCache` are filled on the GPU using a set of
compute shaders. However, some buffers are still filled on the CPU side, if doing so
on the GPU is impractical (e.g. the line adjacency buffer used for x-ray, whose
logic is hardly GPU compatible).
This is done at the mesh buffer extraction level so that the result can be readily used
in the various OpenGL engines, without having to write custom geometry or tesselation
shaders.
We use our own subdivision evaluation shaders, instead of OpenSubDiv's vanilla one, in
order to control the data layout, and interpolation. For example, we store vertex colors
as compressed 16-bit integers, while OpenSubDiv's default evaluator only work for float
types.
In order to still access the modified geometry on the CPU side, for use in modifiers
or transform operators, a dedicated wrapper type is added `MESH_WRAPPER_TYPE_SUBD`.
Subdivision will be lazily evaluated via `BKE_object_get_evaluated_mesh` which will
create such a wrapper if possible. If the final subdivision surface is not needed on
the CPU side, `BKE_object_get_evaluated_mesh_no_subsurf` should be used.
Enabling or disabling GPU subdivision can be done through the user preferences (under
Viewport -> Subdivision).
See patch description for benchmarks.
Reviewed By: campbellbarton, jbakker, fclem, brecht, #eevee_viewport
Differential Revision: https://developer.blender.org/D12406
|
|
Differential Revision: https://developer.blender.org/D13657
|
|
Explains more what it does, not how it is used.
|
|
This patch adds shader compilation tests for the basic engine in `shaders_test.cc`
Addresses T92701
Reviewed By: jbakker
Differential Revision: https://developer.blender.org/D13066
|
|
Added namespace blender::draw::image_engine. Code is still C-a-like.
Only changed the obvious code style to CPP (structs, nullptr, casts).
|
|
This reverts commit e7fedf6dba5fe2ec39260943361915a6b2b8270a.
And also fix a compilation issue on windows.
Differential Revision: https://developer.blender.org/D12969
|
|
This reverts commit 03013d19d16704672f9db93bc62547651b6a5cb8.
This commit broke the windows build pretty badly and I don't
feel confident landing the fix for this without review.
Will post a possible fix in D12969 and we'll take it from there.
|
|
This adds generic attribute rendering support for meshes for Eevee and
Workbench. Each attribute is stored inside of the `MeshBufferList` as a
separate VBO, with a maximum of `GPU_MAX_ATTR` VBOs for consistency with
the GPU shader compilation code.
Since `DRW_MeshCDMask` is not general enough, attribute requests are
stored in new `DRW_AttributeRequest` structures inside of a convenient
`DRW_MeshAttributes` structure. The latter is used in a similar manner
as `DRW_MeshCDMask`, with the `MeshBatchCache` keeping track of needed,
used, and used-over-time attributes. Again, `GPU_MAX_ATTR` is used in
`DRW_MeshAttributes` to prevent too many attributes being used.
To ensure thread-safety when updating the used attributes list, a mutex
is added to the Mesh runtime. This mutex will also be used in the future
for other things when other part of the rendre pre-processing are multi-threaded.
`GPU_BATCH_VBO_MAX_LEN` was increased to 16 in order to accommodate for
this design.
Since `CD_PROP_COLOR` are a valid attribute type, sculpt vertex colors
are now handled using this system to avoid to complicate things. In the
future regular vertex colors will also use this. From this change, bit
operations for DRW_MeshCDMask are now using uint32_t (to match the
representation now used by the compiler).
Due to the difference in behavior for implicit type conversion for scalar types
between OpenGL and what users expect (a scalar `s` is converted to
`vec4(s, 0, 0, 1)` by OpenGL, vs. `vec4(s, s, s, 1)` in Blender's various node graphs) ,
all scalar types are using a float3 internally for now, which increases memory usage.
This will be resolved during or after the EEVEE rewrite as properly handling
this involves much deeper changes.
Ref T85075
Reviewed By: fclem
Maniphest Tasks: T85075
Differential Revision: https://developer.blender.org/D12969
|
|
|
|
These 2 large tables, `areaTexBytes` and `searchTexBytes`, contributed
~176kb worth of duplicate data into the `blender` executable due to the
header being used in multiple places. We were lucky that only 2
translation units had included this header so only 1 duplicate copy of
each was wasted.
Define the tables as `extern` to address this.
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D12723
|
|
This is a necessary step for EEVEE's new arch. This moves more data
to the draw manager. This makes it easier to have the render or draw
engines manage their own data.
This makes more sense and cleans-up what the GPUViewport holds
Also rewrites the Texture pool manager to be in C++.
This also move the DefaultFramebuffer/TextureList and the engine related
data to a new `DRWViewData` struct. This struct manages the per view
(as in stereo view) engine data.
There is a bit of cleanup in the way the draw manager is setup.
We now use a temporary DRWData instead of creating a dummy viewport.
Development: fclem, jbakker
Differential Revision: https://developer.blender.org/D11966
|
|
Allow the UDIM grid to be shown and adjusted when there are images
loaded in UV edit mode. Right now the grid feature disappears once an
image is loaded and many have found this to be confusing.
Based on community and artist feedback, there was support to change this
behavior[1]
This patch does the following:
- Allows the grid to be shown even when images are present
- The max allowable dimensions for the grid has been increased from
10x10 to 10x100 to match the underlying maximum UDIM range that blender
supports
Note: This should not affect other Image editor modes like Paint/Mask or
the Render Result viewer etc. Future work in this area is currently
documented in a dedicated design task[2]
[1] https://devtalk.blender.org/t/the-udim-tile-grid-design-and-feedback-thread/20136
[2] https://developer.blender.org/T90913
Differential Revision: https://developer.blender.org/D11860
|
|
|
|
This is a simple engine used only to debug the texture of select ids.
It is only used when the `WITH_DRAW_DEBUG` option is enabled and the
debug value is 31.
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D5490
|
|
In the draw module, it's not easy to identify what its header is, and
where the shared functions are.
So move `draw_cache_extract_mesh_extractors.c` and
`draw_cache_extract_mesh_private.h` to the same folder as the extractors
and rename these files to make them more identifiable.
Reviewed By: jbakker
Differential Revision: https://developer.blender.org/D11991
|
|
|
|
Makes code cleaner and easier to find.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
This implements T87633
This overlay renders a flash animation on the target object when
transfering the mode to it using the mode transfer operator.
This provides visual feedback when switching between objects without
extra overlays that affect the general color and lighting in the scene.
Differences with the design task:
- This uses just a fade out animation instead of a fade in/out animation.
The code is ready for fade in/out, but as the rest of the overlays
(face sets, masks...) change instantly without animation, having a fade
in/out effect gives the impression that the object flashes twice (once
for the face sets, twice for the peak alpha of the flash animation).
- The rendering uses a flat color without fresnel for now, but this can
be improved in the future to make it look more like the shader in the
prototype.
- Not enabled by default (can be enabled in the overlays panel), maybe
the defaults can change for 3.0 to disable fade inactive and enable this
instead.
Reviewed By: jbakker, JulienKaspar
Differential Revision: https://developer.blender.org/D11055
|
|
More cleanups will come to make this more CPP-like.
|
|
draw_cache_extract_mesh for task scheduling. Will be refactored to draw_cache_extract_mesh_scheduling later on after migrating to CPP.
draw_cache_extract_mesh_render_data extraction of mesh render data from edit mesh/mesh into a more generic structure.
draw_cache_extract_mesh_extractors containing all the extractors. This will be split up further into a single file per extractor.
|
|
This patch will use compute shaders to create the VBO for hair.
The previous implementation uses transform feedback.
Timings before: between 0.000069s and 0.000362s.
Timings after: between 0.000032s and 0.000092s.
Speedup isn't noticeable by end-users. The patch is used to test
the new compute shader pipeline and integrate it with the draw
manager. Allowing EEVEE, Workbench and other draw engines to
use compute shaders with the introduction of `DRW_shgroup_call_compute`
and `DRW_shgroup_vertex_buffer`.
Future improvements are possible by generating the index buffer
of hair directly on the GPU.
NOTE: that compute shaders aren't supported by Apple and still use
the transform feedback workaround.
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D11057
|
|
This patch adds relatively small changes to the curve draw
cache implementation in order to draw the curve data in the
viewport. The dependency graph iterator is also modified
so that it iterates over the curve geometry component, which
is presented to users as `Curve` data with a pointer to the
`CurveEval`
The idea with the spline data type in geometry nodes is that
curve data itself is only the control points, and any evaluated
data with faces is a mesh. That is mostly expected elsewhere in
Blender anyway. This means it's only necessary to implement
wire edge drawing of `CurveEval` data.
Adding a `CurveEval` pointer to `Curve` is in line with changes
I'd like to make in the future like using `CurveEval` in more places
such as edit mode.
An alternate solution involves converting the curve wire data
to a mesh, however, that requires copying all of the data, and
since avoiding it is rather simple and is in-line with future plans
anyway, I think doing it this way is better.
Differential Revision: https://developer.blender.org/D11351
|
|
This reverts commit 8f9599d17e80254928d2d72081a4c7e0dee64038.
Mac seems to have an error with this change.
```
ERROR: /Users/blender/git/blender-vdev/blender.git/source/blender/draw/intern/draw_hair.c:115:44: error: use of undeclared identifier 'shader_src'
ERROR: /Users/blender/git/blender-vdev/blender.git/source/blender/draw/intern/draw_hair.c:123:13: error: use of undeclared identifier 'shader_src'
ERROR: make[2]: *** [source/blender/draw/CMakeFiles/bf_draw.dir/intern/draw_hair.c.o] Error 1
ERROR: make[1]: *** [source/blender/draw/CMakeFiles/bf_draw.dir/all] Error 2
ERROR: make: *** [all] Error 2
```
|
|
This patch will use compute shaders to create the VBO for hair.
The previous implementation uses tranform feedback.
Timings master (transform feedback with GPU_USAGE_STATIC between 0.000069s and 0.000362s
Timings transform feedback with GPU_USAGE_DEVICE_ONLY. between 0.000057s and 0.000122s
Timings compute shader between 0.000032 and 0.000092s
Future improvements:
* Generate hair Index buffer using compute shaders: currently done single threaded on CPU, easy to add as compute shader.
Reviewed By: fclem
Differential Revision: https://developer.blender.org/D11057
|
|
I'd like to use this file to draw curves from geometry nodes, which
would otherwise require implementing a C API. The changes in this
commit are minimal, mostly just casts and changing to nullptr.
Differential Revision: https://developer.blender.org/D11350
|
|
DrawTest will be used by other tests as well.
|
|
This split is to make code easier to manage and rename the files to
`effect_reflection_*` to avoid confusion.
Also this cleans up a bit of the branching mess in the trace shader.
|
|
|
|
This modifies the principled BSDF and the Glass BSDF which now
have better fit to multiscatter GGX.
Code to generate the LUT have been updated and can run at runtime.
The refraction LUT has been changed to have the critical angle always
centered around one pixel so that interpolation can be mitigated.
Offline LUT data will be updated in another commit
This simplify the BTDF retreival removing the manual clean cut at
low roughness. This maximize the precision of the LUT by scalling
the sides by the critical angle.
I also touched the ior > 1.0 approximation to be smoother.
Also incluse some cleanup of bsdf_sampling.glsl
|
|
This refactor was needed for some reasons:
- closure_lit_lib.glsl was unreadable and could not be easily extended to use new features.
- It was generating ~5K LOC for any shader. Slowing down compilation.
- Some calculations were incorrect and BSDF/Closure code had lots of workaround/hacks.
What this refactor does:
- Add some macros to define the light object loops / eval.
- Clear separation between each closures which now have separate files. Each closure implements the eval functions.
- Make principled BSDF a bit more correct in some cases (specular coloring, mix between glass and opaque).
- The BSDF term are applied outside of the eval function and on the whole lighting (was separated for lights before).
- Make light iteration last to avoid carrying more data than needed.
- Makes sure that all inputs are within correct ranges before evaluating the closures (use `safe_normalize` on normals).
- Making each BSDF isolated means that we might carry duplicated data (normals for instance) but this should be optimized by compilers.
- Makes Translucent BSDF its own closure type to avoid having to disable raytraced shadows using hacks.
- Separate transmission roughness is now working on Principled BSDF.
- Makes principled shader variations using constants. Removing a lot of duplicated code. This needed `const` keyword detection in `gpu_material_library.c`.
- SSR/SSS masking and data loading is a bit more consistent and defined outside of closure eval. The loading functions will act as accumulator if the lighting is not to be separated.
- SSR pass now do a full deferred lighting evaluation, including lights, in order to avoid interference with the closure eval code. However, it seems that the cost of having a global SSR toggle uniform is making the surface shader more expensive (which is already the case, by the way).
- Principle fully black specular tint now returns black instead of white.
- This fixed some artifact issue on my AMD computer on normal surfaces (which might have been some uninitialized variables).
- This touched the Ambient Occlusion because it needs to be evaluated for each closure. But to avoid the cost of this, we use another approach to just pass the result of the occlusion on interpolated normals and modify it using the bent normal for each Closure. This tends to reduce shadowing. I'm still looking into improving this but this is out of the scope of this patch.
- Performance might be a bit worse with this patch since it is more oriented towards code modularity. But not by a lot.
Render tests needs to be updated after this.
Reviewed By: jbakker
Differential Revision: https://developer.blender.org/D10390
# Conflicts:
# source/blender/draw/engines/eevee/eevee_shaders.c
# source/blender/draw/engines/eevee/shaders/common_utiltex_lib.glsl
# source/blender/draw/intern/shaders/common_math_lib.glsl
|
|
This is a complete refactor over the old system. The goal was to increase quality
first and then have something more flexible and optimised.
|{F9603145} | {F9603142}|{F9603147}|
This fixes issues we had with the old system which were:
- Too much overdraw (low performance).
- Not enough precision in render targets (hugly color banding/drifting).
- Poor resolution near in-focus regions.
- Wrong support of orthographic views.
- Missing alpha support in viewport.
- Missing bokeh shape inversion on foreground field.
- Issues on some GPUs. (see T72489) (But I'm sure this one will have other issues as well heh...)
- Fix T81092
I chose Unreal's Diaphragm DOF as a reference / goal implementation.
It is well described in the presentation "A Life of a Bokeh" by Guillaume Abadie.
You can check about it here https://epicgames.ent.box.com/s/s86j70iamxvsuu6j35pilypficznec04
Along side the main implementation we provide a way to increase the quality by jittering the
camera position for each sample (the ones specified under the Sampling tab).
The jittering is dividing the actual post processing dof radius so that it fills the undersampling.
The user can still add more overblur to have a noiseless image, but reducing bokeh shape sharpness.
Effect of overblur (left without, right with):
| {F9603122} | {F9603123}|
The actual implementation differs a bit:
- Foreground gather implementation uses the same "ring binning" accumulator as background
but uses a custom occlusion method. This gives the problem of inflating the foreground elements
when they are over background or in-focus regions.
This is was a hard decision but this was preferable to the other method that was giving poor
opacity masks for foreground and had other more noticeable issues. Do note it is possible
to improve this part in the future if a better alternative is found.
- Use occlusion texture for foreground. Presentation says it wasn't really needed for them.
- The TAA stabilisation pass is replace by a simple neighborhood clamping at the reduce copy
stage for simplicity.
- We don't do a brute-force in-focus separate gather pass. Instead we just do the brute force
pass during resolve. Using the separate pass could be a future optimization if needed but
might give less precise results.
- We don't use compute shaders at all so shader branching might not be optimal. But performance
is still way better than our previous implementation.
- We mainly rely on density change to fix all undersampling issues even for foreground (which
is something the reference implementation is not doing strangely).
Remaining issues (not considered blocking for me):
- Slight defocus stability: Due to slight defocus bruteforce gather using the bare scene color,
highlights are dilated and make convergence quite slow or imposible when using jittered DOF
(or gives )
- ~~Slight defocus inflating: There seems to be a 1px inflation discontinuity of the slight focus
convolution compared to the half resolution. This is not really noticeable if using jittered
camera.~~ Fixed
- Foreground occlusion approximation is a bit glitchy and gives incorrect result if the
a defocus foreground element overlaps a farther foreground element. Note that this is easily
mitigated using the jittered camera position.
|{F9603114}|{F9603115}|{F9603116}|
- Foreground is inflating, not revealing background. However this avoids some other bugs too
as discussed previously. Also mitigated with jittered camera position.
|{F9603130}|{F9603129}|
- Sensor vertical fit is still broken (does not match cycles).
- Scattred bokeh shapes can be a bit strange at polygon vertices. This is due to the distance field
stored in the Bokeh LUT which is not rounded at the edges. This is barely noticeable if the
shape does not rotate.
- ~~Sampling pattern of the jittered camera position is suboptimal. Could try something like hammersley
or poisson disc distribution.~~Used hexaweb sampling pattern which is not random but has better
stability and overall coverage.
- Very large bokeh (> 300 px) can exhibit undersampling artifact in gather pass and quite a bit of
bleeding. But at this size it is preferable to use jittered camera position.
Codewise the changes are pretty much self contained and each pass are well documented.
However the whole pipeline is quite complex to understand from bird's-eye view.
Notes:
- There is the possibility of using arbitrary bokeh texture with this implementation.
However implementation is a bit involved.
- Gathering max sample count is hardcoded to avoid to deal with shader variations. The actual
max sample count is already quite high but samples are not evenly distributed due to the
ring binning method.
- While this implementation does not need 32bit/channel textures to render correctly it does use
many other textures so actual VRAM usage is higher than previous method for viewport but less
for render. Textures are reused to avoid many allocations.
- Bokeh LUT computation is fast and done for each redraw because it can be animated. Also the
texture can be shared with other viewport with different camera settings.
|
|
The mask overlay wasn't part of the overlay engine. The reasoning nehind
this was that more editors used the mask overlay and most of them used
old drawing code. This patch adds the mask overlay drawing to the draw
overlay engine. This code path will only be used by the image editor
VSE, Compositor and Movie Clip editor will still use the previous
method.
During this patch some alternatives have been researched:
1. `ED_mask_draw_region`: this would lead to different code paths when
drawing in the image editor, and some hacks to retrieve the correct
framebuffer.
2. Add mask drawing to image engine: Would lead to incorrect color
management when viewing the mask.
3. Add mask drawing to image engine and overlay engine: Would lead to
duplicated code.
4. Add mask drawing to overlay engine and for combined overlay select
the correct framebuffer.
Option 4 was chosen as the exception (switching framebuffers) can be
done without hacks. The code stays clean.
|
|
Cryptomatte is a standard to efficiently create mattes for compositing. The
renderer outputs the required render passes, which can then be used in the
compositor to create masks for specified objects. Unlike the Material and Object
Index passes, the objects to isolate are selected in compositing, and mattes
will be anti-aliased.
Cryptomatte was already available in Cycles this patch adds it to the EEVEE
render engine. Original specification can be found at
https://raw.githubusercontent.com/Psyop/Cryptomatte/master/specification/IDmattes_poster.pdf
**Accurate mode**
Following Cycles, there are two accuracy modes. The difference between the two
modes is the number of render samples they take into account to create the
render passes. When accurate mode is off the number of levels is used. When
accuracy mode is active, the number of render samples is used.
**Deviation from standard**
Cryptomatte specification is based on a path trace approach where samples and
coverage are calculated at the same time. In EEVEE a sample is an exact match on
top of a prepared depth buffer. Coverage is at that moment always 1. By sampling
multiple times the number of surface hits decides the actual surface coverage
for a matte per pixel.
**Implementation Overview**
When drawing to the cryptomatte GPU buffer the depth of the fragment is matched
to the active depth buffer. The hashes of each cryptomatte layer is written in
the GPU buffer. The exact layout depends on the active cryptomatte layers. The
GPU buffer is downloaded and integrated into an accumulation buffer (stored in
CPU RAM).
The accumulation buffer stores the hashes + weights for a number of levels,
layers per pixel. When a hash already exists the weight will be increased. When
the hash doesn't exists it will be added to the buffer.
After all the samples have been calculated the accumulation buffer is processed.
During this phase the total pixel weights of each layer is mapped to be in a
range between 0 and 1. The hashes are also sorted (highest weight first).
Blender Kernel now has a `BKE_cryptomatte` header that access to common
functions for cryptomatte. This will in the future be used by the API.
* Alpha blended materials aren't supported. Alpha blended materials support in
render passes needs research how to implement it in a maintainable way for any
render pass.
This is a list of tasks that needs to be done for the same release that this
patch lands on (Blender 2.92)
* T82571 Add render tests.
* T82572 Documentation.
* T82573 Store hashes + Object names in the render result header.
* T82574 Use threading to increase performance in accumulation and post
processing.
* T82575 Merge the cycles and EEVEE settings as they are identical.
* T82576 Add RNA to extract the cryptomatte hashes to use in python scripts.
Reviewed By: Clément Foucault
Maniphest Tasks: T81058
Differential Revision: https://developer.blender.org/D9165
|
|
|
|
The clone tool in the image editor can show a second texture on top
of the image. This wasn't ported and now results into alpha and depth
issues. This fix adds the clone tool drawing to the overlay engine.
Reviewed By: Clément Foucault
Differential Revision: https://developer.blender.org/D9352
|
|
|
|
Move headers files from `render/extern/` to `render/`
Part of T73586
|
|
Similar to previous commit, missing build dependency.
|
|
Ref T76372.
|