Age | Commit message | Author |
|
|
|
This fixes building with MSVC (and older GCC versions) after
3e7886db54d0cb3ce32909c71ad2a8c9d9eab223.
|
|
'-fvisibility=hidden' only applies to definitions, not declarations,
so the compiler has to be conservative about how references to global
data symbols are performed.
Explicitly specifying the visibility allows for better code generation.
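A minimal sketch of the idea, assuming GCC or Clang on a non-Windows target. The names `EXAMPLE_HIDDEN`, `example_table`, and `example_lookup` are hypothetical and not taken from the dav1d sources. Putting the attribute on the declaration tells the compiler that the symbol is local to the shared object, so it can reference the data directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro: expands to a visibility attribute where supported. */
#if defined(__GNUC__) && !defined(_WIN32)
#define EXAMPLE_HIDDEN __attribute__((visibility("hidden")))
#else
#define EXAMPLE_HIDDEN
#endif

/* Declaration, as it might appear in an internal header. Without the
 * attribute here, -fvisibility=hidden alone would not cover it. */
EXAMPLE_HIDDEN extern const uint8_t example_table[4];

/* Definition, normally in a separate .c file. */
const uint8_t example_table[4] = { 1, 2, 4, 8 };

static inline int example_lookup(const unsigned idx) {
    assert(idx < 4);
    return example_table[idx];
}
```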
|
|
|
|
Increasing a reference counter only requires atomicity, but not
ordering or synchronization.
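A minimal sketch using C11 `<stdatomic.h>`; the `ExampleRef` type and function names are hypothetical. The increment uses relaxed ordering as described above. The acquire-release ordering on the decrement is a common pattern and an assumption here, not something this commit message states.

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct ExampleRef {
    atomic_int ref_cnt;
} ExampleRef;

static void example_ref_init(ExampleRef *const ref) {
    atomic_init(&ref->ref_cnt, 1);
}

/* Taking a new reference only needs the counter update to be atomic. */
static void example_ref_inc(ExampleRef *const ref) {
    atomic_fetch_add_explicit(&ref->ref_cnt, 1, memory_order_relaxed);
}

/* Returns 1 if this call released the last reference. Assumption: the
 * decrement orders prior accesses before a potential free. */
static int example_ref_dec(ExampleRef *const ref) {
    const int prev =
        atomic_fetch_sub_explicit(&ref->ref_cnt, 1, memory_order_acq_rel);
    assert(prev > 0);
    return prev == 1;
}
```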
|
|
|
|
|
|
failed to decode
|
|
|
|
(To be used alongside --filmgrain.)
Addresses part of #310.
|
|
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Addresses part of #310.
|
|
Section 6.4.2 (Color config semantics) of the AV1 spec says:
If matrix_coefficients is equal to MC_IDENTITY, it is a requirement of
bitstream conformance that subsampling_x is equal to 0 and
subsampling_y is equal to 0.
Add a Dav1dSettings.strict_std_compliance flag which, when set, aborts
decoding when such standard-compliance violations are detected, even
though they don't affect decoding. In the CLI, this flag can be accessed
using -strict.
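The check could be sketched roughly as follows. The struct, function name, and return convention are hypothetical stand-ins and do not reflect dav1d's internal code; only the rule itself (MC_IDENTITY requires no chroma subsampling) comes from the spec text quoted above.

```c
#include <assert.h>

/* MC_IDENTITY is matrix_coefficients value 0 in the AV1 spec. */
enum { EXAMPLE_MC_IDENTITY = 0 };

typedef struct ExampleColorConfig {
    int matrix_coefficients;
    int subsampling_x, subsampling_y;
} ExampleColorConfig;

/* Returns 0 to continue decoding, -1 to abort. The violation is only
 * treated as fatal when the (hypothetical) strict flag is set. */
static int example_check_color_config(const ExampleColorConfig *const cc,
                                      const int strict)
{
    assert(cc);
    const int violation =
        cc->matrix_coefficients == EXAMPLE_MC_IDENTITY &&
        (cc->subsampling_x || cc->subsampling_y);
    return (violation && strict) ? -1 : 0;
}
```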
|
|
This change is motivated by a desire to be able to toggle between CPU
and GPU film grain synthesis in players such as VLC. Because VLC
initializes the codec before the vout (and, indeed, the active vout
module may change in the middle of decoding), it cannot make the
decision of whether to apply film grain in libdav1d as part of codec
initialization. It needs to be decided on a frame-by-frame basis
depending on whether the currently active vout supports film grain
synthesis or not.
Using the new API, users like VLC can simply set `apply_grain` to 0 and
then manually call `dav1d_apply_grain` whenever the vout does not
support GPU film grain synthesis. As a side note, `dav1d_apply_grain`
could also technically be called from dedicated worker threads,
something that libdav1d does not currently do internally.
The alternative to this solution would have been to allow changing
Dav1dSettings at runtime, but that would be more invasive and a proper
API would also need to take other settings into consideration, some of
which can't be changed as easily as `apply_grain`. This commit
represents a stop-gap solution.
Bump the minor version to allow clients to depend on this API.
|
|
|
|
Supports Linux, macOS, and Windows.
|
|
|
|
|
|
Merges the 3 threading parameters into a single `--threads=` argument.
Frame threading can still be controlled via the `--framedelay=` argument.
Internally, the threading model is now a global thread/task pool design.
Co-authored-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
Reduces size from 16B to 12B, while maintaining a 4-byte alignment.
|
|
Helps differentiate actual errors in the buffer data or passed-in arguments
from scenarios like empty buffers or buffers containing only OBUs other than
Sequence Header.
|
|
We use the 'noinline' attribute in order to reduce code size, but that
doesn't prevent gcc from cloning the function, which is something that
goes against the purpose of preventing inlining in the first place.
Adding the 'noclone' attribute reduces the (stripped) binary size by
around 45 kB on x86-64.
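A sketch of how such an attribute macro might be defined. The macro name and the example function are hypothetical. `noclone` is only applied for GCC proper here, on the assumption that other compilers may not recognize it.

```c
#include <assert.h>

#if defined(__GNUC__) && !defined(__clang__)
/* GCC: prevent both inlining and function cloning. */
#define EXAMPLE_NOINLINE __attribute__((noinline, noclone))
#elif defined(__GNUC__)
#define EXAMPLE_NOINLINE __attribute__((noinline))
#elif defined(_MSC_VER)
#define EXAMPLE_NOINLINE __declspec(noinline)
#else
#define EXAMPLE_NOINLINE
#endif

/* A single out-of-line copy is shared by every call site. */
static EXAMPLE_NOINLINE int example_clip(const int v, const int lo,
                                         const int hi)
{
    assert(lo <= hi);
    return v < lo ? lo : v > hi ? hi : v;
}
```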
|
|
|
|
And a function to fetch them. Should be useful to signal changes in the
bitstream the user may want to know about.
Starting with two flags, DAV1D_EVENT_FLAG_NEW_SEQUENCE and
DAV1D_EVENT_FLAG_NEW_OP_PARAMS_INFO, which signal the presence of an updated
sequence header in the last returned (or to be returned) picture.
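The general pattern can be sketched as a bitmask that the decoder accumulates and the caller fetches. Everything below is a hypothetical stand-in with `EXAMPLE_` names, not the dav1d API. In particular, clearing the flags on fetch is an assumption of this sketch.

```c
#include <assert.h>

enum ExampleEventFlags {
    EXAMPLE_EVENT_FLAG_NEW_SEQUENCE       = 1 << 0,
    EXAMPLE_EVENT_FLAG_NEW_OP_PARAMS_INFO = 1 << 1,
};

typedef struct ExampleDecoder {
    unsigned event_flags; /* pending flags, accumulated while decoding */
} ExampleDecoder;

/* Called by the (hypothetical) decoder when it notices a change. */
static void example_signal_event(ExampleDecoder *const d, const unsigned flag) {
    d->event_flags |= flag;
}

/* Hands the pending flags to the caller and resets them, so each event
 * is reported once (an assumption of this sketch). */
static unsigned example_fetch_event_flags(ExampleDecoder *const d) {
    assert(d);
    const unsigned flags = d->event_flags;
    d->event_flags = 0;
    return flags;
}
```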
|
|
A static_assert is used if available, otherwise a custom
construct.
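One way such a check might look, as a sketch: the macro name `EXAMPLE_CHECK_SIZE` and the fallback (a typedef of an array with negative size on failure) are assumptions, not necessarily the construct used in the actual commit.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
/* C11 and later: use the standard compile-time assertion. */
#define EXAMPLE_CHECK_SIZE(type, size) \
    static_assert(sizeof(type) == (size), "unexpected sizeof(" #type ")")
#else
/* Fallback: an array of size -1 is a compile error. This only works when
 * 'type' is a single identifier, since it is pasted into a typedef name. */
#define EXAMPLE_CHECK_SIZE(type, size) \
    typedef char example_check_size_##type[(sizeof(type) == (size)) ? 1 : -1]
#endif

typedef struct ExamplePair { uint16_t a, b; } ExamplePair;

/* Fails to compile if the struct ever changes size. */
EXAMPLE_CHECK_SIZE(ExamplePair, 4);
```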
|
|
|
|
We currently run 'git describe --match' to obtain the current version,
but meson doesn't properly quote/escape the pattern string on Windows.
As a result, "fatal: Not a valid object name .ninja_log" is printed
when compiling on Windows systems. Compilation still works, but the
warning is annoying and misleading.
Currently we don't actually need the pattern matching functionality
(which is why things still work), so simply remove it as a workaround.
|
|
Should make the code more readable.
|
|
Replace checks for INTER or SWITCH frames with a simple macro for
increased readability and maintainability.
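A sketch of what such a macro could look like. The enum values follow the AV1 frame type numbering (KEY = 0, INTER = 1, INTRA_ONLY = 2, SWITCH = 3); the `EXAMPLE_` names and the low-bit test are illustrative assumptions rather than a copy of the macro added by this commit.

```c
#include <assert.h>

enum ExampleFrameType {
    EXAMPLE_FRAME_TYPE_KEY    = 0,
    EXAMPLE_FRAME_TYPE_INTER  = 1,
    EXAMPLE_FRAME_TYPE_INTRA  = 2,
    EXAMPLE_FRAME_TYPE_SWITCH = 3,
};

typedef struct ExampleFrameHeader {
    enum ExampleFrameType frame_type;
} ExampleFrameHeader;

/* INTER (1) and SWITCH (3) are the two types with the low bit set. */
#define EXAMPLE_IS_INTER_OR_SWITCH(hdr) ((hdr)->frame_type & 1)

/* Replaces a repeated 'type == INTER || type == SWITCH' comparison. */
static int example_uses_ref_frames(const ExampleFrameHeader *const hdr) {
    assert(hdr->frame_type <= EXAMPLE_FRAME_TYPE_SWITCH);
    return EXAMPLE_IS_INTER_OR_SWITCH(hdr) != 0;
}
```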
|
|
|
|
Add buffer pools for miscellaneous smaller buffers that are
repeatedly being freed and reallocated.
Also improve dav1d_ref_create() by consolidating two separate
memory allocations into a single one.
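The consolidation idea can be illustrated with a generic sketch: the bookkeeping struct and its payload share one `malloc()` call. The `ExampleRef` layout and function names are hypothetical and do not mirror the real dav1d_ref_create().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct ExampleRef {
    uint8_t *data;
    size_t size;
    int ref_cnt;
} ExampleRef;

/* One allocation holds the header followed directly by the payload. */
static ExampleRef *example_ref_create(const size_t size) {
    if (size > SIZE_MAX - sizeof(ExampleRef)) return NULL; /* overflow */
    ExampleRef *const ref = malloc(sizeof(ExampleRef) + size);
    if (!ref) return NULL;
    ref->data = (uint8_t *)(ref + 1);
    ref->size = size;
    ref->ref_cnt = 1;
    return ref;
}

/* A single free() releases both the header and the payload. */
static void example_ref_free(ExampleRef *const ref) {
    assert(ref != NULL);
    free(ref);
}
```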
|
|
Makes !1078 redundant.
|
|
Errors on C11 features like anonymous structs/unions.
|
|
|
|
Also changes the type intptr_t to make adding variable size members more
convenient.
|
|
|
|
A bitstream may contain values larger than the currently defined
entries, but it's technically UB to put such values into an enum.
Discovered in Firefox through fuzzing with UBSan.
|
|
Memory sanitizer depends on compiler instrumentation which makes it
inherently incompatible with asm DSP functions. Refs #336
|
|
Also, the assertion that 'align' is a power of 2 can be used by all
cases in dav1d_alloc_aligned().
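For illustration, a power-of-2 check of this kind is commonly written with a bit trick, as sketched below. The helper names are hypothetical, and the rounding helper is included only to show why the precondition matters.

```c
#include <assert.h>
#include <stddef.h>

/* A power of 2 has exactly one bit set, so clearing the lowest set bit
 * with (align & (align - 1)) must leave zero. */
static int example_is_pow2(const size_t align) {
    return align && !(align & (align - 1));
}

/* Rounds size up to the next multiple of align. The mask arithmetic is
 * only valid for power-of-2 alignments, hence the assertion. */
static size_t example_align_up(const size_t size, const size_t align) {
    assert(example_is_pow2(align));
    return (size + align - 1) & ~(align - 1);
}
```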
|
|
The callback function may be NULL, not the Dav1dSettings field.
|
|
|
|
The description was being added only to the last field of each line by Doxygen.
|
|
This only supports 10 bpc, not 12 bpc, as the sum and tmp buffers can
be int16_t for 10 bpc, but need to be int32_t for 12 bpc.
Make actual templates out of the functions in looprestoration_tmpl.S,
and add box3/5_h to looprestoration16.S.
Extend dav1d_sgr_calc_abX_neon with a mandatory bitdepth_max parameter
(which is passed even in 8bpc mode), add a define to bitdepth.h for
passing such a parameter in all modes. This makes this function
a few instructions slower in 8bpc mode than it was before (overall impact
seems to be around 1% of the total runtime of SGR), but allows using the
same actual function instantiation for all modes, saving a bit of code
size.
Examples of checkasm runtimes:
                            Cortex A53        A72        A73
selfguided_3x3_10bpc_neon:    516755.8   389412.7   349058.7
selfguided_5x5_10bpc_neon:    380699.9   293486.6   254591.6
selfguided_mix_10bpc_neon:    878142.3   667495.9   587844.6
Corresponding 8 bpc numbers for comparison:
selfguided_3x3_8bpc_neon:     491058.1   361473.4   347705.9
selfguided_5x5_8bpc_neon:     352655.0   266423.7   248192.2
selfguided_mix_8bpc_neon:     826094.1   612372.2   581943.1
|
|
The ar_coeff_shift element needs to have a 16-byte alignment on x86.
|
|
|
|
We specify most strides in bytes, but since C defines offsets
in multiples of sizeof(type) we use the PXSTRIDE() macro to
downshift the strides by one in high-bit depth templated files.
This however means that the compiler is required to mask away
the least significant bit, because it could in theory be non-zero.
Avoid that by telling the compiler (when compiled in release mode)
that the lsb is in fact guaranteed to always be zero.
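A sketch of the technique, assuming a GCC-compatible compiler for the release-mode hint. The names `EXAMPLE_ASSUME` and `example_pxstride` are hypothetical; in debug builds the same condition is simply asserted.

```c
#include <assert.h>
#include <stddef.h>

#ifdef NDEBUG
#if defined(__GNUC__)
/* Release mode: promise the compiler that the condition always holds. */
#define EXAMPLE_ASSUME(x) do { if (!(x)) __builtin_unreachable(); } while (0)
#else
#define EXAMPLE_ASSUME(x) ((void)0)
#endif
#else
/* Debug mode: verify the condition at runtime instead. */
#define EXAMPLE_ASSUME(x) assert(x)
#endif

/* Converts a stride in bytes to a stride in 16-bit pixels. With the lsb
 * known to be zero, the compiler has no need to mask it away. */
static inline ptrdiff_t example_pxstride(const ptrdiff_t byte_stride) {
    EXAMPLE_ASSUME(!(byte_stride & 1));
    return byte_stride >> 1;
}
```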
|
|
Required for AVX-512.
|
|
gen_grain_y_ar0_8bpc_c: 84853.3
gen_grain_y_ar0_8bpc_ssse3: 23528.0
gen_grain_y_ar1_8bpc_c: 140775.5
gen_grain_y_ar1_8bpc_ssse3: 70410.2
gen_grain_y_ar2_8bpc_c: 251311.3
gen_grain_y_ar2_8bpc_ssse3: 95222.2
gen_grain_y_ar3_8bpc_c: 394763.0
gen_grain_y_ar3_8bpc_ssse3: 103541.9
gen_grain_uv_ar0_8bpc_420_c: 29773.7
gen_grain_uv_ar0_8bpc_420_ssse3: 7068.9
gen_grain_uv_ar1_8bpc_420_c: 46113.2
gen_grain_uv_ar1_8bpc_420_ssse3: 22148.1
gen_grain_uv_ar2_8bpc_420_c: 70061.4
gen_grain_uv_ar2_8bpc_420_ssse3: 25479.0
gen_grain_uv_ar3_8bpc_420_c: 113826.0
gen_grain_uv_ar3_8bpc_420_ssse3: 30004.9
fguv_32x32xn_8bpc_420_csfl0_c: 8148.9
fguv_32x32xn_8bpc_420_csfl0_ssse3: 1371.3
fguv_32x32xn_8bpc_420_csfl1_c: 6391.9
fguv_32x32xn_8bpc_420_csfl1_ssse3: 1034.8
fgy_32x32xn_8bpc_c: 14201.3
fgy_32x32xn_8bpc_ssse3: 3443.0
|
|
CFI will SIGILL when calling a function pointer obtained through
dlsym(), regardless of whether or not the signature is correct.
See https://bugs.llvm.org/show_bug.cgi?id=44500
|
|
|