Age | Commit message | Author |
|
When compiling with asm enabled there's no point in compiling
C versions of DSP functions that have asm implementations using
instruction sets that the compiler can unconditionally use.
E.g. when compiling with -mssse3 we can remove the C version
of all functions with SSSE3 implementations.
This is accomplished using the compiler's dead code elimination
functionality.
Can be configured using the new 'trim_dsp' meson option, which
by default is enabled when compiling in release mode.
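
As a rough illustration of how dead code elimination can drop the C versions, here is a minimal sketch. It is not dav1d's actual code: every `demo_`/`DEMO_` name is hypothetical, and the "SSSE3" routine is plain C standing in for an assembly implementation. The only real facts relied on are the `__SSSE3__` and `__SSE2__` macros that GCC and Clang predefine when the corresponding `-m` flags are in effect.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU feature bits; not dav1d's real flag names. */
enum { DEMO_CPU_SSE2 = 1 << 0, DEMO_CPU_SSSE3 = 1 << 1 };

/* Instruction sets the compiler may use unconditionally are known at
 * compile time through predefined macros (e.g. set by -mssse3). */
#if defined(__SSSE3__)
#define DEMO_CPU_COMPILE_TIME (DEMO_CPU_SSE2 | DEMO_CPU_SSSE3)
#elif defined(__SSE2__)
#define DEMO_CPU_COMPILE_TIME DEMO_CPU_SSE2
#else
#define DEMO_CPU_COMPILE_TIME 0
#endif

typedef int (*demo_sum_fn)(const int *src, size_t n);

/* Portable C version of the DSP function. */
static int demo_sum_c(const int *src, size_t n) {
    assert(src != NULL || n == 0);
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += src[i];
    return sum;
}

/* Stand-in for a hand-written SSSE3 routine (plain C so the sketch
 * builds on any target). */
static int demo_sum_ssse3(const int *src, size_t n) {
    int sum = 0;
    while (n--)
        sum += src[n];
    return sum;
}

/* Runtime dispatch. OR-ing in the compile-time mask turns the SSSE3
 * test into a constant when building with -mssse3, so the compiler can
 * discard the fallback path and the now unreferenced demo_sum_c(). */
static demo_sum_fn demo_select_sum(unsigned runtime_flags) {
    const unsigned flags = runtime_flags | DEMO_CPU_COMPILE_TIME;
    if (flags & DEMO_CPU_SSSE3)
        return demo_sum_ssse3;
    return demo_sum_c;
}
```

Because the C function is `static` and only referenced from the branch that becomes unreachable, an optimizing build can omit it from the binary without any `#ifdef` around its definition.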
|
|
|
|
|
|
Enabling/disabling signal handlers is very slow and requires a syscall.
A better approach is to keep the signal handlers enabled all the time,
and use a simple flag variable to determine if a given signal should
be handled or passed on to the default signal handler.
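
The flag-based approach might look roughly like the sketch below. All names are hypothetical and this is not the checkasm implementation. It uses only standard `signal()` and `raise()`, and the handler simply records the signal and returns. A real test harness would more likely jump out of the faulting function instead, since returning after a hardware fault would re-execute the faulting instruction.

```c
#include <assert.h>
#include <signal.h>

/* Nonzero while the harness wants to intercept signals itself. */
static volatile sig_atomic_t demo_catch_signals;
/* Last signal intercepted by the handler, 0 if none. */
static volatile sig_atomic_t demo_caught_signal;

static void demo_signal_handler(int sig) {
    if (demo_catch_signals) {
        demo_catch_signals = 0;
        demo_caught_signal = sig;
        /* Re-arm: some platforms reset the disposition on delivery. */
        signal(sig, demo_signal_handler);
    } else {
        /* Not ours: pass the signal on to the default handler. */
        signal(sig, SIG_DFL);
        raise(sig);
    }
}

/* Installed once at startup and left enabled for the whole run. */
static void demo_install_handlers(void) {
    signal(SIGILL, demo_signal_handler);
    signal(SIGSEGV, demo_signal_handler);
}

/* Runs fn() with interception enabled. Toggling is a plain store to a
 * flag, not a syscall. Returns the caught signal number, or 0. */
static int demo_run_guarded(void (*fn)(void)) {
    assert(fn != 0);
    demo_caught_signal = 0;
    demo_catch_signals = 1;
    fn();
    demo_catch_signals = 0;
    return demo_caught_signal;
}

/* Helpers used only to exercise the sketch. */
static void demo_raise_sigill(void) { raise(SIGILL); }
static void demo_do_nothing(void) {}
```

Calling `demo_install_handlers()` once and then wrapping each tested function in `demo_run_guarded()` keeps the per-call cost to two stores to the flag.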
|
|
GetTickCount() increases at a very low frequency, >10ms per tick.
When running multiple loops of checkasm instances in parallel
different instances regularly end up using identical seeds.
Prefer the use of QueryPerformanceCounter() instead, which ticks at
a significantly higher rate, which in turn increases randomness.
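
A minimal sketch of seeding from a high-resolution counter follows. The helper names are hypothetical, and the non-Windows branch (C11 `timespec_get()`) is an assumption added so the example builds elsewhere; the commit itself only concerns the Windows path.

```c
#include <assert.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Keep the low 32 bits of the tick count: these change fastest, so two
 * processes started moments apart are unlikely to share a seed. */
static uint32_t demo_seed_from_ticks(uint64_t ticks) {
    return (uint32_t) (ticks & 0xffffffffu);
}

static uint32_t demo_get_seed(void) {
#ifdef _WIN32
    /* Ticks far more often than GetTickCount(). */
    LARGE_INTEGER counter;
    QueryPerformanceCounter(&counter);
    return demo_seed_from_ticks((uint64_t) counter.QuadPart);
#else
    /* Assumed fallback for other platforms: nanosecond wall clock. */
    struct timespec ts;
    const int base = timespec_get(&ts, TIME_UTC);
    assert(base == TIME_UTC);
    (void) base;
    return demo_seed_from_ticks((uint64_t) ts.tv_sec * 1000000000u +
                                (uint64_t) ts.tv_nsec);
#endif
}
```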
|
|
|
|
|
|
Fixes use of uninitialized value.
|
|
|
|
Verifying that the YMM state is clean when returning from assembly
functions helps catch potential issues with AVX/SSE transitions.
|
|
On Intel CPUs certain AVX-512 shuffle instructions incorrectly
flag the upper halves of YMM registers as in use when writing
to XMM registers, which may cause AVX/SSE state transitions.
This behavior is not documented and only occurs on physical
hardware, not when using the Intel SDE, so as far as I can tell
it appears to be a hardware bug.
Work around the issue by using EVEX-only registers. This avoids
the problem at the cost of a slightly larger code size.
|
|
fg_data->num_y_points is used in generate_grain_uv, but is only set
after the call: move the initialization above.
|
|
artifacts:reports:cobertura was deprecated in GitLab 14.9
|
|
* meson 0.49.0
* nasm 2.14
|
|
An attacker already has arbitrary code execution inside the container.
Ref: CVE-2022-24765
|
|
|
|
Insert missing space.
|
|
|
|
|
|
Makes it possible to benchmark the different code paths individually.
|
|
Alternate between buffers when benchmarking in order to more
accurately measure throughput instead of latency.
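
The buffer alternation can be sketched as below, with the timing code left out and all names hypothetical. The idea is that when every call writes to and reads from the same buffer, each iteration has to wait for the previous one's stores, so the loop ends up measuring latency. With two buffers, consecutive calls are independent and can overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*demo_kernel_fn)(uint8_t *dst, size_t len);

/* Trivial stand-in for the function under test. */
static void demo_add_one(uint8_t *dst, size_t len) {
    for (size_t i = 0; i < len; i++)
        dst[i]++;
}

/* Benchmark loop body: even iterations use buf_a, odd ones buf_b, so no
 * call depends on the memory written by the call right before it. */
static void demo_bench_loop(demo_kernel_fn fn, uint8_t *buf_a,
                            uint8_t *buf_b, size_t len, int iterations) {
    assert(buf_a != buf_b);
    uint8_t *const bufs[2] = { buf_a, buf_b };
    for (int i = 0; i < iterations; i++)
        fn(bufs[i & 1], len);
}
```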
|
|
|
|
|
|
Additionally, switch from 'only'/'except' to 'rules' which is
more flexible.
|
|
* Doxygen had a longstanding bug [0] where it would use `dot` even if
not configured to do so. Due to this behaviour our config magically
worked.
This bug is fixed in 1.9.2, so we need to explicitly enable
`dot` support in order to keep existing functionality.
* Enables WARN_AS_ERROR to catch mistakes.
* Adds a version string to the header to easily identify which commit
the docs are built from.
[0] https://github.com/doxygen/doxygen/issues/7273
|
|
Increasing a reference counter only requires atomicity, but not
ordering or synchronization.
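
In C11 terms this corresponds to a relaxed atomic add, as in the sketch below. `DemoRef` and the function names are hypothetical, not dav1d's types. A relaxed increment is sufficient here because the caller must already hold a reference, so the object cannot be freed concurrently. Note that this reasoning applies to the increment only; the decrement generally still needs stronger ordering.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical reference-counted object. */
typedef struct DemoRef {
    atomic_int ref_cnt;
} DemoRef;

/* Only atomicity is needed: no other memory accesses have to be
 * ordered relative to this increment. */
static void demo_ref_inc(DemoRef *ref) {
    assert(ref != 0);
    atomic_fetch_add_explicit(&ref->ref_cnt, 1, memory_order_relaxed);
}

/* Helper for inspecting the counter in examples and tests. */
static int demo_ref_count(DemoRef *ref) {
    return atomic_load(&ref->ref_cnt);
}
```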
|
|
Checking if the Dav1dRef pointer is non-zero and zeroing it is
already performed in dav1d_ref_dec(), no need to do it twice.
Also reorder code to enable tail call elimination.
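
A minimal sketch of the pattern, using hypothetical `DemoBuf` names instead of the real `Dav1dRef` API: the decrement helper performs the NULL check and clears the caller's pointer itself, and the call that frees the object is the last statement, which makes it eligible for tail call elimination.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical reference-counted buffer. */
typedef struct DemoBuf {
    atomic_int ref_cnt;
} DemoBuf;

/* Counts frees; instrumentation for the example only. */
static int demo_free_calls;

static void demo_buf_free(DemoBuf *buf) {
    demo_free_calls++;
    free(buf);
}

static DemoBuf *demo_buf_create(void) {
    DemoBuf *buf = malloc(sizeof(*buf));
    if (buf)
        atomic_init(&buf->ref_cnt, 1);
    return buf;
}

static void demo_buf_dec(DemoBuf **pbuf) {
    assert(pbuf != NULL);
    DemoBuf *const buf = *pbuf;
    if (!buf)
        return;
    *pbuf = NULL;
    /* fetch_sub returns the previous value: 1 means last reference. */
    if (atomic_fetch_sub(&buf->ref_cnt, 1) == 1)
        demo_buf_free(buf); /* final statement: can become a tail call */
}

static void demo_release(DemoBuf **pbuf) {
    /* No "if (*pbuf)" and no "*pbuf = NULL" needed here, because
     * demo_buf_dec() already does both. */
    demo_buf_dec(pbuf);
}
```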
|
|
Avoids the function call overhead in non-LTO builds.
Also reorder code in dav1d_ref_dec() to enable tail call elimination.
|
|
inv_txfm_add_32x8_dct_dct_0_12bpc_c: 286.7
inv_txfm_add_32x8_dct_dct_0_12bpc_avx2: 20.1
inv_txfm_add_32x8_dct_dct_1_12bpc_c: 7832.7
inv_txfm_add_32x8_dct_dct_1_12bpc_avx2: 710.6
inv_txfm_add_32x8_dct_dct_2_12bpc_c: 7838.1
inv_txfm_add_32x8_dct_dct_2_12bpc_avx2: 711.6
inv_txfm_add_32x8_dct_dct_3_12bpc_c: 7818.3
inv_txfm_add_32x8_dct_dct_3_12bpc_avx2: 710.9
inv_txfm_add_32x8_dct_dct_4_12bpc_c: 7820.6
inv_txfm_add_32x8_dct_dct_4_12bpc_avx2: 710.5
inv_txfm_add_32x8_identity_identity_0_12bpc_c: 1526.6
inv_txfm_add_32x8_identity_identity_0_12bpc_avx2: 19.3
inv_txfm_add_32x8_identity_identity_1_12bpc_c: 1519.4
inv_txfm_add_32x8_identity_identity_1_12bpc_avx2: 19.9
inv_txfm_add_32x8_identity_identity_2_12bpc_c: 1519.9
inv_txfm_add_32x8_identity_identity_2_12bpc_avx2: 43.6
inv_txfm_add_32x8_identity_identity_3_12bpc_c: 1519.4
inv_txfm_add_32x8_identity_identity_3_12bpc_avx2: 67.8
inv_txfm_add_32x8_identity_identity_4_12bpc_c: 1523.2
inv_txfm_add_32x8_identity_identity_4_12bpc_avx2: 91.6
|
|
inv_txfm_add_8x32_dct_dct_0_12bpc_c: 334.6
inv_txfm_add_8x32_dct_dct_0_12bpc_avx2: 66.0
inv_txfm_add_8x32_dct_dct_1_12bpc_c: 7929.7
inv_txfm_add_8x32_dct_dct_1_12bpc_avx2: 489.3
inv_txfm_add_8x32_dct_dct_2_12bpc_c: 7925.8
inv_txfm_add_8x32_dct_dct_2_12bpc_avx2: 547.1
inv_txfm_add_8x32_dct_dct_3_12bpc_c: 7928.9
inv_txfm_add_8x32_dct_dct_3_12bpc_avx2: 647.8
inv_txfm_add_8x32_dct_dct_4_12bpc_c: 7916.1
inv_txfm_add_8x32_dct_dct_4_12bpc_avx2: 701.0
inv_txfm_add_8x32_identity_identity_0_12bpc_c: 2413.1
inv_txfm_add_8x32_identity_identity_0_12bpc_avx2: 28.6
inv_txfm_add_8x32_identity_identity_1_12bpc_c: 2415.2
inv_txfm_add_8x32_identity_identity_1_12bpc_avx2: 28.6
inv_txfm_add_8x32_identity_identity_2_12bpc_c: 2413.7
inv_txfm_add_8x32_identity_identity_2_12bpc_avx2: 55.1
inv_txfm_add_8x32_identity_identity_3_12bpc_c: 2415.4
inv_txfm_add_8x32_identity_identity_3_12bpc_avx2: 85.3
inv_txfm_add_8x32_identity_identity_4_12bpc_c: 2401.8
inv_txfm_add_8x32_identity_identity_4_12bpc_avx2: 116.8
|
|
|
|
|
|
From section 6.8.2 in the AV1 spec:
"It is a requirement of bitstream conformance that when show_existing_frame is
used to show a previous frame with RefFrameType[ frame_to_show_map_idx ] equal
to KEY_FRAME, that the frame is output via the show_existing_frame mechanism at
most once."
|
|
From section 6.8.2 in the AV1 spec:
"It is a requirement of bitstream conformance that when show_existing_frame
is used to show a previous frame, that the value of showable_frame for the
previous frame was equal to 1."
|
|
From section 6.8.2 in the AV1 spec:
"If frame_type is equal to INTRA_ONLY_FRAME, it is a requirement of bitstream
conformance that refresh_frame_flags is not equal to 0xff."
Make this a soft requirement by checking that strict standard compliance is
enabled.
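
Such a soft requirement could be expressed roughly as follows. The function and parameter names are hypothetical and not taken from the dav1d sources; only the frame type values (0 to 3) follow the AV1 specification.

```c
#include <assert.h>

/* Frame type values as defined by the AV1 specification. */
enum {
    DEMO_KEY_FRAME        = 0,
    DEMO_INTER_FRAME      = 1,
    DEMO_INTRA_ONLY_FRAME = 2,
    DEMO_SWITCH_FRAME     = 3
};

/* Returns 0 if the header is acceptable, -1 if it must be rejected.
 * The conformance rule is only enforced when strict compliance is
 * requested; otherwise non-conforming streams are still decoded. */
static int demo_check_refresh_flags(int frame_type,
                                    unsigned refresh_frame_flags,
                                    int strict_std_compliance) {
    assert(refresh_frame_flags <= 0xff); /* 8-bit syntax element */
    if (strict_std_compliance &&
        frame_type == DEMO_INTRA_ONLY_FRAME &&
        refresh_frame_flags == 0xff)
        return -1;
    return 0;
}
```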
|
|
There's an assert on n_fc == 1 at the beginning of the function. There cannot
be a second pass used here.
Signed-off-by: Steve Lhomme <robux4@videolabs.io>
|
|
the next visible picture in display order
If the first picture in coding order after a new sequence header is parsed is
not visible, the first picture output by dav1d after the fact (which is coded
after the aforementioned invisible picture) would not trigger the new seq
header event flag as expected, despite being the first containing a reference
to a new sequence header.
Assuming the invisible picture is ever output, the result of this change will
be two pictures signaling a new sequence header was seen despite there being
only one new sequence header.
|
|
|
|
|
|
|
|
|
|
Set f->n_tile_data to 0 after the dav1d_decode_frame_exit() call in
dav1d_decode_frame(). dav1d_decode_frame_exit() unrefs every element in
use in the f->tile array, so it is good to set f->n_tile_data to 0 to
indicate that no elements are in use.
We are already doing this after all other dav1d_decode_frame_exit()
calls.
NOTE: It is tempting to have dav1d_decode_frame_exit() itself set
f->n_tile_data to 0. I did not do that in this merge request, because
the following is a common pattern:
    dav1d_decode_frame_exit(f, error);
    f->n_tile_data = 0;
    pthread_cond_signal(&f->task_thread.cond);
corresponding to the waiting code:
    while (f->n_tile_data > 0)
        pthread_cond_wait(&f->task_thread.cond,
                          &c->task_thread.lock);
I wonder if f->n_tile_data is set to 0 outside dav1d_decode_frame_exit()
to make clear the association of f->n_tile_data with the condition
variable f->task_thread.cond.
|
|
|
|
Split out common parts into separate functions. This reduces the
overall binary size by more than 5 KiB.
|
|
|
|
This avoids build errors if such features are enabled while targeting
another binary format. (Using such features on other platforms
might require some other form of signaling/setup though, but
the ELF specific .note section isn't applicable at least.)
|
|
|
|
|
|
failed to decode
|
|
|