Fixes #188.
|
|
Refs #188, adds a dav1d CLI option. It defaults to 1 to allow adjusting the
tests for scalable bitstreams.
|
|
Refs #188, adds a corresponding dav1d CLI option and skips temporal and
spatial layers that are not required for the selected operating point.
|
|
|
|
Fixes #59.
|
|
Also mark all planes broken after tile error.
Fixes a use-of-uninitialized-value in apply_to_row_y() with
clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5652400153559040. Credits
to oss-fuzz.
|
|
Fixes #191.
|
|
|
|
|
|
The image sources are available at
https://code.videolan.org/videolan/docker-images-aarch64.
|
|
|
|
Fixes #183. Fixes use of uninitialized data in apply_to_row_uv with odd
width in clusterfuzz-testcase-minimized-dav1d_fuzzer-5684823666982912.
Credits to oss-fuzz.
|
|
|
|
Fixes 00000802.ivf.
|
|
|
|
This avoids a misoptimization in clang,
https://bugs.llvm.org/show_bug.cgi?id=39550. The root cause
has been around for a number of years, but a change in LLVM 6.0
allowed for better optimizations, exposing this bug. The bug is
on track to be fixed in LLVM for the 8.0 release and hopefully
also backported into 7.0.1. It is, however, present in 6.0, 6.0.1
and 7.0, as well as in downstream users such as Xcode 10.0/10.1.
|
|
Fixes 00000527.ivf in #186.
|
|
|
|
This catches the redefinition of _WIN32_WINNT warnings in the Windows
jobs.
|
|
Fixes warnings about redefinition of _WIN32_WINNT on Windows targets
introduced by b716083c7a.
|
|
Also ensure we apply film-grain to delayed pictures.
|
|
|
|
|
|
|
|
|
|
Fixes clusterfuzz-testcase-minimized-dav1d_fuzzer-5730334348410880,
with credits to oss-fuzz.
|
|
Fixes an undefined left shift of a negative value in
clusterfuzz-testcase-minimized-dav1d_fuzzer-5707215277654016. Credits to
oss-fuzz.
|
|
This does not adjust the AVX2 asm. The asm clips in many places to the
required range (16-bit signed) for performance reasons. No mismatch was
observed with coefs generated by the forward transform in checkasm over
10,000 runs.
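In C, the clipping the asm performs corresponds to a clamp like the
following (a hypothetical illustration; the real clipping lives in the
hand-written AVX2 asm):

```c
#include <stdint.h>

/* Clamp an intermediate value to the signed 16-bit range. */
static int16_t clip_int16(int v) {
    if (v < INT16_MIN) return INT16_MIN;
    if (v > INT16_MAX) return INT16_MAX;
    return (int16_t)v;
}
```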
|
|
These edges don't encode LR coefficients anyway. Fixes
clusterfuzz-testcase-minimized-dav1d_fuzzer-5731769337249792.
Credits to oss-fuzz.
|
|
This fixes compiler errors like these:
src/film_grain_tmpl.c(238): error C2036: 'void *': unknown size
Don't rely on sizeof(void) == 1 in pointer arithmetic, but instead
cast the row pointers to the pixel datatype immediately, use PXSTRIDE()
for converting a stride in byte units to pixel units, and skip
sizeof(pixel) for horizontal offsets that previously were applied on
a void pointer.
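A minimal sketch of the pattern described above (illustrative names, not
the actual dav1d code): cast the row base to the pixel type before doing
any arithmetic, and convert byte strides to pixel strides.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t pixel;  /* 8bpc build; 16bpc builds would use uint16_t */

/* Convert a stride in byte units to pixel units, mirroring the idea
 * behind dav1d's PXSTRIDE(). */
#define PXSTRIDE(x) ((x) / (ptrdiff_t)sizeof(pixel))

/* MSVC rejects arithmetic on void pointers (error C2036), so cast to
 * the pixel type immediately and index in pixel units. */
static pixel get_px(const void *base, ptrdiff_t stride, int row, int col) {
    const pixel *p = (const pixel *)base;
    return p[row * PXSTRIDE(stride) + col];
}
```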
|
|
Fixes a deadlock on teardown with
clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5636065151418368. Credits
to oss-fuzz.
|
|
Current versions of Meson have a bug that makes it necessary to add
the nasm-generated objects to checkasm, even though this should
already be covered by the extract_all_objects() for libdav1d.
Meson versions >= 0.48.999 (that is, Meson 0.49 and development
states of it on git) fix this issue, so adding this is no longer
needed there.
Adding it regardless would actually cause an error because of
symbols being present twice.
|
|
Fixes warnings about redefinition of _WIN32_WINNT on Windows targets
|
|
|
|
This uses a slightly adapted version of my GPU-based algorithm. The
major difference from the algorithm suggested by the spec (and implemented
in libaom) is that instead of using a line buffer to hold the previous
row's film grain blocks, we compute each row/block fully independently.
This opens the door to exploiting parallelism in the future, since we
don't have any left->right or top->down dependency except for the PRNG
state (which we could pre-compute for a massively parallel / GPU
implementation).
That being said, it's probably somewhat slower than using a line buffer
in the serial / single-CPU case, although most likely not by much
(since the areas with the most redundant work get progressively smaller,
down to a single 2x2 square in the worst case).
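The independence property can be sketched as follows (an assumed
illustration, not dav1d's actual code; a toy LCG stands in for the
spec's grain PRNG): each block is generated solely from its own
precomputed seed, so blocks can be produced in any order or in parallel.

```c
#include <stdint.h>

/* Toy PRNG stand-in for the spec's grain generator. */
static uint16_t prng_next(uint16_t *state) {
    *state = (uint16_t)(*state * 25173u + 13849u);
    return *state;
}

/* Fill one grain block; the only input is the block's own seed, so
 * there is no left->right or top->down data dependency. */
static void gen_block(int8_t *dst, int w, int h, uint16_t seed) {
    uint16_t s = seed;  /* independent PRNG state per block */
    for (int i = 0; i < w * h; i++)
        dst[i] = (int8_t)(prng_next(&s) >> 8);
}
```

Because gen_block reads nothing but its seed, two calls with the same
seed are deterministic, which is what makes parallel generation safe.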
|
|
This becomes part of the picture properties, since users may want to
apply film grain themselves (e.g. for a GPU implementation).
|
|
The spec subtracts the signed offset from all of these when using them,
like it does for e.g. ar_coeffs_y_plus_128, although for some reason
the naming scheme is inconsistent here. Either way, it makes more sense
to treat them as signed integers than unsigned integers.
To avoid confusion, since the name of the field is the same as the one in
the spec, we mark the type as int8_t (or int16_t for the 9-bit field)
to make it clear to the user that these are already signed integers.
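A small sketch of the convention (hypothetical helper name; the bias
removal mirrors how the spec treats fields coded as value_plus_128):

```c
#include <stdint.h>

/* The bitstream codes e.g. AR coefficients biased by +128
 * (ar_coeffs_y_plus_128). Storing them as int8_t after removing the
 * bias makes the signedness explicit to API users. */
static int8_t unbias_coeff(unsigned coded_plus_128) {
    return (int8_t)((int)coded_plus_128 - 128);  /* range -128..127 */
}
```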
|
|
|
|
|
|
Fixes #183.
|
|
Also fix a calculation for u_idx. Fixes 5646860283281408 of #183.
|
|
|
|
These functions have been tuned against Cortex A53 and Snapdragon
835. The bilin functions have mainly been written with code size
in mind, as they aren't used much in practice.
Relative speedups for the actual filtering functions (that don't
just do a plain copy) are around 4-15x, some over 20x. This is
in comparison with GCC 5.4 with autovectorization disabled; the
actual real-world speedup against autovectorized C code is around
4-10x.
Relative speedups measured with checkasm:
Cortex A53 Snapdragon 835
mc_8tap_regular_w2_0_8bpc_neon: 6.96 5.28
mc_8tap_regular_w2_h_8bpc_neon: 5.16 4.35
mc_8tap_regular_w2_hv_8bpc_neon: 5.37 4.98
mc_8tap_regular_w2_v_8bpc_neon: 6.35 4.85
mc_8tap_regular_w4_0_8bpc_neon: 6.78 5.73
mc_8tap_regular_w4_h_8bpc_neon: 8.40 6.60
mc_8tap_regular_w4_hv_8bpc_neon: 7.23 7.10
mc_8tap_regular_w4_v_8bpc_neon: 9.06 7.76
mc_8tap_regular_w8_0_8bpc_neon: 6.96 5.55
mc_8tap_regular_w8_h_8bpc_neon: 10.36 6.88
mc_8tap_regular_w8_hv_8bpc_neon: 9.49 6.86
mc_8tap_regular_w8_v_8bpc_neon: 12.06 9.61
mc_8tap_regular_w16_0_8bpc_neon: 6.68 4.51
mc_8tap_regular_w16_h_8bpc_neon: 12.30 7.77
mc_8tap_regular_w16_hv_8bpc_neon: 9.50 6.68
mc_8tap_regular_w16_v_8bpc_neon: 12.93 9.68
mc_8tap_regular_w32_0_8bpc_neon: 3.91 2.93
mc_8tap_regular_w32_h_8bpc_neon: 13.06 7.89
mc_8tap_regular_w32_hv_8bpc_neon: 9.37 6.70
mc_8tap_regular_w32_v_8bpc_neon: 12.88 9.49
mc_8tap_regular_w64_0_8bpc_neon: 2.89 1.68
mc_8tap_regular_w64_h_8bpc_neon: 13.48 8.00
mc_8tap_regular_w64_hv_8bpc_neon: 9.23 6.53
mc_8tap_regular_w64_v_8bpc_neon: 13.11 9.68
mc_8tap_regular_w128_0_8bpc_neon: 1.89 1.24
mc_8tap_regular_w128_h_8bpc_neon: 13.58 7.98
mc_8tap_regular_w128_hv_8bpc_neon: 8.86 6.53
mc_8tap_regular_w128_v_8bpc_neon: 12.46 9.63
mc_bilinear_w2_0_8bpc_neon: 7.02 5.40
mc_bilinear_w2_h_8bpc_neon: 3.65 3.14
mc_bilinear_w2_hv_8bpc_neon: 4.36 4.84
mc_bilinear_w2_v_8bpc_neon: 5.22 4.28
mc_bilinear_w4_0_8bpc_neon: 6.87 5.99
mc_bilinear_w4_h_8bpc_neon: 6.50 8.61
mc_bilinear_w4_hv_8bpc_neon: 7.70 7.99
mc_bilinear_w4_v_8bpc_neon: 7.04 9.10
mc_bilinear_w8_0_8bpc_neon: 7.03 5.70
mc_bilinear_w8_h_8bpc_neon: 11.30 15.14
mc_bilinear_w8_hv_8bpc_neon: 15.74 13.50
mc_bilinear_w8_v_8bpc_neon: 13.40 17.54
mc_bilinear_w16_0_8bpc_neon: 6.75 4.48
mc_bilinear_w16_h_8bpc_neon: 17.02 13.95
mc_bilinear_w16_hv_8bpc_neon: 17.37 13.78
mc_bilinear_w16_v_8bpc_neon: 23.69 22.98
mc_bilinear_w32_0_8bpc_neon: 3.88 3.18
mc_bilinear_w32_h_8bpc_neon: 18.80 14.97
mc_bilinear_w32_hv_8bpc_neon: 17.74 14.02
mc_bilinear_w32_v_8bpc_neon: 24.46 23.04
mc_bilinear_w64_0_8bpc_neon: 2.87 1.66
mc_bilinear_w64_h_8bpc_neon: 19.54 16.02
mc_bilinear_w64_hv_8bpc_neon: 17.80 14.32
mc_bilinear_w64_v_8bpc_neon: 24.79 23.63
mc_bilinear_w128_0_8bpc_neon: 2.13 1.23
mc_bilinear_w128_h_8bpc_neon: 19.89 16.24
mc_bilinear_w128_hv_8bpc_neon: 17.55 14.15
mc_bilinear_w128_v_8bpc_neon: 24.45 23.54
mct_8tap_regular_w4_0_8bpc_neon: 5.56 5.51
mct_8tap_regular_w4_h_8bpc_neon: 7.48 5.80
mct_8tap_regular_w4_hv_8bpc_neon: 7.27 7.09
mct_8tap_regular_w4_v_8bpc_neon: 7.80 6.84
mct_8tap_regular_w8_0_8bpc_neon: 9.54 9.25
mct_8tap_regular_w8_h_8bpc_neon: 9.08 6.55
mct_8tap_regular_w8_hv_8bpc_neon: 9.16 6.30
mct_8tap_regular_w8_v_8bpc_neon: 10.79 8.66
mct_8tap_regular_w16_0_8bpc_neon: 15.35 10.50
mct_8tap_regular_w16_h_8bpc_neon: 10.18 6.76
mct_8tap_regular_w16_hv_8bpc_neon: 9.17 6.11
mct_8tap_regular_w16_v_8bpc_neon: 11.52 8.72
mct_8tap_regular_w32_0_8bpc_neon: 15.82 10.09
mct_8tap_regular_w32_h_8bpc_neon: 10.75 6.85
mct_8tap_regular_w32_hv_8bpc_neon: 9.00 6.22
mct_8tap_regular_w32_v_8bpc_neon: 11.58 8.67
mct_8tap_regular_w64_0_8bpc_neon: 15.28 9.68
mct_8tap_regular_w64_h_8bpc_neon: 10.93 6.96
mct_8tap_regular_w64_hv_8bpc_neon: 8.81 6.53
mct_8tap_regular_w64_v_8bpc_neon: 11.42 8.73
mct_8tap_regular_w128_0_8bpc_neon: 14.41 7.67
mct_8tap_regular_w128_h_8bpc_neon: 10.92 6.96
mct_8tap_regular_w128_hv_8bpc_neon: 8.56 6.51
mct_8tap_regular_w128_v_8bpc_neon: 11.16 8.70
mct_bilinear_w4_0_8bpc_neon: 5.66 5.77
mct_bilinear_w4_h_8bpc_neon: 5.16 6.40
mct_bilinear_w4_hv_8bpc_neon: 6.86 6.82
mct_bilinear_w4_v_8bpc_neon: 4.75 6.09
mct_bilinear_w8_0_8bpc_neon: 9.78 10.00
mct_bilinear_w8_h_8bpc_neon: 8.98 11.37
mct_bilinear_w8_hv_8bpc_neon: 14.42 10.83
mct_bilinear_w8_v_8bpc_neon: 9.12 11.62
mct_bilinear_w16_0_8bpc_neon: 15.59 10.76
mct_bilinear_w16_h_8bpc_neon: 11.98 8.77
mct_bilinear_w16_hv_8bpc_neon: 15.83 10.73
mct_bilinear_w16_v_8bpc_neon: 14.70 14.60
mct_bilinear_w32_0_8bpc_neon: 15.89 10.32
mct_bilinear_w32_h_8bpc_neon: 13.47 9.07
mct_bilinear_w32_hv_8bpc_neon: 16.01 10.95
mct_bilinear_w32_v_8bpc_neon: 14.85 14.16
mct_bilinear_w64_0_8bpc_neon: 15.36 10.51
mct_bilinear_w64_h_8bpc_neon: 14.00 9.61
mct_bilinear_w64_hv_8bpc_neon: 15.82 11.27
mct_bilinear_w64_v_8bpc_neon: 14.61 14.76
mct_bilinear_w128_0_8bpc_neon: 14.41 7.92
mct_bilinear_w128_h_8bpc_neon: 13.31 9.58
mct_bilinear_w128_hv_8bpc_neon: 14.07 11.18
mct_bilinear_w128_v_8bpc_neon: 11.57 14.42
|
|
|
|
Fixes #149.
|
|
This reverts commit 597a6eb9cee41ddbebf019f3b20f50e8da48061c. It leads to
assertion failures in oss-fuzz.
|
|
Fixes unaligned writes while splatting coefs for skip blocks with
clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5684725352497152 and
clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5728508249112576.
|
|
|
|
|
|
Catches warnings in assert statements.
|