github.com/videolan/dav1d.git
Age, Commit message, Author
2019-10-01Minor cleanupRonald S. Bultje
2019-10-01arm64: ipred: NEON implementation of dc/h/v prediction modesMartin Storsjö
Relative speedups over the C code:
                               Cortex A53    A72    A73
intra_pred_dc_128_w4_8bpc_neon:      2.08   1.47   2.17
intra_pred_dc_128_w8_8bpc_neon:      3.33   2.49   4.03
intra_pred_dc_128_w16_8bpc_neon:     3.93   3.86   3.75
intra_pred_dc_128_w32_8bpc_neon:     3.14   3.79   2.90
intra_pred_dc_128_w64_8bpc_neon:     3.68   1.97   2.42
intra_pred_dc_left_w4_8bpc_neon:     2.41   1.70   2.23
intra_pred_dc_left_w8_8bpc_neon:     3.53   2.41   3.32
intra_pred_dc_left_w16_8bpc_neon:    3.87   3.54   3.34
intra_pred_dc_left_w32_8bpc_neon:    4.10   3.60   2.76
intra_pred_dc_left_w64_8bpc_neon:    3.72   2.00   2.39
intra_pred_dc_top_w4_8bpc_neon:      2.27   1.66   2.07
intra_pred_dc_top_w8_8bpc_neon:      3.83   2.69   3.43
intra_pred_dc_top_w16_8bpc_neon:     3.66   3.60   3.20
intra_pred_dc_top_w32_8bpc_neon:     3.92   3.54   2.66
intra_pred_dc_top_w64_8bpc_neon:     3.60   1.98   2.30
intra_pred_dc_w4_8bpc_neon:          2.29   1.42   2.16
intra_pred_dc_w8_8bpc_neon:          3.56   2.83   3.05
intra_pred_dc_w16_8bpc_neon:         3.46   3.37   3.15
intra_pred_dc_w32_8bpc_neon:         3.79   3.41   2.74
intra_pred_dc_w64_8bpc_neon:         3.52   2.01   2.41
intra_pred_h_w4_8bpc_neon:          10.34   5.74   5.94
intra_pred_h_w8_8bpc_neon:          12.13   6.33   6.43
intra_pred_h_w16_8bpc_neon:         10.66   7.31   5.85
intra_pred_h_w32_8bpc_neon:          6.28   4.18   2.88
intra_pred_h_w64_8bpc_neon:          3.96   1.85   1.75
intra_pred_v_w4_8bpc_neon:          11.44   6.12   7.57
intra_pred_v_w8_8bpc_neon:          14.76   7.58   7.95
intra_pred_v_w16_8bpc_neon:         11.34   6.28   5.88
intra_pred_v_w32_8bpc_neon:          6.56   3.33   3.34
intra_pred_v_w64_8bpc_neon:          4.57   1.24   1.97
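For readers unfamiliar with the three prediction modes benchmarked above, here is a minimal scalar C sketch of what they compute for 8 bpc. The function names and signatures are hypothetical and are not taken from dav1d; the sketch only shows the general technique (V copies the row above, H replicates the left column, DC fills with the rounded average of both edges).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Vertical prediction: every row of the block is a copy of the row above it. */
void pred_v(uint8_t *dst, ptrdiff_t stride, const uint8_t *top, int w, int h) {
    for (int y = 0; y < h; y++)
        memcpy(dst + y * stride, top, (size_t)w);
}

/* Horizontal prediction: each row is filled with its left neighbour pixel. */
void pred_h(uint8_t *dst, ptrdiff_t stride, const uint8_t *left, int w, int h) {
    for (int y = 0; y < h; y++)
        memset(dst + y * stride, left[y], (size_t)w);
}

/* DC prediction: fill the block with the rounded mean of the top and left
 * edges. (The dc_top, dc_left and dc_128 variants use only one edge, or the
 * constant 128 for 8 bpc.) */
void pred_dc(uint8_t *dst, ptrdiff_t stride, const uint8_t *top,
             const uint8_t *left, int w, int h) {
    unsigned sum = (unsigned)(w + h) >> 1; /* rounding term */
    for (int x = 0; x < w; x++) sum += top[x];
    for (int y = 0; y < h; y++) sum += left[y];
    const int dc = (int)(sum / (unsigned)(w + h));
    for (int y = 0; y < h; y++)
        memset(dst + y * stride, dc, (size_t)w);
}
```

The H and V modes are little more than memory fills, which is consistent with them showing the largest speedups in the table for small block widths.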
2019-09-30x86: add warp_affine SSE4 and SSSE3 asmVictorien Le Couviour--Tuffet
------------------------------------------
x86_64: warp_8x8_8bpc_c: 1773.4
x86_32: warp_8x8_8bpc_c: 1740.4
----------
x86_64: warp_8x8_8bpc_ssse3: 317.5
x86_32: warp_8x8_8bpc_ssse3: 378.4
----------
x86_64: warp_8x8_8bpc_sse4: 303.7
x86_32: warp_8x8_8bpc_sse4: 367.7
----------
x86_64: warp_8x8_8bpc_avx2: 224.9
---------------------
---------------------
x86_64: warp_8x8t_8bpc_c: 1664.6
x86_32: warp_8x8t_8bpc_c: 1674.0
----------
x86_64: warp_8x8t_8bpc_ssse3: 320.7
x86_32: warp_8x8t_8bpc_ssse3: 379.5
----------
x86_64: warp_8x8t_8bpc_sse4: 304.8
x86_32: warp_8x8t_8bpc_sse4: 369.8
----------
x86_64: warp_8x8t_8bpc_avx2: 228.5
------------------------------------------
2019-09-29arm64: itx: Fix overflows in idctMartin Storsjö
Don't add two 16 bit coefficients in 16 bit, if the result isn't supposed to be clipped.

This fixes mismatches for some samples, see issue #299.

Before:                                Cortex A53      A72      A73
inv_txfm_add_4x4_dct_dct_1_8bpc_neon:        93.0     52.8     49.5
inv_txfm_add_8x8_dct_dct_1_8bpc_neon:       260.0    186.0    196.4
inv_txfm_add_16x16_dct_dct_2_8bpc_neon:    1371.0    953.4   1028.6
inv_txfm_add_32x32_dct_dct_4_8bpc_neon:    7363.2   4887.5   5135.8
inv_txfm_add_64x64_dct_dct_4_8bpc_neon:   25029.0  17492.3  18404.5
After:
inv_txfm_add_4x4_dct_dct_1_8bpc_neon:       105.0     58.7     55.2
inv_txfm_add_8x8_dct_dct_1_8bpc_neon:       294.0    211.5    209.9
inv_txfm_add_16x16_dct_dct_2_8bpc_neon:    1495.8   1050.4   1070.6
inv_txfm_add_32x32_dct_dct_4_8bpc_neon:    7866.7   5197.8   5321.4
inv_txfm_add_64x64_dct_dct_4_8bpc_neon:   25807.2  18619.3  18526.9
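The overflow described above can be illustrated in plain C. This is only a sketch of the general problem, not the NEON code from the commit; the helper names are hypothetical. A sum of two 16-bit values needs 17 bits, so keeping it in a 16-bit lane silently wraps, while widening first keeps the exact value.

```c
#include <stdint.h>

/* 16-bit add that wraps, like a plain vector add on 16-bit lanes.
 * The final conversion to int16_t is implementation-defined in C, but it
 * wraps modulo 2^16 on mainstream compilers. */
int16_t add_wrap16(int16_t a, int16_t b) {
    return (int16_t)(uint16_t)((uint16_t)a + (uint16_t)b);
}

/* Widened add: the operands are promoted to 32 bits first, so the exact
 * sum survives until a later scaling or narrowing step. */
int32_t add_wide32(int16_t a, int16_t b) {
    return (int32_t)a + (int32_t)b;
}
```

For example, 30000 + 10000 is 40000 in the widened form, but wraps to -25536 in the 16-bit form. The extra widening is presumably what the slightly higher "After" numbers pay for.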
2019-09-29arm64: itx: Consistently use the factor 2896 in adstMartin Storsjö
The scaled form 2896>>4 shouldn't be necessary with valid bitstreams.
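As background, 2896 is 1/sqrt(2) in 12-bit fixed point (4096 / sqrt(2) is about 2896.3), and 2896>>4 is 181, the same ratio in 8-bit fixed point. The sketch below, with hypothetical helper names and an assumed round-to-nearest shift, shows that both forms give the same result while the scaled form needs four fewer bits of headroom in the intermediate product. How the NEON code actually used the scaled form is not stated in the commit message.

```c
#include <stdint.h>

/* Multiply by 1/sqrt(2) using the 12-bit constant 2896 (2896 / 4096).
 * Right-shifting a negative value is implementation-defined in C; this
 * sketch assumes the usual arithmetic shift. */
int32_t mul_2896_q12(int32_t x) {
    return (x * 2896 + 2048) >> 12;
}

/* Same factor with the scaled constant 181 == 2896 >> 4 (181 / 256).
 * The intermediate product is 16 times smaller. */
int32_t mul_181_q8(int32_t x) {
    return (x * 181 + 128) >> 8;
}
```

Because 2896 is exactly 16 * 181, the two functions return identical values whenever neither product overflows.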
2019-09-29arm64: itx: Use smull+smlal instead of addl+mulMartin Storsjö
Even though smull+smlal does two multiplications instead of one, the combination seems to be better handled by actual cores.

Before:                                   Cortex A53      A72      A73
inv_txfm_add_8x8_adst_adst_1_8bpc_neon:        356.0    279.2    278.0
inv_txfm_add_16x16_adst_adst_2_8bpc_neon:     1785.0   1329.5   1308.8
After:
inv_txfm_add_8x8_adst_adst_1_8bpc_neon:        360.0    253.2    269.3
inv_txfm_add_16x16_adst_adst_2_8bpc_neon:     1793.1   1300.9   1254.0

(In these particular cases, it seems to be a minor regression on A53. That is probably due to having to change the ordering of some instructions, because smull+smlal+smull2+smlal2 overwrites the second output register sooner than an addl+addl2 would have. In general, smull+smlal seems to be equally good or better than addl+mul on A53 as well.)
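The two instruction sequences compute the same value; only the order of operations differs. A scalar C sketch of the equivalence follows, with hypothetical function names (the real code operates on NEON vectors):

```c
#include <stdint.h>

/* addl + mul: widen and add first, then one 32-bit multiply. */
int32_t addl_then_mul(int16_t a, int16_t b, int16_t coef) {
    return ((int32_t)a + (int32_t)b) * coef;
}

/* smull + smlal: widening multiply, then widening multiply-accumulate.
 * Two multiplies, but no separate add step. */
int32_t smull_then_smlal(int16_t a, int16_t b, int16_t coef) {
    int32_t acc = (int32_t)a * coef; /* smull */
    acc += (int32_t)b * coef;        /* smlal */
    return acc;
}
```

Both return (a + b) * coef for transform-sized coefficients such as 2896. The sketch does not guard against the extreme case where all three inputs are -32768, which would overflow 32 bits.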
2019-09-28dav1dplay: initial support for --zerocopyNiklas Haas
Right now this just allocates a new buffer for every frame, uses it, then discards it immediately. This is not optimal: either dav1d should start reusing buffers internally, or we need to pool them in dav1dplay.

As it stands, this is not really a performance gain. I'll have to investigate why, but my suspicion is that seeing any gains might require reusing buffers somewhere.

Note: Thrashing buffers is not as bad as it initially seems. Not only does libplacebo pool and reuse GPU memory and buffer state objects internally, but this also absolves us from having to do any manual polling to figure out when the buffer is reusable again. Creating, using and immediately destroying buffers actually isn't as bad an approach as it might otherwise seem.

It's entirely possible that this is only bad because of lock contention. As said, I'll have to investigate further...
2019-09-28dav1dplay: add --untimed for benchmarking purposesNiklas Haas
Useful to test the effects of performance changes to the decoding/rendering loop as a whole.
2019-09-28dav1dplay: add --highquality to toggle render qualityNiklas Haas
Only meaningful with libplacebo. The defaults are higher quality than SDL so it's an unfair comparison and definitely too much for slow iGPUs at 4K res. Make the defaults fast/dumb processing only, and guard the debanding/dithering/upscaling/etc. behind a new --highquality flag.
2019-09-19x86: add 32-bit support to SSSE3 deblock lpfVictorien Le Couviour--Tuffet
------------------------------------------
x86_64: lpf_h_sb_uv_w4_8bpc_c: 430.6
x86_32: lpf_h_sb_uv_w4_8bpc_c: 788.6
x86_64: lpf_h_sb_uv_w4_8bpc_ssse3: 322.0
x86_32: lpf_h_sb_uv_w4_8bpc_ssse3: 302.4
---------------------
x86_64: lpf_h_sb_uv_w6_8bpc_c: 981.9
x86_32: lpf_h_sb_uv_w6_8bpc_c: 1579.6
x86_64: lpf_h_sb_uv_w6_8bpc_ssse3: 421.5
x86_32: lpf_h_sb_uv_w6_8bpc_ssse3: 431.6
---------------------
x86_64: lpf_h_sb_y_w4_8bpc_c: 3001.7
x86_32: lpf_h_sb_y_w4_8bpc_c: 7021.3
x86_64: lpf_h_sb_y_w4_8bpc_ssse3: 466.3
x86_32: lpf_h_sb_y_w4_8bpc_ssse3: 564.7
---------------------
x86_64: lpf_h_sb_y_w8_8bpc_c: 4457.7
x86_32: lpf_h_sb_y_w8_8bpc_c: 3657.8
x86_64: lpf_h_sb_y_w8_8bpc_ssse3: 818.9
x86_32: lpf_h_sb_y_w8_8bpc_ssse3: 927.9
---------------------
x86_64: lpf_h_sb_y_w16_8bpc_c: 1967.9
x86_32: lpf_h_sb_y_w16_8bpc_c: 3343.5
x86_64: lpf_h_sb_y_w16_8bpc_ssse3: 1836.7
x86_32: lpf_h_sb_y_w16_8bpc_ssse3: 1975.0
---------------------
x86_64: lpf_v_sb_uv_w4_8bpc_c: 369.4
x86_32: lpf_v_sb_uv_w4_8bpc_c: 793.6
x86_64: lpf_v_sb_uv_w4_8bpc_ssse3: 110.9
x86_32: lpf_v_sb_uv_w4_8bpc_ssse3: 133.0
---------------------
x86_64: lpf_v_sb_uv_w6_8bpc_c: 769.6
x86_32: lpf_v_sb_uv_w6_8bpc_c: 1576.7
x86_64: lpf_v_sb_uv_w6_8bpc_ssse3: 222.2
x86_32: lpf_v_sb_uv_w6_8bpc_ssse3: 232.2
---------------------
x86_64: lpf_v_sb_y_w4_8bpc_c: 772.4
x86_32: lpf_v_sb_y_w4_8bpc_c: 2596.5
x86_64: lpf_v_sb_y_w4_8bpc_ssse3: 179.8
x86_32: lpf_v_sb_y_w4_8bpc_ssse3: 234.7
---------------------
x86_64: lpf_v_sb_y_w8_8bpc_c: 1660.2
x86_32: lpf_v_sb_y_w8_8bpc_c: 3979.9
x86_64: lpf_v_sb_y_w8_8bpc_ssse3: 468.3
x86_32: lpf_v_sb_y_w8_8bpc_ssse3: 580.9
---------------------
x86_64: lpf_v_sb_y_w16_8bpc_c: 1889.6
x86_32: lpf_v_sb_y_w16_8bpc_c: 4728.7
x86_64: lpf_v_sb_y_w16_8bpc_ssse3: 1142.0
x86_32: lpf_v_sb_y_w16_8bpc_ssse3: 1174.8
------------------------------------------
2019-09-19x86: add deblocking loopfilters SSSE3 asm (64-bit)Ronald S. Bultje
---------------------
x86_64:
------------------------------------------
lpf_h_sb_uv_w4_8bpc_c: 430.6
lpf_h_sb_uv_w4_8bpc_ssse3: 322.0
lpf_h_sb_uv_w4_8bpc_avx2: 200.4
---------------------
lpf_h_sb_uv_w6_8bpc_c: 981.9
lpf_h_sb_uv_w6_8bpc_ssse3: 421.5
lpf_h_sb_uv_w6_8bpc_avx2: 270.0
---------------------
lpf_h_sb_y_w4_8bpc_c: 3001.7
lpf_h_sb_y_w4_8bpc_ssse3: 466.3
lpf_h_sb_y_w4_8bpc_avx2: 383.1
---------------------
lpf_h_sb_y_w8_8bpc_c: 4457.7
lpf_h_sb_y_w8_8bpc_ssse3: 818.9
lpf_h_sb_y_w8_8bpc_avx2: 537.0
---------------------
lpf_h_sb_y_w16_8bpc_c: 1967.9
lpf_h_sb_y_w16_8bpc_ssse3: 1836.7
lpf_h_sb_y_w16_8bpc_avx2: 1078.2
---------------------
lpf_v_sb_uv_w4_8bpc_c: 369.4
lpf_v_sb_uv_w4_8bpc_ssse3: 110.9
lpf_v_sb_uv_w4_8bpc_avx2: 58.1
---------------------
lpf_v_sb_uv_w6_8bpc_c: 769.6
lpf_v_sb_uv_w6_8bpc_ssse3: 222.2
lpf_v_sb_uv_w6_8bpc_avx2: 117.8
---------------------
lpf_v_sb_y_w4_8bpc_c: 772.4
lpf_v_sb_y_w4_8bpc_ssse3: 179.8
lpf_v_sb_y_w4_8bpc_avx2: 173.6
---------------------
lpf_v_sb_y_w8_8bpc_c: 1660.2
lpf_v_sb_y_w8_8bpc_ssse3: 468.3
lpf_v_sb_y_w8_8bpc_avx2: 345.8
---------------------
lpf_v_sb_y_w16_8bpc_c: 1889.6
lpf_v_sb_y_w16_8bpc_ssse3: 1142.0
lpf_v_sb_y_w16_8bpc_avx2: 568.1
------------------------------------------
2019-09-10AVX2 for chroma 4:2:0 film grain reconstructionRonald S. Bultje
fguv_32x32xn_8bpc_420_csfl0_c: 8945.4
fguv_32x32xn_8bpc_420_csfl0_avx2: 1001.6
fguv_32x32xn_8bpc_420_csfl1_c: 6363.4
fguv_32x32xn_8bpc_420_csfl1_avx2: 1299.5
2019-09-10Remove luma width check in fguv_32x32xnRonald S. Bultje
This would affect the output in samples with an odd width and horizontal chroma subsampling. The check does not exist in libaom, and might cause mismatches. This causes issues in the sample from #210, which uses super-resolution and has odd width. To work around this, make super-resolution's resize() always write an even number of pixels. This should not interfere with SIMD in the future.
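"Always write an even number of pixels" amounts to rounding the output width up to the next even value. A one-line sketch of that rounding, using a hypothetical helper name (the commit message does not show how resize() implements it):

```c
/* Round a positive width up to the next even number, so that a
 * horizontally subsampled chroma plane always has a matching luma pair. */
int round_up_even(int w) {
    return (w + 1) & ~1;
}
```

With this, a width of 7 becomes 8, while an already even width is left unchanged.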
2019-09-10Y grain AVX2 implementationsRonald S. Bultje
fgy_32x32xn_8bpc_c: 16181.8
fgy_32x32xn_8bpc_avx2: 3231.4
gen_grain_y_ar0_8bpc_c: 108857.6
gen_grain_y_ar0_8bpc_avx2: 22826.7
gen_grain_y_ar1_8bpc_c: 168239.8
gen_grain_y_ar1_8bpc_avx2: 72117.2
gen_grain_y_ar2_8bpc_c: 266165.9
gen_grain_y_ar2_8bpc_avx2: 126281.8
gen_grain_y_ar3_8bpc_c: 448139.4
gen_grain_y_ar3_8bpc_avx2: 137047.1
2019-09-10Add film grain checkasm testsRonald S. Bultje
2019-09-10Split out film grain block functions into a DSPContextRonald S. Bultje
2019-09-06obu: fix deriving render_width and render_height from reference framesJames Almer
Both values can be independently coded in the bitstream, and are not always equal to frame_width and frame_height.
2019-09-06Silence some clang-cl warningsHenrik Gramner
For some reason the MSVC CRT _wassert() function is not flagged as __declspec(noreturn), so when using those headers the compiler will expect execution to continue after an assertion has been triggered and will therefore complain about the use of uninitialized variables when compiled in debug mode in certain code paths. Reorder some case statements as a workaround.
2019-09-05x86: Fix buffer overread in mc putHenrik Gramner
For w <= 32 we can't process more than two rows per loop iteration. Credit to OSS-Fuzz.
2019-09-05x86: Increase precision of the final inverse ADST transform stagesHenrik Gramner
16-bit precision is sufficient for the second pass, but the first pass requires 32-bit precision to correctly handle some esoteric edge cases.
2019-09-05arm64: itx: Do the final calculation of adst4/adst8/adst16 in 32 bit to avoid too narrow clippingMartin Storsjö
See issue #295, this fixes it for arm64.

Before:                                   Cortex A53      A72      A73
inv_txfm_add_4x4_adst_adst_1_8bpc_neon:        103.0     63.2     65.2
inv_txfm_add_4x8_adst_adst_1_8bpc_neon:        197.0    145.0    134.2
inv_txfm_add_8x8_adst_adst_1_8bpc_neon:        332.0    248.0    247.1
inv_txfm_add_16x16_adst_adst_2_8bpc_neon:     1676.8   1197.0   1186.8
After:
inv_txfm_add_4x4_adst_adst_1_8bpc_neon:        103.0     76.4     67.0
inv_txfm_add_4x8_adst_adst_1_8bpc_neon:        205.0    155.0    143.8
inv_txfm_add_8x8_adst_adst_1_8bpc_neon:        358.0    269.0    276.2
inv_txfm_add_16x16_adst_adst_2_8bpc_neon:     1785.2   1347.8   1312.1

This would probably only be needed for adst in the first pass, but the additional code complexity from splitting the implementations (as we currently don't have transforms differentiated between first and second pass) isn't necessarily worth it (the speedup over C code is still 8-10x).
2019-09-04Prefer __builtin_unreachable() over __assume() on clang-clHenrik Gramner
__assume() doesn't work correctly in clang-cl versions prior to 7.0.0 which causes bogus warnings regarding use of uninitialized variables to be printed. Avoid that by using __builtin_unreachable() instead.
2019-09-04Fix clang-cl assertion warningHenrik Gramner
clang-cl doesn't like function calls in __assume statements, even trivial inline ones.
2019-09-04arm: Fix assembling with older binutilsJanne Grunau
This large constant needs a movw instruction, which newer binutils can figure out, but older versions need it stated explicitly. This fixes #296.
2019-09-03TileContext: reorder scratch buffer to avoid conflictsJanne Grunau
The chroma part of pal_idx potentially conflicts during intra reconstruction with edge_{8,16}bpc. Fixes out of range pixel values caused by invalid palette indices in clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5076736684851200. Fixes #294. Reported as integer overflows in boxsum5sqr with undefined behavior sanitizer. Credits to oss-fuzz.
2019-09-01CI: use "needs:" to break the static build, test stage dependencyJanne Grunau
2019-08-30Apply high-bitdepth adjustment of pixel index after delta calculationRonald S. Bultje
Fixes libaom/dav1d mismatch in av1-1-b10-23-film_grain-50.ivf.
2019-08-30Use linear interpolation for high bit-depth pixel valuesRonald S. Bultje
2019-08-30Fix bugs in film grain generationRonald S. Bultje
- calculate chroma grain based on src (not dst) luma pixels;
- division should precede multiplication in delta calculation.

Together, these fix differences in film grain reconstruction between libaom and dav1d for various generated samples.
2019-08-29arm: mc: Making code style consistentB Krishnan Iyer
2019-08-29arm: mc: Push fewer registers in w_maskMartin Storsjö
Use the so far unused lr register instead of r10.
2019-08-29arm: mc: Remove an unused instruction in w_maskMartin Storsjö
2019-08-29Check absolute tile positions in sb-to-tile_idx table generationRonald S. Bultje
Otherwise the table can get out of sync when the frame size and tile count stays the same, but the tile coordinates change. Fixes #266.
2019-08-28Use 64-bit integers for warp_affine mvx/mvy calculationsHenrik Gramner
Fixes integer overflows with very large frame sizes. Credit to OSS-Fuzz.
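The class of overflow fixed here can be sketched as follows. The function name, the three-element matrix layout and the 16-bit fractional precision are assumptions made for illustration only and do not reproduce dav1d's actual warp code. The point is that the casts must be applied before the multiplications, so that the products themselves are computed in 64 bits.

```c
#include <stdint.h>

/* Affine coordinate offset + a*x + b*y, computed in 64 bits.
 * mat[0] is the offset, mat[1] and mat[2] are fixed-point coefficients.
 * With large frame coordinates, a 32-bit product would overflow (undefined
 * behaviour for signed int); casting an operand first avoids that. */
int64_t affine_coord64(const int32_t mat[3], int x, int y) {
    return (int64_t)mat[1] * x + (int64_t)mat[2] * y + mat[0];
}
```

For example, with both coefficients equal to 1 << 16 and x = y = 40000, the exact result is 5242880000, which does not fit in a 32-bit integer.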
2019-08-28x86: Fix inverse ADST transform overflowsHenrik Gramner
2019-08-23Optimize coef ctx calculationsHenrik Gramner
2019-08-23Consolidate horizontal scan tablesHenrik Gramner
2019-08-22Change scan tables from int16_t to uint16_tHenrik Gramner
Eliminates some sign extensions.
2019-08-19Utilize the constraints in assertions to improve code generationHenrik Gramner
When compiling in release mode, instead of just deleting assertions, use them to give hints to the compiler. This allows for slightly better code generation in some cases.
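One common way to turn assertions into optimizer hints is to mark the failing branch as unreachable in release builds. The macro below is a sketch of that idea with a hypothetical name, HINT_ASSERT; it is not copied from dav1d's headers. `__builtin_unreachable()` is a GCC/Clang builtin and `__assume()` is the MSVC equivalent.

```c
#include <stdlib.h>

#if !defined(NDEBUG)
/* Debug build: a failed assertion aborts the program. */
#define HINT_ASSERT(x) do { if (!(x)) abort(); } while (0)
#elif defined(__GNUC__) || defined(__clang__)
/* Release build: tell the compiler the condition always holds. */
#define HINT_ASSERT(x) do { if (!(x)) __builtin_unreachable(); } while (0)
#elif defined(_MSC_VER)
#define HINT_ASSERT(x) __assume(x)
#else
#define HINT_ASSERT(x) do { } while (0)
#endif

/* With v known to be non-negative, the compiler is free to implement the
 * division as a plain shift instead of a sign-corrected sequence. */
int div_by_pow2(int v, int shift) {
    HINT_ASSERT(v >= 0);
    HINT_ASSERT(shift >= 0 && shift < 31);
    return v / (1 << shift);
}
```

The trade-off is that a violated assertion becomes undefined behaviour in release builds, so this only suits conditions that are true invariants of the code.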
2019-08-15arm64: mc: NEON implementation of w_mask_444/422/420 functionB Krishnan Iyer
                                  A73       A53
w_mask_420_w4_8bpc_c:             818    1082.9
w_mask_420_w4_8bpc_neon:           79     126.6
w_mask_420_w8_8bpc_c:            2486    3399.8
w_mask_420_w8_8bpc_neon:        200.2     343.7
w_mask_420_w16_8bpc_c:         8022.3   10989.6
w_mask_420_w16_8bpc_neon:       528.1       889
w_mask_420_w32_8bpc_c:        31851.8   42808.6
w_mask_420_w32_8bpc_neon:      2062.5    3380.8
w_mask_420_w64_8bpc_c:        79268.5  102683.9
w_mask_420_w64_8bpc_neon:      5252.9    8575.4
w_mask_420_w128_8bpc_c:      193704.1  255586.5
w_mask_420_w128_8bpc_neon:    14602.3   22167.7
w_mask_422_w4_8bpc_c:           777.3    1038.5
w_mask_422_w4_8bpc_neon:         72.1     112.9
w_mask_422_w8_8bpc_c:          2405.7      3168
w_mask_422_w8_8bpc_neon:        191.9     314.1
w_mask_422_w16_8bpc_c:         7783.7   10543.9
w_mask_422_w16_8bpc_neon:       559.8     835.5
w_mask_422_w32_8bpc_c:        30895.7   41141.2
w_mask_422_w32_8bpc_neon:      2089.7    3187.2
w_mask_422_w64_8bpc_c:        75500.2   98766.3
w_mask_422_w64_8bpc_neon:        5379    8208.2
w_mask_422_w128_8bpc_c:      186967.1  245809.1
w_mask_422_w128_8bpc_neon:    15159.9   21474.5
w_mask_444_w4_8bpc_c:           850.1    1136.6
w_mask_444_w4_8bpc_neon:         66.5     104.7
w_mask_444_w8_8bpc_c:          2373.5    3262.9
w_mask_444_w8_8bpc_neon:        180.5     290.2
w_mask_444_w16_8bpc_c:         7291.6   10590.7
w_mask_444_w16_8bpc_neon:       550.9     809.7
w_mask_444_w32_8bpc_c:         8048.3   10140.8
w_mask_444_w32_8bpc_neon:      2136.2      3095
w_mask_444_w64_8bpc_c:        18055.3     23060
w_mask_444_w64_8bpc_neon:      5522.5    8124.8
w_mask_444_w128_8bpc_c:       42754.3     56072
w_mask_444_w128_8bpc_neon:    15569.5   21531.5
2019-08-14arm64: mc: NEON implementation of blend, blend_h and blend_v functionB Krishnan Iyer
                             A73     A53
blend_h_w2_8bpc_c:         184.7   301.5
blend_h_w2_8bpc_neon:       58.8   104.1
blend_h_w4_8bpc_c:         291.4   507.3
blend_h_w4_8bpc_neon:       48.7   108.9
blend_h_w8_8bpc_c:         510.1   992.7
blend_h_w8_8bpc_neon:       66.5    99.3
blend_h_w16_8bpc_c:          972  1835.3
blend_h_w16_8bpc_neon:      82.7   145.2
blend_h_w32_8bpc_c:        776.7   912.9
blend_h_w32_8bpc_neon:     155.1   266.9
blend_h_w64_8bpc_c:       1424.3  1635.4
blend_h_w64_8bpc_neon:     273.4   480.9
blend_h_w128_8bpc_c:      3318.1    3774
blend_h_w128_8bpc_neon:    614.1  1097.9
blend_v_w2_8bpc_c:         278.8   427.5
blend_v_w2_8bpc_neon:      113.7   170.4
blend_v_w4_8bpc_c:         960.2  1597.7
blend_v_w4_8bpc_neon:      222.9   351.4
blend_v_w8_8bpc_c:        1694.2  3333.5
blend_v_w8_8bpc_neon:      200.9   333.6
blend_v_w16_8bpc_c:       3115.2  5971.6
blend_v_w16_8bpc_neon:     233.2   494.8
blend_v_w32_8bpc_c:       3949.7  6070.6
blend_v_w32_8bpc_neon:     460.4   841.6
blend_w4_8bpc_c:           244.2   388.3
blend_w4_8bpc_neon:         25.5    66.7
blend_w8_8bpc_c:           616.3  1120.8
blend_w8_8bpc_neon:           46   110.7
blend_w16_8bpc_c:         2193.1  4056.4
blend_w16_8bpc_neon:       140.7   299.3
blend_w32_8bpc_c:         2502.8  2998.5
blend_w32_8bpc_neon:       381.4   725.3
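For context, the per-pixel operation that the blend functions accelerate can be sketched in scalar C as below. The sketch assumes 6-bit mask weights in the range 0..64, as AV1 uses for masked blending; the function name and signature are hypothetical and simplified to a single row.

```c
#include <stdint.h>

/* Blend one row: dst = (dst * (64 - m) + tmp * m + 32) >> 6, where m is
 * the per-pixel mask weight (0 keeps dst, 64 takes tmp). */
void blend_row(uint8_t *dst, const uint8_t *tmp, const uint8_t *mask, int w) {
    for (int x = 0; x < w; x++) {
        const int m = mask[x];
        dst[x] = (uint8_t)((dst[x] * (64 - m) + tmp[x] * m + 32) >> 6);
    }
}
```

Every pixel is independent and the arithmetic fits in 16 bits, which makes the operation a natural fit for wide SIMD lanes.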
2019-08-14Prefer `do {} while (0);` over `while (0);`Michael Bradshaw
2019-08-13Cosmetics: CDF tablesHenrik Gramner
2019-08-13x86: Add an msac function for coefficient hi_tok decodingHenrik Gramner
This particular sequence is executed often enough to justify having a separate slightly more optimized code path instead of just chaining multiple generic symbol decoding function calls together.
2019-08-13Add msac optimizationsHenrik Gramner
* Eliminate the trailing zero after the CDF probabilities. We can reuse the count value as a terminator instead. This reduces the size of the CDF context by around 8%.
* Align the CDF arrays.
* Various other minor optimizations.
2019-08-13Remove unused CDF:sHenrik Gramner
2019-08-10dav1dplay: abort if no input filename is providedJames Almer
2019-08-10meson: move dav1dplay to a new examples sectionJames Almer
dav1dplay shouldn't be built by default. And it's an example more than a tool.
2019-08-09decode_coefs reuse lossless variableLuc Trudeau
2019-08-09Unroll hi_token loop in decode_coeffLuc Trudeau