Relative speedups over the C code:
Cortex A53 A72 A73
intra_pred_dc_128_w4_8bpc_neon: 2.08 1.47 2.17
intra_pred_dc_128_w8_8bpc_neon: 3.33 2.49 4.03
intra_pred_dc_128_w16_8bpc_neon: 3.93 3.86 3.75
intra_pred_dc_128_w32_8bpc_neon: 3.14 3.79 2.90
intra_pred_dc_128_w64_8bpc_neon: 3.68 1.97 2.42
intra_pred_dc_left_w4_8bpc_neon: 2.41 1.70 2.23
intra_pred_dc_left_w8_8bpc_neon: 3.53 2.41 3.32
intra_pred_dc_left_w16_8bpc_neon: 3.87 3.54 3.34
intra_pred_dc_left_w32_8bpc_neon: 4.10 3.60 2.76
intra_pred_dc_left_w64_8bpc_neon: 3.72 2.00 2.39
intra_pred_dc_top_w4_8bpc_neon: 2.27 1.66 2.07
intra_pred_dc_top_w8_8bpc_neon: 3.83 2.69 3.43
intra_pred_dc_top_w16_8bpc_neon: 3.66 3.60 3.20
intra_pred_dc_top_w32_8bpc_neon: 3.92 3.54 2.66
intra_pred_dc_top_w64_8bpc_neon: 3.60 1.98 2.30
intra_pred_dc_w4_8bpc_neon: 2.29 1.42 2.16
intra_pred_dc_w8_8bpc_neon: 3.56 2.83 3.05
intra_pred_dc_w16_8bpc_neon: 3.46 3.37 3.15
intra_pred_dc_w32_8bpc_neon: 3.79 3.41 2.74
intra_pred_dc_w64_8bpc_neon: 3.52 2.01 2.41
intra_pred_h_w4_8bpc_neon: 10.34 5.74 5.94
intra_pred_h_w8_8bpc_neon: 12.13 6.33 6.43
intra_pred_h_w16_8bpc_neon: 10.66 7.31 5.85
intra_pred_h_w32_8bpc_neon: 6.28 4.18 2.88
intra_pred_h_w64_8bpc_neon: 3.96 1.85 1.75
intra_pred_v_w4_8bpc_neon: 11.44 6.12 7.57
intra_pred_v_w8_8bpc_neon: 14.76 7.58 7.95
intra_pred_v_w16_8bpc_neon: 11.34 6.28 5.88
intra_pred_v_w32_8bpc_neon: 6.56 3.33 3.34
intra_pred_v_w64_8bpc_neon: 4.57 1.24 1.97
------------------------------------------
x86_64: warp_8x8_8bpc_c: 1773.4
x86_32: warp_8x8_8bpc_c: 1740.4
----------
x86_64: warp_8x8_8bpc_ssse3: 317.5
x86_32: warp_8x8_8bpc_ssse3: 378.4
----------
x86_64: warp_8x8_8bpc_sse4: 303.7
x86_32: warp_8x8_8bpc_sse4: 367.7
----------
x86_64: warp_8x8_8bpc_avx2: 224.9
---------------------
---------------------
x86_64: warp_8x8t_8bpc_c: 1664.6
x86_32: warp_8x8t_8bpc_c: 1674.0
----------
x86_64: warp_8x8t_8bpc_ssse3: 320.7
x86_32: warp_8x8t_8bpc_ssse3: 379.5
----------
x86_64: warp_8x8t_8bpc_sse4: 304.8
x86_32: warp_8x8t_8bpc_sse4: 369.8
----------
x86_64: warp_8x8t_8bpc_avx2: 228.5
------------------------------------------
Don't add two 16-bit coefficients in 16 bits if the result isn't supposed
to be clipped.
This fixes mismatches for some samples; see issue #299.
Before: Cortex A53 A72 A73
inv_txfm_add_4x4_dct_dct_1_8bpc_neon: 93.0 52.8 49.5
inv_txfm_add_8x8_dct_dct_1_8bpc_neon: 260.0 186.0 196.4
inv_txfm_add_16x16_dct_dct_2_8bpc_neon: 1371.0 953.4 1028.6
inv_txfm_add_32x32_dct_dct_4_8bpc_neon: 7363.2 4887.5 5135.8
inv_txfm_add_64x64_dct_dct_4_8bpc_neon: 25029.0 17492.3 18404.5
After:
inv_txfm_add_4x4_dct_dct_1_8bpc_neon: 105.0 58.7 55.2
inv_txfm_add_8x8_dct_dct_1_8bpc_neon: 294.0 211.5 209.9
inv_txfm_add_16x16_dct_dct_2_8bpc_neon: 1495.8 1050.4 1070.6
inv_txfm_add_32x32_dct_dct_4_8bpc_neon: 7866.7 5197.8 5321.4
inv_txfm_add_64x64_dct_dct_4_8bpc_neon: 25807.2 18619.3 18526.9
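The overflow the fix addresses can be shown in plain C. A minimal sketch with hypothetical helper names (not dav1d's actual functions):

```c
#include <assert.h>
#include <stdint.h>

/* Wrapping 16-bit add: what a 16-bit SIMD add (e.g. NEON "add v0.8h")
 * produces when the mathematical sum leaves the int16_t range. */
int16_t add16_wrap(int16_t a, int16_t b) {
    return (int16_t)((uint16_t)a + (uint16_t)b);
}

/* Fixed pattern: widen to 32 bits before adding when the result is
 * not immediately clipped back into range. */
int32_t add16_widen(int16_t a, int16_t b) {
    return (int32_t)a + (int32_t)b;
}
```

With inputs like 30000 + 30000 the 16-bit add wraps to -5536 while the widened add yields the correct 60000 — the kind of coefficient mismatch issue #299 reported.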
The scaled form 2896>>4 shouldn't be necessary with valid bitstreams.
Even though smull+smlal does two multiplications instead of one,
the combination seems to be better handled by actual cores.
Before: Cortex A53 A72 A73
inv_txfm_add_8x8_adst_adst_1_8bpc_neon: 356.0 279.2 278.0
inv_txfm_add_16x16_adst_adst_2_8bpc_neon: 1785.0 1329.5 1308.8
After:
inv_txfm_add_8x8_adst_adst_1_8bpc_neon: 360.0 253.2 269.3
inv_txfm_add_16x16_adst_adst_2_8bpc_neon: 1793.1 1300.9 1254.0
(In this particular case it seems to be a minor regression on A53,
probably because the instruction ordering had to change:
smull+smlal+smull2+smlal2 overwrites the second output register sooner
than an addl+addl2 would have. In general, though, smull+smlal seems
to be equally good or better than addl+mul on A53 as well.)
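In scalar C the two strategies are algebraically identical; a hedged sketch of the patterns the assembly maps to (illustrative code, not dav1d's):

```c
#include <assert.h>
#include <stdint.h>

/* addl+mul pattern: one widening add, then one 32-bit multiply. */
int32_t mul_of_sum(int16_t a, int16_t b, int16_t c) {
    return ((int32_t)a + b) * c;
}

/* smull+smlal pattern: two widening multiplies, accumulated. */
int32_t sum_of_muls(int16_t a, int16_t b, int16_t c) {
    return (int32_t)a * c + (int32_t)b * c;
}
```

Both compute the same value for any 16-bit inputs; the commit's point is purely that the two-multiply form schedules better on the tested cores.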
Right now this just allocates a new buffer for every frame, uses it,
then discards it immediately. This is not optimal, either dav1d should
start reusing buffers internally or we need to pool them in dav1dplay.
As it stands, this is not really a performance gain. I'll have to
investigate why, but my suspicion is that seeing any gains might require
reusing buffers somewhere.
Note: thrashing buffers is not as bad as it initially seems. Not only
does libplacebo pool and reuse GPU memory and buffer state objects
internally, but this also absolves us from having to do any manual
polling to figure out when the buffer is reusable again. Creating, using
and immediately destroying buffers actually isn't as bad an approach as
it might otherwise seem.
It's entirely possible that this is only bad because of lock contention.
As said, I'll have to investigate further...
Useful to test the effects of performance changes to the
decoding/rendering loop as a whole.
Only meaningful with libplacebo. The defaults are higher quality than
SDL so it's an unfair comparison and definitely too much for slow iGPUs
at 4K res. Make the defaults fast/dumb processing only, and guard the
debanding/dithering/upscaling/etc. behind a new --highquality flag.
------------------------------------------
x86_64: lpf_h_sb_uv_w4_8bpc_c: 430.6
x86_32: lpf_h_sb_uv_w4_8bpc_c: 788.6
x86_64: lpf_h_sb_uv_w4_8bpc_ssse3: 322.0
x86_32: lpf_h_sb_uv_w4_8bpc_ssse3: 302.4
---------------------
x86_64: lpf_h_sb_uv_w6_8bpc_c: 981.9
x86_32: lpf_h_sb_uv_w6_8bpc_c: 1579.6
x86_64: lpf_h_sb_uv_w6_8bpc_ssse3: 421.5
x86_32: lpf_h_sb_uv_w6_8bpc_ssse3: 431.6
---------------------
x86_64: lpf_h_sb_y_w4_8bpc_c: 3001.7
x86_32: lpf_h_sb_y_w4_8bpc_c: 7021.3
x86_64: lpf_h_sb_y_w4_8bpc_ssse3: 466.3
x86_32: lpf_h_sb_y_w4_8bpc_ssse3: 564.7
---------------------
x86_64: lpf_h_sb_y_w8_8bpc_c: 4457.7
x86_32: lpf_h_sb_y_w8_8bpc_c: 3657.8
x86_64: lpf_h_sb_y_w8_8bpc_ssse3: 818.9
x86_32: lpf_h_sb_y_w8_8bpc_ssse3: 927.9
---------------------
x86_64: lpf_h_sb_y_w16_8bpc_c: 1967.9
x86_32: lpf_h_sb_y_w16_8bpc_c: 3343.5
x86_64: lpf_h_sb_y_w16_8bpc_ssse3: 1836.7
x86_32: lpf_h_sb_y_w16_8bpc_ssse3: 1975.0
---------------------
x86_64: lpf_v_sb_uv_w4_8bpc_c: 369.4
x86_32: lpf_v_sb_uv_w4_8bpc_c: 793.6
x86_64: lpf_v_sb_uv_w4_8bpc_ssse3: 110.9
x86_32: lpf_v_sb_uv_w4_8bpc_ssse3: 133.0
---------------------
x86_64: lpf_v_sb_uv_w6_8bpc_c: 769.6
x86_32: lpf_v_sb_uv_w6_8bpc_c: 1576.7
x86_64: lpf_v_sb_uv_w6_8bpc_ssse3: 222.2
x86_32: lpf_v_sb_uv_w6_8bpc_ssse3: 232.2
---------------------
x86_64: lpf_v_sb_y_w4_8bpc_c: 772.4
x86_32: lpf_v_sb_y_w4_8bpc_c: 2596.5
x86_64: lpf_v_sb_y_w4_8bpc_ssse3: 179.8
x86_32: lpf_v_sb_y_w4_8bpc_ssse3: 234.7
---------------------
x86_64: lpf_v_sb_y_w8_8bpc_c: 1660.2
x86_32: lpf_v_sb_y_w8_8bpc_c: 3979.9
x86_64: lpf_v_sb_y_w8_8bpc_ssse3: 468.3
x86_32: lpf_v_sb_y_w8_8bpc_ssse3: 580.9
---------------------
x86_64: lpf_v_sb_y_w16_8bpc_c: 1889.6
x86_32: lpf_v_sb_y_w16_8bpc_c: 4728.7
x86_64: lpf_v_sb_y_w16_8bpc_ssse3: 1142.0
x86_32: lpf_v_sb_y_w16_8bpc_ssse3: 1174.8
------------------------------------------
---------------------
x86_64:
------------------------------------------
lpf_h_sb_uv_w4_8bpc_c: 430.6
lpf_h_sb_uv_w4_8bpc_ssse3: 322.0
lpf_h_sb_uv_w4_8bpc_avx2: 200.4
---------------------
lpf_h_sb_uv_w6_8bpc_c: 981.9
lpf_h_sb_uv_w6_8bpc_ssse3: 421.5
lpf_h_sb_uv_w6_8bpc_avx2: 270.0
---------------------
lpf_h_sb_y_w4_8bpc_c: 3001.7
lpf_h_sb_y_w4_8bpc_ssse3: 466.3
lpf_h_sb_y_w4_8bpc_avx2: 383.1
---------------------
lpf_h_sb_y_w8_8bpc_c: 4457.7
lpf_h_sb_y_w8_8bpc_ssse3: 818.9
lpf_h_sb_y_w8_8bpc_avx2: 537.0
---------------------
lpf_h_sb_y_w16_8bpc_c: 1967.9
lpf_h_sb_y_w16_8bpc_ssse3: 1836.7
lpf_h_sb_y_w16_8bpc_avx2: 1078.2
---------------------
lpf_v_sb_uv_w4_8bpc_c: 369.4
lpf_v_sb_uv_w4_8bpc_ssse3: 110.9
lpf_v_sb_uv_w4_8bpc_avx2: 58.1
---------------------
lpf_v_sb_uv_w6_8bpc_c: 769.6
lpf_v_sb_uv_w6_8bpc_ssse3: 222.2
lpf_v_sb_uv_w6_8bpc_avx2: 117.8
---------------------
lpf_v_sb_y_w4_8bpc_c: 772.4
lpf_v_sb_y_w4_8bpc_ssse3: 179.8
lpf_v_sb_y_w4_8bpc_avx2: 173.6
---------------------
lpf_v_sb_y_w8_8bpc_c: 1660.2
lpf_v_sb_y_w8_8bpc_ssse3: 468.3
lpf_v_sb_y_w8_8bpc_avx2: 345.8
---------------------
lpf_v_sb_y_w16_8bpc_c: 1889.6
lpf_v_sb_y_w16_8bpc_ssse3: 1142.0
lpf_v_sb_y_w16_8bpc_avx2: 568.1
------------------------------------------
fguv_32x32xn_8bpc_420_csfl0_c: 8945.4
fguv_32x32xn_8bpc_420_csfl0_avx2: 1001.6
fguv_32x32xn_8bpc_420_csfl1_c: 6363.4
fguv_32x32xn_8bpc_420_csfl1_avx2: 1299.5
This would affect the output in samples with an odd width and horizontal
chroma subsampling. The check does not exist in libaom, so keeping it
might cause mismatches.
This causes issues in the sample from #210, which uses super-resolution
and has odd width. To work around this, make super-resolution's resize()
always write an even number of pixels. This should not interfere with
SIMD in the future.
fgy_32x32xn_8bpc_c: 16181.8
fgy_32x32xn_8bpc_avx2: 3231.4
gen_grain_y_ar0_8bpc_c: 108857.6
gen_grain_y_ar0_8bpc_avx2: 22826.7
gen_grain_y_ar1_8bpc_c: 168239.8
gen_grain_y_ar1_8bpc_avx2: 72117.2
gen_grain_y_ar2_8bpc_c: 266165.9
gen_grain_y_ar2_8bpc_avx2: 126281.8
gen_grain_y_ar3_8bpc_c: 448139.4
gen_grain_y_ar3_8bpc_avx2: 137047.1
Both values can be independently coded in the bitstream, and are not
always equal to frame_width and frame_height.
For some reason the MSVC CRT _wassert() function is not flagged as
__declspec(noreturn), so when using those headers the compiler will
expect execution to continue after an assertion has been triggered
and will therefore complain about the use of uninitialized variables
when compiled in debug mode in certain code paths.
Reorder some case statements as a workaround.
For w <= 32 we can't process more than two rows per loop iteration.
Credit to OSS-Fuzz.
16-bit precision is sufficient for the second pass, but the first pass
requires 32-bit precision to correctly handle some esoteric edge cases.
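A rough illustration of why a first pass can exceed 16 bits: with 8-bit pixels and an 8-tap filter whose taps sum to 128, the worst-case intermediate already overflows int16_t. The taps below are made up for the example, not the actual AV1 coefficients.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative 8-tap filter with 7-bit gain (taps sum to 128). */
const int taps[8] = { -2, 10, -20, 76, 76, -20, 10, -2 };

/* Worst-case first-pass accumulator for 8-bit input: put 255 under
 * every positive tap and 0 under every negative one. */
int32_t pass1_worst_case(void) {
    int32_t m = 0;
    for (int i = 0; i < 8; i++)
        if (taps[i] > 0)
            m += taps[i] * 255;
    return m;  /* exceeds INT16_MAX (32767) */
}
```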
Avoid too-narrow clipping.
See issue #295; this fixes it for arm64.
Before: Cortex A53 A72 A73
inv_txfm_add_4x4_adst_adst_1_8bpc_neon: 103.0 63.2 65.2
inv_txfm_add_4x8_adst_adst_1_8bpc_neon: 197.0 145.0 134.2
inv_txfm_add_8x8_adst_adst_1_8bpc_neon: 332.0 248.0 247.1
inv_txfm_add_16x16_adst_adst_2_8bpc_neon: 1676.8 1197.0 1186.8
After:
inv_txfm_add_4x4_adst_adst_1_8bpc_neon: 103.0 76.4 67.0
inv_txfm_add_4x8_adst_adst_1_8bpc_neon: 205.0 155.0 143.8
inv_txfm_add_8x8_adst_adst_1_8bpc_neon: 358.0 269.0 276.2
inv_txfm_add_16x16_adst_adst_2_8bpc_neon: 1785.2 1347.8 1312.1
This would probably only be needed for adst in the first pass, but
the additional code complexity from splitting the implementations
(as we currently don't have transforms differentiated between first
and second pass) isn't necessarily worth it (the speedup over C code
is still 8-10x).
__assume() doesn't work correctly in clang-cl versions prior to 7.0.0
which causes bogus warnings regarding use of uninitialized variables
to be printed. Avoid that by using __builtin_unreachable() instead.
clang-cl doesn't like function calls in __assume statements, even
trivial inline ones.
This large constant needs a movw instruction, which newer binutils can
figure out, but older versions need it stated explicitly.
This fixes #296.
The chroma part of pal_idx potentially conflicts during intra
reconstruction with edge_{8,16}bpc. Fixes out-of-range pixel values
caused by invalid palette indices in
clusterfuzz-testcase-minimized-dav1d_fuzzer_mt-5076736684851200.
Fixes #294. Reported as integer overflows in boxsum5sqr with undefined
behavior sanitizer. Credit to OSS-Fuzz.
Fixes libaom/dav1d mismatch in av1-1-b10-23-film_grain-50.ivf.
- calculate chroma grain based on src (not dst) luma pixels;
- division should precede multiplication in delta calculation.
Together, these fix differences in film grain reconstruction between
libaom and dav1d for various generated samples.
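The second point matters because integer division truncates, so the order of operations changes the result. A tiny hedged illustration with made-up numbers, not the actual delta formula:

```c
#include <assert.h>

/* Divide first, as the fix requires for the delta calculation... */
int div_then_mul(int a, int d, int m) { return (a / d) * m; }

/* ...versus multiplying first, which truncates differently. */
int mul_then_div(int a, int d, int m) { return (a * m) / d; }
```

For example, with a = 7, d = 2, m = 3 the two orderings give 9 and 10 respectively; either alone is self-consistent, but only one matches libaom.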
Use the so-far-unused lr register instead of r10.
Otherwise the table can get out of sync when the frame size and tile
count stay the same, but the tile coordinates change. Fixes #266.
Fixes integer overflows with very large frame sizes.
Credit to OSS-Fuzz.
Eliminates some sign extensions.
When compiling in release mode, instead of just deleting assertions,
use them to give hints to the compiler. This allows for slightly
better code generation in some cases.
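A sketch of the idea, assuming a GCC/Clang- or MSVC-style toolchain (hypothetical macro name, not dav1d's actual one):

```c
#include <assert.h>

/* In debug builds, check the condition; in release builds, promise it
 * to the optimizer instead of discarding it entirely. */
#ifdef NDEBUG
#  ifdef _MSC_VER
#    define ASSUME(cond) __assume(cond)
#  else
#    define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#  endif
#else
#  define ASSUME(cond) assert(cond)
#endif

/* Example: promising that n is non-negative lets the compiler lower
 * n % 4 to a simple bitwise mask. */
int mod4(int n) {
    ASSUME(n >= 0);
    return n % 4;
}
```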
A73 A53
w_mask_420_w4_8bpc_c: 818 1082.9
w_mask_420_w4_8bpc_neon: 79 126.6
w_mask_420_w8_8bpc_c: 2486 3399.8
w_mask_420_w8_8bpc_neon: 200.2 343.7
w_mask_420_w16_8bpc_c: 8022.3 10989.6
w_mask_420_w16_8bpc_neon: 528.1 889
w_mask_420_w32_8bpc_c: 31851.8 42808.6
w_mask_420_w32_8bpc_neon: 2062.5 3380.8
w_mask_420_w64_8bpc_c: 79268.5 102683.9
w_mask_420_w64_8bpc_neon: 5252.9 8575.4
w_mask_420_w128_8bpc_c: 193704.1 255586.5
w_mask_420_w128_8bpc_neon: 14602.3 22167.7
w_mask_422_w4_8bpc_c: 777.3 1038.5
w_mask_422_w4_8bpc_neon: 72.1 112.9
w_mask_422_w8_8bpc_c: 2405.7 3168
w_mask_422_w8_8bpc_neon: 191.9 314.1
w_mask_422_w16_8bpc_c: 7783.7 10543.9
w_mask_422_w16_8bpc_neon: 559.8 835.5
w_mask_422_w32_8bpc_c: 30895.7 41141.2
w_mask_422_w32_8bpc_neon: 2089.7 3187.2
w_mask_422_w64_8bpc_c: 75500.2 98766.3
w_mask_422_w64_8bpc_neon: 5379 8208.2
w_mask_422_w128_8bpc_c: 186967.1 245809.1
w_mask_422_w128_8bpc_neon: 15159.9 21474.5
w_mask_444_w4_8bpc_c: 850.1 1136.6
w_mask_444_w4_8bpc_neon: 66.5 104.7
w_mask_444_w8_8bpc_c: 2373.5 3262.9
w_mask_444_w8_8bpc_neon: 180.5 290.2
w_mask_444_w16_8bpc_c: 7291.6 10590.7
w_mask_444_w16_8bpc_neon: 550.9 809.7
w_mask_444_w32_8bpc_c: 8048.3 10140.8
w_mask_444_w32_8bpc_neon: 2136.2 3095
w_mask_444_w64_8bpc_c: 18055.3 23060
w_mask_444_w64_8bpc_neon: 5522.5 8124.8
w_mask_444_w128_8bpc_c: 42754.3 56072
w_mask_444_w128_8bpc_neon: 15569.5 21531.5
A73 A53
blend_h_w2_8bpc_c: 184.7 301.5
blend_h_w2_8bpc_neon: 58.8 104.1
blend_h_w4_8bpc_c: 291.4 507.3
blend_h_w4_8bpc_neon: 48.7 108.9
blend_h_w8_8bpc_c: 510.1 992.7
blend_h_w8_8bpc_neon: 66.5 99.3
blend_h_w16_8bpc_c: 972 1835.3
blend_h_w16_8bpc_neon: 82.7 145.2
blend_h_w32_8bpc_c: 776.7 912.9
blend_h_w32_8bpc_neon: 155.1 266.9
blend_h_w64_8bpc_c: 1424.3 1635.4
blend_h_w64_8bpc_neon: 273.4 480.9
blend_h_w128_8bpc_c: 3318.1 3774
blend_h_w128_8bpc_neon: 614.1 1097.9
blend_v_w2_8bpc_c: 278.8 427.5
blend_v_w2_8bpc_neon: 113.7 170.4
blend_v_w4_8bpc_c: 960.2 1597.7
blend_v_w4_8bpc_neon: 222.9 351.4
blend_v_w8_8bpc_c: 1694.2 3333.5
blend_v_w8_8bpc_neon: 200.9 333.6
blend_v_w16_8bpc_c: 3115.2 5971.6
blend_v_w16_8bpc_neon: 233.2 494.8
blend_v_w32_8bpc_c: 3949.7 6070.6
blend_v_w32_8bpc_neon: 460.4 841.6
blend_w4_8bpc_c: 244.2 388.3
blend_w4_8bpc_neon: 25.5 66.7
blend_w8_8bpc_c: 616.3 1120.8
blend_w8_8bpc_neon: 46 110.7
blend_w16_8bpc_c: 2193.1 4056.4
blend_w16_8bpc_neon: 140.7 299.3
blend_w32_8bpc_c: 2502.8 2998.5
blend_w32_8bpc_neon: 381.4 725.3
This particular sequence is executed often enough to justify having
a separate slightly more optimized code path instead of just chaining
multiple generic symbol decoding function calls together.
* Eliminate the trailing zero after the CDF probabilities. We can
reuse the count value as a terminator instead. This reduces the
size of the CDF context by around 8%.
* Align the CDF arrays.
* Various other minor optimizations.
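A hypothetical sketch of the first point (not dav1d's actual structs — dav1d's trick reuses the count value itself as the terminator, while this version bounds the scan explicitly): the slot that used to hold a trailing 0 sentinel now holds the adaptation count, saving one element per CDF.

```c
#include <assert.h>
#include <stdint.h>

/* 4-symbol CDF: three probability entries, plus the adaptation count
 * stored where the 0 terminator used to live. */
const uint16_t cdf_example[4] = { 24576, 16384, 8192, 42 /* count */ };

/* Walk the CDF until it drops to/below f; bounded by the symbol count
 * instead of scanning for a sentinel value. */
int decode_symbol(const uint16_t *cdf, int nsymbs, unsigned f) {
    int s = 0;
    while (s < nsymbs - 1 && cdf[s] > f)
        s++;
    return s;
}
```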
dav1dplay shouldn't be built by default; it's an example more than a tool.