github.com/videolan/dav1d.git

Age	Commit message	Author
2019-05-10	Add __attribute__((cold)) to rarely used functions	Henrik Gramner
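As a rough illustration of what this attribute does: GCC and Clang accept `__attribute__((cold))` on functions that are unlikely to be executed, which lets the compiler optimize them for size and keep them away from hot code. The macro and function names below are hypothetical and are not taken from dav1d.

```c
#include <stddef.h>

/* Hypothetical macro: expands to the attribute only where it is known to exist. */
#if defined(__GNUC__) || defined(__clang__)
#define EXAMPLE_COLD __attribute__((cold))
#else
#define EXAMPLE_COLD
#endif

/* Lookup table filled once on first use. */
static unsigned char example_table[256];
static int example_table_ready = 0;

/* Rarely executed: runs a single time, so it is a candidate for `cold`. */
static EXAMPLE_COLD void example_init_table(void) {
    for (size_t i = 0; i < sizeof(example_table); i++)
        example_table[i] = (unsigned char)(255 - i);
    example_table_ready = 1;
}

/* Hot path: only reaches the cold function on the very first call. */
static unsigned char example_lookup(unsigned char idx) {
    if (!example_table_ready)
        example_init_table();
    return example_table[idx];
}
```

Calling `example_lookup(0)` triggers the one-time initialization; every later call skips it.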

2019-05-09	Add fps counter and --realtime, --frametimes and --realtimecache options	Pablo Stebler
	Fixes #262.
2019-05-09	build: Use -mcmodel=small on 64-bit Windows	Henrik Gramner
	GCC (MinGW) uses -mcmodel=medium by default which results in somewhat inefficient code, and there's no benefit for us in using that.
2019-05-09	Increase thread stack size	Henrik Gramner
	Stack usage can increase significantly when running under certain sanitizers which may exceed the previously used value.
2019-05-09	arm: Simplify includes for util.S	Martin Storsjö
	Other source files use this form for this include.
2019-05-09	fuzzer: fix oss-fuzz undefined behavior sanitizer build	Janne Grunau

2019-05-09	Add a DAV1D_ERR define to negate errno values when needed	James Almer
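A minimal sketch of how such a define could look, assuming the goal is to always hand callers a negative error code even though errno constants are positive on most platforms. `EXAMPLE_ERR` and `example_parse` are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical macro. errno constants are positive on most systems, so
 * they are negated there; if they are already negative they pass through. */
#if EPERM > 0
#define EXAMPLE_ERR(e) (-(e))
#else
#define EXAMPLE_ERR(e) (e)
#endif

/* Hypothetical API function: returns 0 on success, a negative error otherwise. */
static int example_parse(const unsigned char *data, size_t size) {
    if (data == NULL || size == 0)
        return EXAMPLE_ERR(EINVAL);
    return 0;
}
```

A caller can then compare a result against `EXAMPLE_ERR(EINVAL)` without caring about the sign convention of the platform's errno values.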

2019-05-09	Fix buffer overflow in 64x16 ssse3 idct	Henrik Gramner
	With frame threading enabled the code could previously clobber the coefficients of the next block. Update the checkasm test to check for this.
2019-05-08	arm64: remove invalid macro argument delimiter	Janne Grunau

2019-05-08	Add SSSE3 implementation for ipred_filter	Liwei Wang
	Cycle times:
	intra_pred_filter_w4_8bpc_c: 457.3
	intra_pred_filter_w4_8bpc_ssse3: 66.7
	intra_pred_filter_w8_8bpc_c: 1401.6
	intra_pred_filter_w8_8bpc_ssse3: 208.6
	intra_pred_filter_w16_8bpc_c: 2719.5
	intra_pred_filter_w16_8bpc_ssse3: 431.3
	intra_pred_filter_w32_8bpc_c: 6666.4
	intra_pred_filter_w32_8bpc_ssse3: 936.7
2019-05-07	ci: Check for unprefixed global symbols	Henrik Gramner

2019-05-07	Fix all remaining symbols without a dav1d prefix	Henrik Gramner

2019-05-06	ci: Ignore binary files in style check	Henrik Gramner

2019-05-06	Add missing dav1d prefixes to picture allocation functions	Henrik Gramner

2019-05-04	Control the stack size of spawned threads	Henrik Gramner
	On some systems (e.g. Google Fuchsia) the default stack size of new threads is insufficient, resulting in crashes. On other systems the default stack size is unnecessarily large, which can waste a lot of virtual memory. By setting it to a sufficiently large fixed value we can ensure that we don't run out of stack space while keeping down memory usage.
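The idea described above can be sketched with POSIX threads: set an explicit stack size on the thread attributes before creating the thread. This is an assumption-laden illustration, not dav1d's code; the 1 MiB value and all `example_` names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical fixed size for illustration; not the value used by dav1d. */
#define EXAMPLE_STACK_SIZE ((size_t)1024 * 1024)

/* Trivial worker that records that it ran. */
static void *example_worker(void *arg) {
    int *flag = arg;
    *flag = 1;
    return NULL;
}

/* Creates a thread with an explicit stack size, waits for it to finish,
 * and returns 0 on success or a pthread error code on failure. */
static int example_run_with_stack(size_t stack_size, int *flag) {
    pthread_attr_t attr;
    pthread_t thread;

    int err = pthread_attr_init(&attr);
    if (err)
        return err;

    err = pthread_attr_setstacksize(&attr, stack_size);
    if (!err)
        err = pthread_create(&thread, &attr, example_worker, flag);

    /* The attribute object is no longer needed once the thread exists. */
    pthread_attr_destroy(&attr);
    if (err)
        return err;

    return pthread_join(thread, NULL);
}
```

`pthread_attr_setstacksize` fails with `EINVAL` if the requested size is below the system minimum, so the chosen constant has to be large enough for the target platform.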
2019-05-04	arm64: msac: Implement NEON msac_decode_symbol_adapt	Martin Storsjö
	                                 Cortex A53     A72     A73
	msac_decode_symbol_adapt4_c:          107.6    57.1    67.8
	msac_decode_symbol_adapt4_neon:        70.4    56.4    55.1
	msac_decode_symbol_adapt8_c:          157.1    74.5    90.3
	msac_decode_symbol_adapt8_neon:        75.6    57.2    56.9
	msac_decode_symbol_adapt16_c:         257.4   106.6   135.9
	msac_decode_symbol_adapt16_neon:      101.8    62.0    65.2
2019-05-04	itx_tmpl: Fix the assert in inv_txfm_add_c	Martin Storsjö
	The previous form of the assert was automatically true for any value of w and h.
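The commit message does not quote the expression, so the following only illustrates one common way a range check becomes vacuous: joining the two bounds with `||` instead of `&&`. The bounds 4 and 64 and the function names are assumptions made for the example.

```c
/* Vacuous check: every integer is either >= 4 or <= 64 (or both),
 * so this returns 1 for any input and an assert on it can never fire. */
static int example_size_ok_vacuous(int n) {
    return n >= 4 || n <= 64;
}

/* Intended check: both bounds must hold at the same time. */
static int example_size_ok(int n) {
    return n >= 4 && n <= 64;
}
```

With an input such as 1000, the first function still reports success while the second correctly rejects it.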
2019-04-29	Release 0.3.0 (tag: 0.3.0)	Jean-Baptiste Kempf

2019-04-24	ci: Add a test for x86-64 with 16-byte stack alignment	Henrik Gramner

2019-04-24	Update NEWS for 0.3.0 - Sailfish	Jean-Baptiste Kempf

2019-04-22	Fix crash in SSSE3 inverse transform	Henrik Gramner
	The 32x32 identity_identity transform would corrupt the stack, including the return address, when compiling with a 16-byte stack alignment on non-Windows systems.
2019-04-19	Update NEWS for 0.2.2 (tag: 0.2.2)	Jean-Baptiste Kempf

2019-04-18	Add SSSE3 implementation for the {16, 32, 64}x64 and 64x{16, 32} blocks in itx	Liwei Wang
	Cycle times:
	inv_txfm_add_16x64_dct_dct_0_8bpc_c: 3973.5
	inv_txfm_add_16x64_dct_dct_0_8bpc_ssse3: 185.7
	inv_txfm_add_16x64_dct_dct_1_8bpc_c: 37869.1
	inv_txfm_add_16x64_dct_dct_1_8bpc_ssse3: 2103.1
	inv_txfm_add_16x64_dct_dct_2_8bpc_c: 37822.9
	inv_txfm_add_16x64_dct_dct_2_8bpc_ssse3: 2099.1
	inv_txfm_add_16x64_dct_dct_3_8bpc_c: 37871.7
	inv_txfm_add_16x64_dct_dct_3_8bpc_ssse3: 2663.5
	inv_txfm_add_16x64_dct_dct_4_8bpc_c: 38002.9
	inv_txfm_add_16x64_dct_dct_4_8bpc_ssse3: 2589.7
	inv_txfm_add_32x64_dct_dct_0_8bpc_c: 8319.2
	inv_txfm_add_32x64_dct_dct_0_8bpc_ssse3: 376.9
	inv_txfm_add_32x64_dct_dct_1_8bpc_c: 85956.8
	inv_txfm_add_32x64_dct_dct_1_8bpc_ssse3: 4298.1
	inv_txfm_add_32x64_dct_dct_2_8bpc_c: 89906.2
	inv_txfm_add_32x64_dct_dct_2_8bpc_ssse3: 4291.3
	inv_txfm_add_32x64_dct_dct_3_8bpc_c: 83710.9
	inv_txfm_add_32x64_dct_dct_3_8bpc_ssse3: 5589.5
	inv_txfm_add_32x64_dct_dct_4_8bpc_c: 87733.5
	inv_txfm_add_32x64_dct_dct_4_8bpc_ssse3: 5658.4
	inv_txfm_add_64x16_dct_dct_0_8bpc_c: 3895.9
	inv_txfm_add_64x16_dct_dct_0_8bpc_ssse3: 179.5
	inv_txfm_add_64x16_dct_dct_1_8bpc_c: 51375.2
	inv_txfm_add_64x16_dct_dct_1_8bpc_ssse3: 3859.2
	inv_txfm_add_64x16_dct_dct_2_8bpc_c: 52562.9
	inv_txfm_add_64x16_dct_dct_2_8bpc_ssse3: 4044.1
	inv_txfm_add_64x16_dct_dct_3_8bpc_c: 51347.0
	inv_txfm_add_64x16_dct_dct_3_8bpc_ssse3: 5259.5
	inv_txfm_add_64x16_dct_dct_4_8bpc_c: 49642.2
	inv_txfm_add_64x16_dct_dct_4_8bpc_ssse3: 4008.4
	inv_txfm_add_64x32_dct_dct_0_8bpc_c: 7196.4
	inv_txfm_add_64x32_dct_dct_0_8bpc_ssse3: 355.8
	inv_txfm_add_64x32_dct_dct_1_8bpc_c: 106588.4
	inv_txfm_add_64x32_dct_dct_1_8bpc_ssse3: 4965.3
	inv_txfm_add_64x32_dct_dct_2_8bpc_c: 106230.7
	inv_txfm_add_64x32_dct_dct_2_8bpc_ssse3: 4772.0
	inv_txfm_add_64x32_dct_dct_3_8bpc_c: 107427.0
	inv_txfm_add_64x32_dct_dct_3_8bpc_ssse3: 7146.9
	inv_txfm_add_64x32_dct_dct_4_8bpc_c: 111785.7
	inv_txfm_add_64x32_dct_dct_4_8bpc_ssse3: 7156.2
	inv_txfm_add_64x64_dct_dct_0_8bpc_c: 14512.4
	inv_txfm_add_64x64_dct_dct_0_8bpc_ssse3: 674.2
	inv_txfm_add_64x64_dct_dct_1_8bpc_c: 173246.3
	inv_txfm_add_64x64_dct_dct_1_8bpc_ssse3: 8790.8
	inv_txfm_add_64x64_dct_dct_2_8bpc_c: 174264.6
	inv_txfm_add_64x64_dct_dct_2_8bpc_ssse3: 8767.6
	inv_txfm_add_64x64_dct_dct_3_8bpc_c: 170047.3
	inv_txfm_add_64x64_dct_dct_3_8bpc_ssse3: 10784.9
	inv_txfm_add_64x64_dct_dct_4_8bpc_c: 170182.2
	inv_txfm_add_64x64_dct_dct_4_8bpc_ssse3: 10795.6
2019-04-17	Over-allocate level array by 3-bytes	Ronald S. Bultje
	This is a workaround so that the AVX2 implementation of deblock can index the levels array starting from the level type, which causes it to over-read by up to 3 bytes. This is intended to fix #269.
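The workaround can be pictured as follows: if a 4-byte load may start at any element of a byte array, the last element needs 3 bytes of slack behind it. This is a simplified sketch under that assumption, not the actual AVX2 access pattern; all names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Slack so a 4-byte load starting at the last element stays in bounds. */
#define EXAMPLE_PAD 3

/* Allocates n usable bytes plus zeroed padding; caller frees the result. */
static uint8_t *example_alloc_levels(size_t n) {
    return calloc(n + EXAMPLE_PAD, 1);
}

/* Stand-in for a wide load: reads 4 bytes starting at p. */
static uint32_t example_load4(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}
```

With a buffer from `example_alloc_levels(8)`, `example_load4(buf + 7)` reads bytes 7 through 10, all of which lie inside the allocation.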
2019-04-16	arm64: loopfilter: Implement NEON loop filters	Martin Storsjö
	The exact relative speedup compared to C code is a bit vague and hard to measure, depending on exactly how many filtered blocks are skipped, as the NEON version always filters 16 pixels at a time, while the C code can skip processing individual 4 pixel blocks.
	Additionally, the checkasm benchmarking code runs the same function repeatedly on the same buffer, which can make the filter take different codepaths on each run, as the function updates the buffer which will be used as input for the next run.
	When the checkasm test data is tweaked to avoid skipped blocks, the relative speedup compared to C is between 2x and 5x, while it is around 1x to 4x with the current checkasm test as is.
	Benchmark numbers from a tweaked checkasm that avoids skipped blocks:
	                          Cortex A53      A72      A73
	lpf_h_sb_uv_w4_8bpc_c:        2954.7   1399.3   1655.3
	lpf_h_sb_uv_w4_8bpc_neon:      895.5    650.8    692.0
	lpf_h_sb_uv_w6_8bpc_c:        3879.2   1917.2   2257.7
	lpf_h_sb_uv_w6_8bpc_neon:     1125.6    759.5    838.4
	lpf_h_sb_y_w4_8bpc_c:         6711.0   3275.5   3913.7
	lpf_h_sb_y_w4_8bpc_neon:      1744.0   1342.1   1351.5
	lpf_h_sb_y_w8_8bpc_c:        10695.7   6155.8   6638.9
	lpf_h_sb_y_w8_8bpc_neon:      2146.5   1560.4   1609.1
	lpf_h_sb_y_w16_8bpc_c:       11355.8   6292.0   6995.9
	lpf_h_sb_y_w16_8bpc_neon:     2475.4   1949.6   1968.4
	lpf_v_sb_uv_w4_8bpc_c:        2639.7   1204.8   1425.9
	lpf_v_sb_uv_w4_8bpc_neon:      510.7    351.4    334.7
	lpf_v_sb_uv_w6_8bpc_c:        3468.3   1757.1   2021.5
	lpf_v_sb_uv_w6_8bpc_neon:      625.0    415.0    397.8
	lpf_v_sb_y_w4_8bpc_c:         5428.7   2731.7   3068.5
	lpf_v_sb_y_w4_8bpc_neon:      1172.6    792.1    768.0
	lpf_v_sb_y_w8_8bpc_c:         8946.1   4412.8   5121.0
	lpf_v_sb_y_w8_8bpc_neon:      1565.5   1063.6   1062.7
	lpf_v_sb_y_w16_8bpc_c:        8978.9   4411.7   5112.0
	lpf_v_sb_y_w16_8bpc_neon:     1775.0   1288.1   1236.7
2019-04-16	arm64: looprestoration: Add a NEON implementation of SGR	Martin Storsjö
	Relative speedup vs (autovectorized) C code:
	                      Cortex A53    A72    A73
	selfguided_3x3_8bpc_neon:   2.91   2.12   2.68
	selfguided_5x5_8bpc_neon:   3.18   2.65   3.39
	selfguided_mix_8bpc_neon:   3.04   2.29   2.98
	The relative speedup vs non-vectorized C code is around 2.6-4.6x.
2019-04-16	msac: Add a cast to indicate intended narrowing from size_t to unsigned	Martin Storsjö
	This fixes this compiler warning with MSVC:
	../src/msac.c(148): warning C4267: '+=': conversion from 'size_t' to 'unsigned int', possible loss of data
2019-04-15	x86-64: Add msac_decode_symbol_adapt SSE2 asm	Henrik Gramner
	Also make various minor optimizations/style fixes to the MSAC C functions.
2019-04-10	Add SSSE3 implementation for ipred_paeth	Xuefeng Jiang
	intra_pred_paeth_w4_8bpc_c: 561.6
	intra_pred_paeth_w4_8bpc_ssse3: 49.2
	intra_pred_paeth_w8_8bpc_c: 1475.8
	intra_pred_paeth_w8_8bpc_ssse3: 103.0
	intra_pred_paeth_w16_8bpc_c: 4697.8
	intra_pred_paeth_w16_8bpc_ssse3: 279.0
	intra_pred_paeth_w32_8bpc_c: 13245.1
	intra_pred_paeth_w32_8bpc_ssse3: 614.7
	intra_pred_paeth_w64_8bpc_c: 32638.9
	intra_pred_paeth_w64_8bpc_ssse3: 1477.6
2019-04-08	arm: Add a _neon suffix to all internal functions	Martin Storsjö
	This eases disambiguating these functions when looking at perf profiles.
2019-04-08	arm: Fix typos in comments	Martin Storsjö
	The width register has been set to clz(w)-24, not the other way around. And the 32 bit prep function has got the h parameter in r4, not in r5.
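To make the `clz(w)-24` expression concrete: for a 32-bit value, a power-of-two width between 4 and 128 maps to a small integer, with 128 giving 0 and 4 giving 5. The sketch below uses a portable leading-zero count; the function names are hypothetical, and treating the result as a table index is an assumption about how the assembly uses it.

```c
#include <stdint.h>

/* Portable count-leading-zeros for a 32-bit value. x must be nonzero. */
static int example_clz32(uint32_t x) {
    int n = 0;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Maps a power-of-two width to clz(w) - 24: 128 -> 0, 64 -> 1, ..., 4 -> 5. */
static int example_width_index(uint32_t w) {
    return example_clz32(w) - 24;
}
```

Since 128 is 2^7, it has 24 leading zeros in 32 bits, which is where the constant 24 comes from.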
2019-04-04	arm: Consistently use 8/24 columns indentation for assembly	Martin Storsjö
	For cases with indented, nested .if/.macro in asm.S, indent those by 4 chars. Some initial assembly files were indented to 4/16 columns, while all the actual implementation files, starting with src/arm/64/mc.S, have used 8/24 for indentation.
2019-04-04	Add SSSE3 implementation for ipred_cfl_ac_444	Xuefeng Jiang
	cfl_ac_444_w4_8bpc_c: 978.2
	cfl_ac_444_w4_8bpc_ssse3: 110.4
	cfl_ac_444_w8_8bpc_c: 2312.3
	cfl_ac_444_w8_8bpc_ssse3: 197.5
	cfl_ac_444_w16_8bpc_c: 4081.1
	cfl_ac_444_w16_8bpc_ssse3: 274.1
	cfl_ac_444_w32_8bpc_c: 9544.3
	cfl_ac_444_w32_8bpc_ssse3: 617.1
2019-03-28	CI: Check for newline at end of file	Henrik Gramner

2019-03-28	x86: cdef_dir: optimize best cost finding for SSE	Victorien Le Couviour--Tuffet
	Port of 65ee1233cf86f03e029d0520f7cc5a3e152d3bbd for AVX-2 from Kyle Siefring to SSE41, and optimize SSSE3.
	---------------------
	x86_64:
	------------------------------------------
	before: cdef_dir_8bpc_ssse3: 110.3
	after:  cdef_dir_8bpc_ssse3: 105.9
	new:    cdef_dir_8bpc_sse4: 96.4
	------------------------------------------
	---------------------
	x86_32:
	------------------------------------------
	before: cdef_dir_8bpc_ssse3: 120.6
	after:  cdef_dir_8bpc_ssse3: 110.7
	new:    cdef_dir_8bpc_sse4: 106.5
	------------------------------------------
2019-03-28	x86: cdef_filter: use 8-bit arithmetic for SSE	Victorien Le Couviour--Tuffet
	Port of c204da0ff33a0d563d6c632b42799e4fbc48f402 for AVX-2 from Kyle Siefring.
	---------------------
	x86_64:
	------------------------------------------
	before: cdef_filter_4x4_8bpc_ssse3: 141.7
	after:  cdef_filter_4x4_8bpc_ssse3: 131.6
	before: cdef_filter_4x4_8bpc_sse4: 128.3
	after:  cdef_filter_4x4_8bpc_sse4: 119.0
	------------------------------------------
	before: cdef_filter_4x8_8bpc_ssse3: 253.4
	after:  cdef_filter_4x8_8bpc_ssse3: 236.1
	before: cdef_filter_4x8_8bpc_sse4: 228.5
	after:  cdef_filter_4x8_8bpc_sse4: 213.2
	------------------------------------------
	before: cdef_filter_8x8_8bpc_ssse3: 429.6
	after:  cdef_filter_8x8_8bpc_ssse3: 386.9
	before: cdef_filter_8x8_8bpc_sse4: 379.9
	after:  cdef_filter_8x8_8bpc_sse4: 335.9
	------------------------------------------
	---------------------
	x86_32:
	------------------------------------------
	before: cdef_filter_4x4_8bpc_ssse3: 184.3
	after:  cdef_filter_4x4_8bpc_ssse3: 163.3
	before: cdef_filter_4x4_8bpc_sse4: 168.9
	after:  cdef_filter_4x4_8bpc_sse4: 146.1
	------------------------------------------
	before: cdef_filter_4x8_8bpc_ssse3: 335.3
	after:  cdef_filter_4x8_8bpc_ssse3: 280.7
	before: cdef_filter_4x8_8bpc_sse4: 305.1
	after:  cdef_filter_4x8_8bpc_sse4: 257.9
	------------------------------------------
	before: cdef_filter_8x8_8bpc_ssse3: 579.1
	after:  cdef_filter_8x8_8bpc_ssse3: 500.5
	before: cdef_filter_8x8_8bpc_sse4: 517.0
	after:  cdef_filter_8x8_8bpc_sse4: 455.8
	------------------------------------------
2019-03-28	x86: cdef_filter: use a better constant for SSE4	Victorien Le Couviour--Tuffet
	Port of dc2ae517648accc0fe4ac0737f9ee850accda278 for AVX-2 from Kyle Siefring.
	---------------------
	x86_64:
	------------------------------------------
	cdef_filter_4x4_8bpc_ssse3: 141.7
	cdef_filter_4x4_8bpc_sse4: 128.3
	------------------------------------------
	cdef_filter_4x8_8bpc_ssse3: 253.4
	cdef_filter_4x8_8bpc_sse4: 228.5
	------------------------------------------
	cdef_filter_8x8_8bpc_ssse3: 429.6
	cdef_filter_8x8_8bpc_sse4: 379.9
	------------------------------------------
	---------------------
	x86_32:
	------------------------------------------
	cdef_filter_4x4_8bpc_ssse3: 184.3
	cdef_filter_4x4_8bpc_sse4: 168.9
	------------------------------------------
	cdef_filter_4x8_8bpc_ssse3: 335.3
	cdef_filter_4x8_8bpc_sse4: 305.1
	------------------------------------------
	cdef_filter_8x8_8bpc_ssse3: 579.1
	cdef_filter_8x8_8bpc_sse4: 517.0
	------------------------------------------
2019-03-28	x86: cdef_filter: fix macro case (lower to upper)	Victorien Le Couviour--Tuffet

2019-03-27	Add SSSE3 implementation for the 16x32,32x16 and 32x32 blocks in itx	Liwei Wang
	Cycle times:
	inv_txfm_add_16x32_dct_dct_0_8bpc_c: 2464.6
	inv_txfm_add_16x32_dct_dct_0_8bpc_ssse3: 121.6
	inv_txfm_add_16x32_dct_dct_1_8bpc_c: 24751.6
	inv_txfm_add_16x32_dct_dct_1_8bpc_ssse3: 1101.9
	inv_txfm_add_16x32_dct_dct_2_8bpc_c: 24377.0
	inv_txfm_add_16x32_dct_dct_2_8bpc_ssse3: 1117.2
	inv_txfm_add_16x32_dct_dct_3_8bpc_c: 24155.6
	inv_txfm_add_16x32_dct_dct_3_8bpc_ssse3: 2349.3
	inv_txfm_add_16x32_dct_dct_4_8bpc_c: 24175.6
	inv_txfm_add_16x32_dct_dct_4_8bpc_ssse3: 1642.0
	inv_txfm_add_16x32_identity_identity_0_8bpc_c: 10304.7
	inv_txfm_add_16x32_identity_identity_0_8bpc_ssse3: 137.7
	inv_txfm_add_16x32_identity_identity_1_8bpc_c: 10341.6
	inv_txfm_add_16x32_identity_identity_1_8bpc_ssse3: 137.9
	inv_txfm_add_16x32_identity_identity_2_8bpc_c: 10299.9
	inv_txfm_add_16x32_identity_identity_2_8bpc_ssse3: 253.9
	inv_txfm_add_16x32_identity_identity_3_8bpc_c: 10331.4
	inv_txfm_add_16x32_identity_identity_3_8bpc_ssse3: 369.7
	inv_txfm_add_16x32_identity_identity_4_8bpc_c: 10360.4
	inv_txfm_add_16x32_identity_identity_4_8bpc_ssse3: 484.0
	inv_txfm_add_32x16_dct_dct_0_8bpc_c: 2288.4
	inv_txfm_add_32x16_dct_dct_0_8bpc_ssse3: 142.3
	inv_txfm_add_32x16_dct_dct_1_8bpc_c: 23819.9
	inv_txfm_add_32x16_dct_dct_1_8bpc_ssse3: 1740.1
	inv_txfm_add_32x16_dct_dct_2_8bpc_c: 23755.8
	inv_txfm_add_32x16_dct_dct_2_8bpc_ssse3: 1641.4
	inv_txfm_add_32x16_dct_dct_3_8bpc_c: 23839.9
	inv_txfm_add_32x16_dct_dct_3_8bpc_ssse3: 1559.0
	inv_txfm_add_32x16_dct_dct_4_8bpc_c: 23757.7
	inv_txfm_add_32x16_dct_dct_4_8bpc_ssse3: 1579.0
	inv_txfm_add_32x16_identity_identity_0_8bpc_c: 10381.7
	inv_txfm_add_32x16_identity_identity_0_8bpc_ssse3: 126.3
	inv_txfm_add_32x16_identity_identity_1_8bpc_c: 10402.5
	inv_txfm_add_32x16_identity_identity_1_8bpc_ssse3: 126.5
	inv_txfm_add_32x16_identity_identity_2_8bpc_c: 10429.2
	inv_txfm_add_32x16_identity_identity_2_8bpc_ssse3: 244.9
	inv_txfm_add_32x16_identity_identity_3_8bpc_c: 10382.0
	inv_txfm_add_32x16_identity_identity_3_8bpc_ssse3: 491.0
	inv_txfm_add_32x16_identity_identity_4_8bpc_c: 10381.0
	inv_txfm_add_32x16_identity_identity_4_8bpc_ssse3: 468.0
	inv_txfm_add_32x32_dct_dct_0_8bpc_c: 4168.2
	inv_txfm_add_32x32_dct_dct_0_8bpc_ssse3: 204.0
	inv_txfm_add_32x32_dct_dct_1_8bpc_c: 46306.2
	inv_txfm_add_32x32_dct_dct_1_8bpc_ssse3: 2216.0
	inv_txfm_add_32x32_dct_dct_2_8bpc_c: 46300.2
	inv_txfm_add_32x32_dct_dct_2_8bpc_ssse3: 2194.2
	inv_txfm_add_32x32_dct_dct_3_8bpc_c: 46350.1
	inv_txfm_add_32x32_dct_dct_3_8bpc_ssse3: 3484.4
	inv_txfm_add_32x32_dct_dct_4_8bpc_c: 46318.1
	inv_txfm_add_32x32_dct_dct_4_8bpc_ssse3: 3440.9
	inv_txfm_add_32x32_identity_identity_0_8bpc_c: 14663.1
	inv_txfm_add_32x32_identity_identity_0_8bpc_ssse3: 179.0
	inv_txfm_add_32x32_identity_identity_1_8bpc_c: 14737.0
	inv_txfm_add_32x32_identity_identity_1_8bpc_ssse3: 179.2
	inv_txfm_add_32x32_identity_identity_2_8bpc_c: 14640.4
	inv_txfm_add_32x32_identity_identity_2_8bpc_ssse3: 179.1
	inv_txfm_add_32x32_identity_identity_3_8bpc_c: 14638.5
	inv_txfm_add_32x32_identity_identity_3_8bpc_ssse3: 663.8
	inv_txfm_add_32x32_identity_identity_4_8bpc_c: 14635.6
	inv_txfm_add_32x32_identity_identity_4_8bpc_ssse3: 663.9
2019-03-26	build: Split x86 asm files per bitdepth	Henrik Gramner

2019-03-24	Only define DAV1D_API to dllexport when building dav1d itself	Martin Storsjö
	As meson still doesn't allow specifying different cflags between static and dynamic libraries, this still includes the dllexport in the static library when built with default_library=both, but it at least is avoided in static-only builds, and avoids defining these symbols as dllexport in the callers' translation units.
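A hedged sketch of the pattern described above: the export attribute is applied only when a build-time define says the library itself is being compiled, so callers that include the header never see `dllexport`. `EXAMPLE_API` and `EXAMPLE_BUILDING_DLL` are hypothetical names, not the ones dav1d uses.

```c
/* EXAMPLE_BUILDING_DLL is assumed to be passed by the build system
 * (for example via -DEXAMPLE_BUILDING_DLL) only for the library's own sources. */
#ifndef EXAMPLE_API
#  if defined(_WIN32) && defined(EXAMPLE_BUILDING_DLL)
#    define EXAMPLE_API __declspec(dllexport)
#  else
#    define EXAMPLE_API
#  endif
#endif

/* Public entry point; exported from the DLL only when the library is built. */
EXAMPLE_API int example_version(void) {
    return 1;
}
```

In a consumer's translation unit the macro expands to nothing, so the declaration is an ordinary external function.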
2019-03-24	Simplify C for inverse transforms	Henrik Gramner
	The second shift is constant.
2019-03-20	x86: Add minor CDEF AVX2 optimizations	Henrik Gramner

2019-03-19	Add SSSE3 implementation for the 8x32 and 32x8 blocks in itx	Liwei Wang
	Cycle times:
	inv_txfm_add_8x32_dct_dct_0_8bpc_c: 1164.7
	inv_txfm_add_8x32_dct_dct_0_8bpc_ssse3: 79.5
	inv_txfm_add_8x32_dct_dct_1_8bpc_c: 11291.6
	inv_txfm_add_8x32_dct_dct_1_8bpc_ssse3: 508.5
	inv_txfm_add_8x32_dct_dct_2_8bpc_c: 10720.4
	inv_txfm_add_8x32_dct_dct_2_8bpc_ssse3: 507.9
	inv_txfm_add_8x32_dct_dct_3_8bpc_c: 12351.5
	inv_txfm_add_8x32_dct_dct_3_8bpc_ssse3: 687.2
	inv_txfm_add_8x32_dct_dct_4_8bpc_c: 10402.3
	inv_txfm_add_8x32_dct_dct_4_8bpc_ssse3: 687.9
	inv_txfm_add_8x32_identity_identity_0_8bpc_c: 3485.0
	inv_txfm_add_8x32_identity_identity_0_8bpc_ssse3: 97.7
	inv_txfm_add_8x32_identity_identity_1_8bpc_c: 3495.7
	inv_txfm_add_8x32_identity_identity_1_8bpc_ssse3: 97.7
	inv_txfm_add_8x32_identity_identity_2_8bpc_c: 3503.7
	inv_txfm_add_8x32_identity_identity_2_8bpc_ssse3: 97.8
	inv_txfm_add_8x32_identity_identity_3_8bpc_c: 3489.5
	inv_txfm_add_8x32_identity_identity_3_8bpc_ssse3: 184.4
	inv_txfm_add_8x32_identity_identity_4_8bpc_c: 3498.1
	inv_txfm_add_8x32_identity_identity_4_8bpc_ssse3: 182.8
	inv_txfm_add_32x8_dct_dct_0_8bpc_c: 1220.4
	inv_txfm_add_32x8_dct_dct_0_8bpc_ssse3: 65.6
	inv_txfm_add_32x8_dct_dct_1_8bpc_c: 11120.7
	inv_txfm_add_32x8_dct_dct_1_8bpc_ssse3: 623.8
	inv_txfm_add_32x8_dct_dct_2_8bpc_c: 12236.3
	inv_txfm_add_32x8_dct_dct_2_8bpc_ssse3: 624.7
	inv_txfm_add_32x8_dct_dct_3_8bpc_c: 10866.3
	inv_txfm_add_32x8_dct_dct_3_8bpc_ssse3: 694.1
	inv_txfm_add_32x8_dct_dct_4_8bpc_c: 10322.8
	inv_txfm_add_32x8_dct_dct_4_8bpc_ssse3: 692.5
	inv_txfm_add_32x8_identity_identity_0_8bpc_c: 3368.1
	inv_txfm_add_32x8_identity_identity_0_8bpc_ssse3: 98.6
	inv_txfm_add_32x8_identity_identity_1_8bpc_c: 3381.1
	inv_txfm_add_32x8_identity_identity_1_8bpc_ssse3: 98.3
	inv_txfm_add_32x8_identity_identity_2_8bpc_c: 3376.6
	inv_txfm_add_32x8_identity_identity_2_8bpc_ssse3: 98.3
	inv_txfm_add_32x8_identity_identity_3_8bpc_c: 3364.3
	inv_txfm_add_32x8_identity_identity_3_8bpc_ssse3: 182.2
	inv_txfm_add_32x8_identity_identity_4_8bpc_c: 3390.0
	inv_txfm_add_32x8_identity_identity_4_8bpc_ssse3: 182.2
2019-03-18	Add SSSE3 implementation for ipred_cfl_ac_420 and ipred_cfl_ac_422	Xuefeng Jiang
	cfl_ac_420_w4_8bpc_c: 1621.0
	cfl_ac_420_w4_8bpc_ssse3: 92.5
	cfl_ac_420_w8_8bpc_c: 3344.1
	cfl_ac_420_w8_8bpc_ssse3: 115.4
	cfl_ac_420_w16_8bpc_c: 6024.9
	cfl_ac_420_w16_8bpc_ssse3: 187.8
	cfl_ac_422_w4_8bpc_c: 1762.5
	cfl_ac_422_w4_8bpc_ssse3: 81.4
	cfl_ac_422_w8_8bpc_c: 4941.2
	cfl_ac_422_w8_8bpc_ssse3: 166.5
	cfl_ac_422_w16_8bpc_c: 8261.8
	cfl_ac_422_w16_8bpc_ssse3: 272.3
2019-03-17	decode: add a frame tile data buffer size check	James Almer
	This check was already done in dav1d_parse_obus(), so it's added as an assert here for extra precaution.
2019-03-17	decode: don't realloc the tile data buffer when it needs to be enlarged	James Almer
	Its previous contents don't need to be preserved.
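When the old contents are not needed, growing a buffer with `free` followed by `malloc` avoids the copy that `realloc` may perform. The helper below is a minimal sketch of that idea with hypothetical names; it is not the code from this commit.

```c
#include <stdint.h>
#include <stdlib.h>

/* Ensures *buf can hold at least `needed` bytes. Existing contents are
 * discarded when the buffer has to grow. Returns 0 on success, or -1 on
 * allocation failure (in which case *buf is NULL and *cap is 0). */
static int example_grow_discard(uint8_t **buf, size_t *cap, size_t needed) {
    if (needed <= *cap)
        return 0;            /* already large enough, keep as is */

    free(*buf);              /* free(NULL) is a no-op */
    *buf = malloc(needed);
    if (!*buf) {
        *cap = 0;
        return -1;
    }
    *cap = needed;
    return 0;
}
```

A caller starts with a `NULL` pointer and a capacity of 0, calls the helper before each write, and frees the buffer once at the end.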
2019-03-14	tools/dav1d/md5: bswap big endian high bit depth pixel data	Janne Grunau
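For a checksum over 16-bit pixel data to match across architectures, the bytes fed to the hash must be in one fixed order. The commit does this by byte-swapping on big endian hosts; the sketch below instead uses shift-based serialization, a different technique that yields the same little-endian byte stream on any host. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes n 16-bit pixels to `out` (2 * n bytes) in little-endian order,
 * independent of the byte order of the machine running the code. */
static void example_pixels_to_le(const uint16_t *px, size_t n, uint8_t *out) {
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = (uint8_t)(px[i] & 0xff);  /* low byte first */
        out[2 * i + 1] = (uint8_t)(px[i] >> 8);    /* then high byte */
    }
}
```

The resulting byte buffer can be passed to the hash function unchanged on both little and big endian systems.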

2019-03-14	tools/dav1d: make the md5 muxer endian-aware	Janne Grunau
	Fixes tests on big endian architectures.
2019-03-14	On the road to 0.2.2	Jean-Baptiste Kempf