
github.com/videolan/dav1d.git
Age | Commit message | Author
2019-05-10 | Add __attribute__((cold)) to rarely used functions | Henrik Gramner
2019-05-09 | Add fps counter and --realtime, --frametimes and --realtimecache options | Pablo Stebler
Fixes #262.
2019-05-09 | build: Use -mcmodel=small on 64-bit Windows | Henrik Gramner
GCC (MinGW) uses -mcmodel=medium by default which results in somewhat inefficient code, and there's no benefit for us in using that.
2019-05-09 | Increase thread stack size | Henrik Gramner
Stack usage can increase significantly when running under certain sanitizers which may exceed the previously used value.
2019-05-09 | arm: Simplify includes for util.S | Martin Storsjö
Other source files use this form for this include.
2019-05-09 | fuzzer: fix oss-fuzz undefined behavior sanitizer build | Janne Grunau
2019-05-09 | Add a DAV1D_ERR define to negate errno values when needed | James Almer
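The idea of negating errno values only where needed can be illustrated with a minimal sketch. The macro name SKETCH_ERR and the helper check_size() are hypothetical stand-ins, and the EPERM test is an assumption about how one might detect platforms whose errno constants are already negative; the commit subject does not show the actual definition.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the define named in the commit subject.
 * Assumption: if errno constants are positive on this platform, negate them
 * so that error returns are always negative; otherwise pass them through. */
#if EPERM > 0
#define SKETCH_ERR(e) (-(e))
#else
#define SKETCH_ERR(e) (e)
#endif

/* Hypothetical helper: returns 0 on success or a negative error code. */
static int check_size(int size) {
    if (size <= 0)
        return SKETCH_ERR(EINVAL);
    return 0;
}
```

A caller can then test `ret < 0` uniformly, regardless of the sign convention the platform uses for errno constants.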
2019-05-09 | Fix buffer overflow in 64x16 ssse3 idct | Henrik Gramner
With frame threading enabled the code could previously clobber the coefficients of the next block. Update the checkasm test to check for this.
2019-05-08 | arm64: remove invalid macro argument delimiter | Janne Grunau
2019-05-08 | Add SSSE3 implementation for ipred_filter | Liwei Wang
Cycle times:
intra_pred_filter_w4_8bpc_c: 457.3
intra_pred_filter_w4_8bpc_ssse3: 66.7
intra_pred_filter_w8_8bpc_c: 1401.6
intra_pred_filter_w8_8bpc_ssse3: 208.6
intra_pred_filter_w16_8bpc_c: 2719.5
intra_pred_filter_w16_8bpc_ssse3: 431.3
intra_pred_filter_w32_8bpc_c: 6666.4
intra_pred_filter_w32_8bpc_ssse3: 936.7
2019-05-07 | ci: Check for unprefixed global symbols | Henrik Gramner
2019-05-07 | Fix all remaining symbols without a dav1d prefix | Henrik Gramner
2019-05-06 | ci: Ignore binary files in style check | Henrik Gramner
2019-05-06 | Add missing dav1d prefixes to picture allocation functions | Henrik Gramner
2019-05-04 | Control the stack size of spawned threads | Henrik Gramner
On some systems (e.g. Google Fuchsia) the default stack size of new threads is insufficient, resulting in crashes. On other systems the default stack size is unnecessarily large, which can waste a lot of virtual memory. By setting it to a sufficiently large fixed value we can ensure that we don't run out of stack space while keeping down memory usage.
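Setting a fixed stack size for new threads can be sketched with POSIX thread attributes. This is an illustration only: SKETCH_STACK_SIZE is an assumed value (the commit message does not state the size), and spawn_with_fixed_stack() and worker() are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Assumed value for illustration; the commit message does not give the size. */
#define SKETCH_STACK_SIZE (1024 * 1024)

/* Trivial thread body: sets the int that arg points to. */
static void *worker(void *arg) {
    int *flag = arg;
    *flag = 1;
    return NULL;
}

/* Spawns a thread with a fixed stack size instead of the platform default.
 * Returns 0 on success, -1 on failure. */
static int spawn_with_fixed_stack(pthread_t *thread,
                                  void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    if (pthread_attr_init(&attr))
        return -1;
    int ret = pthread_attr_setstacksize(&attr, SKETCH_STACK_SIZE);
    if (!ret)
        ret = pthread_create(thread, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return ret ? -1 : 0;
}
```

The attribute object is destroyed whether or not thread creation succeeds, so the helper does not leak it on the error path.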
2019-05-04 | arm64: msac: Implement NEON msac_decode_symbol_adapt | Martin Storsjö
Cortex                               A53    A72    A73
msac_decode_symbol_adapt4_c:       107.6   57.1   67.8
msac_decode_symbol_adapt4_neon:     70.4   56.4   55.1
msac_decode_symbol_adapt8_c:       157.1   74.5   90.3
msac_decode_symbol_adapt8_neon:     75.6   57.2   56.9
msac_decode_symbol_adapt16_c:      257.4  106.6  135.9
msac_decode_symbol_adapt16_neon:   101.8   62.0   65.2
2019-05-04 | itx_tmpl: Fix the assert in inv_txfm_add_c | Martin Storsjö
The previous form of the assert was automatically true for any value of w and h.
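How an assert can end up true for every input is easy to show with a hypothetical range check. The expressions below are assumptions chosen for illustration, not the code from itx_tmpl; the bounds 4 and 64 are likewise assumed.

```c
#include <assert.h>

/* Always 1: every int is either >= 4 or <= 64, so joining the two bounds
 * with || checks nothing. (Hypothetical expression for illustration.) */
static int tautological_check(int w, int h) {
    return (w >= 4 || w <= 64) && (h >= 4 || h <= 64);
}

/* Joining the bounds with && actually rejects sizes outside [4, 64]. */
static int range_check(int w, int h) {
    return w >= 4 && w <= 64 && h >= 4 && h <= 64;
}
```

Passing an obviously invalid size such as w = 1000 to both functions shows the difference: the first accepts it, the second does not.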
2019-04-29 | Release 0.3.0 [0.3.0] | Jean-Baptiste Kempf
2019-04-24 | ci: Add a test for x86-64 with 16-byte stack alignment | Henrik Gramner
2019-04-24 | Update NEWS for 0.3.0 - Sailfish | Jean-Baptiste Kempf
2019-04-22 | Fix crash in SSSE3 inverse transform | Henrik Gramner
The 32x32 identity_identity transform would corrupt the stack, including the return address, when compiling with a 16-byte stack alignment on non-Windows systems.
2019-04-19 | Update NEWS for 0.2.2 [0.2.2] | Jean-Baptiste Kempf
2019-04-18 | Add SSSE3 implementation for the {16, 32, 64}x64 and 64x{16, 32} blocks in itx | Liwei Wang
Cycle times:
inv_txfm_add_16x64_dct_dct_0_8bpc_c: 3973.5
inv_txfm_add_16x64_dct_dct_0_8bpc_ssse3: 185.7
inv_txfm_add_16x64_dct_dct_1_8bpc_c: 37869.1
inv_txfm_add_16x64_dct_dct_1_8bpc_ssse3: 2103.1
inv_txfm_add_16x64_dct_dct_2_8bpc_c: 37822.9
inv_txfm_add_16x64_dct_dct_2_8bpc_ssse3: 2099.1
inv_txfm_add_16x64_dct_dct_3_8bpc_c: 37871.7
inv_txfm_add_16x64_dct_dct_3_8bpc_ssse3: 2663.5
inv_txfm_add_16x64_dct_dct_4_8bpc_c: 38002.9
inv_txfm_add_16x64_dct_dct_4_8bpc_ssse3: 2589.7
inv_txfm_add_32x64_dct_dct_0_8bpc_c: 8319.2
inv_txfm_add_32x64_dct_dct_0_8bpc_ssse3: 376.9
inv_txfm_add_32x64_dct_dct_1_8bpc_c: 85956.8
inv_txfm_add_32x64_dct_dct_1_8bpc_ssse3: 4298.1
inv_txfm_add_32x64_dct_dct_2_8bpc_c: 89906.2
inv_txfm_add_32x64_dct_dct_2_8bpc_ssse3: 4291.3
inv_txfm_add_32x64_dct_dct_3_8bpc_c: 83710.9
inv_txfm_add_32x64_dct_dct_3_8bpc_ssse3: 5589.5
inv_txfm_add_32x64_dct_dct_4_8bpc_c: 87733.5
inv_txfm_add_32x64_dct_dct_4_8bpc_ssse3: 5658.4
inv_txfm_add_64x16_dct_dct_0_8bpc_c: 3895.9
inv_txfm_add_64x16_dct_dct_0_8bpc_ssse3: 179.5
inv_txfm_add_64x16_dct_dct_1_8bpc_c: 51375.2
inv_txfm_add_64x16_dct_dct_1_8bpc_ssse3: 3859.2
inv_txfm_add_64x16_dct_dct_2_8bpc_c: 52562.9
inv_txfm_add_64x16_dct_dct_2_8bpc_ssse3: 4044.1
inv_txfm_add_64x16_dct_dct_3_8bpc_c: 51347.0
inv_txfm_add_64x16_dct_dct_3_8bpc_ssse3: 5259.5
inv_txfm_add_64x16_dct_dct_4_8bpc_c: 49642.2
inv_txfm_add_64x16_dct_dct_4_8bpc_ssse3: 4008.4
inv_txfm_add_64x32_dct_dct_0_8bpc_c: 7196.4
inv_txfm_add_64x32_dct_dct_0_8bpc_ssse3: 355.8
inv_txfm_add_64x32_dct_dct_1_8bpc_c: 106588.4
inv_txfm_add_64x32_dct_dct_1_8bpc_ssse3: 4965.3
inv_txfm_add_64x32_dct_dct_2_8bpc_c: 106230.7
inv_txfm_add_64x32_dct_dct_2_8bpc_ssse3: 4772.0
inv_txfm_add_64x32_dct_dct_3_8bpc_c: 107427.0
inv_txfm_add_64x32_dct_dct_3_8bpc_ssse3: 7146.9
inv_txfm_add_64x32_dct_dct_4_8bpc_c: 111785.7
inv_txfm_add_64x32_dct_dct_4_8bpc_ssse3: 7156.2
inv_txfm_add_64x64_dct_dct_0_8bpc_c: 14512.4
inv_txfm_add_64x64_dct_dct_0_8bpc_ssse3: 674.2
inv_txfm_add_64x64_dct_dct_1_8bpc_c: 173246.3
inv_txfm_add_64x64_dct_dct_1_8bpc_ssse3: 8790.8
inv_txfm_add_64x64_dct_dct_2_8bpc_c: 174264.6
inv_txfm_add_64x64_dct_dct_2_8bpc_ssse3: 8767.6
inv_txfm_add_64x64_dct_dct_3_8bpc_c: 170047.3
inv_txfm_add_64x64_dct_dct_3_8bpc_ssse3: 10784.9
inv_txfm_add_64x64_dct_dct_4_8bpc_c: 170182.2
inv_txfm_add_64x64_dct_dct_4_8bpc_ssse3: 10795.6
2019-04-17 | Over-allocate level array by 3-bytes | Ronald S. Bultje
This is a workaround so that the AVX2 implementation of deblock can index the levels array starting from the level type, which causes it to over-read by up to 3 bytes. This is intended to fix #269.
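The padding idea can be sketched in plain C. The layout here is an assumption for illustration: four level bytes per entry, with a 4-byte load that starts at the byte selected by the level type. The names alloc_levels() and load4_from_type() are hypothetical, and the real over-read happens in AVX2 assembly rather than in a memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ENTRY_SIZE   4 /* assumed: four level bytes per entry */
#define OVERREAD_PAD 3 /* worst case: a 4-byte load starting at byte 3 */

/* Allocates a zeroed level array with 3 spare bytes at the end. */
static uint8_t *alloc_levels(size_t n_entries) {
    return calloc(n_entries * ENTRY_SIZE + OVERREAD_PAD, 1);
}

/* Loads 4 bytes starting at byte `type` (0..3) of entry `idx`, the way a
 * wide load indexed from the level type would. For the last entry this
 * reads up to 3 bytes past the nominal end of the array. */
static uint32_t load4_from_type(const uint8_t *levels, size_t idx, int type) {
    uint32_t v;
    memcpy(&v, levels + idx * ENTRY_SIZE + type, sizeof(v));
    return v;
}
```

With the padding in place, the load for the last entry and type 3 stays inside the allocation; without it, the same load would touch memory the allocator never handed out.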
2019-04-16 | arm64: loopfilter: Implement NEON loop filters | Martin Storsjö
The exact relative speedup compared to C code is a bit vague and hard to measure, depending on exactly how many filtered blocks are skipped, as the NEON version always filters 16 pixels at a time, while the C code can skip processing individual 4 pixel blocks.
Additionally, the checkasm benchmarking code runs the same function repeatedly on the same buffer, which can make the filter take different codepaths on each run, as the function updates the buffer which will be used as input for the next run.
If tweaking the checkasm test data to try to avoid skipped blocks, the relative speedups compared to C are between 2x and 5x, while they are around 1x to 4x with the current checkasm test as such.
Benchmark numbers from a tweaked checkasm that avoids skipped blocks:
Cortex                          A53      A72      A73
lpf_h_sb_uv_w4_8bpc_c:       2954.7   1399.3   1655.3
lpf_h_sb_uv_w4_8bpc_neon:     895.5    650.8    692.0
lpf_h_sb_uv_w6_8bpc_c:       3879.2   1917.2   2257.7
lpf_h_sb_uv_w6_8bpc_neon:    1125.6    759.5    838.4
lpf_h_sb_y_w4_8bpc_c:        6711.0   3275.5   3913.7
lpf_h_sb_y_w4_8bpc_neon:     1744.0   1342.1   1351.5
lpf_h_sb_y_w8_8bpc_c:       10695.7   6155.8   6638.9
lpf_h_sb_y_w8_8bpc_neon:     2146.5   1560.4   1609.1
lpf_h_sb_y_w16_8bpc_c:      11355.8   6292.0   6995.9
lpf_h_sb_y_w16_8bpc_neon:    2475.4   1949.6   1968.4
lpf_v_sb_uv_w4_8bpc_c:       2639.7   1204.8   1425.9
lpf_v_sb_uv_w4_8bpc_neon:     510.7    351.4    334.7
lpf_v_sb_uv_w6_8bpc_c:       3468.3   1757.1   2021.5
lpf_v_sb_uv_w6_8bpc_neon:     625.0    415.0    397.8
lpf_v_sb_y_w4_8bpc_c:        5428.7   2731.7   3068.5
lpf_v_sb_y_w4_8bpc_neon:     1172.6    792.1    768.0
lpf_v_sb_y_w8_8bpc_c:        8946.1   4412.8   5121.0
lpf_v_sb_y_w8_8bpc_neon:     1565.5   1063.6   1062.7
lpf_v_sb_y_w16_8bpc_c:       8978.9   4411.7   5112.0
lpf_v_sb_y_w16_8bpc_neon:    1775.0   1288.1   1236.7
2019-04-16 | arm64: looprestoration: Add a NEON implementation of SGR | Martin Storsjö
Relative speedup vs (autovectorized) C code:
Cortex                       A53   A72   A73
selfguided_3x3_8bpc_neon:   2.91  2.12  2.68
selfguided_5x5_8bpc_neon:   3.18  2.65  3.39
selfguided_mix_8bpc_neon:   3.04  2.29  2.98
The relative speedup vs non-vectorized C code is around 2.6-4.6x.
2019-04-16 | msac: Add a cast to indicate intended narrowing from size_t to unsigned | Martin Storsjö
This fixes this compiler warning with MSVC:
../src/msac.c(148): warning C4267: '+=': conversion from 'size_t' to 'unsigned int', possible loss of data
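The pattern behind this kind of warning fix can be sketched as follows. The function advance() and its parameters are hypothetical; the actual statement at msac.c line 148 is not shown in the log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: add a small byte count held in a size_t to an
 * unsigned counter. Without the cast, compilers that warn about implicit
 * size_t -> unsigned narrowing flag the += on 64-bit targets. */
static unsigned advance(unsigned count, size_t n_bytes) {
    /* n_bytes is assumed to be small here; the explicit cast documents
     * that the narrowing is intended. */
    count += (unsigned) n_bytes;
    return count;
}
```

The cast does not change the generated arithmetic; it only makes the narrowing visible to the reader and the compiler.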
2019-04-15 | x86-64: Add msac_decode_symbol_adapt SSE2 asm | Henrik Gramner
Also make various minor optimizations/style fixes to the MSAC C functions.
2019-04-10 | Add SSSE3 implementation for ipred_paeth | Xuefeng Jiang
intra_pred_paeth_w4_8bpc_c: 561.6
intra_pred_paeth_w4_8bpc_ssse3: 49.2
intra_pred_paeth_w8_8bpc_c: 1475.8
intra_pred_paeth_w8_8bpc_ssse3: 103.0
intra_pred_paeth_w16_8bpc_c: 4697.8
intra_pred_paeth_w16_8bpc_ssse3: 279.0
intra_pred_paeth_w32_8bpc_c: 13245.1
intra_pred_paeth_w32_8bpc_ssse3: 614.7
intra_pred_paeth_w64_8bpc_c: 32638.9
intra_pred_paeth_w64_8bpc_ssse3: 1477.6
2019-04-08 | arm: Add a _neon suffix to all internal functions | Martin Storsjö
This eases disambiguating these functions when looking at perf profiles.
2019-04-08 | arm: Fix typos in comments | Martin Storsjö
The width register has been set to clz(w)-24, not the other way around. And the 32 bit prep function has got the h parameter in r4, not in r5.
2019-04-04 | arm: Consistently use 8/24 columns indentation for assembly | Martin Storsjö
For cases with indented, nested .if/.macro in asm.S, indent those by 4 chars. Some initial assembly files were indented to 4/16 columns, while all the actual implementation files, starting with src/arm/64/mc.S, have used 8/24 for indentation.
2019-04-04 | Add SSSE3 implementation for ipred_cfl_ac_444 | Xuefeng Jiang
cfl_ac_444_w4_8bpc_c: 978.2
cfl_ac_444_w4_8bpc_ssse3: 110.4
cfl_ac_444_w8_8bpc_c: 2312.3
cfl_ac_444_w8_8bpc_ssse3: 197.5
cfl_ac_444_w16_8bpc_c: 4081.1
cfl_ac_444_w16_8bpc_ssse3: 274.1
cfl_ac_444_w32_8bpc_c: 9544.3
cfl_ac_444_w32_8bpc_ssse3: 617.1
2019-03-28 | CI: Check for newline at end of file | Henrik Gramner
2019-03-28 | x86: cdef_dir: optimize best cost finding for SSE | Victorien Le Couviour--Tuffet
Port of 65ee1233cf86f03e029d0520f7cc5a3e152d3bbd for AVX-2 from Kyle Siefring to SSE41, and optimize SSSE3.
---------------------
x86_64:
------------------------------------------
before: cdef_dir_8bpc_ssse3: 110.3
after:  cdef_dir_8bpc_ssse3: 105.9
new:    cdef_dir_8bpc_sse4: 96.4
------------------------------------------
---------------------
x86_32:
------------------------------------------
before: cdef_dir_8bpc_ssse3: 120.6
after:  cdef_dir_8bpc_ssse3: 110.7
new:    cdef_dir_8bpc_sse4: 106.5
------------------------------------------
2019-03-28 | x86: cdef_filter: use 8-bit arithmetic for SSE | Victorien Le Couviour--Tuffet
Port of c204da0ff33a0d563d6c632b42799e4fbc48f402 for AVX-2 from Kyle Siefring.
---------------------
x86_64:
------------------------------------------
before: cdef_filter_4x4_8bpc_ssse3: 141.7
after:  cdef_filter_4x4_8bpc_ssse3: 131.6
before: cdef_filter_4x4_8bpc_sse4: 128.3
after:  cdef_filter_4x4_8bpc_sse4: 119.0
------------------------------------------
before: cdef_filter_4x8_8bpc_ssse3: 253.4
after:  cdef_filter_4x8_8bpc_ssse3: 236.1
before: cdef_filter_4x8_8bpc_sse4: 228.5
after:  cdef_filter_4x8_8bpc_sse4: 213.2
------------------------------------------
before: cdef_filter_8x8_8bpc_ssse3: 429.6
after:  cdef_filter_8x8_8bpc_ssse3: 386.9
before: cdef_filter_8x8_8bpc_sse4: 379.9
after:  cdef_filter_8x8_8bpc_sse4: 335.9
------------------------------------------
---------------------
x86_32:
------------------------------------------
before: cdef_filter_4x4_8bpc_ssse3: 184.3
after:  cdef_filter_4x4_8bpc_ssse3: 163.3
before: cdef_filter_4x4_8bpc_sse4: 168.9
after:  cdef_filter_4x4_8bpc_sse4: 146.1
------------------------------------------
before: cdef_filter_4x8_8bpc_ssse3: 335.3
after:  cdef_filter_4x8_8bpc_ssse3: 280.7
before: cdef_filter_4x8_8bpc_sse4: 305.1
after:  cdef_filter_4x8_8bpc_sse4: 257.9
------------------------------------------
before: cdef_filter_8x8_8bpc_ssse3: 579.1
after:  cdef_filter_8x8_8bpc_ssse3: 500.5
before: cdef_filter_8x8_8bpc_sse4: 517.0
after:  cdef_filter_8x8_8bpc_sse4: 455.8
------------------------------------------
2019-03-28 | x86: cdef_filter: use a better constant for SSE4 | Victorien Le Couviour--Tuffet
Port of dc2ae517648accc0fe4ac0737f9ee850accda278 for AVX-2 from Kyle Siefring.
---------------------
x86_64:
------------------------------------------
cdef_filter_4x4_8bpc_ssse3: 141.7
cdef_filter_4x4_8bpc_sse4: 128.3
------------------------------------------
cdef_filter_4x8_8bpc_ssse3: 253.4
cdef_filter_4x8_8bpc_sse4: 228.5
------------------------------------------
cdef_filter_8x8_8bpc_ssse3: 429.6
cdef_filter_8x8_8bpc_sse4: 379.9
------------------------------------------
---------------------
x86_32:
------------------------------------------
cdef_filter_4x4_8bpc_ssse3: 184.3
cdef_filter_4x4_8bpc_sse4: 168.9
------------------------------------------
cdef_filter_4x8_8bpc_ssse3: 335.3
cdef_filter_4x8_8bpc_sse4: 305.1
------------------------------------------
cdef_filter_8x8_8bpc_ssse3: 579.1
cdef_filter_8x8_8bpc_sse4: 517.0
------------------------------------------
2019-03-28 | x86: cdef_filter: fix macro case (lower to upper) | Victorien Le Couviour--Tuffet
2019-03-27 | Add SSSE3 implementation for the 16x32, 32x16 and 32x32 blocks in itx | Liwei Wang
Cycle times:
inv_txfm_add_16x32_dct_dct_0_8bpc_c: 2464.6
inv_txfm_add_16x32_dct_dct_0_8bpc_ssse3: 121.6
inv_txfm_add_16x32_dct_dct_1_8bpc_c: 24751.6
inv_txfm_add_16x32_dct_dct_1_8bpc_ssse3: 1101.9
inv_txfm_add_16x32_dct_dct_2_8bpc_c: 24377.0
inv_txfm_add_16x32_dct_dct_2_8bpc_ssse3: 1117.2
inv_txfm_add_16x32_dct_dct_3_8bpc_c: 24155.6
inv_txfm_add_16x32_dct_dct_3_8bpc_ssse3: 2349.3
inv_txfm_add_16x32_dct_dct_4_8bpc_c: 24175.6
inv_txfm_add_16x32_dct_dct_4_8bpc_ssse3: 1642.0
inv_txfm_add_16x32_identity_identity_0_8bpc_c: 10304.7
inv_txfm_add_16x32_identity_identity_0_8bpc_ssse3: 137.7
inv_txfm_add_16x32_identity_identity_1_8bpc_c: 10341.6
inv_txfm_add_16x32_identity_identity_1_8bpc_ssse3: 137.9
inv_txfm_add_16x32_identity_identity_2_8bpc_c: 10299.9
inv_txfm_add_16x32_identity_identity_2_8bpc_ssse3: 253.9
inv_txfm_add_16x32_identity_identity_3_8bpc_c: 10331.4
inv_txfm_add_16x32_identity_identity_3_8bpc_ssse3: 369.7
inv_txfm_add_16x32_identity_identity_4_8bpc_c: 10360.4
inv_txfm_add_16x32_identity_identity_4_8bpc_ssse3: 484.0
inv_txfm_add_32x16_dct_dct_0_8bpc_c: 2288.4
inv_txfm_add_32x16_dct_dct_0_8bpc_ssse3: 142.3
inv_txfm_add_32x16_dct_dct_1_8bpc_c: 23819.9
inv_txfm_add_32x16_dct_dct_1_8bpc_ssse3: 1740.1
inv_txfm_add_32x16_dct_dct_2_8bpc_c: 23755.8
inv_txfm_add_32x16_dct_dct_2_8bpc_ssse3: 1641.4
inv_txfm_add_32x16_dct_dct_3_8bpc_c: 23839.9
inv_txfm_add_32x16_dct_dct_3_8bpc_ssse3: 1559.0
inv_txfm_add_32x16_dct_dct_4_8bpc_c: 23757.7
inv_txfm_add_32x16_dct_dct_4_8bpc_ssse3: 1579.0
inv_txfm_add_32x16_identity_identity_0_8bpc_c: 10381.7
inv_txfm_add_32x16_identity_identity_0_8bpc_ssse3: 126.3
inv_txfm_add_32x16_identity_identity_1_8bpc_c: 10402.5
inv_txfm_add_32x16_identity_identity_1_8bpc_ssse3: 126.5
inv_txfm_add_32x16_identity_identity_2_8bpc_c: 10429.2
inv_txfm_add_32x16_identity_identity_2_8bpc_ssse3: 244.9
inv_txfm_add_32x16_identity_identity_3_8bpc_c: 10382.0
inv_txfm_add_32x16_identity_identity_3_8bpc_ssse3: 491.0
inv_txfm_add_32x16_identity_identity_4_8bpc_c: 10381.0
inv_txfm_add_32x16_identity_identity_4_8bpc_ssse3: 468.0
inv_txfm_add_32x32_dct_dct_0_8bpc_c: 4168.2
inv_txfm_add_32x32_dct_dct_0_8bpc_ssse3: 204.0
inv_txfm_add_32x32_dct_dct_1_8bpc_c: 46306.2
inv_txfm_add_32x32_dct_dct_1_8bpc_ssse3: 2216.0
inv_txfm_add_32x32_dct_dct_2_8bpc_c: 46300.2
inv_txfm_add_32x32_dct_dct_2_8bpc_ssse3: 2194.2
inv_txfm_add_32x32_dct_dct_3_8bpc_c: 46350.1
inv_txfm_add_32x32_dct_dct_3_8bpc_ssse3: 3484.4
inv_txfm_add_32x32_dct_dct_4_8bpc_c: 46318.1
inv_txfm_add_32x32_dct_dct_4_8bpc_ssse3: 3440.9
inv_txfm_add_32x32_identity_identity_0_8bpc_c: 14663.1
inv_txfm_add_32x32_identity_identity_0_8bpc_ssse3: 179.0
inv_txfm_add_32x32_identity_identity_1_8bpc_c: 14737.0
inv_txfm_add_32x32_identity_identity_1_8bpc_ssse3: 179.2
inv_txfm_add_32x32_identity_identity_2_8bpc_c: 14640.4
inv_txfm_add_32x32_identity_identity_2_8bpc_ssse3: 179.1
inv_txfm_add_32x32_identity_identity_3_8bpc_c: 14638.5
inv_txfm_add_32x32_identity_identity_3_8bpc_ssse3: 663.8
inv_txfm_add_32x32_identity_identity_4_8bpc_c: 14635.6
inv_txfm_add_32x32_identity_identity_4_8bpc_ssse3: 663.9
2019-03-26 | build: Split x86 asm files per bitdepth | Henrik Gramner
2019-03-24 | Only define DAV1D_API to dllexport when building dav1d itself | Martin Storsjö
As meson still doesn't allow specifying different cflags between static and dynamic libraries, this still includes the dllexport in the static library when built with default_library=both, but it at least is avoided in static-only builds, and avoids defining these symbols as dllexport in the callers' translation units.
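A common way to restrict dllexport to the library's own build is a macro that the build system defines only when compiling the library. The sketch below uses hypothetical names (SKETCH_API, SKETCH_BUILDING_LIB, sketch_version) and is an assumption about the general pattern, not a copy of the dav1d headers.

```c
#include <assert.h>

/* SKETCH_BUILDING_LIB is a hypothetical macro that only the library's own
 * build would define (e.g. via its compiler flags). Callers including this
 * header never define it, so they never see dllexport. */
#if defined(_WIN32)
#  if defined(SKETCH_BUILDING_LIB)
#    define SKETCH_API __declspec(dllexport)
#  else
#    define SKETCH_API
#  endif
#elif defined(__GNUC__) && __GNUC__ >= 4
#  define SKETCH_API __attribute__((visibility("default")))
#else
#  define SKETCH_API
#endif

/* Public declaration as it would appear in an installed header. */
SKETCH_API int sketch_version(void);

/* Definition inside the library. */
int sketch_version(void) { return 3; }
```

Because the export decision hangs on a build-time define rather than on the platform alone, a translation unit in a calling program sees a plain declaration.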
2019-03-24 | Simplify C for inverse transforms | Henrik Gramner
The second shift is constant.
2019-03-20 | x86: Add minor CDEF AVX2 optimizations | Henrik Gramner
2019-03-19 | Add SSSE3 implementation for the 8x32 and 32x8 blocks in itx | Liwei Wang
Cycle times:
inv_txfm_add_8x32_dct_dct_0_8bpc_c: 1164.7
inv_txfm_add_8x32_dct_dct_0_8bpc_ssse3: 79.5
inv_txfm_add_8x32_dct_dct_1_8bpc_c: 11291.6
inv_txfm_add_8x32_dct_dct_1_8bpc_ssse3: 508.5
inv_txfm_add_8x32_dct_dct_2_8bpc_c: 10720.4
inv_txfm_add_8x32_dct_dct_2_8bpc_ssse3: 507.9
inv_txfm_add_8x32_dct_dct_3_8bpc_c: 12351.5
inv_txfm_add_8x32_dct_dct_3_8bpc_ssse3: 687.2
inv_txfm_add_8x32_dct_dct_4_8bpc_c: 10402.3
inv_txfm_add_8x32_dct_dct_4_8bpc_ssse3: 687.9
inv_txfm_add_8x32_identity_identity_0_8bpc_c: 3485.0
inv_txfm_add_8x32_identity_identity_0_8bpc_ssse3: 97.7
inv_txfm_add_8x32_identity_identity_1_8bpc_c: 3495.7
inv_txfm_add_8x32_identity_identity_1_8bpc_ssse3: 97.7
inv_txfm_add_8x32_identity_identity_2_8bpc_c: 3503.7
inv_txfm_add_8x32_identity_identity_2_8bpc_ssse3: 97.8
inv_txfm_add_8x32_identity_identity_3_8bpc_c: 3489.5
inv_txfm_add_8x32_identity_identity_3_8bpc_ssse3: 184.4
inv_txfm_add_8x32_identity_identity_4_8bpc_c: 3498.1
inv_txfm_add_8x32_identity_identity_4_8bpc_ssse3: 182.8
inv_txfm_add_32x8_dct_dct_0_8bpc_c: 1220.4
inv_txfm_add_32x8_dct_dct_0_8bpc_ssse3: 65.6
inv_txfm_add_32x8_dct_dct_1_8bpc_c: 11120.7
inv_txfm_add_32x8_dct_dct_1_8bpc_ssse3: 623.8
inv_txfm_add_32x8_dct_dct_2_8bpc_c: 12236.3
inv_txfm_add_32x8_dct_dct_2_8bpc_ssse3: 624.7
inv_txfm_add_32x8_dct_dct_3_8bpc_c: 10866.3
inv_txfm_add_32x8_dct_dct_3_8bpc_ssse3: 694.1
inv_txfm_add_32x8_dct_dct_4_8bpc_c: 10322.8
inv_txfm_add_32x8_dct_dct_4_8bpc_ssse3: 692.5
inv_txfm_add_32x8_identity_identity_0_8bpc_c: 3368.1
inv_txfm_add_32x8_identity_identity_0_8bpc_ssse3: 98.6
inv_txfm_add_32x8_identity_identity_1_8bpc_c: 3381.1
inv_txfm_add_32x8_identity_identity_1_8bpc_ssse3: 98.3
inv_txfm_add_32x8_identity_identity_2_8bpc_c: 3376.6
inv_txfm_add_32x8_identity_identity_2_8bpc_ssse3: 98.3
inv_txfm_add_32x8_identity_identity_3_8bpc_c: 3364.3
inv_txfm_add_32x8_identity_identity_3_8bpc_ssse3: 182.2
inv_txfm_add_32x8_identity_identity_4_8bpc_c: 3390.0
inv_txfm_add_32x8_identity_identity_4_8bpc_ssse3: 182.2
2019-03-18 | Add SSSE3 implementation for ipred_cfl_ac_420 and ipred_cfl_ac_422 | Xuefeng Jiang
cfl_ac_420_w4_8bpc_c: 1621.0
cfl_ac_420_w4_8bpc_ssse3: 92.5
cfl_ac_420_w8_8bpc_c: 3344.1
cfl_ac_420_w8_8bpc_ssse3: 115.4
cfl_ac_420_w16_8bpc_c: 6024.9
cfl_ac_420_w16_8bpc_ssse3: 187.8
cfl_ac_422_w4_8bpc_c: 1762.5
cfl_ac_422_w4_8bpc_ssse3: 81.4
cfl_ac_422_w8_8bpc_c: 4941.2
cfl_ac_422_w8_8bpc_ssse3: 166.5
cfl_ac_422_w16_8bpc_c: 8261.8
cfl_ac_422_w16_8bpc_ssse3: 272.3
2019-03-17 | decode: add a frame tile data buffer size check | James Almer
This check was already done in dav1d_parse_obus(), so it's added as an assert here for extra precaution.
2019-03-17 | decode: don't realloc the tile data buffer when it needs to be enlarged | James Almer
Its previous contents don't need to be preserved.
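When the old contents are not needed, growing a buffer with free() followed by malloc() avoids the copy that realloc() may perform. A minimal sketch, with a hypothetical Buf type and buf_reserve_discard() helper that are not taken from the decoder:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable buffer. */
typedef struct {
    unsigned char *data;
    size_t size;
} Buf;

/* Ensures the buffer holds at least `needed` bytes. Existing contents are
 * NOT preserved when the buffer grows. Returns 0 on success, -1 on failure. */
static int buf_reserve_discard(Buf *b, size_t needed) {
    if (needed <= b->size)
        return 0;            /* already large enough, keep the allocation */
    free(b->data);           /* old contents are not needed: skip the copy */
    b->data = malloc(needed);
    if (!b->data) {
        b->size = 0;
        return -1;
    }
    b->size = needed;
    return 0;
}
```

On allocation failure the size is reset to 0 so the struct never describes memory that has already been freed.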
2019-03-14 | tools/dav1d/md5: bswap big endian high bit depth pixel data | Janne Grunau
2019-03-14 | tools/dav1d: make the md5 muxer endian-aware | Janne Grunau
Fixes tests on big endian architectures.
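A hash over raw 16-bit pixel memory depends on host byte order, so the bytes need a fixed order before hashing. The sketch below serializes pixels byte by byte instead of byte-swapping in place, which is a different technique from the bswap the commit subject mentions; choosing little-endian as the canonical order is an assumption, and pixels_to_le_bytes() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes n 16-bit pixels to `out` as little-endian byte pairs, independent
 * of the host's byte order. `out` must hold at least 2 * n bytes. */
static void pixels_to_le_bytes(const uint16_t *px, size_t n, uint8_t *out) {
    for (size_t i = 0; i < n; i++) {
        out[2 * i + 0] = (uint8_t) (px[i] & 0xff); /* low byte first */
        out[2 * i + 1] = (uint8_t) (px[i] >> 8);   /* then high byte */
    }
}
```

Feeding the resulting byte stream to the hash gives the same digest on little-endian and big-endian hosts, because the shifts operate on values rather than on memory layout.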
2019-03-14 | On the road to 0.2.2 | Jean-Baptiste Kempf