github.com/FFmpeg/FFmpeg.git
Age | Commit message | Author
2016-10-22 | doc: fix spelling errors | Andreas Cadhalpun
Thanks to Mathieu Malaterre <malat@debian.org> for reporting the Que/Queue typo. (https://bugs.debian.org/839542) Reviewed-by: Lou Logan <lou@lrcd.com> Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2016-10-18 | aacenc: add SIMD optimizations for abs_pow34 and quantization | Rostislav Pehlivanov
Performance improvements:
quant_bands:
  with: 681 decicycles in quant_bands, 8388453 runs, 155 skips
  without: 1190 decicycles in quant_bands, 8388386 runs, 222 skips
  Around 42% for the function
Twoloop coder:
  abs_pow34: with/without: 7.82s/8.17s
    Around 4% for the entire encoder
  Both: with/without: 7.15s/8.17s
    Around 12% for the entire encoder
Fast coder:
  abs_pow34: with/without: 3.40s/3.77s
    Around 10% for the entire encoder
  Both: with/without: 3.02s/3.77s
    Around 20% faster for the entire encoder
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: James Almer <jamrial@gmail.com>
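The percentages quoted in this commit message can be reproduced from the raw timings. The following is a minimal sketch, not part of the commit; the helper name `percent_saved` and the dictionary labels are hypothetical, and it assumes each percentage is the share of the unoptimized cost that the optimization removes.

```python
def percent_saved(with_opt: float, without_opt: float) -> float:
    """Share of the baseline cost removed by the optimization, in percent."""
    return (without_opt - with_opt) / without_opt * 100.0


# (with, without) pairs copied from the commit message.
measurements = {
    "quant_bands (decicycles)": (681, 1190),
    "twoloop, abs_pow34 only (s)": (7.82, 8.17),
    "twoloop, both (s)": (7.15, 8.17),
    "fast, abs_pow34 only (s)": (3.40, 3.77),
    "fast, both (s)": (3.02, 3.77),
}

for label, (with_opt, without_opt) in measurements.items():
    print(f"{label}: {percent_saved(with_opt, without_opt):.1f}% less")
```

Under that assumption the results land close to the rounded figures in the message (roughly 42%, 4%, 12%, 10% and 20%).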
2016-10-02 | avcodec: fix arguments on xmm/neon clobber test wrappers | James Almer
Signed-off-by: James Almer <jamrial@gmail.com>
2016-10-01 | avcodec: add missing xmm/neon clobber test wrappers for the new encode API | James Almer
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
2016-09-23 | x86/h264_weight: use appropriate register size for weight parameters | Hendrik Leppkes
Fixes trac 5579 Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Acked-by: Michael Niedermayer <michael@niedermayer.cc>
2016-09-23 | avcodec/h264: Use ptrdiff_t for (bi)weight functions | Michael Niedermayer
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-08-07 | avcodec/ttadsp: cosmetics | James Almer
Clean some header includes and use the same naming scheme as in ttaencdsp Signed-off-by: James Almer <jamrial@gmail.com>
2016-08-02 | x86/ttaenc: add ff_ttaenc_filter_process_{ssse3,sse4} | James Almer
Signed-off-by: James Almer <jamrial@gmail.com>
2016-07-29 | Merge commit '9df889a5f116c1ee78c2f239e0ba599c492431aa' | Clément Bœsch
* commit '9df889a5f116c1ee78c2f239e0ba599c492431aa': h264: rename h264.[ch] to h264dec.[ch] Merged-by: Clément Bœsch <u@pkh.me>
2016-07-26 | vp9: add mmxext versions of the single-block (w=8,npx=8) h/v loopfilters. | Ronald S. Bultje
Each takes about 0.1% of runtime in my profiles, and they didn't have any SIMD so far (we only had SIMD for the npx=16 double-block versions).
2016-07-26 | vp9: add mmxext versions of the single-block (w=4,npx=8) h/v loopfilters. | Ronald S. Bultje
Each takes about 0.5% of runtime in my profiles, and they didn't have any SIMD so far (we only had SIMD for the npx=16 double-block versions).
2016-07-26 | vp9: add 32x32 idct AVX2 implementation. | Ronald S. Bultje
About 1.8x speedup compared to AVX version for full IDCT. Other sub-IDCT scenarios also see speedups. Full --bench output for idct_32x32_add_{bpp}_${subidct}_${opt} (50k cycles):
nop: 16.5
vp9_inv_dct_dct_32x32_add_8_1_c: 2284.4
vp9_inv_dct_dct_32x32_add_8_1_sse2: 145.0
vp9_inv_dct_dct_32x32_add_8_1_ssse3: 137.4
vp9_inv_dct_dct_32x32_add_8_1_avx: 137.1
vp9_inv_dct_dct_32x32_add_8_1_avx2: 73.2
vp9_inv_dct_dct_32x32_add_8_2_c: 14680.8
vp9_inv_dct_dct_32x32_add_8_2_sse2: 2617.2
vp9_inv_dct_dct_32x32_add_8_2_ssse3: 982.9
vp9_inv_dct_dct_32x32_add_8_2_avx: 958.5
vp9_inv_dct_dct_32x32_add_8_2_avx2: 704.2
vp9_inv_dct_dct_32x32_add_8_4_c: 14443.1
vp9_inv_dct_dct_32x32_add_8_4_sse2: 2717.1
vp9_inv_dct_dct_32x32_add_8_4_ssse3: 965.7
vp9_inv_dct_dct_32x32_add_8_4_avx: 1000.7
vp9_inv_dct_dct_32x32_add_8_4_avx2: 717.1
vp9_inv_dct_dct_32x32_add_8_8_c: 14436.4
vp9_inv_dct_dct_32x32_add_8_8_sse2: 2671.8
vp9_inv_dct_dct_32x32_add_8_8_ssse3: 1038.5
vp9_inv_dct_dct_32x32_add_8_8_avx: 983.0
vp9_inv_dct_dct_32x32_add_8_8_avx2: 729.4
vp9_inv_dct_dct_32x32_add_8_16_c: 14614.7
vp9_inv_dct_dct_32x32_add_8_16_sse2: 2701.7
vp9_inv_dct_dct_32x32_add_8_16_ssse3: 1334.4
vp9_inv_dct_dct_32x32_add_8_16_avx: 1276.7
vp9_inv_dct_dct_32x32_add_8_16_avx2: 719.5
vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6
vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6
vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9
vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6
vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0
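The "about 1.8x" figure can be checked against the benchmark numbers for the full 32x32 IDCT (the `_8_32_` rows). The sketch below is only an illustration: `parse_bench` is a hypothetical helper, and it assumes each benchmark entry has the form `name: cycles` with the optimization level as the last underscore-separated field of the name.

```python
import re

# Rows copied from the commit message (full IDCT, sub-IDCT size 32).
bench = """
vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6
vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6
vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9
vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6
vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0
"""


def parse_bench(text: str) -> dict:
    """Map the trailing opt suffix (c, sse2, ...) to its cycle count."""
    results = {}
    for name, cycles in re.findall(r"(\S+):\s*([\d.]+)", text):
        results[name.rsplit("_", 1)[1]] = float(cycles)
    return results


cycles = parse_bench(bench)
speedup_vs_avx = cycles["avx"] / cycles["avx2"]
speedup_vs_c = cycles["c"] / cycles["avx2"]
print(f"avx2 vs avx: {speedup_vs_avx:.2f}x, avx2 vs C: {speedup_vs_c:.1f}x")
```

The same ratio can be taken for any other sub-IDCT size by pasting in the matching rows.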
2016-07-20 | x86/diracdsp: make ff_put_signed_rect_clamped_10_sse4 work on x86_32 | James Almer
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
2016-07-12 | diracdsp_init: add missing ARCH_X86_64 check | Rostislav Pehlivanov
That SIMD is still x86_64 only for now. Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2016-07-12 | diracdsp: add SIMD for the 10 bit version of put_signed_rect_clamped | Rostislav Pehlivanov
Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>
2016-07-12 | diracdsp: add dequantization SIMD | Rostislav Pehlivanov
Currently unused, to be used in the following commits. Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>
2016-07-11 | vp9: add 16x16 idct avx2 (8-bit). | Ronald S. Bultje
checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows that it's about 1.65x as fast as the AVX version for the full IDCT, and similar speedups for the sub-IDCTs:
nop: 24.6
vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4
2016-07-09 | Merge commit 'f1a9eee41c4b5ea35db9ff0088ce4e6f1e187f2c' | Clément Bœsch
* commit 'f1a9eee41c4b5ea35db9ff0088ce4e6f1e187f2c': x86: Add missing movsxd for the int stride parameter Merged-by: Clément Bœsch <u@pkh.me>
2016-07-05 | x86/dcadsp: optimize lfe_fir0_float_fma3 on x86_32 | James Almer
About 10% faster. Signed-off-by: James Almer <jamrial@gmail.com>
2016-07-04 | avcodec: add missing xmm/neon clobber test wrappers for the new decode API | James Almer
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
2016-06-27 | asm: FF_-prefix internal macros used in inline assembly | Matthieu Bouron
See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.
2016-06-26 | Merge commit 'dc40a70c5755bccfb1a1349639943e1f408bea50' | Hendrik Leppkes
* commit 'dc40a70c5755bccfb1a1349639943e1f408bea50': Drop unnecessary libavutil/x86/asm.h #includes Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-06-22 | Merge commit 'a6a750c7ef240b72ce01e9653343a0ddf247d196' | Clément Bœsch
* commit 'a6a750c7ef240b72ce01e9653343a0ddf247d196': tests: Move all test programs to a subdirectory Merged-by: Clément Bœsch <clement@stupeflix.com>
2016-06-21 | Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb' | Clément Bœsch
* commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb': cosmetics: Fix spelling mistakes Merged-by: Clément Bœsch <u@pkh.me>
2016-06-21 | h264: rename h264.[ch] to h264dec.[ch] | Anton Khirnov
This is more consistent with the naming of other decoders.
2016-06-17 | x86: Add missing movsxd for the int stride parameter | Martin Storsjö
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-06-14 | x86/aacpsdsp: optimize add_squares loop | James Almer
Signed-off-by: James Almer <jamrial@gmail.com>
2016-06-08 | x86/aacdec: use HADDPS macro | James Almer
Signed-off-by: James Almer <jamrial@gmail.com>
2016-05-28 | Drop unnecessary libavutil/x86/asm.h #includes | Diego Biurrun
2016-05-28 | asm: FF_-prefix internal macros used in inline assembly | Diego Biurrun
These macros conflict with system macros on Solaris, producing truckloads of warnings about macro redefinition.
2016-05-13 | tests: Move all test programs to a subdirectory | Diego Biurrun
2016-05-08 | x86: lossless audio: SSE4 madd 32bits | Christophe Gisquet
The only user so far is wmalossless 24-bit. The few samples tested show an order of 8, so more unrolling or an AVX2 version does not make sense. Timings: 68 -> 49 cycles Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-05-04 | cosmetics: Fix spelling mistakes | Vittorio Giovara
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-04-12 | Merge commit '73ff983e8dd22ccee166403d0bbbc9c1cd543622' | Derek Buitenhuis
* commit '73ff983e8dd22ccee166403d0bbbc9c1cd543622': fft: x86: cosmetics: Drop silly comments, add comment, whitespace Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-04-07 | build: miscellaneous cosmetics | Diego Biurrun
Restore alphabetical order in lists, break overly long lines, do some prettyprinting, add some explanatory section comments, group parts together that belong together logically.
2016-03-04 | avcodec/fft: Add revtab32 for FFTs with more than 65536 samples | Michael Niedermayer
x86 optimizations are used only for the cases they support (<=65536 samples) Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
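The need for a wider table at larger sizes follows from the index range: a 16-bit entry can address at most 65536 positions, so a 2^17-point transform needs 32-bit entries. The sketch below illustrates that choice under loud assumptions: `bit_reverse_table` is a hypothetical helper, and a plain bit-reversal permutation stands in for the actual permutation FFmpeg stores in its tables.

```python
from array import array


def bit_reverse_table(nbits: int) -> array:
    """Bit-reversal permutation for a 2**nbits-point transform.

    Assumption: plain bit reversal is used here for illustration only;
    it is not claimed to match FFmpeg's real table contents.
    """
    # 'H' holds at least 16 bits, 'L' at least 32 bits.
    typecode = "H" if nbits <= 16 else "L"
    size = 1 << nbits
    table = array(typecode, [0] * size)
    for i in range(1, size):
        # Reuse the reversal of i >> 1, then place the low bit of i on top.
        table[i] = (table[i >> 1] >> 1) | ((i & 1) << (nbits - 1))
    return table


small = bit_reverse_table(16)  # largest index is 65535, fits in 16 bits
large = bit_reverse_table(17)  # largest index is 131071, needs the wider table
print(small.typecode, max(small), large.typecode, max(large))
```

Keeping the narrow table for sizes up to 65536 means code written against 16-bit entries, such as the x86 paths mentioned in the commit message, can stay unchanged for the sizes it supports.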
2016-03-04 | avcodec: Extend fft to size 2^17 | Michael Niedermayer
Asked-for-by: durandal_1707 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-03-01 | fft: Split MDCT bits off from FFT | Diego Biurrun
2016-02-29 | x86/vc1dsp: Split the file into MC and loopfilter | Timothy Gu
2016-02-26 | fft: x86: cosmetics: Drop silly comments, add comment, whitespace | Diego Biurrun
2016-02-24 | Merge commit '15a24614aef5836af3cd2c7cc3b2b737eee6bf3c' | Derek Buitenhuis
* commit '15a24614aef5836af3cd2c7cc3b2b737eee6bf3c': build: Add vc1dsp component for more fine-grained dependencies Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-02-23 | x86: hevc: Fix linking with both yasm and optimizations disabled | Diego Biurrun
Some optimized functions reference optimized symbols, so the functions must be explicitly disabled when those symbols are unavailable.
2016-02-23 | x86/dcadec: add ff_lfe_fir1_float_{sse3,avx} | James Almer
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-19 | build: Add vc1dsp component for more fine-grained dependencies | Diego Biurrun
2016-02-16 | Merge commit 'e280fe13291e9c712a5f4aa13b5263f3e8afed45' | Derek Buitenhuis
* commit 'e280fe13291e9c712a5f4aa13b5263f3e8afed45': v210: Use separate sample_factors Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-02-16 | Merge commit 'eafb05fcf37cd19a910ca3b17824384f9006bc0a' | Derek Buitenhuis
* commit 'eafb05fcf37cd19a910ca3b17824384f9006bc0a': v210: x86: Add the correct guards around the asm code Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2016-02-15 | x86: use the new helper macros where useful | James Almer
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-14 | x86/vc1dsp: Port vc1_*_hor_16b_shift2 to NASM format | Timothy Gu
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
2016-02-07 | huffyuvencdsp: Undefine "i" macro after each use | Timothy Gu
2016-02-06 | x86/dcadec: add ff_lfe_fir0_float_{sse,sse2,avx,fma3} | James Almer
Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>