github.com/FFmpeg/FFmpeg.git
Age | Commit message | Author
2017-08-18 | Add macros to x86util.asm. | Ivan Kalvachev
Improved version of VBROADCASTSS that works like the avx2 instruction.
Emulation of vpbroadcastd.
Horizontal sum HSUMPS that places the result in all elements.
Emulation of blendvps and pblendvb.
Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
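As a rough illustration of what the macros in this commit compute, here is a scalar Python model of the lane semantics. It is not the NASM code from the commit. The function names `broadcast_low`, `hsum_all` and `blendv` are hypothetical, registers are modelled as lists of four floats, and the summation order is an assumption: the real HSUMPS may add lanes in a different order, which can change floating-point rounding.

```python
import struct

def broadcast_low(v):
    """Model of a broadcast: copy lane 0 into every lane."""
    return [v[0]] * len(v)

def hsum_all(v):
    """Model of a horizontal sum whose result lands in all lanes."""
    total = sum(v)  # left-to-right order is an assumption of this sketch
    return [total] * len(v)

def sign_bit(x):
    """Top bit of the 32-bit float encoding of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 31

def blendv(dst, src, mask):
    """blendvps-style select: take src where the mask lane's sign bit is set."""
    return [s if sign_bit(m) else d for d, s, m in zip(dst, src, mask)]

print(broadcast_low([7.0, 1.0, 2.0, 3.0]))
print(hsum_all([1.0, 2.0, 3.0, 4.0]))
print(blendv([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [-1.0, 0.0, -0.0, 1.0]))
```

Only the sign bit of each mask lane matters, so `-0.0` selects the source lane while `0.0` does not.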
2017-06-19 | x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse,sse4} | James Almer
About 2x faster than the c version.
2017-03-22 | avutil/x86util: don't use movss in VBROADCASTSS macro when src and dst args are the same | James Almer
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-03-20 | Merge commit '07e1f99a1bb41d1a615676140eefc85cf69fa793' | Clément Bœsch
* commit '07e1f99a1bb41d1a615676140eefc85cf69fa793':
  x86util: Document SBUTTERFLY macro
Merged-by: Clément Bœsch <u@pkh.me>
2017-02-18 | avcodec/h264: sse2, avx h luma mbaff deblock/loop filter | James Darnley
x86-64 only
Yorkfield:
- sse2: ~2.17x (434 vs. 200 cycles)
Nehalem:
- sse2: ~2.94x (409 vs. 139 cycles)
Skylake:
- sse2: ~3.10x (370 vs. 119 cycles)
- avx: ~3.29x (370 vs. 112 cycles)
2017-02-18 | x86util: import MOVHL macro | James Darnley
Originally committed to x264 in 1637239a by Henrik Gramner who has agreed to re-license it as LGPL. Original commit message follows.
x86: Avoid some bypass delays and false dependencies
A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning between int and float domains, so try to avoid that if possible.
2017-02-18 | avcodec/x86: deduplicate PASS8ROWS macro | James Darnley
2016-09-19 | x86util: Document SBUTTERFLY macro | Alexandra Hájková
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-07-18 | x86util: Extend SPLATW for avx2 | James Almer
Integration to Libav by Josh de Kock <josh@itanimul.li>.
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
2016-07-11 | vp9: add 16x16 idct avx2 (8-bit). | Ronald S. Bultje
checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows that it's about 1.65x as fast as the AVX version for the full IDCT, and similar speedups for the sub-IDCTs:
nop: 24.6
vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4
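For readers who want to check the quoted ratio, the following sketch recomputes the AVX to AVX2 speedups from the cycle counts listed in this commit message. It assumes that `sub_idct` 16 is the full IDCT and that the raw counts are divided directly (whether the `nop` overhead was subtracted in the original figure is not stated). The dictionary and function names are illustrative only.

```python
# Cycle counts copied from the checkasm output in the commit message,
# keyed by the ${sub_idct} value in the function name.
avx = {1: 661.2, 2: 521.9, 4: 446.1, 8: 440.9, 16: 469.9}
avx2 = {1: 311.5, 2: 304.3, 4: 297.0, 8: 290.2, 16: 286.4}

def speedup(old_cycles, new_cycles):
    """How many times as fast the new version is, from raw cycle counts."""
    return old_cycles / new_cycles

speedups = {n: speedup(avx[n], avx2[n]) for n in avx}

for n, s in speedups.items():
    print(f"sub_idct={n:2d}: avx2 is {s:.2f}x as fast as avx")
```

Under these assumptions the sub_idct 16 ratio comes out close to the "about 1.65x" stated in the message.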
2016-06-09 | x86/showcqt: use three operand format for some instructions | James Almer
Fixes failures with yasm 1.1.0 and older
Signed-off-by: James Almer <jamrial@gmail.com>
2016-06-08 | avutil/x86util: move haddps sse emulation from showcqt | James Almer
Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-12 | x86: port PSIGNW to cpuflags | James Almer
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-08-03 | x86: move XOP emulation code back to x86inc | James Almer
Only two functions that use xop multiply-accumulate instructions where the first operand is the same as the fourth actually took advantage of the macros.
This further reduces differences with x264's x86inc.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-31 | x86/swr: add SSE2/AVX pack_8ch functions | James Almer
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-05 | v210enc: Add SIMD optimised 8-bit and 10-bit encoders | Kieran Kunhya
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2014-11-26 | v210enc: Add SIMD optimised 8-bit and 10-bit encoders | Kieran Kunhya
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-03 | x86/hevc_deblock: improve 8bit transpose store macros | James Almer
Up to four instructions less depending on function and instruction set.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-26 | x86/hevc_idct: replace old and unused idct functions | James Almer
Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).
Benchmarks on an Intel Core i5-4200U:

idct8x8_dc
         SSE2  MMXEXT  C
cycles   22    26      57

idct16x16_dc
         AVX2  SSE2  C
cycles   27    32    249

idct32x32_dc
         AVX2  SSE2  C
cycles   62    126   1375

Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-15 | x86util: add and use RSHIFT/LSHIFT macros | Christophe Gisquet
Those macros take a byte number as shift argument, as this argument differs between MMX and SSE2 instructions.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
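To illustrate why a byte count is a convenient common unit: the 64-bit MMX shift psrlq takes its count in bits, whereas the SSE2 whole-register shift psrldq takes it in bytes. The sketch below models how a macro could translate one byte count for either register size. That these are the instructions the macros select is an assumption, and `rshift_operand` and `lshift_operand` are hypothetical names, not the macros themselves.

```python
def rshift_operand(mmsize, nbytes):
    """Return (mnemonic, immediate) for a right shift by nbytes bytes.

    mmsize is the register width in bytes: 8 for MMX, 16 for SSE2.
    """
    if mmsize > 8:
        return ("psrldq", nbytes)   # SSE2: count is already in bytes
    return ("psrlq", 8 * nbytes)    # MMX: count must be converted to bits

def lshift_operand(mmsize, nbytes):
    """Same translation for a left shift."""
    if mmsize > 8:
        return ("pslldq", nbytes)
    return ("psllq", 8 * nbytes)

print(rshift_operand(8, 2))
print(rshift_operand(16, 2))
```

The caller writes the same byte count in both cases and the translation to the instruction's native unit stays in one place.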
2014-05-29 | x86: hpeldsp: better factorization | Christophe Gisquet
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-29 | x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1} | James Almer
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-17 | x86: move horizontal add macros to x86util | James Almer
Also port relevant AVX2/XOP optimizations from x264 with permission to relicense to LGPL from the corresponding authors
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-24 | x86: Move XOP emulation to x86util | James Almer
We need the emulation to support the cases where the first argument is the same as the fourth. To achieve this a fifth argument working as a temporary may be needed.
Emulation that doesn't obey the original instruction semantics can't be in x86inc.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
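The hazard described here can be sketched with a scalar model. It assumes an XOP multiply-accumulate of the form dst = a * b + c on 32-bit lanes (pmacsdd is one such instruction) and an emulation built from a multiply followed by an add. The function `emulate_macc` and the register names are hypothetical and only illustrate why a temporary is needed when the first and fourth arguments coincide.

```python
MASK32 = 0xFFFFFFFF

def emulate_macc(regs, dst, a, b, c, tmp=None):
    """Emulate dst = a * b + c per 32-bit lane with a multiply, then an add.

    regs maps register names to lists of lane values.
    """
    if dst != c:
        # The product can go straight into dst; c is still intact afterwards.
        regs[dst] = [(x * y) & MASK32 for x, y in zip(regs[a], regs[b])]
        regs[dst] = [(p + z) & MASK32 for p, z in zip(regs[dst], regs[c])]
    else:
        # Writing the product into dst would destroy the addend,
        # so the product has to live in a temporary register first.
        if tmp is None:
            raise ValueError("a temporary register is required when dst == c")
        regs[tmp] = [(x * y) & MASK32 for x, y in zip(regs[a], regs[b])]
        regs[dst] = [(p + z) & MASK32 for p, z in zip(regs[tmp], regs[dst])]

regs = {"m0": [1, 2, 3, 4], "m1": [5, 6, 7, 8], "m2": [2, 2, 2, 2], "m3": [0, 0, 0, 0]}
emulate_macc(regs, "m0", "m1", "m2", "m0", tmp="m3")
print(regs["m0"])
```

The extra argument changes the operand list compared with the native instruction, which matches the message's point that such an emulation no longer obeys the original instruction semantics.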
2013-10-14 | Merge commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497' | Michael Niedermayer
* commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497':
  x86inc: FMA3/4 Support
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 | Merge commit '206895708ea2b464755d340e44501daf9a07c310' | Michael Niedermayer
* commit '206895708ea2b464755d340e44501daf9a07c310':
  x86inc: Remove our FMA4 support
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 | x86inc: FMA3/4 Support | Jason Garrett-Glaser
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 | x86inc: Remove our FMA4 support | Derek Buitenhuis
This is so we can sync to x264's version of FMA4 support.
This partially reverts commit 79687079a97a039c325ab79d7a95920d800b791f.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-01-19 | Merge commit 'd633d12b2cc999cee3ac25bf9a810fe7ff03726d' | Michael Niedermayer
* commit 'd633d12b2cc999cee3ac25bf9a810fe7ff03726d':
  x86inc: Add cvisible macro for C functions with public prefix
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-19 | Merge commit 'ef5d41a5534b65f03d02f2e11a503ab8416bfc3b' | Michael Niedermayer
* commit 'ef5d41a5534b65f03d02f2e11a503ab8416bfc3b':
  x86inc: Rename "program_name" to "private_prefix"
  configure: Run SHFLAGS through ldflags_filter()
Conflicts:
  configure
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-19 | x86inc: Add cvisible macro for C functions with public prefix | Diego Biurrun
This allows defining externally visible library symbols.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-18 | x86inc: Rename "program_name" to "private_prefix" | Diego Biurrun
The new name is more descriptive and will allow defining a separate public prefix for externally visible library symbols.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-16 | Merge commit 'dae1d507af94261bafd3b11549884e5d1eca590e' | Michael Niedermayer
* commit 'dae1d507af94261bafd3b11549884e5d1eca590e':
  x86: Add PAVGB macro to abstract pavgb/pavgusb instruction via cpuflags
  vf_fps: add final flushed frames to the dropped frame count
  rv34_parser: Adjust #if for disabling individual parsers
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-15 | x86: Add PAVGB macro to abstract pavgb/pavgusb instruction via cpuflags | Diego Biurrun
2013-01-15 | Merge remote-tracking branch 'qatar/master' | Michael Niedermayer
* qatar/master:
  x86: ABSB2: port to cpuflags
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-15 | Merge commit '094a7405e5d8463d7d167d893e04934ec1a84ecd' | Michael Niedermayer
* commit '094a7405e5d8463d7d167d893e04934ec1a84ecd':
  x86: ABSB: port to cpuflags
  sdp: Include SRTP crypto params if using the srtp protocol
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-15 | Merge commit 'd8c772de53d29afb1bada88afa859fce8489c668' | Michael Niedermayer
* commit 'd8c772de53d29afb1bada88afa859fce8489c668':
  nutdec: Always return a value from nut_read_timestamp()
  configure: Make warnings from -Wreturn-type fatal errors
  x86: ABS2: port to cpuflags
  vdpau: Remove av_unused attribute from function declaration
  h264: fix ff_generate_sliding_window_mmcos() prototype.
Conflicts:
  configure
  libavformat/nutdec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-15 | x86: ABSB2: port to cpuflags | Diego Biurrun
2013-01-15 | x86: ABSB: port to cpuflags | Diego Biurrun
2013-01-15 | x86: ABS2: port to cpuflags | Diego Biurrun
2013-01-07 | Merge commit '5b4dfbffc258f90a7d2540d21209ac23afcf7cd0' | Michael Niedermayer
* commit '5b4dfbffc258f90a7d2540d21209ac23afcf7cd0':
  x86: ABS1: port to cpuflags
  v210x: cosmetics, reformat
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-06 | x86: ABS1: port to cpuflags | Diego Biurrun
2012-12-06 | Merge commit '9d5c62ba5b586c80af508b5914934b1c439f6652' | Michael Niedermayer
* commit '9d5c62ba5b586c80af508b5914934b1c439f6652':
  lavu/opt: do not filter out the initial sign character except for flags
  eval: treat dB as decibels instead of decibytes
  float_dsp: add vector_dmul_scalar() to multiply a vector of doubles
Conflicts:
  libavutil/eval.c
  tests/ref/fate/eval
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-05 | float_dsp: add vector_dmul_scalar() to multiply a vector of doubles | Justin Ruggles
Include x86-optimized versions for SSE2 and AVX.
2012-11-19 | Merge remote-tracking branch 'qatar/master' | Michael Niedermayer
* qatar/master:
  x86: h264_intrapred: Fix C function names in comments
  x86: SPLATD: port to cpuflags
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-18 | x86: SPLATD: port to cpuflags | Diego Biurrun
2012-11-14 | Merge remote-tracking branch 'qatar/master' | Michael Niedermayer
* qatar/master:
  x86: mmx2 ---> mmxext in asm constructs
Conflicts:
  libavcodec/x86/h264_chromamc_10bit.asm
  libavcodec/x86/h264_deblock.asm
  libavcodec/x86/h264dsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-14 | x86: mmx2 ---> mmxext in asm constructs | Diego Biurrun
2012-11-12 | Merge commit '802713c4e7b41bc2deed754d78649945c3442063' | Michael Niedermayer
* commit '802713c4e7b41bc2deed754d78649945c3442063':
  mss2: prevent potential uninitialized reads
  mss2: reindent after last commit
  mss2: fix handling of unmasked implicit WMV9 rectangles
  configure: add lavu dependency to lavr/lavfi .pc files
  x86inc: Set program_name outside of x86inc.asm
Conflicts:
  configure
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-11 | x86inc: Set program_name outside of x86inc.asm | Diego Biurrun
This reduces the local difference to the x264 upstream version.