github.com/FFmpeg/FFmpeg.git
Age | Commit message | Author
2017-11-14 | avcodec/x86/mpegvideodsp: Fix signedness bug in need_emu | Michael Niedermayer
Fixes: out of array read Fixes: 3516/attachment-311488.dat Found-by: Insu Yun, Georgia Tech. Tested-by: wuninsu@gmail.com Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
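The kind of bug named in this subject can be illustrated with a minimal sketch. It is not the actual patch: the function names and parameters are hypothetical, and only the general pattern is assumed, namely a bounds check done in unsigned arithmetic that silently passes when the picture is smaller than the block.

```c
#include <assert.h>

/* Hypothetical helper, not FFmpeg code: nonzero means a w x h block read at
 * (ix, iy) may touch samples outside a width x height picture. */
static int need_emu_buggy(int ix, int iy, int width, int height, int w, int h)
{
    /* If width < w, (width - w) is negative and wraps to a huge unsigned
     * value, so the comparison is false and the bad read goes unnoticed. */
    return (unsigned)ix >= (unsigned)(width - w) ||
           (unsigned)iy >= (unsigned)(height - h);
}

static int need_emu_checked(int ix, int iy, int width, int height, int w, int h)
{
    assert(w > 0 && h > 0);
    /* Handle pictures smaller than the block before the unsigned compare. */
    if (width < w || height < h)
        return 1;
    return (unsigned)ix >= (unsigned)(width - w) ||
           (unsigned)iy >= (unsigned)(height - h);
}
```

With a 4x4 picture and an 8x8 block, the first variant reports that no edge emulation is needed, while the second one flags the access.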
2017-11-13 | Fix missing used attribute for inline assembly variables | Thomas Köppe
Variables used in inline assembly need to be marked with __attribute__((used)). Static constants already were, via the define of DECLARE_ASM_CONST. But DECLARE_ALIGNED does not add this attribute, and some of the variables defined with it are const and only used in inline assembly, and therefore appeared dead. This change adds a macro DECLARE_ASM_ALIGNED that marks variables as used. This change makes FFmpeg work with Clang's ThinLTO. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
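A minimal sketch of the idea, assuming GCC/Clang attribute syntax. The macro name EXAMPLE_ASM_ALIGNED and the table are hypothetical and are not the definitions used in the tree; the sketch only shows how an aligned declaration can also carry the used attribute so that a variable referenced solely from inline assembly is not discarded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro (GCC/Clang only): declare an aligned variable and mark
 * it as used, so it survives even if the compiler sees no C-level reference. */
#define EXAMPLE_ASM_ALIGNED(n, t, v) \
    t __attribute__((used)) __attribute__((aligned(n))) v

EXAMPLE_ASM_ALIGNED(16, static const uint16_t, example_mask)[8] = {
    0x00ff, 0x00ff, 0x00ff, 0x00ff, 0x00ff, 0x00ff, 0x00ff, 0x00ff
};

/* Small self-check: the table is 16-byte aligned and keeps its contents. */
static void example_mask_demo(void)
{
    assert(((uintptr_t)example_mask & 15) == 0);
    assert(example_mask[7] == 0x00ff);
}
```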
2017-11-07 | libavcodec/lossless_video_dsp : cosmetic add better separator for each function, in order to make reading of the asm file easier | Martin Vignali
2017-11-07 | libavcodec/lossless_videodsp : add add_bytes avx2 version | Martin Vignali
2017-10-30 | x86/bswapdsp: add missing preprocessor wrappers for AVX2 functions | James Almer
Fixes build with old nasm/yasm. Signed-off-by: James Almer <jamrial@gmail.com>
2017-10-29 | libavcodec/bswapdsp : add AVX2 func for bswap_buf (swap uint32_t) | Martin Vignali
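For reference, the operation being vectorized (byte-swapping each uint32_t of a buffer) can be sketched in scalar C. The function name below is hypothetical and the code is an illustration of the operation, not the routine from the tree.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sketch (hypothetical name): reverse the byte order of each of the
 * w 32-bit words in src and store the result in dst. */
static void bswap32_buf_sketch(uint32_t *dst, const uint32_t *src, int w)
{
    assert(w >= 0);
    for (int i = 0; i < w; i++) {
        uint32_t v = src[i];
        dst[i] = (v >> 24) | ((v >> 8) & 0x0000ff00u) |
                 ((v << 8) & 0x00ff0000u) | (v << 24);
    }
}
```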
2017-10-21 | Merge commit '681a86aba6cb09b98ad716d986182060c7795d20' | James Almer
* commit '681a86aba6cb09b98ad716d986182060c7795d20': x86: fft: Port to cpuflags Merged-by: James Almer <jamrial@gmail.com>
2017-10-21 | Merge commit 'e9bb77fb1012cba1951a82136df7071f71bce8fb' | James Almer
* commit 'e9bb77fb1012cba1951a82136df7071f71bce8fb': x86: h264: Simplify DEQUANT macro with cpuflags Merged-by: James Almer <jamrial@gmail.com>
2017-10-21 | Merge commit '307eb1a8ee363db1fcf869e427a8deb6d9538881' | James Almer
* commit '307eb1a8ee363db1fcf869e427a8deb6d9538881': x86: vp8dsp: port FILTER_BILINEAR macro to cpuflags Merged-by: James Almer <jamrial@gmail.com>
2017-10-21 | Merge commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2' | James Almer
* commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2': x86util: Port all macros to cpuflags See d5f8a642f6eb1c6e305c41dabddd0fd36ffb3f77 Merged-by: James Almer <jamrial@gmail.com>
2017-10-12 | Merge commit '6eef263aca281fb582e1fa3d841ac20ef747a252' | James Almer
* commit '6eef263aca281fb582e1fa3d841ac20ef747a252': x86: Merge align directives into SECTION_RODATA declarations where possible Merged-by: James Almer <jamrial@gmail.com>
2017-10-05 | x86/blockdsp: use three operand form for an instruction | James Almer
Fixes assembling with old yasm.
2017-10-05 | avcodec/x86/lossless_videoencdsp: Fix warning: signed dword value exceeds bounds | Michael Niedermayer
Add () to regsize define Suggested-by: Henrik Gramner <henrik@gramner.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-10-05 | avcodec/x86/lossless_videoencdsp: Fix handling of small widths | Michael Niedermayer
Fixes out of array access Fixes: crash-huf.avi Regression since: 6b41b4414934cc930468ccd5db598dd6ef643987 This could also be fixed by adding checks in the C code that calls the dsp Found-by: Zhibin Hu and 连一汉 <lianyihan@360.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-10-04 | libavcodec/blockdsp : add AVX version | Martin Vignali
Also modify the required alignment, to 32 instead of 16 for several codecs Signed-off-by: James Almer <jamrial@gmail.com>
2017-10-01 | libavcodec/exr : add x86 SIMD for predictor | Martin Vignali
Signed-off-by: James Almer <jamrial@gmail.com>
2017-09-27 | Merge commit '7abdd026df6a9a52d07d8174505b33cc89db7bf6' | James Almer
* commit '7abdd026df6a9a52d07d8174505b33cc89db7bf6': asm: Consistently uppercase SECTION markers Merged-by: James Almer <jamrial@gmail.com>
2017-09-26 | Merge commit 'fd9212f2edfe9b107c3c08ba2df5fd2cba5ab9e3' | James Almer
* commit 'fd9212f2edfe9b107c3c08ba2df5fd2cba5ab9e3': Mark some arrays that never change as const. Merged-by: James Almer <jamrial@gmail.com>
2017-09-19 | x86/exrdsp: optimize ff_reorder_pixels_avx2() | Henrik Gramner
Tested with "checkasm --test=exrdsp -bench"
Before:
reorder_pixels_c: 5187.8
reorder_pixels_sse2: 377.0
reorder_pixels_avx2: 331.3
After:
reorder_pixels_c: 5181.5
reorder_pixels_sse2: 377.0
reorder_pixels_avx2: 313.8
Signed-off-by: James Almer <jamrial@gmail.com>
2017-09-18 | avcodec/exrdsp: improve the ExrDSPContext->reorder_pixels prototype | James Almer
Make dst be the first parameter and src const. It's more in line with the rest of the codebase. Signed-off-by: James Almer <jamrial@gmail.com>
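The prototype shape described here (destination first, source const) can be sketched as follows. The function name is hypothetical, and the body is an assumption about what a scalar reorder step of this kind does (interleaving the two halves of the source buffer); it is not copied from the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch: dst comes first, src is const. The first half
 * of src goes to the even positions of dst, the second half to the odd ones. */
static void reorder_pixels_sketch(uint8_t *dst, const uint8_t *src, ptrdiff_t size)
{
    const ptrdiff_t half = size / 2;
    const uint8_t *t1 = src;        /* first half of the input  */
    const uint8_t *t2 = src + half; /* second half of the input */

    assert(size >= 0);
    for (ptrdiff_t i = 0; i < half; i++) {
        dst[2 * i]     = t1[i];
        dst[2 * i + 1] = t2[i];
    }
}
```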
2017-09-17 | libavcodec/exr : add X86 SIMD for reorder_pixels | Martin Vignali
Signed-off-by: James Almer <jamrial@gmail.com>
2017-08-22 | avcodec/me_cmp: Fix crashes on ARM due to misalignment | Michael Niedermayer
Adds a diff_pixels_unaligned() Fixes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=872503 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-08-20 | opus_pvq_search: Restore the proper use of conditional define and simplify the function name suffix handling. | Ivan Kalvachev
Using a named define properly documents the code paths. It also avoids passing additional numbered arguments through multiple levels of macro templates. The suffix handling is done by concatenation, like in other asm functions, and avoids having two separate "cglobal" defines. Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
2017-08-18 | opus_pvq_search: split functions into exactness and only use the exact if it's faster | Rostislav Pehlivanov
This splits the asm function into exact and non-exact versions. The exact version is as fast or faster on newer CPUs (which EXTERNAL_AVX_FAST describes well) whilst the non-exact version is faster than the exact on older CPUs. Also fixes yasm compilation which doesn't accept !cpuflags(avx) syntax. Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
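The selection policy described in this message can be sketched as a run-time function-pointer dispatch. Everything named below is hypothetical (the flag values, the stub functions and select_search); the sketch only mirrors the stated rule that the exact variant is chosen on CPUs with fast AVX and the non-exact one on older CPUs.

```c
#include <assert.h>

/* Hypothetical CPU capability flags, for illustration only. */
#define EX_CPU_SSE2     0x1
#define EX_CPU_AVX_FAST 0x2

typedef const char *(*search_fn)(void);

/* Stubs standing in for the C, non-exact SIMD and exact SIMD variants. */
static const char *search_c(void)          { return "c"; }
static const char *search_simd_approx(void) { return "simd, non-exact"; }
static const char *search_simd_exact(void)  { return "simd, exact"; }

/* Pick the variant: later checks override earlier, slower choices. */
static search_fn select_search(int cpu_flags)
{
    search_fn fn = search_c;
    if (cpu_flags & EX_CPU_SSE2)
        fn = search_simd_approx;
    if (cpu_flags & EX_CPU_AVX_FAST)
        fn = search_simd_exact;
    assert(fn != 0);
    return fn;
}
```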
2017-08-18 | opus_pvq_search: only use rsqrtps approximation on CPUs with avx | Rostislav Pehlivanov
Makes the search produce identical results with the C version. Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2017-08-18 | ops_pvq_search: remove dead macro | Rostislav Pehlivanov
There's no point in toggling it, even for debugging. It's just worse. Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2017-08-18 | SIMD opus pvq_search implementation | Ivan Kalvachev
An explanation of the workings and methods used by the Pyramid Vector Quantization Search function can be found in the following Work-In-Progress mail threads:
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212146.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212816.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213030.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213436.html
Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
2017-07-30 | mdct15: add inverse transform postrotation SIMD | Rostislav Pehlivanov
2.5ms frames:
Before (c): 2638 decicycles in postrotate, 2097040 runs, 112 skips
After (sse3): 1467 decicycles in postrotate, 2097083 runs, 69 skips
After (avx2): 1244 decicycles in postrotate, 2097085 runs, 67 skips
5ms frames:
Before (c): 4987 decicycles in postrotate, 1048371 runs, 205 skips
After (sse3): 2644 decicycles in postrotate, 1048509 runs, 67 skips
After (avx2): 2031 decicycles in postrotate, 1048523 runs, 53 skips
10ms frames:
Before (c): 9153 decicycles in postrotate, 523575 runs, 713 skips
After (sse3): 5110 decicycles in postrotate, 523726 runs, 562 skips
After (avx2): 3738 decicycles in postrotate, 524223 runs, 65 skips
20ms frames:
Before (c): 17857 decicycles in postrotate, 261866 runs, 278 skips
After (sse3): 10041 decicycles in postrotate, 261746 runs, 398 skips
After (avx2): 7050 decicycles in postrotate, 262116 runs, 28 skips
Improves total decoding performance for real world content by 9% with avx2.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2017-07-21 | avcodec/x86/cavsdsp: Delete #include "libavcodec/x86/idctdsp.h". | Wan-Teh Chang
This file already has #include "idctdsp.h", which is resolved to the idctdsp.h header in the directory where this file resides by compilers. Two other files in this directory, libavcodec/x86/idctdsp_init.c and libavcodec/x86/xvididct_init.c, also rely on #include "idctdsp.h" working this way. Signed-off-by: Wan-Teh Chang <wtc@google.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-07-05 | Revert "x86/sbrdsp: remove unnecessary sign extend instruction in apply_noise_main" | James Almer
This reverts commit 24bb7db4037876c5722b0eecf7412502e7225634. noise has to after all be sign extended, not zero extended, on tests other than checkasm. Fixes most aac tests broken by the now reverted commit.
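The difference between the two extensions can be shown with a short sketch. The helper names are hypothetical and nothing here is taken from the sbrdsp code; it only illustrates why widening a possibly negative 32-bit value to 64 bits gives different results depending on the extension used.

```c
#include <assert.h>
#include <stdint.h>

/* Sign extension keeps the numeric value of a negative 32-bit integer. */
static int64_t sign_extend32(int32_t v)
{
    return (int64_t)v;
}

/* Zero extension reinterprets the 32 bits as unsigned before widening, so a
 * negative input becomes a large positive 64-bit value. */
static int64_t zero_extend32(int32_t v)
{
    return (int64_t)(uint32_t)v;
}

static void extend_demo(void)
{
    /* Both agree for non-negative inputs and differ for negative ones. */
    assert(sign_extend32(5) == zero_extend32(5));
    assert(sign_extend32(-1) != zero_extend32(-1));
}
```

Used as an offset into a table, the zero-extended form of a negative value would point far outside the intended range, which is why the two are not interchangeable when the value can be negative.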
2017-07-05 | x86/sbrdsp: remove unnecessary sign extend instruction in apply_noise_main | James Almer
noise needs to be zero extended and it can be done implicitly as a side effect in a subsequent instruction. Signed-off-by: James Almer <jamrial@gmail.com>
2017-07-05 | x86/sbrdsp: zero extend m_max in apply_noise_main | James Almer
Tested-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>
2017-07-05 | x86/utvideodsp: make restore_rgb_planes functions work on x86_32 | James Almer
Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-30 | x86/sbrdsp: sign extend start and end gprs in ff_sbr_hf_gen_sse | James Almer
Tested-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-28 | avcodec/x86: use new x86-64 functions for -idct simple | James Darnley
They now match according to FATE, barring any further bugs with untested parts
2017-06-28 | avcodec/x86: add an 8-bit simple IDCT function based on the x86-64 high depth functions | James Darnley
Includes add/put functions Rounding contributed by Ronald S. Bultje
2017-06-28 | avcodec/x86: allow future 8-bit simple idct to have "DC only hack" | James Darnley
Created by Ronald S. Bultje
2017-06-28 | lavc/aacpsdsp: use ptrdiff_t for stride in hybrid_analysis | Clément Bœsch
2017-06-28 | avcodec/x86/vp9dsp_init_16bpp: Fix linking to missing ff_vp9_ipred_dr_32x32_16_avx2() on 32bit | Michael Niedermayer
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-06-27 | avcodec/vp9: add 64-bit ipred_dr_32x32_16 avx2 implementation | Ilia Valiakhmetov
vp9_diag_downright_32x32_12bpp_c: 429.7
vp9_diag_downright_32x32_12bpp_sse2: 158.9
vp9_diag_downright_32x32_12bpp_ssse3: 144.6
vp9_diag_downright_32x32_12bpp_avx: 141.0
vp9_diag_downright_32x32_12bpp_avx2: 73.8
Almost 50% faster than avx implementation
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2017-06-27 | avcodec/utvideodec: add SIMD for restore_rgb_planes | Paul B Mahol
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-06-26 | lavc/x86: clear r2 higher bits in ff_sbr_sum_square | Matthieu Bouron
Suggested-by: James Almer <jamrial@gmail.com>
2017-06-24 | x86/mdct15: use three operand form for some instructions | James Almer
Fixes compilation with old yasm
2017-06-24 | mdct15: add assembly optimizations for the 15-point FFT | Rostislav Pehlivanov
c: 1802 decicycles in fft15, 16774635 runs, 2581 skips
avx: 865 decicycles in fft15, 16776378 runs, 838 skips
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2017-06-21 | build: Generalize yasm/nasm-related variable names | Diego Biurrun
None of them are specific to the YASM assembler. (Cherry-picked from libav commit 39e208f4d4756367c7cd2d581847e0c1b8a429c1) Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-20 | avcodec/x86: allow future 8-bit simple idct to use slightly different coefficients | James Darnley
2017-06-20 | avcodec/x86: modify simple_idct10 macros to add an action parameter | James Darnley
2017-06-20 | avcodec/x86: cleanup simple_idct10 | James Darnley
Use named arguments for the functions so we can remove a define. The stride/linesize argument is now ptrdiff_t type so we no longer need to sign extend the register.
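A minimal sketch of why a ptrdiff_t stride is convenient. The function name and the tiny image are hypothetical; the point is only that a pointer-sized signed stride can be applied to a pointer directly, including when it is negative, without a separate widening step from a 32-bit int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum the first sample of each of `rows` lines, walking
 * the image with a signed, pointer-sized stride. A negative stride walks the
 * image bottom-up. */
static unsigned sum_column(const uint8_t *px, ptrdiff_t stride, int rows)
{
    unsigned sum = 0;

    assert(rows >= 0);
    for (int y = 0; y < rows; y++)
        sum += px[(ptrdiff_t)y * stride];
    return sum;
}
```

For a 2-byte-wide, 3-row image, starting at the last row with a stride of -2 visits the same column as starting at the first row with a stride of 2.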
2017-06-20 | avcodec/x86/mpegenc: support transpose permutation type | James Darnley
2017-06-20 | avcodec/x86/mpegenc: check IDCT permutation type is a valid value | James Darnley