Fixes: out of array read
Fixes: 3516/attachment-311488.dat
Found-by: Insu Yun, Georgia Tech.
Tested-by: wuninsu@gmail.com
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
Variables used in inline assembly need to be marked with __attribute__((used)).
Static constants already were, via the definition of DECLARE_ASM_CONST.
But DECLARE_ALIGNED does not add this attribute, and some of the variables
defined with it are constants used only in inline assembly and therefore
appeared dead. This change adds a macro, DECLARE_ASM_ALIGNED, that marks
variables as used.
This change makes FFmpeg work with Clang's ThinLTO.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
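A minimal sketch of the idea, assuming GCC or Clang attribute syntax: the hypothetical macro SKETCH_ASM_ALIGNED below combines alignment with the used attribute, so a constant that only inline assembly would reference is still emitted. It illustrates the technique and is not FFmpeg's actual DECLARE_ASM_ALIGNED definition.

```c
#include <stdint.h>

/* Hypothetical macro (not FFmpeg's real definition): align the variable and
 * mark it "used" so the compiler, or an LTO pass such as ThinLTO, keeps it
 * even when no C code references it. GCC/Clang attribute syntax. */
#define SKETCH_ASM_ALIGNED(n, t, v) t __attribute__((used, aligned(n))) v

/* A constant table that, in real code, only inline assembly would load. */
static SKETCH_ASM_ALIGNED(16, const uint16_t, sketch_pw_1)[8] = {
    1, 1, 1, 1, 1, 1, 1, 1
};
```

Without "used", a static constant that no C code reads can legally be discarded, and the inline assembly that names it then fails to link.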
|
|
function, in order to make reading of the asm file easier
|
|
Fixes build with old nasm/yasm.
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
* commit '681a86aba6cb09b98ad716d986182060c7795d20':
x86: fft: Port to cpuflags
Merged-by: James Almer <jamrial@gmail.com>
|
|
* commit 'e9bb77fb1012cba1951a82136df7071f71bce8fb':
x86: h264: Simplify DEQUANT macro with cpuflags
Merged-by: James Almer <jamrial@gmail.com>
|
|
* commit '307eb1a8ee363db1fcf869e427a8deb6d9538881':
x86: vp8dsp: port FILTER_BILINEAR macro to cpuflags
Merged-by: James Almer <jamrial@gmail.com>
|
|
* commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2':
x86util: Port all macros to cpuflags
See d5f8a642f6eb1c6e305c41dabddd0fd36ffb3f77
Merged-by: James Almer <jamrial@gmail.com>
|
|
* commit '6eef263aca281fb582e1fa3d841ac20ef747a252':
x86: Merge align directives into SECTION_RODATA declarations where possible
Merged-by: James Almer <jamrial@gmail.com>
|
|
Fixes assembling with old yasm.
|
|
Add () to regsize define
Suggested-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
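The hazard the parentheses guard against is plain textual substitution, which NASM's %define shares with the C preprocessor. A C analogue, using made-up macro names and values rather than the actual regsize definition:

```c
/* Hypothetical macros; the values are illustrative, not the real regsize. */
#define SKETCH_SIZE_BAD  4 + 4   /* no parentheses */
#define SKETCH_SIZE_GOOD (4 + 4) /* parenthesized  */

/* n * SKETCH_SIZE_BAD expands to n * 4 + 4, which binds the wrong way. */
static int offset_bad(int n)  { return n * SKETCH_SIZE_BAD; }

/* n * SKETCH_SIZE_GOOD expands to n * (4 + 4), as intended. */
static int offset_good(int n) { return n * SKETCH_SIZE_GOOD; }
```

Here offset_bad(2) yields 12 while offset_good(2) yields 16.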
|
|
Fixes out of array access
Fixes: crash-huf.avi
Regression since: 6b41b4414934cc930468ccd5db598dd6ef643987
This could also be fixed by adding checks in the C code that calls the dsp
Found-by: Zhibin Hu and 连一汉 <lianyihan@360.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
Also modify the required alignment, to 32 instead of 16
for several codecs
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
* commit '7abdd026df6a9a52d07d8174505b33cc89db7bf6':
asm: Consistently uppercase SECTION markers
Merged-by: James Almer <jamrial@gmail.com>
|
|
* commit 'fd9212f2edfe9b107c3c08ba2df5fd2cba5ab9e3':
Mark some arrays that never change as const.
Merged-by: James Almer <jamrial@gmail.com>
|
|
Tested with "checkasm --test=exrdsp -bench"
Before:
reorder_pixels_c: 5187.8
reorder_pixels_sse2: 377.0
reorder_pixels_avx2: 331.3
After:
reorder_pixels_c: 5181.5
reorder_pixels_sse2: 377.0
reorder_pixels_avx2: 313.8
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Make dst the first parameter and src const. This is more in line with the rest of the codebase.
Signed-off-by: James Almer <jamrial@gmail.com>
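For reference, a scalar sketch of the reordering with the new argument order (dst first, src const). It assumes the operation interleaves the first and second halves of src byte by byte and that size is even; the function name is hypothetical and this is not the code from the commit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar sketch: interleave the two halves of src into dst.
 * Assumes size is even; an odd trailing byte is not handled here. */
static void reorder_pixels_sketch(uint8_t *dst, const uint8_t *src, ptrdiff_t size)
{
    const uint8_t *t1 = src;            /* first half  */
    const uint8_t *t2 = src + size / 2; /* second half */

    for (ptrdiff_t i = 0; i < size / 2; i++) {
        *dst++ = *t1++;
        *dst++ = *t2++;
    }
}
```

With src = {0, 1, 2, 3, 10, 11, 12, 13}, dst becomes {0, 10, 1, 11, 2, 12, 3, 13}.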
|
|
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Adds a diff_pixels_unaligned()
Fixes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=872503
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
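A sketch of what such a function computes, assuming the usual diff_pixels semantics: an 8x8 block of differences between two 8-bit sources that share one stride. In C, aligned and unaligned sources look the same; the distinction only matters for SIMD, where the unaligned variant must use loads that do not assume aligned addresses. The name and exact signature are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: block[i][j] = s1[i][j] - s2[i][j] over an 8x8 area.
 * s1 and s2 may point anywhere; C has no alignment requirement for uint8_t. */
static void diff_pixels_sketch(int16_t *block, const uint8_t *s1,
                               const uint8_t *s2, ptrdiff_t stride)
{
    for (int i = 0; i < 8; i++) {
        for (int j = 0; j < 8; j++)
            block[j] = (int16_t)(s1[j] - s2[j]);
        s1    += stride;
        s2    += stride;
        block += 8;
    }
}
```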
|
|
the function name suffix handling.
Using a named define properly documents the code paths.
It also avoids passing additional numbered arguments through
multiple levels of macro templates.
The suffix handling is done by concatenation, as in
other asm functions, which avoids having two separate
"cglobal" defines.
Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
|
|
it's faster
This splits the asm function into exact and non-exact versions. The exact
version is as fast as or faster than the non-exact one on newer CPUs (which
EXTERNAL_AVX_FAST describes well), whilst the non-exact version is faster than
the exact one on older CPUs.
Also fixes compilation with yasm, which doesn't accept the !cpuflags(avx) syntax.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
|
|
Makes the search produce identical results to the C version.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
|
|
There's no point in toggling it, even for debugging. It's just worse.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
|
|
An explanation of the workings and methods used by the
Pyramid Vector Quantization Search function
can be found in the following Work-In-Progress mail threads:
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212146.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-June/212816.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213030.html
http://ffmpeg.org/pipermail/ffmpeg-devel/2017-July/213436.html
Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
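The linked threads describe the actual method. For orientation only, here is a generic greedy Pyramid Vector Quantization search of the kind commonly described for Opus/CELT: it places K unit pulses one at a time, each where it most increases the normalized correlation (x . y)^2 / (y . y), then copies the signs of x. It omits the projection pre-step and other refinements, and the function name is hypothetical.

```c
/* Hypothetical greedy PVQ search: find integer y with sum |y[i]| == k that
 * approximately maximizes (x . y)^2 / (y . y). Works on magnitudes and
 * restores the signs of x at the end. Assumes n > 0. */
static void pvq_search_sketch(const float *x, int *y, int n, int k)
{
    float xy = 0.0f; /* running sum of |x[i]| * y[i] */
    float yy = 0.0f; /* running sum of y[i] * y[i]   */

    for (int i = 0; i < n; i++)
        y[i] = 0;

    for (int p = 0; p < k; p++) {
        int best = 0;
        float best_num = -1.0f, best_den = 1.0f;

        for (int i = 0; i < n; i++) {
            float ax  = x[i] < 0.0f ? -x[i] : x[i];
            float num = (xy + ax) * (xy + ax);          /* new (x . y)^2 */
            float den = yy + 2.0f * (float)y[i] + 1.0f; /* new y . y     */

            /* num / den > best_num / best_den, compared without dividing */
            if (num * best_den > best_num * den) {
                best_num = num;
                best_den = den;
                best     = i;
            }
        }
        xy += x[best] < 0.0f ? -x[best] : x[best];
        yy += 2.0f * (float)y[best] + 1.0f;
        y[best]++;
    }

    for (int i = 0; i < n; i++)
        if (x[i] < 0.0f)
            y[i] = -y[i];
}
```

For x = {0.9, -0.3, 0.1, 0.0} and K = 4 this places pulses to give y = {3, -1, 0, 0}.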
|
|
2.5ms frames:
Before (c): 2638 decicycles in postrotate, 2097040 runs, 112 skips
After (sse3): 1467 decicycles in postrotate, 2097083 runs, 69 skips
After (avx2): 1244 decicycles in postrotate, 2097085 runs, 67 skips
5ms frames:
Before (c): 4987 decicycles in postrotate, 1048371 runs, 205 skips
After (sse3): 2644 decicycles in postrotate, 1048509 runs, 67 skips
After (avx2): 2031 decicycles in postrotate, 1048523 runs, 53 skips
10ms frames:
Before (c): 9153 decicycles in postrotate, 523575 runs, 713 skips
After (sse3): 5110 decicycles in postrotate, 523726 runs, 562 skips
After (avx2): 3738 decicycles in postrotate, 524223 runs, 65 skips
20ms frames:
Before (c): 17857 decicycles in postrotate, 261866 runs, 278 skips
After (sse3): 10041 decicycles in postrotate, 261746 runs, 398 skips
After (avx2): 7050 decicycles in postrotate, 262116 runs, 28 skips
Improves total decoding performance for real-world content by 9% with avx2.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
|
|
This file already has #include "idctdsp.h", which compilers resolve to the
idctdsp.h header in the directory where this file resides.
Two other files in this directory, libavcodec/x86/idctdsp_init.c and
libavcodec/x86/xvididct_init.c, also rely on #include "idctdsp.h"
working this way.
Signed-off-by: Wan-Teh Chang <wtc@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
apply_noise_main"
This reverts commit 24bb7db4037876c5722b0eecf7412502e7225634.
noise has to be sign extended after all, not zero extended, as tests
other than checkasm show.
Fixes most aac tests broken by the now reverted commit.
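The difference between the two widenings, shown in C on the assumption that noise is a signed 32-bit value being widened to 64 bits. On x86-64, writing a 32-bit register implicitly zero-extends into the full 64-bit register, which is presumably the side effect the reverted commit relied on; a negative value then turns into a large positive one. The helper names are hypothetical.

```c
#include <stdint.h>

/* Sign extension keeps negative values negative (what movsxd does). */
static int64_t widen_sign_extend(int32_t v) { return (int64_t)v; }

/* Zero extension reinterprets the 32 bits as unsigned first (what an
 * ordinary 32-bit register write does to the upper half on x86-64). */
static int64_t widen_zero_extend(int32_t v) { return (int64_t)(uint32_t)v; }
```

For -1 the first yields -1 and the second 4294967295; for non-negative inputs the two agree, so a test that only exercises non-negative values cannot tell them apart.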
|
|
noise needs to be zero extended, and this can be done implicitly as a side
effect of a subsequent instruction.
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
They now match according to FATE, barring any further bugs in untested
parts.
|
|
depth functions
Includes add/put functions
Rounding contributed by Ronald S. Bultje
|
|
Created by Ronald S. Bultje
|
|
ff_vp9_ipred_dr_32x32_16_avx2() on 32bit
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
|
|
vp9_diag_downright_32x32_12bpp_c: 429.7
vp9_diag_downright_32x32_12bpp_sse2: 158.9
vp9_diag_downright_32x32_12bpp_ssse3: 144.6
vp9_diag_downright_32x32_12bpp_avx: 141.0
vp9_diag_downright_32x32_12bpp_avx2: 73.8
Takes almost 50% fewer cycles than the avx implementation (73.8 vs 141.0).
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
Suggested-by: James Almer <jamrial@gmail.com>
|
|
Fixes compilation with old yasm
|
|
c: 1802 decicycles in fft15,16774635 runs, 2581 skips
avx: 865 decicycles in fft15,16776378 runs, 838 skips
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
|
|
None of them are specific to the YASM assembler.
(Cherry-picked from libav commit 39e208f4d4756367c7cd2d581847e0c1b8a429c1)
Signed-off-by: James Almer <jamrial@gmail.com>
|
|
coefficients
|
|
Use named arguments for the functions so we can remove a define. The
stride/linesize argument is now ptrdiff_t type so we no longer need to
sign extend the register.
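Why the sign extension can go: under the x86-64 System V calling convention, when an int argument arrives in a 64-bit register its upper 32 bits are not guaranteed to be meaningful, so assembly had to sign-extend a possibly negative int stride before using it in an address. A ptrdiff_t already fills the register. A C sketch of a function taking a ptrdiff_t stride, exercised with a negative stride; the name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: sum one column of 'rows' rows starting at src.
 * stride is ptrdiff_t, so a negative stride (walking upwards) is already
 * pointer-width and needs no extra widening. Indexing via i * stride avoids
 * forming a pointer past the start of the buffer. */
static int sum_column(const uint8_t *src, ptrdiff_t stride, int rows)
{
    int sum = 0;
    for (int i = 0; i < rows; i++)
        sum += src[i * stride];
    return sum;
}
```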
|