github.com/FFmpeg/FFmpeg.git
Age  Commit message  (Author)
2020-07-12  x86/h264_deblock: fix warning about trailing empty parameter  (James Almer)
Fixes part of ticket #8771
Signed-off-by: James Almer <jamrial@gmail.com>
(cherry picked from commit 2c844c98285ca03d9cc44db920da645cf0376c40)
2020-05-13  pixblockdsp, avdct: Add get_pixels_unaligned  (Martin Storsjö)
Use this in vf_spp.c, where the get_pixels operation is done on unaligned source addresses. Hook up the x86 (mmx and sse) versions of get_pixels to this function pointer, as those implementations seem to support unaligned use. This fixes fate-filter-spp on armv7.
Signed-off-by: Martin Storsjö <martin@martin.st>
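As a rough illustration of what an alignment-agnostic get_pixels does, here is a minimal scalar sketch. The function name, the 8x8 block size and the signature are assumptions made for this example and are not copied from the FFmpeg sources. Because it reads one byte at a time, it places no alignment requirement on the source pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar stand-in for a get_pixels-style routine: widen an 8x8
 * block of 8-bit pixels into 16-bit values. Byte-wise reads work for any
 * source address, aligned or not. */
static void get_pixels_unaligned_sketch(int16_t *block, const uint8_t *pixels,
                                        ptrdiff_t stride)
{
    assert(block && pixels);
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            block[y * 8 + x] = pixels[x];
        pixels += stride; /* advance to the next source row */
    }
}
```

A SIMD version can only back the same function pointer if its loads tolerate arbitrary addresses, which is what the commit message says appears to hold for the x86 mmx and sse versions.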
2020-03-27  lavc/x86/hevc_add_res: Fix coeff overflow in ADD_RES_SSE_16_32_8  (Linjie Fu)
Fix overflow for coeff -32768 in function ADD_RES_SSE_16_32_8 with no performance drop (SSE2/AVX/AVX2).
./checkasm --test=hevc_add_res --bench
Mainline:
  - hevc_add_res.add_residual [OK]
    hevc_add_res_32x32_8_sse2: 127.5
    hevc_add_res_32x32_8_avx: 127.0
    hevc_add_res_32x32_8_avx2: 86.5
Add overflow test case:
  - hevc_add_res.add_residual [FAILED]
After:
  - hevc_add_res.add_residual [OK]
    hevc_add_res_32x32_8_sse2: 126.8
    hevc_add_res_32x32_8_avx: 128.3
    hevc_add_res_32x32_8_avx2: 86.8
Signed-off-by: Xu Guangxin <guangxin.xu@intel.com>
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2020-03-27  lavc/x86/hevc_add_res: Fix overflow in ADD_RES_SSE_8_8  (Linjie Fu)
Fix overflow for coeff -32768 in function ADD_RES_SSE_8_8 with no performance drop.
./checkasm --test=hevc_add_res --bench
Mainline:
  - hevc_add_res.add_residual [OK]
    hevc_add_res_8x8_8_sse2: 15.5
Add overflow test case:
  - hevc_add_res.add_residual [FAILED]
After:
  - hevc_add_res.add_residual [OK]
    hevc_add_res_8x8_8_sse2: 15.5
Signed-off-by: Xu Guangxin <guangxin.xu@intel.com>
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2020-03-27  lavc/x86/hevc_add_res: Fix overflow in ADD_RES_MMX_4_8  (Linjie Fu)
Fix overflow for coeff -32768 in function ADD_RES_MMX_4_8 with no performance drop.
./checkasm --test=hevc_add_res --bench
Mainline:
  - hevc_add_res.add_residual [OK]
    hevc_add_res_4x4_8_mmxext: 15.5
Add overflow test case:
  - hevc_add_res.add_residual [FAILED]
After:
  - hevc_add_res.add_residual [OK]
    hevc_add_res_4x4_8_mmxext: 15.0
Signed-off-by: Xu Guangxin <guangxin.xu@intel.com>
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
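The commit messages for the three hevc_add_res fixes do not spell out why a coefficient of -32768 overflows. The scalar sketch below shows one plausible failure mode, offered as an assumption rather than as a description of the actual assembly: if a routine splits each coefficient into a positive and a negated part in 16-bit lanes, negating -32768 wraps back to -32768, so the negative part saturates to 0 and the pixel is left unchanged. All function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp to the 8-bit pixel range. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Reference behaviour: widen first, then clamp. */
static uint8_t add_res_ref(uint8_t px, int16_t coeff)
{
    return clip_uint8(px + coeff);
}

/* Negate with 16-bit wraparound, as a packed 16-bit subtract from zero would. */
static int negate_wrap16(int16_t v)
{
    uint16_t u = (uint16_t)(0u - (uint16_t)v);
    return u >= 0x8000u ? (int)u - 0x10000 : (int)u;
}

/* Assumed "split" scheme: saturate coeff and -coeff to bytes separately,
 * then apply a saturating byte add followed by a saturating byte subtract. */
static uint8_t add_res_split(uint8_t px, int16_t coeff)
{
    uint8_t pos = clip_uint8(coeff);
    uint8_t neg = clip_uint8(negate_wrap16(coeff));
    int v = px + pos;
    if (v > 255)
        v = 255;
    v -= neg;
    if (v < 0)
        v = 0;
    assert(v >= 0 && v <= 255);
    return (uint8_t)v;
}
```

Under these assumptions, add_res_split(200, -32768) returns 200 while the reference returns 0; the two agree for every other coefficient value, which would explain why only a dedicated overflow test case exposes the difference.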
2020-01-31  avcodec/x86/diracdsp: Fix high bits on Windows x86_64  (Michael Niedermayer)
Found-by: james
2020-01-30  avcodec/x86/diracdsp: Fix incorrect src addressing in dequant_subband_32()  (Michael Niedermayer)
Fixes: Segfault (not reproducible with asm, which made this hard to debug)
Fixes: decoding errors
Fixes: 19854/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DIRAC_fuzzer-5729372837511168
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-10-30  vp4: prevent unaligned memory access in loop filter  (Peter Ross)
VP4 applies a loop filter during motion compensation, so the block offset will often be unaligned. This produces a bus error on some platforms, namely ARMv7 NEON. This patch adds an unaligned version of the loop filter function pointer to VP3DSPContext.
Reported-by: Mike Melanson <mike@multimedia.cx>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
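For readers unfamiliar with the underlying problem, the following generic sketch shows the usual portable way to read a multi-byte word from an address that may not be aligned. It is not taken from the VP3/VP4 code; the helper names are hypothetical. Copying through memcpy lets the compiler choose an access that is legal on the target, instead of dereferencing a cast pointer, which is what can raise a bus error on strict-alignment platforms.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load 8 bytes from an arbitrary address without assuming alignment. */
static uint64_t load64_unaligned(const uint8_t *p)
{
    uint64_t v;
    assert(p);
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Example consumer: sum the 8 bytes starting at p. The result does not
 * depend on byte order, so it is the same on little- and big-endian hosts. */
static unsigned sum8_unaligned(const uint8_t *p)
{
    uint64_t v = load64_unaligned(p);
    unsigned sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (unsigned)((v >> (8 * i)) & 0xff);
    return sum;
}
```

Keeping separate aligned and unaligned entry points, as the commit does with a second function pointer, lets the common aligned path keep its faster loads.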
2019-09-12  x85/opusdsp: enable the functions on all FMA3 CPUs  (James Almer)
It's not using ymm registers, so limiting it to CPUs with fast AVX is not necessary.
Signed-off-by: James Almer <jamrial@gmail.com>
2019-09-12  x86/opusdps: clear the high bits from some gprs  (James Almer)
Fixes checkasm on systems like win64.
Reviewed-by: Lynne
Signed-off-by: James Almer <jamrial@gmail.com>
2019-05-14  avcodec/Makefile: add missing pngdsp dependency to the lscr decoder  (James Almer)
Signed-off-by: James Almer <jamrial@gmail.com>
2019-05-03  x86/v210dec: use named registers  (James Almer)
Signed-off-by: James Almer <jamrial@gmail.com>
2019-05-03  x86/v210dec: don't reserve more xmm regs than needed  (James Almer)
Prevents pointless register saving on win64 for the sse3 and avx versions of the function.
Signed-off-by: James Almer <jamrial@gmail.com>
2019-05-03  x86/v210dec: remove duplicate load instruction  (James Almer)
Signed-off-by: James Almer <jamrial@gmail.com>
2019-05-02  avcodec/x86/v210: fix operands of vpblendd used in new avx2 code  (James Darnley)
Assembly failed when using yasm rather than nasm.
2019-05-02  libavcodec Adding ff_v210_planar_unpack AVX2  (Michael Stoner)
Replaced VSHUFPS with VPBLENDD to relieve port 5 bottleneck.
AVX2 is 1.4x faster than AVX.
2019-04-27  x86/opusdsp: replace loads with shuffles  (Lynne)
Has a slight speedup. Can't be carried over to aarch64, since it has no shufps-like instruction.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-04-01  x86/opusdsp: fix WIN64 return value  (Lynne)
Signed-off-by: James Almer <jamrial@gmail.com>
2019-04-01  x86/opusdsp: implement FMA3 accelerated postfilter and deemphasis  (Lynne)
58893 decicycles in deemphasis_c, 130548 runs, 524 skips
9475 decicycles in deemphasis_fma3, 130686 runs, 386 skips
-> 6.21x speedup
24866 decicycles in postfilter_c, 65386 runs, 150 skips
5268 decicycles in postfilter_fma3, 65505 runs, 31 skips
-> 4.72x speedup
Total decoder speedup: ~14%
Deemphasis SIMD based on the following unrolling:
    const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
    float state = coeff;
    for (int i = 0; i < len; i += 4) {
        y[0] = x[0] + c1*state;
        y[1] = x[1] + c2*state + c1*x[0];
        y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
        y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];
        state = y[3];
        y += 4;
        x += 4;
    }
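The unrolling quoted in the commit message follows from the one-tap recurrence y[n] = x[n] + c*y[n-1] by substituting each output into the next. The sketch below puts both forms side by side so they can be compared. EMPH is a placeholder standing in for CELT_EMPH_COEFF (its value here is illustrative only), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define EMPH 0.85f /* placeholder for CELT_EMPH_COEFF; illustrative value */

/* Straightforward recurrence: each output depends on the previous one. */
static float deemphasis_scalar(float *y, const float *x, float state, size_t len)
{
    for (size_t i = 0; i < len; i++)
        state = y[i] = x[i] + EMPH * state;
    return state;
}

/* Four outputs per iteration, each written only in terms of the inputs and
 * the state carried in from the previous group, as in the commit message. */
static float deemphasis_unrolled4(float *y, const float *x, float state, size_t len)
{
    const float c1 = EMPH, c2 = c1 * c1, c3 = c2 * c1, c4 = c3 * c1;
    assert(len % 4 == 0);
    for (size_t i = 0; i < len; i += 4) {
        y[0] = x[0] + c1 * state;
        y[1] = x[1] + c2 * state + c1 * x[0];
        y[2] = x[2] + c3 * state + c1 * x[1] + c2 * x[0];
        y[3] = x[3] + c4 * state + c1 * x[2] + c2 * x[1] + c3 * x[0];
        state = y[3];
        y += 4;
        x += 4;
    }
    return state;
}
```

The unrolled form removes the dependency between the four outputs of a group, which is what makes it amenable to SIMD; the two versions can differ in the last bits because the floating-point operations are ordered differently.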
2019-04-01  celt_pvq_init: only build when CONFIG_OPUS_ENCODER is enabled  (Lynne)
The entire function was defined away before.
2019-04-01  x86/opus_dsp: rename to celt_pvq  (Lynne)
It's only used in the encoder and in CELT's PVQ.
2019-02-20  avcodec/h264dsp: change loop filter stride argument to ptrdiff_t  (James Almer)
2018-12-02  avcodec/proresdsp indent after prev commit  (Martin Vignali)
2018-12-02  avcodec/proresdec : rename dsp part for 10b and check dspinit for supported bits per raw sample  (Martin Vignali)
based on patch by Kieran Kunhya
2018-05-08  mdct15: simplify x86 exptab permutation  (Rostislav Pehlivanov)
Removes an unneeded copy and does the 5-point permute in-place.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2018-05-08  mdct15: simplify the fft15 x86 SIMD  (Rostislav Pehlivanov)
Saves 1 gpr and 2 instructions and simplifies the macros a bit.
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2018-04-02  mpeg4video: Add support for MPEG-4 Simple Studio Profile.  (Kieran Kunhya)
This is a profile supporting > 8-bit video and has a higher quality DCT
2018-03-08  sbcenc: add MMX optimizations  (Aurelien Jacobs)
This was originally based on libsbc, and was fully integrated into ffmpeg.
Rough speed test:
C version: speed= 592x
MMX version: speed= 785x
2018-02-12  h264_idct: enable unmacro on newer NASM versions  (Rostislav Pehlivanov)
Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>
2018-01-28  avcodec/utvideoenc : add SIMD (avx) for sub_left_prediction  (Martin Vignali)
asm code by Henrik Gramner
2018-01-12  avcodec: increase AV_INPUT_BUFFER_PADDING_SIZE to 64  (James Almer)
AVX-512 support has been introduced, and even if no functions currently use zmm registers (able to load as much as 64 bytes of consecutive data per instruction), they will be added eventually.
Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com>
Tested-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
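To illustrate why input buffers carry padding, here is a minimal sketch of allocating a buffer with 64 zeroed bytes after the payload, so that a 64-byte wide load starting at the last payload byte still stays inside the allocation. INPUT_PADDING is a local stand-in for AV_INPUT_BUFFER_PADDING_SIZE, and alloc_padded is a hypothetical helper, not an FFmpeg function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define INPUT_PADDING 64 /* stand-in for AV_INPUT_BUFFER_PADDING_SIZE */

/* Copy `size` bytes of input into a new buffer followed by zeroed padding.
 * Returns NULL on allocation failure; the caller frees the result. */
static uint8_t *alloc_padded(const uint8_t *src, size_t size)
{
    assert(src || size == 0);
    uint8_t *buf = malloc(size + INPUT_PADDING);
    if (!buf)
        return NULL;
    if (size)
        memcpy(buf, src, size);
    memset(buf + size, 0, INPUT_PADDING); /* padding reads back as zeros */
    return buf;
}
```

Code that hands its own buffers to a decoder needs to allocate at least this much extra space, which is why raising the constant matters to callers as well as to the SIMD routines.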
2017-12-10  x86/lossless_videodsp: rename ff_add_left_pred_int16_sse4 to ff_add_left_pred_int16_unaligned_ssse3  (James Almer)
SSSE3_FAST is the proper check for it.
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-10  x86/lossless_videodsp: don't overread the dst buffer in ff_add_left_pred_unaligned_avx2  (James Almer)
Fixes valgrind
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-09  avcodec/utvideodec : add SIMD (SSSE3 and AVX2) for gradient_pred  (Martin Vignali)
2017-12-09  avcodec/x86/lossless_videodsp : add avx2 version for add_left_pred  (Martin Vignali)
2017-12-09  avcodec/x86/lossless_videodsp.asm : make macro for add_left_pred_unaligned in order to add avx2 version  (Martin Vignali)
2017-12-02  avcodec/x86/bswapdsp : use macro for 128 bits constants loading in xmm or ymm  (Martin Vignali)
2017-11-25  avcodec/fft: fix INTERL macro on 3dnow  (Mikulas Patocka)
The commit b7c16a3f2c4921f613319938b8ee0e3d6fa83e8d ("x86: fft: Port to cpuflags") breaks the opus decoder in ffmpeg when compiling for 3dnow. The output is audible, but there's a lot of noise. The reason for the breakage is that the commit unintentionally changed the INTERL macro so that it is empty when compiling for 3dnow. This patch fixes it.
Signed-off-by: Mikulas Patocka <mikulas@twibright.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-11-23  avcodec/x86/exrdsp : use ymm constant for pb_80  (Martin Vignali)
Speed seems to be similar, but this simplifies the code.
2017-11-21  x86/utvideodsp: reuse shared constants  (James Almer)
Remove the broadcast instructions as well now that they are wide enough.
Signed-off-by: James Almer <jamrial@gmail.com>
2017-11-21  x86/constants: make pb_80 32 byte wide  (James Almer)
Signed-off-by: James Almer <jamrial@gmail.com>
2017-11-21  avcodec/huffyuvdspenc : add diff_int16 AVX2 func  (Martin Vignali)
2017-11-21  avcodec/huffyuvdspenc : reorganize diff_int16  (Martin Vignali)
2017-11-21  avcodec/huffyuvdsp : add add_int16 AVX2 func  (Martin Vignali)
2017-11-21  avcodec/huffyuvdsp : reorganize add_int16 asm  (Martin Vignali)
2017-11-21  avcodec/huffyuvdsp(enc) : move duplicate macro to a template file  (Martin Vignali)
2017-11-21  avcodec/x86/utvideodsp.asm : cosmetic  (Martin Vignali)
better func separator and add comment for the restore rgb planes10 declaration
2017-11-21  avcodec/utvideodsp : add avx2 version for the dsp  (Martin Vignali)
2017-11-21  avcodec/x86/utvideodsp : make macro for func  (Martin Vignali)
2017-11-21  x86/jpeg2000dsp: add ff_ict_float_{fma3,fma4}  (James Almer)
jpeg2000_ict_float_c: 2296.0
jpeg2000_ict_float_sse: 628.0
jpeg2000_ict_float_avx: 317.0
jpeg2000_ict_float_fma3: 262.0
Signed-off-by: James Almer <jamrial@gmail.com>
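As background for the benchmark above, the sketch below shows a scalar form of the JPEG 2000 irreversible colour transform in the decode direction, assuming that is the operation ict_float performs. The function name, the in-place three-plane layout and the rounded constants are assumptions made for the example. Every output is a short chain of multiply-adds, which is the shape of computation that fused multiply-add instructions are designed for.

```c
#include <assert.h>
#include <stddef.h>

/* Inverse irreversible colour transform on three float planes, in place:
 * (Y, Cb, Cr) -> (R, G, B) using the commonly quoted JPEG 2000 constants. */
static void ict_inverse_sketch(float *c0, float *c1, float *c2, size_t n)
{
    assert(c0 && c1 && c2);
    for (size_t i = 0; i < n; i++) {
        float y = c0[i], cb = c1[i], cr = c2[i];
        c0[i] = y + 1.402f * cr;                   /* R */
        c1[i] = y - 0.34413f * cb - 0.71414f * cr; /* G */
        c2[i] = y + 1.772f * cb;                   /* B */
    }
}
```

A compiler targeting FMA-capable hardware may contract each multiply and add into one instruction; the hand-written fma3 and fma4 versions in the commit do this explicitly.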