github.com/FFmpeg/FFmpeg.git

Age	Commit message	Author
2020-01-23	libavutil: x86: Include stdlib.h before using _byteswap_ulong	Martin Storsjö
	When clang works in MSVC mode, it does have the _byteswap_ulong builtin, but one has to include stdlib.h before using it. Signed-off-by: Martin Storsjö <martin@martin.st>
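As an illustration of the include-order point in this commit, here is a minimal sketch, not the actual libavutil code. The helper name `bswap32_sketch` is hypothetical; only `_byteswap_ulong` and `stdlib.h` come from the commit message, and the non-MSVC fallback is an assumption added so the example also builds with other compilers.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <stdlib.h>   /* must come before any use of _byteswap_ulong */
#endif

/* Hypothetical helper: reverse the byte order of a 32-bit value. */
static uint32_t bswap32_sketch(uint32_t x)
{
#if defined(_MSC_VER)
    /* MSVC and clang in MSVC mode: use the builtin declared by stdlib.h. */
    return (uint32_t)_byteswap_ulong(x);
#else
    /* Assumed portable fallback using shifts and masks. */
    return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
           ((x & 0x00FF0000u) >> 8)  | ((x & 0xFF000000u) >> 24);
#endif
}
```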
2018-09-14	x86/float_dsp: add ff_vector_dmul_{sse2,avx}	James Almer
	~3x to 5x faster. Signed-off-by: James Almer <jamrial@gmail.com>
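For context, the operation these SIMD versions accelerate can be sketched in scalar C, assuming `vector_dmul` is the element-wise product of two arrays of doubles. The function name `vector_dmul_sketch` and its signature are illustrative and not taken from the commit.

```c
/* Hypothetical scalar reference: dst[i] = src0[i] * src1[i] for i < len. */
static void vector_dmul_sketch(double *dst, const double *src0,
                               const double *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}
```

A SIMD version would process several doubles per iteration (two with SSE2, four with AVX), which is where a speedup of the reported magnitude would plausibly come from.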
2018-08-01	x86/pixelutils: don't use the AVX2 functions on CPUs known to be slow with them	James Almer
	Signed-off-by: James Almer <jamrial@gmail.com>
2018-08-01	x86/pixelutils: add missing preprocessor wrapper to the AVX2 functions	James Almer
	Should fix compilation with old yasm/nasm Signed-off-by: James Almer <jamrial@gmail.com>
2018-07-31	avutil/pixelutils: sad_32x32 sse2/avx2 optimizations.	Jun Zhao
	Add ff_pixelutils_sad_32x32_sse2, ff_pixelutils_sad_{a,u}_32x32_sse2, ff_pixelutils_sad_32x32_avx2 and ff_pixelutils_sad_{a,u}_32x32_avx2.
	Profiling with perf record/report, instructions:u for the avx2 sad_32x32:
	72.05% pixelutils pixelutils [.] block_sad_32x32_c
	18.50% pixelutils pixelutils [.] block_sad_16x16_c
	4.78% pixelutils pixelutils [.] block_sad_8x8_c
	2.69% pixelutils pixelutils [.] block_sad_4x4_c
	0.89% pixelutils pixelutils [.] block_sad_2x2_c
	0.16% pixelutils pixelutils [.] ff_pixelutils_sad_32x32_avx2
	0.16% pixelutils pixelutils [.] ff_pixelutils_sad_u_32x32_avx2
	0.12% pixelutils pixelutils [.] ff_pixelutils_sad_a_32x32_avx2
	instructions:u for the sse2 sad_32x32:
	71.86% pixelutils pixelutils [.] block_sad_32x32_c
	18.42% pixelutils pixelutils [.] block_sad_16x16_c
	4.81% pixelutils pixelutils [.] block_sad_8x8_c
	2.68% pixelutils pixelutils [.] block_sad_4x4_c
	0.88% pixelutils pixelutils [.] block_sad_2x2_c
	0.29% pixelutils pixelutils [.] ff_pixelutils_sad_32x32_sse2
	0.26% pixelutils pixelutils [.] ff_pixelutils_sad_u_32x32_sse2
	0.23% pixelutils pixelutils [.] ff_pixelutils_sad_a_32x32_sse2
	Signed-off-by: Jun Zhao <mypopydev@gmail.com>
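The quantity being optimized here is a sum of absolute differences (SAD) over a pixel block. The following scalar sketch shows the computation under the assumption of 8-bit samples and per-row strides; the name `block_sad_sketch` and its parameters are hypothetical and do not reproduce the actual `block_sad_32x32_c` code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reference: sum of |src1 - src2| over a w x h block of bytes. */
static int block_sad_sketch(const uint8_t *src1, ptrdiff_t stride1,
                            const uint8_t *src2, ptrdiff_t stride2,
                            int w, int h)
{
    int sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            sum += abs(src1[x] - src2[x]);  /* operands promote to int */
        src1 += stride1;                    /* advance one row */
        src2 += stride2;
    }
    return sum;
}
```

For a 32x32 block, one call visits 1024 pixel pairs, which is the kind of inner loop the SSE2 and AVX2 versions handle many bytes at a time.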
2018-07-19	lavu/x86/cpu: Fix aesni detection	alexander schmid

2018-07-11	avutil/pixelutils: correct the function name in comments	Jun Zhao
	Signed-off-by: Jun Zhao <mypopydev@gmail.com>
2018-02-12	Merge commit '4cf84e254ae75b524e1cacae499a97d7cc9e5906'	James Almer
	* commit '4cf84e254ae75b524e1cacae499a97d7cc9e5906': Drop some unnecessary config.h #includes Merged-by: James Almer <jamrial@gmail.com>
2018-02-06	Drop some unnecessary config.h #includes	Diego Biurrun

2018-01-20	x86inc: Drop cpuflags_slowctz	Henrik Gramner

2018-01-20	x86inc: Correctly set mmreg variables	Henrik Gramner

2018-01-20	x86inc: Support creating global symbols from local labels	Henrik Gramner
	On ELF platforms such symbols need to be flagged as functions with the correct visibility to please certain linkers in some scenarios.
2018-01-20	x86inc: Use .rdata instead of .rodata on Windows	Henrik Gramner
	The standard section for read-only data on Windows is .rdata. Nasm will flag non-standard sections as executable by default which isn't ideal.
2018-01-20	x86inc: Enable AVX emulation for floating-point pseudo-instructions	Henrik Gramner
	There are 32 pseudo-instructions for each floating-point comparison instruction, but only 8 of them are actually valid in legacy-encoded mode. The remaining 24 require the use of VEX-encoded (v-prefixed) instructions and can therefore be disregarded for this purpose.
2017-12-25	x86inc: set the correct amount of simd regs in x86_64 when avx512 is enabled but not used	James Almer
	Fixes compilation of libavresample/x86/audio_mix.asm
	Reviewed-by: Gramner
	Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-25	x86inc: AVX-512 support	Henrik Gramner
	AVX-512 consists of a plethora of different extensions, but in order to keep things a bit more manageable we group together the following extensions under a single baseline cpu flag which should cover SKL-X and future CPUs:
	* AVX-512 Foundation (F)
	* AVX-512 Conflict Detection Instructions (CD)
	* AVX-512 Byte and Word Instructions (BW)
	* AVX-512 Doubleword and Quadword Instructions (DQ)
	* AVX-512 Vector Length Extensions (VL)
	On x86-64 AVX-512 provides 16 additional vector registers, prefer using those over existing ones since it allows us to avoid using `vzeroupper` unless more than 16 vector registers are required. They also happen to be volatile on Windows which means that we don't need to save and restore existing xmm register contents unless more than 22 vector registers are required.
	Big thanks to Intel for their support.
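The idea of a single baseline flag covering F, CD, BW, DQ and VL can be sketched as a check that all five CPUID feature bits are set at once. This is an illustrative sketch, not the detection code from this repository: the function and macro names are hypothetical, the bit positions are those of CPUID leaf 7 (sub-leaf 0) EBX, and the separate check that the OS saves the AVX-512 register state is deliberately left out.

```c
#include <stdint.h>

/* CPUID leaf 7, sub-leaf 0, EBX feature bits. */
#define AVX512F_BIT  (UINT32_C(1) << 16)  /* Foundation */
#define AVX512DQ_BIT (UINT32_C(1) << 17)  /* Doubleword and Quadword */
#define AVX512CD_BIT (UINT32_C(1) << 28)  /* Conflict Detection */
#define AVX512BW_BIT (UINT32_C(1) << 30)  /* Byte and Word */
#define AVX512VL_BIT (UINT32_C(1) << 31)  /* Vector Length Extensions */

#define AVX512_BASELINE_MASK \
    (AVX512F_BIT | AVX512DQ_BIT | AVX512CD_BIT | AVX512BW_BIT | AVX512VL_BIT)

/* Hypothetical helper: returns 1 only if every grouped extension is present.
 * The caller is assumed to pass the EBX value obtained from CPUID leaf 7. */
static int has_avx512_baseline_sketch(uint32_t leaf7_ebx)
{
    return (leaf7_ebx & AVX512_BASELINE_MASK) == AVX512_BASELINE_MASK;
}
```

Requiring all bits together means a CPU that implements only part of the group is treated as not having the baseline.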
2017-12-25	avutil: add alignment needed for AVX-512	James Darnley

2017-12-25	avutil: detect when AVX-512 is available	James Darnley

2017-12-25	avutil: add AVX-512 flags	James Darnley

2017-12-02	avutil/x86util : add macro for loading a 128 bits constants in an xmm or in each part of an ymm in order to simplify avx2 asm func	Martin Vignali

2017-10-25	Don't use _tzcnt instrinics with clang for windows w/o BMI.	Dale Curtis
	Technically, the _tzcnt* intrinsics are only available when the BMI instruction set is present. However, the instruction encoding degrades to "rep bsf" on older processors. Clang for Windows debatably restricts the _tzcnt* intrinsics behind the __BMI__ architecture define, so check for its presence or exclude the usage of these intrinsics when clang is present.
	See also:
	https://ffmpeg.org/pipermail/ffmpeg-devel/2015-November/183404.html
	https://bugs.llvm.org/show_bug.cgi?id=30506
	http://lists.llvm.org/pipermail/cfe-dev/2016-October/051034.html
	Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
	Reviewed-by: Matt Oliver <protogonoi@gmail.com>
	Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
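The guard described in this message can be sketched as a preprocessor condition: use `_tzcnt_u32` under an MSVC-compatible compiler unless that compiler is clang without `__BMI__`. This is only a sketch of the idea, not the code from the commit; `USE_TZCNT_SKETCH` and `ctz32_sketch` are hypothetical names, and the loop fallback is an assumption added so the example builds anywhere.

```c
#include <stdint.h>

/* Use the intrinsic only where the commit message says it is usable:
 * MSVC proper, or clang in MSVC mode when __BMI__ is defined. */
#if defined(_MSC_VER) && (!defined(__clang__) || defined(__BMI__))
#include <immintrin.h>
#define USE_TZCNT_SKETCH 1
#else
#define USE_TZCNT_SKETCH 0
#endif

/* Hypothetical helper: count trailing zero bits; returns 32 for v == 0. */
static int ctz32_sketch(uint32_t v)
{
#if USE_TZCNT_SKETCH
    return (int)_tzcnt_u32(v);
#else
    int n = 0;
    if (v == 0)
        return 32;
    while (!(v & 1)) {   /* shift until the lowest set bit reaches bit 0 */
        v >>= 1;
        n++;
    }
    return n;
#endif
}
```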
2017-10-21	Merge commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2'	James Almer
	* commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2': x86util: Port all macros to cpuflags See d5f8a642f6eb1c6e305c41dabddd0fd36ffb3f77 Merged-by: James Almer <jamrial@gmail.com>
2017-10-09	cpu: split flag checks per arch in av_cpu_max_align()	James Almer
	Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2017-09-28	avutil/cpu: split flag checks per arch in av_cpu_max_align()	James Almer
	Signed-off-by: James Almer <jamrial@gmail.com>
2017-09-27	Merge commit '7abdd026df6a9a52d07d8174505b33cc89db7bf6'	James Almer
	* commit '7abdd026df6a9a52d07d8174505b33cc89db7bf6': asm: Consistently uppercase SECTION markers Merged-by: James Almer <jamrial@gmail.com>
2017-08-18	Add macros to x86util.asm .	Ivan Kalvachev
	Improved version of VBROADCASTSS that works like the avx2 instruction. Emulation of vpbroadcastd. Horizontal sum HSUMPS that places the result in all elements. Emulation of blendvps and pblendvb. Signed-off-by: Ivan Kalvachev <ikalvachev@gmail.com>
2017-06-27	x86inc: don't use read-only data sections on COFF targets	James Almer
	Yasm:
	src/libavfilter/x86/af_volume.asm:24: warning: Standard COFF does not support read-only data sections
	src/libavfilter/x86/af_volume.asm:24: warning: Unrecognized qualifier `align'
	Nasm:
	src/libavfilter/x86/af_volume.asm:24: error: standard COFF does not support section alignment specification
	src/libavutil/x86/x86inc.asm:92: ... from macro `SECTION_RODATA' defined here
	Tested-by: Clément Bœsch <u@pkh.me>
	Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-21	build: Generalize yasm/nasm-related variable names	Diego Biurrun
	None of them are specific to the YASM assembler. (Cherry-picked from libav commit 39e208f4d4756367c7cd2d581847e0c1b8a429c1) Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-19	x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse,sse4}	James Almer
	About 2x faster than the c version.
2017-06-12	x86inc: Add some additional cpuflag relations	Henrik Gramner
	Simplifies writing assembly code that depends on available instructions.
	LZCNT implies SSE2
	BMI1 implies AVX+LZCNT
	AVX2 implies BMI2
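The "implies" relations listed in this message can be modeled as bit flags where setting one flag pulls in the flags it depends on. The sketch below encodes only the three relations stated here; the flag names, bit values and `expand_cpuflags_sketch` are hypothetical, and x86inc itself expresses this with assembler macros rather than C.

```c
/* Hypothetical flag bits, chosen only for this illustration. */
enum {
    FLAG_SSE2  = 1 << 0,
    FLAG_AVX   = 1 << 1,
    FLAG_LZCNT = 1 << 2,
    FLAG_BMI1  = 1 << 3,
    FLAG_BMI2  = 1 << 4,
    FLAG_AVX2  = 1 << 5
};

/* Apply the stated implications until no new flag is added, so that
 * chained relations (BMI1 -> LZCNT -> SSE2) are followed as well. */
static unsigned expand_cpuflags_sketch(unsigned flags)
{
    unsigned prev;
    do {
        prev = flags;
        if (flags & FLAG_AVX2)  flags |= FLAG_BMI2;              /* AVX2 implies BMI2 */
        if (flags & FLAG_BMI1)  flags |= FLAG_AVX | FLAG_LZCNT;  /* BMI1 implies AVX+LZCNT */
        if (flags & FLAG_LZCNT) flags |= FLAG_SSE2;              /* LZCNT implies SSE2 */
    } while (flags != prev);
    return flags;
}
```

With relations like these in place, code that requests one instruction set can rely on the implied ones without checking each of them separately.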
2017-06-09	x86inc: Remove argument from WIN64_RESTORE_XMM	Anton Mitrofanov
	The use of rsp was pretty much hardcoded there and probably didn't work otherwise with stack_size > 0.
2017-06-09	x86inc: Prefer r14/r15 over r12/r13 on x86-64	Henrik Gramner
	Due to a peculiarity in the ModR/M addressing encoding, the r12 and r13 registers sometimes require an additional byte when used as a base register. r14 and r15 don't have that issue, so prefer using them.
2017-06-09	x86inc: Make REP_RET identical to RET in SSSE3+ functions	Henrik Gramner
	There's no point in emitting a rep prefix before ret on modern CPUs.
2017-06-09	x86inc: Fix call with memory operands	Henrik Gramner
	We overload the `call` instruction with a macro, but it would misbehave when the macro argument wasn't a valid identifier. Fix it by explicitly checking if the argument is an identifier.
2017-05-13	x86/float_dsp: remove usage of integer instructions	James Almer

2017-04-12	x86/float_dsp: add ff_vector_fmul_reverse_avx2	James Almer
	~20% faster than AVX. Signed-off-by: James Almer <jamrial@gmail.com>
2017-04-10	x86/float_dsp: add ff_vector_dmac_scalar_{sse2,avx,fma3}	James Almer

2017-03-30	Merge commit '99434f4df81b6801b2b535d5b9143305595784f6'	Clément Bœsch
	* commit '99434f4df81b6801b2b535d5b9143305595784f6': float_dsp: Have implementation match function pointer prototype Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-03-24	Merge commit '7911186ed616ae81dd8617d6d0e8b08c818db9d8'	James Almer
	* commit '7911186ed616ae81dd8617d6d0e8b08c818db9d8': emms: Give apriv_emms_yasm() a more general name Merged-by: James Almer <jamrial@gmail.com>
2017-03-24	Merge commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4'	James Almer
	* commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4': x86: Add missing colons after assembly labels Merged-by: James Almer <jamrial@gmail.com>
2017-03-22	avutil/x86util: don't use movss in VBROADCASTSS macro when src and dst args are the same	James Almer
	Reviewed-by: Henrik Gramner <henrik@gramner.com>
	Signed-off-by: James Almer <jamrial@gmail.com>
2017-03-20	Merge commit '07e1f99a1bb41d1a615676140eefc85cf69fa793'	Clément Bœsch
	* commit '07e1f99a1bb41d1a615676140eefc85cf69fa793': x86util: Document SBUTTERFLY macro Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20	Merge commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5'	Clément Bœsch
	* commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5': imgutils: add a function for copying image data from GPU mapped memory Merged-by: Clément Bœsch <u@pkh.me>
2017-03-14	x86util: Port all macros to cpuflags	Diego Biurrun
	Also do some small cosmetic changes: Drop pointless _MMX suffix from ABSD2 macro name, drop pointless check for MMX support, we always assume MMX is available in our SIMD code, fix spelling.
2017-03-01	build: Generalize yasm/nasm-related variable names	Diego Biurrun
	None of them are specific to the YASM assembler.
2017-02-18	avcodec/h264: sse2, avx h luma mbaff deblock/loop filter	James Darnley
	x86-64 only
	Yorkfield:
	- sse2: ~2.17x (434 vs. 200 cycles)
	Nehalem:
	- sse2: ~2.94x (409 vs. 139 cycles)
	Skylake:
	- sse2: ~3.10x (370 vs. 119 cycles)
	- avx: ~3.29x (370 vs. 112 cycles)
2017-02-18	x86util: import MOVHL macro	James Darnley
	Originally committed to x264 in 1637239a by Henrik Gramner who has agreed to re-license it as LGPL. Original commit message follows.
	x86: Avoid some bypass delays and false dependencies
	A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning between int and float domains, so try to avoid that if possible.
2017-02-18	avcodec/x86: deduplicate PASS8ROWS macro	James Darnley

2017-02-03	asm: Consistently uppercase SECTION markers	Diego Biurrun

2017-01-31	Merge commit '8e9cd81d291b1010c625b2766058aadf4affb537'	James Almer
	* commit '8e9cd81d291b1010c625b2766058aadf4affb537': x86: cpu: Detect Conroe CPUs and their slow shuffle unit Merged-by: James Almer <jamrial@gmail.com>