github.com/FFmpeg/FFmpeg.git

Age	Commit message	Author
2015-01-26	x86/sbrdsp: add ff_sbr_autocorrelate_{sse,sse3}	James Almer
	2 to 2.5 times faster. Signed-off-by: James Almer <jamrial@gmail.com>
2014-05-16	x86: sbrdsp: implement SSE qmf_deint_neg	Christophe Gisquet
	From 133 (unrolled av_intfloat32 C) to 59 cycles on Arrandale/Win64. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-08-30	Reinstate proper FFmpeg license for all files.	Thilo Borgmann

2013-07-18	Merge remote-tracking branch 'qatar/master'	Michael Niedermayer
	* qatar/master: Consistently use "cpu_flags" as variable/parameter name for CPU flags. Conflicts: libavcodec/x86/dsputil_init.c, libavcodec/x86/h264dsp_init.c, libavcodec/x86/hpeldsp_init.c, libavcodec/x86/motion_est.c, libavcodec/x86/mpegvideo.c, libavcodec/x86/proresdsp_init.c. Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-18	Consistently use "cpu_flags" as variable/parameter name for CPU flags	Diego Biurrun

2013-05-10	x86: sbrdsp: implement SSE2 qmf_pre_shuffle	Christophe Gisquet
	From 253 to 51 cycles on Arrandale and Win64. 44 cycles on SandyBridge. Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-05-08	x86: sbrdsp: force PIC addressing for Win64	Christophe Gisquet
	MSVC complains about the 32-bit addressing, while mingw/gcc does not. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-05-03	x86: sbrdsp: Implement SSE2 qmf_deint_bfly	Christophe Gisquet
	Sandybridge: 47 cycles. Having a loop counter is a 7 cycle gain. Unrolling is another 7 cycle gain. Working in reverse scan is another 6 cycles. Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-04-24	avcodec/x86/sbrdsp_init: disable using the noise code in x86_64 MSVC, Try #2	Michael Niedermayer
	This should fix building with MSVC until someone can change the code so it works with MSVC. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-04-23	avcodec/x86/sbrdsp_init: disable using the noise code in x86_64 MSVC	Michael Niedermayer
	This should fix building with MSVC until someone can change the code so it works with MSVC. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-04-19	x86: sbrdsp: implement SSE2 hf_apply_noise	Christophe Gisquet
	233 to 105 cycles on Arrandale and Win64. Replacing the multiplication by s_m[m] with a pand and a pxor with appropriate vectors is slower. Unrolling is a 15-cycle win. An SSE version was 4 cycles slower. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-04-10	x86: sbrdsp: implement SSE2 qmf_pre_shuffle	Christophe Gisquet
	From 253 to 51 cycles on Arrandale and Win64. 44 cycles on SandyBridge. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-04-08	x86: sbrdsp: implement SSE qmf_deint_bfly	Christophe Gisquet
	From 312 to 89/68 (sse/sse2) cycles on Arrandale and Win64. Sandybridge: 68/47 cycles. Having a loop counter is a 7 cycle gain. Unrolling is another 7 cycle gain. Working in reverse scan is another 6 cycles. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-04-06	x86: sbrdsp: Implement SSE neg_odd_64	Christophe Gisquet
	Timing on Arrandale, in cycles (C / SSE): Win32: 57 / 44; Win64: 47 / 38. Unrolling and not storing mask both save some cycles. Signed-off-by: Diego Biurrun <diego@biurrun.de>
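The entry above does not show the routine itself. As a minimal sketch, assuming (from the name alone) that neg_odd_64 negates the odd-indexed entries of a 64-float buffer, a plain C version could flip the IEEE-754 sign bit directly. The function name `neg_odd_64_sketch` is hypothetical and is not the FFmpeg implementation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: negate x[1], x[3], ..., x[63] by toggling the
 * sign bit of each 32-bit float. Assumes IEEE-754 single precision. */
void neg_odd_64_sketch(float *x)
{
    for (int i = 1; i < 64; i += 2) {
        uint32_t bits;
        memcpy(&bits, &x[i], sizeof bits);   /* read the float's bit pattern */
        bits ^= UINT32_C(1) << 31;           /* flip only the sign bit */
        memcpy(&x[i], &bits, sizeof bits);   /* write the negated value back */
    }
}
```

A SIMD version would typically do the same thing four floats at a time by XORing each vector with a constant sign mask that is set only in the odd lanes.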
2013-02-05	Add av_cold attributes to arch-specific init functions	Diego Biurrun

2013-01-06	x86: sbrdsp: Implement SSE qmf_post_shuffle	Christophe Gisquet
	255 to 174 cycles on Arrandale / Win64. Unrolling yields no gain. Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-06	x86: sbrdsp: Implement SSE sum64x5	Christophe Gisquet
	698 to 174 cycles on Arrandale. Unrolling is a 6 cycles gain. Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-12-07	SBR DSP x86: implement SSE sbr_hf_gen	Christophe Gisquet
	Start and end indices are multiples of 2, therefore guaranteeing aligned access. Also, this allows generating 4 floats per loop, keeping the alignment all along. Timing: 32 bits: 326c -> 172c; 64 bits: 323c -> 156c. Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-09-08	x86: Replace checks for CPU extensions and flags by convenience macros	Diego Biurrun
	This separates code relying on inline from that relying on external assembly and fixes instances where the coalesced check was incorrect.
2012-02-24	SBR DSP x86: implement SSE sbr_hf_g_filt	Christophe GISQUET
	Unrolling the main loop to process, instead of 4 elements: 8 elements: minor gain of 2 cycles (not worth the extra object size); 2 elements: loss of 8 cycles. Assigning STEP to a register is a loss. Output address (Y) is almost always unaligned. Timings: C (32/64 bits): 117/109 cycles; SSE: 57 cycles. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-02-24	SBR DSP x86: implement SSE sbr_sum_square_sse	Christophe GISQUET
	The 32-bit targets have been compiled with -mfpmath=sse for proper reference. sbr_sum_square timings: C/32 bits: 82c (unrolled) / 102c; C/64 bits: 69c (unrolled) / 82c; SSE/32 bits: 42c; SSE/64 bits: 31c. Use of SSE4.1 dpps to perform the final sum is slower. Not unrolling to perform 8 operations in a loop yields 10 more cycles. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
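To illustrate the "unrolled" C reference mentioned in the entry above, here is a minimal sketch, assuming the routine sums the squared real and imaginary parts of n complex float samples. The name `sum_square_sketch`, its signature, and the requirement that n be even are assumptions for illustration, not the FFmpeg code.

```c
/* Hypothetical sketch of an unrolled sum of squares over n complex
 * samples stored as x[i][0] (real) and x[i][1] (imaginary).
 * Assumes n is even so two samples can be handled per iteration. */
float sum_square_sketch(float (*x)[2], int n)
{
    float sum0 = 0.0f, sum1 = 0.0f;

    for (int i = 0; i < n; i += 2) {
        /* Two independent accumulators shorten the dependency chain. */
        sum0 += x[i][0]     * x[i][0]     + x[i][1]     * x[i][1];
        sum1 += x[i + 1][0] * x[i + 1][0] + x[i + 1][1] * x[i + 1][1];
    }
    /* Final reduction of the partial sums. */
    return sum0 + sum1;
}
```

Because floating-point addition is not associative, splitting the sum across accumulators can change the result in the last bits compared with a strictly sequential loop. A SIMD version performs the same kind of reduction with more lanes.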