github.com/mpc-hc/FFmpeg.git

Age	Commit message	Author
2015-10-22	lavc/x86/vc1dsp_init: Fix compilation with --disable-yasm.	Carl Eugen Hoyos

2015-10-22	x86/Makefile: move decoder/encoder objects out of the subsystems section	James Almer
	Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-22	vc1dsp: Port ff_vc1_put_ver_16b_shift2_mmx to yasm	Timothy Gu
	This function is only used within other inline asm functions, hence the HAVE_MMX_INLINE guard. Per recent discussions, we should not worry about the performance of inline asm-only builds.
2015-10-21	huffyuvencdsp: Cherry pick changes left out in the last commit	Timothy Gu
	Oops.
2015-10-21	huffyuvencdsp: Add ff_diff_bytes_{sse2,avx2}	Timothy Gu
	SSE2 version 4%-35% faster than MMX depending on the width. AVX2 version 1%-13% faster than SSE2 depending on the width.
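For orientation, a scalar reference of what a byte-difference routine of this kind computes might look like the sketch below. The name `diff_bytes_ref` and the argument order are assumptions made for illustration; the only detail taken from this log is that the width is an `intptr_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference: dst[i] = src1[i] - src2[i], modulo 256.
 * SIMD versions process 8 (MMX), 16 (SSE2) or 32 (AVX2) bytes per step. */
void diff_bytes_ref(uint8_t *dst, const uint8_t *src1,
                    const uint8_t *src2, intptr_t w)
{
    assert(w >= 0);
    for (intptr_t i = 0; i < w; i++)
        dst[i] = (uint8_t)(src1[i] - src2[i]); /* wraps like a packed byte subtract */
}
```

Because each output byte depends only on the two input bytes at the same index, wider vectors help more as the width grows, which is consistent with the width-dependent speedups quoted above.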
2015-10-21	huffyuvencdsp: Convert ff_diff_bytes_mmx to yasm	Timothy Gu
	Heavily based upon ff_add_bytes by Christophe Gisquet. Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Timothy Gu <timothygu99@gmail.com>
2015-10-20	huffyuvencdsp: Use intptr_t for width	Timothy Gu
	It is done this way in huffyuvdsp as well.
2015-10-20	x86: vc1dsp_mmx: Move yasm initiation steps to vc1dsp_init	Timothy Gu
	That's where all yasm initiation steps are. Also removes the overlap between the two files.
2015-10-20	x86: fpel: Remove erroneous ff_put_pixels8_mmxext prototype	Timothy Gu
	This function does not exist.
2015-10-20	x86: fpel: Move prototypes for 4-px block functions	Timothy Gu

2015-10-14	x86/vp9itxfm: fix register clobbering in ff_vp9_idct_idct_4x4_add_12_sse2	James Almer
	Reviewed-by: Henrik Gramner <henrik@gramner.com> Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-13	x86: simple_idct10_template: use const	Christophe Gisquet
	This avoids going through constants.c while still sharing them with proresdsp.asm. Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13	vp9: use registers for constant loading where possible.	Ronald S. Bultje

2015-10-13	vp9: refactor itx coefficients and share between 8 and 10/12bpp.	Ronald S. Bultje

2015-10-13	vp9: add itxfm_add eob shortcuts to 10/12bpp functions.	Ronald S. Bultje
	These aren't quite as helpful as the ones in 8bpp, since over there, we can use pmulhrsw, but here the coefficients have too many bits to be able to take advantage of pmulhrsw. However, we can still skip cols for which all coefs are 0, and instead just zero the input data for the row itx. This helps a few % on overall decoding speed.
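The column-skipping idea can be sketched in plain C. Everything here is illustrative: `toy_itx_1d` is a made-up Hadamard-style butterfly standing in for a real inverse transform, and the 4x4 size is an assumption. The only property relied on is linearity, so an all-zero column transforms to an all-zero column and the column pass can be skipped for it.

```c
#include <assert.h>
#include <string.h>

/* Toy linear 1-D transform on 4 values; a stand-in, not the VP9 transform. */
static void toy_itx_1d(int v[4])
{
    int a = v[0] + v[1], b = v[0] - v[1];
    int c = v[2] + v[3], d = v[2] - v[3];
    v[0] = a + c; v[1] = b + d; v[2] = a - c; v[3] = b - d;
}

/* Second pass: transform every row of tmp into out. */
static void row_pass(int tmp[4][4], int out[4][4])
{
    for (int r = 0; r < 4; r++) {
        memcpy(out[r], tmp[r], sizeof(out[r]));
        toy_itx_1d(out[r]);
    }
}

/* Full version: transform every column, then every row. */
void itx_full(int in[4][4], int out[4][4])
{
    int tmp[4][4];
    for (int c = 0; c < 4; c++) {
        int col[4] = { in[0][c], in[1][c], in[2][c], in[3][c] };
        toy_itx_1d(col);
        for (int r = 0; r < 4; r++)
            tmp[r][c] = col[r];
    }
    row_pass(tmp, out);
}

/* Shortcut version: all-zero columns are not transformed; zeros are fed
 * straight to the row pass. Returns how many columns were transformed. */
int itx_skip_zero_cols(int in[4][4], int out[4][4])
{
    int tmp[4][4], done = 0;
    for (int c = 0; c < 4; c++) {
        int col[4] = { in[0][c], in[1][c], in[2][c], in[3][c] };
        if (col[0] | col[1] | col[2] | col[3]) {
            toy_itx_1d(col);
            done++;
        }
        for (int r = 0; r < 4; r++)
            tmp[r][c] = col[r]; /* still all zero if the column was skipped */
    }
    row_pass(tmp, out);
    return done;
}

/* Checks that both paths agree and reports the number of transformed columns. */
int itx_check(int in[4][4])
{
    int a[4][4], b[4][4];
    itx_full(in, a);
    int n = itx_skip_zero_cols(in, b);
    assert(memcmp(a, b, sizeof(a)) == 0);
    return n;
}
```

In a real decoder the end-of-block position would presumably tell the function which columns can be non-zero without inspecting them; the explicit zero test above is only there to keep the sketch self-contained.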
2015-10-13	vp9: add 10/12bpp idct_idct_32x32 sse2 SIMD version.	Ronald S. Bultje

2015-10-13	vp9: 10/12bpp sse2 SIMD for iadst16.	Ronald S. Bultje

2015-10-13	vp9: refactor 10/12bpp dc-only code in 4x4/8x8 and add to 16x16.	Ronald S. Bultje

2015-10-13	vp9: add 10/12bpp sse2 SIMD version for idct_idct_16x16.	Ronald S. Bultje

2015-10-13	vp9: add 10/12bpp sse2 SIMD versions of iadst8x8.	Ronald S. Bultje

2015-10-13	vp9: add 10/12bpp sse2 SIMD for idct_idct_8x8.	Ronald S. Bultje

2015-10-13	vp9: add 12bpp sse2 versions of iadst4.	Ronald S. Bultje

2015-10-13	vp9: initial attempt at a idct_idct_4x4 12bpp x86 simd (sse2) impl.	Ronald S. Bultje
	The trouble with this function is that intermediates overflow 31+sign bits, so I've added some helpers (that will also be used in 10/12bpp 8x8, 16x16 and 32x32) to make that easier, basically emulating a half-assed pmaddqd using 2xpmaddwd. It's currently sse2-only; if anyone sees potential in adding ssse3, I'd love to hear it.
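One plausible way to read "emulating pmaddqd using 2xpmaddwd" is to split each 32-bit input into a low part and a high part, multiply each part by the coefficient separately so that every partial product fits in 32 bits, and recombine. The scalar sketch below shows that arithmetic for a single product. The 14-bit split point, the rounding constant and all function names are assumptions for illustration; the commit message does not state them.

```c
#include <assert.h>
#include <stdint.h>

/* floor(v / 2^14) without relying on right-shifting a negative value. */
static int32_t floor_shift14(int64_t v)
{
    return (int32_t)(v >= 0 ? v >> 14 : -((-v + 16383) >> 14));
}

/* Reference: (x * c + 8192) >> 14 computed with a 64-bit intermediate. */
int32_t mul_round14_ref(int32_t x, int16_t c)
{
    return floor_shift14((int64_t)x * c + 8192);
}

/* Split version: x = hi * 2^14 + lo with 0 <= lo < 2^14, so that
 * (x*c + 8192) >> 14 == hi*c + ((lo*c + 8192) >> 14) holds exactly.
 * Both partial products stay well inside 32 bits for the asserted ranges. */
int32_t mul_round14_split(int32_t x, int16_t c)
{
    assert(x >= -(1 << 20) && x < (1 << 20)); /* assumed input range */
    assert(c > -16384 && c < 16384);          /* assumed 14-bit coefficient */
    int32_t lo = x & 0x3fff;                  /* low 14 bits (two's complement) */
    int32_t hi = (x - lo) / 16384;            /* exact: x - lo is a multiple of 2^14 */
    return hi * c + floor_shift14((int64_t)lo * c + 8192);
}
```

The identity is exact because `hi * c * 2^14` is a multiple of 2^14 and therefore passes through the floor shift unchanged; only the low partial product needs rounding.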
2015-10-13	vp9: add x86 simd (sse2/ssse3) for iadst4 10bpp functions.	Ronald S. Bultje

2015-10-13	vp9: add 10bpp simd (mmxext/ssse3) for idct_idct_4x4.	Ronald S. Bultje

2015-10-13	vp9: add 10/12bpp mmxext-optimized iwht_iwht_4x4 function.	Ronald S. Bultje

2015-10-13	x86: dct-test: add more idcts	Christophe Gisquet
	In particular for 10 and 12 bits. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13	x86: simple_idct: 12bits versions	Christophe Gisquet
	On 12 frames of a 444p 12 bits DNxHR sequence, _put function:
	C: 78902 decicycles in idct, 262071 runs, 73 skips
	avx: 32478 decicycles in idct, 262045 runs, 99 skips
	Difference between the 2: stddev: 0.39 PSNR:104.47 MAXDIFF: 2
	This is unavoidable and due to the scale factors used in the x86 version, which cannot match the C ones. In addition, the trick of adding an initial bias to the input of a pass can overflow, as the input coefficients are already 15bits, which is the maximum this function can handle. Overall, however, the omse on 12 bits samples goes from 0.16916 to 0.16883. Reducing rowshift by 1 improves to 0.0908, but causes overflows.
	Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13	x86: simple_idct(_put): 10bits versions	Christophe Gisquet
	Modeled on the prores version. Clips to [0;1023] and is bitexact. Bitexactness requires adding offsets in different places compared to prores or C, and makes the function approximately 2% slower.
	For 16 frames of a DNxHD 4:2:2 10bits test sequence:
	C: 60861 decicycles in idct, 1048205 runs, 371 skips
	sse2: 27567 decicycles in idct, 1048216 runs, 360 skips
	avx: 26272 decicycles in idct, 1048171 runs, 405 skips
	The add version is not implemented, so the corresponding dsp function is set to NULL to make this obvious to any code that tries to call it.
	Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13	x86: simple_idct10_template: fix overflow in pass	Christophe Gisquet
	When the input of a pass has 15 or 16 bits of precision (in particular the column pass), the addition of a bias to W4 may lead to overflows in the input to pmaddwd. This requires postponing the addition of the bias until after the first butterfly. To do so, the fact that m15 is zeroed but otherwise unused is exploited. If the pass is safe, an address can be used directly, and the number of xmm regs can be decreased. Otherwise, the 32bits bias is loaded into it. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-13	x86: prores: templatize 10 bits simple_idct	Christophe Gisquet
	This should be reused for a generic simple_idct10 function. Requires a bit of trickery to declare common constants in C. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-10	x86/takdsp: use arithmetic shift instructions	James Almer
	p1 and p2 are int32_t. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>
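The difference matters for signed operands: a logical right shift fills the top bits with zeros, which turns a negative `int32_t` into a large positive number, while an arithmetic shift replicates the sign bit. A small portable C sketch of the two behaviours follows; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Logical shift: zero-fills from the left, like psrld on a 32-bit lane. */
int32_t shift_logical(int32_t v, int n)
{
    assert(n >= 0 && n < 32);
    return (int32_t)((uint32_t)v >> n);
}

/* Arithmetic shift: sign-fills from the left, like psrad on a 32-bit lane.
 * Written via ~ so it never right-shifts a negative value, which would be
 * implementation-defined in C. */
int32_t shift_arith(int32_t v, int n)
{
    assert(n >= 0 && n < 32);
    return v < 0 ? ~(~v >> n) : v >> n;
}
```

For example, shifting -8 right by one gives -4 with the arithmetic version and 2147483644 with the logical one, while non-negative inputs give the same result either way.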
2015-10-09	avcodec/takdec: add x86 SIMD for rest of decorrelation modes	Paul B Mahol
	Signed-off-by: Paul B Mahol <onemda@gmail.com>
2015-10-07	vp9: don't keep a stack pointer if we don't need it.	Ronald S. Bultje
	This saves one register in a few cases on 32bit builds with unaligned stack (e.g. MSVC), making the code slightly easier to maintain. (Can someone please test this on 32bit+msvc and confirm make fate-vp9 and tests/checkasm/checkasm still work after this patch?)
2015-10-07	x86/alacdsp: add simd optimized functions	James Almer
	Signed-off-by: James Almer <jamrial@gmail.com>
2015-10-05	vp9: fix msvc build by using 6 GPRs on 32bit if stack!=aligned.	Ronald S. Bultje

2015-10-04	blockdsp: reindent after parameter removal	Christophe Gisquet
	Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-03	vp9: 10/12bpp SIMD (sse2/ssse3/avx) for directional intra prediction.	Ronald S. Bultje

2015-10-03	vp9: 16bpp tm/dc/h/v intra pred simd (mostly sse2) functions.	Ronald S. Bultje

2015-10-03	vp9: sse2/ssse3/avx 16bpp loopfilter x86 simd.	Ronald S. Bultje

2015-10-03	avcodec/x86/hpeldsp_rnd_template: silence -Wunused-function on --disable-mmx	Ganesh Ajjanagadde
	This silences some of the -Wunused-function warnings when compiled with --disable-mmx, e.g http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx. Header guards are too brittle and ugly for this case. Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-10-02	blockdsp: remove high bitdepth parameter	Christophe Gisquet
	It is only (mis-)used to set the dsp functions clear_block(s). But these functions always work on 16bits-wide elements, which makes the parameter useless and actually harmful, as it causes all content on more than 8-bits to not use accelerated functions. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
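The argument can be illustrated with a minimal sketch: if the coefficient block is always stored as 16-bit elements, clearing it is the same operation whatever the sample bit depth, so no bit-depth parameter is needed to pick an implementation. The name `clear_block_ref` and the 8x8 (64-element) block size are assumptions for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference: clear one block of 64 int16_t coefficients.
 * The byte count depends only on the element type, not on whether the
 * content is 8, 10 or 12 bits per sample. */
void clear_block_ref(int16_t *block)
{
    assert(block != NULL);
    memset(block, 0, 64 * sizeof(*block));
}
```

An accelerated version would clear the same 128 bytes with wide stores, which is why restricting it to 8-bit content gains nothing.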
2015-09-30	x86/hevc_sao: move 10/12bit functions into a separate file	James Almer
	Tested-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-29	avcodec/x86/rnd_template: silence -Wunused-function on --disable-mmx	Ganesh Ajjanagadde
	This silences some of the -Wunused-function warnings when compiled with --disable-mmx, e.g http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx. Header guards are too brittle and ugly for this case. Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-29	avcodec/x86/sbrdsp: Fix using uninitialized upper 32bit of noise	Michael Niedermayer
	Fixes crash. Fixes: flicker-1.scout3d21443372922.28.m4a Found-by: Dale Curtis <dalecurtis@google.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-24	avcodec/x86/cavsdsp: silence -Wunused-variable on --disable-mmx	Ganesh Ajjanagadde
	This silences -Wunused-variable when compiled with --disable-mmx, e.g http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx. The alternative of header guards will make it far too ugly. Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-23	avcodec/x86/mpegaudiodsp: silence -Wunused-variable on --disable-mmx	Ganesh Ajjanagadde
	This silences -Wunused-variable when compiled with --disable-mmx, e.g http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx. The alternative of header guards will make it far too ugly. Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-23	avcodec/x86/rv40dsp_init: silence -Wunused-variable on --disable-mmx	Ganesh Ajjanagadde
	This silences -Wunused-variable when compiled with --disable-mmx, e.g http://fate.ffmpeg.org/log.cgi?time=20150919094617&log=compile&slot=x86_64-archlinux-gcc-disable-mmx. The alternative of header guards will make it far too ugly. Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-09-21	x86/vp9dsp: fix local header include	James Almer
	Signed-off-by: James Almer <jamrial@gmail.com>
2015-09-21	x86/vp9dsp: add missing header include	James Almer
	Fixes make checkheaders. Signed-off-by: James Almer <jamrial@gmail.com>