Age | Commit message (Collapse) | Author |
|
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
Signed-off-by: Paul B Mahol <onemda@gmail.com>
|
|
* qatar/master:
Consistently use "cpu_flags" as variable/parameter name for CPU flags
Conflicts:
libavcodec/x86/dsputil_init.c
libavcodec/x86/h264dsp_init.c
libavcodec/x86/hpeldsp_init.c
libavcodec/x86/motion_est.c
libavcodec/x86/mpegvideo.c
libavcodec/x86/proresdsp_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
|
|
Always use the special filter for the first and last 3 columns (only).
Changes made in 64ed397 slowed the filter to just under 3/4 of what it
was. This commit restores the speed while maintaining identical output.
For reference, on my Athlon64:
1733222 decicycles in old
2358563 decicycles in new
1727558 decicycles in this
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
* commit '6e9f8d6a7d7392a236df19fef6f4eba41f18167e':
x86: vf_yadif: Remove stray dsputil_mmx #include
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
* commit '093804a93cc5da3f95f98265a5df116912443cec':
avfilter: Add av_cold attributes to init/uninit functions
Conflicts:
libavfilter/af_ashowinfo.c
libavfilter/af_volume.c
libavfilter/src_movie.c
libavfilter/vf_lut.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
* qatar/master:
x86: Move some conditional code around to avoid unused variable warnings
Conflicts:
libavcodec/x86/dsputil_mmx.c
libavfilter/x86/vf_yadif_init.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
There is no noticable benefit for such precision.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
Current dithering only uses the first 4 instead of the whole 8 random values.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
Current code divides before increasing precision.
Also reduce upper bound for strength from 255 to 64. This will prevent
an overflow in the SSSE3 and MMX filter_line code: delta is expressed as
an u16 being shifted by 2 to the left. If it overflows, having a
strength not above 64 will make sure that m is set to 0 (making the
m*m*delta >> 14 expression void).
A value above 64 should not make any sense unless gradfun is used as
a blur filter.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
The filter already checks that width (and height) are greater than 3.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
These smaller samples do not need to be unpacked to double words
allowing the code to process more pixels every iteration (still 2 in MMX
but 6 in SSE2). It also avoids emulating the missing double word
instructions on older instruction sets.
Like with the previous code for 16-bit samples this has been tested on
an Athlon64 and a Core2Quad.
Athlon64:
1809275 decicycles in C, 32718 runs, 50 skips
911675 decicycles in mmx, 32727 runs, 41 skips, 2.0x faster
495284 decicycles in sse2, 32747 runs, 21 skips, 3.7x faster
Core2Quad:
921363 decicycles in C, 32756 runs, 12 skips
486537 decicycles in mmx, 32764 runs, 4 skips, 1.9x faster
293296 decicycles in sse2, 32759 runs, 9 skips, 3.1x faster
284910 decicycles in ssse3, 32759 runs, 9 skips, 3.2x faster
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
This is a fairly dumb copy of the assembly for 8-bit samples but it
works and produces identical output to the C version. The options have
been tested on an Athlon64 and a Core2Quad.
Athlon64:
1810385 decicycles in C, 32726 runs, 42 skips
1080744 decicycles in mmx, 32744 runs, 24 skips, 1.7x faster
818315 decicycles in sse2, 32735 runs, 33 skips, 2.2x faster
Core2Quad:
924025 decicycles in C, 32750 runs, 18 skips
623995 decicycles in mmx, 32767 runs, 1 skips, 1.5x faster
406223 decicycles in sse2, 32764 runs, 4 skips, 2.3x faster
387842 decicycles in ssse3, 32767 runs, 1 skips, 2.4x faster
307726 decicycles in sse4, 32763 runs, 5 skips, 3.0x faster
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Always use the special filter for the first and last 3 columns (only).
Changes made in 64ed397 slowed the filter to just under 3/4 of what it
was. This commit restores the speed while maintaining identical output.
For reference, on my Athlon64:
1733222 decicycles in old
2358563 decicycles in new
1727558 decicycles in this
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
CC:libav-stable@libav.org
Signed-off-by: Anton Khirnov <anton@khirnov.net>
|
|
* commit '64ed397635ef2666b0ca0c8d8c60a8bc44581d82':
vf_yadif: fix out-of line reads
Conflicts:
libavfilter/vf_yadif.c
tests/ref/fate/filter-yadif-mode0
tests/ref/fate/filter-yadif-mode1
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Some changes in the border pixels, visually indistinguishable.
|
|
* commit '238614de679a71970c20d7c3fee08a322967ec40':
cdgraphics: do not rely on get_buffer() initializing the frame.
svq1: replace struct svq1_frame_size with an array.
vf_yadif: silence a warning.
Conflicts:
libavcodec/svq1dec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
clang says:
libavfilter/vf_yadif.c:192:28: warning: incompatible pointer types assigning to
'void (*)(uint8_t *, uint8_t *, uint8_t *, uint8_t *, int, int, int, int, int)'
from 'void (uint16_t *, uint16_t *, uint16_t *, uint16_t *, int, int, int, int, int)'
|
|
* qatar/master:
avfilter: x86: consistent filenames for filter optimizations
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
Reference: 7a1944b907179212e7073b821fdc60e27d536e4a
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
|
|
* qatar/master:
vf_hqdn3d: x86: Add proper arch optimization initialization
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
* commit 'a1c525f7eb0783d31ba7a653865b6cbd3dc880de':
pcx: return meaningful error codes.
tmv: return meaningful error codes.
msrle: return meaningful error codes.
cscd: return meaningful error codes.
yadif: x86: fix build for compilers without aligned stack
lavc: introduce the convenience function init_get_bits8
lavc: check for overflow in init_get_bits
Conflicts:
libavcodec/cscd.c
libavcodec/pcx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Manually load registers to avoid using 8 registers on x86_32 with
compilers that do not align the stack (e.g. MSVC).
Signed-off-by: Diego Biurrun <diego@biurrun.de>
|
|
* commit 'f7bf72a4a1146a7583577c9bdc066767e1ba3c6a':
idcinvideo: correctly set AVFrame defaults
yadif: Port inline assembly to yasm
au: remove unnecessary casts
au: return AVERROR codes instead of -1
Conflicts:
libavcodec/idcinvideo.c
libavfilter/x86/yadif_template.c
libavformat/au.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
|
|
There is no noticable benefit for such precision.
|
|
Current dithering only use the first 4w instead of the whole 8 random values.
|
|
Current code divide before increasing precision.
|
|
|
|
* commit 'b519298a1578e0c895d53d4b4ed8867b1c031a56':
pixdesc: fix yuva 10bit bit depth
avconv: deprecate the -vol option
x86: af_volume: add SSE2/SSSE3/AVX-optimized s32 volume scaling
x86: af_volume: add SSE2-optimized s16 volume scaling
Conflicts:
ffmpeg.c
tests/ref/lavfi/pixdesc
tests/ref/lavfi/pixfmts_copy
tests/ref/lavfi/pixfmts_null
tests/ref/lavfi/pixfmts_scale
tests/ref/lavfi/pixfmts_vflip
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
|
|
* commit 'fa8fcab1e0d31074c0644c4ac5194474c6c26415':
x86: h264_chromamc_10bit: drop pointless PAVG %define
x86: mmx2 ---> mmxext in function names
swscale: do not forget to swap data in formats with different endianness
Conflicts:
libavcodec/x86/dsputil_mmx.c
libavfilter/x86/gradfun.c
libswscale/input.c
libswscale/utils.c
libswscale/x86/swscale.c
tests/ref/lavfi/pixfmts_scale
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|
|
* commit '04581c8c77ce779e4e70684ac45302972766be0f':
x86: yasm: Use complete source path for macro helper %includes
Conflicts:
Makefile
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
* commit '6860b4081d046558c44b1b42f22022ea341a2a73':
x86: include x86inc.asm in x86util.asm
cng: Reindent some incorrectly indented lines
cngdec: Allow flushing the decoder
cngdec: Make the dbov variable have the right unit
cngdec: Fix the memset size to cover the full array
cngdec: Update the LPC coefficients after averaging the reflection coefficients
configure: fix print_config() with broke awks
Conflicts:
libavcodec/x86/ac3dsp.asm
libavcodec/x86/dct32.asm
libavcodec/x86/deinterlace.asm
libavcodec/x86/dsputil.asm
libavcodec/x86/dsputilenc.asm
libavcodec/x86/fft.asm
libavcodec/x86/fmtconvert.asm
libavcodec/x86/h264_chromamc.asm
libavcodec/x86/h264_deblock.asm
libavcodec/x86/h264_deblock_10bit.asm
libavcodec/x86/h264_idct.asm
libavcodec/x86/h264_idct_10bit.asm
libavcodec/x86/h264_intrapred.asm
libavcodec/x86/h264_intrapred_10bit.asm
libavcodec/x86/h264_weight.asm
libavcodec/x86/vc1dsp.asm
libavcodec/x86/vp3dsp.asm
libavcodec/x86/vp56dsp.asm
libavcodec/x86/vp8dsp.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
This is more consistent with the way we handle C #includes and
it simplifies the build system.
|
|
This is necessary to allow refactoring some x86util macros with cpuflags.
|
|
* commit 'f6c38c5f4ed6683a6a61db2ed418a68bbe5f5507':
avfilter: call x86 init functions under if (ARCH_X86), not if (HAVE_MMX)
rtspdec: Set the default port for listen mode, if none is specified
tscc2: Fix an out of array access
rtmpproto: Fix an out of array write
rtspdec: Fix use of uninitialized byte
vp8: reset loopfilter delta values at keyframes.
avutil: add yuva422p and yuva444p formats
Conflicts:
libavutil/pixdesc.c
libavutil/pixfmt.h
tests/ref/lavfi/pixdesc
tests/ref/lavfi/pixfmts_copy
tests/ref/lavfi/pixfmts_null
tests/ref/lavfi/pixfmts_scale
tests/ref/lavfi/pixfmts_vflip
Merged-by: Michael Niedermayer <michaelni@gmx.at>
|
|
|