Welcome to mirror list, hosted at ThFree Co, Russian Federation.

github.com/mpc-hc/FFmpeg.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-02-01Merge commit 'd639dcdae022130078c9c84b7b691c5e9694786c'Clément Bœsch
* commit 'd639dcdae022130078c9c84b7b691c5e9694786c': ratecontrol: Move Xvid-related functions to the place they are actually used Merged-by: Clément Bœsch <u@pkh.me>
2017-02-01pgssubdec: reset rle_data_len/rle_remaining_len on allocation errorAndreas Cadhalpun
The code relies on their validity and otherwise can try to access a NULL object->rle pointer, causing segmentation faults. Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
2017-02-01Revert "Merge commit '0a39c9ac0bfd7345fe676b4e2707d9cec3cbb553'"Michael Niedermayer
The assumption this is based on is wrong, the code is not always run with bitexact flags This reverts commit a956164e1eb3418922cae949f02ad4035f013213, reversing changes made to f6005907fdeb9e4de37568ed5c1a8e7b869126f6. Approved-by: James Almer <jamrial@gmail.com>
2017-02-01avcodec/mjpegdec: Check for for the bitstream end in ↵Michael Niedermayer
mjpeg_decode_scan_progressive_ac() Fixes timeout Fixes: 496/clusterfuzz-testcase-5805083497332736 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-31Merge commit 'b4bb9593834460bbbe0e70823f2c503cb01ad052'James Almer
* commit 'b4bb9593834460bbbe0e70823f2c503cb01ad052': ratecontrol: Drop commented out cruft Merged-by: James Almer <jamrial@gmail.com>
2017-01-31Merge commit 'd06dfaa5cbdd20acfd2364b16c0f4ae4ddb30a65'James Almer
* commit 'd06dfaa5cbdd20acfd2364b16c0f4ae4ddb30a65': x86: huffyuv: Use EXTERNAL_SSSE3_FAST convenience macro where appropriate Merged-by: James Almer <jamrial@gmail.com>
2017-01-31Merge commit '4efab89332ea39a77145e8b15562b981d9dbde68'James Almer
* commit '4efab89332ea39a77145e8b15562b981d9dbde68': x86: Use *_FAST/*_SLOW CPU feature detection macros where appropriate Merged-by: James Almer <jamrial@gmail.com>
2017-01-31Merge commit '0a39c9ac0bfd7345fe676b4e2707d9cec3cbb553'James Almer
* commit '0a39c9ac0bfd7345fe676b4e2707d9cec3cbb553': x86: hpeldsp: Don't check for bitexact flag when initializing VP3-specific code Merged-by: James Almer <jamrial@gmail.com>
2017-01-31Merge commit '95c1df929b92d81454656c222a35ec5f7db576b4'James Almer
* commit '95c1df929b92d81454656c222a35ec5f7db576b4': x86: hpeldsp: Drop unused function parameters Merged-by: James Almer <jamrial@gmail.com>
2017-01-31Merge commit 'c3e83ad3b7d75f3597f47ada2616ba4479665009'James Almer
* commit 'c3e83ad3b7d75f3597f47ada2616ba4479665009': x86: hpeldsp: Use EXTERNAL_SSE2_FAST where appropriate Merged-by: James Almer <jamrial@gmail.com>
2017-01-31Merge commit '1dfc3cf89d0eb026af28be46294b85d79499ffb5'James Almer
* commit '1dfc3cf89d0eb026af28be46294b85d79499ffb5': x86: hpeldsp: Split off VP3-specific bits into a separate file Merged-by: James Almer <jamrial@gmail.com>
2017-01-31Merge commit '76f7e70aa04fc5dbef5242b11cbf8fe4499f61d4'Clément Bœsch
* commit '76f7e70aa04fc5dbef5242b11cbf8fe4499f61d4': h264dec: handle zero-sized NAL units in get_last_needed_nal() See 641dccc2aa5e0bf6b3c06998f9a7f24a5cf725e7 Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-01-31lavc/hevc: remove a few random spaces to reduce diff with libavClément Bœsch
2017-01-31Merge commit 'fca3c3b61952aacc45e9ca54d86a762946c21942'Clément Bœsch
* commit 'fca3c3b61952aacc45e9ca54d86a762946c21942': hevc: Add AVX2 DC IDCT Mostly noop as we already have that code. In the ASM, code is merged with the exception of SECTION which is kept uppercase for consistency with the rest of the codebase. Still in the ASM, the prototype comment is fixed to honor the '_' added from the original commit. idct_dc_proto() is dropped as it's not used anymore here. Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-01-31Merge commit 'cc16da75c2f99d92f7a6461100f041352deb6d88'Clément Bœsch
* commit 'cc16da75c2f99d92f7a6461100f041352deb6d88': hevc: Add coefficient limiting to speed up IDCT Noop again as we have these changes already, only random spacing changes. Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-01-31Merge commit 'a92fd8a06256e71a0be87b03751ec3c2a4a8aa21'Clément Bœsch
* commit 'a92fd8a06256e71a0be87b03751ec3c2a4a8aa21': hevc: Add DC IDCT Noop, only spacing adjusted. Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-01-31Merge commit '4f247de3b797cdc9d243d26534412f81c306e5b5'Clément Bœsch
* commit '4f247de3b797cdc9d243d26534412f81c306e5b5': hevcdsp_template: Templatize IDCT This commit is a noop as we already have that code from a previous commits (see 92cccb7bcd79845020ed8abebf35170c182443b2). Spacing is adjusted to reduce the diff. Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-01-31Merge commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d'Clément Bœsch
* commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d': hevc: Separate adding residual to prediction from IDCT This commit should be a noop but isn't because of the following renames: - transform_add → add_residual - transform_skip → dequant - idct_4x4_luma → transform_4x4_luma Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-01-31lavc/alac: Export samplerate.Carl Eugen Hoyos
Fixes ticket #6096.
2017-01-30lavc/hevcdsp: fix pretty printing mistakeClément Bœsch
"Issue" introduced in 83976e40e89655162e5394cf8915d9b6d89702d9.
2017-01-29lavc/mjpegdec: consume SOS data even if the frame is discardedMatthieu Bouron
Speeds up next marker search when a SOS marker is found but the frame is discarded (which happens in avformat_find_stream_info).
2017-01-27avcodec/h264dec: Clear ref_count on slice header processing failureMichael Niedermayer
Fixes using freed memory Introduced in 744801989099df26e90b00062c645969c5347533 Fixes: 471/fuzz-1-ffmpeg_VIDEO_AV_CODEC_ID_H264_fuzzer Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-25avcodecc/ccaption_dec: remove extra word from long codec descriptionPaul B Mahol
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-01-25avcodec/utils: correct align value for interplayMichael Niedermayer
Fixes out of array access Fixes: 452/fuzz-1-ffmpeg_VIDEO_AV_CODEC_ID_INTERPLAY_VIDEO_fuzzer Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-25lavc/svq3: Fail for media key encryption.Carl Eugen Hoyos
Tested-by: ami_stuff Fixes a part of ticket #6094.
2017-01-25avcodec/vp56: Check for the bitstream end, pass error codes onMichael Niedermayer
Fixes timeout Fixes: 446/fuzz-3-ffmpeg_VIDEO_AV_CODEC_ID_VP6_fuzzer Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-24aarch64: Add NEON optimizations for 10 and 12 bit vp9 loop filterMartin Storsjö
This work is sponsored by, and copyright, Google. This is similar to the arm version, but due to the larger registers on aarch64, we can do 8 pixels at a time for all filter sizes. Examples of runtimes vs the 32 bit version, on a Cortex A53: ARM AArch64 vp9_loop_filter_h_4_8_10bpp_neon: 213.2 172.6 vp9_loop_filter_h_8_8_10bpp_neon: 281.2 244.2 vp9_loop_filter_h_16_8_10bpp_neon: 657.0 444.5 vp9_loop_filter_h_16_16_10bpp_neon: 1280.4 877.7 vp9_loop_filter_mix2_h_44_16_10bpp_neon: 397.7 358.0 vp9_loop_filter_mix2_h_48_16_10bpp_neon: 465.7 429.0 vp9_loop_filter_mix2_h_84_16_10bpp_neon: 465.7 428.0 vp9_loop_filter_mix2_h_88_16_10bpp_neon: 533.7 499.0 vp9_loop_filter_mix2_v_44_16_10bpp_neon: 271.5 244.0 vp9_loop_filter_mix2_v_48_16_10bpp_neon: 330.0 305.0 vp9_loop_filter_mix2_v_84_16_10bpp_neon: 329.0 306.0 vp9_loop_filter_mix2_v_88_16_10bpp_neon: 386.0 365.0 vp9_loop_filter_v_4_8_10bpp_neon: 150.0 115.2 vp9_loop_filter_v_8_8_10bpp_neon: 209.0 175.5 vp9_loop_filter_v_16_8_10bpp_neon: 492.7 345.2 vp9_loop_filter_v_16_16_10bpp_neon: 951.0 682.7 This is significantly faster than the ARM version in almost all cases except for the mix2 functions. Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 2-3x. Signed-off-by: Martin Storsjö <martin@martin.st>
2017-01-24aarch64: Add NEON optimizations for 10 and 12 bit vp9 itxfmMartin Storsjö
This work is sponsored by, and copyright, Google. Compared to the arm version, on aarch64 we can keep the full 8x8 transform in registers, and for 16x16 and 32x32, we can process it in slices of 4 pixels instead of 2. Examples of runtimes vs the 32 bit version, on a Cortex A53: ARM AArch64 vp9_inv_adst_adst_4x4_sub4_add_10_neon: 111.0 109.7 vp9_inv_adst_adst_8x8_sub8_add_10_neon: 914.0 733.5 vp9_inv_adst_adst_16x16_sub16_add_10_neon: 5184.0 3745.7 vp9_inv_dct_dct_4x4_sub1_add_10_neon: 65.0 65.7 vp9_inv_dct_dct_4x4_sub4_add_10_neon: 100.0 96.7 vp9_inv_dct_dct_8x8_sub1_add_10_neon: 111.0 119.7 vp9_inv_dct_dct_8x8_sub8_add_10_neon: 618.0 494.7 vp9_inv_dct_dct_16x16_sub1_add_10_neon: 295.1 284.6 vp9_inv_dct_dct_16x16_sub2_add_10_neon: 2303.2 1883.9 vp9_inv_dct_dct_16x16_sub8_add_10_neon: 2984.8 2189.3 vp9_inv_dct_dct_16x16_sub16_add_10_neon: 3890.0 2799.4 vp9_inv_dct_dct_32x32_sub1_add_10_neon: 1044.4 1012.7 vp9_inv_dct_dct_32x32_sub2_add_10_neon: 13333.7 9695.1 vp9_inv_dct_dct_32x32_sub16_add_10_neon: 18531.3 12459.8 vp9_inv_dct_dct_32x32_sub32_add_10_neon: 24470.7 16160.2 vp9_inv_wht_wht_4x4_sub4_add_10_neon: 83.0 79.7 The larger transforms are significantly faster than the corresponding ARM versions. The speedup vs C code is smaller than in 32 bit mode, probably because the 64 bit intermediates in the C code can be expressed more efficiently in aarch64. Signed-off-by: Martin Storsjö <martin@martin.st>
2017-01-24aarch64: Add NEON optimizations for 10 and 12 bit vp9 MCMartin Storsjö
This work is sponsored by, and copyright, Google. This has mostly got the same differences to the 8 bit version as in the arm version. For the horizontal filters, we do 16 pixels in parallel as well. For the 8 pixel wide vertical filters, we can accumulate 4 rows before storing, just as in the 8 bit version. Examples of runtimes vs the 32 bit version, on a Cortex A53: ARM AArch64 vp9_avg4_10bpp_neon: 35.7 30.7 vp9_avg8_10bpp_neon: 93.5 84.7 vp9_avg16_10bpp_neon: 324.4 296.6 vp9_avg32_10bpp_neon: 1236.5 1148.2 vp9_avg64_10bpp_neon: 4639.6 4571.1 vp9_avg_8tap_smooth_4h_10bpp_neon: 130.0 128.0 vp9_avg_8tap_smooth_4hv_10bpp_neon: 440.0 440.5 vp9_avg_8tap_smooth_4v_10bpp_neon: 114.0 105.5 vp9_avg_8tap_smooth_8h_10bpp_neon: 327.0 314.0 vp9_avg_8tap_smooth_8hv_10bpp_neon: 918.7 865.4 vp9_avg_8tap_smooth_8v_10bpp_neon: 330.0 300.2 vp9_avg_8tap_smooth_16h_10bpp_neon: 1187.5 1155.5 vp9_avg_8tap_smooth_16hv_10bpp_neon: 2663.1 2591.0 vp9_avg_8tap_smooth_16v_10bpp_neon: 1107.4 1078.3 vp9_avg_8tap_smooth_64h_10bpp_neon: 17754.6 17454.7 vp9_avg_8tap_smooth_64hv_10bpp_neon: 33285.2 33001.5 vp9_avg_8tap_smooth_64v_10bpp_neon: 16066.9 16048.6 vp9_put4_10bpp_neon: 25.5 21.7 vp9_put8_10bpp_neon: 56.0 52.0 vp9_put16_10bpp_neon/armv8: 183.0 163.1 vp9_put32_10bpp_neon/armv8: 678.6 563.1 vp9_put64_10bpp_neon/armv8: 2679.9 2195.8 vp9_put_8tap_smooth_4h_10bpp_neon: 120.0 118.0 vp9_put_8tap_smooth_4hv_10bpp_neon: 435.2 435.0 vp9_put_8tap_smooth_4v_10bpp_neon: 107.0 98.2 vp9_put_8tap_smooth_8h_10bpp_neon: 303.0 290.0 vp9_put_8tap_smooth_8hv_10bpp_neon: 893.7 828.7 vp9_put_8tap_smooth_8v_10bpp_neon: 305.5 263.5 vp9_put_8tap_smooth_16h_10bpp_neon: 1089.1 1059.2 vp9_put_8tap_smooth_16hv_10bpp_neon: 2578.8 2452.4 vp9_put_8tap_smooth_16v_10bpp_neon: 1009.5 933.5 vp9_put_8tap_smooth_64h_10bpp_neon: 16223.4 15918.6 vp9_put_8tap_smooth_64hv_10bpp_neon: 32153.0 31016.2 vp9_put_8tap_smooth_64v_10bpp_neon: 14516.5 13748.1 These are generally about as fast as the corresponding ARM routines on the same CPU (at least on the A53), in most cases marginally faster. The speedup vs C code is around 4-9x. Signed-off-by: Martin Storsjö <martin@martin.st>
2017-01-24aarch64: vp9dsp: Restructure the bpp checksMartin Storsjö
This work is sponsored by, and copyright, Google. This is more in line with how it will be extended for more bitdepths. Signed-off-by: Martin Storsjö <martin@martin.st>
2017-01-24arm: Add NEON optimizations for 10 and 12 bit vp9 loop filterMartin Storsjö
This work is sponsored by, and copyright, Google. This is pretty much similar to the 8 bpp version, but in some senses simpler. All input pixels are 16 bits, and all intermediates also fit in 16 bits, so there's no lengthening/narrowing in the filter at all. For the full 16 pixel wide filter, we can only process 4 pixels at a time (using an implementation very much similar to the one for 8 bpp), but we can do 8 pixels at a time for the 4 and 8 pixel wide filters with a different implementation of the core filter. Examples of relative speedup compared to the C version, from checkasm: Cortex A7 A8 A9 A53 vp9_loop_filter_h_4_8_10bpp_neon: 1.83 2.16 1.40 2.09 vp9_loop_filter_h_8_8_10bpp_neon: 1.39 1.67 1.24 1.70 vp9_loop_filter_h_16_8_10bpp_neon: 1.56 1.47 1.10 1.81 vp9_loop_filter_h_16_16_10bpp_neon: 1.94 1.69 1.33 2.24 vp9_loop_filter_mix2_h_44_16_10bpp_neon: 2.01 2.27 1.67 2.39 vp9_loop_filter_mix2_h_48_16_10bpp_neon: 1.84 2.06 1.45 2.19 vp9_loop_filter_mix2_h_84_16_10bpp_neon: 1.89 2.20 1.47 2.29 vp9_loop_filter_mix2_h_88_16_10bpp_neon: 1.69 2.12 1.47 2.08 vp9_loop_filter_mix2_v_44_16_10bpp_neon: 3.16 3.98 2.50 4.05 vp9_loop_filter_mix2_v_48_16_10bpp_neon: 2.84 3.64 2.25 3.77 vp9_loop_filter_mix2_v_84_16_10bpp_neon: 2.65 3.45 2.16 3.54 vp9_loop_filter_mix2_v_88_16_10bpp_neon: 2.55 3.30 2.16 3.55 vp9_loop_filter_v_4_8_10bpp_neon: 2.85 3.97 2.24 3.68 vp9_loop_filter_v_8_8_10bpp_neon: 2.27 3.19 1.96 3.08 vp9_loop_filter_v_16_8_10bpp_neon: 3.42 2.74 2.26 4.40 vp9_loop_filter_v_16_16_10bpp_neon: 2.86 2.44 1.93 3.88 The speedup vs C code measured in checkasm is around 1.1-4x. These numbers are quite inconclusive though, since the checkasm test runs multiple filterings on top of each other, so later rounds might end up with different codepaths (different decisions on which filter to apply, based on input pixel differences). Based on START_TIMER/STOP_TIMER wrapping around a few individual functions, the speedup vs C code is around 2-4x. Signed-off-by: Martin Storsjö <martin@martin.st>
2017-01-24arm: Add NEON optimizations for 10 and 12 bit vp9 itxfmMartin Storsjö
This work is sponsored by, and copyright, Google. This is structured similarly to the 8 bit version. In the 8 bit version, the coefficients are 16 bits, and intermediates are 32 bits. Here, the coefficients are 32 bit. For the 4x4 transforms for 10 bit content, the intermediates also fit in 32 bits, but for all other transforms (4x4 for 12 bit content, and 8x8 and larger for both 10 and 12 bit) the intermediates are 64 bit. For the existing 8 bit case, the 8x8 transform fit all coefficients in registers; for 10/12 bit, when the coefficients are 32 bit, the 8x8 transform also has to be done in slices of 4 pixels (just as 16x16 and 32x32 for 8 bit). The slice width also shrinks from 4 elements to 2 elements in parallel for the 16x16 and 32x32 cases. The 16 bit coefficients from idct_coeffs and similar tables also need to be lenghtened to 32 bit in order to be used in multiplication with vectors with 32 bit elements. This leads to the fixed coefficient vectors needing more space, leading to more cases where they have to be reloaded within the transform (in iadst16). This technically would need testing in checkasm for subpartitions in increments of 2, but that slows down normal checkasm runs excessively. Examples of relative speedup compared to the C version, from checkasm: Cortex A7 A8 A9 A53 vp9_inv_adst_adst_4x4_sub4_add_10_neon: 4.83 11.36 5.22 6.77 vp9_inv_adst_adst_8x8_sub8_add_10_neon: 4.12 7.60 4.06 4.84 vp9_inv_adst_adst_16x16_sub16_add_10_neon: 3.93 8.16 4.52 5.35 vp9_inv_dct_dct_4x4_sub1_add_10_neon: 1.36 2.57 1.41 1.61 vp9_inv_dct_dct_4x4_sub4_add_10_neon: 4.24 8.66 5.06 5.81 vp9_inv_dct_dct_8x8_sub1_add_10_neon: 2.63 4.18 1.68 2.87 vp9_inv_dct_dct_8x8_sub4_add_10_neon: 4.52 9.47 4.24 5.39 vp9_inv_dct_dct_8x8_sub8_add_10_neon: 3.45 7.34 3.45 4.30 vp9_inv_dct_dct_16x16_sub1_add_10_neon: 3.56 6.21 2.47 4.32 vp9_inv_dct_dct_16x16_sub2_add_10_neon: 5.68 12.73 5.28 7.07 vp9_inv_dct_dct_16x16_sub8_add_10_neon: 4.42 9.28 4.24 5.45 vp9_inv_dct_dct_16x16_sub16_add_10_neon: 3.41 7.29 3.35 4.19 vp9_inv_dct_dct_32x32_sub1_add_10_neon: 4.52 8.35 3.83 6.40 vp9_inv_dct_dct_32x32_sub2_add_10_neon: 5.86 13.19 6.14 7.04 vp9_inv_dct_dct_32x32_sub16_add_10_neon: 4.29 8.11 4.59 5.06 vp9_inv_dct_dct_32x32_sub32_add_10_neon: 3.31 5.70 3.56 3.84 vp9_inv_wht_wht_4x4_sub4_add_10_neon: 1.89 2.80 1.82 1.97 The speedup compared to the C functions is around 1.3 to 7x for the full transforms, even higher for the smaller subpartitions. Signed-off-by: Martin Storsjö <martin@martin.st>
2017-01-24arm: Add NEON optimizations for 10 and 12 bit vp9 MCMartin Storsjö
This work is sponsored by, and copyright, Google. The plain pixel put/copy functions are used from the 8 bit version, for the double size (e.g. put16 uses ff_vp9_copy32_neon), and a new copy128 is added. Compared with the 8 bit version, the filters can no longer use the trick to accumulate in 16 bit with only saturation at the end, but now the accumulators need to be 32 bit. This avoids the need to keep track of which filter index is the largest though, reducing the size of the executable code for these filters. For the horizontal filters, we only do 4 or 8 pixels wide in parallel (while doing two rows at a time), since we don't have enough register space to filter 16 pixels wide. For the vertical filters, we still do 4 and 8 pixels in parallel just as in the 8 bit case, but we need to store the output after every 2 rows instead of after every 4 rows. Examples of relative speedup compared to the C version, from checkasm: Cortex A7 A8 A9 A53 vp9_avg4_10bpp_neon: 2.25 2.44 3.05 2.16 vp9_avg8_10bpp_neon: 3.66 8.48 3.86 3.50 vp9_avg16_10bpp_neon: 3.39 8.26 3.37 2.72 vp9_avg32_10bpp_neon: 4.03 10.20 4.07 3.42 vp9_avg64_10bpp_neon: 4.15 10.01 4.13 3.70 vp9_avg_8tap_smooth_4h_10bpp_neon: 3.38 6.22 3.41 4.75 vp9_avg_8tap_smooth_4hv_10bpp_neon: 3.89 6.39 4.30 5.32 vp9_avg_8tap_smooth_4v_10bpp_neon: 5.32 9.73 6.34 7.31 vp9_avg_8tap_smooth_8h_10bpp_neon: 4.45 9.40 4.68 6.87 vp9_avg_8tap_smooth_8hv_10bpp_neon: 4.64 8.91 5.44 6.47 vp9_avg_8tap_smooth_8v_10bpp_neon: 6.44 13.42 8.68 8.79 vp9_avg_8tap_smooth_64h_10bpp_neon: 4.66 9.02 4.84 7.71 vp9_avg_8tap_smooth_64hv_10bpp_neon: 4.61 9.14 4.92 7.10 vp9_avg_8tap_smooth_64v_10bpp_neon: 6.90 14.13 9.57 10.41 vp9_put4_10bpp_neon: 1.33 1.46 2.09 1.33 vp9_put8_10bpp_neon: 1.57 3.42 1.83 1.84 vp9_put16_10bpp_neon: 1.55 4.78 2.17 1.89 vp9_put32_10bpp_neon: 2.06 5.35 2.14 2.30 vp9_put64_10bpp_neon: 3.00 2.41 1.95 1.66 vp9_put_8tap_smooth_4h_10bpp_neon: 3.19 5.81 3.31 4.63 vp9_put_8tap_smooth_4hv_10bpp_neon: 3.86 6.22 4.32 5.21 vp9_put_8tap_smooth_4v_10bpp_neon: 5.40 9.77 6.08 7.21 vp9_put_8tap_smooth_8h_10bpp_neon: 4.22 8.41 4.46 6.63 vp9_put_8tap_smooth_8hv_10bpp_neon: 4.56 8.51 5.39 6.25 vp9_put_8tap_smooth_8v_10bpp_neon: 6.60 12.43 8.17 8.89 vp9_put_8tap_smooth_64h_10bpp_neon: 4.41 8.59 4.54 7.49 vp9_put_8tap_smooth_64hv_10bpp_neon: 4.43 8.58 5.34 6.63 vp9_put_8tap_smooth_64v_10bpp_neon: 7.26 13.92 9.27 10.92 For the larger 8tap filters, the speedup vs C code is around 4-14x. Signed-off-by: Martin Storsjö <martin@martin.st>
2017-01-24arm: vp9dsp: Restructure the bpp checksMartin Storsjö
This work is sponsored by, and copyright, Google. This is more in line with how it will be extended for more bitdepths. Signed-off-by: Martin Storsjö <martin@martin.st>
2017-01-24avcodec/mjpegdec: Check remaining bitstream in ljpeg_decode_yuv_scan()Michael Niedermayer
Fixes timeout Fixes: 445/fuzz-3-ffmpeg_VIDEO_AV_CODEC_ID_MJPEG_fuzzer Fixes: 456/fuzz-2-ffmpeg_VIDEO_AV_CODEC_ID_JPEGLS_fuzzer Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-24Merge commit '38efff92f1ef81f3de20ff0460ec7b70c253d714'Clément Bœsch
* commit '38efff92f1ef81f3de20ff0460ec7b70c253d714': FATE: add a test for H.264 with two fields per packet h264: fix decoding multiple fields per packet with slice threads This merge includes two commits because the FATE test was useful in order to make proper testing. The merge gets rid of the now unused: - SLICE_SINGLETHREAD and SLICE_SKIPED macros - max_contexts - "again" label in decode_nal_units() This commit also includes the fix from d3e4d406b. Thanks to wm4 and Michael Niedermayer for their testing. Merged-by: Clément Bœsch <u@pkh.me> Merged-by: Matthieu Bouron <matthieu.bouron@gmail.com>
2017-01-24avcodec/h264dec: Fix regression with "make fate-h264-attachment-631 THREADS=8"Michael Niedermayer
This treats the case of no slices like no frames which it basically is. The field is added to the context as other nal related fields are also there and passing the has_slices field per *arguments is ugly and not consistent Found-by: ubitux Approved-by: ubitux Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-23avcodec/cuvid: fail early if GPU can't handle video resolutionPavel Koshevoy
CUVID on GeForce GT 730 and GeForce GTX 1060 does not report any error when decoding 8K h264 packets. However, it does return an error during cuvidCreateDecoder call if the indicated video resolution is not supported. Given that stream resolution is typically known as a result of probing it is better to use this information during avcodec_open2 call to fail immediately, rather than proceeding to decode and never receiving any frames from the decoder nor receiving any indication of decode failure. Signed-off-by: Timo Rothenpieler <timo@rothenpieler.org>
2017-01-23avcodec/pngdec: Fix off by 1 size in decode_zbuf()Michael Niedermayer
Fixes out of array access Fixes: 444/fuzz-2-ffmpeg_VIDEO_AV_CODEC_ID_PNG_fuzzer Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-22avcodec/error_resilience: update indention after last commitMichael Niedermayer
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-22avcodec/error_resilience: Optimize motion recovery code by using blcok listsMichael Niedermayer
This makes the code 7 times faster with the testcase from libfuzzer and should reduce the amount of timeouts we hit in automated fuzzing. (for example 438/fuzz-2-ffmpeg_VIDEO_AV_CODEC_ID_RV40_fuzzer) The code is also faster with more realistic input though the difference is small here as that is far from the worst cases the fuzzers pick out Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/targets/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-22avcodec/ac3dec: add consistent noise generation option.Jonathan Campbell
use av_lfg_init_from_data() to seed AC-3 dithering from the AC-3 frame data to make it consistent given the same AC-3 frame, if option is set. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-22vaapi_mpeg4: Restore changes overwritten by mergeMark Thompson
From 2aa8e33d7d86fbc4a4060c363a5733067c160654.
2017-01-21avcodec/fraps: add support for PAL8Paul B Mahol
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-01-21avcodec: Add FF_CODEC_CAP_SKIP_FRAME_FILL_PARAM to most h263 based codecsMichael Niedermayer
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-01-20lavc/h264dec: re-indent after previous commitMatthieu Bouron
2017-01-20lavc/h264dec: make sure a slice is decoded before finishing setupMatthieu Bouron
Fixes regression in fate-h264-attachment-631 with THREADS=8 introduced by bdbbb8f11edbf10add874508c5125c174d8939be.
2017-01-20avcodec/wmaprodec: add xma_flush for seeking in XMA2Paul B Mahol
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-01-20avcodec: add XMA2 parserPaul B Mahol
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-01-20avcodec/wmaprodec: unbreak XMA mono decodingPaul B Mahol
Signed-off-by: Paul B Mahol <onemda@gmail.com>