github.com/FFmpeg/FFmpeg.git

Age | Commit message | Author
2019-07-19 | avutil/mips: refactor msa load and store macros. | Shiyou Yin
Replace STnxm_UB and LDnxm_SH with new macros ST_{H/W/D}{1/2/4/8}. The old macros are difficult to use because they don't follow the same parameter-passing rules. Changes in detail:
1. remove LD4x4_SH.
2. replace ST2x4_UB with ST_H4.
3. replace ST4x2_UB with ST_W2.
4. replace ST4x4_UB with ST_W4.
5. replace ST4x8_UB with ST_W8.
6. replace ST6x4_UB with ST_W2 and ST_H2.
7. replace ST8x1_UB with ST_D1.
8. replace ST8x2_UB with ST_D2.
9. replace ST8x4_UB with ST_D4.
10. replace ST8x8_UB with ST_D8.
11. replace ST12x4_UB with ST_D4 and ST_W4.
Example of a new macro:
ST_H4(in, idx0, idx1, idx2, idx3, pdst, stride)
ST_H4 stores four half-word elements of vector 'in' (selected by idx0..idx3) to pdst, one element per stride.
About the macro name:
1) 'ST' means store operation.
2) 'H/W/D' means the vector element type is half-word/word/double-word.
3) The number '1/2/4/8' means how many elements are stored.
About the macro parameters:
1) 'in0, in1...': 128-bit vectors.
2) 'idx0, idx1...': element indices.
3) 'pdst': destination pointer to store to.
4) 'stride': stride of each store operation.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
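A portable-C sketch of the store semantics described above, handy for checking an index/stride combination off-target. The `vec128` union and the `st_h4`/`st_w2`/`st_d1` helpers are hypothetical stand-ins, not the MSA macros themselves, which extract elements with vector instructions rather than `memcpy`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 128-bit MSA vector register. */
typedef union {
    uint8_t  b[16];
    uint16_t h[8];
    uint32_t w[4];
    uint64_t d[2];
} vec128;

/* Like ST_H4: store half-words idx0..idx3 to four rows, `stride` bytes apart. */
static void st_h4(const vec128 *in, int idx0, int idx1, int idx2, int idx3,
                  uint8_t *pdst, int stride)
{
    const int idx[4] = { idx0, idx1, idx2, idx3 };
    for (int i = 0; i < 4; i++) {
        assert(idx[i] >= 0 && idx[i] < 8);
        memcpy(pdst + i * stride, &in->h[idx[i]], sizeof(uint16_t));
    }
}

/* Like ST_W2: store words idx0 and idx1 to two rows. */
static void st_w2(const vec128 *in, int idx0, int idx1, uint8_t *pdst, int stride)
{
    const int idx[2] = { idx0, idx1 };
    for (int i = 0; i < 2; i++) {
        assert(idx[i] >= 0 && idx[i] < 4);
        memcpy(pdst + i * stride, &in->w[idx[i]], sizeof(uint32_t));
    }
}

/* Like ST_D1: store a single double-word. */
static void st_d1(const vec128 *in, int idx0, uint8_t *pdst)
{
    assert(idx0 >= 0 && idx0 < 2);
    memcpy(pdst, &in->d[idx0], sizeof(uint64_t));
}
```

In this sketch `st_h4(&v, 0, 1, 2, 3, dst, stride)` plays the role of `ST_H4(in, 0, 1, 2, 3, pdst, stride)`: the element type and count live in the name, and every variant takes its indices, destination and stride in the same order, which is the consistency the commit is after.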
2019-07-10 | avcodec/mips/cabac: replace addi with addiu | YunQiang Su
addi/daddi have been deprecated by MIPS for years, and MIPS r6 removed them. They should be replaced with addiu: ADDIU performs the same arithmetic operation but does not trap on overflow.
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
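The difference between the two instructions, expressed in C: both add a sign-extended 16-bit immediate to a 32-bit register, but only ADDI raises an Integer Overflow exception when the signed result does not fit. A sketch with hypothetical helper names; `addi_would_trap` only reports the condition instead of trapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ADDIU: add a sign-extended 16-bit immediate, wrapping modulo 2^32.
 * Returns the raw 32-bit register contents; never traps. */
static uint32_t addiu(int32_t rs, int16_t imm)
{
    return (uint32_t)rs + (uint32_t)(int32_t)imm;
}

/* ADDI: same sum, but the CPU would raise Integer Overflow when the signed
 * result falls outside the int32 range. Here we only report that case. */
static bool addi_would_trap(int32_t rs, int16_t imm)
{
    const int64_t sum = (int64_t)rs + imm;
    return sum > INT32_MAX || sum < INT32_MIN;
}
```

For address and counter arithmetic like the CABAC code, the wrapped ADDIU result is what the surrounding C expects anyway, so nothing is lost by dropping the trap.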
2019-05-26 | avcodec/mips: [loongson] fix mpeg4 decoding error on loongson platform. | Shiyou Yin
In function ff_dct_unquantize_mpeg2_intra_mmi, addr0 shouldn't be changed before the store operation.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-02-27 | avcodec/mips: [loongson] mmi optimizations for VP9 put and avg functions | gxw
VP9 decoding speed improved by about 60.5% (from 38fps to 61fps, tested on loongson 3A3000).
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
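The percentages quoted in these entries are relative frame-rate gains, (after - before) / before. A small helper for checking them; `speedup_percent` is a hypothetical name.

```c
#include <assert.h>

/* Relative speedup in percent between two throughput measurements (fps). */
static double speedup_percent(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

For example, 38 to 61 fps gives about 60.5%, matching this entry, and 88 to 116 fps gives about 31.8%, the "about 32%" in the theora mmi entry.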
2019-02-16 | avcodec/mips: [loongson] optimize theora decoding with mmi. | gxw
Optimize theora decoding with mmi in functions:
1. ff_vp3_idct_add_mmi
2. ff_vp3_idct_put_mmi
3. ff_vp3_idct_dc_add_mmi
4. ff_put_no_rnd_pixels_l2_mmi
Theora decoding speed improved by about 32% (from 88fps to 116fps, tested on loongson 3A3000).
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-02-02 | avcodec/mips: [loongson] optimize put_hevc_qpel_h_8 with mmi. | Shiyou Yin
Optimize put_hevc_qpel_h_8 with mmi for width=4/8/12/16/24/32/48/64. This optimization improved HEVC decoding performance by 2% (2.39x to 2.44x, tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
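The horizontal pass that put_hevc_qpel_h_8 vectorizes can be written in scalar C. This is a sketch, not FFmpeg's C template: the name `qpel_h_8_ref` and its parameter order are hypothetical. The three 8-tap filters are the luma quarter-sample interpolation filters of H.265, each summing to 64; taps cover `src[x-3]..src[x+4]`. For 8-bit input no right shift is applied, so the `int16_t` output keeps the extra precision that later rounding stages remove.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* H.265 luma interpolation filters for 1/4, 1/2 and 3/4 sample positions. */
static const int8_t qpel_filters[3][8] = {
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 },
};

/* Horizontal 8-tap filter for 8-bit input; mx selects the fractional position. */
static void qpel_h_8_ref(int16_t *dst, ptrdiff_t dststride,
                         const uint8_t *src, ptrdiff_t srcstride,
                         int height, int width, int mx)
{
    assert(mx >= 1 && mx <= 3);
    const int8_t *f = qpel_filters[mx - 1];
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int k = 0; k < 8; k++)
                sum += f[k] * src[x + k - 3];
            dst[x] = (int16_t)sum;   /* bit depth 8: shift of 0 */
        }
        src += srcstride;
        dst += dststride;
    }
}
```

On a linear ramp the half-sample filter returns 64 times the midpoint between two samples, which is a quick sanity check for any vectorized version.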
2019-02-02 | avcodec/mips: [loongson] optimize put_hevc_qpel_bi_h_8 with mmi. | Shiyou Yin
Optimize put_hevc_qpel_bi_h_8 with mmi for width=4/8/12/16/24/32/48/64. This optimization improved HEVC decoding performance by 2.1% (2.34x to 2.39x, tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
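The bi variants combine two intermediate predictions into final 8-bit pixels. A minimal sketch of that rounding step, assuming 14-bit intermediate precision as in H.265, so the shift is 14 + 1 - 8 = 7 and the rounding offset is 64; `bi_average_8` and `clip_u8` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clip_u8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* Round the sum of two 14-bit predictions back to 8-bit pixels. */
static void bi_average_8(uint8_t *dst, const int16_t *pred0,
                         const int16_t *pred1, int n)
{
    const int shift = 14 + 1 - 8;          /* 7 */
    const int offset = 1 << (shift - 1);   /* 64 */
    for (int i = 0; i < n; i++)
        dst[i] = clip_u8((pred0[i] + pred1[i] + offset) >> shift);
}
```

With both predictions at pixel value p scaled by 64, this reduces to the rounded average (p0 + p1 + 1) >> 1, which is why a vector version can pack and clip only once at the end.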
2019-02-02 | avcodec/mips: [loongson] optimize put_hevc_epel_bi_hv_8 with mmi. | Shiyou Yin
Optimize put_hevc_epel_bi_hv_8 with mmi for width=4/8/12/16/24/32. This optimization improved HEVC decoding performance by 1.7% (2.30x to 2.34x, tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-02-02 | avcodec/mips: [loongson] optimize put_hevc_qpel_uni_hv_8 with mmi. | Shiyou Yin
Optimize put_hevc_qpel_uni_hv_8 with mmi for width=4/8/12/16/24/32/48/64. This optimization improved HEVC decoding performance by 2.7% (2.24x to 2.30x, tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-01-22 | avcodec/mips: [loongson] optimize put_hevc_qpel_bi_hv_8 with mmi. | Shiyou Yin
Optimize put_hevc_qpel_bi_hv_8 with mmi for width=4/8/12/16/24/32/48/64. This optimization improved HEVC decoding performance by 11.4% (2.01x to 2.24x, tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-01-22 | avcodec/mips: [loongson] optimize put_hevc_qpel_hv_8 with mmi. | Shiyou Yin
Optimize put_hevc_qpel_hv_8 with mmi for width=4/8/12/16/24/32/48/64. This optimization improved HEVC decoding performance by 11% (1.81x to 2.01x, tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-01-20 | avcodec/mips: [loongson] optimize put_hevc_pel_bi_pixels_8 with mmi. | Shiyou Yin
Optimize put_hevc_pel_bi_pixels_8 with mmi for width=8/16/24/32/48/64. This optimization improved HEVC decoding performance by 2% (1.77x to 1.81x, tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
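Read bottom to top, the seven HEVC entries form a chain from 1.77x to 2.44x. Per-commit gains compound rather than add: summing the quoted percentages gives roughly 33%, while the overall ratio is 2.44 / 1.77, about 1.38 (a 38% gain). A sketch that multiplies the step ratios (`cumulative_speedup` is a hypothetical name); the product telescopes to last / first.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply successive speed ratios x[i] / x[i-1]; equals x[n-1] / x[0]. */
static double cumulative_speedup(const double *x, size_t n)
{
    assert(n >= 2);
    double ratio = 1.0;
    for (size_t i = 1; i < n; i++)
        ratio *= x[i] / x[i - 1];
    return ratio;
}
```

Usage with the figures from these entries: `{1.77, 1.81, 2.01, 2.24, 2.30, 2.34, 2.39, 2.44}` yields about 1.3785.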
2018-12-28 | avcodec/mips: [loongson] optimize theora decoding in vp3dsp. | gxw
Optimize theora decoding with msa in functions:
1. ff_vp3_idct_add_msa
2. ff_vp3_idct_put_msa
3. ff_vp3_idct_dc_add_msa
4. ff_vp3_v_loop_filter_msa
5. ff_vp3_h_loop_filter_msa
6. ff_put_no_rnd_pixels_l2_msa
Theora decoding speed improved by about 36% (from 22fps to 30fps, tested on loongson 2K1000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-12-25 | avcodec/mips: Fix failed case: hevc-conformance-AMP_A_Samsung_* when enable msa | gxw
AV_INPUT_BUFFER_PADDING_SIZE has been increased to 64, but the value was still hard-coded as 32 in function ff_hevc_sao_edge_filter_8_msa. Use AV_INPUT_BUFFER_PADDING_SIZE directly instead. Also use MAX_PB_SIZE directly instead of the literal 64. FATE tests passed.
Reviewed-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-12-18 | avcodec/mips: [loongson] enable MSA optimization for loongson platform. | Shiyou Yin
Run MSA initialization after MMI so that MSA takes effect on the loongson platform (MSA is supported by loongson 2K, 3A4000, etc.).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
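Why the order matters, shown with a hypothetical dispatch table: each init step overwrites the function pointers it optimizes, so on a CPU that reports both MMI and MSA, whichever step runs last wins. The flag values, struct and function names below are illustrative, not FFmpeg's.

```c
#include <assert.h>

typedef const char *(*idct_fn)(void);

static const char *idct_c(void)   { return "c"; }
static const char *idct_mmi(void) { return "mmi"; }
static const char *idct_msa(void) { return "msa"; }

/* Hypothetical CPU feature flags. */
enum { SKETCH_CPU_MMI = 1 << 0, SKETCH_CPU_MSA = 1 << 1 };

typedef struct { idct_fn idct; } SketchDSPContext;

static void sketch_dsp_init(SketchDSPContext *c, int cpu_flags)
{
    c->idct = idct_c;                 /* portable default */
    if (cpu_flags & SKETCH_CPU_MMI)
        c->idct = idct_mmi;
    if (cpu_flags & SKETCH_CPU_MSA)   /* runs last, so MSA overrides MMI */
        c->idct = idct_msa;
}
```

Swapping the two `if` blocks would make MMI silently replace the MSA pointers on chips that have both, which is the situation this commit avoids.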
2018-12-01 | avcodec/mips: [loongson] refine optimization in h264_chroma. | Shiyou Yin
Remove redundant operations in the case where x and y are both 0; this refinement gave about a 2% speedup for H264 decoding on the loongson platform.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
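H.264 chroma motion compensation is a bilinear filter with weights A = (8-x)(8-y), B = x(8-y), C = (8-x)y and D = xy, giving (A*p00 + B*p01 + C*p10 + D*p11 + 32) >> 6. When x = y = 0, A = 64 and the other weights are 0, so the output equals the input and the arithmetic can be skipped. A scalar sketch of that shortcut; the function name is hypothetical, and the commit does not spell out exactly which operations it removed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 8-pixel-wide chroma MC; the caller must provide h + 1 rows and 9 columns. */
static void put_chroma_mc8_sketch(uint8_t *dst, const uint8_t *src,
                                  ptrdiff_t stride, int h, int x, int y)
{
    assert(x >= 0 && x < 8 && y >= 0 && y < 8);
    const int A = (8 - x) * (8 - y), B = x * (8 - y);
    const int C = (8 - x) * y,       D = x * y;

    if (x == 0 && y == 0) {
        /* (64 * p + 32) >> 6 == p: a plain copy is exact. */
        for (int i = 0; i < h; i++)
            memcpy(dst + i * stride, src + i * stride, 8);
        return;
    }
    for (int i = 0; i < h; i++) {
        for (int j = 0; j < 8; j++)
            dst[j] = (uint8_t)((A * src[j] + B * src[j + 1] +
                                C * src[j + stride] + D * src[j + stride + 1] +
                                32) >> 6);
        dst += stride;
        src += stride;
    }
}
```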
2018-09-19 | avcodec: [loongson] optimize get_cabac_inline. | Shiyou Yin
This optimization improved h264 decoding performance by about 4% (from 74fps to 77fps, tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-09-19 | avcodec/mips: [loongson] refine ff_vc1_inv_trans_8x8_mmi. | Shiyou Yin
Combined the 1st and 2nd loops into one inline asm block in function ff_vc1_inv_trans_8x8_mmi to reduce memory operations, and made some small optimizations in ff_vc1_inv_trans_4x8_mmi.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-09-14 | avcodec/mips: [loongson] fix bug of svq3-watermark failed in fate test. | Shiyou Yin
Failed case: svq3-watermark.
When the minimum loop count of the following functions is greater than the parameter h passed to them, svq3-watermark fails:
1. ff_put_pixels4_8_mmi
2. ff_avg_pixels4_8_mmi
3. ff_put_pixels4_l2_8_mmi
4. ff_avg_pixels4_l2_8_mmi
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
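A loop unrolled over several rows has a minimum row count; if h is smaller, the loop touches rows beyond the block. A sketch of a 4-pixel-wide copy that stays correct for any h: it assumes a four-row unroll (the commit does not state the factor) and adds a tail loop. `put_pixels4_sketch` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a 4-pixel-wide block of h rows without touching row h or later. */
static void put_pixels4_sketch(uint8_t *dst, const uint8_t *src,
                               ptrdiff_t stride, int h)
{
    assert(h >= 0);
    int i = 0;
    for (; i + 4 <= h; i += 4) {             /* unrolled main loop */
        memcpy(dst + (i + 0) * stride, src + (i + 0) * stride, 4);
        memcpy(dst + (i + 1) * stride, src + (i + 1) * stride, 4);
        memcpy(dst + (i + 2) * stride, src + (i + 2) * stride, 4);
        memcpy(dst + (i + 3) * stride, src + (i + 3) * stride, 4);
    }
    for (; i < h; i++)                       /* tail: h not a multiple of 4 */
        memcpy(dst + i * stride, src + i * stride, 4);
}
```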
2018-09-09 | avutil/mips: [loongson] simplify macro TRANSPOSE_4H and TRANSPOSE_8B | Shiyou Yin
Simplify macro TRANSPOSE_4H in mmiutils.h and add TRANSPOSE_8B as a common macro.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
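TRANSPOSE_4H transposes a 4x4 block of 16-bit elements held in four 64-bit registers. The MMI macro works on whole registers with interleave instructions; this scalar sketch (hypothetical name `transpose_4h_ref`) shows only the resulting element movement, which is useful as a reference when testing the macro.

```c
#include <assert.h>
#include <stdint.h>

/* In-place 4x4 transpose: element (r, c) moves to (c, r). */
static void transpose_4h_ref(int16_t m[4][4])
{
    for (int r = 0; r < 4; r++)
        for (int c = r + 1; c < 4; c++) {
            int16_t t = m[r][c];
            m[r][c] = m[c][r];
            m[c][r] = t;
        }
}
```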
2018-09-09 | avcodec/mips: [loongson] optimize vp8 decoding in vp8dsp. | gxw
Optimize the vp8 loop filter with mmi; four functions optimized:
1. ff_vp8_h_loop_filter8uv_mmi.
2. ff_vp8_v_loop_filter8uv_mmi.
3. ff_vp8_h_loop_filter16_mmi.
4. ff_vp8_v_loop_filter16_mmi.
VP8 decoding speed improved by about 50% (from 73fps to 110fps, tested on loongson 3A3000).
Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-09-07 | avcodec/mips: [loongson] fix improper use of register constraints. | Shiyou Yin
Constraint "g" means the compiler may place the operand in a register, in memory, or use an immediate. When a variable with constraint "g" is used by an instruction that only accepts register operands, this may lead to an "invalid operands" error.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
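A minimal GCC-style illustration: MIPS ADDU only takes register operands, so every operand uses "r". With "g" the compiler could substitute a memory or immediate operand, which the assembler rejects. The function name is hypothetical, and the non-MIPS branch is a plain C fallback so the sketch builds anywhere.

```c
#include <assert.h>
#include <stdint.h>

static int32_t add_regs(int32_t a, int32_t b)
{
#if defined(__mips__) && defined(__GNUC__)
    int32_t r;
    /* All operands forced into registers, as ADDU requires. */
    __asm__ ("addu %0, %1, %2" : "=r"(r) : "r"(a), "r"(b));
    return r;
#else
    /* Portable fallback: wrapping 32-bit add, same result as ADDU. */
    return (int32_t)((uint32_t)a + (uint32_t)b);
#endif
}
```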
2018-09-05 | avcodec/mips: [loongson] reoptimize put and add pixels clamped functions. | Shiyou Yin
Simplify the usage of intermediate variable addr and remove unused variable all64 in the following functions:
1. ff_put_pixels_clamped_mmi
2. ff_put_signed_pixels_clamped_mmi
3. ff_add_pixels_clamped_mmi
This optimization speeds up mpeg4 decoding by about 2% on the loongson platform (tested on 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
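For reference, what the three functions compute on an 8x8 block of 16-bit coefficients, in scalar C. These are sketches with hypothetical `_ref` names, not FFmpeg's C versions; the signed variant is assumed to bias each value by 128 before clamping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint8_t clip_uint8(int v) { return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v; }

/* Store an 8x8 block (row-major) to pixels, saturating to [0, 255]. */
static void put_pixels_clamped_ref(const int16_t *block, uint8_t *pixels,
                                   ptrdiff_t line_size)
{
    for (int i = 0; i < 8; i++, block += 8, pixels += line_size)
        for (int j = 0; j < 8; j++)
            pixels[j] = clip_uint8(block[j]);
}

/* Same, for signed samples centred on 0: add 128, then saturate. */
static void put_signed_pixels_clamped_ref(const int16_t *block, uint8_t *pixels,
                                          ptrdiff_t line_size)
{
    for (int i = 0; i < 8; i++, block += 8, pixels += line_size)
        for (int j = 0; j < 8; j++)
            pixels[j] = clip_uint8(block[j] + 128);
}

/* Add a residual block to the existing pixels, saturating. */
static void add_pixels_clamped_ref(const int16_t *block, uint8_t *pixels,
                                   ptrdiff_t line_size)
{
    for (int i = 0; i < 8; i++, block += 8, pixels += line_size)
        for (int j = 0; j < 8; j++)
            pixels[j] = clip_uint8(pixels[j] + block[j]);
}
```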
2018-09-04 | avcodec/mips: [loongson] simplify the usage of intermediate variable addr. | Shiyou Yin
Simplify the usage of intermediate variable addr in the following functions:
1. ff_put_pixels4_8_mmi
2. ff_put_pixels8_8_mmi
3. ff_put_pixels16_8_mmi
4. ff_avg_pixels16_8_mmi.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-09-04 | avcodec: [loongson] fix bug of mss2-wmv failed in fate test. | Shiyou Yin
Failed case: mss2-wmv.
In the following functions, pmullh was used to multiply two 16-bit values, which causes data overflow:
1. ff_vc1_inv_trans_8x8_dc_mmi
2. ff_vc1_inv_trans_8x8_mmi
3. ff_vc1_inv_trans_8x4_mmi
4. ff_vc1_inv_trans_4x8_mmi
5. ff_vc1_inv_trans_4x4_mmi
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
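PMULLH multiplies 16-bit lanes and keeps only the low 16 bits of each 32-bit product, so any product outside the int16 range wraps. A one-lane emulation; `pmullh_lane` is a hypothetical name and the operands in the usage note are arbitrary, not values taken from the VC-1 code.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of PMULLH: full 32-bit product, then keep the low 16 bits. */
static int16_t pmullh_lane(int16_t a, int16_t b)
{
    uint16_t low = (uint16_t)((int32_t)a * b);   /* modular truncation */
    return low >= 0x8000 ? (int16_t)(low - 0x10000) : (int16_t)low;
}
```

For example, 3000 * 17 = 51000 does not fit in int16 and comes back as -14536. Keeping the full product, for example by widening to 32-bit lanes before multiplying, avoids the wrap; the commit does not say which approach it used.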
2018-09-02 | avcodec/mips: [loongson] optimize memset in h264dsp. | Shiyou Yin
Optimized memset with mmi in the following functions:
1. ff_h264_add_pixels4_8_mmi.
2. ff_h264_idct_add_8_mmi.
3. ff_h264_idct8_add_8_mmi.
This optimization improved h264 decoding performance by about 1.3% (tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-09-02 | avcodec/mips: [loongson] reoptimize h264_chroma_mc8_mmi v2. | Shiyou Yin
Reoptimize functions ff_put_h264_chroma_mc8_mmi and ff_avg_h264_chroma_mc8_mmi. Performance of h264 decoding improved by about 5% (from 69fps to 73fps, tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-09-02 | avcodec/mips: [loongson] reoptimize simple idct with mmi. | Shiyou Yin
Performance of mpeg4 decoding improved by about 23% (from 128fps to 158fps, tested on loongson 3A3000). Reoptimized the following functions with mmi:
1. ff_simple_idct_put_8_mmi
2. ff_simple_idct_add_8_mmi
3. ff_simple_idct_8_mmi
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-07-14 | avcodec/mips: fix conflicting types error of ff_vc1_h_s_overlap_mmi. | Shiyou Yin
In commit 975a1a8, function ff_vc1_h_s_overlap_mmi was refactored, but its declaration in libavcodec/mips/vc1dsp_mips.h was left unchanged.
Change-Id: I90beae683511622a0cc1130ab1660ac8669ec3ef
Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Reviewed-by: Jerome Borsboom <jerome.borsboom@carpalis.nl>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2018-06-29 | avcodec/vc1: fix overlap filter for frame interlaced pictures | Jerome Borsboom
The overlap filter is not correct for vertical edges in frame-interlaced I and P pictures. When filtering macroblocks with different FIELDTX values, we have to match the lines on both sides of the vertical border. In addition, we have to use the correct rounding values, depending on the line being filtered.
Signed-off-by: Jerome Borsboom <jerome.borsboom@carpalis.nl>
2017-11-14 | avcodec/mips: Improve hevc non-uni hz and vt mc msa functions | Kaustubh Raste
Use mask buffer.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
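The "mask buffer" in these MSA commits feeds byte-shuffle instructions that gather overlapping pixel pairs so a horizontal filter can be evaluated with pairwise dot products. A simplified scalar sketch: MSA's VSHF.B draws from two source registers, whereas this emulation uses one, and the mask contents and helper names are illustrative rather than copied from FFmpeg's tables.

```c
#include <assert.h>
#include <stdint.h>

/* Shared read-only mask: byte pairs (i, i+1) for eight adjacent positions. */
static const uint8_t pair_mask[16] = { 0, 1, 1, 2, 2, 3, 3, 4,
                                       4, 5, 5, 6, 6, 7, 7, 8 };

/* Single-source byte shuffle: out[i] = src[mask[i]]. */
static void shuffle_bytes(uint8_t out[16], const uint8_t src[16],
                          const uint8_t mask[16])
{
    for (int i = 0; i < 16; i++) {
        assert(mask[i] < 16);
        out[i] = src[mask[i]];
    }
}

/* Two taps applied to each gathered pair: one slice of a longer filter. */
static void dot_pairs(int16_t out[8], const uint8_t pairs[16], int8_t t0, int8_t t1)
{
    for (int i = 0; i < 8; i++)
        out[i] = (int16_t)(t0 * pairs[2 * i] + t1 * pairs[2 * i + 1]);
}
```

Keeping the mask in one shared, aligned buffer means each function loads it instead of rebuilding it, which is the saving these commits describe.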
2017-11-14 | avcodec/mips: cleanup unused macros | Kaustubh Raste
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-08 | avcodec/mips: Improve hevc non-uni hv mc msa functions | Kaustubh Raste
Use mask buffer.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-08 | avcodec/mips: Improve hevc uni weighted 4 tap vt mc msa functions | Kaustubh Raste
Use the global mask buffer for the appropriate mask load.
Use immediate unsigned saturation for clip-to-max, saving one vector register.
Remove unused macro.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
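MSA's unsigned saturate instruction takes its width as an immediate m and saturates to m + 1 bits, so clipping to [0, 255] can be done as a max with zero followed by a saturate with m = 7, instead of a min against a vector register holding 255. A lane-wise scalar emulation of that two-step clip; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clip eight signed 16-bit lanes to [0, 255]: max with 0, then unsigned
 * saturation to m + 1 = 8 bits. Only the immediate m is needed. */
static void clip_0_255_h(int16_t v[8])
{
    const unsigned m = 7;
    const uint16_t max = (uint16_t)((1u << (m + 1)) - 1);   /* 255 */
    for (int i = 0; i < 8; i++) {
        uint16_t u = (uint16_t)(v[i] < 0 ? 0 : v[i]);  /* max with zero */
        v[i] = (int16_t)(u > max ? max : u);          /* unsigned saturate */
    }
}
```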
2017-11-08 | avcodec/mips: Improve hevc uni 4 tap hv mc msa functions | Kaustubh Raste
Use the global mask buffer for the appropriate mask load.
Remove unused macro and table.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-08 | avcodec/mips: Improve hevc bi wgt 4 tap hv mc msa functions | Kaustubh Raste
Use the global mask buffer for the appropriate mask load.
Use immediate unsigned saturation for clip-to-max, saving one vector register.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-07 | avcodec/mips: Improve hevc bi 4 tap hv mc msa functions | Kaustubh Raste
Use the global mask buffer for the appropriate mask load.
Use immediate unsigned saturation for clip-to-max, saving one vector register.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-07 | avcodec/mips: Improve avc avg mc 10, 30, 01 and 03 msa functions | Kaustubh Raste
Align the mask buffer to 64 bytes.
Load the specific destination bytes instead of an MSA load and pack.
Remove unused macros and functions.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-05 | avcodec/mips: Improve hevc uni weighted 4 tap hz mc msa functions | Kaustubh Raste
Use the global mask buffer for the appropriate mask load.
Use immediate unsigned saturation for clip-to-max, saving one vector register.
Remove unused macro.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-05 | avcodec/mips: Improve hevc uni 4 tap hz and vt mc msa functions | Kaustubh Raste
Use the global mask buffer for the appropriate mask load.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-04 | avcodec/mips: Improve hevc bi wgt 4 tap hz and vt mc msa functions | Kaustubh Raste
Use the global mask buffer for the appropriate mask load.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-04 | avcodec/mips: Improve hevc bi 4 tap hz and vt mc msa functions | Kaustubh Raste
Use the global mask buffer for the appropriate mask load.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-04 | avcodec/mips: Improve avc avg mc 20, 21 and 23 msa functions | Kaustubh Raste
Load the specific destination bytes instead of an MSA load and pack.
Remove unused macros and functions.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
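The avg functions blend the new prediction into what is already in dst using a rounding average, (dst + pred + 1) >> 1. The sketch below reads and writes only the `width` destination bytes of each row, which is the idea behind loading the specific destination bytes rather than a full vector that must then be packed. `avg_block` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounding average of pred into dst; bytes beyond `width` are never touched. */
static void avg_block(uint8_t *dst, ptrdiff_t dst_stride,
                      const uint8_t *pred, ptrdiff_t pred_stride,
                      int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (uint8_t)((dst[x] + pred[x] + 1) >> 1);
        dst += dst_stride;
        pred += pred_stride;
    }
}
```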
2017-11-03 | avcodec/mips: Improve hevc uni weighted hv mc msa functions | Kaustubh Raste
Use immediate unsigned saturation for clip-to-max, saving one vector register.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-03 | avcodec/mips: Improve avc avg mc 02, 12 and 32 msa functions | Kaustubh Raste
Remove loops and unroll, as block sizes are known.
Load the specific destination bytes instead of an MSA load and pack.
Remove unused macro and functions.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-01 | avcodec/mips: Improve hevc uni vt and hv mc msa functions | Kaustubh Raste
Remove unused macro.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-01 | avcodec/mips: Improve hevc bi hz and hv mc msa functions | Kaustubh Raste
Align the mask buffer.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-11-01 | avcodec/mips: Improve hevc bi weighted copy, hz and vt mc msa functions | Kaustubh Raste
Pack the data to half-words before clipping.
Use immediate unsigned saturation for clip-to-max, saving one vector register.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-10-30 | avcodec/mips: Improve avc chroma avg hv mc msa functions | Kaustubh Raste
Replace the generic function with block-size-specific functions.
Load the specific destination bytes instead of an MSA load and pack.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-10-30 | avcodec/mips: Improve avc avg mc 22, 11, 31, 13 and 33 msa functions | Kaustubh Raste
Remove loops and unroll, as block sizes are known.
Load the specific destination bytes instead of an MSA load and pack.
Remove unused macro and functions.
Signed-off-by: Kaustubh Raste <kaustubh.raste@imgtec.com>
Reviewed-by: Manojkumar Bhosale <Manojkumar.Bhosale@imgtec.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>