github.com/videolan/dav1d.git
Age | Commit message | Author
2022-07-06 | Eliminate unused C DSP functions at compile time | Henrik Gramner
When compiling with asm enabled there's no point in compiling C versions of DSP functions that have asm implementations using instruction sets that the compiler can unconditionally use. E.g. when compiling with -mssse3 we can remove the C version of all functions with SSSE3 implementations. This is accomplished using the compiler's dead code elimination functionality. Can be configured using the new 'trim_dsp' meson option, which by default is enabled when compiling in release mode.
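The mechanism described above can be sketched as follows. This is a minimal illustration, not dav1d's actual code: every name here (`CPU_FLAG_SSSE3`, `compile_time_cpu_flags`, `filter_c`, `filter_ssse3`, `dsp_init`) is hypothetical, and the "asm" function is a plain C stand-in. The only real identifier is `__SSSE3__`, the macro GCC and clang predefine when SSSE3 is unconditionally available (for example with -mssse3).

```c
/* Hypothetical sketch of compile-time DSP trimming; not dav1d's API. */
enum { CPU_FLAG_SSSE3 = 1 << 0 };

typedef int (*filter_fn)(int);

/* C fallback. It is static, so once nothing references it the compiler
 * is free to drop it from the binary. */
static int filter_c(int x) { return 2 * x; }

/* Stand-in for an SSSE3 asm implementation (plain C in this sketch). */
static int filter_ssse3(int x) { return 2 * x; }

/* Flags the compiler guarantees; this folds to a constant. */
static inline unsigned compile_time_cpu_flags(void) {
#if defined(__SSSE3__)
    return CPU_FLAG_SSSE3;
#else
    return 0;
#endif
}

/* Would be filled in by runtime CPU detection in a real program. */
static unsigned runtime_cpu_flags;

static inline unsigned get_cpu_flags(void) {
    return runtime_cpu_flags | compile_time_cpu_flags();
}

void dsp_init(filter_fn *fn) {
    /* With -mssse3 this condition is a compile-time false, so the only
     * reference to filter_c disappears and dead code elimination can
     * remove the function. */
    if (!(compile_time_cpu_flags() & CPU_FLAG_SSSE3))
        *fn = filter_c;

    /* With -mssse3 this condition is a compile-time true. */
    if (get_cpu_flags() & CPU_FLAG_SSSE3)
        *fn = filter_ssse3;
}
```

Whichever way the file is compiled, `dsp_init` leaves a valid function pointer behind; only the set of functions that survive in the binary changes.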
2021-10-29 | Remove lpf_stride parameter from LR filters | Victorien Le Couviour--Tuffet
2021-09-08 | Simplify sgr_x_by_x calculations | Henrik Gramner
2021-05-04 | x86: Add high bitdepth (10-bit) sgr AVX2 asm | Henrik Gramner
2021-02-11 | Add minor SGR optimizations | Henrik Gramner
Split the 5x5, 3x3, and mix cases into separate functions. Shrink some tables. Move some scalar calculations out of the DSP function. Make Wiener and SGR share the same function prototype to eliminate a branch in lr_stripe().
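Sharing one prototype lets the caller dispatch through a function pointer instead of branching on the filter type. The sketch below assumes a parameter union and a small dispatch table; the type names, the union layout, the stub filters, and `lr_stripe_sketch` are all illustrative and are not taken from the dav1d sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter block: one union serves both filter families. */
typedef union LrParams {
    int16_t wiener[2][8];                  /* horizontal + vertical taps */
    struct { uint32_t s0, s1; int16_t w0, w1; } sgr;
} LrParams;

/* Single prototype shared by every loop restoration filter. */
typedef void (*lr_fn)(uint8_t *dst, size_t n, const LrParams *params);

enum LrType { LR_WIENER, LR_SGR_5X5, LR_SGR_3X3, LR_SGR_MIX, LR_N_TYPES };

/* Stub filters that only write a marker value, to keep the sketch short. */
static void wiener_stub(uint8_t *dst, size_t n, const LrParams *p) {
    for (size_t i = 0; i < n; i++) dst[i] = (uint8_t)p->wiener[0][0];
}
static void sgr_stub(uint8_t *dst, size_t n, const LrParams *p) {
    for (size_t i = 0; i < n; i++) dst[i] = (uint8_t)p->sgr.w0;
}

static const lr_fn lr_table[LR_N_TYPES] = {
    [LR_WIENER]  = wiener_stub,
    [LR_SGR_5X5] = sgr_stub,
    [LR_SGR_3X3] = sgr_stub,
    [LR_SGR_MIX] = sgr_stub,
};

/* No "if (wiener) ... else ..." in the per-stripe path: the table lookup
 * replaces the branch because all entries have the same signature. */
void lr_stripe_sketch(uint8_t *dst, size_t n, enum LrType type,
                      const LrParams *params) {
    lr_table[type](dst, n, params);
}
```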
2020-12-13 | x86: Rewrite wiener SSE2/SSSE3/AVX2 asm | Henrik Gramner
The previous implementation did two separate passes in the horizontal and vertical directions, with the intermediate values being stored in a buffer on the stack. This caused bad cache thrashing. By interleaving the horizontal and vertical passes in combination with a ring buffer for storing only a few rows at a time the performance is improved by a significant amount. Also split the function into 7-tap and 5-tap versions. The latter is faster and fairly common (always for chroma, sometimes for luma).
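The interleaving idea can be illustrated in plain C. The sketch below is not the Wiener filter from the commit: it substitutes a 3x3 box blur with edge replication so that the ring buffer needs only three rows, and `MAX_W`, `hfilter_row`, and `box3x3_ring` are hypothetical names. It assumes `1 <= w <= MAX_W` and `h >= 1`.

```c
#include <stdint.h>

#define MAX_W     64   /* assumed maximum row width for this sketch */
#define RING_ROWS 3    /* one row per vertical tap */

/* Horizontal pass for one row: 3-tap sum with edge replication. */
static void hfilter_row(int *dst, const uint8_t *src, int w) {
    for (int x = 0; x < w; x++) {
        const int l = src[x > 0 ? x - 1 : 0];
        const int r = src[x < w - 1 ? x + 1 : w - 1];
        dst[x] = l + src[x] + r;
    }
}

/* 3x3 box blur. Instead of filtering the whole image horizontally into a
 * large temporary buffer, only the three rows the vertical pass currently
 * needs are kept, indexed by row number modulo RING_ROWS. */
void box3x3_ring(uint8_t *dst, const uint8_t *src, int w, int h) {
    int ring[RING_ROWS][MAX_W];

    hfilter_row(ring[0], src, w);                       /* prime row 0 */
    for (int y = 0; y < h; y++) {
        /* Produce the next horizontal row just before it is consumed.
         * It overwrites the slot of row y - 2, which is no longer used. */
        if (y + 1 < h)
            hfilter_row(ring[(y + 1) % RING_ROWS], src + (y + 1) * w, w);

        const int *top = ring[(y > 0 ? y - 1 : 0) % RING_ROWS];
        const int *mid = ring[y % RING_ROWS];
        const int *bot = ring[(y + 1 < h ? y + 1 : y) % RING_ROWS];

        /* Vertical pass for output row y, with rounding. */
        for (int x = 0; x < w; x++)
            dst[y * w + x] = (uint8_t)((top[x] + mid[x] + bot[x] + 4) / 9);
    }
}
```

The intermediate storage here is `RING_ROWS * MAX_W` values regardless of image height, which is the property that keeps the working set small.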
2020-12-13 | Add miscellaneous minor wiener optimizations | Henrik Gramner
Combine horizontal and vertical filter pointers into a single parameter when calling the wiener DSP function. Eliminate the +128 filter coefficient handling where possible.
2020-11-17 | Combine boxsum and boxsumsqr in SGR C code | Luc Trudeau
Makes the C code more closely resemble the asm.
2020-11-16 | use less memory in SGR C code | Luc Trudeau
2020-02-11 | looprestoration: Add a bpc parameter to the init func | Martin Storsjö
This allows using completely different codepaths for 10 and 12 bpc, or just adding SIMD functions for either of them.
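A rough sketch of what a bit-depth-aware init function makes possible, assuming a context struct of function pointers. The struct, the function names, and the choice to provide a SIMD version only for 10 bpc are all hypothetical; the stubs return an identifying number purely so the selection is observable.

```c
/* Hypothetical DSP context with a single filter entry. */
typedef int (*lr_filter_fn)(void);
typedef struct LrDSPContext { lr_filter_fn wiener; } LrDSPContext;

static int wiener_c(void)          { return 0;  }  /* generic C path */
static int wiener_simd_10bpc(void) { return 10; }  /* stand-in for SIMD */

/* Because bpc is passed in, the init function can install a SIMD version
 * for one bit depth only and leave the other on the C fallback. */
void lr_dsp_init_sketch(LrDSPContext *c, int bpc) {
    c->wiener = wiener_c;
    if (bpc == 10)
        c->wiener = wiener_simd_10bpc;
}
```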
2019-10-09 | Add VSX wiener filter implementation | Michail Alvanos
2019-08-03 | Improve wiener filter C implementation using loop interchange | Michail Alvanos
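Loop interchange in general means swapping the nesting order of two loops without changing the result. The commit does not say which loops were swapped, so the following is only a generic illustration on a 7-tap FIR filter; `fir_pixel_outer` and `fir_tap_outer` are hypothetical names, and `src` is assumed to hold at least `w + TAPS - 1` samples.

```c
#include <stdint.h>

#define TAPS 7

/* Pixel-outer order: for each output, walk over all taps. */
void fir_pixel_outer(int32_t *dst, const int16_t *src,
                     const int16_t *coef, int w) {
    for (int x = 0; x < w; x++) {
        int32_t sum = 0;
        for (int k = 0; k < TAPS; k++)
            sum += src[x + k] * coef[k];
        dst[x] = sum;
    }
}

/* Interchanged order: for each tap, sweep the whole row. The inner loop
 * is now a long multiply-accumulate over contiguous memory with a
 * loop-invariant coefficient, which compilers tend to vectorize well. */
void fir_tap_outer(int32_t *dst, const int16_t *src,
                   const int16_t *coef, int w) {
    for (int x = 0; x < w; x++)
        dst[x] = 0;
    for (int k = 0; k < TAPS; k++) {
        const int32_t c = coef[k];
        for (int x = 0; x < w; x++)
            dst[x] += src[x + k] * c;
    }
}
```

Both functions compute the same sums; only the memory access pattern differs.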
2019-05-10 | Add __attribute__((cold)) to rarely used functions | Henrik Gramner
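For illustration, the attribute is typically wrapped in a macro so that compilers without it still build. The `COLD` macro guard and the two functions below are a hypothetical sketch rather than a copy of dav1d's headers; `__attribute__((cold))` itself is a documented GCC and clang function attribute that hints the function is unlikely to be executed, so it can be optimized for size and placed away from hot code.

```c
/* Fall back to nothing on compilers that lack the attribute. */
#if defined(__GNUC__) || defined(__clang__)
#define COLD __attribute__((cold))
#else
#define COLD
#endif

static int squares[4];

/* One-time setup: runs rarely, so it is marked cold. */
COLD void init_tables_sketch(void) {
    for (int i = 0; i < 4; i++)
        squares[i] = i * i;
}

/* Hot-path lookup, left unmarked. */
int lookup_sketch(int i) {
    return squares[i];
}
```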
2019-02-08 | looprestoration: fix macro undef in C code | Victorien Le Couviour--Tuffet
2019-01-19 | Add SGR optimizations | Henrik Gramner
2018-12-06 | 12 bits/component support | Ronald S. Bultje
2018-11-26 | arm64: looprestoration: NEON optimized wiener filter | Martin Storsjö
The relative speedup compared to C code is around 4.2 for a Cortex A53 and 5.1 for a Snapdragon 835 (compared to GCC's autovectorized code), 6-7x compared to GCC's output without autovectorization, and ~8x compared to clang's output (which doesn't seem to try to vectorize this function).
2018-11-12 | Re-add imax(v, 0) in SGR calculation | Ronald S. Bultje
Apparently v can go negative for bitdepth > 8. I haven't seen it happen for bitdepth == 8. Fixes #161.
2018-11-12 | Fix type mismatch (int32_t vs. int) | Ronald S. Bultje
2018-11-05 | Simplify SGR C code | Ronald S. Bultje
- remove unused entry from tables.h;
- use non-sized types for scalar values;
- reduce size of intermediate tables from int32 to int16.
2018-10-31 | Remove dav1d_sgr_one_by_x | Luc Trudeau
Since n equals either 25 or 9, the dav1d_sgr_one_by_x table can be replaced with a ternary operation.
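A minimal sketch of that replacement, under the assumption that the table held reciprocals of n in 12-bit fixed point, i.e. round(4096 / n). The constants 164 and 455 follow from that assumption and are not quoted from the commit; both function names are hypothetical.

```c
/* Rounded 12-bit fixed-point reciprocal: round(4096 / n).
 * Assumed here to be what the removed table stored. */
static unsigned recip12_sketch(int n) {
    return (4096u + (unsigned)n / 2) / (unsigned)n;
}

/* n is the box area: 25 for the 5x5 filter, 9 for the 3x3 filter.
 * With only two possible inputs, a ternary replaces the table lookup. */
static unsigned sgr_one_by_x_sketch(int n) {
    return n == 25 ? 164 : 455;
}
```

Under this assumption the ternary agrees with the computed reciprocal for both valid values of n, so no table is needed.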
2018-10-25 | Build: Add suffix to templated BITDEPTH files | Marvin Scholz
Fix #96