github.com/videolan/dav1d.git

Age	Commit message	Author
2022-07-06	Eliminate unused C DSP functions at compile time	Henrik Gramner
	When compiling with asm enabled there's no point in compiling C versions of DSP functions that have asm implementations using instruction sets that the compiler can unconditionally use. E.g. when compiling with -mssse3 we can remove the C version of all functions with SSSE3 implementations. This is accomplished using the compiler's dead code elimination functionality. Can be configured using the new 'trim_dsp' meson option, which by default is enabled when compiling in release mode.
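The mechanism described above can be sketched roughly as follows. This is a minimal illustration and not dav1d's actual code: the flag names, `COMPILE_TIME_FLAGS`, `get_cpu_flags()`, and `select_filter()` are all hypothetical. The idea is that flags guaranteed by the compiler options are OR'd into the runtime flags as a constant, so the branch that selects the C fallback becomes provably dead and the compiler can drop the unreferenced C function.

```c
/* Hypothetical CPU flag bits; the names are illustrative, not dav1d's. */
enum { CPU_FLAG_SSSE3 = 1 << 0, CPU_FLAG_AVX2 = 1 << 1 };

/* Flags the compiler may use unconditionally (e.g. when built with -mssse3). */
#if defined(__AVX2__)
#define COMPILE_TIME_FLAGS (CPU_FLAG_SSSE3 | CPU_FLAG_AVX2)
#elif defined(__SSSE3__)
#define COMPILE_TIME_FLAGS CPU_FLAG_SSSE3
#else
#define COMPILE_TIME_FLAGS 0
#endif

typedef int (*filter_fn)(int x);

static int filter_c(int x)     { return x + 1; } /* portable fallback */
static int filter_ssse3(int x) { return x + 1; } /* stand-in for an asm version */

/* The runtime detection result is OR'd with the compile-time constant. */
static unsigned get_cpu_flags(unsigned runtime_flags) {
    return runtime_flags | COMPILE_TIME_FLAGS;
}

static filter_fn select_filter(unsigned runtime_flags) {
    const unsigned flags = get_cpu_flags(runtime_flags);
    /* If COMPILE_TIME_FLAGS contains SSSE3, this test is always true, the
     * fallback return below is dead code, and filter_c can be eliminated. */
    if (flags & CPU_FLAG_SSSE3) return filter_ssse3;
    return filter_c;
}
```

In this sketch nothing is removed with preprocessor guards around the C function itself; the removal relies entirely on the optimizer seeing a constant condition, which matches the "dead code elimination" wording of the commit message.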
2021-10-29	Remove lpf_stride parameter from LR filters	Victorien Le Couviour--Tuffet

2021-05-16	Update some copyright dates to 2021 (tag: 0.9.0)	Jean-Baptiste Kempf

2021-05-04	x86: Add high bitdepth (10-bit) sgr AVX2 asm	Henrik Gramner

2021-02-11	Add minor SGR optimizations	Henrik Gramner
	Split the 5x5, 3x3, and mix cases into separate functions. Shrink some tables. Move some scalar calculations out of the DSP function. Make Wiener and SGR share the same function prototype to eliminate a branch in lr_stripe().
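A rough sketch of what sharing one prototype between the two filter families could look like. The type and function names here (`lr_params`, `lr_fn`, `wiener_stub`, `sgr_stub`, `lr_stripe_sketch`) and the field layout are assumptions made for illustration, not the definitions used in dav1d. The point is that once both filters accept the same parameter type, the caller can hold a single function pointer and call it without branching on the filter type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical parameter union: one prototype serves both filter kinds. */
typedef union {
    int16_t wiener[2][8];                               /* horizontal and vertical taps */
    struct { uint32_t s0, s1; int16_t w0, w1; } sgr;    /* illustrative SGR fields */
} lr_params;

typedef void (*lr_fn)(uint8_t *dst, int w, const lr_params *params);

/* Stubs that only show which union member each filter kind would read. */
static void wiener_stub(uint8_t *dst, int w, const lr_params *p) {
    for (int x = 0; x < w; x++) dst[x] = (uint8_t)p->wiener[0][3];
}

static void sgr_stub(uint8_t *dst, int w, const lr_params *p) {
    for (int x = 0; x < w; x++) dst[x] = (uint8_t)p->sgr.w0;
}

/* The function is chosen once by the caller; the per-stripe call is uniform. */
static void lr_stripe_sketch(lr_fn fn, uint8_t *dst, int w, const lr_params *p) {
    assert(fn && dst && p);
    fn(dst, w, p);
}
```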
2021-02-03	looprestoration: Document how much filters are allowed to write past the right edge	Martin Storsjö
2020-12-13	x86: Rewrite wiener SSE2/SSSE3/AVX2 asm	Henrik Gramner
	The previous implementation did two separate passes in the horizontal and vertical directions, with the intermediate values being stored in a buffer on the stack. This caused bad cache thrashing. By interleaving the horizontal and vertical passes in combination with a ring buffer for storing only a few rows at a time the performance is improved by a significant amount. Also split the function into 7-tap and 5-tap versions. The latter is faster and fairly common (always for chroma, sometimes for luma).
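The interleaving idea can be illustrated in plain C. This is a simplified sketch under stated assumptions, not the asm from the commit: it uses a fixed 3-tap [1 2 1] kernel rather than the 7-tap and 5-tap Wiener filters, replicates edges, and all names (`h_pass`, `filter_ring`, `TAPS`, `MAX_W`) are hypothetical. Instead of writing a full intermediate image, only `TAPS` horizontally filtered rows are kept in a ring buffer, and each output row is produced as soon as its neighbours are available.

```c
#include <assert.h>
#include <stdint.h>

enum { TAPS = 3, MAX_W = 64 };

/* Horizontal [1 2 1] pass for one row, with the edge pixels replicated. */
static void h_pass(int16_t *out, const uint8_t *row, int w) {
    for (int x = 0; x < w; x++) {
        const int l = row[x > 0 ? x - 1 : 0];
        const int r = row[x < w - 1 ? x + 1 : w - 1];
        out[x] = (int16_t)(l + 2 * row[x] + r);
    }
}

static int clamp_row(int y, int h) { return y < 0 ? 0 : y >= h ? h - 1 : y; }

/* Filters a w x h image (w <= MAX_W); dst must not alias src.
 * Source row r is kept in ring slot (r + 1) % TAPS. */
static void filter_ring(uint8_t *dst, const uint8_t *src, int w, int h) {
    int16_t ring[TAPS][MAX_W];
    assert(w > 0 && w <= MAX_W && h > 0);

    /* Prime the ring with rows -1 (clamped to row 0) and 0. */
    h_pass(ring[0], src + clamp_row(-1, h) * w, w);
    h_pass(ring[1], src, w);

    for (int y = 0; y < h; y++) {
        /* Filter row y+1 into the slot that held row y-2, which is no longer needed. */
        h_pass(ring[(y + 2) % TAPS], src + clamp_row(y + 1, h) * w, w);

        const int16_t *a = ring[y % TAPS];       /* row y-1 */
        const int16_t *b = ring[(y + 1) % TAPS]; /* row y   */
        const int16_t *c = ring[(y + 2) % TAPS]; /* row y+1 */

        /* Vertical [1 2 1] pass with rounding; total kernel gain is 16. */
        for (int x = 0; x < w; x++)
            dst[y * w + x] = (uint8_t)((a[x] + 2 * b[x] + c[x] + 8) >> 4);
    }
}
```

The intermediate storage is `TAPS` rows regardless of the image height, so the working set stays small enough to remain in cache, which is the effect the commit message attributes to the ring buffer.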
2020-12-13	Add miscellaneous minor wiener optimizations	Henrik Gramner
	Combine horizontal and vertical filter pointers into a single parameter when calling the wiener DSP function. Eliminate the +128 filter coefficient handling where possible.
2020-02-11	looprestoration: Add a bpc parameter to the init func	Martin Storsjö
	This allows using completely different codepaths for 10 and 12 bpc, or just adding SIMD functions for either of them.
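As a hedged illustration of why the extra parameter helps, an init function that receives the bit depth can choose implementations per depth. The names below (`lr_dsp_sketch`, `lr_dsp_init_sketch`, the `IMPL_*` values) are hypothetical, and the particular selection of which depths have SIMD versions is invented for the example.

```c
#include <assert.h>

/* Hypothetical identifiers for which implementation was selected. */
enum lr_impl { IMPL_C, IMPL_SIMD_8BPC, IMPL_SIMD_10BPC };

typedef struct { enum lr_impl wiener; } lr_dsp_sketch;

/* With bpc available at init time, each bit depth can take its own path. */
static void lr_dsp_init_sketch(lr_dsp_sketch *c, int bpc) {
    assert(bpc == 8 || bpc == 10 || bpc == 12);
    c->wiener = IMPL_C;                  /* default: portable C version */
    if (bpc == 8)
        c->wiener = IMPL_SIMD_8BPC;      /* assumed: SIMD exists for 8 bpc */
    else if (bpc == 10)
        c->wiener = IMPL_SIMD_10BPC;     /* assumed: SIMD exists for 10 bpc only */
    /* 12 bpc keeps the C fallback in this sketch. */
}
```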
2019-10-09	Add VSX wiener filter implementation	Michail Alvanos

2019-02-13	Remove leading double underscores from include guard defines	Martin Storsjö
	A symbol starting with two leading underscores is reserved for the compiler/standard library implementation. Also remove the trailing two double underscores for consistency and symmetry.
2018-12-06	12 bits/component support	Ronald S. Bultje

2018-11-26	arm64: looprestoration: NEON optimized wiener filter	Martin Storsjö
	The relative speedup compared to C code is around 4.2x for a Cortex A53 and 5.1x for a Snapdragon 835 (compared to GCC's autovectorized code), 6-7x compared to GCC's output without autovectorization, and ~8x compared to clang's output (which doesn't seem to try to vectorize this function).
2018-10-17	Add infrastructure for LR SIMD and unit tests.	Ronald S. Bultje
	wiener_luma_8bpc_c: 326272.1; wiener_luma_8bpc_avx2: 19841.5. Decoding time of the first 1000 frames of Chimera-8bit-1920x1080.ivf goes from 27.471 to 23.558 seconds.
2018-10-16	Refactor left edge copying to reduce data copies by 50%.	Ronald S. Bultje
	Also copy 4 pixels so SIMD can use a padded write (movd).
2018-09-22	Initial decoder implementation.	Ronald S. Bultje
	With minor contributions from: Jean-Baptiste Kempf <jb@videolan.org>, Marvin Scholz <epirat07@gmail.com>, Hugo Beauzée-Luyssen <hugo@videolan.org>