github.com/videolan/dav1d.git

Age	Commit message	Author
2019-04-16	arm64: looprestoration: Add a NEON implementation of SGR	Martin Storsjö
	Relative speedup vs (autovectorized) C code:
	                      Cortex A53    A72    A73
	selfguided_3x3_8bpc_neon:   2.91   2.12   2.68
	selfguided_5x5_8bpc_neon:   3.18   2.65   3.39
	selfguided_mix_8bpc_neon:   3.04   2.29   2.98

	The relative speedup vs non-vectorized C code is around 2.6-4.6x.
2019-03-07	arm: looprestoration: Simplify a few padding cases in wiener_filter_h_neon	Martin Storsjö

2019-02-04	arm64: looprestoration: Optimize loop termination checks in copy_narrow_neon	Martin Storsjö

2019-01-31	arm64: looprestoration: Simplify the horizontal filtering of one pixel at a time	Martin Storsjö

2019-01-31	arm64: looprestoration: Simplify the setup of wiener_filter_v_neon	Martin Storsjö

2019-01-31	arm64: looprestoration: Fix the loop condition in copy_narrow_neon	Martin Storsjö
	These cases looped too many times.
2019-01-31	arm64: looprestoration: Fix comment typos	Martin Storsjö

2018-11-26	arm64: looprestoration: NEON optimized wiener filter	Martin Storsjö
	The relative speedup compared to C code is around 4.2 for a Cortex A53 and 5.1 for a Snapdragon 835 (compared to GCC's autovectorized code), 6-7x compared to GCC's output without autovectorization, and ~8x compared to clang's output (which doesn't seem to try to vectorize this function).