github.com/videolan/dav1d.git

Age | Commit message | Author
2019-04-16 | arm64: looprestoration: Add a NEON implementation of SGR | Martin Storsjö
    Relative speedup vs (autovectorized) C code:
                              Cortex A53    A72    A73
    selfguided_3x3_8bpc_neon:       2.91   2.12   2.68
    selfguided_5x5_8bpc_neon:       3.18   2.65   3.39
    selfguided_mix_8bpc_neon:       3.04   2.29   2.98
    The relative speedup vs non-vectorized C code is around 2.6-4.6x.
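The 3x3 and 5x5 variants in the table refer to the box sizes used by the self-guided filter: for every pixel it accumulates the sum of the pixels and the sum of their squares over a box of radius 1 (3x3) or radius 2 (5x5), and the mix variant combines both radii. The scalar C sketch below shows only that box-sum step. The function name `box_sums`, the separate `sum`/`sumsq` output arrays and the edge handling (replicating the nearest edge pixel) are assumptions made for illustration; this is not dav1d's C reference code, which uses its own padding and incremental sums.

```c
#include <stdint.h>

/* Clamp an index into [lo, hi]; used to replicate edge pixels
 * (an assumption of this sketch, not dav1d's padding scheme). */
static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : v > hi ? hi : v;
}

/* Hypothetical helper: for every pixel of a w x h 8-bit image, store the
 * sum of pixels (sum) and of squared pixels (sumsq) over the
 * (2r+1) x (2r+1) box centred on it. r = 1 gives the 3x3 box and r = 2 the
 * 5x5 box. The largest possible value, 25 * 255 * 255, fits in int32_t. */
static void box_sums(const uint8_t *src, int w, int h, int stride, int r,
                     int32_t *sum, int32_t *sumsq)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int32_t s = 0, sq = 0;
            for (int dy = -r; dy <= r; dy++) {
                int yy = clampi(y + dy, 0, h - 1);
                for (int dx = -r; dx <= r; dx++) {
                    int xx = clampi(x + dx, 0, w - 1);
                    int p = src[yy * stride + xx];
                    s += p;
                    sq += p * p;
                }
            }
            sum[y * w + x] = s;
            sumsq[y * w + x] = sq;
        }
    }
}
```

For example, for a 3x3 image holding the values 1 to 9 in row order and r = 1, the centre entry of `sum` is 45 and that of `sumsq` is 285. The naive per-pixel loop is kept for clarity; a vectorized version would instead slide running row and column sums across the block.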
2019-03-07 | arm: looprestoration: Simplify a few padding cases in wiener_filter_h_neon | Martin Storsjö
2019-02-04 | arm64: looprestoration: Optimize loop termination checks in copy_narrow_neon | Martin Storsjö
2019-01-31 | arm64: looprestoration: Simplify the horizontal filtering of one pixel at a time | Martin Storsjö
2019-01-31 | arm64: looprestoration: Simplify the setup of wiener_filter_v_neon | Martin Storsjö
2019-01-31 | arm64: looprestoration: Fix the loop condition in copy_narrow_neon | Martin Storsjö
    These cases looped one round too many.
2019-01-31 | arm64: looprestoration: Fix comment typos | Martin Storsjö
2018-11-26 | arm64: looprestoration: NEON optimized wiener filter | Martin Storsjö
    The relative speedup compared to C code is around 4.2x on a Cortex A53 and 5.1x on a Snapdragon 835 (compared to GCC's autovectorized code), 6-7x compared to GCC's output without autovectorization, and ~8x compared to clang's output (which doesn't seem to try to vectorize this function).
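The AV1 Wiener filter is separable, which is why the log refers to a horizontal pass (wiener_filter_h_neon) and a vertical pass (wiener_filter_v_neon). Below is a minimal scalar C sketch of a 7-tap horizontal pass of the kind such NEON code vectorizes. The name `wiener_h_sketch`, the 7-bit fixed-point taps summing to 128, the single rounding shift and the direct clamp to 8 bits are simplifying assumptions; dav1d keeps extra intermediate precision between the two passes, so this does not reproduce its exact arithmetic.

```c
#include <stdint.h>

/* Hypothetical scalar sketch of a 7-tap horizontal filter pass.
 * src points 3 pixels to the left of the first output pixel, so output x
 * is centred on src[x + 3]; the caller provides 3 pixels of padding on
 * each side. filter[] holds seven taps that are assumed to sum to 128
 * (7-bit fixed point). */
static void wiener_h_sketch(uint8_t *dst, const uint8_t *src, int w,
                            const int16_t filter[7])
{
    for (int x = 0; x < w; x++) {
        int32_t acc = 0;
        for (int k = 0; k < 7; k++)
            acc += filter[k] * src[x + k];
        /* Round to nearest, then clamp to the 8-bit pixel range. Negative
         * sums are clamped before shifting to avoid right-shifting a
         * negative value. */
        int32_t v = acc + 64;
        v = v < 0 ? 0 : v >> 7;
        dst[x] = (uint8_t)(v > 255 ? 255 : v);
    }
}
```

With the identity taps {0, 0, 0, 128, 0, 0, 0} the output simply copies the centre pixels, and any tap set summing to 128 leaves a flat region unchanged; the vertical pass would apply the same idea down columns.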