author | Martin Storsjö <martin@martin.st> | 2020-02-10 11:03:27 +0300 |
---|---|---|
committer | Martin Storsjö <martin@martin.st> | 2020-02-11 11:45:29 +0300 |
commit | e3dbf92664918ecc830b4fde74b7cc0f6cd2065c (patch) | |
tree | 223e2de1fe66861a84e36983df164189a851960a /include | |
parent | 7cf5d7535f44d7c2d00e368575d0d26b66c73121 (diff) |
arm64: looprestoration: NEON implementation of SGR for 10 bpc
This only supports 10 bpc, not 12 bpc, as the sum and tmp buffers can
be int16_t for 10 bpc, but need to be int32_t for 12 bpc.
Make actual templates out of the functions in looprestoration_tmpl.S,
and add box3/5_h to looprestoration16.S.
Extend dav1d_sgr_calc_abX_neon with a mandatory bitdepth_max parameter
(which is passed even in 8bpc mode), and add a BITDEPTH_MAX define to
bitdepth.h for passing such a parameter in all modes. This makes the
function a few instructions slower in 8bpc mode than it was before (the
overall impact seems to be around 1% of the total SGR runtime), but
allows using the same function instantiation for all modes, saving a
bit of code size.
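
The pattern can be sketched as follows. This is a minimal illustration, not the dav1d code: shared_clip and filter_px are hypothetical stand-ins, the HIGHBD_DECL_SUFFIX macro is modelled on the suffix macros in bitdepth.h, and BITDEPTH defaults to 8 here only so the snippet compiles on its own.

```c
#include <assert.h>

#ifndef BITDEPTH
#define BITDEPTH 8 /* assumption: default to the 8 bpc template */
#endif

#if BITDEPTH == 8
#define HIGHBD_DECL_SUFFIX /* nothing */
#define BITDEPTH_MAX 0xff          /* constant, no extra argument needed */
#else
#define HIGHBD_DECL_SUFFIX , const int bitdepth_max
#define BITDEPTH_MAX bitdepth_max  /* forwards the caller's argument */
#endif

/* Hypothetical shared routine: instantiated once, so it always takes
 * bitdepth_max as an explicit parameter, even for 8 bpc callers. */
static int shared_clip(const int v, const int bitdepth_max) {
    assert(bitdepth_max > 0);
    return v < 0 ? 0 : v > bitdepth_max ? bitdepth_max : v;
}

/* Hypothetical templated caller: its own signature differs per bit depth,
 * but the call to the shared routine is written the same way in both. */
static int filter_px(const int v HIGHBD_DECL_SUFFIX) {
    return shared_clip(v, BITDEPTH_MAX);
}
```

In the 8 bpc build, filter_px(300) clips to 255 without the caller ever passing a bit depth; in a 16 bpc build the same call site expands to the runtime bitdepth_max argument.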
Examples of checkasm runtimes:
                             Cortex A53        A72        A73
selfguided_3x3_10bpc_neon:     516755.8   389412.7   349058.7
selfguided_5x5_10bpc_neon:     380699.9   293486.6   254591.6
selfguided_mix_10bpc_neon:     878142.3   667495.9   587844.6
Corresponding 8 bpc numbers for comparison:
selfguided_3x3_8bpc_neon:      491058.1   361473.4   347705.9
selfguided_5x5_8bpc_neon:      352655.0   266423.7   248192.2
selfguided_mix_8bpc_neon:      826094.1   612372.2   581943.1
Diffstat (limited to 'include')
-rw-r--r-- | include/common/bitdepth.h | 2 |
1 files changed, 2 insertions, 0 deletions
diff --git a/include/common/bitdepth.h b/include/common/bitdepth.h
index 33b32d0..88a822a 100644
--- a/include/common/bitdepth.h
+++ b/include/common/bitdepth.h
@@ -56,6 +56,7 @@ typedef int16_t coef;
 #define HIGHBD_CALL_SUFFIX /* nothing */
 #define HIGHBD_TAIL_SUFFIX /* nothing */
 #define bitdepth_from_max(x) 8
+#define BITDEPTH_MAX 0xff
 #elif BITDEPTH == 16
 typedef uint16_t pixel;
 typedef int32_t coef;
@@ -72,6 +73,7 @@ static inline void pixel_set(pixel *const dst, const int val, const int num) {
 #define HIGHBD_CALL_SUFFIX , f->bitdepth_max
 #define HIGHBD_TAIL_SUFFIX , bitdepth_max
 #define bitdepth_from_max(bitdepth_max) (32 - clz(bitdepth_max))
+#define BITDEPTH_MAX bitdepth_max
 #define bitfn(x) x##_16bpc
 #define BF(x, suffix) x##_16bpc_##suffix
 static inline ptrdiff_t PXSTRIDE(const ptrdiff_t x) {
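
For reference, the bitdepth_from_max context line in the second hunk recovers the bit depth from the maximum pixel value by counting leading zeros. The sketch below assumes a 32-bit unsigned int and substitutes the GCC/Clang builtin __builtin_clz for dav1d's own clz(); the function names are hypothetical.

```c
#include <assert.h>

/* Stand-in for clz(): count leading zero bits of a 32-bit value.
 * Assumption: GCC/Clang builtin, which is undefined for x == 0. */
static int clz32(const unsigned x) {
    assert(x != 0);
    return __builtin_clz(x);
}

/* Same arithmetic as the 16 bpc macro: 32 minus the leading zero count
 * gives the number of significant bits in bitdepth_max. */
static int bitdepth_from_max_sketch(const int bitdepth_max) {
    return 32 - clz32((unsigned)bitdepth_max);
}
```

With this arithmetic, 0x3ff (1023) has 22 leading zeros and maps to 10, and 0xfff (4095) maps to 12.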