github.com/videolan/dav1d.git
author    Martin Storsjö <martin@martin.st>  2020-02-10 11:03:27 +0300
committer Martin Storsjö <martin@martin.st>  2020-02-11 11:45:29 +0300
commit    e3dbf92664918ecc830b4fde74b7cc0f6cd2065c
tree      223e2de1fe66861a84e36983df164189a851960a /include
parent    7cf5d7535f44d7c2d00e368575d0d26b66c73121

arm64: looprestoration: NEON implementation of SGR for 10 bpc

This only supports 10 bpc, not 12 bpc, as the sum and tmp buffers can
be int16_t for 10 bpc, but need to be int32_t for 12 bpc.

Make actual templates out of the functions in looprestoration_tmpl.S,
and add box3/5_h to looprestoration16.S.

Extend dav1d_sgr_calc_abX_neon with a mandatory bitdepth_max parameter
(which is passed even in 8bpc mode), add a define to bitdepth.h for
passing such a parameter in all modes. This makes this function
a few instructions slower in 8bpc mode than it was before (overall impact
seems to be around 1% of the total runtime of SGR), but allows using the
same actual function instantiation for all modes, saving a bit of code
size.

Examples of checkasm runtimes:

                             Cortex A53        A72        A73
selfguided_3x3_10bpc_neon:     516755.8   389412.7   349058.7
selfguided_5x5_10bpc_neon:     380699.9   293486.6   254591.6
selfguided_mix_10bpc_neon:     878142.3   667495.9   587844.6

Corresponding 8 bpc numbers for comparison:

selfguided_3x3_8bpc_neon:      491058.1   361473.4   347705.9
selfguided_5x5_8bpc_neon:      352655.0   266423.7   248192.2
selfguided_mix_8bpc_neon:      826094.1   612372.2   581943.1

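
As a rough illustration of the BITDEPTH_MAX define described above, the sketch below shows how one shared routine can take a mandatory bitdepth_max argument while the per-bitdepth wrapper only carries that parameter in 16 bpc builds. Only BITDEPTH, BITDEPTH_MAX, pixel and bitdepth_max come from bitdepth.h; shared_clamp, filter_px and DEMO_DECL_SUFFIX are hypothetical names invented for this example, and the clamp merely stands in for the real NEON routine.

```c
#include <assert.h>
#include <stdint.h>

#ifndef BITDEPTH
#define BITDEPTH 8 /* build with -DBITDEPTH=16 for the high-bitdepth variant */
#endif

#if BITDEPTH == 8
typedef uint8_t pixel;
#define DEMO_DECL_SUFFIX /* nothing: 8 bpc wrappers take no extra parameter */
#define BITDEPTH_MAX 0xff /* constant, as in the first hunk of the diff */
#elif BITDEPTH == 16
typedef uint16_t pixel;
#define DEMO_DECL_SUFFIX , const int bitdepth_max
#define BITDEPTH_MAX bitdepth_max /* forwards the wrapper's own parameter */
#endif

/* Hypothetical stand-in for a routine shared by all bitdepths: it always
 * receives bitdepth_max, so a single instantiation serves every mode. */
static int shared_clamp(const int v, const int bitdepth_max) {
    assert(bitdepth_max > 0);
    return v < 0 ? 0 : v > bitdepth_max ? bitdepth_max : v;
}

/* Hypothetical per-bitdepth wrapper. BITDEPTH_MAX expands to 0xff in an
 * 8 bpc build and to the bitdepth_max parameter in a 16 bpc build, so the
 * call below is written once for both. */
static int filter_px(const int v DEMO_DECL_SUFFIX) {
    return shared_clamp(v, BITDEPTH_MAX);
}
```

In a default (8 bpc) build, filter_px(300) clamps against 0xff; in a 16 bpc build the caller would instead write filter_px(300, 0x3ff) for 10-bit content. The cost noted in the commit message is that the 8 bpc path now passes a constant it previously did not need.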
Diffstat (limited to 'include')
-rw-r--r-- include/common/bitdepth.h | 2
1 files changed, 2 insertions, 0 deletions

diff --git a/include/common/bitdepth.h b/include/common/bitdepth.h
index 33b32d0..88a822a 100644
--- a/include/common/bitdepth.h
+++ b/include/common/bitdepth.h
@@ -56,6 +56,7 @@ typedef int16_t coef;
 #define HIGHBD_CALL_SUFFIX /* nothing */
 #define HIGHBD_TAIL_SUFFIX /* nothing */
 #define bitdepth_from_max(x) 8
+#define BITDEPTH_MAX 0xff
 #elif BITDEPTH == 16
 typedef uint16_t pixel;
 typedef int32_t coef;
@@ -72,6 +73,7 @@ static inline void pixel_set(pixel *const dst, const int val, const int num) {
 #define HIGHBD_CALL_SUFFIX , f->bitdepth_max
 #define HIGHBD_TAIL_SUFFIX , bitdepth_max
 #define bitdepth_from_max(bitdepth_max) (32 - clz(bitdepth_max))
+#define BITDEPTH_MAX bitdepth_max
 #define bitfn(x) x##_16bpc
 #define BF(x, suffix) x##_16bpc_##suffix
 static inline ptrdiff_t PXSTRIDE(const ptrdiff_t x) {