Welcome to mirror list, hosted at ThFree Co, Russian Federation.

github.com/videolan/dav1d.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorMartin Storsjö <martin@martin.st>2019-03-30 01:02:24 +0300
committerMartin Storsjö <martin@martin.st>2019-04-16 13:46:13 +0300
commit0282f6f3d5916fdd169c3af6c3250ed214308f5c (patch)
tree3cea200fbf206581f879d3c59189e22b87ae927c /src/loopfilter.h
parent204bf2115ff91d1a594ab47c7d8adaf2b1b80b07 (diff)
arm64: loopfilter: Implement NEON loop filters
The exact relative speedup compared to C code is a bit vague and hard to measure, depending on eactly how many filtered blocks are skipped, as the NEON version always filters 16 pixels at a time, while the C code can skip processing individual 4 pixel blocks. Additionally, the checkasm benchmarking code runs the same function repeatedly on the same buffer, which can make the filter take different codepaths on each run, as the function updates the buffer which will be used as input for the next run. If tweaking the checkasm test data to try to avoid skipped blocks, the relative speedups compared to C is between 2x and 5x, while it is around 1x to 4x with the current checkasm test as such. Benchmark numbers from a tweaked checkasm that avoids skipped blocks: Cortex A53 A72 A73 lpf_h_sb_uv_w4_8bpc_c: 2954.7 1399.3 1655.3 lpf_h_sb_uv_w4_8bpc_neon: 895.5 650.8 692.0 lpf_h_sb_uv_w6_8bpc_c: 3879.2 1917.2 2257.7 lpf_h_sb_uv_w6_8bpc_neon: 1125.6 759.5 838.4 lpf_h_sb_y_w4_8bpc_c: 6711.0 3275.5 3913.7 lpf_h_sb_y_w4_8bpc_neon: 1744.0 1342.1 1351.5 lpf_h_sb_y_w8_8bpc_c: 10695.7 6155.8 6638.9 lpf_h_sb_y_w8_8bpc_neon: 2146.5 1560.4 1609.1 lpf_h_sb_y_w16_8bpc_c: 11355.8 6292.0 6995.9 lpf_h_sb_y_w16_8bpc_neon: 2475.4 1949.6 1968.4 lpf_v_sb_uv_w4_8bpc_c: 2639.7 1204.8 1425.9 lpf_v_sb_uv_w4_8bpc_neon: 510.7 351.4 334.7 lpf_v_sb_uv_w6_8bpc_c: 3468.3 1757.1 2021.5 lpf_v_sb_uv_w6_8bpc_neon: 625.0 415.0 397.8 lpf_v_sb_y_w4_8bpc_c: 5428.7 2731.7 3068.5 lpf_v_sb_y_w4_8bpc_neon: 1172.6 792.1 768.0 lpf_v_sb_y_w8_8bpc_c: 8946.1 4412.8 5121.0 lpf_v_sb_y_w8_8bpc_neon: 1565.5 1063.6 1062.7 lpf_v_sb_y_w16_8bpc_c: 8978.9 4411.7 5112.0 lpf_v_sb_y_w16_8bpc_neon: 1775.0 1288.1 1236.7
Diffstat (limited to 'src/loopfilter.h')
-rw-r--r--src/loopfilter.h1
1 files changed, 1 insertions, 0 deletions
diff --git a/src/loopfilter.h b/src/loopfilter.h
index 7f8fd13..c159050 100644
--- a/src/loopfilter.h
+++ b/src/loopfilter.h
@@ -53,6 +53,7 @@ typedef struct Dav1dLoopFilterDSPContext {
} Dav1dLoopFilterDSPContext;
bitfn_decls(void dav1d_loop_filter_dsp_init, Dav1dLoopFilterDSPContext *c);
+bitfn_decls(void dav1d_loop_filter_dsp_init_arm, Dav1dLoopFilterDSPContext *c);
bitfn_decls(void dav1d_loop_filter_dsp_init_x86, Dav1dLoopFilterDSPContext *c);
#endif /* DAV1D_SRC_LOOPFILTER_H */