github.com/videolan/dav1d.git
author Martin Storsjö <martin@martin.st> 2019-11-05 16:40:17 +0300
committer Martin Storsjö <martin@martin.st> 2019-11-12 12:37:55 +0300
commit 9a100261b911d1bc96a36a3ec9bcbf6ee29dd228
tree 1f42746a3d15ab7e70e0d6668f961e890e2eb645 /src/arm/32/util.S
parent c02ec6cffb864fb54d27b61e7c72e40946e301be
arm: 32: Port the arm64 NEON loopfilter to arm32
The code is a fairly exact 1:1 port of the ARM64 code, but operating
on 8 pixels at a time, instead of 16.

Relative speedup over C code according to checkasm:
                       Cortex    A7     A8     A9    A53    A72    A73
lpf_h_sb_uv_w4_8bpc_neon:      1.36   1.40   1.25   1.71   1.55   1.59
lpf_h_sb_uv_w6_8bpc_neon:      2.18   2.11   1.74   2.65   2.32   2.34
lpf_h_sb_y_w4_8bpc_neon:       1.48   1.43   1.20   1.91   1.49   1.64
lpf_h_sb_y_w8_8bpc_neon:       2.34   2.05   1.78   2.84   2.35   2.69
lpf_h_sb_y_w16_8bpc_neon:      2.13   1.83   1.63   2.51   2.10   2.35
lpf_v_sb_uv_w4_8bpc_neon:      1.69   1.66   1.60   2.16   2.24   2.24
lpf_v_sb_uv_w6_8bpc_neon:      2.68   2.43   2.22   3.53   3.44   3.35
lpf_v_sb_y_w4_8bpc_neon:       1.74   1.74   1.43   2.34   2.14   2.18
lpf_v_sb_y_w8_8bpc_neon:       2.92   2.47   2.19   3.55   3.22   3.54
lpf_v_sb_y_w16_8bpc_neon:      2.62   2.19   1.98   3.25   2.80   3.10

Comparison to the original ARM64 assembly:

ARM64:                          A53      A72      A73
lpf_h_sb_uv_w4_8bpc_neon:     702.5    518.2    529.1
lpf_h_sb_uv_w6_8bpc_neon:    1007.3    672.6    736.6
lpf_h_sb_y_w4_8bpc_neon:     1652.8   1261.2   1276.5
lpf_h_sb_y_w8_8bpc_neon:     2144.7   1559.8   1638.7
lpf_h_sb_y_w16_8bpc_neon:    2318.3   1757.2   1792.8
lpf_v_sb_uv_w4_8bpc_neon:     447.1    302.0    292.4
lpf_v_sb_uv_w6_8bpc_neon:     600.0    397.7    406.9
lpf_v_sb_y_w4_8bpc_neon:     1212.6    840.1    818.4
lpf_v_sb_y_w8_8bpc_neon:     1623.3   1167.4   1156.7
lpf_v_sb_y_w16_8bpc_neon:    1694.9   1237.9   1182.3
ARM32:
lpf_h_sb_uv_w4_8bpc_neon:     821.2    501.1    500.8
lpf_h_sb_uv_w6_8bpc_neon:    1232.0    715.7    746.6
lpf_h_sb_y_w4_8bpc_neon:     2208.1   1373.2   1414.7
lpf_h_sb_y_w8_8bpc_neon:     3138.3   1843.1   1915.2
lpf_h_sb_y_w16_8bpc_neon:    3293.1   1842.5   1975.9
lpf_v_sb_uv_w4_8bpc_neon:     619.9    326.7    324.9
lpf_v_sb_uv_w6_8bpc_neon:     855.9    446.7    468.2
lpf_v_sb_y_w4_8bpc_neon:     1737.6    935.5   1007.0
lpf_v_sb_y_w8_8bpc_neon:     2346.7   1232.8   1298.3
lpf_v_sb_y_w16_8bpc_neon:    2353.4   1283.4   1379.9
Diffstat (limited to 'src/arm/32/util.S')
-rw-r--r-- src/arm/32/util.S | 7
1 files changed, 7 insertions, 0 deletions
diff --git a/src/arm/32/util.S b/src/arm/32/util.S
index 53d60af..ea4afc3 100644
--- a/src/arm/32/util.S
+++ b/src/arm/32/util.S
@@ -84,4 +84,11 @@
vtrn.8 \r6, \r7
.endm
+.macro transpose_4x8b q0, q1, r0, r1, r2, r3
+ vtrn.16 \q0, \q1
+
+ vtrn.8 \r0, \r1
+ vtrn.8 \r2, \r3
+.endm
+
#endif /* DAV1D_SRC_ARM_32_UTIL_S */
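
To illustrate what the added transpose_4x8b macro computes, here is a small scalar C model of its three VTRN steps. This is only a sketch under assumptions not shown in this diff: that the macro is invoked with q0 aliasing the pair r0:r1 and q1 aliasing r2:r3 (as d0/d1 alias q0 and d2/d3 alias q1 on ARM32), and that lanes are numbered from the least significant byte. The names vtrn8, vtrn16_q, transpose_4x8b_model and transpose_4x8b_check are hypothetical helpers for this illustration, not part of dav1d.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of VTRN.8 Dd, Dm on two 8-byte D registers: every lane pair
 * (i, i+1) is treated as a 2x2 matrix and transposed, i.e. the odd
 * lanes of d are swapped with the even lanes of m. */
static void vtrn8(uint8_t *d, uint8_t *m)
{
    for (int i = 0; i < 8; i += 2) {
        uint8_t t = d[i + 1];
        d[i + 1] = m[i];
        m[i] = t;
    }
}

/* Model of VTRN.16 Qd, Qm on two 16-byte Q registers: same operation,
 * but on eight 16-bit lanes (two bytes per lane). */
static void vtrn16_q(uint8_t *d, uint8_t *m)
{
    for (int i = 0; i < 8; i += 2) {
        uint8_t t[2];
        memcpy(t, &d[2 * (i + 1)], 2);
        memcpy(&d[2 * (i + 1)], &m[2 * i], 2);
        memcpy(&m[2 * i], t, 2);
    }
}

/* regs holds r0..r3 back to back (8 bytes each). Assumption:
 * q0 = {r0, r1} and q1 = {r2, r3}, mirroring d0..d3 vs. q0/q1. */
static void transpose_4x8b_model(uint8_t regs[32])
{
    uint8_t *r0 = regs, *r1 = regs + 8, *r2 = regs + 16, *r3 = regs + 24;
    vtrn16_q(regs, regs + 16);   /* vtrn.16 \q0, \q1 */
    vtrn8(r0, r1);               /* vtrn.8  \r0, \r1 */
    vtrn8(r2, r3);               /* vtrn.8  \r2, \r3 */
}

/* Checks the resulting layout: output register k holds column k of
 * the four input rows in its low half and column k+4 in its high half. */
static void transpose_4x8b_check(void)
{
    uint8_t regs[32];
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 8; c++)
            regs[8 * r + c] = (uint8_t)(16 * r + c);   /* row r, column c */
    transpose_4x8b_model(regs);
    for (int k = 0; k < 4; k++)
        for (int r = 0; r < 4; r++) {
            assert(regs[8 * k + r] == 16 * r + k);
            assert(regs[8 * k + 4 + r] == 16 * r + k + 4);
        }
}
```

Under these assumptions, the macro amounts to two independent 4x4 byte transposes side by side: with input rows a, b, c, d in r0..r3, r0 ends up holding a0 b0 c0 d0 a4 b4 c4 d4, and likewise for r1..r3 with columns 1/5, 2/6 and 3/7.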