Add ARMv4/ARMv5E macros.

Original patch by Aurélien Zanelli <aurelien.zanelli@parrot.com>:
 http://lists.xiph.org/pipermail/opus/2013-May/002078.html

Revised version:
- Add autoconf detection (ported from libtheora).
- Rename ARM5E to ARMv5E (an ARM5 is not the same thing as ARMv5!).
- Use actual macros so they can still be selectively overridden.
- Split out ARMv4 parts and add a few more ARMv4 macros.
- Label blocks to make them easy to find in generated assembly.
- Fix MULT16_32_Q15() so we can pass make check.
  The MDCT test passes in values larger than 2**30 for b.
  The new version should be just as fast (or faster, since it's easier to
   merge the shift with following instructions), and there's no appreciable
   impact on accuracy (FFT/MDCT SNR actually goes up in most cases).
- Fix register constraints.
  We were using early-clobber flags in a bunch of places that didn't need
   them, and commutative-pair flags in a bunch of places that weren't
   actually commutative.
  This was Jean-Marc's fault (the original code came from Speex).
- Simplify silk_CLZ16().
- Port over iFFT C_MULC asm by Andree Buschmann
   <AndreeBuschmann@t-online.de> from Rockbox.
- Speed up the C_MULC asm by using LDRD, allowing more flexible addressing,
   re-ordering instructions to avoid some stalls, allowing more flexible
   register allocation, and getting things out of the inline asm block so
   the compiler can schedule them better.
- Add C_MUL and C_MUL4 asm for the FFT to the encoder, based on the new
   C_MULC.

In total, this patch gives a 22.3% speed-up on test_opus_encoder on a
 600 MHz Cortex A8 using gcc 4.2.1.
When restricted to ARMv4 optimizations, it gives a 9.6% speed-up on the same
 processor/compiler.

On the conformance test vectors:
Average mono quality is 97.0583 %
Average stereo quality is 97.775 %
author: Timothy B. Terriberry <tterribe@xiph.org> 2013-05-20 04:11:17 +0400
committer: Timothy B. Terriberry <tterribe@xiph.org> 2013-05-20 06:12:51 +0400
commit: 972a34ec2c79d241318af24389b8ee042d10556a (patch)
tree: 18894d8e576d351923ed57aacbdec125919d3ba8 /configure.ac
parent: b7bd4c20acfd951ba46647e07411285997d952f4 (diff)
1 files changed, 31 insertions, 1 deletions
diff --git a/configure.ac b/configure.ac
index 060bb523..1ccdca80 100644
--- a/configure.ac
+++ b/configure.ac
@@ -18,7 +18,6 @@ AC_CONFIG_SRCDIR(src/opus_encoder.c)
 dnl enable silent rules on automake 1.11 and later
 m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])
 
-
 # For libtool.
 dnl Please update these for releases.
 OPUS_LT_CURRENT=4
@@ -155,6 +154,36 @@ if test "x${float_approx}" = "xyes"; then
     AC_DEFINE([FLOAT_APPROX], , [Float approximations])
 fi
 
+cpu_arm=no
+AC_ARG_ENABLE(asm,
+    AS_HELP_STRING([--disable-asm], [Disable assembly optimizations]),
+    [ ac_enable_asm=$enableval ], [ ac_enable_asm=yes] )
+if test "x${ac_enable_asm}" = xyes ; then
+    asm_optimization="no asm for your platform, please send patches"
+    case $host_cpu in
+    arm*)
+        cpu_arm=yes
+        AS_GCC_INLINE_ASSEMBLY([asm_optimization="ARM"],
+            [asm_optimization="disabled"])
+        if test "x${asm_optimization}" = "xARM" ; then
+            AC_DEFINE([ARMv4_ASM], [], [Use generic ARMv4 asm optimizations])
+            AS_ASM_ARM_EDSP([ARMv5E_ASM=1],[ARMv5E_ASM=0])
+            if test "x${ARMv5E_ASM}" = "x1" ; then
+                AC_DEFINE(ARMv5E_ASM, 1, [Use ARMv5E asm optimizations])
+                asm_optimization="${asm_optimization} (EDSP)"
+            fi
+            AS_ASM_ARM_MEDIA([ARMv6_ASM=1],[ARMv6_ASM=0])
+            if test "x${ARMv6_ASM}" = "x1" ; then
+                AC_DEFINE(ARMv6_ASM, 1, [Use ARMv6 asm optimizations])
+                asm_optimization="${asm_optimization} (Media)"
+            fi
+        fi
+        ;;
+    esac
+else
+    asm_optimization="disabled"
+fi
+
 ac_enable_assertions="no"
 AC_ARG_ENABLE(assertions, [  --enable-assertions        enable additional software error checking],
 [if test "$enableval" = yes; then
@@ -281,6 +310,7 @@ AC_MSG_RESULT([
       Floating point support: ........ ${ac_enable_float}
       Fast float approximations: ..... ${float_approx}
       Fixed point debugging: ......... ${ac_enable_fixed_debug}
+      Assembly optimization: ......... ${asm_optimization}
       Custom modes: .................. ${ac_enable_custom_modes}
       Assertion checking: ............ ${ac_enable_assertions}
       Fuzzing: ....................... ${ac_enable_fuzzing}
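
The branch structure of the new configure.ac block can be sketched as a plain
shell function. This is only an illustration, not code from the patch:
asm_summary is a hypothetical helper, and the results of the
AS_GCC_INLINE_ASSEMBLY, AS_ASM_ARM_EDSP and AS_ASM_ARM_MEDIA probes are passed
in as yes/no arguments (an assumption made here so the logic can run without
autoconf). It returns the string that would appear on the
"Assembly optimization" line of the configure summary.

```shell
# Hypothetical helper mirroring the decision logic of the added block.
# Arguments (all assumed inputs, not real configure variables):
#   $1  yes/no  value of --enable-asm (default yes in the patch)
#   $2  host CPU string, e.g. armv7l or x86_64
#   $3  yes/no  compiler accepts GCC inline assembly
#   $4  yes/no  assembler accepts EDSP (ARMv5E) instructions
#   $5  yes/no  assembler accepts Media (ARMv6) instructions
asm_summary() {
    enable_asm=$1
    host_cpu=$2
    inline_asm=$3
    edsp=$4
    media=$5

    # --disable-asm short-circuits everything else.
    if test "x${enable_asm}" != xyes; then
        echo "disabled"
        return 0
    fi

    # Default message for CPUs that have no asm support yet.
    summary="no asm for your platform, please send patches"
    case $host_cpu in
    arm*)
        if test "x${inline_asm}" = xyes; then
            # Generic ARMv4 asm is always used once inline asm works;
            # the EDSP and Media levels are added on top if available.
            summary="ARM"
            if test "x${edsp}" = xyes; then
                summary="${summary} (EDSP)"
            fi
            if test "x${media}" = xyes; then
                summary="${summary} (Media)"
            fi
        else
            summary="disabled"
        fi
        ;;
    esac
    echo "$summary"
}
```

For example, `asm_summary yes armv7l yes yes yes` prints "ARM (EDSP) (Media)",
while `asm_summary yes x86_64 no no no` prints the "please send patches"
message, and any call with `no` as the first argument prints "disabled".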