author | Timothy B. Terriberry <tterribe@xiph.org> | 2013-05-20 04:11:17 +0400 |
---|---|---|
committer | Timothy B. Terriberry <tterribe@xiph.org> | 2013-05-20 06:12:51 +0400 |
commit | 972a34ec2c79d241318af24389b8ee042d10556a (patch) | |
tree | 18894d8e576d351923ed57aacbdec125919d3ba8 /configure.ac | |
parent | b7bd4c20acfd951ba46647e07411285997d952f4 (diff) |
Add ARMv4/ARMv5E macros.
Original patch by Aurélien Zanelli <aurelien.zanelli@parrot.com>:
http://lists.xiph.org/pipermail/opus/2013-May/002078.html
Revised version:
- Add autoconf detection (ported from libtheora).
- Rename ARM5E to ARMv5E (an ARM5 is not the same thing as ARMv5!).
- Use actual macros so they can still be selectively overridden.
- Split out ARMv4 parts and add a few more ARMv4 macros.
- Label blocks to make them easy to find in generated assembly.
- Fix MULT16_32_Q15() so we can pass make check.
The MDCT test passes in values larger than 2**30 for b.
The new version should be just as fast (or faster, since it's
easier to merge the shift with following instructions), and
there's no appreciable impact on accuracy (FFT/MDCT SNR actually
goes up in most cases).
- Fix register constraints.
We were using early-clobber flags in a bunch of places that
didn't need them, and commutative-pair flags in a bunch of
places that weren't actually commutative.
This was Jean-Marc's fault (the original code came from Speex).
- Simplify silk_CLZ16().
- Port over iFFT C_MULC asm by Andree Buschmann
<AndreeBuschmann@t-online.de> from Rockbox.
- Speed up the C_MULC asm by using LDRD, allowing more flexible
addressing, re-ordering instructions to avoid some stalls,
allowing more flexible register allocation, and getting things
out of the inline asm block so the compiler can schedule them
better.
- Add C_MUL and C_MUL4 asm for the FFT to the encoder, based on the
new C_MULC.
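As a rough illustration of the MULT16_32_Q15() item above, the sketch below shows the result a 16-bit by 32-bit Q15 multiply is expected to produce, (a*b) >> 15, computed once with a 64-bit intermediate and once split into two 16x16 products. The function names are hypothetical and not taken from the patch. This is portable C for checking semantics, not the ARM asm, and it assumes that right-shifting a negative value is an arithmetic shift (as with gcc).

```c
#include <stdint.h>

/* Reference: full product in 64 bits, then drop the 15 fractional bits.
   Hypothetical helper name, not part of the patch. */
static int32_t mult16_32_q15_ref(int16_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * (int64_t)b) >> 15);
}

/* Split form: multiply a by the signed top half and the unsigned bottom
   half of b separately, so no intermediate needs more than 32 bits.
   Assumes the final result fits in 32 bits (a = -32768 combined with
   b near INT32_MIN would overflow). */
static int32_t mult16_32_q15_split(int16_t a, int32_t b)
{
    int32_t hi = (int32_t)a * (b >> 16);             /* a * top 16 bits (signed) */
    int32_t lo = (int32_t)a * (int32_t)(b & 0xffff); /* a * bottom 16 bits (unsigned) */
    return hi * 2 + (lo >> 15);
}
```

Both forms agree for inputs with b above 2**30; for example, a = 16384 (0.5 in Q15) and b = 0x50000000 give 671088640 either way.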
In total, this patch gives a 22.3% speed-up on test_opus_encoder on
a 600 MHz Cortex A8 using gcc 4.2.1.
When restricted to ARMv4 optimizations, it gives a 9.6% speed-up
on the same processor/compiler.
On the conformance test vectors:
Average mono quality is 97.0583 %
Average stereo quality is 97.775 %
Diffstat (limited to 'configure.ac')
-rw-r--r-- | configure.ac | 32 |
1 files changed, 31 insertions, 1 deletions
diff --git a/configure.ac b/configure.ac
index 060bb523..1ccdca80 100644
--- a/configure.ac
+++ b/configure.ac
@@ -18,7 +18,6 @@ AC_CONFIG_SRCDIR(src/opus_encoder.c)
 dnl enable silent rules on automake 1.11 and later
 m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])
 
-
 # For libtool.
 dnl Please update these for releases.
 OPUS_LT_CURRENT=4
@@ -155,6 +154,36 @@ if test "x${float_approx}" = "xyes"; then
   AC_DEFINE([FLOAT_APPROX], , [Float approximations])
 fi
 
+cpu_arm=no
+AC_ARG_ENABLE(asm,
+    AS_HELP_STRING([--disable-asm], [Disable assembly optimizations]),
+    [ ac_enable_asm=$enableval ], [ ac_enable_asm=yes] )
+if test "x${ac_enable_asm}" = xyes ; then
+  asm_optimization="no asm for your platform, please send patches"
+  case $host_cpu in
+  arm*)
+    cpu_arm=yes
+    AS_GCC_INLINE_ASSEMBLY([asm_optimization="ARM"],
+        [asm_optimization="disabled"])
+    if test "x${asm_optimization}" = "xARM" ; then
+      AC_DEFINE([ARMv4_ASM], [], [Use generic ARMv4 asm optimizations])
+      AS_ASM_ARM_EDSP([ARMv5E_ASM=1],[ARMv5E_ASM=0])
+      if test "x${ARMv5E_ASM}" = "x1" ; then
+        AC_DEFINE(ARMv5E_ASM, 1, [Use ARMv5E asm optimizations])
+        asm_optimization="${asm_optimization} (EDSP)"
+      fi
+      AS_ASM_ARM_MEDIA([ARMv6_ASM=1],[ARMv6_ASM=0])
+      if test "x${ARMv6_ASM}" = "x1" ; then
+        AC_DEFINE(ARMv6_ASM, 1, [Use ARMv6 asm optimizations])
+        asm_optimization="${asm_optimization} (Media)"
+      fi
+    fi
+    ;;
+  esac
+else
+  asm_optimization="disabled"
+fi
+
 ac_enable_assertions="no"
 AC_ARG_ENABLE(assertions, [ --enable-assertions enable additional software error checking],
 [if test "$enableval" = yes; then
@@ -281,6 +310,7 @@ AC_MSG_RESULT([
       Floating point support: ........ ${ac_enable_float}
       Fast float approximations: ..... ${float_approx}
       Fixed point debugging: ......... ${ac_enable_fixed_debug}
+      Assembly optimization: ......... ${asm_optimization}
       Custom modes: .................. ${ac_enable_custom_modes}
       Assertion checking: ............ ${ac_enable_assertions}
       Fuzzing: ....................... ${ac_enable_fuzzing}
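The configure logic above can define up to three macros: ARMv4_ASM (with an empty value), ARMv5E_ASM and ARMv6_ASM (both with value 1). The following is a minimal sketch of how C code might pick the most specific available tier from them. The enum and function names are hypothetical, and the most-specific-first ordering is an assumption rather than something this diff shows.

```c
/* In a real build these macros would come from the generated config.h.
   None are defined in this standalone sketch, so the generic path is
   selected. */
enum asm_tier { ASM_NONE = 0, ASM_ARMV4 = 4, ASM_ARMV5E = 5, ASM_ARMV6 = 6 };

/* Hypothetical helper: report which tier the preprocessor selected.
   defined() is used instead of a value test because ARMv4_ASM is
   defined with an empty value, which "#if ARMv4_ASM" cannot evaluate. */
static enum asm_tier selected_asm_tier(void)
{
#if defined(ARMv6_ASM)
    return ASM_ARMV6;
#elif defined(ARMv5E_ASM)
    return ASM_ARMV5E;
#elif defined(ARMv4_ASM)
    return ASM_ARMV4;
#else
    return ASM_NONE;
#endif
}
```

Compiling this with, for example, -DARMv5E_ASM=1 would make the helper return ASM_ARMV5E. With --disable-asm or on a non-ARM host, none of the macros is defined and the generic path remains.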