github.com/FFmpeg/FFmpeg.git
author Ganesh Ajjanagadde <gajjanagadde@gmail.com> 2016-01-14 01:59:26 +0300
committer Ganesh Ajjanagadde <gajjanagadde@gmail.com> 2016-01-16 00:46:13 +0300
commit 5989add4ab4e8e4daa406a66319b0a3b3faaa73d
tree fa59b1ead68044bf0a10a9a75b3f112ee36b0a36 /libavutil/x86/lls_init.c
parent d4ce63a1bf2520be7015df78dd8b042abe456c23

lavu/x86/lls: add fma3 optimizations for update_lls

This improves accuracy (very slightly) and speed for processors having
fma3.

Sample benchmark (fate flac-16-lpc-cholesky, Haswell):
old:
5993610 decicycles in ff_lpc_calc_coefs, 64 runs, 0 skips
5951528 decicycles in ff_lpc_calc_coefs, 128 runs, 0 skips

new:
5252410 decicycles in ff_lpc_calc_coefs, 64 runs, 0 skips
5232869 decicycles in ff_lpc_calc_coefs, 128 runs, 0 skips

Tested with FATE and --disable-fma3, also examined contents of lavu/lls-test.

Reviewed-by: James Almer <jamrial@gmail.com>
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>

Diffstat (limited to 'libavutil/x86/lls_init.c')
-rw-r--r-- libavutil/x86/lls_init.c 4
1 files changed, 4 insertions, 0 deletions
diff --git a/libavutil/x86/lls_init.c b/libavutil/x86/lls_init.c
index 81f141cbeb..9f0d862b0e 100644
--- a/libavutil/x86/lls_init.c
+++ b/libavutil/x86/lls_init.c
@@ -25,6 +25,7 @@
 
 void ff_update_lls_sse2(LLSModel *m, const double *var);
 void ff_update_lls_avx(LLSModel *m, const double *var);
+void ff_update_lls_fma3(LLSModel *m, const double *var);
 double ff_evaluate_lls_sse2(LLSModel *m, const double *var, int order);
 
 av_cold void ff_init_lls_x86(LLSModel *m)
@@ -38,4 +39,7 @@ av_cold void ff_init_lls_x86(LLSModel *m)
     if (EXTERNAL_AVX_FAST(cpu_flags)) {
         m->update_lls = ff_update_lls_avx;
     }
+    if (EXTERNAL_FMA3(cpu_flags) && !(cpu_flags & AV_CPU_FLAG_AVXSLOW)) {
+        m->update_lls = ff_update_lls_fma3;
+    }
 }
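
The hunk above relies on ordering: each later check overwrites the function pointer set by an earlier one, so the most capable implementation that passes its check is the one that stays installed. Below is a self-contained C sketch of that dispatch pattern. Every name in it (the CPU_* bits, Model, init_model, the update_* stubs) is hypothetical and only stands in for the real AV_CPU_FLAG_* values and LLSModel. The C fallback and the SSE2 check are assumptions, since they are not visible in this diff.

```c
#include <stddef.h>

/* Hypothetical feature bits, standing in for the AV_CPU_FLAG_* values. */
enum {
    CPU_SSE2    = 1 << 0,
    CPU_AVX     = 1 << 1,
    CPU_FMA3    = 1 << 2,
    CPU_AVXSLOW = 1 << 3, /* AVX is present but not worth using */
};

typedef const char *(*update_fn)(void);

/* Stub implementations; each returns its own name so callers can tell
 * which one was selected. */
static const char *update_c(void)    { return "c"; }
static const char *update_sse2(void) { return "sse2"; }
static const char *update_avx(void)  { return "avx"; }
static const char *update_fma3(void) { return "fma3"; }

typedef struct Model {
    update_fn update;
} Model;

/* Install the best available implementation. Checks run from least to
 * most capable, so a later match overrides an earlier one. */
static void init_model(Model *m, int cpu_flags)
{
    m->update = update_c; /* assumed portable fallback */
    if (cpu_flags & CPU_SSE2)
        m->update = update_sse2;
    if ((cpu_flags & CPU_AVX) && !(cpu_flags & CPU_AVXSLOW))
        m->update = update_avx;
    if ((cpu_flags & CPU_FMA3) && !(cpu_flags & CPU_AVXSLOW))
        m->update = update_fma3;
}
```

Under these assumptions, a CPU reporting SSE2, AVX and FMA3 ends up with update_fma3, while the same CPU with CPU_AVXSLOW set keeps update_sse2, mirroring the extra AVXSLOW guard on the FMA3 branch in the patch.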