Age | Commit message (Collapse) | Author |
|
Rename the called dsp init functions to *_init_x86.
|
|
The length is even, so some unrolling can be performed. Timings are for x86:
- 32bits: 102c -> 82c
- 64bits: 82c -> 69c
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
The 32bits targets have been compiled with -mfpmath=sse for proper reference.
sbr_sum_square C /32bits: 82c (unrolled)/102c
C /64bits: 69c (unrolled)/82c
SSE/32bits: 42c
SSE/64bits: 31c
Use of SSE4.1 dpps to perform the final sum is slower.
Not unrolling to perform 8 operations in a loop yields 10 more cycles.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
|
|
Overall speedup of HE-AAC decoding 2.3x on Cortex-A8, 1.2x on A9.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|
|
This prepares for assembly optimisations by moving the most
time-consuming loops to functions called through pointers
in a new context.
Signed-off-by: Mans Rullgard <mans@mansr.com>
|