Age | Commit message | Author |
|
|
|
Call celt_inner_prod_neon() and remove redundant code.
Change-Id: I980e94330ae75c10297b9035fac221515aee144f
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
|
|
The floating-point optimizations are not bit exact with the C functions
because the floating-point operations are performed in a different order.
They are, however, bit exact with the simulation C functions, which
reproduce the order of floating-point operations used in the optimizations.
Change-Id: I149fda5b602fd5712b16fc8983df3c6c0c9e76ad
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
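The effect of operation order on bit exactness can be illustrated with a small sketch. The function names below are hypothetical and are not taken from the Opus sources; the second function only imitates how a 4-lane SIMD kernel might accumulate, which is an assumption about the general technique rather than the actual NEON code.

```c
#include <assert.h>

/* Plain C reference: accumulate the products strictly in index order. */
float inner_prod_seq(const float *x, const float *y, int n)
{
    float sum = 0.0f;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Hypothetical "simulation" function: keeps four partial sums, the way a
 * 4-lane SIMD kernel might, and combines them only at the end. */
float inner_prod_lanes4(const float *x, const float *y, int n)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    float sum;
    int i;
    assert(n >= 0);
    for (i = 0; i + 4 <= n; i += 4)
        for (int k = 0; k < 4; k++)
            lane[k] += x[i + k] * y[i + k];
    /* Horizontal add in the order (lane0 + lane2) + (lane1 + lane3). */
    sum = (lane[0] + lane[2]) + (lane[1] + lane[3]);
    /* Scalar tail for any leftover elements. */
    for (; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```

With x = {1e8f, 1.0f, -1e8f, 1.0f} and y all ones, the two functions round differently on a platform that evaluates float in single precision, even though both compute the same mathematical sum. A C function written in the second style can be compared bit for bit against a vector kernel that uses the same accumulation order.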
|
|
This optimization is bit exact with C functions.
Change-Id: Ia9ce6dd3c20d2f56dbd43ddc02d1a6fd6554608d
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
|
|
Should call celt_inner_prod().
This requires changing the API of celt_pitch_xcorr() to pass in
"arch".
We tested on x86 and ARM and got results that are bit exact with the original.
Change-Id: I606915da6a196f327ce81f4a5ae32811f4c1fabb
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
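A minimal sketch of why the caller needs "arch": if the inner product is selected at run time through a table indexed by an architecture value, every function that forwards to it must receive that value as well. All names, types and the table layout below are hypothetical and simplified; they are not the libopus definitions.

```c
#include <assert.h>
#include <stdint.h>

#define ARCH_COUNT 2  /* hypothetical: 0 = plain C, 1 = "optimized" */

typedef int32_t (*inner_prod_fn)(const int16_t *x, const int16_t *y, int n);

static int32_t inner_prod_c(const int16_t *x, const int16_t *y, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += (int32_t)x[i] * y[i];
    return sum;
}

/* Stand-in for a SIMD variant: unrolled by two, same integer result. */
static int32_t inner_prod_unrolled(const int16_t *x, const int16_t *y, int n)
{
    int32_t even = 0, odd = 0;
    int i;
    for (i = 0; i + 1 < n; i += 2) {
        even += (int32_t)x[i] * y[i];
        odd += (int32_t)x[i + 1] * y[i + 1];
    }
    if (i < n)
        even += (int32_t)x[i] * y[i];
    return even + odd;
}

static const inner_prod_fn INNER_PROD_IMPL[ARCH_COUNT] = {
    inner_prod_c,
    inner_prod_unrolled,
};

/* Run-time dispatch on the architecture index. */
int32_t inner_prod(const int16_t *x, const int16_t *y, int n, int arch)
{
    assert(arch >= 0 && arch < ARCH_COUNT);
    return INNER_PROD_IMPL[arch](x, y, n);
}

/* The cross-correlation takes "arch" only so that it can forward it.
 * y must hold at least len + max_pitch - 1 samples. */
void pitch_xcorr(const int16_t *x, const int16_t *y, int32_t *xcorr,
                 int len, int max_pitch, int arch)
{
    for (int lag = 0; lag < max_pitch; lag++)
        xcorr[lag] = inner_prod(x, y + lag, len, arch);
}
```

Because the arithmetic here is integer, both table entries give identical results, which mirrors the bit-exactness check described in the message.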
|
|
Should call celt_inner_prod().
This change is bit exact with the original, except for x86 floating-point.
In x86 floating-point builds, it calls celt_inner_prod_sse(), which may
produce different output because the order of floating-point operations changes.
Change-Id: Ia2381e2e198a84296ac28305183f15be842b3454
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
|
|
|
|
|
|
We boost bands that either cause leakage or are filled with leakage
|
|
It seems like letting CBR use up to 2/3 of the bits is still a win
|
|
We now include the object files for those rather than attempt to
problems.
|
|
|
|
The code would have run fine on 32-bit archs, but would have overflowed
on a 16-bit arch.
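The message does not show the affected expression, so the following is only a generic sketch of this class of bug, with hypothetical function names. On a target where int is 16 bits wide, a product that fits comfortably in 32 bits overflows unless an operand is widened first; the helper below emulates the typical two's-complement wrap-around so the difference is visible on any host.

```c
#include <stdint.h>

/* Emulate what a two's-complement 16-bit "int" would keep of a value. */
int32_t wrap_to_int16(int32_t v)
{
    uint16_t u = (uint16_t)v;  /* well-defined reduction modulo 2^16 */
    return (u & 0x8000u) ? (int32_t)u - 65536 : (int32_t)u;
}

/* Portable version: widen explicitly before multiplying. */
int32_t scale_wide(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* What the same product would typically look like if the arithmetic
 * were carried out in a 16-bit int (emulated here, since real signed
 * overflow is undefined behaviour). */
int32_t scale_as_16bit_int(int16_t a, int16_t b)
{
    return wrap_to_int16((int32_t)a * (int32_t)b);
}
```

For a = 300 and b = 200 the widened product is 60000, while the emulated 16-bit result wraps to a negative value.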
|
|
Some informal tests seem to confirm that reducing the trim at 32-64 kbps
improves quality (better HF). It's not clear whether it's also the case
at 96 kb/s and above, so we're leaving it as is for those rates.
This corresponds to buildC in this thread:
https://hydrogenaud.io/index.php/topic,113985.0.html
Also see:
https://hydrogenaud.io/index.php/topic,111798.0.html
|
|
- celt/modes.c:430:14: warning: cast from 'const unsigned char *' to
'opus_int16 *' increases required alignment from 1 to 2 [-Wcast-align]
- 'C[0][1]' may be used uninitialized [-Wmaybe-uninitialized]
- Unused variable/parameter
- Value stored is never read
- MSVC warnings about "possible loss of data" due to type conversions
- MSVC warning C4146: unary minus operator applied to unsigned type
- silk/NLSF_del_dec_quant.c:137:20: warning: array subscript is above
array bounds [-Warray-bounds] (gcc -O3 false positive)
- src/mlp_train.h:39:20: warning: function declaration isn't a prototype
[-Wstrict-prototypes]
- Remove SMALL_FOOTPRINT code from SSE 4.1 FIR implementation, matching
the C implementation.
The clang -Wcast-align warnings with SSE intrinsics are a known
clang issue: https://llvm.org/bugs/show_bug.cgi?id=20670
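For the -Wcast-align item, one common way to read a 16-bit value from a byte buffer without the alignment-increasing cast is to copy it out with memcpy(). This is a generic sketch with hypothetical helper names; the message does not say which approach the commit itself took.

```c
#include <stdint.h>
#include <string.h>

/* Read an int16_t from a possibly unaligned byte pointer.
 * memcpy() has no alignment requirement, so no pointer cast is needed. */
int16_t load_int16(const unsigned char *p)
{
    int16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Matching store, again without casting the byte pointer. */
void store_int16(unsigned char *p, int16_t v)
{
    memcpy(p, &v, sizeof v);
}
```

Compilers usually turn these fixed-size copies into a single load or store where the target allows unaligned access.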
|
|
|
|
Wasn't worth it given the small code size of the alternative, which also
got refactored a little (still bit exact).
|
|
Cannot prove it's the correct value, but it's better than the previous
values, which sometimes segfaulted. The increase was made necessary by
120 ms frame size support.
|
|
libopus only uses the DSP module of Ne10, and never uses the init functions.
Signed-off-by: Michael Bradshaw <mjbshaw@google.com>
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
|
|
Signed-off-by: Michael Bradshaw <mjbshaw@google.com>
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
|
|
Broken by 76e831d. Without the .type directive, SIGILL may be produced
if the C code is compiled in Thumb mode, because the compiler may assume
that the asm symbol is also Thumb and call it using a BL instruction.
|
|
Casting to unsigned to avoid shifting negative values left.
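A minimal sketch of the technique, using a hypothetical helper name: left-shifting a negative signed value is undefined behaviour in C, whereas shifting the same bit pattern as an unsigned value is well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Shift left through uint32_t so that negative inputs are not shifted
 * as signed values. The conversion back to int32_t is implementation-
 * defined before C23 and wraps on common two's-complement compilers. */
int32_t shl32(int32_t a, int shift)
{
    assert(shift >= 0 && shift < 32);
    return (int32_t)((uint32_t)a << shift);
}
```

On such compilers, shl32(-3, 2) gives -12, the same value a two's-complement arithmetic shift would produce.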
|
|
The "mem" in celt_fir_c() either is contained in the head of input "x"
in reverse order already, or can be easily attached to the head of "x"
before calling the function. Removing argument "mem" can eliminate the
redundant buffer copies inside.
Update celt_fir_sse4_1() accordingly.
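The idea can be sketched as follows. This is a simplified floating-point illustration with a hypothetical name, signature and coefficient ordering, not the actual celt_fir_c() code: the caller passes a pointer "x" that is preceded in memory by "ord" history samples, so the filter reads its past input directly and needs no separate "mem" buffer or copy.

```c
#include <assert.h>

/* y[i] = x[i] + sum_{j=0}^{ord-1} num[j] * x[i-1-j]
 * Precondition: x[-ord] .. x[-1] are valid and hold the filter history. */
void fir_with_history(const float *x, const float *num, float *y,
                      int n, int ord)
{
    assert(n >= 0 && ord >= 0);
    for (int i = 0; i < n; i++) {
        float sum = x[i];
        for (int j = 0; j < ord; j++)
            sum += num[j] * x[i - 1 - j];  /* reads into the history for small i */
        y[i] = sum;
    }
}
```

A caller would lay out a single buffer as [history | new samples] and pass the address of the first new sample, for example fir_with_history(buf + ord, num, y, n, ord).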
|
|
This has the side effect of removing some C++-style comments.
Signed-off-by: Jean-Marc Valin <jmvalin@jmvalin.ca>
|
|
This could happen when we had more than 32 bits on the first hybrid band with
a transient just in the middle of the frame. The band would be split and the
first half of the frame could end up with non-zero energy, but not enough
bits for a pulse. Because it's the first band, no folding would be possible.
This would cause noise to be injected for the entire duration of the first
half and that noise should then get folded to higher bands.
|
|
|
|
The change also makes the analysis run for sampling rates of 16 kHz and 24 kHz
since the features are only computed on the 0-8 kHz band. The longer time
window (20 ms instead of 10 ms) makes the tonality estimator more reliable
for low-pitch harmonics.
|
|
That experiment never actually worked
|
|
We now try not to fold below band 17 since that produces a lot of harshness.
This mostly helps around 32-40 kb/s.
|
|
|
|
These rely on TF rather than short windows to avoid partial collapse.
|
|
The transient detector would trigger on low-pitch vowels, but we didn't have
enough bits to properly code the high bands as a transient, resulting in
partial collapse and unstable energy.
|
|
|
|
|
|
This makes it possible to use folding rather than LCG noise in the second CELT
band (9.6 to 12 kHz) in hybrid mode.
|
|
|
|
|
|
This makes the decoder ~2.5% faster on x86 because the stereo loop
takes the same processing time as one mono loop, due to the dependency chain.
|
|
|
|
Reordering the add with VERY_SMALL changes the loop-carried dependency chain from 2 adds + 1 mul
(11 cycles on Haswell) to 1 add + 1 mul (8 cycles). This makes the entire decoder about
1.5% faster.
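The reordering can be sketched on an assumed first-order recursive loop. The function names, the loop shape and the value of VERY_SMALL below are illustrative assumptions; the message does not show the actual code. In the first version both adds depend on the state "m" from the previous iteration; in the second, x[i] + VERY_SMALL does not depend on "m" and can be computed ahead of time, leaving one add and one multiply on the critical path.

```c
#define VERY_SMALL 1e-30f  /* assumed value, for illustration only */

/* Before: (x[i] + m) + VERY_SMALL, so both adds wait on m. */
void recur_before(const float *x, float *y, int n, float coef, float *mem)
{
    float m = *mem;
    for (int i = 0; i < n; i++) {
        float tmp = x[i] + m + VERY_SMALL;
        m = coef * tmp;
        y[i] = tmp;
    }
    *mem = m;
}

/* After: (x[i] + VERY_SMALL) + m, so only the second add waits on m. */
void recur_after(const float *x, float *y, int n, float coef, float *mem)
{
    float m = *mem;
    for (int i = 0; i < n; i++) {
        float tmp = x[i] + VERY_SMALL + m;
        m = coef * tmp;
        y[i] = tmp;
    }
    *mem = m;
}
```

Since floating-point addition is not associative, the two orderings are not guaranteed to round identically for every input, although they agree for typical signal magnitudes.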
|
|
|
|
|
|
We used the SSE reciprocal square root instruction to vectorize the search rather
than comparing one at a time with multiplies. Speeds up the entire encoder by 8-10%.
|
|
No measurable speed change.
|
|
Speeds up encoding by another ~1-2%
|
|
Speeds up CELT encoding by around 5% on x86
|
|
|
|
|
|
|
|
|