|
When compiling with asm enabled there's no point in compiling
C versions of DSP functions that have asm implementations using
instruction sets that the compiler can unconditionally use.
E.g. when compiling with -mssse3 we can remove the C version
of all functions with SSSE3 implementations.
This is accomplished using the compiler's dead code elimination
functionality.
The trimming can be configured with the new 'trim_dsp' meson option,
which is enabled by default when compiling in release mode.
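A minimal C sketch of the mechanism, not the actual implementation:
`TRIM_DSP`, `COMPILER_HAS_SSSE3`, `CPU_FLAG_SSSE3` and the `sum_*`
functions are hypothetical. GCC and clang predefine `__SSSE3__` when
SSSE3 code may be emitted unconditionally (e.g. with -mssse3). The
selection condition then becomes a compile-time constant, the fallback
return is unreachable, and dead code elimination can drop the otherwise
unreferenced static C function.

```c
#include <assert.h>

/* Hypothetical build option mirroring 'trim_dsp' (1 = trimming on). */
#ifndef TRIM_DSP
#define TRIM_DSP 1
#endif

/* GCC and clang define __SSSE3__ when SSSE3 may be used unconditionally. */
#ifdef __SSSE3__
#define COMPILER_HAS_SSSE3 1
#else
#define COMPILER_HAS_SSSE3 0
#endif

enum { CPU_FLAG_SSSE3 = 1 << 0 }; /* hypothetical runtime CPU flag */

typedef int (*sum_fn)(const int *v, int n);

/* Portable C fallback. */
static int sum_c(const int *v, int n) {
    assert(n >= 0);
    int s = 0;
    for (int i = 0; i < n; i++) s += v[i];
    return s;
}

/* Stand-in for an SSSE3 assembly implementation. */
static int sum_ssse3(const int *v, int n) {
    assert(n >= 0);
    int s = 0;
    for (int i = n - 1; i >= 0; i--) s += v[i];
    return s;
}

static sum_fn select_sum(unsigned cpu_flags) {
    /* If TRIM_DSP && COMPILER_HAS_SSSE3 is 1, this condition is always
     * true, the final return is dead code, and sum_c is never kept. */
    if ((TRIM_DSP && COMPILER_HAS_SSSE3) || (cpu_flags & CPU_FLAG_SSSE3))
        return sum_ssse3;
    return sum_c;
}
```

Without -mssse3 the runtime flag still decides, so the C version is
only discarded when the build guarantees it can never be selected.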
|
|
Only the primary strength can ever be large enough to result in
a negative shift value that requires clipping to zero.
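The shift in question is `damping - ulog2(strength)`, and a negative
value must never reach the right shift in the constrain step (shifting
by a negative amount is undefined in C). The sketch below checks the
claim by brute force over assumed 8-bit AV1 parameter ranges: damping
3 to 6 for luma and one lower for chroma, primary strengths 1 to 15,
secondary strengths 1, 2 and 4. The helper names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* floor(log2(v)) for v > 0. */
static int ulog2(unsigned v) {
    int r = 0;
    while (v >>= 1) r++;
    return r;
}

/* CDEF-style tap constraint; `shift` must be non-negative. */
static int constrain(int diff, int threshold, int shift) {
    const int adiff = abs(diff);
    const int v = imin(adiff, imax(0, threshold - (adiff >> shift)));
    return diff < 0 ? -v : v;
}

/* Smallest unclamped shift (damping - ulog2(strength)) over the assumed
 * ranges: luma damping 3..6, chroma damping one lower. */
static int min_unclamped_shift(int secondary) {
    static const int sec_strengths[] = {1, 2, 4};
    int lo = 1 << 30;
    for (int damping = 3; damping <= 6; damping++) {
        for (int chroma = 0; chroma <= 1; chroma++) {
            const int d = damping - chroma;
            if (secondary) {
                for (int i = 0; i < 3; i++)
                    lo = imin(lo, d - ulog2(sec_strengths[i]));
            } else {
                for (int s = 1; s <= 15; s++)
                    lo = imin(lo, d - ulog2(s));
            }
        }
    }
    return lo;
}
```

Under these ranges the primary shift can reach -1 (chroma damping 2,
strength 8 to 15) and needs `imax(0, ...)`, while the secondary shift
never drops below 0 and can be used as-is.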
|
|
Avoids some pointer chasing and simplifies the DSP code, at the cost
of making the initialization slightly more complicated.
Also slightly reduces memory usage by sizing the buffers for the
actual chroma layout instead of always allocating enough space for
4:4:4.
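A rough illustration of layout-dependent sizing. The enum, the helper
names and the notion of a fixed number of buffered rows per plane are
assumptions made for the example; only horizontal subsampling affects
the width of the chroma rows here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout enum. */
enum layout { LAYOUT_I400, LAYOUT_I420, LAYOUT_I422, LAYOUT_I444 };

/* Pixels needed to buffer `rows` lines of every plane of a frame that
 * is `width` pixels wide, sized for the actual layout. */
static size_t line_buf_pixels(size_t width, size_t rows, enum layout layout) {
    const size_t luma = width * rows;
    if (layout == LAYOUT_I400) return luma;          /* no chroma planes */
    const unsigned ss_hor = layout != LAYOUT_I444;   /* 4:2:0 and 4:2:2 */
    const size_t chroma_w = (width + ss_hor) >> ss_hor; /* round up */
    return luma + 2 * chroma_w * rows;
}

/* Worst-case sizing: always reserve room for three full-width planes. */
static size_t line_buf_pixels_444(size_t width, size_t rows) {
    return 3 * width * rows;
}
```

For a 1920-pixel-wide 4:2:0 frame and two buffered rows this gives
7680 pixels instead of 11520.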
|
|
The main feature is splitting the main filter code into three
different code paths depending on the strength values.
Clipping is only required when both the primary and secondary
strengths are non-zero, which is an uncommon case. Being able to
skip that complexity in the common cases is significantly faster.
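A simplified single-pixel sketch of the three paths, plus the trivial
case where both strengths are zero. The tap layout, the weights and all
names are illustrative rather than the real CDEF tables; what it shows
is that only the combined path tracks the minimum and maximum of the
taps and clips the result. The rounding `(8 + sum - (sum < 0)) >> 4`
rounds half away from zero and assumes an arithmetic right shift of
negative values, which GCC, clang and MSVC provide.

```c
#include <assert.h>
#include <stdlib.h>

#define NTAPS 4

/* Illustrative tap weights (in 1/16 units), not the real CDEF tables. */
static const int pri_w[NTAPS] = {4, 4, 2, 2};
static const int sec_w[NTAPS] = {2, 2, 1, 1};

static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

static int constrain(int diff, int threshold, int shift) {
    const int adiff = abs(diff);
    const int v = imin(adiff, imax(0, threshold - (adiff >> shift)));
    return diff < 0 ? -v : v;
}

static int apply_sum(int px, int sum) {
    return px + ((8 + sum - (sum < 0)) >> 4);
}

static int filter_pri(int px, const int *p, int pri, int pri_shift) {
    int sum = 0;
    for (int i = 0; i < NTAPS; i++)
        sum += pri_w[i] * constrain(p[i] - px, pri, pri_shift);
    return apply_sum(px, sum);
}

static int filter_sec(int px, const int *s, int sec, int sec_shift) {
    int sum = 0;
    for (int i = 0; i < NTAPS; i++)
        sum += sec_w[i] * constrain(s[i] - px, sec, sec_shift);
    return apply_sum(px, sum);
}

/* The uncommon case: accumulate both tap sets and clip to their range. */
static int filter_pri_sec(int px, const int *p, int pri, int pri_shift,
                          const int *s, int sec, int sec_shift) {
    int sum = 0, lo = px, hi = px;
    for (int i = 0; i < NTAPS; i++) {
        sum += pri_w[i] * constrain(p[i] - px, pri, pri_shift);
        sum += sec_w[i] * constrain(s[i] - px, sec, sec_shift);
        lo = imin(lo, imin(p[i], s[i]));
        hi = imax(hi, imax(p[i], s[i]));
    }
    return imin(imax(apply_sum(px, sum), lo), hi);
}

static int cdef_px(int px, const int *p, int pri, int pri_shift,
                   const int *s, int sec, int sec_shift) {
    if (pri && sec) return filter_pri_sec(px, p, pri, pri_shift, s, sec, sec_shift);
    if (pri) return filter_pri(px, p, pri, pri_shift);
    if (sec) return filter_sec(px, s, sec, sec_shift);
    return px; /* both strengths zero: nothing to filter */
}
```

With toy strengths large enough, the combined path can overshoot: for a
pixel of 100 with every tap at 104 the weighted sum is 72, which rounds
to +5, and the clip brings the result back to 104.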
|
|
When compiling in release mode, instead of just deleting assertions,
use them to give hints to the compiler. This allows for slightly
better code generation in some cases.
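One common way to turn assertions into optimizer hints, sketched for
GCC/clang (`__builtin_unreachable`) and MSVC (`__assume`). The
`ASSERT_HINT` macro name and the example function are hypothetical.
Debug builds keep the checked `assert`; release builds tell the compiler
that the condition always holds, so a false condition becomes undefined
behavior instead of a diagnostic.

```c
#include <assert.h>

/* Checked assert in debug builds, optimizer hint under NDEBUG. */
#ifdef NDEBUG
#if defined(__GNUC__) || defined(__clang__)
#define ASSERT_HINT(x) do { if (!(x)) __builtin_unreachable(); } while (0)
#elif defined(_MSC_VER)
#define ASSERT_HINT(x) __assume(x)
#else
#define ASSERT_HINT(x) ((void)0)
#endif
#else
#define ASSERT_HINT(x) assert(x)
#endif

/* Knowing that w is a positive multiple of 4 may let the compiler skip
 * the remainder handling it would otherwise emit when vectorizing. */
static int sum_row(const int *row, int w) {
    ASSERT_HINT(w > 0 && (w & 3) == 0);
    int s = 0;
    for (int x = 0; x < w; x++) s += row[x];
    return s;
}
```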
|
|
clang-8:
cdef_filter_4x4_8bpc_c: 436.6
cdef_filter_4x4_8bpc_vsx: 101.1
cdef_filter_4x8_8bpc_c: 827.7
cdef_filter_4x8_8bpc_vsx: 183.5
cdef_filter_8x8_8bpc_c: 1510.2
cdef_filter_8x8_8bpc_vsx: 289.1
gcc-9:
cdef_filter_4x4_8bpc_c: 403.2
cdef_filter_4x4_8bpc_vsx: 105.6
cdef_filter_4x8_8bpc_c: 825.5
cdef_filter_4x8_8bpc_vsx: 192.2
cdef_filter_8x8_8bpc_c: 1586.3
cdef_filter_8x8_8bpc_vsx: 295.0
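The NEON results further down are reported as speedups over C, while
these are raw timings. Dividing the C time by the VSX time puts them on
the same footing: about 4.3x, 4.5x and 5.2x for clang-8 and 3.8x, 4.3x
and 5.4x for gcc-9 (4x4, 4x8 and 8x8). A small C helper doing that
conversion, with the struct and function names chosen for the example:

```c
#include <assert.h>

/* One line of the timings above: C version vs. VSX version. */
struct bench {
    const char *name;
    double c_time, vsx_time;
};

static const struct bench clang8[] = {
    { "cdef_filter_4x4_8bpc",  436.6, 101.1 },
    { "cdef_filter_4x8_8bpc",  827.7, 183.5 },
    { "cdef_filter_8x8_8bpc", 1510.2, 289.1 },
};

static const struct bench gcc9[] = {
    { "cdef_filter_4x4_8bpc",  403.2, 105.6 },
    { "cdef_filter_4x8_8bpc",  825.5, 192.2 },
    { "cdef_filter_8x8_8bpc", 1586.3, 295.0 },
};

/* Speedup vs. C, the metric used for the NEON results. */
static double speedup(const struct bench *b) {
    assert(b->vsx_time > 0.0);
    return b->c_time / b->vsx_time;
}
```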
|
|
Removed arrays
|
|
Speedup vs C code: Cortex A53 A72 A73
cdef_filter_4x4_8bpc_neon: 4.62 4.48 4.76
cdef_filter_4x8_8bpc_neon: 4.82 4.80 5.08
cdef_filter_8x8_8bpc_neon: 5.29 5.33 5.79
|
|
This reduces potential ambiguity across the codebase, as the enum
is CDEF-specific and there is another, similar enum for loop
restoration. Alternatively, a single independent enum could be shared
by CDEF and loop restoration.
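A sketch of the naming scheme described above, with CDEF-specific and
loop-restoration-specific prefixes. The enum names, the flag names and
the helper are illustrative and may not match the actual identifiers.

```c
#include <assert.h>

/* Edge availability flags used only by CDEF. */
enum CdefEdgeFlags {
    CDEF_HAVE_LEFT   = 1 << 0,
    CDEF_HAVE_RIGHT  = 1 << 1,
    CDEF_HAVE_TOP    = 1 << 2,
    CDEF_HAVE_BOTTOM = 1 << 3,
};

/* A similar but separate enum for loop restoration. */
enum LrEdgeFlags {
    LR_HAVE_LEFT   = 1 << 0,
    LR_HAVE_RIGHT  = 1 << 1,
    LR_HAVE_TOP    = 1 << 2,
    LR_HAVE_BOTTOM = 1 << 3,
};

/* Hypothetical helper: number of missing horizontal neighbors. The
 * prefix makes it clear at the call site which filter a flag belongs to. */
static int cdef_missing_hor_edges(enum CdefEdgeFlags edges) {
    return !(edges & CDEF_HAVE_LEFT) + !(edges & CDEF_HAVE_RIGHT);
}
```

Sharing one unprefixed enum would also work, as noted above, but then
the names no longer say which filter they describe.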
|
|
Also reduce the scope of the tables to the functions where they are used.
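A minimal illustration of moving a lookup table from file scope into
its only user. Because the table stays `static const`, there is still a
single copy with static storage duration; only its visibility shrinks.
The 840/n table and the function are hypothetical examples, not code
from the commit (840 is the least common multiple of 1 through 8, so
every entry is exact).

```c
#include <assert.h>
#include <stdint.h>

/* Before: a file-scope table visible to the whole translation unit,
 *     static const uint16_t div_table[8] = { ... };
 * After: the same table, scoped to the function that uses it. */
static int scale_by_inv_len(int v, int n) {
    /* 840 / n for n = 1..8. */
    static const uint16_t div_table[8] = {
        840, 420, 280, 210, 168, 140, 120, 105
    };
    assert(n >= 1 && n <= 8);
    return v * div_table[n - 1];
}
```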
|
|
cdef_dir_8bpc_c: 629.3
cdef_dir_8bpc_avx2: 82.4
First 1000 frames of Chimera 1080p:
before: 0m23.084s
after: 0m21.860s
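Taking the numbers above at face value, Amdahl's law gives a rough idea
of how much of the decode time went to direction search. This rests on
assumptions that are ours, not the commit's: the whole wall-clock
saving comes from cdef_dir, the microbenchmark speedup carries over to
real decoding, and the run was effectively single-threaded. With
23.084 s going to 21.860 s (about 5.3% faster) and a 629.3 / 82.4, or
roughly 7.6x, kernel speedup, the estimate is f of about 0.061, i.e.
around 6% of the original runtime.

```c
#include <assert.h>

/* Amdahl's law: if a fraction f of the runtime is sped up by a factor
 * s, the new runtime is T * ((1 - f) + f / s). Solving for f from the
 * measured before/after times: */
static double amdahl_fraction(double t_before, double t_after, double s) {
    assert(t_before > 0.0 && s > 1.0);
    return (1.0 - t_after / t_before) / (1.0 - 1.0 / s);
}
```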
|
|
Fix #96
|