Use explicit parameter type detection and manually clobber the
upper bits instead of relying on internal compiler behavior.
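On x86-64 a 32-bit integer argument arrives in a 64-bit register, and the callee must not rely on the upper 32 bits. Below is a minimal C model of the idea. It assumes C11 `_Generic` as the type detection mechanism, and `CLOBBER_UPPER` and its helpers are hypothetical names, not checkasm's. In the real harness the clobbering happens in the asm call wrapper; C can only model the resulting 64-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical garbage pattern for the upper 32 bits. */
#define CLOBBER_PATTERN UINT64_C(0xdeadbeef00000000)

/* 32-bit integers: keep the low half, fill the upper half with garbage. */
static uint64_t clobber_i32(int32_t v)  { return CLOBBER_PATTERN | (uint32_t)v; }
static uint64_t clobber_u32(uint32_t v) { return CLOBBER_PATTERN | v; }

/* Other integer types (e.g. int64_t, ptrdiff_t) already fill the whole
 * register and pass through unchanged; pointers would need their own case. */
static uint64_t passthrough_u64(uint64_t v) { return v; }

/* Pick the conversion from the static type of the argument. */
#define CLOBBER_UPPER(x) _Generic((x), \
    int32_t:  clobber_i32,             \
    uint32_t: clobber_u32,             \
    default:  passthrough_u64)(x)
```

A sloppy asm function that reads the full 64-bit register instead of its 32-bit half would then see `0xdeadbeef` in the upper bits and produce wrong results, which the test harness can detect.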
The pattern matching feature has been improved and is now handled
by the new --function parameter, which makes this one obsolete.
Allows running checkasm only for functions matching a given pattern.
When compiling with asm enabled, there's no point in compiling
C versions of DSP functions that have asm implementations using
instruction sets the compiler can use unconditionally.
E.g. when compiling with -mssse3, we can remove the C versions
of all functions with SSSE3 implementations.
This is accomplished using the compiler's dead code elimination.
It can be configured using the new 'trim_dsp' meson option, which
is enabled by default when compiling in release mode.
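Here is a minimal sketch of the dead code elimination idea. It assumes a hypothetical `DSPContext` with a single function pointer and uses the `__SSSE3__` macro that GCC and Clang predefine under `-mssse3`. ORing the flags known at compile time into the runtime flags turns the early return into a constant check. When `__SSSE3__` is defined, the store of `add_c` is therefore dead, and the optimizer can drop the static `add_c` entirely. `add_ssse3` is a plain C stand-in for an asm implementation.

```c
#include <assert.h>

enum { CPU_FLAG_SSSE3 = 1 << 0 }; /* hypothetical flag bit */

/* Flags the compiler is allowed to assume unconditionally. */
#ifdef __SSSE3__
#define COMPILE_TIME_CPU_FLAGS CPU_FLAG_SSSE3
#else
#define COMPILE_TIME_CPU_FLAGS 0
#endif

/* Stand-in for real CPUID-based detection (hypothetical). */
static unsigned runtime_cpu_flags(void) { return 0; }

static unsigned get_cpu_flags(void)
{
    /* With -mssse3, CPU_FLAG_SSSE3 is known to be set at compile time. */
    return runtime_cpu_flags() | COMPILE_TIME_CPU_FLAGS;
}

typedef int (*add_fn)(int a, int b);

static int add_c(int a, int b)     { return a + b; }
static int add_ssse3(int a, int b) { return a + b; } /* asm stand-in */

typedef struct { add_fn add; } DSPContext;

static void dsp_init(DSPContext *c)
{
    c->add = add_c; /* dead store when SSSE3 is guaranteed at compile time */

    const unsigned flags = get_cpu_flags();
    if (!(flags & CPU_FLAG_SSSE3)) return;

    c->add = add_ssse3;
}
```

Without `-mssse3`, the runtime check stays and the C fallback is kept, so the behavior only changes in builds that cannot run on CPUs lacking SSSE3 anyway.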
Enabling/disabling signal handlers is very slow and requires a syscall.
A better approach is to keep the signal handlers enabled all the time,
and use a simple flag variable to determine if a given signal should
be handled or passed on to the default signal handler.
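Here is a minimal POSIX sketch of that approach, with hypothetical names throughout. The handlers are installed once, and a `sig_atomic_t` flag marks when a tested function is running. Setting or clearing the flag is a plain store rather than a `sigaction()` syscall. Signals arriving outside that window get the default action.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t catch_signals; /* set only while fn runs */
static volatile sig_atomic_t caught_signal;
static sigjmp_buf jmp_env;

static void signal_handler(int sig)
{
    if (!catch_signals) {
        /* Not ours: switch to the default action and re-raise. The signal
         * is blocked while this handler runs, so it is delivered with the
         * default disposition once the handler returns. */
        signal(sig, SIG_DFL);
        raise(sig);
        return;
    }
    catch_signals = 0;
    caught_signal = sig;
    siglongjmp(jmp_env, 1);
}

/* Installed once at startup instead of around every call. */
static void install_signal_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = signal_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGILL, &sa, NULL);
    sigaction(SIGFPE, &sa, NULL);
    sigaction(SIGSEGV, &sa, NULL);
}

/* Runs fn and returns 0, or the number of the signal it triggered. */
static int run_guarded(void (*fn)(void))
{
    caught_signal = 0;
    if (sigsetjmp(jmp_env, 1)) /* savemask=1 restores the pre-call mask */
        return caught_signal;
    catch_signals = 1; /* plain store, no syscall */
    fn();
    catch_signals = 0;
    return 0;
}

/* Demo callees. */
static void raise_sigill(void) { raise(SIGILL); }
static void do_nothing(void) {}
```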
GetTickCount() increases at a very low frequency, >10 ms per tick.
When running multiple loops of checkasm instances in parallel,
different instances regularly end up using identical seeds.
Prefer QueryPerformanceCounter() instead, which ticks at a
significantly higher rate and thereby increases randomness.
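Here is a minimal sketch of deriving the seed from a high-resolution counter. The helper names are hypothetical, and the `clock_gettime(CLOCK_MONOTONIC)` branch is an assumed non-Windows counterpart, not something the commit describes. The 64-bit count is folded into 32 bits by XOR, so the fast-changing low bits are kept.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* High-resolution tick counter. */
static uint64_t get_ticks(void)
{
#ifdef _WIN32
    LARGE_INTEGER t;
    QueryPerformanceCounter(&t); /* much finer than GetTickCount() */
    return (uint64_t)t.QuadPart;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
#endif
}

/* Fold the 64-bit counter into a 32-bit seed. */
static uint32_t get_seed(void)
{
    const uint64_t t = get_ticks();
    return (uint32_t)(t ^ (t >> 32));
}
```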
Fixes use of uninitialized value.
Verifying that the YMM state is clean when returning from assembly
functions helps catch potential issues with AVX/SSE transitions.
fg_data->num_y_points is used in generate_grain_uv, but is only set
after the call: move the initialization above.
Insert missing space.
Makes it possible to benchmark the different code paths individually.
Alternate between buffers when benchmarking in order to more
accurately measure throughput instead of latency.
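Here is a minimal sketch of an alternating benchmark loop, using hypothetical names and leaving out the timing code. Call `i` works on `bufs[i & 1]`, so one call's output is not the very next call's input. That lets consecutive calls overlap instead of forming one serial dependency chain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*kernel_fn)(uint8_t *buf, size_t n);

/* Hypothetical benchmark loop (timing omitted): alternate between two
 * buffers so that back-to-back calls don't depend on each other. */
static void bench_alternating(kernel_fn fn, uint8_t *const bufs[2],
                              size_t n, int iterations)
{
    for (int i = 0; i < iterations; i++)
        fn(bufs[i & 1], n);
}

/* Example in-place kernel used for demonstration. */
static void increment_kernel(uint8_t *buf, size_t n)
{
    for (size_t j = 0; j < n; j++)
        buf[j]++;
}
```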
resize_8bpc_c: 542599.0
resize_8bpc_ssse3: 87635.4
resize_8bpc_avx2: 67401.1
resize_8bpc_avx512icl: 50263.6
resize_16bpc_c: 573438.9
resize_16bpc_ssse3: 121505.2
resize_16bpc_avx2: 83293.4
resize_16bpc_avx512icl: 77974.8
This can't catch out of bounds reads (which is what caused the
crash in #380), but as long as reads and writes are properly matched,
it should catch the corresponding issues.
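Here is a minimal sketch of guard-byte checking. The pad size, fill value and helper names are hypothetical. Each buffer is surrounded by bytes holding a known pattern. A write past either end changes a guard byte and is detected after the call. An out-of-bounds read leaves the guards untouched, which is why reads can't be caught this way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_SIZE 16
#define GUARD_BYTE 0xAA

/* Layout: [GUARD_SIZE guard][n payload][GUARD_SIZE guard].
 * Returns a pointer to the payload. */
static uint8_t *guarded_init(uint8_t *storage, size_t n)
{
    memset(storage, GUARD_BYTE, GUARD_SIZE);
    memset(storage + GUARD_SIZE + n, GUARD_BYTE, GUARD_SIZE);
    return storage + GUARD_SIZE;
}

/* Returns true if both guard regions still hold GUARD_BYTE. */
static bool guards_intact(const uint8_t *payload, size_t n)
{
    const uint8_t *lo = payload - GUARD_SIZE;
    const uint8_t *hi = payload + n;
    for (size_t i = 0; i < GUARD_SIZE; i++)
        if (lo[i] != GUARD_BYTE || hi[i] != GUARD_BYTE)
            return false;
    return true;
}
```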
This is necessary if the configured dimensions aren't properly aligned.
Causes collisions with stdbool.h on some systems.
Merges the 3 threading parameters into a single `--threads=` argument.
Frame threading can still be controlled via the `--framedelay=` argument.
Internally, the threading model is now a global thread/task pool design.
Co-authored-by: Ronald S. Bultje <rsbultje@gmail.com>
Silences warnings when building using recent meson versions.
Improve the error message on failure to specify which registers
have been clobbered.
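Here is a minimal C model of building such an error message. The real comparison happens in the asm call wrapper; here the register snapshots are plain arrays, and the register names are illustrative. The helper lists the name of every register whose value changed across the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Writes the names of all registers whose value differs between `before`
 * and `after` into `out` (out_size > 0), e.g. "rbp, r13". Stops early if
 * `out` is too small. Returns the number of registers listed. */
static int describe_clobbers(const char *const names[], const uint64_t before[],
                             const uint64_t after[], int count,
                             char *out, size_t out_size)
{
    int clobbered = 0;
    size_t len = 0;
    out[0] = '\0';
    for (int i = 0; i < count; i++) {
        if (before[i] == after[i])
            continue;
        int w = snprintf(out + len, out_size - len, "%s%s",
                         clobbered ? ", " : "", names[i]);
        if (w < 0 || (size_t)w >= out_size - len)
            break; /* output truncated */
        len += (size_t)w;
        clobbered++;
    }
    return clobbered;
}
```

The failure report can then print something like `callee-saved registers clobbered: rbp, r13` instead of a generic message.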
Improves detection of register preservation issues etc.
This should help catch issues like the one fixed in
185194be2f4daf907c76ad8fdd763a701d3d4005, by making sure that we
call the benchmarked function at least once with the given parameters,
even when not benchmarking. Otherwise the benchmark code path is
essentially dead, untested code until somebody works on that piece
of code.
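Here is a minimal sketch of the idea. It assumes a hypothetical `BENCH` macro and a global `do_bench` switch, neither of which is checkasm's actual interface. The benchmark path always makes one call with the benchmark's arguments. Only the repeated, timed calls are skipped when benchmarking is disabled (timing omitted here).

```c
#include <assert.h>
#include <stdbool.h>

static bool do_bench; /* hypothetical: set by a --bench style option */

/* Always call once so the benchmark arguments are exercised in plain test
 * runs; only repeat the call when benchmarking. */
#define BENCH(func, ...)                                     \
    do {                                                     \
        func(__VA_ARGS__);                                   \
        if (do_bench)                                        \
            for (int bench_i = 0; bench_i < 1000; bench_i++) \
                func(__VA_ARGS__);                           \
    } while (0)

/* Demo callee that counts valid calls. */
static int calls;
static void kernel(int w, int h) { calls += (w > 0 && h > 0); }
```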
The pixel data as initialized by the test above only has proper
pixels up to whichever random 'w' it used last.
Clang 13 added support for warning about variables that are set but
not used. We disable warnings for unused parameters, but in this case
the parameter variable is also updated within the function, which
Clang warns about.
In 16 bpc, the pixels are 16-bit integers, but valid pixels are only
up to 12 bits, and the scaling buffer only contains 4096 elements.
The src pixels are normally supposed to be valid pixels, but when
processing blocks of 32 pixels at a time, the code can operate on
uninitialized pixels past the right edge.

                          Cortex A53      A72      A73  Apple M1
Before:
fgy_32x32xn_16bpc_neon:      10372.5   8194.4   8612.1      24.2
After:
fgy_32x32xn_16bpc_neon:      10837.9   8469.5   8885.1      24.6
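This is for illustration only, since the message doesn't spell out the exact fix. One way to keep the lookup inside the 4096-entry scaling buffer, even for uninitialized 16-bit values past the right edge, is to mask the index to 12 bits. Clamping with a min is another option. Below is a minimal scalar sketch; the function name and table element type are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SCALING_SIZE 4096 /* covers the 12-bit pixel range */

/* Looks up the scaling value for one 16 bpc pixel. Only the low 12 bits
 * are used, so garbage in the upper bits cannot index past the table. */
static uint8_t scaling_lookup(const uint8_t scaling[SCALING_SIZE], uint16_t px)
{
    return scaling[px & (SCALING_SIZE - 1)];
}
```

For valid pixels the mask is a no-op, so results are unchanged; it only matters for the out-of-range values read past the edge.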
Windows Desktop
Don't call them when targeting e.g. UWP.
This requires building with a new enough SDK that has the
winapifamily.h header (and includes it implicitly from the regular
platform headers); it has been available since the Windows 8.0 SDK
(and since mingw-w64 v3.0.0), so it should be safe.
Also rewrite the GetProcAddress call to avoid calling it if
GetModuleHandleW(L"kernel32.dll") returns NULL for some reason.
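Here is a minimal sketch of the guarded lookup. The helper name and the idea of passing the export name as a string are hypothetical, because the specific functions the commit refers to aren't named in this message. The partition check needs its own nested `#if`, since `WINAPI_FAMILY_PARTITION` doesn't exist outside Windows headers.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#endif

typedef void (*generic_fn)(void);

/* Returns an optional kernel32 export, or NULL if it is unavailable. */
static generic_fn lookup_kernel32(const char *name)
{
#ifdef _WIN32
#if WINAPI_FAMILY_PARTITION(WINAPI_PARTITION_DESKTOP)
    HMODULE kernel32 = GetModuleHandleW(L"kernel32.dll");
    if (!kernel32) /* never pass a NULL module on to GetProcAddress() */
        return NULL;
    return (generic_fn)GetProcAddress(kernel32, name);
#else
    (void)name; /* e.g. UWP: don't call the desktop-only APIs at all */
    return NULL;
#endif
#else
    (void)name; /* non-Windows: nothing to look up */
    return NULL;
#endif
}
```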
This clearly specifies how much overwrite is allowed.
This allows specifying that the actual buffers are allocated with some
alignment, allowing the implementations to overwrite the area between
the intended width and the aligned width, but not past that.
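Here is a minimal sketch of checking that rule for one row of bytes, with hypothetical helper names. Bytes between the intended width and the aligned width may change. Bytes from the aligned width onward must still hold the padding value the row was initialized with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rounds w up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t w, size_t align)
{
    return (w + align - 1) & ~(align - 1);
}

/* Bytes in [w, align_up(w, align)) may have been overwritten, but bytes
 * from the aligned width up to row_size must still equal `pad`. */
static bool row_overwrite_ok(const uint8_t *row, size_t w, size_t align,
                             size_t row_size, uint8_t pad)
{
    for (size_t x = align_up(w, align); x < row_size; x++)
        if (row[x] != pad)
            return false;
    return true;
}
```

For example, with `w = 13` and 16-byte alignment, writes to bytes 13..15 are accepted, while a write to byte 16 is reported.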