|
PiperOrigin-RevId: 406772541
|
|
PiperOrigin-RevId: 407005437
|
|
PiperOrigin-RevId: 406766575
|
|
fall back on AVX, not StandardCpp
PiperOrigin-RevId: 405900310
|
|
PiperOrigin-RevId: 404698692
|
|
PiperOrigin-RevId: 404697829
|
|
I was getting incorrect results on some environments and this turned out to be the cause.
Closes https://github.com/google/ruy/pull/276
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/276 from keichi:add-missing-volatile e2d89fe29ce36510a08c704b603f513729713faf
PiperOrigin-RevId: 396351130
|
|
This fixes the following compilation error:
```
In file included from /usr/include/c++/10/vector:67,
from /home/keichi/Projects/ruy/ruy/test_overflow_dst_zero_point.cc:32:
/usr/include/c++/10/bits/stl_vector.h: In instantiation of ‘class std::vector<const signed char>’:
/home/keichi/Projects/ruy/ruy/test_overflow_dst_zero_point.cc:75:24: required from here
/usr/include/c++/10/bits/stl_vector.h:401:66: error: static assertion failed: std::vector must have a non-const, non-volatile value_type
401 | static_assert(is_same<typename remove_cv<_Tp>::type, _Tp>::value,
| ^~~~~
```
Closes https://github.com/google/ruy/pull/278
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/278 from keichi:fix-compilation-err 2e30471e9ce525f3a62337078cf2e80f17c966ff
PiperOrigin-RevId: 395965795
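
The error above comes from libstdc++ rejecting a `const` element type for `std::vector`. A minimal sketch of the usual workaround follows: keep the element type non-const and put the `const` on the container (or on a reference to it). The helper names `SumValues` and `MakeTestData` are hypothetical and do not come from the ruy test file.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// std::vector<const std::int8_t> bad;  // rejected by libstdc++: the
//                                       // value_type must be non-const.

// Hypothetical helper: read-only access is expressed by taking the
// container by const reference, not by a const element type.
int SumValues(const std::vector<std::int8_t>& values) {
  int sum = 0;
  for (std::int8_t v : values) sum += v;
  return sum;
}

// Hypothetical helper: the const qualifier sits on the vector itself.
std::vector<std::int8_t> MakeTestData() {
  const std::vector<std::int8_t> data = {-128, 0, 127};
  assert(data.size() == 3);
  return data;  // Copies the immutable local into the result.
}
```

Calling `SumValues(MakeTestData())` iterates the data without being able to modify it, which is what the `const` element type was presumably meant to guarantee.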
|
|
The build failed on 32-bit ARM; I think it's just a typo.
Closes https://github.com/google/ruy/pull/274
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/274 from metarutaiga:master f40e88396c1031289dbda5a0c98893557509542e
PiperOrigin-RevId: 380820893
|
|
PiperOrigin-RevId: 380178286
|
|
Kernels perform the addition of the destination zero_point in int16.
This addition needed to be saturating to avoid wrapping around.
Thanks to Marat Dukhan for reporting and debugging this issue.
Additionally, this commit:
- adds test coverage for the new Cortex-X1 tuned kernels.
- adds Context::get_runtime_enabled_paths() to query the runtime
CPU detection from the public Context interface.
- updates the Bazel-to-CMake converter to support some minor
recent BUILD changes.
PiperOrigin-RevId: 379778779
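
As an illustration of the saturating addition described above, here is a scalar sketch. It is not ruy's kernel code (the kernels are SIMD); the function name `SaturatingAddInt16` is hypothetical and only shows the intended arithmetic: widen, add, then clamp to the int16 range instead of wrapping around.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical scalar model of a saturating int16 addition.
std::int16_t SaturatingAddInt16(std::int16_t a, std::int16_t b) {
  // Widen to int32 so the sum itself cannot overflow.
  const std::int32_t wide =
      static_cast<std::int32_t>(a) + static_cast<std::int32_t>(b);
  const std::int32_t lo = std::numeric_limits<std::int16_t>::min();
  const std::int32_t hi = std::numeric_limits<std::int16_t>::max();
  // Clamp to [lo, hi] rather than letting the value wrap.
  const std::int32_t clamped = std::min(hi, std::max(lo, wide));
  assert(clamped >= lo && clamped <= hi);
  return static_cast<std::int16_t>(clamped);
}
```

With a wrapping addition, an accumulator of 32000 plus a zero_point of 1000 would come out negative; the sketch returns 32767 instead.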
|
|
PiperOrigin-RevId: 374225894
|
|
PiperOrigin-RevId: 373147434
|
|
|
|
PiperOrigin-RevId: 371435881
|
|
PiperOrigin-RevId: 370495857
|
|
AVX-512:
- broadcast without extra instruction (code size)
- use native mask ops
- re-roll mmm loop
AVX2: avoid slow permute, especially for AMD
PiperOrigin-RevId: 369907385
|
|
PiperOrigin-RevId: 369496892
|
|
to every task execution of every thread.
PiperOrigin-RevId: 366919663
|
|
application.
PiperOrigin-RevId: 361953871
|
|
PiperOrigin-RevId: 361951187
|
|
overflow.
PiperOrigin-RevId: 360298662
|
|
Alter sequence to a single rounded scaling with normal rounded shift.
Double rounding and symmetric rounding are removed compared to
reference. Double rounding seems unnecessary and can complicate
implementations. Moreover, symmetric rounding also adds implementation
complexity.
For NEON the new sequence can be translated to VQDMULH + VRSHR.
Closes https://github.com/google/ruy/pull/227
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/227 from GeorgeARM:mul_pr dec00bd87a8815fdad79d302494430aa63522752
PiperOrigin-RevId: 356539687
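
A scalar sketch of a single rounded scaling may help make the description above concrete. It assumes a Q31 fixed-point multiplier and a power-of-two exponent `shift`. The name `ScaleWithSingleRounding` and the parameter layout are hypothetical, and the sketch is not the NEON sequence itself. Rounding happens exactly once, by adding half of the divisor before one arithmetic right shift, so ties round upward rather than symmetrically away from zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model: computes x * multiplier_q31 * 2^shift / 2^31
// with one rounding step. No saturation is applied in this sketch.
std::int32_t ScaleWithSingleRounding(std::int32_t x,
                                     std::int32_t multiplier_q31,
                                     int shift) {
  const int total_shift = 31 - shift;
  assert(total_shift >= 1 && total_shift <= 62);
  // Half of the divisor, added once before the shift.
  const std::int64_t round = std::int64_t{1} << (total_shift - 1);
  const std::int64_t product =
      static_cast<std::int64_t>(x) * multiplier_q31;
  // Assumes >> on a negative int64 is an arithmetic shift, which holds on
  // mainstream compilers and is guaranteed from C++20 onward.
  return static_cast<std::int32_t>((product + round) >> total_shift);
}
```

With a multiplier of `1 << 30` (0.5 in Q31) and `shift` of 0, an input of 3 gives 2 and an input of -3 gives -1, which shows the non-symmetric tie handling.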
|
|
Closes https://github.com/google/ruy/pull/251
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/251 from bjacob:relax c8d2cf94d15abd4a9fd4222619c42413952f0fb1
PiperOrigin-RevId: 356340585
|
|
|
|
Closes https://github.com/google/ruy/pull/249
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/249 from bjacob:tests-disabled-when-submodule 3a33bb081acadca3520edeae2c226827e9fe0f89
PiperOrigin-RevId: 353298619
|
|
Multiplying by 0 by default is unfriendly to people getting familiar
with ruy, who then have to debug why their output values are all 0.
With a default of 1, tiny toy examples might output sane values,
anything beyond that will saturate, and seeing all saturated values will
be a hint that something needs to be set to rescale values.
Closes https://github.com/google/ruy/pull/248
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/248 from bjacob:multiplier-default 3fb1152e899fffc1f9fa9103b533348599ca494f
PiperOrigin-RevId: 353077204
|
|
|
|
|
|
- Following XNNPACK's example, in CMakeLists.txt, skip including our own
third_party/ directories if the target is already defined. This means that
IREE embedding ruy as a third_party/ dep does not need to have its
submodules checked out, ruy can use IREE's own cpuinfo and googletest.
- Switch open-source builds to using the stripped-include-paths flavor
of cpuinfo (like IREE is already using).
PiperOrigin-RevId: 352871140
|
|
- Switch to same colors as in ruy html traces
- Move `:thread_pool` to its own yellow color for consistency with ruy traces
- Drop `:validate`
- Drop the legend, which will be redundant once the markdown docs show these different materials side by side.
preview: https://github.com/google/ruy/blob/84dd41f433b3befad6c711248a5d0f00fd8b2711/doc/depgraph.svg
Closes https://github.com/google/ruy/pull/241
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/241 from bjacob:depgraph-update 8f2fa1d9a178c62b80fcc940c9d6ca5cf8ce3c41
PiperOrigin-RevId: 352858626
|
|
Reverts google/ruy#243
Closes https://github.com/google/ruy/pull/244
PiperOrigin-RevId: 352711630
|
|
(Sorry I merged this the wrong way the first time)
PiperOrigin-RevId: 352705468
|
|
This isn't a performance tracing framework (unlike the old ruy tracing).
This is about understanding what happens inside a ruy::Mul with a view
toward documenting how ruy works.
Added a 'parametrized_example' to help play with this tracing on any
flavor of ruy::Mul call. This also serves as a more elaborate example
of how to call ruy::Mul, and as a single binary containing several
different instantiations of the ruy::Mul template, which is useful
for measuring binary size and showing a breakdown of ruy symbols in
a document.
A few code changes beyond tracing slipped in:
- Improved logic in determining the traversal order in MakeBlockMap:
In rectangular cases, since we first do the top-level rectangularness
subdivision with linear traversal anyway, the traversal order only
applies within each subdivision past that, so it should be based
on sizes already divided by rectangularness. In practice this nudges
1000x400x2000 from kFractalHilbert to kFractalU on Pixel4, without
making an observable perf difference in that case.
- Removed the old RUY_BLOCK_MAP_DEBUG logging code: superseded.
Kept only a minimal hook to force a block_size_log2 choice.
- Wrote new comments on BlockMap internals.
- Fixed Ctx::set_runtime_enabled_paths to behave as documented:
passing Path::kNone reverts to the default behavior (auto detect).
- Exposed Context::set_runtime_enabled_paths.
- Renamed UseSimpleLoop -> GetUseSimpleLoop (easier to read trace).
PiperOrigin-RevId: 352695092
|
|
This reverts commit b87d6d2e65ca24ba38e9afbf1e9d0744dbda82d3.
|
|
* Add CMake support with a converter from Bazel. To update, run:
cmake/bazel_to_cmake.sh
This supports building and running tests also on Android, e.g.
```
cmake ../ruy -G Ninja \
-DCMAKE_TOOLCHAIN_FILE=~/android-ndk-r21d/build/cmake/android.toolchain.cmake \
-DANDROID_ABI=arm64-v8a \
-DANDROID_PLATFORM=android-29
cmake --build . -j12
ctest . -j12
```
Some parts of this were forked from IREE's cmake setup.
|
|
|
|
|
|
|
|
* Add git submodules: googletest and cpuinfo
* Let the Bazel WORKSPACE point to the git submodules.
|
|
PiperOrigin-RevId: 351931688
|
|
PiperOrigin-RevId: 351929118
|
|
PiperOrigin-RevId: 351657429
|
|
belong.
Also remove a useless #include in context.h.
PiperOrigin-RevId: 350645020
|
|
channel', as there is no multiplier here.
PiperOrigin-RevId: 348522764
|
|
between Eigen commits
011e0db31d1bed8b7f73662be6d57d9f30fa457a and bec72345d69917f475e577d23df0ca4ed967a4f0.
PiperOrigin-RevId: 348522159
|
|
PiperOrigin-RevId: 348517342
|
|
int32 accumulators.
PiperOrigin-RevId: 348511323
|
|
PiperOrigin-RevId: 342509771
|
|
PiperOrigin-RevId: 340457081
|