Age | Commit message | Author |
|
PiperOrigin-RevId: 474367386
|
|
It was accidentally forgotten in the change that introduced support for install rules. This module defines some of the variables that are used by the install rules.
PiperOrigin-RevId: 464915500
|
|
accidentally reverted by copybara export in
https://github.com/google/ruy/commit/fd42b25c4512913fd47a86826aecec7c9c3ee2b4
PiperOrigin-RevId: 460471200
|
|
Some crash reports on Android tell of segfaults at address
(2^32 - 8) at the line below evaluating is_local,
which I guess would happen if processor_count==0.
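As a purely illustrative sketch of how such an address could arise (the function names and the 8-byte stride are assumptions, not taken from the ruy or cpuinfo sources): on a 32-bit process, computing the offset of the "last" element of an empty range wraps around to 2^32 - 8, and a guard on the count avoids it.

```cpp
#include <cstdint>

// Hypothetical helper: byte offset of the last element in a range of `count`
// elements of `element_size` bytes, using 32-bit arithmetic as on a 32-bit
// Android process.
constexpr std::uint32_t LastElementByteOffset(std::uint32_t count,
                                              std::uint32_t element_size) {
  return (count - 1u) * element_size;  // wraps around when count == 0
}

// With count == 0 and an assumed 8-byte stride, the offset is 2^32 - 8.
static_assert(LastElementByteOffset(0, 8) == 0xFFFFFFF8u,
              "wraps to 2^32 - 8");

// Guarded variant: reports failure instead of producing a wrapped offset.
inline bool TryLastElementByteOffset(std::uint32_t count,
                                     std::uint32_t element_size,
                                     std::uint32_t* offset) {
  if (count == 0) {
    return false;  // empty range: there is no last element to look at
  }
  *offset = (count - 1u) * element_size;
  return true;
}
```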
PiperOrigin-RevId: 460454326
|
|
cpuinfo CMake build now defines the cpuinfo::cpuinfo alias so we don't
need to define it in ruy.
|
|
toolchains with -mcpu=cortex-a32.
PiperOrigin-RevId: 457055609
|
|
This allows projects that depend on ruy to use namespace qualified
target names regardless of whether they consume ruy through
add_subdirectory or find_package.
Closes https://github.com/google/ruy/pull/311
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/311 from petrhosek:cmake-alias c00fae2a56a567b216dea0b0fe7378c28ddadbbf
PiperOrigin-RevId: 451184466
|
|
This allows using ruy from other projects through find_package.
Closes https://github.com/google/ruy/pull/309
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/309 from petrhosek:cmake-install b6168af3fe06ec794f5b5253b5cc626ffac21916
PiperOrigin-RevId: 450959295
|
|
This was generated using cmake/bazel_to_cmake.sh.
Closes https://github.com/google/ruy/pull/310
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/310 from petrhosek:cmake-update ae8e526af2c2b5e915f542c008ac724929f848a6
PiperOrigin-RevId: 449763106
|
|
This pulls in the recent CMake build changes, specifically the support
for find_package, that we would like to use in the ruy CMake build in
the future.
|
|
On the clarity side, the thread main loop is now just:
while (GetNewStateOtherThanReady() == State::HasWork) {
RevertToReadyState();
}
On the efficiency side:
* Locking and atomic ops have been reduced. We used to lock state_mutex_
around the entire thread task execution; now we only lock it
around notify/wait on the state_cond_ condition_variable, so
this mutex is renamed state_cond_mutex_, which clarifies its purpose.
* We used to perform a redundant reload-acquire of the new state_ in
the main thread loop.
* Some accesses are demoted to relaxed because they are already ordered
by other release-acquire relationships.
* A notify_all becomes notify_one.
* Send all thread exit requests upfront so threads can exit in parallel.
A comment is added on Thread::task_ to explain the release-acquire
relationships making this all work.
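A minimal sketch of the pattern described above, not the actual ruy code: the class name `SketchWorker`, the `Exit` state, and the `StartWork`/`WaitUntilReady` methods are assumptions made for illustration. The mutex is held only around the state change and the wait, never while the task runs. Each side waits for the opposite condition, so at most one thread is ever blocked on the condition variable and `notify_one` suffices.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class SketchWorker {
 public:
  enum class State { Ready, HasWork, Exit };  // `Exit` is a hypothetical name

  SketchWorker() : thread_([this] { ThreadMain(); }) {}

  ~SketchWorker() {
    WaitUntilReady();  // never overwrite a pending HasWork state
    ChangeState(State::Exit);
    thread_.join();
  }

  // Outside thread only. Requires the worker to be in the Ready state.
  void StartWork(std::function<void()> task) {
    task_ = std::move(task);  // published by the release-store in ChangeState
    ChangeState(State::HasWork);
  }

  // Outside thread only. Blocks until the worker is back in the Ready state.
  void WaitUntilReady() {
    std::unique_lock<std::mutex> lock(state_cond_mutex_);
    state_cond_.wait(lock, [this] {
      return state_.load(std::memory_order_acquire) == State::Ready;
    });
  }

 private:
  // The mutex guards only the state change; the notify happens after unlock.
  void ChangeState(State new_state) {
    {
      std::lock_guard<std::mutex> lock(state_cond_mutex_);
      state_.store(new_state, std::memory_order_release);
    }
    state_cond_.notify_one();
  }

  // Worker thread only. Blocks until the state is something other than Ready.
  State GetNewStateOtherThanReady() {
    std::unique_lock<std::mutex> lock(state_cond_mutex_);
    State new_state = State::Ready;
    state_cond_.wait(lock, [this, &new_state] {
      new_state = state_.load(std::memory_order_acquire);
      return new_state != State::Ready;
    });
    return new_state;
  }

  // Worker thread only. Runs the task without holding the mutex.
  void RevertToReadyState() {
    task_();
    ChangeState(State::Ready);
  }

  void ThreadMain() {
    while (GetNewStateOtherThanReady() == State::HasWork) {
      RevertToReadyState();
    }
  }

  std::atomic<State> state_{State::Ready};
  std::mutex state_cond_mutex_;
  std::condition_variable state_cond_;
  std::function<void()> task_;
  std::thread thread_;  // declared last so the other members exist first
};
```

In this sketch the outside thread calls `StartWork` and then `WaitUntilReady` before submitting the next task; the real implementation may signal completion differently.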
Internal code is broken into functions that are only ever called
from the main thread, and functions that are only ever called from the
worker thread. That specialization made further simplifications and
performance gains obvious.
It was found by continuous integration that some TFLite users construct and destroy the context from two different threads, due to the use of reference-counting. That means that the notion of "main thread" is not that solid. Accordingly, instances of "main thread" in comments and identifiers have been rephrased as "outside thread" as opposed to worker thread.
Tested with TSan (also enabled on presubmits) so fairly confident that this
is correct.
PiperOrigin-RevId: 442697771
|
|
PiperOrigin-RevId: 440105621
|
|
PiperOrigin-RevId: 437140449
|
|
PiperOrigin-RevId: 436879056
|
|
It was complaining that the `TrMulTask* tasks` pointer was temporarily pointing to garbage as we first set it to point to the allocated buffer, then perform the actual TrMulTask object constructions using placement-new. Rewritten so that the pointer is kept a `char*` pointer during this allocation and placement-new business, and only assigned to a `TrMulTask*` pointer at the end.
Note: this is *not* a strict-aliasing issue. The CFI diagnostic here is unaffected by `-f[no-]strict-aliasing`, and the reason why this isn't a strict-aliasing violation is that the C++ spec makes an exception for `char` and a couple of other byte types, allowing byte buffers to alias objects of other types. So CFI is going beyond the C++ spec here --- this isn't an undefined-behavior report. This is apparently trying to enforce that if pointers are set at all then they must be set to point to a valid object of their element type. At least for types that have a vtable.
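The rewritten pattern can be sketched as follows. This is an illustration under assumptions, not the ruy code: `Task`, `CreateTasks` and `DestroyTasks` are hypothetical names, and `std::malloc` stands in for whatever allocator is actually used. The buffer stays a `char*` until every object has been constructed, and only then is a typed pointer formed.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a task type that has a vtable.
struct Task {
  explicit Task(int id_in) : id(id_in) {}
  virtual ~Task() {}
  virtual int Run() const { return id * 2; }
  int id;
};

// Allocates raw storage, constructs `count` Task objects in it with
// placement-new, and only then produces a Task* pointer.
Task* CreateTasks(std::size_t count) {
  char* buffer = static_cast<char*>(std::malloc(count * sizeof(Task)));
  if (buffer == nullptr) {
    return nullptr;
  }
  for (std::size_t i = 0; i < count; ++i) {
    new (buffer + i * sizeof(Task)) Task(static_cast<int>(i));
  }
  // Every element is now a live Task, so a typed pointer never sees garbage.
  return reinterpret_cast<Task*>(buffer);
}

// Destroys the objects explicitly, then releases the raw storage.
void DestroyTasks(Task* tasks, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    tasks[i].~Task();
  }
  std::free(tasks);
}
```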
PiperOrigin-RevId: 423326986
|
|
PiperOrigin-RevId: 415133840
|
|
PiperOrigin-RevId: 414576763
|
|
PiperOrigin-RevId: 410946378
|
|
PiperOrigin-RevId: 407507985
|
|
PiperOrigin-RevId: 407005437
|
|
PiperOrigin-RevId: 406766575
|
|
fall back on AVX, not StandardCpp
PiperOrigin-RevId: 405900310
|
|
PiperOrigin-RevId: 404698692
|
|
PiperOrigin-RevId: 404697829
|
|
I was getting incorrect results on some environments and this turned out to be the cause.
Closes https://github.com/google/ruy/pull/276
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/276 from keichi:add-missing-volatile e2d89fe29ce36510a08c704b603f513729713faf
PiperOrigin-RevId: 396351130
|
|
This fixes the following compilation error:
```
In file included from /usr/include/c++/10/vector:67,
from /home/keichi/Projects/ruy/ruy/test_overflow_dst_zero_point.cc:32:
/usr/include/c++/10/bits/stl_vector.h: In instantiation of ‘class std::vector<const signed char>’:
/home/keichi/Projects/ruy/ruy/test_overflow_dst_zero_point.cc:75:24: required from here
/usr/include/c++/10/bits/stl_vector.h:401:66: error: static assertion failed: std::vector must have a non-const, non-volatile value_type
401 | static_assert(is_same<typename remove_cv<_Tp>::type, _Tp>::value,
| ^~~~~
```
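For context, one common way to resolve this kind of error is to const-qualify the container rather than its element type. The sketch below uses hypothetical names and values and is not necessarily the exact change made in the test file.

```cpp
#include <cstdint>
#include <vector>

// Rejected by the static assertion quoted above:
//   std::vector<const std::int8_t> data = {1, -2, 3};

// Accepted: the elements are still read-only through the const container.
const std::vector<std::int8_t> kExampleData = {1, -2, 3};  // hypothetical data

// Sums the elements, reading them through the const container.
int SumExampleData() {
  int sum = 0;
  for (std::int8_t value : kExampleData) {
    sum += value;
  }
  return sum;
}
```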
Closes https://github.com/google/ruy/pull/278
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/278 from keichi:fix-compilation-err 2e30471e9ce525f3a62337078cf2e80f17c966ff
PiperOrigin-RevId: 395965795
|
|
The build failed on 32-bit ARM; I think it's just a typo.
Closes https://github.com/google/ruy/pull/274
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/274 from metarutaiga:master f40e88396c1031289dbda5a0c98893557509542e
PiperOrigin-RevId: 380820893
|
|
PiperOrigin-RevId: 380178286
|
|
Kernels perform the addition of the destination zero_point in int16.
This addition needed to be saturating to avoid wrapping around.
Thanks to Marat Dukhan for reporting and debugging this issue.
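A scalar sketch of what a saturating int16 addition means here. The function name is hypothetical, and the real kernels use SIMD instructions rather than this code.

```cpp
#include <cstdint>
#include <limits>

// Adds two int16 values, clamping to the int16 range instead of wrapping.
inline std::int16_t SaturatingAddInt16(std::int16_t a, std::int16_t b) {
  // Compute in 32 bits so the intermediate sum cannot overflow.
  const std::int32_t sum = static_cast<std::int32_t>(a) + b;
  const std::int32_t max = std::numeric_limits<std::int16_t>::max();
  const std::int32_t min = std::numeric_limits<std::int16_t>::min();
  if (sum > max) {
    return static_cast<std::int16_t>(max);
  }
  if (sum < min) {
    return static_cast<std::int16_t>(min);
  }
  return static_cast<std::int16_t>(sum);
}
```

A wrapping addition of 32767 and 1 would yield -32768; the saturating version stays at 32767.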
Additionally, this commit:
- adds tests for the new Cortex-X1 tuned kernels.
- adds Context::get_runtime_enabled_paths() to query the runtime
CPU detection from the public Context interface.
- updates the Bazel-to-CMake converter to support some minor
recent BUILD changes.
PiperOrigin-RevId: 379778779
|
|
PiperOrigin-RevId: 374225894
|
|
PiperOrigin-RevId: 373147434
|
|
|
|
PiperOrigin-RevId: 371435881
|
|
PiperOrigin-RevId: 370495857
|
|
AVX-512:
- broadcast without extra instruction (code size)
- use native mask ops
- re-roll mmm loop
AVX2: avoid slow permute, especially for AMD
PiperOrigin-RevId: 369907385
|
|
PiperOrigin-RevId: 369496892
|
|
to every task execution of every thread.
PiperOrigin-RevId: 366919663
|
|
application.
PiperOrigin-RevId: 361953871
|
|
PiperOrigin-RevId: 361951187
|
|
overflow.
PiperOrigin-RevId: 360298662
|
|
Alter sequence to a single rounded scaling with normal rounded shift.
Double rounding and symmetric rounding are removed compared to
reference. Double rounding seems unnecessary and can complicate
implementations. Moreover, symmetric rounding also adds implementation
complexity.
For NEON the new sequence can be translated to VQDMULH + VRSHR.
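A scalar model of that two-step sequence, as a sketch only: the function names are hypothetical, only right shifts are handled, and the actual ruy reference code may differ in details such as how the exponent is applied.

```cpp
#include <cstdint>
#include <limits>

// Scalar model of VQDMULH on 32-bit lanes: saturating doubling multiply
// returning the high half, with no rounding.
inline std::int32_t SaturatingDoublingHighMul(std::int32_t a, std::int32_t b) {
  const std::int32_t min = std::numeric_limits<std::int32_t>::min();
  if (a == min && b == min) {
    return std::numeric_limits<std::int32_t>::max();  // only overflow case
  }
  const std::int64_t product = static_cast<std::int64_t>(a) * b;
  // (2 * product) >> 32, written as >> 31. Assumes arithmetic right shift of
  // negative values, which mainstream compilers provide.
  return static_cast<std::int32_t>(product >> 31);
}

// Scalar model of VRSHR: right shift that rounds by adding half of the
// divisor first. Expects 0 <= shift <= 31.
inline std::int32_t RoundingShiftRight(std::int32_t x, int shift) {
  if (shift == 0) {
    return x;
  }
  const std::int64_t rounding = std::int64_t{1} << (shift - 1);
  return static_cast<std::int32_t>(
      (static_cast<std::int64_t>(x) + rounding) >> shift);
}

// Single rounded scaling: one multiply step, one rounding shift.
inline std::int32_t ScaleSingleRounding(std::int32_t x,
                                        std::int32_t fixedpoint_multiplier,
                                        int right_shift) {
  return RoundingShiftRight(
      SaturatingDoublingHighMul(x, fixedpoint_multiplier), right_shift);
}
```

For example, with a multiplier of `1 << 30` (0.5 in Q31 fixed point) and a right shift of 1, an input of 1000 scales to 250.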
Closes https://github.com/google/ruy/pull/227
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/227 from GeorgeARM:mul_pr dec00bd87a8815fdad79d302494430aa63522752
PiperOrigin-RevId: 356539687
|
|
Closes https://github.com/google/ruy/pull/251
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/251 from bjacob:relax c8d2cf94d15abd4a9fd4222619c42413952f0fb1
PiperOrigin-RevId: 356340585
|
|
|
|
Closes https://github.com/google/ruy/pull/249
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/249 from bjacob:tests-disabled-when-submodule 3a33bb081acadca3520edeae2c226827e9fe0f89
PiperOrigin-RevId: 353298619
|
|
Multiplying by 0 by default is unfriendly to people getting familiar
with ruy, who then have to debug why their output values are all 0.
With a default of 1, tiny toy examples might output sane values,
anything beyond that will saturate, and seeing all saturated values will
be a hint that something needs to be set to rescale values.
Closes https://github.com/google/ruy/pull/248
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/248 from bjacob:multiplier-default 3fb1152e899fffc1f9fa9103b533348599ca494f
PiperOrigin-RevId: 353077204
|
|
|
|
|
|
- Following XNNPACK's example, in CMakeLists.txt, skip including our own
third_party/ directories if the target is already defined. This means that
IREE embedding ruy as a third_party/ dep does not need to have its
submodules checked out, ruy can use IREE's own cpuinfo and googletest.
- Switch open-source builds to using the stripped-include-paths flavor
of cpuinfo (like IREE is already using).
PiperOrigin-RevId: 352871140
|
|
- Switch to same colors as in ruy html traces
- Move `:thread_pool` to its own yellow color for consistency with ruy traces
- Drop `:validate`
- Drop the legend; it will be redundant in markdown docs that show these different materials in the same context.
preview: https://github.com/google/ruy/blob/84dd41f433b3befad6c711248a5d0f00fd8b2711/doc/depgraph.svg
Closes https://github.com/google/ruy/pull/241
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/241 from bjacob:depgraph-update 8f2fa1d9a178c62b80fcc940c9d6ca5cf8ce3c41
PiperOrigin-RevId: 352858626
|
|
Reverts google/ruy#243
Closes https://github.com/google/ruy/pull/244
PiperOrigin-RevId: 352711630
|