github.com/google/ruy.git
Age | Commit message | Author
2022-09-14 | Create API to determine how many threads to use (HEAD, master) | Ruy Contributors
PiperOrigin-RevId: 474367386
2022-08-03 | Include GNUInstallDirs module in top-level CMake file | Ruy Contributors
It was accidentally forgotten in the change that introduced support for install rules. This module defines some of the variables that are used by the install rules. PiperOrigin-RevId: 464915500
2022-07-12 | Redo CMakeLists change from https://github.com/google/ruy/pull/313 | Benoit Jacob
accidentally reverted by copybara export in https://github.com/google/ruy/commit/fd42b25c4512913fd47a86826aecec7c9c3ee2b4 PiperOrigin-RevId: 460471200
2022-07-12 | Skip caches that have processor_count==0. | Benoit Jacob
Some crash reports on Android tell of segfaults at address (2^32 - 8) at the line below evaluating is_local, which I guess would happen if processor_count==0. PiperOrigin-RevId: 460454326
2022-06-28 | Update cpuinfo (#313) | Petr Hosek
cpuinfo CMake build now defines the cpuinfo::cpuinfo alias so we don't need to define it in ruy.
2022-06-24 | Fix assembler deprecated instruction warnings (as errors) on some Aarch32 toolchains with -mcpu=cortex-a32. | Benoit Jacob
PiperOrigin-RevId: 457055609
2022-05-26 | Define namespace prefixed aliases for targets in the CMake build | Petr Hosek
This allows projects that depend on ruy to use namespace qualified target names regardless of whether they consume ruy through add_subdirectory or find_package. Closes https://github.com/google/ruy/pull/311 COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/311 from petrhosek:cmake-alias c00fae2a56a567b216dea0b0fe7378c28ddadbbf PiperOrigin-RevId: 451184466
2022-05-25 | Support install rules in the CMake build | Petr Hosek
This allows using ruy from other projects through find_package. Closes https://github.com/google/ruy/pull/309 COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/309 from petrhosek:cmake-install b6168af3fe06ec794f5b5253b5cc626ffac21916 PiperOrigin-RevId: 450959295
2022-05-19 | Update CMake build | Petr Hosek
This was generated using cmake/bazel_to_cmake.sh. Closes https://github.com/google/ruy/pull/310 COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/310 from petrhosek:cmake-update ae8e526af2c2b5e915f542c008ac724929f848a6 PiperOrigin-RevId: 449763106
2022-05-18 | Update cpuinfo (#308) | Petr Hosek
This pulls in the recent CMake build changes, specifically the support for find_package, that we would like to utilize in ruy CMake build in the future.
2022-04-19 | Refactor Thread internals for clarity and efficiency. | Benoit Jacob
On the clarity side, the thread main loop is now just:

while (GetNewStateOtherThanReady() == State::HasWork) { RevertToReadyState(); }

On the efficiency side:
* Locking and atomic ops have been reduced. We used to lock state_mutex_ around the entire thread task execution; now we only lock it around notify/wait on the state_cond_ condition_variable, so this mutex is renamed state_cond_mutex_, which clarifies its purpose.
* We used to perform a redundant reload-acquire of the new state_ in the main thread loop.
* Some accesses are demoted to relaxed because they are already ordered by other release-acquire relationships.
* A notify_all becomes notify_one.
* Send all thread exit requests upfront so threads can exit in parallel.

A comment is added on Thread::task_ to explain the release-acquire relationships making this all work.

Internal code is broken into functions that are only ever called from the main thread, and functions that are only ever called from the worker thread. That specialization made further simplifications and performance gains obvious.

It was found by continuous integration that some TFLite users construct and destroy the context from two different threads, due to the use of reference-counting. That means that the notion of "main thread" is not that solid. Accordingly, instances of "main thread" in comments and identifiers have been rephrased as "outside thread" as opposed to worker thread.

Tested with TSan (also enabled on presubmits) so fairly confident that this is correct.
PiperOrigin-RevId: 442697771
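The loop shape and locking discipline described in this commit message can be sketched roughly as follows. This is a minimal illustration, not ruy's Thread class: the `Worker` type, `StartWork`, `WaitUntilReady`, and `ChangeStateFromOutsideThread` are hypothetical names, and running the task inside the loop body is an assumption made to keep the sketch short. The mutex is held only around the condition-variable wait and notify, and the release store of the state is what publishes `task_` to the worker.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical, simplified worker. Names echo the commit message, but this
// is an illustrative sketch and not ruy's actual implementation.
class Worker {
 public:
  enum class State { Ready, HasWork, ExitAsSoonAsPossible };

  Worker() : thread_([this] { ThreadMain(); }) {}

  ~Worker() {
    WaitUntilReady();
    ChangeStateFromOutsideThread(State::ExitAsSoonAsPossible);
    thread_.join();
  }

  // Outside thread only. The worker must currently be in the Ready state.
  void StartWork(std::function<void()> task) {
    task_ = std::move(task);
    // The release store inside publishes task_ to the worker thread.
    ChangeStateFromOutsideThread(State::HasWork);
  }

  // Outside thread only. Spins until the worker has reverted to Ready.
  void WaitUntilReady() const {
    while (state_.load(std::memory_order_acquire) != State::Ready) {
      std::this_thread::yield();
    }
  }

 private:
  // Outside thread only. The mutex guards the notify, not the task itself.
  void ChangeStateFromOutsideThread(State new_state) {
    std::lock_guard<std::mutex> lock(state_cond_mutex_);
    state_.store(new_state, std::memory_order_release);
    state_cond_.notify_one();
  }

  // Worker thread only. Blocks until the state is something other than Ready.
  State GetNewStateOtherThanReady() {
    std::unique_lock<std::mutex> lock(state_cond_mutex_);
    State new_state = State::Ready;
    state_cond_.wait(lock, [&] {
      new_state = state_.load(std::memory_order_acquire);
      return new_state != State::Ready;
    });
    return new_state;
  }

  // Worker thread only. The release store publishes the task's side effects.
  void RevertToReadyState() {
    state_.store(State::Ready, std::memory_order_release);
  }

  void ThreadMain() {
    while (GetNewStateOtherThanReady() == State::HasWork) {
      task_();  // Runs without holding state_cond_mutex_.
      RevertToReadyState();
    }
  }

  std::function<void()> task_;
  std::atomic<State> state_{State::Ready};
  std::mutex state_cond_mutex_;
  std::condition_variable state_cond_;
  std::thread thread_;  // Declared last so the other members exist first.
};
```

Because the task runs outside the lock, the outside thread never contends on the mutex while work is in progress; it only takes the lock briefly to hand over a new state.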
2022-04-07 | Simplification of ThreadPool code - merge asserts into main logic | Benoit Jacob
PiperOrigin-RevId: 440105621
2022-03-25 | Fix an integer overflow, and take some extra defensive steps. | Benoit Jacob
PiperOrigin-RevId: 437140449
2022-03-24 | Update GetTentativeThreadCount to use int64 types | Karim Nosir
PiperOrigin-RevId: 436879056
2022-01-21 | Accommodate Clang's CFI sanitizer | Benoit Jacob
It was complaining that the `TrMulTask* tasks` pointer was temporarily pointing to garbage as we first set it to point to the allocated buffer, then perform the actual TrMulTask object constructions using placement-new. Rewritten so that the pointer is kept a `char*` pointer during this allocation and placement-new business, and only assigned to a `TrMulTask*` pointer at the end.

Note: this is *not* a strict-aliasing issue. The CFI diagnostic here is unaffected by `-f[no-]strict-aliasing`, and the reason why this isn't a strict-aliasing violation is that the C++ spec makes an exception for `char` and a couple of other byte types, allowing byte buffers to alias objects of other types. So CFI is going beyond the C++ spec here: this isn't an undefined-behavior report. It is apparently trying to enforce that if pointers are set at all, then they must be set to point to a valid object of their element type, at least for types that have a vtable.
PiperOrigin-RevId: 423326986
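A minimal sketch of the pattern described here, assuming a hypothetical `Task` type with a vtable standing in for TrMulTask (the helper names `ConstructTasks` and `DestroyTasks` are also illustrative, not ruy's): the storage is addressed through `char*` while objects are being constructed, and a typed pointer is only formed once live objects exist.

```cpp
#include <cstddef>
#include <new>

// Hypothetical stand-in for a task type with a vtable; not ruy's TrMulTask.
struct Task {
  explicit Task(int id) : id(id) {}
  virtual ~Task() = default;
  virtual int Run() const { return id * 2; }
  int id;
};

// Constructs `count` Task objects in `buffer`, which must be suitably
// aligned for Task and at least count * sizeof(Task) bytes long.
// The buffer is only ever addressed as char* during construction.
Task* ConstructTasks(char* buffer, int count) {
  Task* first = nullptr;
  for (int i = 0; i < count; ++i) {
    char* slot = buffer + static_cast<std::ptrdiff_t>(i) * sizeof(Task);
    Task* constructed = new (slot) Task(i);
    if (i == 0) first = constructed;
  }
  // The typed pointer is returned only after the objects are alive, so it
  // never points at unconstructed memory.
  return first;
}

// Destroys objects created by ConstructTasks; the buffer itself is owned
// by the caller.
void DestroyTasks(Task* tasks, int count) {
  for (int i = 0; i < count; ++i) {
    tasks[i].~Task();
  }
}
```

A caller would supply storage such as `alignas(Task) char storage[3 * sizeof(Task)];` and pair every `ConstructTasks` call with a `DestroyTasks` call.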
2021-12-09 | Ruy: Fix 16bit-packing msan error. | Dayeong Lee
PiperOrigin-RevId: 415133840
2021-12-07 | Ruy: Add new packing for 16bit ColMajor for Avx512. | Dayeong Lee
PiperOrigin-RevId: 414576763
2021-11-19 | Modify use of Eigen::array to use syntax compatible with std::array in c++17. | Ruy Contributors
PiperOrigin-RevId: 410946378
2021-11-04 | Ruy: Support 8x16 avx512/avx2_fma kernel for single_column. | Dayeong Lee
PiperOrigin-RevId: 407507985
2021-11-02 | Ruy: Support 8x16 avx512 kernel | Dayeong Lee
PiperOrigin-RevId: 407005437
2021-11-01 | Ruy: Support 8x16 avx2_fma kernel | Dayeong Lee
PiperOrigin-RevId: 406766575
2021-10-27 | fix inheritance of kernels on x86. When an AVX2 kernel is not available, fall back on AVX, not StandardCpp | Benoit Jacob
PiperOrigin-RevId: 405900310
2021-10-21 | test i8xi16 cases | Benoit Jacob
PiperOrigin-RevId: 404698692
2021-10-21 | Disable the internal test-only variants of the StandardCpp path in benchmarks | Benoit Jacob
PiperOrigin-RevId: 404697829
2021-09-13 | Add missing volatile qualifier in Pack8bitRowMajorForNeonDotprod | Keichi Takahashi
I was getting incorrect results on some environments and this turned out to be the cause. Closes https://github.com/google/ruy/pull/276 COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/276 from keichi:add-missing-volatile e2d89fe29ce36510a08c704b603f513729713faf PiperOrigin-RevId: 396351130
2021-09-10 | Fix error when compiling ruy_test_overflow_dst_zero_point with GCC | Keichi Takahashi
This fixes the following compilation error:

```
In file included from /usr/include/c++/10/vector:67,
                 from /home/keichi/Projects/ruy/ruy/test_overflow_dst_zero_point.cc:32:
/usr/include/c++/10/bits/stl_vector.h: In instantiation of ‘class std::vector<const signed char>’:
/home/keichi/Projects/ruy/ruy/test_overflow_dst_zero_point.cc:75:24:   required from here
/usr/include/c++/10/bits/stl_vector.h:401:66: error: static assertion failed: std::vector must have a non-const, non-volatile value_type
  401 |       static_assert(is_same<typename remove_cv<_Tp>::type, _Tp>::value,
      |                                                                  ^~~~~
```

Closes https://github.com/google/ruy/pull/278
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/278 from keichi:fix-compilation-err 2e30471e9ce525f3a62337078cf2e80f17c966ff
PiperOrigin-RevId: 395965795
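The static assertion fires because `std::vector` requires a non-const element type. A common way to get read-only data without tripping it, shown here as a sketch and not necessarily the change made in this commit, is to make the container const and keep the element type plain. `SumValues` is a hypothetical helper used only for illustration.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// std::vector<const std::int8_t> is rejected by libstdc++'s static_assert.
// Making the vector itself const gives the same read-only guarantee.
inline int SumValues(const std::vector<std::int8_t>& values) {
  // The int initial value makes the accumulation happen in int, not int8.
  return std::accumulate(values.begin(), values.end(), 0);
}
```

A call site would then declare its data as `const std::vector<std::int8_t> values = {1, -2, 3};`.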
2021-06-22 | Fix typo in Windows on ARM 32bit | metarutaiga
The build failed on ARM 32-bit; I think it's just a typo. Closes https://github.com/google/ruy/pull/274 COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/274 from metarutaiga:master f40e88396c1031289dbda5a0c98893557509542e PiperOrigin-RevId: 380820893
2021-06-18 | Fix the bazel build by dropping a xtensa-specific select entry. | Benoit Jacob
PiperOrigin-RevId: 380178286
2021-06-16 | Fix integer overflow causing incorrect results. | Benoit Jacob
Kernels perform the addition of the destination zero_point in int16. This addition needed to be saturating to avoid wrapping around. Thanks to Marat Dukhan for reporting and debugging this issue.

Additionally, this commit:
- makes the new Cortex-X1 tuned kernels tested.
- adds Context::get_runtime_enabled_paths() to query the runtime CPU detection from the public Context interface.
- updates the Bazel-to-CMake converter to support some minor recent BUILD changes.

PiperOrigin-RevId: 379778779
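To illustrate the overflow fix described at the top of this message, here is a scalar sketch of a saturating int16 addition. The function name `SaturatingAddInt16` is hypothetical, and the real kernels are SIMD code, so this only shows the arithmetic: the sum is formed in a wider type and clamped, where a wrapping add would flip the sign.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Adds two int16 values, clamping the result to the int16 range so that
// it never wraps around. Illustrative scalar version only.
inline std::int16_t SaturatingAddInt16(std::int16_t a, std::int16_t b) {
  const std::int32_t lo = std::numeric_limits<std::int16_t>::min();
  const std::int32_t hi = std::numeric_limits<std::int16_t>::max();
  const std::int32_t sum =
      static_cast<std::int32_t>(a) + static_cast<std::int32_t>(b);
  return static_cast<std::int16_t>(std::min(std::max(sum, lo), hi));
}
```

For example, adding a zero point of 1000 to an accumulator value of 32000 yields 32767 with this helper, whereas a wrapping int16 addition would produce a large negative number.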
2021-05-17 | remove pthread requirement for cc_target_os:xtensa | Ruy Contributors
PiperOrigin-RevId: 374225894
2021-05-11 | Fork Neon Float kernel for X1 | T.J. Alumbaugh
PiperOrigin-RevId: 373147434
2021-05-06 | IWYU: include limits for std::numeric_limits (#253) | stha09
2021-05-01 | Remove runtime assertion on size of shift in reference code | T.J. Alumbaugh
PiperOrigin-RevId: 371435881
2021-04-26 | Remove non-ASCII character in comment | T.J. Alumbaugh
PiperOrigin-RevId: 370495857
2021-04-22 | 1.02x speedup of Ruy AVX2 f32 and AVX-512 f32/i8 | Ruy Contributors
AVX-512:
- broadcast without extra instruction (code size)
- use native mask ops
- re-roll mmm loop

AVX2: avoid slow permute, especially for AMD

PiperOrigin-RevId: 369907385
2021-04-20 | Fork 8bit Neon Dotprod kernel for X1 and support resolving to X1 core | T.J. Alumbaugh
PiperOrigin-RevId: 369496892
2021-04-06 | Create a utility library to suppress floating-point denormals, and apply it to every task execution of every thread. | Chao Mei
PiperOrigin-RevId: 366919663
2021-03-10 | Simplify some code and add release assertions to help debug a crash in an application. | Benoit Jacob
PiperOrigin-RevId: 361953871
2021-03-10 | rollback hopefully fixing some application crash | Benoit Jacob
PiperOrigin-RevId: 361951187
2021-03-02 | Use std::ptrdiff_t instead of int when calculating memory size to avoid int overflow. | Chao Mei
PiperOrigin-RevId: 360298662
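As a small illustration of the idea in this commit title, the sketch below widens to `std::ptrdiff_t` before multiplying, so the product is computed in the wider type. `BufferSizeBytes` and its parameters are hypothetical and do not correspond to a specific ruy function.

```cpp
#include <cstddef>

// Computes a buffer size in bytes. Casting the first operand makes the whole
// product evaluate in std::ptrdiff_t, which is 64-bit on 64-bit platforms.
// With plain int, rows * cols * element_size could overflow.
inline std::ptrdiff_t BufferSizeBytes(int rows, int cols, int element_size) {
  return static_cast<std::ptrdiff_t>(rows) * cols * element_size;
}
```

The cast has to be applied to an operand, not to the finished product: `static_cast<std::ptrdiff_t>(rows * cols * element_size)` would still overflow in `int` before the conversion happens.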
2021-02-09 | Simplify quantized multiplier | Georgios Pinitas
Alter sequence to a single rounded scaling with normal rounded shift. Double rounding and symmetric rounding are removed compared to reference. Double rounding seems unnecessary and can complicate implementations. Moreover, symmetric rounding also adds implementation complexity. For NEON the new sequence can be translated to VQDMULH + VRSHR. Closes https://github.com/google/ruy/pull/227 COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/227 from GeorgeARM:mul_pr dec00bd87a8815fdad79d302494430aa63522752 PiperOrigin-RevId: 356539687
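One way to read "a single rounded scaling with normal rounded shift" is sketched below in scalar C++. This is an interpretation under stated assumptions, not ruy's code: the multiplier is assumed to be a Q0.31 fixed-point value, `shift` is assumed to be a power-of-two exponent in the range [-31, 30], and the name `ScaleWithSingleRounding` is hypothetical. Rounding happens once, by adding half of the final divisor before the shift, so ties round toward positive infinity and no symmetric treatment of negative values is applied.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Computes approximately x * (multiplier / 2^31) * 2^shift with one rounding
// step. Assumes -31 <= shift <= 30 and an arithmetic right shift for negative
// values (guaranteed from C++20, and the behavior of mainstream compilers).
inline std::int32_t ScaleWithSingleRounding(std::int32_t x,
                                            std::int32_t multiplier,
                                            int shift) {
  const int total_shift = 31 - shift;  // Between 1 and 62 for valid shifts.
  const std::int64_t round = std::int64_t{1} << (total_shift - 1);
  const std::int64_t product =
      static_cast<std::int64_t>(x) * static_cast<std::int64_t>(multiplier);
  const std::int64_t scaled = (product + round) >> total_shift;
  // Clamp, since positive shifts can push the result past the int32 range.
  const std::int64_t lo = std::numeric_limits<std::int32_t>::min();
  const std::int64_t hi = std::numeric_limits<std::int32_t>::max();
  return static_cast<std::int32_t>(std::min(std::max(scaled, lo), hi));
}
```

With a multiplier of 2^30 (that is, 0.5) and a shift of 0, an input of 3 gives 2 and an input of -3 gives -1, which shows the non-symmetric tie handling that the message describes as a simplification.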
2021-02-09 | Update test tolerance ahead of merging PR #227 | bjacob
Closes https://github.com/google/ruy/pull/251 COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/251 from bjacob:relax c8d2cf94d15abd4a9fd4222619c42413952f0fb1 PiperOrigin-RevId: 356340585
2021-01-23 | Allow late definitions of cpuinfo but only when ruy is a subdir. (#250) | bjacob
2021-01-22 | Disable tests by default when ruy is a subproject. | bjacob
Closes https://github.com/google/ruy/pull/249 COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/249 from bjacob:tests-disabled-when-submodule 3a33bb081acadca3520edeae2c226827e9fe0f89 PiperOrigin-RevId: 353298619
2021-01-21 | Change the default MulParams multiplier values to multiply by 1, not 0. | bjacob
Multiplying by 0 by default is unfriendly to people getting familiar with ruy having to debug why their output values are all 0. With a default of 1, tiny toy examples might output sane values, anything beyond that will saturate, and seeing all saturated values will be a hint that something needs to be set to rescale values. Closes https://github.com/google/ruy/pull/248 COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/248 from bjacob:multiplier-default 3fb1152e899fffc1f9fa9103b533348599ca494f PiperOrigin-RevId: 353077204
2021-01-21 | Add basic gitignore (#246) | Geoffrey Martin-Noble
2021-01-21 | Simplify cpuinfo build overlay (#247) | Geoffrey Martin-Noble
2021-01-21 | Fixes for builds in open source projects with cpuinfo and googletest deps. | Benoit Jacob
- Following XNNPACK's example, in CMakeLists.txt, skip including our own third_party/ directories if the target is already defined. This means that IREE embedding ruy as a third_party/ dep does not need to have its submodules checked out, ruy can use IREE's own cpuinfo and googletest.
- Switch open-source builds to using the stripped-include-paths flavor of cpuinfo (like IREE is already using).

PiperOrigin-RevId: 352871140
2021-01-21 | Update depgraph | bjacob
- Switch to same colors as in ruy html traces
- Move `:thread_pool` to its own yellow color for consistency with ruy traces
- Drop `:validate`
- Drop the legend, will be redundant in the context of markdown docs showing these different materials in the same context.

preview: https://github.com/google/ruy/blob/84dd41f433b3befad6c711248a5d0f00fd8b2711/doc/depgraph.svg
Closes https://github.com/google/ruy/pull/241
COPYBARA_INTEGRATE_REVIEW=https://github.com/google/ruy/pull/241 from bjacob:depgraph-update 8f2fa1d9a178c62b80fcc940c9d6ca5cf8ce3c41
PiperOrigin-RevId: 352858626
2021-01-20 | Revert "Revert "Add CMake support with a converter from Bazel"" | bjacob
Reverts google/ruy#243 Closes https://github.com/google/ruy/pull/244 PiperOrigin-RevId: 352711630